## Support #2125

### UKHLS pooled OLS weights


**Description**

Dear UKHLS team,

I am currently working on a study of the impact of perceived neighbourhood social cohesion on life satisfaction. I would like to start my analysis by pooling the data from waves a, c, f, i and l (1, 3, 6, 9 and 12) for a cross-sectional analysis.

I have three questions on which I would appreciate your guidance:

1. From what I have read in previous posts and in this help document (https://www.understandingsociety.ac.uk/wp-content/uploads/working-papers/2024-01.pdf), if I pool data from all waves in the "long format", treating the same individuals across different waves as different persons, I can use the corresponding cross-sectional weight from each wave. Is that correct?

2. If so, how should I apply the weights?

Is it enough to just create a new variable for the new weight (e.g. new_weight)? Or, as each wave has a different number of observations/individuals, do I need to create a new scaled weight variable (e.g. weightscaled)? Or do I need to do both of these steps?

Please can you advise me on whether the Stata code below is correct, and whether I need all of it or just the code for the scaled weight variable?

// create a new weight variable, new_weight_xw

// give it the value of the cross-sectional weight for the wave from which the observation comes (e.g. new_weight_xw = a_indinus_xw if wave==1)

gen new_weight_xw = .

replace new_weight_xw = a_indscus_xw if wave==1

replace new_weight_xw = c_indscub_xw if wave==3

replace new_weight_xw = f_indscui_xw if wave==6

replace new_weight_xw = i_indscui_xw if wave==9

replace new_weight_xw = l_indscui_xw if wave==12

// create a scaled weighted variable

gen weightscaled=0

replace weightscaled=a_indscus_xw if wave==1

gen ind=1

sum ind [aw=a_indscus_xw] if wave==1

gen awtdtot=r(sum_w)

sum ind [aw=c_indscub_xw] if wave==3

gen cwtdtot=r(sum_w)

sum ind [aw=f_indscui_xw] if wave==6

gen fwtdtot=r(sum_w)

sum ind [aw=i_indscui_xw] if wave==9

gen iwtdtot=r(sum_w)

sum ind [aw=l_indscui_xw] if wave==12

gen lwtdtot=r(sum_w)

replace weightscaled=c_indscub_xw*(awtdtot/cwtdtot) if wave==3

replace weightscaled=f_indscui_xw*(awtdtot/fwtdtot) if wave==6

replace weightscaled=i_indscui_xw*(awtdtot/iwtdtot) if wave==9

replace weightscaled=l_indscui_xw*(awtdtot/lwtdtot) if wave==12

//You can double check by looking at the sum of ind with weightscaled for each wave – it should be the same.

sum ind [aw=weightscaled] if wave==1

sum ind [aw=weightscaled] if wave==3

sum ind [aw=weightscaled] if wave==6

sum ind [aw=weightscaled] if wave==9

sum ind [aw=weightscaled] if wave==12

3. Finally, if I am using the entire sample from waves that do not overlap (a, c, f, i, l), does this mean that the entire 24-month sample will be included in the analysis base an equal number of times, and that I don't need to do anything further?

Thank you in advance for any advice and guidance you can provide on these questions.

Kind regards,

Emma

#### Updated by Understanding Society User Support Team about 2 months ago

- **Category** set to *Weights*
- **Assignee** changed from *Understanding Society User Support Team* to *Olena Kaminska*

#### Updated by Olena Kaminska about 2 months ago

Emma,

Thank you for your questions. The answers to all of your questions are yes. A few additional notes.

You don't need to assume these are different people, but you should include pidp as a second level after psu in the levels of your multilevel model.

You should create scaled weights under one name.

A full wave with our _xw weights represents the population at that time point. You are then pooling multiple cross-sectional samples from different time points. You don't need any further adjustments.

Hope this helps,

Olena
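For readers following the thread, the scaling step can be sketched more compactly. This is a minimal illustration on made-up data, not the team's recommended code: it assumes a long-format file with a `wave` variable and a single weight variable `new_weight_xw` that already holds each wave's cross-sectional weight. The toy values, `wtdtot` and `reftot` are hypothetical names introduced here. As in the code in the question, each wave is rescaled to the weighted total of wave 1.

```stata
* Toy long-format data (hypothetical values): 3 waves with different weight totals
clear
input long pidp byte wave double new_weight_xw
1 1 1.0
2 1 2.0
3 1 1.0
1 3 0.5
2 3 1.5
1 6 2.0
3 6 4.0
end

* Sum of the cross-sectional weights within each wave
bysort wave: egen double wtdtot = total(new_weight_xw)

* Reference total: the sum of weights in wave 1
summarize new_weight_xw if wave == 1, meanonly
scalar reftot = r(sum)

* Rescale so every wave has the same weighted total as wave 1
generate double weightscaled = new_weight_xw * (reftot / wtdtot)

* Check: the scaled weights should sum to the same total in each wave
tabstat weightscaled, by(wave) statistics(sum)
```

Wave 1 keeps its original weights, and the other waves are multiplied by a constant within each wave, so relative weights inside a wave are unchanged.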

#### Updated by Emma Kemp about 2 months ago

Hi Olena,

Thank you very much for your response.

In regard to including pidp as a second level after psu in multilevel levels - how would I go about doing this?

Currently my do-file looks like this (see below). Would I add pidp into the svyset set up? e.g.

svyset, clear

svyset l_psu pidp [pweight = weightscaled], strata(l_strata) singleunit(scaled)

CURRENT DO-FILE

/*_______________________________________________________________________

PART 5: DECLARE COMPLEX SURVEY DESIGN for POOLED OLS

_______________________________________________________________________*/

// set correct weights

svyset, clear

svyset l_psu [pweight = weightscaled], strata(l_strata) singleunit(scaled)

/*_______________________________________________________________________

PART 4: POOLED OLS

_______________________________________________________________________*/

// Running a simple OLS regression; controls inspired by Powdthavee (2008)

asdoc svy: regress lfsato NSC_index i.age_group_destr age2 male_dummy jbstat_simple edu_simple mastat_simple tenure_dummy hhsize_simple nchild_simple scsf1_combined_r i.gor_dv i.wave ,

// title(Table 3: Pooled OLS) save(RQ1_results_POLS.doc), replace

Thanks again Olena,

Emma

#### Updated by Olena Kaminska about 1 month ago

Emma,

You aren't using a multilevel model. Your setup is fine as it is, and you don't need to include pidp in it.

Best,

Olena
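The setup confirmed above can be tried on simulated data. The sketch below is only an illustration under stated assumptions: all values are randomly generated, `nsc` stands in for the cohesion index, and the design (4 strata, 20 PSUs, each person observed in two waves) is hypothetical. It declares the PSU as the only sampling stage, without pidp, and runs a pooled OLS with wave dummies.

```stata
* Simulated pooled long-format data (hypothetical design and values)
clear
set seed 12345
set obs 400
generate long pidp = ceil(_n / 2)            // each person appears in 2 waves
generate byte wave = cond(mod(_n, 2), 1, 3)  // odd rows wave 1, even rows wave 3
generate int l_strata = ceil(pidp / 50)      // 4 strata
generate int l_psu = ceil(pidp / 10)         // 20 PSUs nested within strata
generate double weightscaled = 0.5 + runiform()
generate double nsc = rnormal()              // stand-in for the cohesion index
generate double lfsato = 5 + 0.3 * nsc + rnormal()

* Declare the design with the PSU as the only stage: no pidp
svyset, clear
svyset l_psu [pweight = weightscaled], strata(l_strata) singleunit(scaled)

* Pooled OLS with wave dummies
svy: regress lfsato nsc i.wave
```

In this toy design both rows for a person fall in the same PSU, so the PSU-level variance estimation already covers the repeated observations. Whether that holds in a real analysis file depends on how the PSU variable is defined there.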

#### Updated by Understanding Society User Support Team about 1 month ago

- **Status** changed from *New* to *Feedback*
- **Assignee** deleted (*Olena Kaminska*)
- **% Done** changed from *0* to *90*
- **Private** changed from *Yes* to *No*