Support #2125
openUKHLS pooled OLS weights
Description
Dear UKHLS team,
I am currently working on a study where I am looking at the impact of perceived neighbourhood social cohesion on life satisfaction. For this, I would like to start my analysis by pooling the data from waves a,c,f,i,l (1,3,6,9,12) for a cross-sectional analysis.
To do this I have 3 questions I was wondering if you could please provide guidance on:
1. From what I have read in previous posts and this help document (https://www.understandingsociety.ac.uk/wp-content/uploads/working-papers/2024-01.pdf) if I pool data from all waves in the "long format", treating the same individuals across different waves as different persons, I can use the corresponding cross-sectional weight from each wave. Is that correct?
2. If so, how should I apply the weights?
Is it enough to just create a new variable for the new weight (e.g. new_weight)? Or, as each wave has a different number of observations/individuals, do I need to create a new scaled weight variable (e.g. weightscaled)? Or do I need to do both of these steps?
Please can you advise me on whether the Stata code below is correct, and whether I need all of it or just the code for the scaled weight variable.
* create a new weight variable new_weight_xw *
// give it the value of the cross-sectional weight for the wave from which the observation comes (e.g. new_weight_xw = a_indinus_xw if wave==1)
gen new_weight_xw = .
replace new_weight_xw = a_indscus_xw if wave==1
replace new_weight_xw = c_indscub_xw if wave==3
replace new_weight_xw = f_indscui_xw if wave==6
replace new_weight_xw = i_indscui_xw if wave==9
replace new_weight_xw = l_indscui_xw if wave==12
* create a scaled weight variable *
gen weightscaled=0
replace weightscaled=a_indscus_xw if wave==1
gen ind=1
sum ind [aw=a_indscus_xw] if wave==1
gen awtdtot=r(sum_w)
sum ind [aw=c_indscub_xw] if wave==3
gen cwtdtot=r(sum_w)
sum ind [aw=f_indscui_xw] if wave==6
gen fwtdtot=r(sum_w)
sum ind [aw=i_indscui_xw] if wave==9
gen iwtdtot=r(sum_w)
sum ind [aw=l_indscui_xw] if wave==12
gen lwtdtot=r(sum_w)
replace weightscaled=c_indscub_xw*(awtdtot/cwtdtot) if wave==3
replace weightscaled=f_indscui_xw*(awtdtot/fwtdtot) if wave==6
replace weightscaled=i_indscui_xw*(awtdtot/iwtdtot) if wave==9
replace weightscaled=l_indscui_xw*(awtdtot/lwtdtot) if wave==12
//You can double check by looking at the sum of ind with weightscaled for each wave – it should be the same.
sum ind [aw=weightscaled] if wave==1
sum ind [aw=weightscaled] if wave==3
sum ind [aw=weightscaled] if wave==6
sum ind [aw=weightscaled] if wave==9
sum ind [aw=weightscaled] if wave==12
3. Finally, if I am using the entire sample from waves that do not overlap (a,c,f,i,l), does this mean the entire 24-month sample will be included in the analysis base an equal number of times, so that I don't need to do anything further?
Thank you in advance for any advice and guidance you can provide on these questions.
Kind regards,
Emma
Updated by Understanding Society User Support Team 7 months ago
- Category set to Weights
- Assignee changed from Understanding Society User Support Team to Olena Kaminska
Updated by Olena Kaminska 7 months ago
Emma,
Thank you for your questions. The answers to all of your questions are yes. A few additional notes.
You don't need to assume these are different people, but in a multilevel model you should include pidp as a second level after psu.
You should create the scaled weights under one variable name.
A full wave with our _xw weights represents the population at that time point. You are then pooling multiple cross-sectional samples from different time points. You don't need any further adjustments.
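A minimal sketch of what "scaled weights under one name" could look like in Stata, rescaling each wave so that its weights sum to the wave 1 total. It assumes the long-format file still carries the wave-prefixed weight variables and the numeric wave variable used in the question above; the local macro names reftot and tot are illustrative only.

```stata
* Assumption: long-format data with a numeric wave variable and the
* wave-specific weights named as in the question above.
gen double weightscaled = .
replace weightscaled = a_indscus_xw if wave == 1
replace weightscaled = c_indscub_xw if wave == 3
replace weightscaled = f_indscui_xw if wave == 6
replace weightscaled = i_indscui_xw if wave == 9
replace weightscaled = l_indscui_xw if wave == 12

* Sum of weights in the reference wave (wave 1)
quietly summarize weightscaled if wave == 1
local reftot = r(sum)

* Rescale every other wave so its weights sum to the wave 1 total
foreach w of numlist 3 6 9 12 {
    quietly summarize weightscaled if wave == `w'
    local tot = r(sum)
    replace weightscaled = weightscaled * (`reftot' / `tot') if wave == `w'
}

* Check: the sum of weightscaled should now be the same in every wave
tabstat weightscaled, by(wave) statistics(sum)
```

The loop replaces the five separate total variables with one pass per wave, so only weightscaled is left in the data set and can be passed straight to svyset.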
Hope this helps,
Olena
Updated by Emma Kemp 7 months ago
Hi Olena,
Thank you very much for your response.
In regard to including pidp as a second level after psu in multilevel levels - how would I go about doing this?
Currently my do-file looks like this (see below). Would I add pidp into the svyset set up? e.g.
svyset, clear
svyset l_psu pidp [pweight = weightscaled], strata(l_strata) singleunit(scaled)
CURRENT DO-FILE
/*_______________________________________________________________
PART 5: DECLARE COMPLEX SURVEY DESIGN for POOLED OLS
________________________________________________________________*/
* set correct weights
svyset, clear
svyset l_psu [pweight = weightscaled], strata(l_strata) singleunit(scaled)
/*_______________________________________________________________________
PART 4: POOLED OLS
________________________________________________________________*/
* Running a simple OLS regression - controls inspired by Powdthavee 2008 *
asdoc svy: regress lfsato NSC_index i.age_group_destr age2 male_dummy jbstat_simple edu_simple mastat_simple tenure_dummy hhsize_simple nchild_simple scsf1_combined_r i.gor_dv i.wave ,
// title(Table 3: Pooled OLS) save(RQ1_results_POLS.doc), replace
Thanks again Olena,
Emma
Updated by Olena Kaminska 7 months ago
Emma,
You aren't using a multilevel model. Your setup is fine as it is, and you don't need to include pidp in it.
Best,
Olena
Updated by Understanding Society User Support Team 7 months ago
- Status changed from New to Feedback
- Assignee deleted (Olena Kaminska)
- % Done changed from 0 to 90
- Private changed from Yes to No