Support #2125

UKHLS pooled OLS weights

Added by Emma Kemp 23 days ago. Updated 10 days ago.

Status: Feedback
Priority: Urgent
Assignee: -
Category: Weights
Start date: 06/27/2024
% Done: 90%


Description

Dear UKHLS team,

I am currently working on a study of the impact of perceived neighbourhood social cohesion on life satisfaction. I would like to start my analysis by pooling the data from waves a, c, f, i, l (1, 3, 6, 9, 12) for a cross-sectional analysis.

I have three questions on which I would be grateful for guidance:

1. From what I have read in previous posts and this help document (https://www.understandingsociety.ac.uk/wp-content/uploads/working-papers/2024-01.pdf) if I pool data from all waves in the "long format", treating the same individuals across different waves as different persons, I can use the corresponding cross-sectional weight from each wave. Is that correct?

2. If so, how should I apply the weights?

Is it enough to just create a new weight variable (e.g. new_weight_xw)? Or, as each wave has a different number of observations/individuals, do I need to create a scaled weight variable (e.g. weightscaled)? Or do I need both of these steps?

Please can you advise me on whether the Stata code below is correct, and whether I need all of it or just the code for the scaled weight variable.
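To be explicit about what I mean by scaling (this is my own notation, not taken from the user guide): each wave's weights are multiplied by a constant so that every wave ends up with the same weighted total as wave 1.

```latex
\[
\tilde{w}_{i,t} \;=\; w_{i,t} \times
\frac{\sum_{j \in \text{wave } 1} w_{j,1}}{\sum_{j \in \text{wave } t} w_{j,t}},
\qquad t \in \{1, 3, 6, 9, 12\}
\]
```

Here w_{i,t} is the cross-sectional weight of person i in wave t, so for wave 1 the scaled weight equals the original weight.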

* create a new weight variable new_weight_xw
// set it to the cross-sectional weight for the wave from which the observation comes
gen new_weight_xw = .
replace new_weight_xw = a_indscus_xw if wave==1
replace new_weight_xw = c_indscub_xw if wave==3
replace new_weight_xw = f_indscui_xw if wave==6
replace new_weight_xw = i_indscui_xw if wave==9
replace new_weight_xw = l_indscui_xw if wave==12

* create a scaled weight variable
gen weightscaled=0
replace weightscaled=a_indscus_xw if wave==1

gen ind=1
sum ind [aw=a_indscus_xw] if wave==1
gen awtdtot=r(sum_w)

sum ind [aw=c_indscub_xw] if wave==3
gen cwtdtot=r(sum_w)

sum ind [aw=f_indscui_xw] if wave==6
gen fwtdtot=r(sum_w)

sum ind [aw=i_indscui_xw] if wave==9
gen iwtdtot=r(sum_w)

sum ind [aw=l_indscui_xw] if wave==12
gen lwtdtot=r(sum_w)

replace weightscaled=c_indscub_xw*(awtdtot/cwtdtot) if wave==3
replace weightscaled=f_indscui_xw*(awtdtot/fwtdtot) if wave==6
replace weightscaled=i_indscui_xw*(awtdtot/iwtdtot) if wave==9
replace weightscaled=l_indscui_xw*(awtdtot/lwtdtot) if wave==12

//You can double check by looking at the sum of ind with weightscaled for each wave – it should be the same.
sum ind [aw=weightscaled] if wave==1
sum ind [aw=weightscaled] if wave==3
sum ind [aw=weightscaled] if wave==6
sum ind [aw=weightscaled] if wave==9
sum ind [aw=weightscaled] if wave==12
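
A more compact way to write the same rescaling might look like the sketch below. It is only an illustration: it uses toy data so that it runs on its own, and `xw` is a hypothetical stand-in for a single combined weight such as new_weight_xw. I have not checked it against the real UKHLS files.

```stata
* Sketch only: toy data stands in for the pooled long-format file.
clear
set seed 12345
set obs 500

* Five blocks of 100 rows, labelled as waves 1, 3, 6, 9 and 12
gen wave = 1 + 2*(_n > 100) + 3*(_n > 200) + 3*(_n > 300) + 3*(_n > 400)

* Hypothetical combined cross-sectional weight (stand-in for new_weight_xw)
gen double xw = runiform() + 0.5

* Sum of weights within each wave
bysort wave: egen double wtdtot = total(xw)

* Sum of weights in wave 1, used as the common target for every wave
summarize xw if wave == 1, meanonly
scalar target = r(sum)

* Rescale so that each wave's weights sum to the wave 1 total
gen double weightscaled = xw * (target / wtdtot)

* Check: the sum of scaled weights should now be identical across waves
bysort wave: egen double chk = total(weightscaled)
assert reldif(chk, target) < 1e-8
tabstat weightscaled, by(wave) statistics(sum)
```

This replaces the five separate `sum` / `gen` pairs with one `egen` by wave. The wave 1 weights are left unchanged because their scaling factor is 1.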

3. Finally, if I am using the entire sample from waves that do not overlap (a, c, f, i, l), does this mean the entire 24-month sample will be included in the analysis base an equal number of times, so that I don't need to do anything further?

Thank you in advance for any advice and guidance you can provide on these questions.

Kind regards,
Emma
