Support #2125

UKHLS pooled OLS weights

Added by Emma Kemp 23 days ago. Updated 9 days ago.

Status: Feedback
Priority: Urgent
Assignee: -
Category: Weights
Start date: 06/27/2024
% Done: 90%

Description

Dear UKHLS team,

I am currently working on a study where I am looking at the impact of perceived neighbourhood social cohesion on life satisfaction. For this, I would like to start my analysis by pooling the data from waves a,c,f,i,l (1,3,6,9,12) for a cross-sectional analysis.

To do this I have 3 questions I was wondering if you could please provide guidance on:

1. From what I have read in previous posts and this help document (https://www.understandingsociety.ac.uk/wp-content/uploads/working-papers/2024-01.pdf) if I pool data from all waves in the "long format", treating the same individuals across different waves as different persons, I can use the corresponding cross-sectional weight from each wave. Is that correct?

2. If so, how should I apply the weights?

Is it enough to just create a new variable for the new weight (e.g. new_weight)? Or, as each wave has a different number of observations/individuals, do I need to create a new scaled weight variable (e.g. weightscaled)? Or do I need to do both of these steps?

Please can you advise me on whether the Stata code below is correct, and whether I need all of it or just the code for the scaled weight variable?

* create a new weight variable new_weight *
// give it the value of the cross-sectional weight for the wave from which the observation comes (e.g. new_weight_xw = a_indinus_xw if wave==1)
gen new_weight_xw = .
replace new_weight_xw = a_indscus_xw if wave==1
replace new_weight_xw = c_indscub_xw if wave==3
replace new_weight_xw = f_indscui_xw if wave==6
replace new_weight_xw = i_indscui_xw if wave==9
replace new_weight_xw = l_indscui_xw if wave==12

* create a scaled weight variable *
gen weightscaled=0
replace weightscaled=a_indscus_xw if wave==1

gen ind=1
sum ind [aw=a_indscus_xw] if wave==1
gen awtdtot=r(sum_w)

sum ind [aw=c_indscub_xw] if wave==3
gen cwtdtot=r(sum_w)

sum ind [aw=f_indscui_xw] if wave==6
gen fwtdtot=r(sum_w)

sum ind [aw=i_indscui_xw] if wave==9
gen iwtdtot=r(sum_w)

sum ind [aw=l_indscui_xw] if wave==12
gen lwtdtot=r(sum_w)

replace weightscaled=c_indscub_xw*(awtdtot/cwtdtot) if wave==3
replace weightscaled=f_indscui_xw*(awtdtot/fwtdtot) if wave==6
replace weightscaled=i_indscui_xw*(awtdtot/iwtdtot) if wave==9
replace weightscaled=l_indscui_xw*(awtdtot/lwtdtot) if wave==12

//You can double check by looking at the sum of ind with weightscaled for each wave – it should be the same.
sum ind [aw=weightscaled] if wave==1
sum ind [aw=weightscaled] if wave==3
sum ind [aw=weightscaled] if wave==6
sum ind [aw=weightscaled] if wave==9
sum ind [aw=weightscaled] if wave==12

3. Finally, if I am using the entire sample from waves that do not overlap (a, c, f, i, l), does this mean the entire 24-month sample will be included in the analysis base an equal number of times, so that I don't need to do anything further?

Thank you in advance for any advice and guidance you can provide on these questions.

Kind regards,
Emma

#1

Updated by Understanding Society User Support Team 23 days ago

  • Category set to Weights
  • Assignee changed from Understanding Society User Support Team to Olena Kaminska
#2

Updated by Olena Kaminska 22 days ago

Emma,

Thank you for your questions. The answer to all of your questions is yes. A few additional notes:
You should not assume these are different people, but should include pidp as a second level after psu in your multilevel model.
You should create the scaled weights under one variable name.
Each full wave with our _xw weights represents the population at that time point, so you are pooling multiple cross-sectional samples from different time points. You don't need any further adjustments.
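
As a rough illustration of keeping the scaled weights under one name, the sketch below rescales a single combined weight wave by wave in a loop. It runs on made-up toy data: the variable name xw_weight and all values are assumptions for illustration, not UKHLS variables. In real data xw_weight would be the combined cross-sectional weight (new_weight_xw in the question above).

```stata
* Toy long-format data: a wave identifier plus one combined cross-sectional weight.
* Names and values are hypothetical, not UKHLS data.
clear
input wave xw_weight
1 1.0
1 2.0
1 1.0
3 0.5
3 0.5
6 2.0
6 4.0
6 2.0
end

* Sum of weights in the reference wave (wave 1)
quietly summarize xw_weight if wave == 1
scalar ref_total = r(sum)

* Rescale every wave so that its weights sum to the reference total
gen double weightscaled = .
levelsof wave, local(waves)
foreach w of local waves {
    quietly summarize xw_weight if wave == `w'
    scalar wave_total = r(sum)
    replace weightscaled = xw_weight * (ref_total / wave_total) if wave == `w'
}

* Check: the sum of weightscaled should be the same in every wave
tabstat weightscaled, by(wave) statistics(sum)
```

In this toy data the weights in each wave sum to 4 after scaling, which is the wave 1 total.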

Hope this helps,
Olena

#3

Updated by Emma Kemp 22 days ago

Hi Olena,

Thank you very much for your response.
In regard to including pidp as a second level after psu in multilevel levels - how would I go about doing this?

Currently my do-file looks like this (see below). Would I add pidp into the svyset set up? e.g.
svyset, clear
svyset l_psu pidp [pweight = weightscaled], strata(l_strata) singleunit(scaled)

CURRENT DO-FILE
/*_______________________________________________________________

PART 5: DECLARE COMPLEX SURVEY DESIGN for POOLED OLS
________________________________________________________________*/
* set correct weights
svyset, clear
svyset l_psu [pweight = weightscaled], strata(l_strata) singleunit(scaled)

/*_______________________________________________________________________

PART 4: POOLED OLS 
________________________________________________________________*/
* Running a simple OLS regression - controls inspired by Powdthavee 2008 *

asdoc svy: regress lfsato NSC_index i.age_group_destr age2 male_dummy jbstat_simple edu_simple mastat_simple tenure_dummy hhsize_simple nchild_simple scsf1_combined_r i.gor_dv i.wave ,
// title(Table 3: Pooled OLS) save(RQ1_results_POLS.doc), replace

Thanks again Olena,
Emma

#4

Updated by Olena Kaminska 19 days ago

Emma,

You aren't using a multilevel model. Your setup is fine as it is, and you don't need to include pidp in it.
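
For reference, a minimal sketch of this kind of single-stage declaration on simulated data follows. Everything in it (the variable names psu, strata, x and y, the sample sizes and the seed) is made up for illustration and is not UKHLS data. In this toy data each pidp sits inside exactly one PSU, so clustering on the PSU also covers the repeated rows for the same person.

```stata
* Simulated long-format data: 200 hypothetical persons observed in 2 waves each
clear
set seed 12345
set obs 400
gen pidp = ceil(_n / 2)                       // person identifier, 2 rows per person
bysort pidp: gen wave = cond(_n == 1, 1, 3)   // first row is wave 1, second is wave 3
gen psu = ceil(pidp / 10)                     // 20 PSUs, persons nested within PSUs
gen strata = ceil(psu / 4)                    // 5 strata with 4 PSUs each
gen double weightscaled = 0.5 + runiform()    // stand-in for the scaled weight
gen x = rnormal()
gen y = 1 + 0.5 * x + rnormal()

* Declare the design with the PSU as the only sampling stage (no pidp level)
svyset, clear
svyset psu [pweight = weightscaled], strata(strata) singleunit(scaled)

* Pooled OLS with wave dummies; standard errors are clustered on the PSU
svy: regress y x i.wave
```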

Best,
Olena

#5

Updated by Understanding Society User Support Team 9 days ago

  • Status changed from New to Feedback
  • Assignee deleted (Olena Kaminska)
  • % Done changed from 0 to 90
  • Private changed from Yes to No
