Support #2125: UKHLS pooled OLS weights - Understanding Society User Support

Support #2125

open

UKHLS pooled OLS weights

Added by Emma Kemp about 1 year ago. Updated about 1 year ago.

Status: Feedback
Priority: Urgent
Assignee:
Category: Weights
Start date: 06/27/2024
% Done: 90%

Description

Dear UKHLS team,

I am currently working on a study of the impact of perceived neighbourhood social cohesion on life satisfaction. To do this, I would like to start my analysis by pooling data from waves a, c, f, i and l (1, 3, 6, 9 and 12) for a cross-sectional analysis.

I have three questions on which I would be grateful for your guidance:

1. From what I have read in previous posts and in this help document (https://www.understandingsociety.ac.uk/wp-content/uploads/working-papers/2024-01.pdf), if I pool data from all waves in "long format", treating the same individuals in different waves as different persons, I can use the corresponding cross-sectional weight from each wave. Is that correct?

2. If so, how should I apply the weights?

Is it enough to create a new variable for the new weight (e.g. new_weight)? Or, as each wave has a different number of observations/individuals, do I need to create a new scaled weight variable (e.g. weightscaled)? Or do I need to do both of these steps?

Could you please advise me on whether the Stata code below is correct, and whether I need all of it or just the code for the scaled weight variable?

* create a new weight variable new_weight_xw *
// give it the value of the cross-sectional weight for the wave the observation comes from (e.g. new_weight_xw = a_indscus_xw if wave==1)
gen new_weight_xw = .
replace new_weight_xw = a_indscus_xw if wave==1
replace new_weight_xw = c_indscub_xw if wave==3
replace new_weight_xw = f_indscui_xw if wave==6
replace new_weight_xw = i_indscui_xw if wave==9
replace new_weight_xw = l_indscui_xw if wave==12

* create a scaled weight variable weightscaled *
gen weightscaled=0
replace weightscaled=a_indscus_xw if wave==1

gen ind=1
sum ind [aw=a_indscus_xw] if wave==1
gen awtdtot=r(sum_w)

sum ind [aw=c_indscub_xw] if wave==3
gen cwtdtot=r(sum_w)

sum ind [aw=f_indscui_xw] if wave==6
gen fwtdtot=r(sum_w)

sum ind [aw=i_indscui_xw] if wave==9
gen iwtdtot=r(sum_w)

sum ind [aw=l_indscui_xw] if wave==12
gen lwtdtot=r(sum_w)

replace weightscaled=c_indscub_xw*(awtdtot/cwtdtot) if wave==3
replace weightscaled=f_indscui_xw*(awtdtot/fwtdtot) if wave==6
replace weightscaled=i_indscui_xw*(awtdtot/iwtdtot) if wave==9
replace weightscaled=l_indscui_xw*(awtdtot/lwtdtot) if wave==12

// Check: the sum of weights for ind under weightscaled should be the same in every wave.
sum ind [aw=weightscaled] if wave==1
sum ind [aw=weightscaled] if wave==3
sum ind [aw=weightscaled] if wave==6
sum ind [aw=weightscaled] if wave==9
sum ind [aw=weightscaled] if wave==12

3. Finally, if I am using the entire sample from waves that do not overlap (a, c, f, i, l), does this mean the entire 24-month sample will be included in the analysis base an equal number of times, so that I don't need to do anything further?
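The rescaling in the question 2 code can be written as a formula. In our own notation (introduced here only for illustration), let $w_{it}$ be the cross-sectional weight of respondent $i$ in wave $t$, $S_t$ the set of wave-$t$ respondents, and $W_t = \sum_{i \in S_t} w_{it}$ the wave total (what `awtdtot`, `cwtdtot` and so on hold in the code):

$$
w^{*}_{it} = w_{it}\,\frac{W_1}{W_t}, \qquad t \in \{1, 3, 6, 9, 12\},
\qquad\text{so that}\qquad
\sum_{i \in S_t} w^{*}_{it} = \frac{W_1}{W_t}\sum_{i \in S_t} w_{it} = W_1 .
$$

Every wave therefore carries the same total weight (the wave 1 total) in the pooled analysis.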

Thank you in advance for any advice and guidance you can provide on these questions.

Kind regards,
Emma
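For illustration, the weight construction and rescaling in question 2 can be written more compactly with loops. This is a minimal sketch on synthetic data, assuming a long file in which each row carries only its own wave's `_xw` weight; the row counts and weight values are made up, and only the weight variable names come from the code above.

```stata
* Minimal sketch on synthetic data (NOT UKHLS data): build one pooled weight
* and rescale each wave's weights to the wave-1 weighted total.
clear
set seed 2125

* Hypothetical row counts per wave, so the raw weight totals differ by wave
input wave n
 1 120
 3 100
 6  90
 9  80
12 110
end
expand n
drop n

* Wave-specific cross-sectional weights, missing outside their own wave
* (random placeholder values standing in for the real _xw weights)
local waves 1 3 6 9 12
local wvars a_indscus_xw c_indscub_xw f_indscui_xw i_indscui_xw l_indscui_xw
forvalues k = 1/5 {
    local w : word `k' of `waves'
    local v : word `k' of `wvars'
    generate double `v' = 0.5 + runiform() if wave == `w'
}

* One pooled weight under a single name: each row takes its own wave's weight
generate double new_weight_xw = .
forvalues k = 1/5 {
    local w : word `k' of `waves'
    local v : word `k' of `wvars'
    replace new_weight_xw = `v' if wave == `w'
}

* Rescale so every wave sums to the wave-1 total (same idea as awtdtot/cwtdtot)
summarize new_weight_xw if wave == 1, meanonly
scalar base_total = r(sum)
generate double weightscaled = .
foreach w of local waves {
    summarize new_weight_xw if wave == `w', meanonly
    local wave_total = r(sum)
    replace weightscaled = new_weight_xw * scalar(base_total) / `wave_total' if wave == `w'
}

* Check: the weighted total should now be identical in every wave
foreach w of local waves {
    summarize weightscaled if wave == `w', meanonly
    display "wave `w': total weight = " %10.4f r(sum)
}
```

The sketch sums each wave's weights directly with `summarize, meanonly` rather than summarizing the constant `ind` with analytic weights and reading `r(sum_w)`; both give the same wave totals.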

Updated by Understanding Society User Support Team about 1 year ago

Category set to Weights
Assignee changed from Understanding Society User Support Team to Olena Kaminska

Updated by Olena Kaminska about 1 year ago

Emma,

Thank you for your questions. The answer to all of them is yes. A few additional notes:
You should not treat these as different people; instead, include pidp as a second level after psu in your multilevel structure.
You should create the scaled weights under one variable name.
A full wave weighted with our _xw weights represents the population at that time point. You are then pooling multiple cross-sectional samples from different time points, so you don't need any further adjustments.

Hope this helps,
Olena

Updated by Emma Kemp about 1 year ago

Hi Olena,

Thank you very much for your response.
Regarding including pidp as a second level after psu in the multilevel structure: how would I go about doing this?

My current do-file is below. Would I add pidp to the svyset setup, e.g.:
svyset, clear
svyset l_psu pidp [pweight = weightscaled], strata(l_strata) singleunit(scaled)

CURRENT DO-FILE

/*_______________________________________________________________

PART 5: DECLARE COMPLEX SURVEY DESIGN for POOLED OLS 
________________________________________________________________*/

* set correct weights *
svyset, clear
svyset l_psu [pweight = weightscaled], strata(l_strata) singleunit(scaled)

/*_______________________________________________________________________

PART 4: POOLED OLS 
________________________________________________________________*/

* Running a simple OLS regression - controls inspired by Powdthavee (2008) *

asdoc svy: regress lfsato NSC_index i.age_group_destr age2 male_dummy jbstat_simple edu_simple mastat_simple tenure_dummy hhsize_simple nchild_simple scsf1_combined_r i.gor_dv i.wave, ///
    title(Table 3: Pooled OLS) save(RQ1_results_POLS.doc) replace

Thanks again, Olena,
Emma
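The proposed `svyset l_psu pidp [...]` line would not be accepted as written, because svyset takes one PSU variable per stage, with further stages separated by `||`. Below is a minimal sketch of a two-stage declaration on synthetic data; it is our reading of the suggestion above, not a confirmed answer from the support team. The names `psu`, `strata`, `x` and `y` are hypothetical stand-ins for `l_psu`, `l_strata` and the real model variables.

```stata
* Minimal sketch on synthetic data: two-stage survey declaration with
* PSUs within strata at stage 1 and individuals (pidp) within PSUs at stage 2.
clear
set seed 2125
set obs 500
generate long pidp   = ceil(_n/5)                // 100 hypothetical people, 5 pooled rows each
generate int  wave   = real(word("1 3 6 9 12", mod(_n - 1, 5) + 1))
generate int  psu    = ceil(pidp/10)             // 10 PSUs; each person sits in one PSU
generate int  strata = ceil(psu/2)               // 5 strata with 2 PSUs each
generate double weightscaled = 0.5 + runiform()  // placeholder for the pooled scaled weight
generate double x = rnormal()
generate double y = 1 + 0.5*x + rnormal()

* One PSU variable per stage; the second stage (pidp) follows ||
svyset psu [pweight = weightscaled], strata(strata) singleunit(scaled) || pidp
svydescribe
svy: regress y x i.wave
```

As we understand Stata's svy documentation, when no `fpc()` is given at the first stage the PSUs are treated as sampled with replacement and later stages do not enter the linearized variance, so adding `|| pidp` may leave the standard errors unchanged; it is worth comparing results with and without it.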
