Support #2056
Which weights to use when combining the mainstage and Covid-19 waves of UKHLS
Description
Hi there,
I have been reading through the forum support on weighting when using the Covid-19 data and Mainstage data combined, but I was hoping to just get some clarification on a couple of points.
I am conducting a study which looks at trends over time in adult mental health across UKHLS Mainstage surveys 2, 3, 4, 5, 6, 7, 8 and 9, and the Covid-19 survey 1, 2, 3, 4, 5, 6, 7, 8 (only the web survey). I end at Mainstage survey wave 9 (survey period Jan 2017-May 2019) as that appears to be the last Mainstage wave that doesn’t cover the pandemic period. My data is in long format and I’m using Stata.
However, I am unsure precisely which weights to use for my aims.
The first thing I would like to do is to treat the data as pooled cross-sectional data to look at how mental health changes over time (between wave 2 of the mainstage and wave 8 of the Covid survey). So, I’d like to see the level of mental health in the UK at each wave of the UKHLS. Ideally, I would like to model all the waves together. Am I right in thinking that to do so I would need to create a new ‘weight’ variable, which is the self-completion cross-sectional weight for each wave? I’ll explain below…
So, for wave 2 of the Mainstage survey, the new ‘weight’ variable would have the value of b_indscub_xw
For wave 3 it would have the value of: c_indscub_xw
For wave 4 it would have the value of: d_indscub_xw
For wave 5 it would have the value of: e_indscub_xw
For wave 6, it seems there are two self-completion cross-sectional weights (_ub and _ui): would it be f_indscub_xw or f_indscui_xw?
For wave 7 it would have the value of: g_indscui_xw (as there is no _ub version)
For wave 8 it would have the value of: h_indscui_xw (again, as there is no _ub version)
For wave 9 it would have the value of: i_indscui_xw (again, as there is no _ub version)
Then, turning to filling in the COVID-19 survey values of the new ‘weight’ variable it would be:
For wave 1 of the COVID-19 survey it would have the value of: ca_betaindin_xw
For wave 2 of the COVID-19 survey it would have the value of: cb_betaindin_xw
…
For wave 8 of the COVID-19 survey it would have the value of: ch_betaindin_xw
For wave 9 of the COVID-19 survey it would have the value of: ci_betaindin_xw
So, the Stata code would look something like this to give you an idea of what I mean:
gen weight = b_indscub_xw if mainstage_wave==2
replace weight = c_indscub_xw if mainstage_wave==3
replace weight = d_indscub_xw if mainstage_wave==4
…
replace weight = i_indscui_xw if mainstage_wave==9
replace weight = ca_betaindin_xw if cv19survey_wave==1
…
replace weight = ch_betaindin_xw if cv19survey_wave==8
replace weight = ci_betaindin_xw if cv19survey_wave==9
So, it would be one new ‘weight’ variable, where each wave within each pidp had a weight value which corresponds to the cross-sectional weight for that wave.
- Is that the correct approach to take to treat the data as repeated cross-sectional data and look at levels of mental health in each wave?
- Am I handling the COVID-19 weights correctly, and can I combine the Mainstage (waves 1-9) and Covid-19 surveys (waves 1-9) in this way?
- I’m not sure I fully understand the switch between _ub (waves 1-6 mainstage) and _ui (waves 7-9 mainstage). I can only use _ub up to wave 5 and only _ui from waves 7 to 9. Is it correct to take the approach I’ve outlined above, using _ui in some waves and _ub in others?
- Also, for wave 6, which self-completion cross-sectional weights should I use? The _ui or _ub?
I hope this makes sense and please do let me know if you require any further clarifications.
Best wishes,
James