Support #1786

Weights for longitudinal study

Added by Connor Gascoigne 5 months ago. Updated 5 months ago.


Hi Olena,

I am currently looking at the effect of government policy on the mental health of people in England, Scotland, and Wales. I drop Northern Ireland (NI) because a dataset I combine with the survey at a later stage does not include information for NI. As part of this, I am performing a longitudinal analysis using Waves 1-11 of the UKHLS to produce an estimate for each individual in the survey. I wish to take the individual-level estimates and aggregate them to produce national and regional estimates. To make sure the aggregated estimates properly account for the survey design, non-response, and any additional stratification, I plan to use the survey weights when producing the initial estimates.

I have seen there are two types of weights I can use: longitudinal and cross-sectional.

For the longitudinal weights, I believe I should take the weight from the most recent wave. For me, this would be k_indinus_lw. I would then attach this weight to every individual for all waves (i.e., an individual's weight in every wave is their k_indinus_lw value). From this arise my first questions:

(Q1). I believe the weights are constructed to include individuals in NI as well. Since I do not consider these individuals in my analysis, will removing them from the dataset affect the weighting? In other words, do I have to alter the weights when I remove the respondents from NI?

(Q2). If a respondent lives in, say, Scotland, then I will include their responses in the analysis. If they move to NI between Waves 6 and 7, then, because of the way I sort the data, I will remove their responses from Wave 7 onwards. As in (Q1), do I need to account for this by altering the longitudinal weights?
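For reference, the procedure behind (Q1) and (Q2) can be sketched as follows. This is a minimal illustration in plain Python rather than my actual R code. The rows and weight values are made up, and `attach_weight_and_drop_ni` is a hypothetical helper name, not something from the UKHLS documentation. It assumes one row per person per wave, identified by `pidp`.

```python
def attach_weight_and_drop_ni(rows, weight_wave=11, weight_key="k_indinus_lw"):
    """Repeat each person's final-wave weight on all their rows, then drop NI rows."""
    # Look up each person's longitudinal weight from their row in `weight_wave`.
    # People with a missing or zero weight in that wave get no entry.
    final_weight = {
        r["pidp"]: r[weight_key]
        for r in rows
        if r["wave"] == weight_wave and r.get(weight_key)
    }
    # Keep people who have a final-wave weight, and only their non-NI person-waves.
    return [
        {"pidp": r["pidp"], "wave": r["wave"], "lw": final_weight[r["pidp"]]}
        for r in rows
        if r["pidp"] in final_weight and r["country"] != "NI"
    ]


# Made-up example: person 2 moves to NI before wave 11, person 3 has no wave 11 row.
rows = [
    {"pidp": 1, "wave": 10, "country": "Scotland"},
    {"pidp": 1, "wave": 11, "country": "Scotland", "k_indinus_lw": 1.2},
    {"pidp": 2, "wave": 10, "country": "Scotland"},
    {"pidp": 2, "wave": 11, "country": "NI", "k_indinus_lw": 0.9},
    {"pidp": 3, "wave": 10, "country": "Wales"},
]
weighted = attach_weight_and_drop_ni(rows)
```

Here person 2 keeps their wave 10 row with an unaltered weight but loses wave 11, which is exactly the situation (Q2) asks about. Whether leaving the weight unaltered is appropriate is what I am unsure of.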

A main benefit of the longitudinal weights is that they yield a balanced dataset. With (Q2) (and similar cases where a change in an individual’s response means I drop them), I end up with an unbalanced dataset. This made me wonder whether I would be better off using the cross-sectional weights and pooling them to create my own set of weights. It would be useful to still include an individual’s responses even if they do not respond to every wave, which is essentially what I am doing in the example in (Q2). If this is the case:

(Q3). Because the naming convention changes between Wave 1, Waves 2-5, and Waves 6+, could I confirm that the weights I would need to make a pooled weight are a_indinus_xw, b_indinub_xw, c_indinub_xw, d_indinub_xw, e_indinub_xw, f_indinui_xw, g_indinui_xw, h_indinui_xw, i_indinui_xw, j_indinui_xw, and k_indinui_xw?
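The naming rule I am assuming in (Q3) can be written out as a short sketch. It is in plain Python for illustration only, and `xw_weight_name` is a hypothetical helper. The rule itself (suffix "us" for Wave 1, "ub" for Waves 2-5, "ui" for Waves 6 onwards) is my reading of the convention and is the thing I would like confirmed.

```python
import string


def xw_weight_name(wave):
    """Return the cross-sectional weight name I assume for a given wave number."""
    prefix = string.ascii_lowercase[wave - 1]  # wave 1 -> "a", ..., wave 11 -> "k"
    if wave == 1:
        suffix = "us"
    elif wave <= 5:
        suffix = "ub"
    else:
        suffix = "ui"
    return f"{prefix}_indin{suffix}_xw"


# Names for Waves 1-11, in wave order.
xw_weights = [xw_weight_name(w) for w in range(1, 12)]
```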

(Q4). If these are the correct cross-sectional weights, what is the best way to go about making a pooled weight? I work in R and have my data organised in long format, where each individual has one row per wave and there is a column for each of the above weights. As a result, each weight column is NA in every row except the rows for the wave that weight belongs to.
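To illustrate the layout in (Q4), here is a minimal sketch of collapsing the eleven wave-specific columns into a single weight column. It is plain Python with made-up rows, where a missing key or `None` stands in for NA, and `coalesce_weights` is a hypothetical helper (in R the equivalent would be a coalesce across the weight columns). It only stacks the weights. It does not rescale them in any way, and whether rescaling is needed for pooling is part of my question.

```python
XW_COLUMNS = [
    "a_indinus_xw", "b_indinub_xw", "c_indinub_xw", "d_indinub_xw",
    "e_indinub_xw", "f_indinui_xw", "g_indinui_xw", "h_indinui_xw",
    "i_indinui_xw", "j_indinui_xw", "k_indinui_xw",
]


def coalesce_weights(row, columns=XW_COLUMNS):
    """Return the first non-missing weight among `columns`, or None if all are missing."""
    for col in columns:
        value = row.get(col)
        if value is not None:  # a weight of 0.0 is kept; only missing values are skipped
            return value
    return None


# Made-up long-format rows: each row carries only the weight for its own wave.
rows = [
    {"pidp": 1, "wave": 1, "a_indinus_xw": 0.8},
    {"pidp": 1, "wave": 2, "b_indinub_xw": 1.1},
    {"pidp": 2, "wave": 6, "f_indinui_xw": 0.0},
    {"pidp": 2, "wave": 7},  # no weight available for this person-wave
]
for r in rows:
    r["xw"] = coalesce_weights(r)
```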

I apologise for such an involved question, and I hope I have managed to explain myself in a manner that is understandable - if I need to better explain myself, then I am more than happy to do so. If you can give me any guidance at all, I would really appreciate it!

Thank you in advance for any help!

Kind regards,
