paired balanced panel in UKHLS/BHPS
I am modelling data from both the UKHLS and BHPS and my dataset is constructed so that I can compare pairs of waves.
In simplified form, my dataset has the following columns (a JPEG is also attached):
- ID (BHPS pid, UKHLS pidp)
- var A at wave t
- var A at wave t-1
- Other covariates
- I have used the cross-sectional weight at wave t as the weight for each individual row: the weights for the original sample up to wave 8, those that include the Scotland and Wales boost samples for waves 9 and 10, and those that include Northern Ireland after that.
- I am, however, wondering whether it would be better to use a longitudinal weight for each row, treating the dataset as a paired balanced panel.
What are your thoughts on this?
- I have started to consider constructing special weights for comparing pairs of waves in the BHPS and UKHLS. Any suggestions on this are very welcome. I hope to make my code available to others who might be interested.
With best regards and thanks,
Updated by Glenna Nightingale about 4 years ago
I have used the cross-sectional weight at wave t-1 as the weight for each individual row: the weights for the original sample up to wave 8, those that include the Scotland and Wales boost samples for waves 9 and 10, and those that include Northern Ireland after that.
Updated by Victoria Nolan about 4 years ago
- Status changed from New to In Progress
- Assignee changed from Olena Kaminska to Glenna Nightingale
- % Done changed from 0 to 10
- Private changed from Yes to No
Many thanks for your enquiry - I will pass this on to our weighting team for a reply.
Best wishes, Victoria.
Updated by Peter Lynn about 4 years ago
I think that using the cross-sectional weight for either t or t-1 is reasonable (but not a mixture of them), for reasons that are explained in this note about pooling: https://www.understandingsociety.ac.uk/support/issues/494.
Most of the issues discussed in that note are relevant to your analysis. The one additional point is that you are using pairs of waves, so you can think of this as pooled cross-sectional analysis with one extra step, which is that some of the cross-sectional respondents at each wave t are dropped from your analysis due to not being observed at t-1. You could therefore model this as an additional step of non-response and use the model-predicted probabilities to be observed at t-1 as an adjustment to the CS weight, i.e.:
Create the data set of all wave t respondents (pooled waves) and create a 0/1 indicator of whether or not they also responded at t-1 (i.e. whether the record can be used in your analysis). Model this indicator based on relevant respondent characteristics (e.g. a logistic regression). This will give you a predicted probability for every respondent of the t-1 information being present. Call this P. You then need to adjust the wave t CS weight by multiplying it by 1/P for all the records that can be included in your analysis.
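The adjustment described above could be sketched roughly as follows. This is only an illustration on made-up data, not code from the weighting team: the field names (`age_z`, `female`, `cs_weight`, `in_prev`, `adj_weight`) are hypothetical placeholders rather than actual UKHLS/BHPS variable names, the two covariates stand in for whatever respondent characteristics are relevant, and the logistic regression is fitted unweighted with a small hand-written routine so that the example has no dependencies. In practice a standard statistics package would normally be used for the model.

```python
import math
import random

random.seed(0)

# Toy pooled file of wave-t respondents. All values are simulated and all
# field names are hypothetical placeholders, not real survey variables.
records = []
for _ in range(400):
    age_z = random.gauss(0.0, 1.0)      # standardised covariate
    female = random.randint(0, 1)       # binary covariate
    true_p = 1.0 / (1.0 + math.exp(-(1.0 + 0.8 * age_z + 0.3 * female)))
    records.append({
        "age_z": age_z,
        "female": female,
        "cs_weight": random.uniform(0.5, 2.0),           # wave-t CS weight
        "in_prev": 1 if random.random() < true_p else 0,  # also observed at t-1?
    })


def predict(beta, x):
    """Predicted probability from a logistic model with intercept beta[0]."""
    z = beta[0] + sum(b * v for b, v in zip(beta[1:], x))
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.5, n_iter=1000):
    """Fit an (unweighted) logistic regression by batch gradient ascent."""
    beta = [0.0] * (len(X[0]) + 1)
    n = len(X)
    for _ in range(n_iter):
        grad = [0.0] * len(beta)
        for x, yi in zip(X, y):
            err = yi - predict(beta, x)
            grad[0] += err
            for j, v in enumerate(x):
                grad[j + 1] += err * v
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


# Model the 0/1 indicator on all wave-t respondents.
X = [[r["age_z"], r["female"]] for r in records]
y = [r["in_prev"] for r in records]
beta = fit_logistic(X, y)

# P is predicted for everyone; the adjusted weight (CS weight times 1/P)
# is only defined for records that can enter the paired analysis.
for r, x in zip(records, X):
    r["P"] = predict(beta, x)
    r["adj_weight"] = r["cs_weight"] / r["P"] if r["in_prev"] == 1 else None

analysis = [r for r in records if r["in_prev"] == 1]
```

Here `analysis` holds the rows that would be used in the paired-wave models, each weighted by `adj_weight`. Because P is below 1, every retained record's weight is scaled up to compensate for similar respondents who were not observed at t-1.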
Hope that helps,