Support #2267
open
Question on Merging and Weighting with R - Understanding Society Calendar Year 2022
80%
Description
Hello,
I am conducting a comparative study on household conditions in London and the South West, using the calendar year 2022 dataset available via the UK Data Service (Open Access version). I would like to double-check that I have correctly implemented the merging and survey weighting procedures to ensure a representative sample.
My unit of analysis is the individual, but I also need to incorporate household income, so I attempted to merge the indresp and hhresp files using the code below:
merged_data <- merge(individual_data, household_data, by = "lmn_hidp")
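A minimal sketch of how this merge could be sanity-checked, using made-up data in place of the real indresp and hhresp files. The variable hh_income and all values are hypothetical; lmn_hidp is the household identifier from the post. The idea is that a many-to-one merge of individuals onto households should leave the number of individuals unchanged.

```r
# Toy stand-ins for the indresp and hhresp files (all values are made up).
individual_data <- data.frame(
  pidp     = c(1, 2, 3, 4),
  lmn_hidp = c(10, 10, 20, 30)
)
household_data <- data.frame(
  lmn_hidp  = c(10, 20, 30),
  hh_income = c(2500, 1800, 3200)  # hypothetical household income variable
)

# Each household should appear exactly once in the household file.
stopifnot(!anyDuplicated(household_data$lmn_hidp))

# all.x = TRUE keeps every individual, even if no household record matches.
merged_data <- merge(individual_data, household_data,
                     by = "lmn_hidp", all.x = TRUE)

# A many-to-one merge must not change the number of individuals.
stopifnot(nrow(merged_data) == nrow(individual_data))

# Count individuals left without household information after the merge.
n_unmatched <- sum(is.na(merged_data$hh_income))
```

Note that merge() without all.x = TRUE performs an inner join, so individuals with no matching household record would be dropped silently.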
I then constructed the survey design object in R using the survey package as follows:
design <- svydesign(
  id = ~lmn_psu,              # primary sampling unit, to account for clustering
  strata = ~lmn_strata,       # stratification
  weights = ~lmn_inding2_xw,  # the only cross-sectional weight I found for the main individual interview
  data = merged_data,         # the merged individual and household data from above
  nest = TRUE
)
I would be grateful if you could confirm:
Is this the correct approach for merging and weighting when conducting individual-level analysis that includes household-level variables?
Is the use of lmn_inding2_xw appropriate for generating representative estimates for calendar year 2022?
Can I assume that the results produced using svytable() or svymean() with this design object are representative of the UK population for 2022?
As an example, I am using the following line to get the weighted sample distribution across regions (with regional_breakdown being a recode of lmn_gor_dv):
svytable(~regional_breakdown, design)
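As a rough illustration of what this call reports: the point estimates from svytable() are weighted counts, that is, the sum of the weights in each category. The sketch below reproduces that arithmetic in base R on made-up data (the data frame toy and its values are hypothetical) and converts the counts to shares with prop.table().

```r
# Made-up respondents with a region recode and a cross-sectional weight.
toy <- data.frame(
  regional_breakdown = c("London", "London", "South West", "Other", "Other"),
  lmn_inding2_xw     = c(1.2, 0.8, 1.5, 0.5, 1.0)
)

# Weighted counts: sum of the weights within each region.
weighted_counts <- xtabs(lmn_inding2_xw ~ regional_breakdown, data = toy)

# Weighted shares: each region's weighted count over the weighted total.
weighted_shares <- prop.table(weighted_counts)
```

This only covers point estimates. Standard errors still need the full design object (clustering and strata), for example via svymean().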
I appreciate any feedback you can provide. Thank you in advance!
Updated by Balsam Gharib 5 days ago
Hello,
I am conducting a comparative study on household conditions in London and the South West, using the calendar year 2022 dataset available via the UK Data Service (Open Access version). I would like to double-check that I have correctly implemented the merging and survey weighting procedures to ensure a representative sample.
My unit of analysis is the individual, but I also need to incorporate household income, so I attempted to merge the indresp and hhresp files in R using the code below:
merged_data <- merge(individual_data, household_data, by = "lmn_hidp")
I then constructed the survey design object in R using the survey package as follows:
design <- svydesign(
  id = ~lmn_psu,              # primary sampling unit, to account for clustering
  strata = ~lmn_strata,       # stratification
  weights = ~lmn_inding2_xw,  # the only cross-sectional weight I found for the main individual interview
  data = merged_data,         # the merged individual and household data from above
  nest = TRUE
)
I would be grateful if you could confirm:
- Is this the correct approach for merging the files when conducting individual-level analysis that includes household-level variables?
- Is the use of lmn_inding2_xw appropriate for generating representative estimates for calendar year 2022? This weight appears to be for wave 14 only, even though the calendar year 2022 dataset includes wave 13 and a few respondents from wave 12.
- Can I assume that the results produced using svytable() or svymean() with this design object are representative of the UK population for 2022?
As an example, I am using the following line to get the weighted sample distribution across regions (with regional_breakdown being a recode of lmn_gor_dv):
svytable(~regional_breakdown, design)
I appreciate any feedback you can provide. Thank you in advance!
Updated by Understanding Society User Support Team about 5 hours ago
- Category set to Weights
- Status changed from New to Feedback
- % Done changed from 0 to 80
Hello,
The approach you described sounds correct. Regarding cross-sectional weights, there are three _xw weights available in the calendar year 2022 indresp file: lmn_indpxg2_xw, lmn_inding2_xw and lmn_indscg2_xw. If you want to include proxy respondents in the analysis, use lmn_indpxg2_xw (the other two exclude proxies altogether). If your analysis includes questions from both the self-completion questionnaire and the main questionnaire, use lmn_indscg2_xw. If you are using only questions from the main questionnaire, with no self-completion questions, use lmn_inding2_xw. In principle, the rules for picking weights described at the following link also apply to the calendar year dataset: https://www.understandingsociety.ac.uk/documentation/mainstage/user-guides/main-survey-user-guide/selecting-the-correct-weight-for-your-analysis/
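The selection rules above can be sketched as a small helper. The function pick_xw_weight() and its argument names are hypothetical, written only to illustrate the decision; the weight names are the three listed above. The combination of proxies plus self-completion questions is not covered by these rules, so the sketch stops with an error in that case.

```r
# Hypothetical helper: returns the name of the cross-sectional weight
# that matches the selection rules described in this reply.
pick_xw_weight <- function(include_proxies = FALSE,
                           uses_self_completion = FALSE) {
  if (include_proxies && uses_self_completion) {
    stop("The rules above do not cover proxies combined with self-completion items.")
  }
  if (include_proxies) {
    "lmn_indpxg2_xw"   # proxy respondents are included
  } else if (uses_self_completion) {
    "lmn_indscg2_xw"   # self-completion plus main questionnaire
  } else {
    "lmn_inding2_xw"   # main questionnaire only, no proxies
  }
}

# Main questionnaire only, as in the analysis described in this ticket.
weight_var <- pick_xw_weight()

# One-sided formula that can be passed to svydesign(weights = ...).
weight_formula <- reformulate(weight_var)
```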
Best wishes,
Piotr Marzec
UKHLS User Support
Updated by Understanding Society User Support Team about 5 hours ago
- Private changed from Yes to No
Updated by Balsam Gharib about 5 hours ago
Understanding Society User Support Team wrote in #note-2:
Hello,
The approach you described sounds correct. Regarding cross-sectional weights, there are three _xw weights available in the calendar year 2022 indresp file: lmn_indpxg2_xw, lmn_inding2_xw and lmn_indscg2_xw. If you want to include proxy respondents in the analysis, use lmn_indpxg2_xw (the other two exclude proxies altogether). If your analysis includes questions from both the self-completion questionnaire and the main questionnaire, use lmn_indscg2_xw. If you are using only questions from the main questionnaire, with no self-completion questions, use lmn_inding2_xw. In principle, the rules for picking weights described at the following link also apply to the calendar year dataset: https://www.understandingsociety.ac.uk/documentation/mainstage/user-guides/main-survey-user-guide/selecting-the-correct-weight-for-your-analysis/
Best wishes,
Piotr Marzec
UKHLS User Support
Hello Piotr,
Thank you very much for your response. Yes, I am using the main questionnaire, so I'll rely on lmn_inding2_xw. Can I now assume that, following this weighting process, the results I obtain from the analysis are representative of the UK population for the calendar year 2022?
(I am a bit new to working with weights and want to make sure I have not missed anything else.)
Updated by Understanding Society User Support Team about 3 hours ago
Hello,
Yes, the results will be representative of the UK population. See the User Guide, section 4.1 (https://doc.ukdataservice.ac.uk/doc/9333/mrdoc/pdf/9333_main_survey_calendar_year_user_guide_2022.pdf).
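For the London versus South West comparison, a minimal sketch with the survey package on made-up data might look like the following. The data frame toy, the variable hh_income and all values are hypothetical; the design variables are the ones used earlier in this ticket. The sketch subsets the design object rather than the data frame, so the full design is still available for variance estimation. The lonely-PSU option is an assumption that may matter if some strata contain only one PSU.

```r
library(survey)

# Made-up data: 2 strata, 2 PSUs per stratum, 3 people per PSU.
set.seed(1)
toy <- data.frame(
  lmn_strata         = rep(c(1, 2), each = 6),
  lmn_psu            = rep(1:4, each = 3),
  lmn_inding2_xw     = runif(12, 0.5, 1.5),
  regional_breakdown = rep(c("London", "South West", "Other"), times = 4),
  hh_income          = round(rnorm(12, 2500, 500))  # hypothetical variable
)

# Assumption: handle strata with a single PSU instead of raising an error.
options(survey.lonely.psu = "adjust")

design <- svydesign(id = ~lmn_psu, strata = ~lmn_strata,
                    weights = ~lmn_inding2_xw, data = toy, nest = TRUE)

# Subset the design object, not the data frame, to keep the design intact.
two_regions <- subset(design,
                      regional_breakdown %in% c("London", "South West"))

# Weighted mean household income (with standard errors) in each region.
income_by_region <- svyby(~hh_income, ~regional_breakdown,
                          two_regions, svymean)
```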
Best wishes,
Piotr