Support #2292
extract annual data from a UKHLS wave
50%
Description
Dear Roberto Cavazos,
I am trying to use w_month to extract data regarding the year 2016.
First, as I use Stata 11, I tried the use13 command to open the UKHLS data for Wave 8 (2016-2018), but for some reason it did not work. Hence, I used the insheet command to open the .tab files, but now when I run codebook the variables have lost their definitions. What are the definitions of the variables h_month and h_country? Can I find them in the questionnaire, and under which name?
The values of h_month range from 1 to 24. Does this mean that months 1-12 refer to 2016, the year I am interested in? Is it enough to use Wave 8 with h_month and the adjustment to extract the 2016 data, or do I have to apply the same adjustment procedure to Wave 7?
Regarding Wave 8, the file h_income does not contain the variables h_month and h_country. Does this mean that I cannot use this file and the income variables?
I am comparing the years 2016 and 2019 using cross-sectional data. I do not understand why I have to make that adjustment, or why it ends with the creation of a longitudinal weight; I need to create and use a cross-sectional weight.
This is the adjustment procedure:
gen adj=1
replace adj=0.5 if w_country==4
gen weight=w_xxxyyus_lw*adj
What does this mean?
How should I use adj? Which other variables should I multiply by adj?
How should I generate cross-sectional weights?
Thanks a lot for your help and patience.
Best Wishes,
Raffaele Ciula
Updated by Understanding Society User Support Team about 13 hours ago
- Category set to Data documentation
- Status changed from New to Feedback
- % Done changed from 0 to 50
- Private changed from Yes to No
Hello Raffaele,
I’m sorry to hear you lost the labels when opening the dataset, maybe try again? Sometimes the software plays tricks on us. If you still can’t see the labels, you can use the Mainstage Variables search (https://www.understandingsociety.ac.uk/documentation/mainstage/variables/). You can search by variable name or concept, and it will return all matching results. You’ll also be able to see which datafile the variable belongs to, along with its frequencies or statistics per wave, among other details.
You can merge all the files using the pidp variable. This is the unique cross-wave person identifier that allows you to match individuals across different files, both within and across waves. In this way, you can include all the variables you need for your analysis.
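As a rough illustration of such a merge in Stata 11, here is a minimal sketch. It assumes the tab-delimited files are called h_indresp.tab and h_income.tab and sit in the working directory, that h_month and h_country are held in h_indresp (the variable search above will confirm which file they belong to), and that h_income may contain several income records per person, which is why the merge is m:1 rather than 1:1. The intermediate file name h_keys.dta is made up for the example. On Stata 11 you may also need to raise the memory allocation with set memory before loading the larger files.

```stata
* Step 1: load the individual response file and keep only the
* identifier plus the two variables missing from h_income
* (file name is an assumption)
insheet using "h_indresp.tab", tab clear
keep pidp h_month h_country
save "h_keys.dta", replace

* Step 2: load the income file and attach month and country to
* every income record of the same person (many records to one person)
insheet using "h_income.tab", tab clear
merge m:1 pidp using "h_keys.dta"

* Step 3: check how many records matched before dropping anything
tab _merge
```

After this, each income record carries the interview month and country of the person it belongs to, so the income variables can be restricted to a calendar year in the same way as the other files.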
I also recommend checking the section on Data management syntax files (https://www.understandingsociety.ac.uk/documentation/mainstage/syntax/), where you’ll find example syntax files showing how to perform common data management tasks, such as merging two files or matching information between respondents and their partners.
Regarding the use of weights, my apologies for not explaining it clearly before. Adjustments are needed because the Northern Ireland and BHPS samples are available only for months 1–12, while the IEMB sample is available only for months 13–24. If you take only the first year (months 1–12), you’ll overrepresent Northern Ireland and BHPS; if you take only the second year, you’ll overrepresent IEMB. The adjustment uses the longitudinal weight to make the sample period you’re using representative.
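To show what the three lines quoted in the question do mechanically, here is a toy sketch on invented data. The variable names month, country and wt are placeholders standing in for w_month, w_country and the weight variable; the three records and their values are made up purely for illustration and are not UKHLS data. Which weight to use, which groups to adjust and by what factor should be taken from the documentation, not from this sketch.

```stata
* Toy data: three invented respondents (not real UKHLS records)
clear
input pidp month country wt
1  3 1 1.2
2 15 1 0.9
3  7 4 2.0
end

* Keep only interviews from the first year of the wave (months 1-12)
keep if month >= 1 & month <= 12

* Same logic as the quoted procedure: halve the weight of
* Northern Ireland cases (country == 4), leave the rest unchanged
gen adj = 1
replace adj = 0.5 if country == 4
gen weight = wt * adj

* Respondent 3 now has weight 1.0 instead of 2.0
list pidp month country wt adj weight
```

So adj is not multiplied into any analysis variable. It only rescales the weight, and the resulting weight variable is then the one passed to the estimation commands.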
It is also necessary to combine information from year 1 of Wave 8 with year 2 of Wave 7, since both refer to data from 2016. This is known as pooling data from different waves for cross-sectional analysis. You can find a very similar example in item 14 of this document (https://www.understandingsociety.ac.uk/wp-content/uploads/working-papers/2024-01.pdf), which also includes the corresponding Stata code in Box 1 to perform the required adjustment.
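The pooling step could be sketched roughly as follows for Stata 11. This is only an outline under stated assumptions: the files are assumed to be g_indresp.tab (Wave 7) and h_indresp.tab (Wave 8) in the working directory, the output file names are made up, and the weight construction is deliberately left out because it should follow Box 1 of the linked document. renpfix is used to strip the wave prefix because the wildcard form of rename is not available in Stata 11.

```stata
* Wave 7 (prefix g_): keep second-year interviews, months 13-24
insheet using "g_indresp.tab", tab clear
keep if g_month >= 13 & g_month <= 24
renpfix g_
gen wave = 7
save "w7_year2.dta", replace

* Wave 8 (prefix h_): keep first-year interviews, months 1-12
insheet using "h_indresp.tab", tab clear
keep if h_month >= 1 & h_month <= 12
renpfix h_
gen wave = 8

* Stack the two half-waves into one file of 2016 interviews
append using "w7_year2.dta"
save "pooled_2016.dta", replace
```

Weight variables can have different names in different waves, so they will not line up automatically after the prefix is removed and need to be harmonised by hand before the adjustment is applied.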
I hope this information is helpful.
Best wishes,
Roberto Cavazos
Understanding Society User Support Team