Support #2292
Extract annual data from a UKHLS wave
Description
Dear Roberto Cavazos,
I am trying to use w_month to extract data regarding the year 2016.
First, as I use Stata 11, I applied the command use13 to open the UKHLS data for Wave 8 (2016-2018), but for some reason it did not work. Hence, I used the command insheet to open the .tab files, but now, when I use the command codebook, the variables have lost their definitions. What are the definitions of the variables h_month and h_country? Can I find them in the questionnaire? Under which name?
The values of the variable h_month range from 1 to 24. Does this mean that months 1-12 refer to the year 2016, the year I am interested in? Is it enough to use Wave 8 with h_month and the adjustment to extract the data for 2016, or do I have to apply the same adjustment procedure to Wave 7?
Regarding Wave 8, the file h_income does not have the variables h_month and h_country. Does this mean that I cannot use this file and the income variables?
I am comparing the years 2016 and 2019 using cross-sectional data. I do not understand why I have to make that adjustment, or why it ends with the creation of a longitudinal weight; I need to create and use a cross-sectional weight.
This is the adjustment procedure:
gen adj=1
replace adj=0.5 if w_country==4
gen weight=w_xxxyyus_lw*adj
What does this mean?
How should I use adj? Which other variables should I multiply adj by?
How should I generate cross-sectional weights?
Thanks a lot for your help and patience.
Best Wishes,
Raffaele Ciula
Updated by Understanding Society User Support Team 2 months ago
- Category set to Data documentation
- Status changed from New to Feedback
- % Done changed from 0 to 50
- Private changed from Yes to No
Hello Raffaele,
I’m sorry to hear you lost the labels when opening the dataset, maybe try again? Sometimes the software plays tricks on us. If you still can’t see the labels, you can use the Mainstage Variables search (https://www.understandingsociety.ac.uk/documentation/mainstage/variables/). You can search by variable name or concept, and it will return all matching results. You’ll also be able to see which datafile the variable belongs to, along with its frequencies or statistics per wave, among other details.
You can merge all the files using the pidp variable. This is the unique cross-wave person identifier that allows you to match individuals across different files, both within and across waves. In this way, you can include all the variables you need for your analysis.
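As a minimal sketch of such a merge, the lines below link the Wave 7 and Wave 8 individual files by pidp. The file names (g_indresp.dta, h_indresp.dta) and the age variables are assumptions chosen for illustration, and the sketch assumes the .dta files open in your version of Stata.

```stata
* Sketch only: file and variable names are assumptions, adapt them to your data.
* Start from the Wave 8 individual file, keeping only the variables needed.
use pidp h_sex h_age_dv using "h_indresp.dta", clear

* indresp holds one row per person per wave, so pidp identifies rows 1:1.
merge 1:1 pidp using "g_indresp.dta", keepusing(g_age_dv)

* _merge shows who was found in Wave 8 only, Wave 7 only, or both.
tabulate _merge
```

Files with several rows per person would need a different merge type (for example m:1 or 1:m) rather than 1:1.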
I also recommend checking the section on Data management syntax files (https://www.understandingsociety.ac.uk/documentation/mainstage/syntax/), where you’ll find example syntax files showing how to perform common data management tasks, such as merging two files or matching information between respondents and their partners.
Regarding the use of weights, my apologies for not explaining it clearly before. Adjustments are needed because the Northern Ireland and BHPS samples are available only for months 1–12, while the IEMB sample is available only for months 13–24. If you take only the first year (months 1–12), you’ll overrepresent Northern Ireland and BHPS; if you take only the second year, you’ll overrepresent IEMB. The adjustment uses the longitudinal weight to make the sample period you’re using representative.
It is also necessary to combine information from year 1 of Wave 8 with year 2 of Wave 7, since both refer to data from 2016. This is known as pooling data from different waves for cross-sectional analysis. You can find a very similar example in item 14 of this document (https://www.understandingsociety.ac.uk/wp-content/uploads/working-papers/2024-01.pdf), which also includes the corresponding Stata code in Box 1 to perform the required adjustment.
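The row selection behind this pooling could be sketched roughly as follows. It only selects and stacks the 2016 interviews; the weight adjustment is deliberately left out, since it should follow Box 1 of the document linked above. The file names, the kept variables, and the temporary file name w7_2016.dta are assumptions for illustration.

```stata
* Sketch only: selects the 2016 rows; the weight adjustment is NOT shown here.
* Year 2 of Wave 7 (prefix g_): months 13-24.
use pidp g_month g_country g_sex using "g_indresp.dta", clear
keep if g_month >= 13 & g_month <= 24
rename g_month month
rename g_country country
rename g_sex sex
gen wave = 7
save "w7_2016.dta", replace   // hypothetical temporary file name

* Year 1 of Wave 8 (prefix h_): months 1-12.
use pidp h_month h_country h_sex using "h_indresp.dta", clear
keep if h_month >= 1 & h_month <= 12
rename h_month month
rename h_country country
rename h_sex sex
gen wave = 8

* Stack the two halves into one file covering 2016.
append using "w7_2016.dta"
tabulate wave
```

Renaming each variable explicitly drops the wave prefix, so the two halves line up when appended, and it avoids relying on the wildcard rename syntax, which older Stata versions do not support.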
I hope this information is helpful.
Best wishes,
Roberto Cavazos
Understanding Society User Support Team
Updated by Raffaele Ciula 2 months ago
Dear Roberto Cavazos,
Thanks for your email.
As I do not understand the procedure for extracting single years very well, could you please describe, in a simple and detailed way, what I have to do to extract the years 2016 and 2019?
First, as for h_month, the values 1-12 represent the year 2016 and the values 13-24 represent the year 2017. Is this right?
Second, I have to use the 2016-2018 wave and apply these commands:
gen adj=1
replace adj=0.5 if w_country==4
gen weight=w_xxxyyus_lw*adj
This should give me a longitudinal weight that I can apply to the 2016 data (January 2016-December 2016). Is this correct?
Afterwards, I have to use Wave 7 and apply the same commands:
gen adj=1
replace adj=0.5 if w_country==4
gen weight=w_xxxyyus_lw*adj
This should give me a longitudinal weight that I can apply to the 2016 data (January 2016-December 2016). Is this correct?
Hence, I have two longitudinal weights. What should I do with them? Specifically, should I use both weights to combine the information for the year 2016 across both waves? How?
Can you please explain, step by step and in a simple, detailed way, the procedure for obtaining the data for the year 2016? Please do not refer to manuals or PDFs on the Understanding Society website, because they are unclear, confusing, and fragmented.
Also, I thought that merging a household-level file (such as w_hhresp) with an individual-level file (such as w_indresp) within the same wave would be enough to obtain both individual and household information. That is, individual 1 would have a household variable, such as homeownership, in one column and an individual variable, such as age or educational attainment, in another. Is this the case, or do I have to use other commands after the merge to obtain this information?
As for matching individuals within a household, I thought that there would be a household identifier (such as 1 for all members of family 1), an individual identifier for each member (such as 1 to 5 if there are 5 members, including parents, spouses, siblings, grandchildren, grandparents, etc.), and a relationship variable that gives the relations within families. Is this the case, or should I use other commands to obtain this information?
Why does the command use13 not work to convert the dataset to Stata 11 format? It works for other datasets but not for the UKHLS. Is it because I made a mistake, or for other reasons?
Finally, what command should I use to open the UKHLS .tab files? Is insheet correct, or is there a more appropriate command that keeps the labels in the dataset after importing the .tab data? Thanks for your kind help and patience.
Best Wishes,
Raffaele Ciula
Updated by Understanding Society User Support Team 2 months ago
Hello Raffaele,
Thank you for your feedback on the documentation; it’s important for us to know that it’s not as clear as we initially thought. I’ll pass your comments on to the team that produces these documents so they can make updates to improve readability wherever possible.
Attached, you’ll find an example (code) of the adjustment that needs to be made for the year 2016, in the context of pooling data from different waves to conduct cross-sectional comparisons. This is exactly the same as item 14 of https://www.understandingsociety.ac.uk/wp-content/uploads/working-papers/2024-01.pdf
I’ve also included an example (code) using the variable sex, where you can see the proportions of men and women in 2016. You’ll need to repeat this same process for 2019, that is, year 2 of Wave 10 (letter j) and year 1 of Wave 11 (letter k).
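For readers without the attachment, a weighted proportion of this kind can be illustrated on made-up data. This is a minimal sketch and not the attached code: the variable names sex and weight are hypothetical, and the five rows are invented purely to make the example runnable.

```stata
* Sketch only: toy data, invented for illustration.
clear
input sex weight
1 0.9
1 1.1
2 1.4
2 0.6
2 1.0
end

* Attach value labels so the table is readable (1 = male, 2 = female).
label define sex_lbl 1 "male" 2 "female"
label values sex sex_lbl

* Weighted proportions: each row counts in proportion to its weight.
tabulate sex [aweight = weight]
```

With the real pooled file, the adjusted weight from the attached code would take the place of weight. For standard errors, a survey design declaration with svyset would normally be used; that is not shown here.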
Regarding merging household-level files (such as w_hhresp) with individual-level files (such as w_indresp) within the same wave: you are correct. When you merge them, you’ll have all the information for both individuals and the households they belong to. If you use indresp as the master file, the household information will be repeated for each household member.
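A minimal sketch of this merge for Wave 8 is shown below. It assumes that h_hidp is the household identifier present in both files, and that h_tenure_dv (housing tenure), h_age_dv and h_hiqual_dv are the variables of interest; these names should be checked in the variable search before use.

```stata
* Sketch only: variable names are assumptions, check them in the variable search.
* Master file: one row per individual.
use pidp h_hidp h_age_dv h_hiqual_dv using "h_indresp.dta", clear

* Many individuals map to one household row, hence m:1 on the household id.
merge m:1 h_hidp using "h_hhresp.dta", keepusing(h_tenure_dv)

* Keep individuals whose household record was found, then tidy up.
keep if _merge == 3
drop _merge
```

After this, each individual row carries the household's tenure value alongside the person's own age and qualification.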
As for matching individuals within a household, you can use the egoalt file if you only want to identify relationships within the household, or the xhhrel (family matrix) file if you want to capture relationships outside the household, for example, when a relative has moved to a different household but remains in the study. You can find more details about these two files in the User Guide.
Regarding your Stata question, I’m not an expert and I’m not familiar with the use13 command. I recommend checking forums like Statalist.org. My understanding is that the insheet command was replaced by import delimited starting in Stata 13. This command reads raw text files, meaning it does not automatically apply labels or formats.
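Since a raw text import carries no labels, they would have to be attached by hand after importing. The sketch below uses insheet, which is the command available in Stata 11, and labels h_country as an example. The file name is an assumption, and the value codes shown (1 England, 2 Wales, 3 Scotland, 4 Northern Ireland) should be confirmed in the variable search before relying on them.

```stata
* Sketch only: file name and value codes are assumptions, please verify them.
* Read the tab-delimited file, taking variable names from the first row.
insheet using "h_indresp.tab", tab names clear

* Re-create the variable label and value labels manually.
label variable h_country "Country of residence"
label define country_lbl 1 "England" 2 "Wales" 3 "Scotland" 4 "Northern Ireland"
label values h_country country_lbl

* Check the result.
tabulate h_country
```

In Stata 13 or later, import delimited would replace insheet in the first step; the labelling steps stay the same.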
I hope this information is helpful.
Best wishes,
Roberto Cavazos
Understanding Society User Support Team