Support #2260

open

Request for explaining the different missing rates of variables across all waves

Added by Evan Zhang 27 days ago. Updated 7 days ago.

Status: Feedback
Priority: Urgent
Category: Data documentation
Start date: 06/14/2025
% Done: 90%


Description

Dear Understanding Society Team,

I am currently working with data from the UKHLS and have some questions regarding response patterns across different waves.
Specifically:

1. For the GHQ12 variable (scghq2_dv), there appear to be large differences in the rates of missing, inapplicable, and proxy responses across waves. For example, the total rate of these responses in Wave 1 is about 20%, but the total rate in Wave 14 is about 5%.

2. Similarly, for the variables jbsoc00, jbsoc10, and jbsoc20, we observed significant differences in the rates of missing responses across waves. For example, the missing rate of jbsoc20 in Wave 14 is about 27%, the missing rate of jbsoc10 in Wave 14 is about 6%, and the missing rate of jbsoc00 in Wave 14 is about 1%.

We have reviewed the "Main Survey User Guide" but could not find specific explanations for these discrepancies. We would appreciate it if you could clarify the reasons for these differences. Thank you for your time and assistance.

Best regards,
Evan Zhang


Files

clipboard-202506181601-dnhew.png (14.2 KB), added by Understanding Society User Support Team, 06/18/2025 04:01 PM
Actions #1

Updated by Understanding Society User Support Team 23 days ago

Hello Evan

In Waves 1 and 2, the self-completion questionnaire, including the GHQ questions, was administered on paper. We’ve observed a relatively high level of missing responses for certain GHQ items in these early waves. However, the proportion of missing data decreases steadily in later waves, as shown in the table below:

Variable: scghq2_dv [table provided as an image; see Files: clipboard-202506181601-dnhew.png]

To identify valid responses, it’s important to review the Question Universe, which specifies eligibility for each question. In this case, scghq2_dv is a derived variable created using the following items:
scghqa to scghql.

You can find the universe for each question using the Mainstage Variable Search. For example, here is the link for scghqa; if you scroll down the page, you will see the universe specification under “Question asked in the latest wave”:
https://www.understandingsociety.ac.uk/documentation/mainstage/variables/scghqa/

If you're interested in the syntax used to construct the scghq2_dv variable, it’s available here:
https://www.understandingsociety.ac.uk/wp-content/uploads/documentation/main-survey/syntax/stata/ghq_dv.do
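
As a rough illustration of why missing items matter for a derived score, here is a minimal Python sketch. It is not the official ghq_dv.do logic. It assumes that scghq2_dv is a caseness-style count (item responses 1 and 2 score 0, responses 3 and 4 score 1, summed over the 12 items) and that a single invalid item makes the whole derived value missing. The function name ghq_caseness and the choice to return -9 for every invalid case are hypothetical; please check the Stata syntax linked above for the actual rules.

```python
# Hypothetical sketch, NOT the official ghq_dv.do syntax.
MISSING_CODE = -9  # UKHLS uses negative codes for missing values; -9 = "missing"


def ghq_caseness(items):
    """Return a 0-12 caseness-style score from 12 GHQ item responses.

    Each item is expected to be 1-4, or a negative missing-value code.
    Assumption: any negative item makes the derived score missing.
    """
    if len(items) != 12:
        raise ValueError("expected exactly 12 GHQ items")
    if any(value < 0 for value in items):
        # One unanswered item is enough to lose the derived value.
        return MISSING_CODE
    # Responses 3 and 4 count as 1; responses 1 and 2 count as 0.
    return sum(1 for value in items if value >= 3)


complete = [1, 2, 3, 4] * 3          # all 12 items answered
one_gap = [1] * 11 + [MISSING_CODE]  # a single skipped item

print(ghq_caseness(complete))
print(ghq_caseness(one_gap))
```

Under these assumptions, one skipped item on a paper form is enough to lose the derived value for that respondent, which is one way item-level gaps can add up to a higher missing rate in the derived variable.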

Regarding occupation variables like jbsocXX, these are provided by the fieldwork agency and are already coded. At the moment, there is no official process for mapping SOC00 codes to SOC10 or SOC20, but it is something that could be explored further. If you'd like to carry out the conversion yourself, you can use the coding index provided by the ONS here: https://www.ons.gov.uk/methodology/classificationsandstandards/standardoccupationalclassificationsoc/soc2020/soc2020volume2codingrulesandconventions

Lastly, if you're interested in survey response rates, further details can be found in the following resources:
• Main survey user guide: https://www.understandingsociety.ac.uk/documentation/mainstage/user-guides/main-survey-user-guide/response-rates/
• Response tables: https://understandingsociety.ac.uk/wp-content/uploads/documentation/user-guides/6614_main_survey_user_guide_response_tables.pdf

I hope this information is helpful.

Best wishes,

Roberto Cavazos
Understanding Society User Support Team

Actions #2

Updated by Evan Zhang 17 days ago

Hi Roberto,

Thanks for your reply. I have a further question about the steadily decreasing proportion of missing data in later waves: what factors have contributed to this decline in missing responses? Thank you very much.

Best wishes,
Evan

Actions #3

Updated by Understanding Society User Support Team 17 days ago

Dear Evan,

Is there a particular element you'd like us to explain further?

Best wishes,
UKHLS User Support

Actions #4

Updated by Evan Zhang 12 days ago

Hi,

Sorry for the late reply. Regarding the variable scghq2_dv, you previously explained that “In Waves 1 and 2, the self-completion questionnaire, including the GHQ questions, was administered on paper. We’ve observed a relatively high level of missing responses for certain GHQ items in these early waves.”

I would like to further understand why the rate of missing responses gradually decreased in later waves. Was this due to specific policies or measures taken to encourage responses, or were there other reasons? Thank you very much.

Best wishes,
Evan

Actions #5

Updated by Understanding Society User Support Team 7 days ago

  • % Done changed from 50 to 90

Hi Evan,

The gradual decrease in missing values in these variables is due to the mode effect. The highest number appears in waves 1 and 2, because that section of the questionnaire was administered on paper (as Roberto pointed out above). The next noticeable decrease happens in wave 6, when web interviews were introduced. If you tabulate w_scghq2_dv by w_indmode you'll see fewer missing values for web interviews and more for face-to-face. Since the share of web interviews has been gradually growing, the number of missing values in scghq2_dv has gradually declined.

Web interviews have fewer missing values for two reasons:
1) Web routing was more liberal. Unlike in face-to-face mode, web respondents didn’t need to give separate consent to complete the self-completion part, because web is self-completion by default. You can check the universe/routing changes by wave in the PDFs of the questionnaires: https://www.understandingsociety.ac.uk/documentation/mainstage/questionnaires/
2) There are no proxy web interviews.
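
The tabulation by interview mode described above can be sketched in Python as follows. This is only an illustration: the function name missing_rate_by_mode, the mode labels, and the toy records are invented for the example and are not real UKHLS figures. It assumes that any negative value of scghq2_dv (for example -9 missing, -8 inapplicable, -7 proxy) counts as not valid.

```python
from collections import defaultdict


def missing_rate_by_mode(records):
    """Return {mode: share of records with a negative (non-valid) value}.

    records: iterable of (interview_mode, scghq2_dv_value) pairs.
    """
    totals = defaultdict(int)
    missing = defaultdict(int)
    for mode, value in records:
        totals[mode] += 1
        if value < 0:  # negative codes mark missing/inapplicable/proxy
            missing[mode] += 1
    return {mode: missing[mode] / totals[mode] for mode in totals}


# Invented toy data, for illustration only (not real survey values).
toy_records = [
    ("face-to-face", 3), ("face-to-face", -9),
    ("face-to-face", -7), ("face-to-face", 0),
    ("web", 2), ("web", 5), ("web", 1), ("web", -9),
]

rates = missing_rate_by_mode(toy_records)
for mode, rate in rates.items():
    print(f"{mode}: {rate:.0%} not valid")
```

Running the same calculation separately for each wave, with the real w_indmode and w_scghq2_dv values, would show how much of the overall decline comes from the changing mix of interview modes.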

I hope this clarifies the issue.

Best wishes,
Piotr Marzec
UKHLS User Support
