Support #1685

creating a disabled status variable

Added by William Shufflebottom 5 months ago. Updated 4 months ago.

Status:
Resolved
Priority:
High
Category:
Data management
Start date:
04/20/2022
% Done:

100%


Description

Hi USoC team,

Could you advise on an appropriate recode for disability status? We are currently using the "health" and "disdif96" variables, as shown below, to capture those who answer "yes" to being disabled and those who answer "no" but nevertheless report a condition that would be considered a disability. In other words, we want to capture those who don't consider themselves disabled but are technically disabled by virtue of having a certain condition. However, the way we have recoded this new "disabled_status" variable (code included below) gives us very large numbers of disabled people (USoC covid wave 8):
Not Disabled - 7180
Disabled - 5060

Can you comment on how we are recoding to capture disabled people and suggest an alternative way to create a "disabled_status" variable (if appropriate) that captures both those who consider themselves disabled and those who are disabled but don't consider themselves disabled? We are nearing a publication date, so your input is greatly appreciated.

Best wishes

Will

# DISABILITY STATUS #############################
# RECODING AND ADDING FACTORS FOR THE DISABILITY VARIABLES FOR WAVE 8

USocCov8$health_dis <- car::recode(USocCov8$health, "1 = 1; 2 = 2;else = NA")

USocCov8$health_dis <- factor(USocCov8$health_dis, levels = c(1, 2),
labels = c("Yes",
"No"))

USocCov8$disdif96_dis <- car::recode(USocCov8$disdif96, "1 = 1; 0 = 2;else = NA")

USocCov8$disdif96_dis <- factor(USocCov8$disdif96_dis, levels = c(1, 2),
labels = c("No Disability",
"Mentioned Disability"))

library(dplyr)  # provides %>%, mutate() and case_when()

USocCov8 <- USocCov8 %>%
  dplyr::mutate(disabled_status = case_when(
    # Disabled: a condition is reported in at least one of the two questions
    health_dis == "Yes" & disdif96_dis == "Mentioned Disability" ~ "Disabled",
    health_dis == "No" & disdif96_dis == "Mentioned Disability" ~ "Disabled",
    health_dis == "Yes" & disdif96_dis == "No Disability" ~ "Disabled",
    health_dis == "Yes" & is.na(disdif96_dis) ~ "Disabled",
    disdif96_dis == "Mentioned Disability" & is.na(health_dis) ~ "Disabled",

    # Not Disabled: no condition reported, allowing one question to be missing
    health_dis == "No" & disdif96_dis == "No Disability" ~ "Not Disabled",
    health_dis == "No" & is.na(disdif96_dis) ~ "Not Disabled",
    disdif96_dis == "No Disability" & is.na(health_dis) ~ "Not Disabled",

    # Both questions missing
    TRUE ~ NA_character_)) %>%
  mutate(disabled_status = factor(disabled_status,
                                  levels = c("Not Disabled", "Disabled")))

Files

Wave10 disabled_status full N breakdown.PNG (19.6 KB) William Shufflebottom, 04/28/2022 11:36 AM
Wave10 Disabled_status groups N.PNG (6.08 KB) William Shufflebottom, 04/28/2022 11:40 AM
Disabled checks image of code.PNG (136 KB) William Shufflebottom, 04/28/2022 11:57 AM
#1

Updated by Understanding Society User Support Team 5 months ago

  • Category set to Data management
  • Status changed from New to In Progress
  • Assignee set to Understanding Society User Support Team
  • Private changed from Yes to No

Many thanks for your enquiry. The Understanding Society team is looking into it and we will get back to you as soon as we can.

We aim to respond to simple queries within 48 hours and more complex issues within 7 working days.

Best wishes,
Understanding Society User Support Team

#2

Updated by Understanding Society User Support Team 5 months ago

  • Status changed from In Progress to Feedback
  • % Done changed from 0 to 50

So, if I understand correctly, your definition of disability is this: health=2 AND disdif96=0, which translates to: no self-declared long-standing illness or disability AND at least one condition on the disdif list. Is that right? If this is the case, then the number of such people in wave 8 is 1,605, so maybe something went wrong in the recoding?

Best wishes,
Understanding Society User Support Team

#3

Updated by William Shufflebottom 5 months ago

Hi,

Our definition is:

health: (long-standing illness or disability) where the standard responses 1 = "Yes" and 2 = "No" are unchanged and all other responses are set to NA.

disdif96: (the "none of these" response to the type of impairment or disability question) where the standard responses are 1 = "Mentioned" and 0 = "Not mentioned". Response 1 is left as is and given the factor level "No Disability" (if they select "none of these", they are saying they have none of the listed conditions). Response 0 (disdif96 not mentioned) is changed to 2 and given the factor level "Mentioned Disability" (because they presumably selected a response other than "none of these", so a condition was mentioned).

Remembering that we recoded the disdif96 “0” response to “2”, our disability status recode goes:

1. health = 1 (yes to long-standing illness or disability) & disdif96 = 2 [Mentioned disability (formerly 0)] -> "disabled"
2. health = 2 (No long-standing illness or disability) & disdif96 = 2 -> “disabled” (said no to being disabled but then mentioned a disability)
3. health = 1 (yes) & disdif96 = 1 (No disability) -> “disabled” (said yes to having a long-term condition but then did not mention a disability)
4. health = 1 (yes) & disdif96 = NA -> “disabled” (as mentioned a long-term condition)
5. health = NA & disdif96 = 2 -> “disabled” (as mentioned disability in one question)
6. health = 2 & disdif96 = 1 -> “Not disabled” [no to health (long-term condition) and disdif96 “none of these” (conditions)]
7. health = 2 & disdif96 = NA -> “Not disabled” (no to long-term condition and other response is NA)
8. health = NA & disdif96 = 1 -> “Not disabled” (NA response to health and disdif96 “none of these” mentioned)

So, we treat every combination where someone has reported a condition in at least one of the questions as confirmation of disability status. Where they report no condition in both questions, or no condition in one question and NA in the other, we treat them as "not disabled". This is our "disabled_status" variable, which produces counts of 7180 for "Not Disabled", 5060 for "Disabled", and 440 for "NA" for USoC Covid Wave 8.
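The eight rules above reduce to two conditions: "Disabled" if either question indicates a condition, and "Not Disabled" if neither does and at least one question was answered. A minimal base R sketch of that reading follows. The `toy` data frame and the helper vector names are made up for illustration and are not USoC data or variables; the raw codes follow the description above (health: 1 = yes, 2 = no; disdif96: 1 = "none of these" mentioned, 0 = not mentioned).

```r
# Toy data, one row per rule 1-8 plus a row with both answers missing.
toy <- data.frame(
  health   = c(1, 2, 1, 1, NA, 2, 2, NA, NA),
  disdif96 = c(0, 0, 1, NA, 0, 1, NA, 1, NA)
)

# TRUE = question indicates a condition, FALSE = it does not, NA = no valid answer.
health_yes <- ifelse(toy$health %in% c(1, 2), toy$health == 1, NA)
disdif_yes <- ifelse(toy$disdif96 %in% c(0, 1), toy$disdif96 == 0, NA)

# %in% TRUE treats NA as FALSE, so a missing answer never counts as a "yes".
any_yes    <- (health_yes %in% TRUE) | (disdif_yes %in% TRUE)
any_answer <- !is.na(health_yes) | !is.na(disdif_yes)

toy$disabled_status <- factor(
  ifelse(any_yes, "Disabled", ifelse(any_answer, "Not Disabled", NA)),
  levels = c("Not Disabled", "Disabled")
)

# Cross-check against the rules: rows 1-5 Disabled, rows 6-8 Not Disabled, row 9 NA.
print(toy)
```

Written this way, it is easier to see that the "Disabled" group is a union of two questions, so it will always be at least as large as the group answering "yes" to health alone.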

#4

Updated by Understanding Society User Support Team 5 months ago

  • % Done changed from 50 to 80

Hello,

We could not find these variables in the UKHLS Covid Survey Wave 8 dataset (collected in Mar 2021; more information here: https://www.understandingsociety.ac.uk/documentation/covid-19).
I am assuming you are using the main UKHLS survey Wave 8 data (collected in 2016-17; more information here: https://www.understandingsociety.ac.uk/documentation/mainstage). Using this data, we created a disabled_status variable as per your description. The sample size for each category is shown as N=?:

1. health = 1 (yes to long-standing illness or disability) & disdif96 = 2 [Mentioned disability (formerly 0)] -> "disabled" N=8851
2. health = 2 (No long-standing illness or disability) & disdif96 = 2 -> “disabled” (said no to being disabled but then mentioned a disability) N=1605
3. health = 1 (yes) & disdif96 = 1 (No disability) -> “disabled” (said yes to having a long-term condition but then did not mention a disability) N=4729
6. health = 2 & disdif96 = 1 -> “Not disabled” [no to health (long-term condition) and disdif96 “none of these” (conditions)] N=22793

The categories with NA are not useful, as these are the cases where these questions were never asked. There are N=1315 such cases.
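Adding up the counts above gives the overall split implied by this definition. The sketch below uses only the N values quoted in this reply and in the original post; the variable names are illustrative. The two sets of counts come from different samples, so the comparison is only a rough one.

```r
# Main survey Wave 8 counts quoted above, by rule number.
n_disabled     <- c(rule1 = 8851, rule2 = 1605, rule3 = 4729)
n_not_disabled <- 22793
n_missing      <- 1315  # questions never asked

total_disabled <- sum(n_disabled)
total_valid    <- total_disabled + n_not_disabled
share_disabled <- total_disabled / total_valid

# Counts from the original post (a different sample; rough comparison only).
share_post <- 5060 / (5060 + 7180)

cat("Main Wave 8: ", total_disabled, " disabled of ", total_valid,
    " valid cases (", round(100 * share_disabled, 1), "%)\n", sep = "")
cat("Original post: ", round(100 * share_post, 1), "% disabled\n", sep = "")
```

Both shares come out at roughly 40%, which suggests the large "Disabled" group follows from the breadth of the definition rather than from a recoding error.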

Hope this helps. If you have further questions please let us know.

Best wishes,
Understanding Society User Support Team

#5

Updated by William Shufflebottom 5 months ago

Hi,

Maybe we can use the Main survey Wave 10 to make things clearer as we attached these data from Wave 11 to Wave 8 COVID.

So, on the subject of NAs, we took the approach that if they said they were disabled in one question and then didn't answer the second question then we still include them as having answered because a missing answer in one question doesn't change their response to the other question. Please let me know if I am not grasping what you mean. The cases where there's an answer in one and NA in the other are very few to negligible regardless and I don't think this is the cause for large numbers of disabled which can be seen in the breakdown of N for Wave10 disabled_status (Wave10 Disabled_status groups N.PNG).

Looking at Wave 10 then, I have labelled each combination, and the column headings show the responses (please see the attached "Disabled checks image of code.PNG" for an image of this code and "Wave10 disabled_status full N breakdown.PNG" for N counts per combination). Each row denotes a different combination of answers to health_dis and disdif96_dis (these are recodes of health and disdif96, but with response 0 in disdif96 changed to value 2 as explained above; the N breakdowns are otherwise identical to the original variables). The code looks correct to me if our goal is to capture those who have said they are disabled and those who have said they are not disabled but have answered with a long-term health condition to the disdif question.

It appears to my eye that something is happening in Disabled2 or Disabled3 [Disabled2 -> health_dis == "No" & disdif96_dis == "Mentioned Disability" ~ "Disabled2" | Disabled3 -> health_dis == "Yes" & disdif96_dis == "No Disability" ~ "Disabled3"]. Or is there possibly some way that people are funnelled from the "health" question to the "disdif" question that may be causing issues?
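The attached images are not reproduced in this thread. As a rough stand-in, a per-combination count of this kind can be obtained with a two-way table that keeps missing values. This is only a sketch on made-up rows, assuming the factor labels used in the recode above; it is not the code in the attached image.

```r
# Made-up rows covering several combinations (not USoC data).
toy <- data.frame(
  health_dis = factor(
    c("Yes", "No", "Yes", "No", "Yes", NA),
    levels = c("Yes", "No")
  ),
  disdif96_dis = factor(
    c("Mentioned Disability", "Mentioned Disability", "No Disability",
      "No Disability", NA, "No Disability"),
    levels = c("No Disability", "Mentioned Disability")
  )
)

# useNA = "ifany" adds an NA row/column so partially missing cases stay visible.
counts <- table(toy$health_dis, toy$disdif96_dis,
                useNA = "ifany", dnn = c("health_dis", "disdif96_dis"))
print(counts)
```

In this layout the "No" / "Mentioned Disability" cell corresponds to Disabled2 and the "Yes" / "No Disability" cell to Disabled3.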

This is taken from your variable checker:

disability_w11.disdif
Type of impairment or disability
Type
multichoice
Interviewer Instruction
PROBE FOR ANY OTHERS
CODE ALL THAT APPLY
Text
Even though you don't have any long-standing health problems, do {if HEALTH = 2} / Do you have any health problems or disabilities that mean you have substantial difficulties with any of the following areas of your life? {if HEALTH <> 2}

Your input is appreciated

Will

#6

Updated by Understanding Society User Support Team 5 months ago

Ok, I see.

We looked at the Wave 10 data and recreated your variable. We get the same frequencies, so your code is correct. But I am not sure I understand your question. You have queried Disabled2 and Disabled3, which are:
1. health = 2 & disdif96 = 0 --> reported NO to the long-term disability question but said yes to having health problems or disabilities that mean they have substantial difficulties with at least one of the areas in the list specified.
This is possible, as someone may not see themselves as having a long-term disability but may have difficulties in specific areas of their life. Up to Wave 7, respondents were only asked the disdif* questions if they said yes to health (=1). This was changed to accommodate these cases, based on research done in the Innovation Panel during IP6 & IP7. See these reports, which include analysis of experiments that were done regarding these questions: https://www.understandingsociety.ac.uk/sites/default/files/downloads/working-papers/2015-03.pdf
https://www.understandingsociety.ac.uk/sites/default/files/downloads/working-papers/2014-04.pdf

2. health = 1 & disdif96 = 1 --> reported a long-term disability but didn't choose having difficulties in any of the areas of functioning listed. That could be because they have a disability not listed here. It is likely they didn't consider choosing disdif12 "other health problem or disability". We checked whether this has to do with interview mode, and that is not the case. But without further research it is difficult to say why they responded in this manner.

Hope this helps.

Best wishes,
Understanding Society User Support Team

#7

Updated by Understanding Society User Support Team 4 months ago

  • Status changed from Feedback to Resolved
  • % Done changed from 80 to 100
