Support #1770
Making best use of the Ethnic Minority Boost Sample
Description
Hi Understanding Society team,
I am conducting a basic analysis of health outcomes across combinations of ethnic group and age group using the Wave 10 data. My issue is that when I produce cross-tabulations, apply weights, and break the sample down by both broad ethnic group and age group, I end up with a small N in many cells (i.e. < 10), and in some cases zero counts, so my confidence intervals are very large. Having read the general User Guide and the Ethnicity and Immigration User Guide, my understanding is that one reason the ethnic minority boost samples were included was to tackle exactly this problem of small sample sizes in marginalised subgroups, so I wanted to check that I am not missing something. I think my main issue is that once I apply a weight (such as j_indinui_xw), the N for the ethnic minorities is scaled down to account for the oversampling of these groups, and since R uses the weighted N (rather than the unweighted N) to calculate confidence intervals, those intervals come out very large.
As a quick illustration: I'm using the j_indresp.dta file and have recoded the j_ethn_dv variable into Asian, Black, Other, and White. The Black category contains 1314 respondents, but when I apply the weight j_indinui_xw and tabulate by ethnicity using the svytable function in R, the weighted N is scaled down to only 458.3841. If I then break this down by 10-year age group and my health outcome variable, I end up with very small Ns (or zeros), which means that functions such as svyciprop in R return very large confidence intervals.
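In case it is useful, this is roughly the code in question (a minimal sketch: the psu and strata design variables and the file path match my setup, and the four-group recode is omitted for brevity):

    library(haven)   # read_dta
    library(survey)  # svydesign, svytable

    # Read the Wave 10 individual response file
    indresp <- read_dta("j_indresp.dta")

    # Complex design using the PSU and strata variables shipped with
    # the data, and the Wave 10 cross-sectional weight
    des <- svydesign(ids = ~psu, strata = ~strata,
                     weights = ~j_indinui_xw, data = indresp)

    # Weighted counts by ethnic group: the Black category drops from
    # 1314 respondents to a weighted N of 458.3841
    # (in my real code j_ethn_dv is first collapsed into the four groups)
    svytable(~j_ethn_dv, des)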
This may simply be an unavoidable problem, but given that Understanding Society has put a lot of effort into including these extra ethnic minority samples, I wanted to make sure I was making the best use of them. And just to double-check: can I simply use the normal indresp file to draw on the ethnic minority boost sample?
Many thanks for your help and the amazing resources you provide!
Best wishes,
Laurence Rowley-Abel
Updated by Understanding Society User Support Team about 2 years ago
- Category set to Survey design
- Private changed from Yes to No
Many thanks for your enquiry. The Understanding Society team is looking into it and we will get back to you as soon as we can.
We aim to respond to simple queries within 48 hours and more complex issues within 7 working days.
Best wishes,
Understanding Society User Support Team
Updated by Understanding Society User Support Team about 2 years ago
- Status changed from New to Feedback
- % Done changed from 0 to 50
Dear Laurence,
A significant reduction in the weighted counts for ethnic minorities (from 1314 respondents to 458.3841 in your example) is expected and intended: due to oversampling they are overrepresented, and the weights bring their proportion in the sample down to the level in the population. However, this in itself does not affect standard errors (standard errors may become slightly inflated when the data are weighted, but that happens for different reasons).
Does this help?
Best wishes,
Piotr
Understanding Society User Support Team
Updated by Laurence Rowley-Abel about 2 years ago
Dear Piotr,
Many thanks for your reply. The issue is that when I use a function such as svymean in R, it uses the smaller, scaled-down N (i.e. 458) rather than the raw N (i.e. 1314) to calculate the standard error. For example, if I calculate the mean age of Black respondents while accounting for the survey design but without weights, the standard error is 0.51; if I do the same with weights, the standard error is 0.98, so it almost doubles.
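Concretely, the comparison looks like this (a sketch, reusing indresp from my first message; ethn4 is my four-group recode of j_ethn_dv, and j_age_dv stands in for the age variable):

    library(survey)

    # Design accounting for clustering and stratification only
    # (svydesign warns that no weights are supplied and assumes
    # equal probabilities)
    des_unw <- svydesign(ids = ~psu, strata = ~strata, data = indresp)

    # The same design with the weight applied
    des_w <- svydesign(ids = ~psu, strata = ~strata,
                       weights = ~j_indinui_xw, data = indresp)

    # Mean age of Black respondents with and without weighting
    svymean(~j_age_dv, subset(des_unw, ethn4 == "Black"))  # SE = 0.51
    svymean(~j_age_dv, subset(des_w, ethn4 == "Black"))    # SE = 0.98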
Is there any way around this? For example, could weights be calculated that do not account for the oversampling of ethnic minorities, for use in an analysis where ethnicity is one of the explanatory variables anyway (so that the bias due to oversampling would be accounted for in that way)?
Thanks for your help.
Best,
Laurence
Updated by Understanding Society User Support Team about 2 years ago
Dear Laurence,
Can I ask which weight you use and with which wave?
Best wishes,
Piotr
Understanding Society User Support Team
Updated by Laurence Rowley-Abel about 2 years ago
Dear Piotr,
I am using Wave 10 and the weight j_indinus_lw, as later in my analysis I will also involve variables from prior waves.
If it's helpful, I have attached a minimal reprex of the example above (calculating the mean age of Black respondents).
Many thanks!
Best wishes,
Laurence
Updated by Understanding Society User Support Team about 2 years ago
- % Done changed from 50 to 70
Dear Laurence,
Here is the advice from Olena Kaminska, our survey statistician:
"For an analysis with only one wave (e.g. wave 10) use _xw weights. I suggest you use data release 11.1 with better sample sizes. For longitudinal analysis you can either use our weight or create your own tailored weight, you can access the tailored weights course here: https://open.essex.ac.uk/enrol/index.php?id=301."
Best wishes,
UKHLS User Support Team
Updated by Laurence Rowley-Abel about 2 years ago
Dear Piotr and Olena,
Thank you for your help with this. I believe my understanding of how the weighted/unweighted N enters the calculation of standard errors was incorrect; my issue is simply that certain members of the ethnic minority boost sample are not eligible for the weight (regardless of whether I use the longitudinal or cross-sectional weight), which is why my sample size was reducing so much.
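In case it helps anyone finding this thread later, a quick check along these lines (a sketch, reusing the objects from my earlier messages) makes the eligibility issue visible, since ineligible respondents carry a weight of zero:

    # Count respondents per broad ethnic group whose weight is zero,
    # i.e. who drop out of any weighted analysis
    with(indresp, table(ethn4, j_indinus_lw == 0))

    # The weighted N is essentially the sum of the weights per group
    with(indresp, tapply(j_indinus_lw, ethn4, sum, na.rm = TRUE))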
Many thanks.
Best wishes,
Laurence
Updated by Understanding Society User Support Team about 2 years ago
- Status changed from Feedback to Resolved
- % Done changed from 70 to 100