Support #2196

open

Weighting

Added by Evie Gates about 1 month ago. Updated 18 days ago.

Status:
Feedback
Priority:
High
Category:
Weights
Start date:
01/21/2025
% Done:
90%

Description

Hi there,

I am analysing waves 1 and 2 of the Understanding Society data, and my sample is restricted to ethnic minority respondents who completed the extra five minutes questions and who have ever consumed alcohol. Following the weighting guidance, I am using the "b_ind5mus_lw" weight. I selected this weight because wave 2 (b) is the last wave in my analysis; I am analysing individuals (ind); my data come from both the extra five minutes and the self-completion questionnaires, and since the extra five minutes questionnaire is the lowest level of analysis, "ind5m" was selected; my sample is made up of GPS and EMB members (us); and the analysis is longitudinal (lw).
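The reasoning above decodes the weight name piece by piece. As a minimal sketch, assuming individual-level weight names follow the pattern `<wave letter>_ind<2-character instrument code><2-character sample code>_<type>` (an assumption generalised from this one name, not taken from the official guidance), the decoding can be written as a small Python function. `describe_weight` and the label table are hypothetical and only cover the codes mentioned in this post.

```python
import re

# Hypothetical pattern generalised from the single name "b_ind5mus_lw";
# check the official weighting guidance before relying on it for other weights.
WEIGHT_NAME = re.compile(
    r"(?P<wave>[a-z])_ind(?P<instrument>[a-z0-9]{2})(?P<sample>[a-z]{2})_(?P<kind>[a-z]{2})"
)

# Labels for the codes mentioned in this post only (illustrative, not exhaustive).
LABELS = {
    "instrument": {"5m": "extra five minutes questionnaire"},
    "sample": {"us": "GPS and EMB members"},
    "kind": {"lw": "longitudinal"},
}


def describe_weight(name: str) -> dict:
    """Decode an individual-level weight name into its components (hypothetical helper)."""
    match = WEIGHT_NAME.fullmatch(name)
    if match is None:
        raise ValueError(f"unrecognised weight name: {name!r}")
    codes = match.groupdict()
    return {
        # wave letter: a = wave 1, b = wave 2, and so on
        "wave": ord(codes["wave"]) - ord("a") + 1,
        "unit": "individual",
        # fall back to the raw code when no label is known
        "instrument": LABELS["instrument"].get(codes["instrument"], codes["instrument"]),
        "sample": LABELS["sample"].get(codes["sample"], codes["sample"]),
        "kind": LABELS["kind"].get(codes["kind"], codes["kind"]),
    }


print(describe_weight("b_ind5mus_lw"))
```

Unknown codes are passed through unchanged rather than guessed, so the helper only labels what this thread actually describes.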

However, when I use the svyset command in Stata to account for the PSU, strata and weight, the reported population size is substantially lower than my sample size (sample size: 956; population size: 123). I have looked on Stata forums for why this would be the case, and they suggest it happens when the wrong weight is used; however, as far as I can tell, I am using the correct weight.

Please could you inform me why the population size would be so low?

Many Thanks,

Evie Gates


Files

Do file for US support.do (3.61 KB), Evie Gates, 01/21/2025 04:29 PM
Do file for US support.do (9.44 KB), Evie Gates, 01/21/2025 05:04 PM

#1

Updated by Understanding Society User Support Team about 1 month ago

  • Category set to Weights
  • Status changed from New to In Progress
  • % Done changed from 0 to 10

Hi Evie,

Would you mind sending the code that reproduces the creation of your target group and weighting?

Thanks,
Piotr Marzec
UKHLS User Support Team

#2

Updated by Evie Gates about 1 month ago

Hi Piotr,

Thank you for getting back to me so quickly!

Please see the attached Stata do file, which contains the code I used to create my target sample and to apply the weighting to that sample.

Evie Gates

#3

Updated by Understanding Society User Support Team about 1 month ago

Hi Evie,

Thanks. Could you also include the parts that open the data files and create/rename the variables you use? (These variables are not available in the released data under these names.)

Thanks,
Piotr

#4

Updated by Evie Gates about 1 month ago

Hi Piotr,

Apologies, please see the updated do file. If it would be helpful I can also attach my dataset.

Many Thanks,

Evie

#5

Updated by Understanding Society User Support Team 26 days ago

  • Status changed from In Progress to Feedback
  • % Done changed from 10 to 90
  • Private changed from Yes to No

Dear Evie,

In Stata's SVY, the "Population size" is the sum of all individual values of the b_ind5mus_lw weight for respondents remaining in the analysis after you restricted the sample. Since this value is less than 1 for most respondents (as you retained only minorities who are oversampled and on average have lower weights), the total sum is much smaller than the actual number of people in the analysis.
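The explanation above can be checked with simple arithmetic: svy's "Population size" is the sum of the weights, while "Number of obs" is the count of rows. A minimal sketch with hypothetical weights (not real UKHLS values); `svy_header` is an illustrative name, not a Stata or library function. The figures in the original question imply an average weight of roughly 123 / 956, or about 0.13.

```python
import math


def svy_header(weights):
    """Return the two header figures svy prints for a weighted sample (illustrative helper)."""
    n_obs = len(weights)            # "Number of obs": rows in the estimation sample
    pop_size = math.fsum(weights)   # "Population size": sum of the weights
    return n_obs, pop_size


# Hypothetical weights for six oversampled respondents (not real UKHLS values).
toy_weights = [0.08, 0.12, 0.15, 0.10, 0.25, 0.30]
n_obs, pop_size = svy_header(toy_weights)
print(n_obs, round(pop_size, 2))  # 6 1.0

# Average weight implied by the figures in the question (956 obs, population size 123).
print(round(123 / 956, 2))  # 0.13
```

Whenever the average weight is below 1, the population size is necessarily smaller than the number of observations.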

You might also consider using the subpop option for your analysis.

Best wishes,
Piotr
UKHLS User Support
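To illustrate why the subpop option is preferred to dropping cases: the point estimate is the same either way, but the design-based variance can differ when some PSUs contain no subpopulation members, because dropping those cases removes the PSUs from the variance calculation. The sketch below is a Python implementation of the textbook Taylor-linearised variance of a weighted domain mean (PSUs treated as sampled with replacement, no finite-population correction). I believe this is what Stata's `svy, subpop(...)` estimation of a mean does by default, but it has not been checked against Stata output; the toy design and `domain_mean` are hypothetical.

```python
from collections import defaultdict


def domain_mean(rows, in_domain):
    """Weighted domain mean and its linearised variance (illustrative sketch).

    rows: list of (stratum, psu, weight, y) tuples
    in_domain: list of 0/1 flags, one per row (the subpopulation indicator)
    """
    wd = [w * d for (_, _, w, _), d in zip(rows, in_domain)]
    total_w = sum(wd)
    r = sum(x * y for x, (_, _, _, y) in zip(wd, rows)) / total_w
    # Linearised scores summed to PSU level; rows outside the domain add 0,
    # but their PSUs are still created and counted in their stratum.
    psu_totals = defaultdict(float)
    for x, (h, j, _, y) in zip(wd, rows):
        psu_totals[(h, j)] += x * (y - r) / total_w
    by_stratum = defaultdict(list)
    for (h, _), z in psu_totals.items():
        by_stratum[h].append(z)
    var = 0.0
    for h, zs in by_stratum.items():
        n_h = len(zs)
        if n_h < 2:
            raise ValueError(f"stratum {h} has a single PSU; variance not estimable")
        zbar = sum(zs) / n_h
        var += n_h / (n_h - 1) * sum((z - zbar) ** 2 for z in zs)
    return r, var


# Hypothetical toy design: 2 strata x 3 PSUs, all weights 1 (not real data).
rows = [
    (1, 1, 1.0, 2), (1, 1, 1.0, 9),
    (1, 2, 1.0, 4),
    (1, 3, 1.0, 5),  # PSU 3 contains nobody from the domain
    (2, 4, 1.0, 6), (2, 5, 1.0, 8), (2, 6, 1.0, 3),
]
flags = [1, 0, 1, 0, 1, 1, 1]

# subpop-style: keep every row and flag the domain
mean_sub, var_sub = domain_mean(rows, flags)

# drop-style: delete non-domain rows first, then treat everyone as in the domain
kept = [row for row, d in zip(rows, flags) if d]
mean_drop, var_drop = domain_mean(kept, [1] * len(kept))

print(round(mean_sub, 4), round(var_sub, 4))    # 4.6 0.9824
print(round(mean_drop, 4), round(var_drop, 4))  # 4.6 0.92
```

Here the mean is 4.6 in both cases, but the variance is about 0.98 when the full design is kept and 0.92 after dropping cases, because PSU 3 in stratum 1 disappears from the calculation.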

#6

Updated by Evie Gates 26 days ago

Hi Piotr,

Thanks so much for explaining this!

Will the fact that my 'population size' is so low have any impact on my analyses going forward?

I am currently re-cleaning my data using the subpop option instead of dropping non-eligible cases.

Many Thanks,

Evie Gates

#7

Updated by Understanding Society User Support Team 18 days ago

Hi Evie,

Sorry for the delay. No, the 'population size' in Stata doesn’t matter in your case — all statistical testing is based on the 'number of obs' figure. The former is more relevant when weights are scaled to the population size, which isn’t the case with UKHLS.
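One way to see why the reported population size does not affect the estimates here: multiplying every weight by the same constant changes the sum of the weights (what Stata labels "Population size") but leaves a weighted estimate unchanged. A minimal Python sketch with hypothetical values; the outcome, the weights and the rescaling target of 1000 are all arbitrary.

```python
import math


def weighted_mean(y, w):
    """Weighted mean: sum(w * y) / sum(w)."""
    return math.fsum(wi * yi for wi, yi in zip(w, y)) / math.fsum(w)


y = [1, 0, 1, 1, 0]                 # hypothetical binary outcome
w = [0.10, 0.20, 0.15, 0.05, 0.30]  # hypothetical weights summing to 0.8
k = 1000 / math.fsum(w)             # rescale so the weights sum to a hypothetical 1000
w_scaled = [k * wi for wi in w]

print(round(weighted_mean(y, w), 4), round(weighted_mean(y, w_scaled), 4))  # 0.375 0.375
print(round(math.fsum(w), 4), round(math.fsum(w_scaled), 4))                # 0.8 1000.0
```

The constant cancels between numerator and denominator, so only the "Population size" figure moves.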

Best wishes,
Piotr Marzec
