Covid-19 Survey Wave e weights
I would like to explore the relationship between internet use (ce_netpusenew) and mental health status (ce_scghq1_dv) using data from wave e of the COVID-19 survey. The regression is run in Stata version 16 using the svy commands and the survey weights. I use the following commands:
- replace ce_scghq1_dv=. if ce_scghq1_dv<0
- replace ce_netpusenew=. if ce_netpusenew<0
- svyset psu [pweight=ce_betaindin_xw], strata(strata) singleunit(centered)
- svy: reg ce_scghq1_dv i.ce_netpusenew
My question is about the output: the reported population size (10,198) is smaller than the number of observations (12,391).
I am wondering whether I did something wrong, since I assumed the population size should be larger than the number of observations.
Thanks for your help and looking forward to your reply.
Updated by Alita Nandi over 1 year ago
- Status changed from New to Feedback
- Assignee set to Alita Nandi
- % Done changed from 0 to 50
I repeated your code and then estimated your model without weights (Option 1), with weights and survey design (Option 2), and with weights only (Option 3), but I did not find the same number of observations as you.
use "$m/covid19/ce_indresp_w", clear
replace ce_scghq1_dv=. if ce_scghq1_dv<0
replace ce_netpusenew=. if ce_netpusenew<0
// Option 1
reg ce_scghq1_dv i.ce_netpusenew
// Option 2
svyset psu [pweight=ce_betaindin_xw], strata(strata) singleunit(centered)
svy: reg ce_scghq1_dv i.ce_netpusenew
// Option 3
reg ce_scghq1_dv i.ce_netpusenew [pw=ce_betaindin_xw]
Option 1: No. of obs = 12403
Option 2: No. of obs = 12391
Option 3: No. of obs = 10256
These numbers are as we would expect. In Option 3, Stata drops cases with zero weights, while in Option 2 it keeps them. Because some cases have a weight of 0, the number of observations differs between Options 2 and 3.
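One way to confirm this is to count the zero-weight cases directly. The lines below are a minimal sketch, assuming the data have been loaded and recoded as above and that missing values of the outcome and predictor are the only other reason cases drop out.

```stata
* Sketch: count cases with a valid outcome and predictor, split by weight status
* All cases with non-missing outcome and predictor
count if !missing(ce_scghq1_dv, ce_netpusenew)
* Of those, cases with a weight of exactly zero
count if !missing(ce_scghq1_dv, ce_netpusenew) & ce_betaindin_xw == 0
* Of those, cases with a positive, non-missing weight
count if !missing(ce_scghq1_dv, ce_netpusenew) & ce_betaindin_xw > 0 & !missing(ce_betaindin_xw)
```

If the assumption holds, the first count should correspond to Option 1 and the last to Option 3.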
Updated by JINGYA ZENG over 1 year ago
Many thanks for your reply. Please find my results table in the attachment. The commands that I mentioned give this result, where the No. of obs is exactly the same as in Option 2 (12,391). Here are the first two lines of the output.
Number of strata = 1,587    Number of obs   = 12,391
Number of PSUs   = 3,458    Population size = 10,198.388
My question is why the estimated population size (10,198) is smaller than the No. of obs (12,391). It is even smaller than the No. of obs in Option 3 (10,256).
The Understanding Society COVID-19 User Guide states that using “cW_betaindin_xw” will provide estimates that are representative of the population of all adults (16+) who were resident in private households in the UK at the time of wave 9, and who did not die or emigrate before the relevant web survey. I therefore assumed that the output would report a larger population size, reflecting the population that the weights represent. Have I misinterpreted something?
Looking forward to your reply.
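For reference, the "Population size" that Stata prints in svy output is the sum of the sampling weights over the estimation sample, not an external population count, so its magnitude depends on how the weights are scaled. A minimal sketch to check this against the output above, assuming the data are recoded and svyset exactly as in this thread:

```stata
* Sketch: compare Stata's stored population size with the sum of the weights
svy: reg ce_scghq1_dv i.ce_netpusenew
* Number of observations used in the estimation
display e(N)
* "Population size" as shown in the svy header
display e(N_pop)
* Sum of the weights over the estimation sample
summarize ce_betaindin_xw if e(sample)
display r(sum)
* Average weight among the non-zero-weight cases
summarize ce_betaindin_xw if e(sample) & ce_betaindin_xw > 0
display r(mean)
```

If the two figures agree and the average non-zero weight is close to 1, a population size near the number of non-zero-weight cases is what one would expect, rather than a figure on the scale of the UK adult population.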