## Support #1484

### Covid-19 Survey Wave e weights

100%

**Description**

Hi,

I would like to explore the relationship between internet use (ce_netpusenew) and mental health status (ce_scghq1_dv) using data from wave e of the COVID-19 survey. The regression is carried out in Stata version 16 using the svy commands and survey weights. I use the following commands:

- replace ce_scghq1_dv=. if ce_scghq1_dv<0
- replace ce_netpusenew=. if ce_netpusenew<0

- svyset psu [pweight=ce_betaindin_xw], strata(strata) singleunit(centered)
- svy: reg ce_scghq1_dv i.ce_netpusenew

The problem is that the population size (10,198) is **smaller** than the number of observations (12,391).

I am wondering if I did something wrong. I assume that population size should be bigger than the number of observations.

Thanks for your help and looking forward to your reply.

Best wishes,

Jingya

#### Updated by Alita Nandi about 1 month ago

- **% Done** changed from *0* to *50*
- **Assignee** set to *Alita Nandi*
- **Status** changed from *New* to *Feedback*

Hello,

I repeated your code and then estimated your model without weights (Option 1), with weights and survey design (Option 2), and with just weights (Option 3), but I did not find the same number of observations as you.

use "$m/covid19/ce_indresp_w", clear

replace ce_scghq1_dv=. if ce_scghq1_dv<0

replace ce_netpusenew=. if ce_netpusenew<0

// Option 1

reg ce_scghq1_dv i.ce_netpusenew

// Option 2

svyset psu [pweight=ce_betaindin_xw], strata(strata) singleunit(centered)

svy: reg ce_scghq1_dv i.ce_netpusenew

// Option 3

reg ce_scghq1_dv i.ce_netpusenew [pw=ce_betaindin_xw]

Option 1: No. of obs = 12403

Option 2: No. of obs = 12391

Option 3: No. of obs = 10256

These numbers are as we would expect. In Option 3, Stata drops cases with zero weights, while in Option 2 it keeps them. Because some cases have zero weights, the number of observations differs between Options 2 and 3.

https://www.stata.com/support/faqs/statistics/svy-and-zero-weights/
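One way to check how many zero-weight cases sit behind the gap between Options 2 and 3 is to count them within the estimation sample. This is only a sketch: it assumes the Option 2 model (`svy: reg`) has just been run in the same do-file, so that `e(sample)` refers to that estimation.

```stata
* Sketch only: run immediately after the Option 2 model (svy: reg ...).
* e(sample) flags the observations used by the most recent estimation.

* All cases used by the svy estimation
count if e(sample)

* Cases that svy keeps even though their weight is zero
count if e(sample) & ce_betaindin_xw == 0

* Cases with a strictly positive, non-missing weight
count if e(sample) & ce_betaindin_xw > 0 & !missing(ce_betaindin_xw)
```

If the explanation above holds, the last count should be close to the number of observations reported for Option 3.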

#### Updated by JINGYA ZENG about 1 month ago

**File** Results.png added

Hi Alita,

Many thanks for your reply. Please find my result table in the attachment. The commands that I mentioned give this result, where the No. of obs is exactly the same as in Option 2 (12,391). Here are the first two lines of the result.

**Number of strata = 1,587; Number of obs = 12,391**

**Number of PSUs = 3,458; Population size = 10,198.388**

The problem is that the estimated population size (10,198) is **smaller** than the No. of obs (12,391). It is even **smaller** than the No. of obs in Option 3 (10,256).

As mentioned in the Understanding Society COVID-19 User Guide: *using “cW_betaindin_xw” will provide estimates that are representative of the population of all adults (16+) who were resident in private households in the UK at the time of wave 9, and who did not die or emigrate before the relevant web survey.* So I assumed that the estimated population size would be larger, reflecting the population that the weights represent. Did I misinterpret something?

Looking forward to your reply.

Best wishes,

Jingya

#### Updated by Alita Nandi about 1 month ago

Sorry, I see what you mean.

The weights are scaled so that they add up to the sample size, not to the size of the UK population. The "Population size" that Stata reports is the sum of the weights, and because some cases have zero weights, this sum is less than the number of observations.
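To see this in the data, the reported population size can be compared with the sum of the weights over the estimation sample. This is a minimal sketch, again assuming the `svy: reg` model above has just been run; `e(N)` and `e(N_pop)` are results stored by svy estimation commands, and `r(sum)` is stored by `summarize`.

```stata
* Sketch only: run immediately after svy: reg ce_scghq1_dv i.ce_netpusenew

* Number of observations used by the model
display e(N)

* Population size as reported in the svy output header
display e(N_pop)

* Sum of the weights over the same estimation sample
summarize ce_betaindin_xw if e(sample), meanonly
display r(sum)
```

The last two displayed values should agree, which would confirm that the population size is simply the total of the weights rather than an estimate of the UK adult population.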

#### Updated by Alita Nandi about 1 month ago

- **Private** changed from *Yes* to *No*
- **% Done** changed from *50* to *100*
- **Assignee** deleted (*Alita Nandi*)
- **Status** changed from *Feedback* to *Resolved*