Support #307

Survey design xwaves

Added by Daisy McGregor over 6 years ago. Updated about 6 years ago.

Data analysis



Hi, I have constructed a confidence interval for a linear combination of point estimates across BHPS waves 1, 2 and 3. Specifically, I have calculated the share of mortgagors in arrears in each year, and then taken a simple average over these years. I would like to check my methodology for constructing the confidence interval.

First, I have specified only the cluster variable psu in my survey design, and not the strata, to allow for correlation between clusters across years. I have also used xhwght as the weight in each year, but am worried that this is incorrect.

Then I have used the following command in Stata:
svy, subpop(subpop): mean arrears, over(year)

where subpop is those with tenure==2, arrears is a 1/0 variable, and year takes the values 1991, 1992 or 1993.

I then use the lincom command to take a simple average of the point estimates from each year, which gives me a confidence interval for that average.
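
For readers following along, the steps above could be sketched roughly as below. This is only an illustration, assuming a pooled file with one row per household-year and the variable names mentioned in this post (psu, xhwght, subpop, arrears, year). The names under which Stata stores each year's estimate differ between releases, so the sketch lists them first with matrix list e(b); the lincom line shown assumes the naming used by Stata 16 and later.

```stata
* Sketch only: assumes a pooled (long) file, one row per household-year,
* with the variable names used in the post.
svyset psu [pweight = xhwght]

* Share in arrears among the subpopulation, separately for each year
svy, subpop(subpop): mean arrears, over(year)

* List the stored estimates to see how each year's mean is named
matrix list e(b)

* Simple average of the three yearly estimates, with its confidence interval.
* Naming below assumes Stata 16 or later.
lincom (_b[c.arrears@1991.year] + _b[c.arrears@1992.year] + _b[c.arrears@1993.year]) / 3

* Older releases label the same estimates differently, for example:
* lincom ([arrears]1991 + [arrears]1992 + [arrears]1993) / 3
```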

Is this methodology correct? Should I be using different weights?



Updated by Peter Lynn over 6 years ago


This all sounds reasonable to me, with one exception. The observations within clusters are not independent. In fact, they will often be the same households in each year. This method of estimating standard errors ignores that, so you will underestimate the true standard errors. I would conceptualise your target parameter as something like the mean of mean arrears over the 3-year period. So first calculate the mean for each household in the sample (this could be the mean of 3 observations, the mean of 2, or just a single observation), then use svy: mean on that derived variable. Just a suggestion.
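
A rough sketch of that suggestion in Stata might look like the following. It is an illustration under assumptions, not a tested recipe: hid is a hypothetical identifier linking the same household across the three waves, mean_arrears and one_per_hh are names invented here, and the sketch simply reuses xhwght from whichever row is kept for each household. Which weight is appropriate for the household-level mean is not settled in this thread.

```stata
* Sketch only. hid is a HYPOTHETICAL identifier that links the same
* household across waves; mean_arrears and one_per_hh are illustrative names.

* Mean of the 1/0 arrears indicator over however many waves
* (1, 2 or 3) each household is observed in the subpopulation
bysort hid: egen mean_arrears = mean(arrears) if subpop == 1

* Flag exactly one row per household so each household is counted once
egen byte one_per_hh = tag(hid) if subpop == 1

* Same design as in the original post; the weight choice is an assumption
svyset psu [pweight = xhwght]

* Mean of the household-level means, restricted to the flagged rows
svy, subpop(one_per_hh): mean mean_arrears
```

Using subpop() rather than dropping rows keeps the full sample available for variance estimation, which is the usual advice for svy commands.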

Best wishes,



Updated by Redmine Admin about 6 years ago

  • Category set to Data analysis
  • Status changed from New to Closed
  • Target version set to BHPS
  • % Done changed from 0 to 100
