Survey design xwaves
Hi, I have constructed a confidence interval for a linear combination of point estimates across BHPS waves 1, 2 and 3. Specifically, I have calculated the share of mortgagors in arrears in each year, and then taken a simple average over these years. I would like to check my methodology for constructing the confidence interval.
First, I have specified only the cluster variable psu in my survey design, not the strata, to allow for correlation between clusters across years. I have also used xhwght for each year, but I am worried that this is incorrect.
Then I have used the following command in Stata:
svy, subpop(subpop): mean arrears, over(year)
where subpop is those with tenure==2, arrears is a 1/0 variable, and year takes the values 1991, 1992 or 1993.
I then use the lincom command to take a simple average of the point estimates from each year, which gives me a confidence interval for that average.
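For anyone following along, the steps above can be sketched roughly as below. This is only an outline under some assumptions: the data are stacked in long format (one row per household-year), xhwght holds that year's cross-sectional weight on each row, and the coefficient names in the lincom line follow the Stata 16+ convention. Older versions label the over() groups differently, so the names should be checked against the coeflegend output before running lincom.

```stata
* Sketch only: assumes a long file, one row per household-year, with
* xhwght holding that year's cross-sectional weight on each row.
svyset psu [pweight = xhwght]

* Share of mortgagors in arrears by year.
* coeflegend displays the coefficient names that lincom expects.
svy, subpop(subpop): mean arrears, over(year) coeflegend

* Simple average of the three yearly estimates, with its confidence interval.
* The names below assume Stata 16+ naming; adjust to match the legend above.
lincom (_b[c.arrears@1991.year] + _b[c.arrears@1992.year] + _b[c.arrears@1993.year]) / 3
```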
Is this methodology correct? Should I be using different weights?
Updated by Peter Lynn about 9 years ago
This all sounds reasonable to me, with one exception. The observations within clusters are not independent across years. In fact, they will often be the same households in each year. This method of estimating standard errors ignores that, so you will under-estimate the true standard errors. I would conceptualise your target parameter as something like the mean of mean arrears over the 3-year period. So first calculate the mean for each household in the sample (this could be the mean of 3 observations, the mean of 2, or just a sole observation), then use svy: mean on that derived variable. Just a suggestion.
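A minimal sketch of that suggestion, under some assumptions that are not in the thread: hid is a hypothetical household identifier that stays constant across the three waves, the data are in long format, and subpop flags the mortgagor rows as in the original command. The sketch does not settle which year's weight should apply to the single row kept per household. It simply uses whatever xhwght value sits on the tagged row, so that choice still needs a deliberate decision.

```stata
* Sketch only: hid is a hypothetical household identifier, constant across waves.
* Household-level mean of arrears over the years the household is a mortgagor
* (this is the mean of 3, 2 or 1 observations, depending on the household).
bysort hid: egen hh_arrears = mean(arrears) if subpop == 1

* Tag one mortgagor row per household so that each household counts once.
egen byte onerow = tag(hid) if subpop == 1

* Same design as before: clusters only, with the weight found on the tagged row.
svyset psu [pweight = xhwght]
svy, subpop(if onerow == 1): mean hh_arrears
```

Because each household now contributes a single observation, the repeated measurement of the same households no longer enters the standard error calculation as if it were independent information.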