Support #393
closed
Weights for unbalanced panel
Added by Gareth Hagger-Johnson over 9 years ago.
Updated over 9 years ago.
Description
I want to use all available data from BHPS and US, rather than restricting my analysis to those who took part in every wave (which would lower statistical power). This produces an unbalanced design with no suitable weight. Are there any strategies I can use to address the complex survey design problem, such as controlling for variables that were used to create weights? The model I will be using is xtlogit in Stata, which does not allow svy options.
Could you clarify what kind of analysis you intend to undertake? Are you thinking of some kind of longitudinal model in which you can include, and deal with appropriately, censored data of all kinds? Or something more like a cross-sectional analysis or pooled CS?
Peter Lynn wrote:
Could you clarify what kind of analysis you intend to undertake? Are you thinking of some kind of longitudinal model in which you can include, and deal with appropriately, censored data of all kinds? Or something more like a cross-sectional analysis or pooled CS?
It is a random intercept logistic regression model, e.g.
meqrlogit y age i.x c.age#i.x || pid:
This deals with nested observations within individuals and uses all available data from all waves. But it doesn't address the complex survey design.
So, it looks as though you are effectively pooling across waves. Your model structure should deal with the clustering of observations within persons, as you say. But I don't think it deals with higher levels of the hierarchy of clustering (i.e. that persons are clustered within households and households are clustered within PSUs). Neither does it deal with either stratified sampling or unequal inclusion probabilities.
The last of these (unequal inclusion probabilities) is what weights are designed to deal with. I suggest that for each observation you use the relevant cross-sectional weight. That should correct for design probabilities and non-response. But it does not deal with stratified sampling or with the remainder of the clustering.
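A minimal sketch of how a per-observation weight might be attached to the pooled random-intercept model. It assumes the data are in long format (one row per person per wave) and that a variable, here hypothetically named xw_weight, holds the cross-sectional weight for the wave each row belongs to. It uses melogit rather than meqrlogit, on the assumption that a Stata release in which melogit accepts sampling weights (Stata 14 or later, as far as I know) is available.

```stata
* Assumes long-format data: one row per person (pid) per wave.
* xw_weight is a hypothetical name for the wave-specific cross-sectional weight.
melogit y age i.x c.age#i.x [pweight = xw_weight] || pid:
```

This only addresses unequal inclusion probabilities and non-response; stratification and clustering above the person level are left as they were.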
- Category set to Weights
- Status changed from New to In Progress
- Target version set to X M
- % Done changed from 0 to 90
Peter Lynn wrote:
So, it looks as though you are effectively pooling across waves. Your model structure should deal with the clustering of observations within persons, as you say. But I don't think it deals with higher levels of the hierarchy of clustering (i.e. that persons are clustered within households and households are clustered within PSUs). Neither does it deal with either stratified sampling or unequal inclusion probabilities.
The last of these (unequal inclusion probabilities) is what weights are designed to deal with. I suggest that for each observation you use the relevant cross-sectional weight. That should correct for design probabilities and non-response. But it does not deal with stratified sampling or with the remainder of the clustering.
How then can I additionally account for strata and psu, in addition to the cross-sectional weight and allowing for clustering of individuals using || pid?
You could account for psu-level clustering by running multilevel (hierarchical) analysis, with psu as the higher level units. Or alternatively use some kind of replication method (e.g. BRR or Jackknife) to estimate standard errors. The latter would in theory allow you to take into account stratification too, though you would probably have to make some simplifying assumptions like merging strata. Or you could just (fairly safely) ignore the stratification, on the grounds that a) this is a conservative approach, and b) effects are anyway likely to be small (at least compared to clustering and weighting effects).
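The first option, treating PSUs as a higher level above persons, could be sketched roughly as follows. The assumptions are that psu identifies each person's sample PSU and does not change within pid, that xw_weight is a hypothetical variable holding the relevant cross-sectional weight for each person-wave row, and that the Stata release in use has a melogit that accepts sampling weights (Stata 14 or later, as far as I know). Stratification is ignored here.

```stata
* Three-level random-intercept logit: waves within persons (pid) within PSUs.
* Levels are listed from highest to lowest, so psu comes before pid.
* psu is assumed constant within pid; xw_weight is a hypothetical weight variable.
melogit y age i.x c.age#i.x [pweight = xw_weight] || psu: || pid:
```

This is a sketch under those assumptions rather than a recommended specification. How observation-level weights should be scaled in a multilevel model is a separate question that this thread does not settle.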
- Status changed from In Progress to Closed
- % Done changed from 90 to 100