Support #1273

Weights: considering the complex survey structure?

Added by Lydia Palumbo over 4 years ago. Updated over 1 year ago.




Dear Support,

I have a question on how to use weights in my analysis.
In the handouts on Moodle, I found that I should
account for the complex survey design and declare the data this way in Stata:

svyset psu [pw=weight], strata(strata) singleunit(centered)

Now, I am interested in clustering the standard errors by individual (pid)
or in running a random effects model to account for individual heterogeneity,
but Stata does not allow me to svyset the data in this way
and then run this kind of model (which seems reasonable).

Do you think it is possible to avoid accounting for the complex structure of the
survey? If not, what would I actually lose?

Thank you.


Updated by Stephanie Auty over 4 years ago

  • Category set to Weights
  • Assignee set to Olena Kaminska
  • Private changed from Yes to No

Updated by Olena Kaminska over 4 years ago


Indeed, some statistical analyses do not work with the svy command. Multilevel models are one such case, and a random effects model is one of these. Two options are available to you. First, check with an expert on multilevel modelling whether you need specialist software to run a model that takes the full sample design into account.

Second, use Stata and a random effects model with weights; you should be able to use weights with this. Also, use person ID and PSU as two nested clusters (this may differ depending on your data setup). This way you will be ignoring only stratification, which does not influence your point estimates and makes your confidence intervals slightly wider. Talk to an expert, though, about the potential effect on within and between variance estimates if you are interested in them.
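
A minimal sketch of this second option, assuming hypothetical variable names y (outcome) and x1, x2 (covariates) alongside the psu, pid and weight variables from the question. It uses Stata's mixed command with individuals nested within PSUs. Placing the survey weight at the observation level, as done here, is an assumption; whether it should instead be split across levels is a point to settle with a multilevel expert.

```stata
* Sketch only: y, x1 and x2 are placeholder names, not variables from the thread.
* Random intercepts for PSUs and for individuals (pid) nested within PSUs.
* [pweight=] supplies observation-level sampling weights; with pweights,
* mixed reports robust standard errors clustered at the top level (psu).
* Stratification is not used anywhere in this model.
mixed y x1 x2 [pweight=weight] || psu: || pid:
```

The nesting order matters: the highest level (psu) comes first, and this only makes sense if each individual stays within a single PSU in the data.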

Hope this helps,


Updated by Lydia Palumbo over 4 years ago

Thank you. It does.

I think that in this way it is possible to use svyset, because
I can create a variable that combines PSU and ID within a stratum.
So the command should be:

svyset psupid [pw=weight], strata(strata) ...
I will check with an expert in multilevel modelling whether this makes sense.

Thank you again.


Updated by Olena Kaminska over 4 years ago


No, you shouldn't combine PSU and ID, as the results will be wrong. If you have to choose, you should use the higher-level clustering: PSU. You could also explore the option of using ID as a secondary sampling unit (SSU) under PSU, as in this example:
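
A minimal sketch of declaring individuals as a second-stage unit under the PSU, reusing the variable names from the svyset line earlier in this thread. This is an illustration written for this point in the discussion, not the example originally linked; y and x1 are hypothetical names.

```stata
* Sketch only: psu, weight, strata and pid follow the names used in the thread.
* Stage 1: PSUs within strata; stage 2: individuals (pid) within PSUs.
svyset psu [pweight=weight], strata(strata) singleunit(centered) || pid

* Inspect the declared design, then fit a model with the svy prefix
* (y and x1 are placeholder names).
svydescribe
svy: regress y x1
```

Without a finite population correction at the first stage, Stata treats PSUs as sampled with replacement and bases the linearized variance on the PSU level, so the second stage here mostly documents the design rather than changing the standard errors.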



Updated by Lydia Palumbo over 4 years ago

Thank you. Yes, I realized that the combination was nonsense. Thank you for the link.
I was not sure how to set the individual level in the design. This should help.


Updated by Stephanie Auty over 4 years ago

  • Status changed from New to Feedback
  • Assignee changed from Olena Kaminska to Lydia Palumbo
  • % Done changed from 0 to 80

Updated by Lydia Palumbo over 4 years ago


I do have another question about multistage sampling.

My sample includes only those who formed their union during the panel.
Should I apply a correction factor for this? I am using longitudinal weights.

Thank you and best,


Updated by Olena Kaminska over 4 years ago


Yes, you can take couples into account. If you want you can specify three levels of clustering: PSUs, couples and individuals. Then I assume you will have observations within individuals.
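
A minimal sketch of the three-level setup, assuming a hypothetical partner identifier ppid and placeholder names y, x1 and x2. It builds one identifier per couple and then nests individuals within couples within PSUs. It further assumes that each individual belongs to only one couple in the analysis file; someone with several partners over time would break the nesting.

```stata
* Sketch only: ppid (partner's person ID), y, x1 and x2 are placeholder names.
* Build one identifier per couple that is the same for both partners.
gen double id_lo = min(pid, ppid) if !missing(pid, ppid)
gen double id_hi = max(pid, ppid) if !missing(pid, ppid)
egen long coupleid = group(id_lo id_hi)

* Three nested levels: PSU > couple > individual,
* with repeated observations within individuals.
* Observations with a missing coupleid are dropped from the estimation.
mixed y x1 x2 [pweight=weight] || psu: || coupleid: || pid:
```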

Hope this helps,


Updated by Lydia Palumbo about 4 years ago


I am using longitudinal weights to analyse the event between t and t+1.

Now I am questioning whether I am using the correct weights. I noticed that once the
boost for Scotland/NI is added, those who attrited before wave 10/12 and
had a weight of 0 are then given a positive weight.
Which weight should I use for these units? Should I consider them as part of the sample
(as if they were truncated for some time)?
I would say yes, because otherwise the sample would not
be representative of GB, but I am not very sure.

I would appreciate your input.

Thank you and best,


Updated by Stephanie Auty about 4 years ago

  • Assignee changed from Lydia Palumbo to Olena Kaminska

Updated by Olena Kaminska about 4 years ago


Thank you for your question. Could you clarify? Are you creating your own weights or are you using ours? If you are using ours, they are correct and you don't need to worry about zeros etc. In our weights, a ui weight may be positive while the ub or us weights are zero. This is due to how they are calculated, and it is correct. More importantly, in a pooled analysis, using us, ub and ui weights together over time will give you correct results.

If this doesn't answer your question, could you provide more details on which weights you are using in your analysis?


Updated by Lydia Palumbo about 4 years ago

Hi Olena,

Sorry. I was not clear. I will rephrase the issue.
I am doing a pooled cross-sectional analysis (with the events in t+1)
using all the waves from BHPS and UKHLS, including the Scottish and Northern Irish
samples, IEMB and EMB. I am using longitudinal weights, as you said.


Waves 1 to 9: b`w'_lrwght
Waves 10 to 12: b`w'_lrwtsw1
Waves 13 to 18: b`w'_lrwtuk1

Wave 1: `w'_indinus_xw
Waves 2 to 6: `w'_indinub_lw
Waves 7 onwards: `w'_indinui_lw

Are they correct?

I noticed that individuals who had b`w'_lrwght = 0
between waves 1 and 9 (because they missed one wave) are given
a positive weight from wave 10 (by using b`w'_lrwtsw1) or from wave 13
(by using b`w'_lrwtuk1). They would have a weight of 0 if I used b`w'_lrwght
for all the waves (I do not do that because I also want to include the boosts).

So my question is whether I have to include those who had a weight of 0
between wave 1 and 9 and then a positive one from wave 10 or 13 on.

Hope this is clear. Please tell me if I could be more explicit.
Thank you and best regards,


Updated by Lydia Palumbo about 4 years ago

I forgot one part. If possible, I would like to do
a robustness check by using cross-sectional (or design) weights
at t and applying my own correction for individual and
partner non-response, as we discussed on the phone.

Could you advise me on how to do this?
Thank you again.



Updated by Olena Kaminska about 4 years ago


My suggestion would be to use the longitudinal enumeration weight at time t and, conditional on it being positive, predict personal response to wave t and wave t+1 at the same time. This will be valid for OSMs only.
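
A minimal sketch of this suggestion, with every variable name hypothetical. It models response at both t and t+1 among OSMs with a positive enumeration weight, then divides the enumeration weight by the predicted response probability. The inverse-probability step and the choice of predictors are assumptions made for illustration, not part of the advice above.

```stata
* Sketch only: all variable names below are placeholders.
*   enumwt    longitudinal enumeration weight at wave t
*   resp_tt1  1 if the person responded at both t and t+1, 0 otherwise
*   osm       1 for original sample members
*   age, sex  predictors of response, chosen only for illustration
logit resp_tt1 c.age i.sex if enumwt > 0 & !missing(enumwt) & osm == 1
predict double p_resp if e(sample), pr

* Assumption: inverse-probability adjustment of the base weight, for respondents only.
gen double wt_tailored = enumwt / p_resp if resp_tt1 == 1 & e(sample)
```

Whether the response model itself should be weighted, and which predictors to include, are left open here.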

Hope this helps,


Updated by Olena Kaminska about 4 years ago

And to respond to your earlier question: the weights that you suggested are correct. It is also correct that some zero weights become non-zero at waves 10 and 13; this is indeed related to the new modelling for the boosts. I suggest that you include everyone in the model and rely on the weights to exclude people (people are excluded if their weight is zero). If you want to change anything, such as keeping people with zero weights in your model, you have to create your own tailored weights. Otherwise, I always suggest that your choice to include or exclude people should be purely substantive (categories of social groups etc.) and never related to their response pattern or sample; as long as you use weights, your results are representative.
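
A minimal sketch of combining the wave-specific weights listed earlier into one analysis weight. It assumes a long-format person-wave file in which the wave prefixes have been stripped from the weight names, plus two hypothetical variables: wave (wave number within each study) and bhps (1 for BHPS waves, 0 for UKHLS waves).

```stata
* Sketch only: wave and bhps are hypothetical helper variables;
* weight names are those from the thread with the wave prefix removed.
gen double wt = .

* BHPS waves
replace wt = lrwght     if bhps == 1 & inrange(wave, 1, 9)
replace wt = lrwtsw1    if bhps == 1 & inrange(wave, 10, 12)
replace wt = lrwtuk1    if bhps == 1 & inrange(wave, 13, 18)

* UKHLS waves
replace wt = indinus_xw if bhps == 0 & wave == 1
replace wt = indinub_lw if bhps == 0 & inrange(wave, 2, 6)
replace wt = indinui_lw if bhps == 0 & wave >= 7 & !missing(wave)

* Check how many person-wave rows carry a zero or missing weight.
count if wt == 0
count if missing(wt)
```

No rows are dropped by hand here, in line with the advice to keep everyone in the file and let the weights determine who contributes.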


Updated by Understanding Society User Support Team over 1 year ago

  • Status changed from Feedback to Resolved
  • Assignee deleted (Olena Kaminska)
  • % Done changed from 80 to 100
