Support #2255

open

Request for Feedback on Weighting Strategy for Longitudinal Event History Analysis

Added by Irene Frageri about 2 months ago. Updated 28 days ago.

Status: Feedback
Priority: Normal
Category: Weights
Start date: 05/20/2025
% Done: 50%


Description

Dear Understanding Society Support Team,

I am writing to seek your advice on the weighting strategy I am using for a longitudinal Event History analysis based on the Understanding Society data. I have consulted the available documentation and discussed with other researchers, but given the specific structure of my data and research design, I would appreciate your expert opinion.

My data setup:
I have constructed a long-format monthly panel dataset, where each respondent appears in multiple rows. I follow individuals from their entry into the sample until one of the following:

- they experience the event (first birth),

- they exit the reproductive age window, or

- they drop out of the panel (non-intermittent response only).

As a result, different respondents exit the analysis at different waves. I also observed that many weights are zero, which I assume is because those respondents are not original sample members (OSMs).

My weighting strategy:

I use the longitudinal individual weights (_indscus_lw) from each wave (b, c, ..., n).

For each wave, I compute the mean weight across all individuals with a non-missing value. I use this to rescale the weights:
prefix_longitudinalweight = prefix_indscus_lw / mean(prefix_indscus_lw).
This ensures that the rescaled weights have a mean of 1. Since my dataset is in monthly long format, each individual appears multiple times (once for each month they are observed), and their original weight is repeated across those rows. However, I think that because I compute a mean, this repetition does not affect the validity of the rescaling, as each individual’s weight contributes proportionally to the average.

I calculate the total rescaled weight for each wave by summing the rescaled weights:
prefix_totalweight = sum(prefix_longitudinalweight).
I then generate a constant variable per wave containing this total for all individuals.

I compute an average total weight across all waves: average_longitudinal.

I calculate a scaling factor for each wave:
prefix_scale = average_longitudinal / prefix_totalweight.

I apply the scaling factor to the rescaled weight:
prefix_weight_rescaled = prefix_scale * prefix_longitudinalweight.

Finally, for each respondent, I assign their weight based on the last wave in which they are observed (prior to their event or censoring).

For respondents who are only observed in wave 1, I use the weight from wave 2.
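
The rescaling-to-mean-1 step described above can be sketched as follows. This is only an illustration of the arithmetic on made-up data, not a recommendation on whether rescaling is appropriate: the rows, person ids and weight values are hypothetical. It shows that a mean taken over person-month rows matches the person-level mean only when every person contributes the same number of rows.

```python
# Hypothetical person-month rows for one wave: (person id, longitudinal weight).
# Person 1 is observed for three months, person 2 for one month,
# person 3 has a missing weight and is skipped.
rows = [
    (1, 1.0), (1, 1.0), (1, 1.0),
    (2, 3.0),
    (3, None),
]

def mean(values):
    values = list(values)
    return sum(values) / len(values)

valid_rows = [(pid, w) for pid, w in rows if w is not None]

# Mean over rows: each person counts once per month observed.
row_mean = mean(w for _, w in valid_rows)

# Mean over persons: each person counts exactly once.
person_weight = {pid: w for pid, w in valid_rows}
person_mean = mean(person_weight.values())

# Rescale so that the person-level mean is 1.
rescaled = {pid: w / person_mean for pid, w in person_weight.items()}

print(row_mean, person_mean, rescaled)
```

In this toy example the two means differ because person 1 contributes more rows than person 2, so dividing by the row-level mean would not give person-level weights with a mean of 1.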

I would be grateful if you could let me know whether this approach is methodologically sound, particularly in the context of a monthly, long-format Event History analysis with varying exit points across individuals.

Thank you in advance for your time and support. It is truly appreciated.

Best wishes,
Irene Frageri


Files

Worksheet ex6 R.pdf (462 KB), Understanding Society User Support Team, 06/10/2025 02:37 PM
Worksheet ex 6 Stata.pdf (400 KB), Understanding Society User Support Team, 06/10/2025 02:37 PM
Actions #1

Updated by Olena Kaminska about 2 months ago

Irene,

Thank you for your question. I can tell you that your current method is wrong as you can't use the last observed weight. Weights only represent a population taken from a particular point in time, and can't be taken from different points in time for different people (unless this is directly related to a particular event and not attrition).
To help you choose the best weight, I need to know what you want to represent. Are you representing events? Also, are you using survival analysis? (The weight for this may be different from that for other analyses.)

Hope this helps,
Olena

Actions #2

Updated by Irene Frageri about 2 months ago

Dear Olena,

thank you very much for your help.

I am conducting a discrete-time event history analysis. Basically, it is a logistic regression in which the dependent variable is the occurrence of an event, in my case the conception of a child. I have a long-format dataset in which conception=0 for every observation of the subject except the last one, where it is =1 if the subject has a child (and =0 if the person exits observation through censoring). Therefore, to answer your question, I am trying to represent a sample of individuals of reproductive age (for now, childless individuals, but later I may extend this to others as well). Regarding survival analysis: yes, I think the methodology I am using falls under the umbrella of survival analysis.
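
The person-period layout described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function name, the field names (`pid`, `month`, `conception`) and the example months are hypothetical and not taken from the actual dataset.

```python
def person_period_rows(person_id, entry_month, exit_month, conceived):
    """Return one row per month from entry_month to exit_month inclusive.

    The conception indicator is 0 in every row except the last, where it
    is 1 if the person conceived and 0 if they were censored.
    """
    rows = []
    for month in range(entry_month, exit_month + 1):
        is_last = month == exit_month
        rows.append({
            "pid": person_id,
            "month": month,
            "conception": 1 if (is_last and conceived) else 0,
        })
    return rows

# Person 1 conceives in month 3; person 2 is censored after month 2.
event_rows = person_period_rows(1, entry_month=0, exit_month=3, conceived=True)
censored_rows = person_period_rows(2, entry_month=0, exit_month=2, conceived=False)

for row in event_rows + censored_rows:
    print(row)
```

A logistic regression fitted to rows of this shape, with `conception` as the outcome, is the discrete-time hazard model referred to in the post.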

I would also like to add a point which I forgot to mention yesterday: until now I have also been using the psu and strata from the wave of the last available weight. I guess this is also not correct. Could you also advise me on that, please?

Thank you very much again, your help is truly appreciated.

Best regards,

Irene

Actions #3

Updated by Olena Kaminska about 2 months ago

  • Assignee changed from Olena Kaminska to Alita Nandi

Irene,

PSU and strata can be used from any point in time. They don't change, so you may as well take them from the last wave. As for the weight, it sounds like you are doing longitudinal analysis, so you will need either an issue weight or a longitudinal weight.
If your analysis deals with censoring (similar to survival) you may want to use it to correct for attrition instead of relying on weights for that part. Double check whether this is the case. If so, you can use an _lw or an issue (_li) weight from any (first wave) of your analysis, but from the same wave for everyone. No additional scaling needed.
There may be alternative ways of setting up your data, and therefore your weights, so I am forwarding this question to the appropriate team members, who can give you better advice on this.

Hope this helps,
Olena
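
One possible reading of the advice above (a weight taken from the same wave for everyone, with psu and strata taken from any wave) can be sketched as follows. This is an assumption-laden illustration, not the support team's method: the records and values are made up, the choice of wave b is arbitrary, and the weight name simply follows the prefix_indscus_lw pattern used earlier in the thread.

```python
# Hypothetical wide-format records, one per person.
persons = [
    {"pid": 1, "psu": 10, "strata": 100, "b_indscus_lw": 0.8},
    {"pid": 2, "psu": 11, "strata": 100, "b_indscus_lw": 1.3},
    {"pid": 3, "psu": 12, "strata": 101, "b_indscus_lw": 0.0},
]

WEIGHT_WAVE = "b"  # assumption: the single wave chosen for everyone
weight_var = f"{WEIGHT_WAVE}_indscus_lw"

def analysis_weights(persons, weight_var):
    """Map each person id to its weight, psu and strata from one fixed wave.

    Persons with a zero or missing weight are left out here, since they
    would contribute nothing to a weighted analysis.
    """
    selected = {}
    for p in persons:
        w = p.get(weight_var)
        if w is None or w <= 0:
            continue
        selected[p["pid"]] = {"weight": w, "psu": p["psu"], "strata": p["strata"]}
    return selected

weights = analysis_weights(persons, weight_var)
print(weights)
```

Each person's single weight would then be attached unchanged to all of that person's monthly rows, rather than varying with the wave in which they exit.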

Actions #4

Updated by Irene Frageri about 2 months ago

Dear Olena,

once again, thank you very much for your help and support. I think that yes, my analysis methodology should account for censoring, because the individual is no longer present after censoring (so I have a certain number of observation rows up to the point of censoring). Is this what you are saying? That, depending on whether my methodology accounts for censoring, I might not need to use the weights to account for censoring.

But I am a bit confused: when I choose the longitudinal weights, I have to identify both the entry time and the exit time of my observed sample. In this sense, I don't understand what you mean when you say "you can use an _lw or an issue (_li) weight from any (first wave) of your analysis, but from the same wave for everyone". Say my first wave of analysis is wave 1: what should I put for the last wave then? I believe that if I put wave 14, I won't have a value for most people, namely those individuals who attrit and therefore exit the panel before wave 14.

I hope I have explained myself well enough.
Thank you also for asking Alita for help. I look forward to reading your suggestions.
A big thanks to you both.

Irene

Updated by Understanding Society User Support Team 28 days ago

Dear Irene,

We thought you might find the weighting exercise from our Introduction to Understanding Society workshop useful, particularly section 6.9, Longitudinal weights (attached in R and Stata). Please let us know whether it helps to clarify the issue.

Best wishes,
UKHLS User Support team
