Support #1903
Weighting guidance for intergenerational analysis
Description
Hi, I am doing research into the role of education in intergenerational income mobility, and I had some questions on tailoring my own weights.
My sample is restricted to individuals born between 1975 and 1995 who have at least one income observation at age 25+ and at least one observed parental household income while they were aged 0-16. I keep only one observation per household (the youngest eligible sibling), drawn from the original and booster BHPS samples. Children are matched to their parents through the alterego file. My sample spans all 30 waves of the BHPS & USoc.
My main variables of interest are parental household income, offspring individual income, and the highest educational attainment of the offspring. To measure income, I take an average of the offspring’s income observations at ages 25+ and an average of the parental income observations from when the offspring was aged 0-16, making it a pooled cross-section.
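As an illustration of the averaging step described above, here is a minimal Python sketch. The record layout, the person identifiers and the function name `average_income` are hypothetical and the income values are made up; real data would come from the BHPS/USoc files.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical long-format records: (person_id, age_at_interview, income).
offspring_obs = [
    ("A", 24, 1500.0),  # below 25, so excluded from the average
    ("A", 26, 1800.0),
    ("A", 30, 2200.0),
    ("B", 27, 1000.0),
]

def average_income(records, min_age, max_age=None):
    """Return {person_id: mean income} over observations inside the age window."""
    by_person = defaultdict(list)
    for pid, age, income in records:
        if age >= min_age and (max_age is None or age <= max_age):
            by_person[pid].append(income)
    return {pid: mean(values) for pid, values in by_person.items()}

# Offspring income: average over all observations at age 25 or older.
offspring_income = average_income(offspring_obs, min_age=25)
```

The same function could be applied to parental household income by passing the child's age at each parental observation with `min_age=0, max_age=16`.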
As my sample is already quite small based on these sample restrictions (N=1,392), I want to retain as many observations as possible. As my analysis is not longitudinal, the longitudinal weight provided is sub-optimal for me because I want to include respondents even if they haven’t responded in the most recent wave or have missed waves. I am not interested in change over time.
I have been reading the FAQs and other questions on the forum to help with tailoring my own weights, but I still had a few questions at the different stages of tailoring my own weights. I followed the advice in #658, the online course and the weighting FAQ.
First, to create my base weight, I took the cross-sectional weight for everyone born between 1975 and 1995 with at least one income observation at age 25+, as this is the sub-population.
- 1) Is it ok that the cross-sectional weights I take are USoc cross-sectional weights, despite me only including offspring from the BHPS samples?
- 2) Why do some people have a cross-sectional weight of zero in the BHPS and then a non-zero weight in USoc? If I’m trying to maximise my sample, can I take the first weight they have observed past the age of 25 that is non-zero?
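To make the rule asked about in question 2 concrete, the sketch below picks, for each person, the first non-zero cross-sectional weight observed at age 25 or older. It only illustrates the selection rule and is not a recommendation that the rule is valid; the record layout and the name `first_nonzero_weight` are hypothetical and the weights are made up.

```python
# Hypothetical per-wave records: (person_id, wave, age, cross_sectional_weight).
records = [
    ("A", 18, 24, 1.2),  # under 25, not eligible
    ("A", 19, 25, 0.0),  # eligible age but zero weight, skipped
    ("A", 20, 26, 0.8),  # first non-zero weight at 25+
    ("A", 21, 27, 1.1),
    ("B", 19, 30, 0.0),  # never has a non-zero weight at 25+
]

def first_nonzero_weight(rows, min_age=25):
    """Return {person_id: (wave, weight)} for the earliest eligible non-zero weight."""
    base = {}
    # Sort by person, then wave, so the earliest qualifying wave is seen first.
    for pid, wave, age, weight in sorted(rows, key=lambda r: (r[0], r[1])):
        if pid not in base and age >= min_age and weight > 0:
            base[pid] = (wave, weight)
    return base

base_weights = first_nonzero_weight(records)
```

People who never have a non-zero weight at 25+ (person "B" here) drop out of the result, so the choice of rule directly affects the retained sample size.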
However, my colleagues suggested that I should instead base the weighting on parental characteristics. That is to say, the base sample should be all parents who responded in the original and booster samples, and the nonresponse model should then capture the substantial attrition between those parents and the offspring who have an income observation at age 25+. My problem with this is that the base weight wouldn’t account for potential parents who went on to have children but left the survey beforehand. My questions are:
- 3) Is it ok to use people who answer the survey 25+ as the base weight, or would it be better to use people who had children from the original and booster samples as my base weight? What would the difference between these two be?
Finally, after rescaling the weights and taking mortality into account, I adjust for non-response. I fit a logistic regression model and tested the following predictors: sex, region, race, migrant status, quintile of income and education, and the wave in which they first responded over the age of 25. The income, education and region variables are taken from the wave in which they are first observed over 25. When I exclude the wave of first response over 25, all other predictors are highly significant; when I include it, they are all less significant and sex is no longer significant.
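The non-response step above can be sketched as a response propensity model followed by an inverse-probability adjustment. This is a minimal illustration under stated assumptions, not the actual model: it uses only two made-up binary predictors (`female`, `high_income`), placeholder base weights of 1.0, an unweighted fit, and a hand-written gradient-ascent fitter instead of a statistics package. All names and data are hypothetical.

```python
import math

# Hypothetical toy data: predictors are (female, high_income); the second
# element is 1 if the person has an income observation at age 25+, else 0.
rows = (
    [((0, 0), 1)] * 2 + [((0, 0), 0)] * 2 +
    [((0, 1), 1)] * 3 + [((0, 1), 0)] * 1 +
    [((1, 0), 1)] * 3 + [((1, 0), 0)] * 1 +
    [((1, 1), 1)] * 4 + [((1, 1), 0)] * 1
)
X = [list(x) for x, _ in rows]
responded = [r for _, r in rows]
base_weight = [1.0] * len(rows)  # placeholder base weights (assumption)

def predict(beta, xi):
    """Predicted response probability for one row of predictors."""
    z = beta[0] + sum(b * x for b, x in zip(beta[1:], xi))
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, steps=5000, lr=0.1):
    """Fit an unweighted logistic regression by batch gradient ascent.

    Returns [intercept, coef_1, ..., coef_k].
    """
    n, k = len(X), len(X[0])
    beta = [0.0] * (k + 1)
    for _ in range(steps):
        grad = [0.0] * (k + 1)
        for xi, yi in zip(X, y):
            err = yi - predict(beta, xi)
            grad[0] += err
            for j in range(k):
                grad[j + 1] += err * xi[j]
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta

beta = fit_logistic(X, responded)
propensities = [predict(beta, xi) for xi in X]

# Respondents get base weight / propensity; non-respondents get zero.
adjusted = [
    w / p if r == 1 else 0.0
    for w, p, r in zip(base_weight, propensities, responded)
]
```

Adding a predictor such as the wave of first response at 25+ would mean adding one more column to `X`; if it is strongly correlated with the other predictors, their coefficients can become less significant, as described above.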