Non-overlapping data query (waves a to c)

Hello Understanding Society team,

I am working on a project in which we use (amongst other variables) parental education and parental occupation variables from waves a and b (specifically a_paedqf, a_maedqf, b_paedqf, b_maedqf, a_pasoc10_cc and a_masoc10_cc) to predict mental health outcomes at wave c (including c_sf12mcs_dv, c_scghq1_dv and c_big5n_dv). I have summed the education data across waves a and b to derive separate retrospective measures of mother's and father's education (i.e. ma_educ = a_maedqf + b_maedqf and pa_educ = a_paedqf + b_paedqf), and as a consequence have Ns of around 40,000 for each. The sample sizes for a_pasoc10_cc and a_masoc10_cc are smaller but still >25,000.
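
For concreteness, the derivation of ma_educ described above might look roughly like the following in pandas. This is only a minimal sketch on made-up rows, not the actual processing: the helper name combine_waves is hypothetical, and it assumes that negative values are missing-data codes and that a respondent with a valid value in only one wave should keep that value.

```python
import numpy as np
import pandas as pd


def combine_waves(df, col_a, col_b):
    """Sum two wave-specific columns row by row, skipping missing values.

    Assumption: negative values are missing-data codes, so they are
    converted to NaN before summing. Rows with no valid value in either
    wave stay missing (min_count=1) instead of becoming 0.
    """
    vals = df[[col_a, col_b]].astype(float)
    vals = vals.where(vals >= 0)  # negative codes -> NaN
    return vals.sum(axis=1, min_count=1)


# Made-up illustration data, not real survey values.
demo = pd.DataFrame({
    "pidp": [1, 2, 3, 4],
    "a_maedqf": [3, -9, np.nan, -8],
    "b_maedqf": [np.nan, 2, 4, -9],
})
demo["ma_educ"] = combine_waves(demo, "a_maedqf", "b_maedqf")
print(demo)
```

If some respondents had valid answers at both waves, a plain sum would add the two codes together, so in that case taking the first valid value would probably be the safer choice.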

After cleaning the wave c data, I have usable mental health data for around 40,500 cohort members, which is also very healthy.

My problem arises when I try to run the models: there appears to be a considerable mismatch (non-overlap) between the sample with parental education/occupation data (at waves a and b) and the sample with mental health data at wave c. For example, although I have usable data on mother's education for 40,541 respondents, only around 27,500 of these also appear to have usable wave c mental health data (similar mismatches occur for the other mental health variables). It isn't clear to me where the remaining 13,000 or so cases have gone.
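
One way the size of the overlap could be checked directly is sketched below in pandas, using made-up identifiers. It assumes both files carry a common cross-wave person identifier (taken here to be pidp) and that each file has already been restricted to cases with usable data. The function name overlap_table is hypothetical.

```python
import pandas as pd


def overlap_table(early, wave_c, key="pidp"):
    """Count people found only in the early-wave file, only in the
    wave c file, or in both, using an outer merge on the identifier."""
    merged = early[[key]].drop_duplicates().merge(
        wave_c[[key]].drop_duplicates(),
        on=key,
        how="outer",
        indicator=True,  # adds a _merge column: left_only / right_only / both
    )
    return merged["_merge"].value_counts()


# Made-up identifiers for illustration only.
early = pd.DataFrame({"pidp": [1, 2, 3, 4, 5]})   # usable parental education
wave_c = pd.DataFrame({"pidp": [3, 4, 5, 6]})     # usable wave c mental health

counts = overlap_table(early, wave_c)
print(counts)
```

Here "left_only" would correspond to people with parental education data but no usable wave c outcome, and "right_only" to people who have wave c outcomes but no parental education data from waves a and b.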

Is there an obvious way to account for this mismatch that I have overlooked, and does it match your records of the data? Are these missing data represented elsewhere? Was a large part of the sample who completed the main questionnaire at wave c distinct from those who completed waves a and b?

Many thanks for any help you can provide,
Emma Bridger

