Support #1372

Question about age of respondents based on age_dv, and racel_dv (Re-created as original was deleted)

Added by Alita Nandi 7 months ago. Updated 5 months ago.

Data inconsistency



Hello Alita,

I am wondering why the age_dv variable is not consistent with birthy when the latter is read alongside the interview start year (istrtdaty). I looked at a random selection of cases where age_dv is missing and compared them with other records that have the same pidp, i.e. the same individual over time. There are instances where age_dv does not equal istrtdaty - birthy. I checked the variables for birth month and birth date but these are missing. Why are there discrepancies, such as an individual who is interviewed in consecutive waves but whose interview year is unchanged, so that his or her age stays the same? Or cases where an individual refuses to give a birth year in one wave and the information is filled in at later waves, yet the first interview still shows a missing value for birthy and age_dv? Kindly advise whether it makes sense to base age_dv on birthy and istrtdaty. Thanks so much.

I am also wondering whether race_bh could be combined with racel_dv for BHPS waves 1-12, as racel_bh covers BHPS waves 13-18. Does it make sense to assume that this is a time-invariant characteristic for individuals? I noticed that this is only asked once, for new entrants. Can it then be carried over as non-missing information for succeeding waves, or is it simply counted once for each respondent? Note that I combined racel_bh with racel_dv when generating a separate new variable for use in Stata, so that the original ethnicity variables are preserved.


Updated by Alita Nandi 7 months ago

  • Private changed from Yes to No

Updated by Alita Nandi 7 months ago

  • Status changed from New to In Progress
  • % Done changed from 0 to 30

Hello Abigail,

I am passing on your query about age_dv to the data team and will get back to you with their response.

racel_dv in the file xwavedat combines responses to race and racel reported in BHPS and racel* reported in UKHLS. These questions are asked only once, the first time someone is interviewed, and that information is used to create racel_dv.
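Because racel_dv is held once per person in xwavedat, it can be attached to each wave's records by matching on pidp. A minimal sketch of that idea, in Python rather than Stata: the function name, the example pidp values and the codes are all hypothetical, and people not found in the cross-wave file are simply left as None.

```python
def attach_time_invariant(wave_records, xwave_values, name):
    """Return copies of wave_records with xwave_values[pidp] stored under `name`.

    wave_records: list of dicts, each with a "pidp" key (one dict per person-wave).
    xwave_values: dict mapping pidp -> time-invariant value (one entry per person).
    """
    return [{**rec, name: xwave_values.get(rec["pidp"])} for rec in wave_records]


# Hypothetical cross-wave values: pidp -> racel_dv code
xwavedat = {101: 1, 102: 9}

# Hypothetical person-wave records
waves = [
    {"pidp": 101, "wave": 1},
    {"pidp": 101, "wave": 2},
    {"pidp": 102, "wave": 2},
    {"pidp": 103, "wave": 2},  # not present in xwavedat, so stays None
]

merged = attach_time_invariant(waves, xwavedat, "racel_dv")
for rec in merged:
    print(rec)
```

The same value is repeated on every wave for a given person, which is what treating the characteristic as time-invariant amounts to.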

Best wishes,
On behalf of Understanding Society User Support Team


Updated by Alita Nandi 7 months ago

Hi Abigail,

Have you read the variable note and the input variables mentioned on the variable page?

Best wishes,


Updated by Abigail Dumalus 7 months ago

Hello Alita,

I have read this variable note. But one of the input variables, dob_dv, is based on birthy. That's why I am asking whether age_dv = istrtdaty - birthy. In most cases dob_dv is missing, whilst birthy has non-missing entries. I would assume that birthy is the source variable for dob_dv, a derived variable. Another input variable for age_dv is intdat_dv. Like dob_dv, intdat_dv has many missing cases, but istrtdaty has non-missing information to which I can refer. So, if I refer to istrtdaty, birthy, and age_dv, there are many instances where the difference between the first two does not match the value given for age_dv. I tried looking at the variables dobm_dv and dob_dv, but most entries are missing or inapplicable. What should I do about discrepancies in age_dv when individuals are interviewed over many waves and the way their ages "evolve" over time does not make logical sense?


Updated by Alita Nandi 7 months ago

  • % Done changed from 30 to 60

For the 9 waves of UKHLS data:

The variable w_birthy was checked for consistency across the waves. After resolving the inconsistencies, the variable doby_dv was created.

The interview date, w_istrtdat?, is only available for adult interviews. For children and adult non-respondents this information was imputed from their household interview dates and recorded in w_intdaty_dv. See the variable note for this variable for details on how it was created.

Then, using dob?_dv and intdat?_dv, the variable w_age_dv was created. Here ? stands for y, m or d (year, month and day). The year of birth variable is available in all versions of the data, month of birth is available in the SL and Secure Access versions, and day of birth is only available in the Secure Access version. See
The imputation flag for this variable is w_age_if.
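To illustrate why an age built from full dates can differ from interview year minus birth year, here is a minimal sketch in Python. It assumes age means completed years at the interview date; the function name and the example dates are hypothetical, and the sketch does not reproduce how the data team handles missing month or day of birth.

```python
from datetime import date


def completed_years(dob: date, interview: date) -> int:
    """Age in completed years at the interview date."""
    # True once the birthday has been reached in the interview year.
    had_birthday = (interview.month, interview.day) >= (dob.month, dob.day)
    return interview.year - dob.year - (0 if had_birthday else 1)


# Hypothetical respondent born in November, interviewed three times.
dob = date(1980, 11, 15)
interviews = [date(2010, 3, 1), date(2010, 12, 1), date(2011, 2, 1)]

for d in interviews:
    year_difference = d.year - dob.year
    print(d.isoformat(), year_difference, completed_years(dob, d))
```

In this example the first interview gives a year difference of 30 but a completed age of 29, and the last two interviews fall in different calendar years yet both give age 30. Differences of one year between age and the simple year difference are therefore expected whenever month and day of birth enter the calculation.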

For the 18 BHPS waves, the variable bw_age_dv was computed using the interview date variables (bw_intdat?) and date of birth variables (bw_birth?). But unlike for the UKHLS waves, bw_birth? were not checked for consistency across the waves. So, you will find a handful of cases where a respondent's age has decreased over the waves. If you want to discuss how to resolve these cases please email us at and we will forward your query to Gundi.
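One way to locate such cases is to order each person's records by wave and flag anyone whose age falls from one observed wave to the next. A minimal sketch in Python, assuming the data have been reshaped into (pidp, wave, age) tuples with None for missing ages; the function name and the example records are hypothetical.

```python
from collections import defaultdict


def pidps_with_decreasing_age(records):
    """Return sorted pidps whose age drops between consecutive observed waves.

    records: iterable of (pidp, wave, age) tuples; age may be None if missing.
    """
    by_person = defaultdict(list)
    for pidp, wave, age in records:
        if age is not None:  # skip waves with a missing age
            by_person[pidp].append((wave, age))

    flagged = []
    for pidp, observations in by_person.items():
        observations.sort()  # order by wave number
        ages = [age for _, age in observations]
        if any(later < earlier for earlier, later in zip(ages, ages[1:])):
            flagged.append(pidp)
    return sorted(flagged)


# Hypothetical records: person 102's age falls between waves 2 and 3.
records = [
    (101, 1, 34), (101, 2, 35), (101, 3, 36),
    (102, 1, 50), (102, 2, 51), (102, 3, 49),
    (103, 1, 20), (103, 2, None), (103, 3, 22),
]
print(pidps_with_decreasing_age(records))
```

This only identifies the affected respondents. Deciding which wave's date of birth to trust is a separate judgement that the sketch does not attempt.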


Updated by Alita Nandi 5 months ago

  • % Done changed from 60 to 90

Updated by Alita Nandi 5 months ago

  • Status changed from In Progress to Feedback
