Support #1316

Birth year of all first-born children and vocational qualifications

Added by Maria Petrillo about 4 years ago. Updated over 2 years ago.




I am using both the Harmonized BHPS and Understanding Society for a study of the motherhood wage gap. I need the exact year of birth of each mother's first child, with as few missing values as possible. I tried the code that Alita provided in post #947, but for some reason it does not work for the BHPS, even when I change the variable names (for example, mnpid to mnpid_bh). Do you have any suggestions on how I can proceed? Also, have I understood the variables correctly? Does pidp (indall) provide the child identifier? Does birthy (indall) provide the birth year of the child? And does mnpid provide the birth year of the mother?

Moreover, I saw that the BHPS provides a variable, qfvoc, related to the respondent's vocational background. I was wondering:
1) Does it refer to the highest qualification achieved? Or does it just indicate that one of the qualifications the individual has achieved is vocational, without referring to the highest qualification?
2) Is there a similar variable in Understanding Society? If not, is there any way I could derive this information? I noticed that not even the isced variable is included in Understanding Society...

Thank you so much

Looking forward to hearing from you,



Updated by Stephanie Auty about 4 years ago

  • Status changed from New to Feedback
  • Assignee set to Maria Petrillo
  • % Done changed from 0 to 60

Dear Maria,

It is not necessary to change mnpid to mnpid_bh, as all individuals from the BHPS now have a pidp in the harmonised data. What you will need to change in Alita's code is that every time it says `w', you will need to write b`w'. Also, as the files are now downloaded into separate folders for each wave, you will either need to edit the file path to take account of this (using strpos) or put all the files into one folder.
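One possible reading of the strpos suggestion is shown below as a minimal sketch. It maps each BHPS wave letter to its wave number with strpos() and then builds the path to that wave's folder. The root folder and the per-wave folder names (bhps_w1, bhps_w2, ...) are assumptions about how the download is laid out, so adjust them to match your own copy.

```stata
* Hypothetical root folder of the download: change to your own path
local root "C:/data/usoc"
local letters "abcdefghijklmnopqr"

foreach w in a b c d e f g h i j k l m n o p q r {
    * Position of the wave letter in the string = BHPS wave number (a=1, ..., r=18)
    local n = strpos("`letters'", "`w'")
    * Assumed layout: one folder per wave, e.g. bhps_w1/ba_indall.dta
    local file "`root'/bhps_w`n'/b`w'_indall.dta"
    display "`file'"
    * In real use, load the file here, for example:
    * use pidp pid b`w'_birthy b`w'_mnpid_bh using "`file'", clear
}
```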

hiqual_dv is the best qualification variable to use and is available for all waves.

We have collected the data for isced in Wave 9, but it has not yet been derived. You would be able to derive it yourself, but not all of the information needed is available in the waves before Wave 9.

Best wishes,


Updated by Stephanie Auty about 4 years ago

  • Private changed from Yes to No

Updated by Maria Petrillo about 4 years ago

Dear Stephanie,
Thank you for your quick reply. However, I have already tried changing Alina's code by adding b before `w' (and I have already put all the files into one folder), but when I run the do-file I have a problem with the variable "bm_birthy". In particular, the commands generate a variable "bm_birthy" with all missing values, so I end up with no information on the mother's year of birth.

As for hiqual_dv, I don't understand how to derive the vocational background from this variable. If I am not wrong, hiqual_dv is a categorical variable with the following categories:
- Other higher
- A level etc
- GCSE etc
- Other qual
- No qual

This does not tell me much about the vocational background. Ideally, I would like to derive a variable like qfvoc or isced in the BHPS.

Any suggestions will be very much appreciated.

Thank you,


Updated by Stephanie Auty almost 4 years ago

Dear Maria,

We have w_qfvoc1-16 in Understanding Society, which could be used to derive a variable similar to bw_qfvoc, but please check that the question wording, universe and available responses are close enough for your purposes.
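Below is a minimal sketch of such a derivation. It assumes that each w_qfvoc item is coded 1 when the qualification is mentioned and 0 otherwise, with negative values used as missing-value codes. The toy data are hypothetical, and three items stand in for the sixteen real ones. Check the codebook before relying on this coding.

```stata
clear
* Toy data (hypothetical): three items stand in for qfvoc1-qfvoc16
input long pidp qfvoc1 qfvoc2 qfvoc3
1  1  0  0
2  0  0  0
3 -9 -9 -9
4  0  1  1
end

* Treat negative survey codes (-9 to -1) as missing
mvdecode qfvoc1-qfvoc3, mv(-9/-1)

* 1 if any item equals 1, 0 otherwise
egen byte anyvoc = anymatch(qfvoc1-qfvoc3), values(1)

* anymatch() returns 0 when every item is missing; set those cases to missing instead
egen byte nvalid = rownonmiss(qfvoc1-qfvoc3)
replace anyvoc = . if nvalid == 0
drop nvalid
list pidp anyvoc
```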

I'm not sure what could cause the problem you have with bw_birthy. Does it only happen in Wave 13?

Best wishes,


Updated by Maria Petrillo almost 4 years ago

Dear Stephanie,
Thank you for your reply.
I tried to generate a variable from w_qfvoc1-16 and it worked: I created a dummy variable equal to 1 if the individual has a vocational qualification and 0 otherwise. Do you have any suggestions on which variable I should compare this new variable with, in order to generate another variable equal to 1 if the HIGHEST qualification achieved by the individual is vocational rather than academic?

As for the year of birth of the children, I don't think I can use the variable mnpid for the Harmonized BHPS, since only mnpid_bh is available in indall.dta. However, while mnpid in Understanding Society is the natural mother's pidp, mnpid_bh (Harmonized BHPS) is the natural mother's pid. The latter should be the BHPS personal identifier, while pidp is the cross-wave identifier. I think this is why Alina's code does not work with the harmonized version (my problem with bw_birthy occurred in all waves).
So instead of using pidp, w_birthy and mnpid as for Understanding Society, for the Harmonized BHPS I should probably use pid, bw_birthy and bw_mnpid_bh, as follows (I also kept the pidp variable so that I can match the new file with indresp.dta):

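foreach w in a b c d e f g h i j k l m n o p q r {
    // Load this wave's household-member file (one row per pid)
    use pidp pid b`w'_birthy b`w'_mnpid_bh using b`w'_indall, clear
    save temp, replace

    // Children: keep those with a valid natural-mother pid
    keep pid b`w'_mnpid_bh
    keep if b`w'_mnpid_bh > 0 & b`w'_mnpid_bh < .
    rename pid kpid
    rename b`w'_mnpid_bh pid

    // Add the mother's birth year and pidp
    merge m:1 pid using temp, keepusing(b`w'_birthy pidp)
    drop if _merge == 2
    drop _merge
    rename b`w'_birthy b`w'_mbirthy
    rename pidp mpidp
    rename pid b`w'_mnpid_bh
    rename kpid pid

    // Add the child's birth year and pidp
    merge 1:1 pid using temp, keepusing(b`w'_birthy pidp)
    drop if _merge == 2
    drop _merge
    rename b`w'_birthy b`w'_kbirthy
    rename pidp kpidp

    // Final layout: pid/pidp = mother, kpid/kpidp = child
    rename pid kpid
    rename mpidp pidp
    rename b`w'_mnpid_bh pid
    save b`w'_mother_child, replace
}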
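Once the per-wave mother-child files exist, the remaining step can be sketched as follows. Stack the files, after renaming the wave-specific b`w'_kbirthy to a common name such as kbirthy. Then, for each mother, take the earliest birth year among her linked children. The toy data below are hypothetical. This approach presumably only sees children who were linked to their mother through mnpid_bh in some wave, so a first child who had already left the household before the mother entered the panel would be missed.

```stata
clear
* Toy stacked file (hypothetical): one row per mother-child pair per wave
* pid = mother's pid, kpid = child's pid, kbirthy = child's birth year
input long pid long kpid kbirthy
101 201 1985
101 201 1985
101 202 1988
102 203 1990
102 204 -9
103 205 -9
end

* Negative values are survey missing-value codes
replace kbirthy = . if kbirthy < 0

* Earliest observed child birth year per mother (collapse ignores missing values)
collapse (min) firstbirthy = kbirthy, by(pid)
list
```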

Any comments are more than appreciated.

Thank you very much



Updated by Stephanie Auty almost 4 years ago

  • % Done changed from 60 to 70

Dear Maria,

You may want to compare your new variable, or w_qfvoc1-16, with either w_hiqual_dv or w_qfhigh_dv, but you will need to investigate which vocational qualifications rank higher than the academic qualifications.
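As a first step, a cross-tabulation shows how the vocational dummy is distributed across the highest-qualification categories. The flag that follows is only a placeholder. The hiqual_dv codes in the comments are assumed, and treating code 5 ("Other qual") as "highest is vocational" is a hypothetical rule. Replace it with whatever ranking your check of the questionnaire supports.

```stata
clear
* Toy data (hypothetical). Assumed hiqual_dv codes: 1 Degree, 2 Other higher,
* 3 A level etc, 4 GCSE etc, 5 Other qual, 9 No qual
input long pidp hiqual_dv anyvoc
1 1 1
2 5 1
3 3 0
4 5 0
5 4 .
end

* How does vocational status line up with the highest-qualification categories?
tabulate hiqual_dv anyvoc, missing

* Placeholder rule: holds a vocational qualification AND highest category is code 5
generate byte hivoc = (anyvoc == 1 & hiqual_dv == 5) if !missing(anyvoc, hiqual_dv)
list pidp hiqual_dv anyvoc hivoc
```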

I see now that the family identifiers have not yet been harmonised, so yes, using pid and bw_mnpid_bh is one solution. You could also use xwavedat (or indall within a wave) to merge in the mother's pidp using the mother's pid, if you want to work with pidp.
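Below is a minimal sketch of that merge. It assumes the cross-wave file holds one row per person with both pid and pidp. The toy datasets are hypothetical stand-ins for xwavedat and for a mother-child file in which pid is the mother's pid.

```stata
clear
* Stand-in for xwavedat (hypothetical values): one row per person
input long pid long pidp
101 5001
102 5002
end
tempfile xwave
save `xwave'

clear
* Stand-in for a mother-child file: pid = mother's pid, kpid = child's pid
input long pid long kpid
101 201
101 202
102 203
104 204
end

* Bring in the mother's pidp; keep children even when no match is found
merge m:1 pid using `xwave', keep(master match) nogenerate
rename pidp mpidp
list
```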

Are you now getting results that make sense for the birth year?

Best wishes,


Updated by Understanding Society User Support Team over 2 years ago

  • Status changed from Feedback to Resolved
  • Assignee deleted (Maria Petrillo)
  • % Done changed from 70 to 100
