Support #509

Using household identifier from xwaveid.dta to merge data across waves

Added by Dharmi Kapadia over 8 years ago. Updated over 8 years ago.

Please could you let me know whether the following is a valid way to merge household data into individual files? I am asking because I am aware that the household identifier changes across waves if the household composition changes.

For my analysis, I want a measure of household income at each of Waves 1 to 5. To get this, I brought the household identifiers (w_hidp) for Waves 1 to 5 from xwaveid.dta into the e_indresp.dta file using an m:1 merge. I then matched the household income from a_hhresp.dta directly to e_indresp.dta with an m:1 merge on a_hidp, which I had previously brought in from the xwaveid.dta file. I repeated this for Waves 2 to 5. Is it correct to use the household identifier from the xwaveid.dta file in this way? I thought it was, because even if the individual changes household over time, matching income from the household file on the household id from a specific wave ensures that the observation is correct for that point in time.

The alternative method would be to merge household income into individual level files for each wave and then perform a 1:1 merge from individual files from Waves 1, 2, 3 and 4 into the Wave 5 file, matching on pidp. However, when I used this method, it resulted in 4,000 more cases of missing data for Wave 1.

Any advice that you can give on this issue would be much appreciated.

Many thanks,


Updated by Alita Nandi over 8 years ago

  • Assignee set to Dharmi Kapadia
  • % Done changed from 0 to 90

The reason for this difference is that in the second method you are selecting only adult respondents from each wave. For example, if someone was enumerated in a responding household in Wave 1, they would have a value for their household income. But if this person did not give us an adult individual interview, they would be missing from the a_indresp file. Now suppose this person was interviewed in Wave 5. With the second method you exclude this person at the first step, when you merge indresp with hhresp, so their Wave 1 household income is missing in the final file. However, if you used indall instead of indresp, the results should be identical to those from the first method.
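To make the difference concrete, here is a small sketch in plain Python rather than Stata. Everything in it is invented for illustration: the pidp and a_hidp values, the income figures, and the assumption that xwaveid carries a_hidp for everyone enumerated at Wave 1. Person 3 is enumerated at Wave 1, gives no adult interview then, and is interviewed at Wave 5.

```python
# Toy data only: all identifiers and incomes below are invented.

# Wave 1 household file (a_hhresp): a_hidp -> household income.
a_hhresp = {101: 2500.0, 102: 1800.0}

# Wave 1 enumeration file (a_indall): everyone enumerated, pidp -> a_hidp.
a_indall = {1: 101, 2: 101, 3: 102}

# Wave 1 adult interview file (a_indresp): person 3 gave no adult interview.
a_indresp = {1: 101, 2: 101}

# Cross-wave file (xwaveid): pidp -> a_hidp, assumed here to cover
# everyone enumerated at Wave 1.
xwaveid = {1: 101, 2: 101, 3: 102}

# Wave 5 adult interview file (e_indresp): persons interviewed at Wave 5.
e_indresp = [1, 3]


def wave1_income_via(person_to_hidp):
    """Wave 1 household income for each Wave 5 respondent.

    Returns None for a person who cannot be linked to a Wave 1 household
    through the given person-level file.
    """
    result = {}
    for pidp in e_indresp:
        hidp = person_to_hidp.get(pidp)
        result[pidp] = a_hhresp.get(hidp) if hidp is not None else None
    return result


method1 = wave1_income_via(xwaveid)            # first method
method2_indresp = wave1_income_via(a_indresp)  # second method, via indresp
method2_indall = wave1_income_via(a_indall)    # second method, via indall

print(method1)
print(method2_indresp)
print(method2_indall)
```

In this toy case only the indresp route leaves person 3 without a Wave 1 income; the indall route gives the same result as the first method.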

By the way, note that the xwaveid and e_indresp files are both individual-level files, so you should be able to match them 1:1 using pidp. You don't need to specify an m:1 match.
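A quick way to see which kind of match applies is to check whether the key repeats within a file. The sketch below is plain Python with invented rows, and the helper names duplicated_keys and merge_one_to_one are hypothetical, not part of any package. It treats a merge as 1:1 only when the key is unique in both files.

```python
from collections import Counter


def duplicated_keys(rows, key):
    """Return the key values that appear in more than one row."""
    counts = Counter(row[key] for row in rows)
    return sorted(k for k, n in counts.items() if n > 1)


def merge_one_to_one(left, right, key):
    """Join two lists of dicts on `key`, refusing if the key repeats in either."""
    for name, rows in (("left", left), ("right", right)):
        dups = duplicated_keys(rows, key)
        if dups:
            raise ValueError(f"{key} is not unique in the {name} file: {dups}")
    right_by_key = {row[key]: row for row in right}
    # Keep every left row; add the matching right row's fields when one exists.
    return [{**row, **right_by_key.get(row[key], {})} for row in left]


# Toy rows only: identifiers are invented.
xwaveid = [
    {"pidp": 1, "a_hidp": 101},
    {"pidp": 2, "a_hidp": 101},
    {"pidp": 3, "a_hidp": 102},
]
e_indresp = [
    {"pidp": 1, "e_hidp": 501},
    {"pidp": 3, "e_hidp": 502},
]

# pidp is unique in both person-level files, so a 1:1 match is appropriate.
merged = merge_one_to_one(e_indresp, xwaveid, "pidp")

# a_hidp repeats across household members, which is why attaching
# household-level data to person-level rows is a many-to-one match.
shared_households = duplicated_keys(xwaveid, "a_hidp")
```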

Hope this answers your question.


Updated by Victoria Nolan over 8 years ago

  • Status changed from New to Closed
  • % Done changed from 90 to 100
