Support #1348


Added by Laura Silva almost 4 years ago. Updated 6 months ago.

Data management



I am merging information from the child and the indresp data files of wave 9 (and I have to do it for all the other waves).

As a first step I did the following:

use i_child.dta
merge m:m i_hidp using i_indall
drop _merge
merge m:m i_hidp using i_indresp

I end up with about 1,081 observations not matched from the master. Does this mean that there are observations in the indall and child files which are not in the indresp file? Is that possible?

More generally, is it OK to proceed this way to put together information from the indresp and child data files?


Updated by Annette Pasotti almost 4 years ago

  • % Done changed from 0 to 10
  • Private changed from Yes to No

Many thanks for your enquiry. The Understanding Society team is looking into it and we will get back to you as soon as we can.

We aim to respond to simple queries within 48 hours and more complex issues within 7 working days. While we will aim to keep to these response times, due to the current coronavirus (COVID-19) situation it may take us longer to respond.

Best wishes,



Updated by Annette Pasotti almost 4 years ago

  • Status changed from New to Feedback
  • Assignee set to Laura Silva
  • % Done changed from 10 to 80

Dear Laura

Cases in the child data file are a subset of the indall file (i.e. all children in enumerated households), and the cases in the indresp file are also a subset of the indall file, i.e. all adults who participated in an interview. Moreover, not all households have children (and, in principle, it is possible that not all households with children have adult interviews). Thus, we would expect the number of cases to drop when we merge all three data files on w_hidp, keeping only the matches.

We’d also point out that it is unorthodox to link individual level data such as indall, child and indresp on the household identifier. We wondered whether you wanted to add the parents’ information to the child’s record or vice versa?

If you haven't been on our online courses, we are running an Introduction to Understanding Society next week (20-21 May) and again in November. Our online Moodle is also available.

You can also browse previous Q&As in the user forum and read the FAQs on our website.

We also have information in our new online user guide. We’re always pleased to receive feedback on our user support materials. If you would like to send feedback on the new user guide, please email us.

I hope this helps.

Best wishes



Updated by Laura Silva almost 4 years ago

Hi Annette,

thank you very much for the reply.

Just to be sure: even if it is unorthodox (why, though?), is it correct? What is the alternative for matching information about parenting styles (e.g. i_homewk415, "Parent helps their children with homework"), which is available in the child file, with the parents?

Best wishes,



Updated by Laura Silva almost 4 years ago

Moreover, I forgot to say:

If I start from the child file, after I merge with indall I have unmatched observations from the using, and that makes perfect sense to me (households without children, I guess). The problem I fail to understand comes when, after keeping only the matched observations, I merge with indresp: I understand why there are many unmatched from the using (individuals with no children), but I do not understand why I have 651 unmatched from the master. Are they children without any matching parent?

Apologies for the split message.

Best regards,



Updated by Annette Pasotti almost 4 years ago

  • Assignee changed from Laura Silva to Gundi Knies

Dear Laura

Thank you for your reply. I am referring you to my colleague Gundi who will be able to explore this further with you.

Best wishes.



Updated by Gundi Knies almost 4 years ago

  • Category set to Data analysis
  • Assignee changed from Gundi Knies to Laura Silva
  • Target version set to X M


If you look at the interview outcome variable _ivfio or _hhresp_dv, you can see that there are households in the _indall data file that do not have any adult interviews, and some of them are households with members who are below age 16 and will therefore also be in the _child data file. That is why you have mismatches when merging the _child, _indall and _indresp data files.

Merging individual level records on _hidp is unorthodox because _hidp does not uniquely describe the rows in either data file.
It is not 'wrong'; many routes lead to the same results, just some seem more efficient than others. In our training course we have a worked example for matching spouses' records using the spouse identifier, and on this forum we have described how to match child and parent records.
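
As a rough illustration of the uniqueness point (a sketch only, assuming the wave 9 files sit in the working directory under the names used earlier in this thread), one could check which identifier describes the rows of a file before choosing a merge key:

```stata
* Sketch: check which identifier uniquely describes the rows of i_indresp
use i_indresp, clear

* pidp should identify each row; isid is silent when it does
isid pidp

* i_hidp is shared by everyone in a household, so isid is expected to
* report an error here; capture noisily lets the do-file carry on
capture noisily isid i_hidp

* count how many rows share each household identifier
duplicates report i_hidp
```

If the key is not unique on either side, a merge m:m pairs rows within each household by their order in the data rather than by who they are, which is why a person-level key is usually easier to reason about.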

For example, when we want to merge child records with their biological mother's record, we'd first select only children who have a biological mother in the household (i.e. non-missing mnpid). Then rename the child pidp and other variables so they have a child prefix (e.g., ch), then rename the mnpid to pidp, then merge on pidp using the _indresp data file. This way it is very clear who the mismatches are: children (with biological mothers in the hh) whose biological mothers did not provide an interview. All variables that have the ch prefix refer to the child, all other variables refer to the biological mother.
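
The steps above might be sketched in Stata roughly as follows. This is an illustration under assumptions rather than a tested recipe: it assumes wave 9 (prefix i_), that i_mnpid is present in i_child (if it is not, it would need to be brought in from i_indall first), and that missing values are stored as negative codes.

```stata
* Sketch: attach the biological mother's interview to each child record
use i_child, clear

* keep children whose biological mother is in the household
* (assumption: missing codes are negative, so valid identifiers are > 0)
keep if i_mnpid > 0 & !missing(i_mnpid)

* give every child variable a ch_ prefix, including pidp and i_hidp
rename * ch_*

* the mother's identifier becomes the merge key
rename ch_i_mnpid pidp

* several children can share one mother, so the merge is m:1
merge m:1 pidp using i_indresp

* _merge==1: children whose mother has no adult interview record
* _merge==2: adult respondents not linked to any child in the master
* _merge==3: child records matched to their mother's interview
tab _merge
```

After the merge, ch_pidp identifies the child and pidp identifies the mother, so the two sets of variables cannot be confused.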



Updated by Laura Silva almost 4 years ago

Hi Gundi,

thank you very much for your tremendous help, really appreciated.

Have an awesome day,



Updated by Gundi Knies almost 4 years ago

  • Status changed from Feedback to Resolved
  • % Done changed from 80 to 100

You are welcome!


Updated by Understanding Society User Support Team over 2 years ago

  • Assignee deleted (Laura Silva)

Updated by Understanding Society User Support Team 6 months ago

  • Category changed from Data analysis to Data management
