Support #21
closedMerging household and individual data set-wave1, 2009-2010
50%
Description
I want to merge the data set into one file. I started first with merging all household files. Let’s say I want to merge a_hhsamp with the a_hhresp. I suppose to keep (1 3) of the resulting merge, however I had only merge= 2 3, or I have exactly the number of the first household file in 2, which means the two files were not merged.
I try with distributing household level information to the individual level, where I am using the a_hidp as identifier and follow the example you gave in the documents. Now my merge is fine, but by keeping merge 1 and 3 my sample size increase dramatically and I had duplicate observations.
Next I continue with the I individual files, where I am using a_hidp and a_pno as unique identifier in order to match correctly individual files, however again the resulting merge is not fine.
Could you advise me please how to deal with matching the files? Do you have some users do files which would help us to combine all the data sets from the wave 1, 2009-2010?
Many thanks
Anita
Updated by Redmine Admin almost 14 years ago
- Category set to Data analysis
- Status changed from New to In Progress
- Assignee set to Redmine Admin
- % Done changed from 0 to 50
Anita,
I have tried to reconstruct your example here:
use a_hidp using a_hhresp,clear
merge 1:1 a_hidp using a_hhsamp,keepus(a_ivfho_dv)
table a_ivfho_dv _m,row col
-----------------------------------------------------------------------------------------
| _merge
household response outcome | using only (2) matched (3) Total
-----------------------------------------+-----------------------------------------------
f2f - all eligible hh intv | 21,694 21,694
f2f - interviews + proxies | 2,630 2,630
f2f - interviews + refusal | 5,708 5,708
hh comp + ques only | 137 137
lost capi interview | 21 21
demolished/derelict | 605 605
building not complete | 133 133
institution, not private hh | 198 198
no hh member contact | 2,240 2,240
unable to locate address | 201 201
contact made but not with correct people | 526 526
unknown eligibility | 483 483
other non-contact | 3,121 3,121
refus to rsrch cntre | 976 976
refusal to intviewer | 17,183 17,183
language problems | 531 531
other ineligible | 38,921 38,921
|
Total | 65,139 30,169 95,308
-----------------------------------------------------------------------------------------
The master data set is a_hhresp.dta, the using data set is a_hhsamp.
The households that match (_m==3) are those with a productive interview outcome, while the unmatched households are those with unproductive outcomes (_m==2).
This fits with the description of a_hhsamp as the data file with data on all enumerated households and a_hhresp for all responding households.
If we had chosen to open a_hhsamp first and then merged it to a_hhresp, the results would have been the same except for the _merge variable would have had the values 1 and 3 instead.
Next I continue with the I individual files, where I am using a_hidp and a_pno as unique identifier in order to match
correctly individual files, however again the resulting merge is not fine.
You can use pidp as the personal identifier on all individual level data files.
Do you have a specific example here?
Some more general advice...
The data are released in a set of data files that allows users to construct working data sets for a multitude of purposes. Due to the relative complex data structure, we recommend that you study the questionnaires and online data documentation and select the variables you need for a given study purpose. In that way, the working data sets remain of a manageable size and there should also be less scope for confusing variables with similar names but different meaning on from different files.
See also free course materials from some of our training courses or news of forthcoming training courses
Hth
Jakob Petersen
Updated by Redmine Admin over 13 years ago
- Status changed from In Progress to Closed