Support #1298
Matching youth data to parent data in Understanding Society
Description
Dear colleague,
I wonder if you can help me with the above. I have used the USS before to merge and match waves and, following your online course, to match adults within a household. I am now experimenting with matching individuals from the youth file to, say, their mothers. Whilst I can match the files, I end up with very small matched samples and wonder if I am doing something silly.
To illustrate, here are the steps based on some reduced files. I have been saving and replacing the files as I go to check each step, as I have learned the syntax while working on specific projects.
First, I create a 'mum' file with a few variables:
use "C:\ukhls_w8\h_indresp.dta"
keep if h_sex==2
save "C:\ukhls_w8\mum data.dta",replace
use "C:\ukhls_w8\mum data.dta"
keep pidp h_sex h_hidp h_scsf1 h_mnspid h_pno h_childpno h_intdaty_dv h_dvage
drop if h_mnspid==-8
rename (h_sex h_hidp h_scsf1 h_mnspid h_pno h_childpno h_intdaty_dv h_dvage) (msex mhidp mscsf1 mnspid mpno mchildpno mintdaty_dv dvage)
save "C:\ukhls_w8\mum data.dta",replace
I then created a youth file with a couple of variables:
use "C:\ukhls_w8\h_youth.dta"
keep pidp h_mnspid h_ypsrhlth h_hidp h_dvage
drop if h_mnspid==-8
rename (h_mnspid h_ypsrhlth h_hidp h_dvage) (mnspid yypsrhlth yhidp ydvage)
save "C:\ukhls_w8\youth data.dta",replace
I then tried an m:1 merge:
use "C:\ukhls_w8\youth data.dta"
merge m:1 mnspid using "C:\ukhls_w8\mum data.dta"
save "C:\ukhls_w8\Total W8.dta",replace
I get the error message:
variable mnspid does not uniquely identify observations in the using data
So I checked for duplicates in the mum file and I get:
duplicates report, mnspid
Duplicates in terms of all variables
--------------------------------------
   copies | observations       surplus
----------+---------------------------
        1 |         2738             0
--------------------------------------
and in the youth file I get:
Duplicates in terms of all variables
--------------------------------------
   copies | observations       surplus
----------+---------------------------
        1 |         3174             0
--------------------------------------
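Both reports above are headed "Duplicates in terms of all variables", which suggests they compared whole rows rather than the merge key on its own. A minimal sketch of checking mnspid alone, assuming the two files saved in the earlier steps; `duplicates report` with a variable list and `isid` are standard Stata commands, but whether this is the cause of the error is an assumption:

```stata
* Check the merge key itself, not whole rows
use "C:\ukhls_w8\mum data.dta", clear
duplicates report mnspid        // duplicates in terms of mnspid only
capture noisily isid mnspid     // reports an error if mnspid is not unique

use "C:\ukhls_w8\youth data.dta", clear
duplicates report mnspid        // siblings sharing a mother would appear here
```

If the first report shows copies greater than 1 for the mum file, that would be consistent with the "does not uniquely identify observations in the using data" message.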
So there doesn't seem to be an issue with duplicates. However, if I repeat the steps above and this time remove the duplicates by force, e.g. in the mum file, I get:
. duplicates drop mnspid, force
Duplicates in terms of mnspid
(413 observations deleted)
I leave any duplicates in the youth file, as I assume duplicates of mnspid make sense there because siblings can share a parent.
If I now run the m:1 merge above, I get:
    Result                           # of obs.
    -----------------------------------------
    not matched                         4,458
        from master                     2,598  (_merge==1)
        from using                      1,860  (_merge==2)
    matched                               576  (_merge==3)
    -----------------------------------------
This seems to be a very small subset of cases. Am I doing the right thing here? Any help would be greatly appreciated. My plan was to do this for each wave and then append the waves.
Thank you.
Paul.
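One possible reading of the small matched sample, offered as a sketch rather than a confirmed diagnosis: in the steps above both files carry a variable called mnspid, so the merge pairs each young person with an adult woman who has the same mother, not with the mother herself. The sketch below assumes that mnspid in the youth file holds the mother's pidp, and that the mother's own identifier in h_indresp is pidp. The file name "mum key.dta" and the variable name mdvage are hypothetical, chosen only for illustration.

```stata
* Sketch only: assumes mnspid in the youth file is the mother's pidp
use "C:\ukhls_w8\h_indresp.dta", clear
keep if h_sex == 2
keep pidp h_hidp h_scsf1 h_dvage
rename (h_hidp h_scsf1 h_dvage) (mhidp mscsf1 mdvage)
rename pidp mnspid              // the woman's own pidp becomes the merge key
isid mnspid                     // one row per woman, so m:1 is valid
save "C:\ukhls_w8\mum key.dta", replace   // hypothetical file name

* Youth file as built above: several children may share one mother
use "C:\ukhls_w8\youth data.dta", clear
merge m:1 mnspid using "C:\ukhls_w8\mum key.dta"
tab _merge                      // _merge==2 rows are women with no child in the youth file
```

Under these assumptions, unmatched rows from the using file are expected, since most women in h_indresp have no child in the youth file; the figure to watch is how many youth records end up with _merge==3.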