



Support #1298


Matching youth data to parent data in Understanding Society

Added by Paul Downward almost 5 years ago. Updated almost 5 years ago.




Dear colleague,
I wonder if you can help me with the above. I have used the USS before to merge and match waves and, following your online course, to match adults within a household. I am now experimenting with matching individuals from the youth file to, say, their mothers. Whilst I can match the files, I end up with very small matched samples and wonder if I am doing something silly.

To illustrate, here are the steps based on some reduced files. I have been saving and replacing the files as I go to check each step, as I have learned the syntax while working on specific projects.

First, I create a 'mum' file with a few variables:
use "C:\ukhls_w8\h_indresp.dta"
keep if h_sex==2
save "C:\ukhls_w8\mum data.dta",replace
use "C:\ukhls_w8\mum data.dta"
keep pidp h_sex h_hidp h_scsf1 h_mnspid h_pno h_childpno h_intdaty_dv h_dvage
drop if h_mnspid==-8
rename (h_sex h_hidp h_scsf1 h_mnspid h_pno h_childpno h_intdaty_dv h_dvage) (msex mhidp mscsf1 mnspid mpno mchildpno mintdaty_dv dvage)
save "C:\ukhls_w8\mum data.dta",replace

I then create a youth file with a couple of variables:

use "C:\ukhls_w8\h_youth.dta"
keep pidp h_mnspid h_ypsrhlth h_hidp h_dvage
drop if h_mnspid==-8
rename (h_mnspid h_ypsrhlth h_hidp h_dvage) (mnspid yypsrhlth yhidp ydvage)
save "C:\ukhls_w8\youth data.dta",replace

I then tried an m:1 merge:
use "C:\ukhls_w8\youth data.dta"
merge m:1 mnspid using "C:\ukhls_w8\mum data.dta"
save "C:\ukhls_w8\Total W8.dta",replace

I get the message
variable mnspid does not uniquely identify observations in the using data

So, I checked the duplicates in the mum file and I get

duplicates report, mnspid

Duplicates in terms of all variables

   copies | observations       surplus
----------+---------------------------
        1 |         2738             0

and in the youth file I get

Duplicates in terms of all variables

   copies | observations       surplus
----------+---------------------------
        1 |         3174             0

So there doesn't seem to be an issue with duplicates. Next, I repeated the steps above, this time removing the duplicates by force,

e.g. in the mum file
. duplicates drop mnspid, force

Duplicates in terms of mnspid

(413 observations deleted)

and I leave any duplicates in the youth file, as I assume it makes sense for mnspid to repeat there since a mother can be shared by siblings.
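The two reports above are headed "Duplicates in terms of all variables", whereas the forced drop works in terms of mnspid alone and removes 413 rows, so the two checks may not be measuring the same thing. A minimal sketch of how they can differ, using made-up data rather than the survey files (the values are hypothetical):

```stata
* Made-up data: pidp is unique, but two people share mnspid 101
clear
input long pidp long mnspid
1 101
2 101
3 102
end

* No varlist: rows are compared on all variables, so none are duplicates
duplicates report

* With a varlist: rows are compared on mnspid only
duplicates report mnspid

* isid exits with an error when the key is not unique; capture lets the run continue
capture noisily isid mnspid
```

Running the same two reports on the real mum file would show whether mnspid repeats there even though no complete row does.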

If I now follow the m:1 merge above I get

    Result                           # of obs.
    -----------------------------------------
    not matched                         4,458
        from master                     2,598  (_merge==1)
        from using                      1,860  (_merge==2)

    matched                               576  (_merge==3)
    -----------------------------------------

This seems to be a very small subset of cases. Am I doing the right thing here? Any help would be greatly appreciated. My plan was to do this for each wave and then append the waves.
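One possible explanation, offered as an assumption rather than a confirmed diagnosis: if h_mnspid in h_indresp refers to each woman's own mother, then merging mnspid on mnspid pairs youths with women who happen to share a mother identifier, not with their mothers. The sketch below uses made-up data to show the alternative of linking the youth's mnspid to the mother's own pidp; the variable meanings should be checked against the Understanding Society documentation before applying it to the real files.

```stata
* Made-up data throughout; values and ages are hypothetical

* Toy mum file: one row per adult woman, identified by her own pidp
clear
input long pidp int mdvage
101 41
102 38
103 45
end
* Assumption: the youth file's mnspid holds the mother's pidp,
* so the mother's own pidp becomes the link key
rename pidp mnspid
tempfile mums
save `mums'

* Toy youth file: youths 1 and 2 share mother 101; youth 4 has no mother id
clear
input long pidp long mnspid int ydvage
1 101 12
2 101 14
3 102 11
4 -8 13
end
* Drop missing-value codes such as -8 before merging
drop if mnspid < 0
merge m:1 mnspid using `mums'
tabulate _merge
```

Because each woman appears once under her own pidp, the key is unique in the using file and no forced dropping of duplicates is needed.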

Thank you.


