Support #1298

Matching youth data to parent data in Understanding Society

Added by Paul Downward about 4 years ago. Updated about 4 years ago.

Status: Resolved
Priority: Urgent
Assignee:
Category: -
Start date: 01/11/2020
% Done: 100%


Description

Dear colleague,
I wonder if you can help me with the above. I have used the USS before and merged and matched waves and, following your online course, matched adults in a household. I am now experimenting with matching individuals from the youth file to, say, their mothers. Whilst I can match the files, I end up with very small matched samples and wonder if I am doing something silly.

To illustrate, here is what I did with some reduced files. I have been saving and replacing the files as I go so that I can check each step, as I have learned the syntax while working on specific projects.

First I create a 'mum' file with a few variables:
use "C:\ukhls_w8\h_indresp.dta"
keep if h_sex==2
save "C:\ukhls_w8\mum data.dta",replace
use "C:\ukhls_w8\mum data.dta"
keep pidp h_sex h_hidp h_scsf1 h_mnspid h_pno h_childpno h_intdaty_dv h_dvage
drop if h_mnspid==-8
rename (h_sex h_hidp h_scsf1 h_mnspid h_pno h_childpno h_intdaty_dv h_dvage) (msex mhidp mscsf1 mnspid mpno mchildpno mintdaty_dv dvage)
save "C:\ukhls_w8\mum data.dta",replace

I then created a youth file with a couple of variables:

use "C:\ukhls_w8\h_youth.dta"
keep pidp h_mnspid h_ypsrhlth h_hidp h_dvage
drop if h_mnspid==-8
rename (h_mnspid h_ypsrhlth h_hidp h_dvage) (mnspid yypsrhlth yhidp ydvage)
save "C:\ukhls_w8\youth data.dta",replace

I have then tried an m:1 merge:
use "C:\ukhls_w8\youth data.dta"
merge m:1 mnspid using "C:\ukhls_w8\mum data.dta"
save "C:\ukhls_w8\Total W8.dta",replace

I get the message
variable mnspid does not uniquely identify observations in the using data

So, I checked the duplicates in the mum file and I get

duplicates report, mnspid

Duplicates in terms of all variables

--------------------------------------
   copies | observations       surplus
----------+---------------------------
        1 |         2738             0
--------------------------------------

and in the youth file I get

Duplicates in terms of all variables

--------------------------------------
   copies | observations       surplus
----------+---------------------------
        1 |         3174             0
--------------------------------------

So there doesn't seem to be an issue with duplicates but, if I repeat the steps above and this time remove the duplicates by force

e.g. in the mum file
. duplicates drop mnspid, force

Duplicates in terms of mnspid

(413 observations deleted)

and I leave any duplicates in the youth file as I assume it makes sense that there are duplicates of mnspid as a parent can be shared.
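
Note: the header "Duplicates in terms of all variables" in the two reports above suggests they were run across every variable in the file rather than on mnspid alone. That would be consistent with the reports showing no duplicates while the forced drop on mnspid still deletes 413 rows. A minimal sketch of the difference, using toy data only (the pidp and mnspid values below are invented for illustration and do not come from the survey files):

```stata
* Toy data: every row is distinct overall, but mnspid repeats.
clear
input long pidp long mnspid
101 -9
102 -9
103 501
104 501
end

* Across all variables no row is repeated, so this reports no surplus.
duplicates report

* On the key alone the repeated values show up.
duplicates report mnspid

* isid exits with an error when the key is not unique;
* capture stores the return code in _rc instead of stopping.
capture isid mnspid
display "isid return code: " _rc
```

Running `isid mnspid` on the using file before an m:1 merge is one quick way to confirm that the key is unique there.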

If I now follow the m:1 merge above I get

Result                           # of obs.
-----------------------------------------
not matched                         4,458
    from master                     2,598  (_merge==1)
    from using                      1,860  (_merge==2)
matched                               576  (_merge==3)
-----------------------------------------

This seems to be a very small subset of cases. Am I doing the right thing here? Any help would be greatly appreciated. My plan was to do this for each wave and then append the waves.

Thank you.

Paul.
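
Note: one possible reading of the small matched sample is that the key in the mum file is the mother's own h_mnspid, which identifies her mother, whereas h_mnspid in the youth file holds the pidp of the young person's mother. Under that assumption, the mum file would be keyed on the mother's own pidp, renamed to mnspid. The sketch below uses toy data only: all identifiers and ages are invented, the name mdvage is a hypothetical choice, and a tempfile stands in for the saved "mum data.dta".

```stata
* Toy 'mum' file: pidp is the woman's own identifier.
clear
input long pidp long h_mnspid byte h_dvage
101 -8 40
102 -8 38
103 101 19
end
* Drop her own h_mnspid (it points at her mother) and use her
* own pidp as the merge key instead.
drop h_mnspid
rename (pidp h_dvage) (mnspid mdvage)
isid mnspid
tempfile mums
save `mums'

* Toy youth file: h_mnspid holds the mother's pidp.
clear
input long pidp long h_mnspid byte h_dvage
201 101 12
202 101 14
203 102 11
204 -8 13
end
rename (h_mnspid h_dvage) (mnspid ydvage)
* Negative values are missing codes, so no mother can be linked.
drop if mnspid < 0

* Many youths to one mother; keep youths whether matched or not.
merge m:1 mnspid using `mums', keep(master match)
tabulate _merge
```

Because pidp is unique within a wave's indresp file, the renamed key should pass `isid` without any forced dropping of duplicates.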
