Support #912: Duplicates of pid in the harmonized UKHLS/BHPS xwavedat - Understanding Society User Support

open

Duplicates of pid in the harmonized UKHLS/BHPS xwavedat

Added by Michael Baumkautner over 7 years ago. Updated almost 7 years ago.

Status: Resolved
Priority: Immediate
Assignee: Michael Baumkautner
Category: Data inconsistency
Start date: 02/05/2018
% Done: 100%

Description

Dear User Support,

I am trying to understand the identifier variables in the harmonized UKHLS/BHPS xwavedat.

1) Why are there records in xwavedat that have the same pid value?

2) And why do they have different pidp values?

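The following code reproduces this:

// Load the identifiers and treat the -8 code in pid as missing
use pid pidp using data\xwavedat, clear
recode pid (-8 = .)
keep if !missing(pid)
// dup counts the other records that share the same pid
duplicates tag pid, gen(dup)
sort pid pidp
list pid pidp if dup > 0, sepby(pid)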
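For readers working outside Stata, the same check can be sketched in Python using only the standard library. This is a hypothetical illustration on made-up (pid, pidp) pairs, not code from Understanding Society. It treats -8 as missing, as the recode line does, and computes the value that duplicates tag stores in dup: the number of other records that share the same pid.

```python
from collections import Counter

# Hypothetical toy data standing in for xwavedat rows: (pid, pidp) pairs.
# pid == -8 is treated as missing, as in the Stata recode step.
MISSING_PID_CODES = {-8}


def tag_duplicate_pids(records):
    """Return (pid, pidp, dup) for every record with a valid pid, where dup
    is the number of *other* records sharing that pid (like `duplicates tag`)."""
    valid = [(pid, pidp) for pid, pidp in records if pid not in MISSING_PID_CODES]
    counts = Counter(pid for pid, _ in valid)
    return [(pid, pidp, counts[pid] - 1) for pid, pidp in valid]


rows = [(101, 5001), (-8, 5002), (102, 5003), (102, 5004), (103, 5005)]
tagged = tag_duplicate_pids(rows)

# Records with dup > 0 share a pid but carry different pidp values.
for pid, pidp, dup in tagged:
    if dup > 0:
        print(pid, pidp, dup)
```

In this toy data, pid 102 appears twice with two different pidp values, which is the pattern described in the question.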

Thanks for your consideration

Updated by Alita Nandi over 7 years ago

Category set to Data inconsistency
Status changed from New to Feedback
Assignee set to Michael Baumkautner
Target version set to X M
% Done changed from 0 to 50
Private changed from Yes to No

Hello Michael,

Thank you for identifying these cases. You are correct: there are 11 pids with two observations each. We have identified the source of the problem, and it will be resolved in the next release. All of these cases come from the BHPS samples. Here are two suggestions:
(1) Delete these 11 cases (22 rows).
(2) As a quick fix, combine the best information from each duplicate pair (N=11) and keep just one observation per pid. Here is the code:

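// Adjust this path to your own copy of xwavedat
use "\\usocdist0\restricted$\working\crosswave\crosswave_wave07_1\datasets\xwavedat", clear
recode pid (-8 = .)
keep if !missing(pid)
// Keep only the pids that appear twice
bys pid: g dup=_N
keep if dup==2
drop dup

// Convert the negative missing-value codes to system missing (.); the distinct codes are not kept
mvdecode _all, mv(-21/-1)
// Numeric variables other than the identifiers (egen mean needs numeric input)
ds pidp pid, not
ds `r(varlist)', has(type numeric)
global vlist `r(varlist)'
foreach var of global vlist {
    // The duplicates share pid but not pidp, so group by pid
    bys pid: egen double y_`var'=mean(`var')
    // Flag rows whose non-missing value differs from the pid mean, i.e. the two rows disagree
    g prob_`var'=1 if y_`var'~=`var' & `var'<. & y_`var'<.
}
// Check whether any variable takes different values in the two rows.
su prob_*
// You will find a few variables where the values in the two rows differ, because the BHPS and UKHLS versions were not harmonised
// (*scend_dv *generation *j1soc90 *1soc90_cc *coh1m_dv coh1y_dv *lmar1y_dv). These need a separate rule (see below).

// For the variables where one of the rows is missing or both rows agree, use the non-missing value:
foreach var of global vlist {
    replace `var'=y_`var' if prob_`var'==.
}
// For the problem variables, decide on a rule and assign the chosen value to both rows.
// Next, keep one observation per pid and drop the helper variables:
bys pid: keep if _n==1
foreach var of global vlist {
    drop y_`var' prob_`var'
}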

Then append these cases to the original file, after first removing the 22 duplicate rows from it.
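The collapse-and-append procedure can also be sketched in Python. This is a hypothetical illustration on toy records (the variable names var1 and var2 are made up), not the support team's code. It assumes missing values are None or one of the negative codes -21 to -1, mirroring the mvdecode step. Where only one row has a value, or both rows hold the same value, that value is kept, which is what the mean yields in the Stata code. Conflicting variables are reported, and as a placeholder rule (an assumption you should replace with your own) the first non-missing value is kept.

```python
# Hypothetical sketch: merge rows that share a pid, keep one record per pid.
# Toy records only; var1/var2 are made-up variable names.

ID_VARS = ("pid", "pidp")
MISSING_CODES = set(range(-21, 0))  # -21..-1, as in mvdecode _all, mv(-21/-1)


def is_missing(value):
    """Treat None and the negative codes -21..-1 as missing."""
    return value is None or value in MISSING_CODES


def collapse_duplicates(rows):
    """Return (combined_rows, conflicts).

    Rows with a unique pid are kept unchanged; each group of rows sharing a
    pid is merged into one record, appended after the unique rows.
    conflicts maps pid -> variables whose non-missing values disagree.
    """
    groups = {}
    for row in rows:
        groups.setdefault(row["pid"], []).append(row)

    kept, collapsed, conflicts = [], [], {}
    for pid, group in groups.items():
        if len(group) == 1:
            kept.append(group[0])
            continue
        merged = {k: group[0][k] for k in ID_VARS}  # keep the first row's pidp
        for var in group[0]:
            if var in ID_VARS:
                continue
            non_missing = [r[var] for r in group if not is_missing(r[var])]
            if len(set(non_missing)) > 1:
                conflicts.setdefault(pid, []).append(var)
                # Placeholder rule (an assumption): first non-missing value wins.
                merged[var] = non_missing[0]
            else:
                # One value, or identical values: keep it; otherwise missing.
                merged[var] = non_missing[0] if non_missing else None
        collapsed.append(merged)

    # Mirrors dropping the duplicate rows and appending the merged records.
    return kept + collapsed, conflicts


rows = [
    {"pid": 1, "pidp": 11, "var1": 1, "var2": 1950},
    {"pid": 2, "pidp": 21, "var1": 2, "var2": -9},
    {"pid": 2, "pidp": 22, "var1": 2, "var2": 1961},
    {"pid": 3, "pidp": 31, "var1": 1, "var2": 1970},
    {"pid": 3, "pidp": 32, "var1": 2, "var2": 1970},
]
combined, conflicts = collapse_duplicates(rows)
print(len(combined), conflicts)  # 3 {3: ['var1']}
```

Returning the untouched records first and the merged ones last matches the order produced by removing the duplicate rows and then appending the collapsed cases.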

Please let me know if this does not work, or the response is not clear.

Best wishes,
Alita

Updated by Stephanie Auty almost 7 years ago

Status changed from Feedback to Resolved
% Done changed from 50 to 100
