Support #968 » example code for annual employment history data.do

Stephanie Auty, 05/10/2018 11:25 AM

 

**NEW ONE: BIOMARKERS AS BASELINE.

*Now for the BHPS people. Remember, activity as of nurse visit itself will be in the loop from d, not from the c-wave information.

use "D:\USoc Data\aa_USoc W1-5_Updated\d_indresp.dta", clear

*generate a PRESENT variable:
generate d_PRESENT=1

*************************************************************************
*************************************************************************

*Drop all the new UKHLS people:
drop if (d_hhorig==1 | d_hhorig==2 | d_hhorig==7)

*************************************************************************
*************************************************************************

*OK - now start looping, but how you would do it now having solved those mysteries.

set more off

*keep pid pidp d_ioutcome d_indmode d_ivfio d_jbhas d_jboff d_ff_jbsemp d_jbsempchk d_jlsemp d_j2semp d_jbsemp d_ff_ivlolw d_ff_emplw d_indin91_lw d_indin01_lw d_month d_istrtdatd d_istrtdatm d_istrtdaty d_hhorig d_jbstat d_ff_jbstat d_notempchk- d_statendy410 d_ff_ivlolw d_ff_everint d_jbbgd d_jbbgm d_jbbgy d_age_cr d_dvage d_jbft_dv

*Recode the economic status variable so is exactly comparable to those in the BHPS:
recode d_jbstat 97=10
label variable d_jbstat "USoc wave 2 economic status, 40 unpaid work merged with other"
label define d_jbstatlab -7"proxy respondent" 1"self-employed" 2"employed" 3"unemployed" 4"retired" 5"maternity leave" 6"family care" 7"ft study, school" ///
8"lt sick, disabled" 9"govt training scheme" 10"other"
label values d_jbstat d_jbstatlab

***********************************************************

*Make an ALTERNATIVE VERSION with a more narrowly defined comparison group of full-time employees only, using the d_jbft_dv variable:
generate d_ALT_jbstat=d_jbstat
label variable d_ALT_jbstat "USoc wave 2 economic status, NARROW BASELINE, 40 unpaid work-->other"
recode d_ALT_jbstat 1=11
recode d_ALT_jbstat 2=12 if d_jbft_dv==2

label define d_ALT_jbstatlab -7"proxy respondent" 2"FT_Employed" 3"unemployed" 4"retired" 5"maternity leave" 6"family care" 7"ft study, school" ///
8"lt sick, disabled" 9"govt training scheme" 10"other" 11"self-employed" 12"PT_Employed"
label values d_ALT_jbstat d_ALT_jbstatlab
tab d_ALT_jbstat

************************************************************************************************************************************

*SORT OUT THE INFORMATION ON PAST EMPLOYMENT SPELLS.

*Now start to investigate the reports of past employment, starting with the 'fed-forward' one.

********IMPORTANT******: For the ~15% of people with either 'missing'(-9) or 'proxy'(-7) for fed-forward employment status, they were not sent
**through this route, as can be seen by <0 values for all the subsequent questions in this module:
tab d_empchk if d_ff_jbstat<0
tab d_notempchk if d_ff_jbstat<0
tab d_nxtst if d_ff_jbstat<0
tab d_nextstat1 if d_ff_jbstat<0

****************************

*These two items started the routes:
*For people employed at last interview, check whether still doing the same work activity:
tab d_empchk
tab d_empchk, nol
*list d_empstendd d_empstendm d_empstendy4 if d_empchk==2

*For people not employed at last interview, check whether still doing the same non-work activity:
tab d_notempchk
tab d_notempchk, nol
*list d_empstendd d_empstendm d_empstendy4 if d_notempchk==2

*944 people were asked both:
count if (d_empchk==1 | d_empchk==2) & (d_notempchk==1 | d_notempchk==2)
*For this whole group, their fed-forward status shows NONE WERE EMPLOYED so for these the non-missing d_empchk values are the errors?
*UPDATE, MARCH 2016: NO, BECAUSE PEOPLE SENT THROUGH BOTH LOOPS IF THEY HAD LAST WAVE ANSWERED JBHAS or JBOFF in a way which indicated they were working, but jbstat said otherwise.

*Flag them:
gen d_2Loops_flag=0
replace d_2Loops_flag=1 if (d_empchk==1 | d_empchk==2) & (d_notempchk==1 | d_notempchk==2)
label variable d_2Loops_flag "Sent through both empchk and notempchk loops"


*FIRST RECODE DECISION:

*UNLESS YOU WANT TO ALLOW SIMULTANEOUS ACTIVITIES, WILL NEED TO LET ONE OF THESE (I.E., JBSTAT AT PREVIOUS WAVE) TAKE PRECEDENCE.
tab d_ff_jbstat if (d_empchk==1 | d_empchk==2) & (d_notempchk==1 | d_notempchk==2)
*RECODE THEM:
recode d_empchk 1=-8 2=-8 if (d_empchk==1 | d_empchk==2) & (d_notempchk==1 | d_notempchk==2)

*****************************************************************************************************************************************
*****************************************************************************************************************************************

*The end dates of that fed-forward activity are stored here:
tab d_empstendd
tab d_empstendm
tab d_empstendy4

*Make ym_ and mdy_ variables for the end date of the fed-forward activity.
*Use ym(d_empstendy4, d_empstendm) for non-work activities, and ym(d_nxtjbendy4, d_nxtjbendm) for non-work activities
generate d_ym_ffact_enddate=ym(d_empstendy4, d_empstendm)

format d_ym_ffact_enddate %tm
generate d_mdy_ffact_enddate=dofm(d_ym_ffact_enddate)
format d_mdy_ffact_enddate %td
*Check it:
list d_ym_ffact_enddate d_mdy_ffact_enddate if d_ym_ffact_enddate!=.
*Yep
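
*A minimal sketch, on made-up data (not USoc values), of how ym() and dofm() behave in the step above.
*The toy_ names are hypothetical. ym() returns missing when the year or month holds a negative
*missing-data code, and dofm() returns the FIRST day of the month.
preserve
clear
input toy_y toy_m
2011 3
2012 -8
-9 6
end
generate toy_ym=ym(toy_y, toy_m)
format toy_ym %tm
generate toy_mdy=dofm(toy_ym)
format toy_mdy %td
list
*Only the first row should get a date, equal to 1 March 2011:
assert toy_mdy==mdy(3,1,2011) in 1
assert toy_ym==. in 2/3
restore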

*************************

**OK - moving onto the FIRST NEW ACTIVITY, i.e. the FIRST THAT STARTED AFTER THE LAST INTERVIEW:

*Then this one: 'Immediately following that period of [ff_JBSTAT] {if NotEmpChk=2} / job {if EmpChk=2}, did you have a period of paid work
*or did you do something else?'
tab d_nxtst
*For those who answered 'something else', there is the following question:
tab d_nxtstelse
*Check it adds up:
*tab d_nxtst if d_nxtstelse>0
*Yes it does
*Then this, for those who answered yes:
*Current non-employment status: 'Has this period of being [NxtStElse] ended or is this what you are doing now?'
tab d_nxtstelse d_cstat
*Current employment status: Cjob. Current job indicator: 'Was that {if NxtSt = 1} / your {if JbSamR = 2 | SameJob = 2} next job your current job?'
tab d_nxtst d_cjob


*TO DISTINGUISH BETWEEN EMPLOYMENT AND SELF-EMPLOYMENT, THIS VAR WAS USED:
tab d_nxtjbes d_nxtst
tab d_nxtjbes, nol
*But apparently this doesn't actually match up.
tab d_nxtjbes if d_nxtst==1
tab d_nxtjbes if d_nxtst==2
*DEC 1st: have identified from the small print in the interview script that the question was only deemed applicable if this d_nxtst hadn't finished.
*Can confirm this by tabbing the currentness variable for these people:
tab d_nxtjbes if d_nxtst==1 & d_cjob==2
tab d_nxtjbes if d_nxtst==1 & d_cjob==1
*So, for the group with d_nxtst==1 & d_cjob==1, need to look for whether current job is self-employed.
tab d_jbsemp if d_nxtst==1 & d_cjob==1
*OK, that gives us a value for all but 20 of the 372.
*For the remaining 34, we have the contradiction that they said the first (work) activity following the fed-forward one was still current
*(i.e. d_cjob==1), yet also said they had not done any paid work in the last week ie. d_jbhas==2, so the question about whether current work
*is employment or self-employment wasn't asked:
tab d_jbhas if d_jbsemp<0 & d_nxtst==1 & d_cjob==1
tab d_jboff if d_jbsemp<0 & d_nxtst==1 & d_cjob==1
*flag them for deletion?

generate W2_EXCL_flag_flag=0
recode W2_EXCL_flag_flag 0=1 if d_jbsemp<0 & d_nxtst==1 & d_cjob==1
*Nb: there are actually a further 100 with this contradiction which may need to be excluded later if they can't be sorted out:
tab d_jbsemp if d_jbhas==2 & d_nxtst==1 & d_cjob==1

**********************
**COMBINE ALL THE INFO INTO A SINGLE MORE USEFUL VARIABLE,
*applying the standard labelling WITH A NEW VALUE:
label define d_PastYearACTIVITY0lab -8 "inapplicable" -7"proxy respondent" 1"self-employed" 2"employed" 3"unemployed" 4"retired" 5"maternity leave" 6"family care" 7"ft study, school" ///
8"lt sick, disabled" 9"govt training scheme" 10"other" 12"unclear whether SE/E"

generate d_PastYearACTIVITY0=.
replace d_PastYearACTIVITY0=-8 if d_nxtst==-8
replace d_PastYearACTIVITY0=-7 if d_nxtst==-7
*Split the work spells into self-employment and employment, according to the information in the auxiliary variables:
replace d_PastYearACTIVITY0=1 if d_nxtst==1 & (d_nxtjbes==2 | d_jbsemp==2)
replace d_PastYearACTIVITY0=2 if d_nxtst==1 & (d_nxtjbes==1 | d_jbsemp==1)
*Then for the non-employment ones:
replace d_PastYearACTIVITY0=3 if d_nxtstelse==1
*4 for retired:
replace d_PastYearACTIVITY0=4 if d_nxtstelse==2
*5 for mat/pat leave:
replace d_PastYearACTIVITY0=5 if d_nxtstelse==3
*6 for family care:
replace d_PastYearACTIVITY0=6 if d_nxtstelse==4
*7 for ft study, school
replace d_PastYearACTIVITY0=7 if d_nxtstelse==5
*8 for lt sick, disabled
replace d_PastYearACTIVITY0=8 if d_nxtstelse==6
*9 for gov't training scheme:
replace d_PastYearACTIVITY0=9 if d_nxtstelse==7
*10 for other
replace d_PastYearACTIVITY0=10 if d_nxtstelse==8
*UPDATE, MARCH 30TH:
*12 FOR THE SE OR E ONES:
replace d_PastYearACTIVITY0=12 if d_nxtst==1 & W2_EXCL_flag_flag==1
*And then label it:
label values d_PastYearACTIVITY0 d_PastYearACTIVITY0lab
tab d_PastYearACTIVITY0

*Check the dodgy ones:
tab d_PastYearACTIVITY0 if d_jbsemp<0 & d_nxtst==1 & d_cjob==1, missing
*And that it's correct for the self-employed vs. employed:
tab d_PastYearACTIVITY0 d_nxtjbes if d_nxtst==1 & d_cjob==2
tab d_PastYearACTIVITY0 d_jbsemp if d_nxtst==1 & d_cjob==1
*Good

*Information on end of these spells is stored in TWO SETS OF VARIABLES for work and non-work activities separately.

*For employment spells:
tab d_nxtjbendd
tab d_nxtjbendm
tab d_nxtjbendy4

*For non-employment spells:
tab d_nxtstendd
tab d_nxtstendm
tab d_nxtstendy4

*Look at them together:
tab d_nxtjbendd d_nxtstendd if d_nxtst==1
tab d_nxtjbendd d_nxtstendd if d_nxtst==2

tab d_nxtjbendm d_nxtstendm if d_nxtst==1
tab d_nxtjbendm d_nxtstendm if d_nxtst==2

tab d_nxtjbendy4 d_nxtstendy4 if d_nxtst==1
tab d_nxtjbendy4 d_nxtstendy4 if d_nxtst==2

*Yep.

*TO KEEP SANE LATER, MAKE A NEW VARIABLE SET WITH THE RELEVANT INFO, REGARDLESS OF ACTIVITY TYPE, WITH NAMES CONSISTENT WITH EQUIVALENT VARIABLES FOR ACTIVITIES LATER IN THE SEQUENCE.
gen d_statendd0=.
replace d_statendd0=d_nxtjbendd if d_nxtst==1
replace d_statendd0=d_nxtstendd if d_nxtst==2
tab d_statendd0

gen d_statendm0=.
replace d_statendm0=d_nxtjbendm if d_nxtst==1
replace d_statendm0=d_nxtstendm if d_nxtst==2
tab d_statendm0

gen d_statendy40=.
*Use the full variable names rather than relying on Stata's variable abbreviation:
replace d_statendy40=d_nxtjbendy4 if d_nxtst==1
replace d_statendy40=d_nxtstendy4 if d_nxtst==2
tab d_statendy40

*NB: now that correct info is being subbed in, there is no longer the problem bunch which have 'inapplicable' for end dates, despite being 'ended' (i.e. d_cjob==2 | d_cstat==1).
tab d_statendy40 if d_cjob==2 | d_cstat==1

*Make ym format variable? No, THAT HAPPENS LATER IN A GENERAL LOOP.

*NB: While currentness for this activity was stored in the following pair of marker variables, one for work spells and one for non-work spells -
*NOTE THAT THE CODING IS REVERSED:
*For this first work spell, 1=CURRENT and 2=ENDED, but for the non-work spells, 1=ENDED and 2=STILL CURRENT:
*Currentness of first next work spell:
tab d_cjob
tab d_cjob, nol
*Currentness of first next non-work spell:
tab d_cstat
tab d_cstat, nol

*TO KEEP SANE LATER, RENAME THESE VARIABLES TO HAVE NAMES CONSISTENT WITH EQUIVALENT VARIABLES FOR ACTIVITIES LATER IN THE SEQUENCE.
*Those have the same reversed coding for work v. non-work spells, so no need to recode anything, just rename.
rename d_cjob d_currjob0
rename d_cstat d_currstat0
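
*A sketch, on made-up data, of one way to fold the two reversed codings into a single 0/1 marker.
*The toy_ names (including toy_iscurrent) are hypothetical and are not USoc variables;
*nothing later in this file depends on them.
preserve
clear
input toy_currjob toy_currstat
1 -8
2 -8
-8 1
-8 2
end
generate toy_iscurrent=.
*Work spells: 1=CURRENT, 2=ENDED
replace toy_iscurrent=1 if toy_currjob==1
replace toy_iscurrent=0 if toy_currjob==2
*Non-work spells: 1=ENDED, 2=CURRENT
replace toy_iscurrent=0 if toy_currstat==1
replace toy_iscurrent=1 if toy_currstat==2
list
assert toy_iscurrent==1 in 1
assert toy_iscurrent==0 in 2
assert toy_iscurrent==0 in 3
assert toy_iscurrent==1 in 4
restore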

**************************************************************************************************************************************

*Then we move onto PAST SPELLS AFTER THAT - i.e. ones AFTER THE ONE AFTER THE ONE WHICH WAS FED-FORWARD.

*USING THE ABOVE QUESTIONS MULTIPLE TIMES IN LOOPS, THEY CONSTRUCTED THE COMPLETE SEQUENCE AND CONTAINED IT IN THE FOLLOWING VARIABLES:
*TWO sets of MARKER variables for whether a certain spell has ended or is current, but with CODING REVERSED for work vs. non-work activities.
*For d_currjob`num', 1=CURRENT and 2=ENDED,
*For d_currstat`num' 1=ENDED and 2=CURRENT.

*For employment spells:
foreach num of numlist 1/6 {
tab d_currjob`num'
tab d_currjob`num', nol
}
*For non-employment spells:
foreach num of numlist 1/6 {
tab d_currstat`num'
tab d_currstat`num' , nol
}
*These mark which is the LAST IN THE SEQUENCE.

*Two more sets of variables contain what the TYPE of each activity was:
*This set contains whether each spell was paid employment:
foreach num of numlist 1/6 {
tab d_nextstat`num'
tab d_nextstat`num', nol
}
*While this one breaks down the cases which weren't into types:
foreach num of numlist 1/6 {
tab d_nextelse`num'
tab d_nextelse`num', nol
}
*Check the relation between them with the following:
foreach num of numlist 1/6 {
tab d_nextelse`num' if d_nextstat`num'==1
tab d_nextelse`num' if d_nextstat`num'==2
}
foreach num of numlist 1/6 {
tab d_nextelse`num' if d_nextstat`num'==-8
}

*yep - all the options for d_nextelse`num' which are >0 are just subdivisions for the people who said they WEREN'T in paid employment in the
*previous question.

*To see whether an employment spell was empl or self-empl, there was this:
foreach num of numlist 1/6 {
tab d_nextjob`num' d_nextstat`num'
}
*These all actually match up as they should (unlike the equivalent pair for the first spell after the fed-forward one),
*so could use this to code into 1 or 2, separating out the two.

*AGGREGATE this information into a single set of variables, and apply the SAME LABELLING AS ELSEWHERE IN THE DATASET.
***IMPORTANT: THERE DOES NOT APPEAR TO BE A SET OF VARIABLES DERIVED FROM d_nxtjbes`num', TO DISTINGUISH BETWEEN EMPLOYMENT AND SELF-EMPLOYMENT.
**Have submitted a query about this with the USoc team; if there really isn't then will have to merge those into one group for this part of the
*histories only. In the meantime, miss out the 2 group and redefine 1 as "EMPLOYED OR SELF-EMPLOYED":

label define PastYearACTIVITYlab -8"inapplicable" -7"proxy respondent" 1"self-employed" 2"employed" 3"unemployed" 4"retired" 5"maternity leave" ///
6"family care" 7"ft study, school" 8"lt sick, disabled" 9"govt training scheme" 10"other"
set more off
foreach num of numlist 1/6 {
generate d_PastYearACTIVITY`num'=.
*Replace in the inapplicable and proxy values from d_nextstat`num' (since anyone with a -8 or -7 for this also has one for d_nextelse`num' but
*not vice versa);
replace d_PastYearACTIVITY`num'=d_nextstat`num' if (d_nextstat`num'==-8 | d_nextstat`num'==-7)
*Split the people in paid employment into self- and employee- employment according to the extra variables:
replace d_PastYearACTIVITY`num'=1 if d_nextstat`num'==1 & d_nextjob`num'==3
replace d_PastYearACTIVITY`num'=2 if d_nextstat`num'==1 & d_nextjob`num'>-1 & d_nextjob`num'<3
*3 for the unemployed:
replace d_PastYearACTIVITY`num'=3 if d_nextelse`num'==1
*4 for retired:
replace d_PastYearACTIVITY`num'=4 if d_nextelse`num'==2
*5 for mat/pat leave:
replace d_PastYearACTIVITY`num'=5 if d_nextelse`num'==3
*6 for family care:
replace d_PastYearACTIVITY`num'=6 if d_nextelse`num'==4
*7 for ft study, school
replace d_PastYearACTIVITY`num'=7 if d_nextelse`num'==5
*8 for lt sick, disabled
replace d_PastYearACTIVITY`num'=8 if d_nextelse`num'==6
*9 for gov't training scheme:
replace d_PastYearACTIVITY`num'=9 if d_nextelse`num'==7
*10 for other
replace d_PastYearACTIVITY`num'=10 if d_nextelse`num'==8
*And then label it:
label values d_PastYearACTIVITY`num' PastYearACTIVITYlab
}
*Check it has worked:
foreach num of numlist 1/6 {
tab d_PastYearACTIVITY`num'
}
*Yep.

*UNLIKE for the first activity post-ff-act, end date info is contained in the following three variables, which appear to apply to employment and non-employment spells alike:
foreach num of numlist 0/6 {
tab d_statendd`num'
tab d_statendm`num'
tab d_statendy4`num'
}

*Now make some ym_ and mdy_ format variables for all these:
foreach num of numlist 0/6 {
generate d_ym_spellend`num'=ym(d_statendy4`num', d_statendm`num')
format d_ym_spellend`num' %tm
generate d_mdy_spellend`num'=mdy(d_statendm`num', d_statendd`num', d_statendy4`num')
format d_mdy_spellend`num' %td
}
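
*A sketch on made-up data: mdy() needs a valid day, so a spell with a known month and year but a
*negative day code gets a monthly date and no daily date. The fallback to the first of the month
*via dofm() is an assumption for illustration only and is NOT what the loop above does.
*All toy_ names are hypothetical.
preserve
clear
input toy_d toy_m toy_y
15 6 2011
-9 6 2011
end
generate toy_ym=ym(toy_y, toy_m)
generate toy_mdy=mdy(toy_m, toy_d, toy_y)
generate toy_mdy_fb=cond(missing(toy_mdy), dofm(toy_ym), toy_mdy)
format toy_ym %tm
format toy_mdy toy_mdy_fb %td
list
assert toy_mdy==. in 2
assert toy_mdy_fb==mdy(6,1,2011) in 2
assert toy_mdy_fb==mdy(6,15,2011) in 1
restore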

********************************************************************************************************************************************


***HOWEVER, THERE ARE THE FOLLOWING DISCREPANCIES REMAINING:


*1): SUPPOSEDLY CURRENT (OR ENDED) EMPLOYMENT SPELLS WITH 'INAPPLICABLE' SPELL TYPES.

*Current, though inapplicable type:
tab d_jbsamr d_samejob if d_nxtst<0 & d_currjob0==1
*Ended, though inapplicable type:
tab d_jbsamr d_samejob if d_nxtst<0 & d_currjob0==2

*Later in the sequence?
set more off
foreach num of numlist 0/6 {
tab d_PastYearACTIVITY`num' if (d_currjob`num'==1 | d_currstat`num'==2)
}
*These are actually all restricted to d_PastYearACTIVITY0, the first spell following the fed-forward spell.

*So for 1950 people (589 in BHPS sample), their employment status was coded to 'inapplicable' although they all said this was their current job.
*A further 135 (62 in BHPS sample) have inapplicable though this was a PAST spell of employment, since d_currjob0==2 rather than 1.
tab d_nxtst d_currjob0
*NB: no analogous issue for non-employment spells:
tab d_nxtst d_currstat0

*****************************************************************************************************************************************************************

***UPDATE, MARCH 29TH: THESE ARE PEOPLE DOING EITHER DIFF JOB FOR SAME EMPLOYER OR SAME JOB FOR DIFF EMPLOYER (explains why this discrepancy is only for employment spells)

***THE NEXT ACTIVITY IS THE CONTINUATION AFTER SLIGHT CHANGE.
***SOLUTION: SUB FORWARD THE ACTIVITY TYPE (EMPLOYMENT OR SELF-EMPLOYMENT) TO THIS NEXT ACTIVITY SPELL.
*HOWEVER: DOESN'T LOOK LIKE THERE IS ANY INFORMATION TAKEN ON WHEN THIS CHANGE OCCURRED.
*HENCE, TWO CHOICES: CAN TREAT AS TWO SPELLS AND ASSUME A CHANGEOVER DATE USING MIDPOINT LOGIC, OR CAN COUNT AS AN EXTENSION OF THE SAME (FED-FORWARD) ACTIVITY.
*THE LATTER WILL INVOLVE SHIFTING ANY SUBSEQUENT SPELLS BACK BY ONE SLOT, RECODING FF-ACT CURRENTNESS FLAG TO CURRENT WHERE APPROPRIATE, AND UPDATING END DATES OF THE FF-ACT.

*Flag them:
gen d_intraemplch_ff0=0
replace d_intraemplch_ff0=1 if d_nxtst<0 & d_currjob0==1
replace d_intraemplch_ff0=2 if d_nxtst<0 & d_currjob0==2
*Label follows the coding above: 1 where d_currjob0==1 (current), 2 where d_currjob0==2 (ended).
label variable d_intraemplch_ff0 "intra-empl change ff_act/-0 act. Treat as 1 spell. 1:ext current,2:ext ended"
tab d_intraemplch_ff0

******************************************************************************

*Treat them as an extension of the ff-act.

*It turns out that for all of these people, the ff-act is still coded to current
tab d_empchk if d_intraemplch_ff0>0

*So I guess what we need to do is just IGNORE the -0 act?
*Are there activities later?
foreach num of numlist 0/6 {
tab d_PastYearACTIVITY`num' if d_intraemplch_ff0>0
}
*AAARGH. A FEW - all restricted to where the -0 act has ended
foreach num of numlist 0/6 {
tab d_PastYearACTIVITY`num' if d_intraemplch_ff0==1
tab d_PastYearACTIVITY`num' if d_intraemplch_ff0==2
}

*For the first group, ERASE ALL TRACES OF THE -0 ACTIVITY.
*START BY CODING THE CURRENTNESS FLAG TO INAPPLICABLE.
replace d_currjob0=-8 if d_intraemplch_ff0==1

*For the 26, also do this, but later activities will need to be shifted back.
replace d_currjob0=-8 if d_intraemplch_ff0==2

*SHIFT BACK ALL THE INFO:
foreach varstem in d_PastYearACTIVITY d_currjob d_currstat d_statendm d_statendd d_statendy4 d_ym_spellend d_mdy_spellend {
foreach num of numlist 0/5 {
local Nplus1=`num'+1
replace `varstem'`num'=`varstem'`Nplus1' if d_intraemplch_ff0==2
}
}
*These 73 carry through to the next, larger problem group - people with additional activities reported despite a fed-forward act which is 'current'.
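
*A sketch of the shift-back logic on made-up data (toy_shift and toy_act0-toy_act2 are hypothetical names).
*Looping in ASCENDING order matters: each slot is overwritten only once its old value is no longer needed.
*Note that the last slot keeps its old value unless it is blanked explicitly; the blanking line here
*is an extra step for illustration and is not part of the loop above.
preserve
clear
input toy_shift toy_act0 toy_act1 toy_act2
1 -8 3 2
0 2 4 -8
end
foreach num of numlist 0/1 {
local Nplus1=`num'+1
replace toy_act`num'=toy_act`Nplus1' if toy_shift==1
}
replace toy_act2=-8 if toy_shift==1
list
assert toy_act0==3 & toy_act1==2 & toy_act2==-8 in 1
assert toy_act0==2 & toy_act1==4 & toy_act2==-8 in 2
restore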

********************************************************************************************************************************************

**3): THE LARGER GROUP OF PEOPLE WHO REALLY DO HAVE INFORMATION FOR ACTIVITIES LATER IN THE SEQUENCE, DESPITE HAVING A FED-FORWARD ACTIVITY WHICH WAS 'CURRENT'.
*There should be no values other than inapplicable for the first activity after the fed-forward one, but there are:
count if (d_nxtst>0 | d_nxtstelse>0) & (d_empchk==1 | d_notempchk==1)
tab d_2Loops_flag if (d_nxtst>0 | d_nxtstelse>0) & (d_empchk==1 | d_notempchk==1)

*They all also had end dates for the fed-forward activity:
tab d_empstendy4 if (d_nxtst>0 | d_nxtstelse>0) & (d_notempchk==1 | d_empchk==1), miss

*Another way to see this using the composite activity variable:
tab d_ff_jbstat d_PastYearACTIVITY0 if (d_notempchk==1 | d_empchk==1)

*Generalise this to later activities in the sequence:
foreach num of numlist 0/6 {
count if d_PastYearACTIVITY`num'>0 & (d_notempchk==1 | d_empchk==1)
tab d_PastYearACTIVITY`num' d_2Loops_flag if d_PastYearACTIVITY`num'>0 & (d_notempchk==1 | d_empchk==1)
}

*OK, so this also affects later in the sequence.
*This group is a composite of 2 x problem groups:
***THE 73 ALREADY ENCOUNTERED IN DISCREPANCY SECTION (2) ABOVE, and also LATER SPELLS REPORTED IN THE SEQUENCE FOR THE 2Loops PEOPLE, WHERE START POINT FOR OTHER LOOP STILL CURRENT.

foreach num of numlist 0/6 {
count if d_PastYearACTIVITY`num'>0 & (d_notempchk==1 | d_empchk==1)
list d_PastYearACTIVITY`num' d_empchk d_notempchk if d_2Loops_flag==1 & d_PastYearACTIVITY`num'>0 & (d_notempchk==1 | d_empchk==1)
}
*OK, so what happened here for the 2Loops people is that THESE NEXT ACTIVITIES ARE FROM LATER IN THE LOOP THAT STARTED WITH d_empchk==2, meaning ended.
*HOWEVER, since decision made for our purposes to let the OTHER LOOP TAKE PRECEDENCE, these need to be ignored.

*MAKE A FLAG:
gen d_IGNORE_laterspells=0
foreach num of numlist 0/6 {
replace d_IGNORE_laterspells=1 if d_PastYearACTIVITY`num'>0 & (d_notempchk==1 | d_empchk==1)
}
tab d_IGNORE_laterspells


***************************************************************************************************************************

*0): People who have an 'inapplicable' for the end date of the first later activity (0), even though
*a) it is ended and b) there is a -1 activity.
count if d_statendy40<0 & (d_currjob0==2 | d_currstat0==1) & (d_PastYearACTIVITY1>0 & d_PastYearACTIVITY1!=.)

*6 PEOPLE - WORRY ABOUT THIS LATER.

***************************************************************************************************************************

*5).1) FED-FORWARD ACTIVITIES WHICH NEED TO BE FRONT-TRUNCATED? NOT RELEVANT WHEN BIOMARKERS AS BASELINE. (Refer to files for BMI prep for the code).

********************************************************************************************************************************************

**THE OTHER, POTENTIALLY MORE PROBLEMATIC, UNRESOLVED THING.

*Start with those who said their fed-forward activity was still current:
tab d_jbstat d_ff_jbstat if d_jbstat!=d_ff_jbstat & (d_empchk==1 | d_notempchk==1) & d_2Loops_flag!=1
*677 MISMATCHES.
*What about people whose 'current' activity was later in the sequence?
foreach num of numlist 0/6 {
tab d_jbstat d_PastYearACTIVITY`num' if d_jbstat!=d_PastYearACTIVITY`num' & (d_currjob`num'==1 | d_currstat`num'==2) & d_IGNORE_laterspells!=1 & d_2Loops_flag!=1
}

*This is important because the STANDALONE ONE is the one taken forward as the fed-forward act at the next wave:
*tab b_jbstat d_ff_jbstat

*So there are LOADS WHICH NEED TO BE SEQUENCED, using month of W2 interview as the changeover date
*(or the MIDPOINT since activity start?) if a more useful variable for start of current activity doesn't show up.

*What else do we have though?
*For CURRENT EMPLOYMENT SPELLS ONLY, there is a set of variables for when it started: d_jbbgy, d_jbbgm, d_jbbgd
*However, this was ONLY ASKED OF NEW ENTRANTS NEVER INTERVIEWED, hence N/A for almost everyone. can't use this.

**NOV 21ST: I THINK THE ONLY THING TO DO HERE IS ACCEPT THAT IT DOESN'T MATCH UP AND JUST USE ACCOUNT FROM THE FED-FORWARD/PAST SPELLS MODULE.

*Having decided this, MAKE A SINGLE VARIABLE FOR THAT.
generate d_CURRACTIV_emphist=.
replace d_CURRACTIV_emphist=d_ff_jbstat if (d_empchk==1 | d_notempchk==1)
*Since in ff_jbstat, 10=unpaid work for family business and 97=other, merge these two:
recode d_CURRACTIV_emphist 97=10

foreach num of numlist 0/6 {
replace d_CURRACTIV_emphist=d_PastYearACTIVITY`num' if (d_currjob`num'==1 | d_currstat`num'==2)
}
*Label it:
label values d_CURRACTIV_emphist d_PastYearACTIVITY0lab
label variable d_CURRACTIV_emphist "USoc W2 current activity, according to W2 emp history module"
tab d_CURRACTIV_emphist, missing


tab d_CURRACTIV_emphist d_jbstat if d_CURRACTIV_emphist!=d_jbstat
*825
*How many is this out of?
count if d_ff_ivlolw==1 & d_ivfio==1
*so about 9% of the eligible sample at this stage.

*OK. The problem is that since the STANDALONE jbstat variable is the one fed forward to start the `w'_CURRACTIV_emphist loop at the next wave,
*we CAN'T IGNORE IT.
*MAKE A FLAG:
generate d_CONFLICT_curractiv=0
recode d_CONFLICT_curractiv 0=1 if d_CURRACTIV_emphist!=. & d_CURRACTIV_emphist!=d_jbstat

tab d_CONFLICT_curractiv
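
*A sketch on made-up data of why the "!=." guard in the flag above matters: in Stata, missing (.)
*compares as unequal to any number, so without the guard people with no history-module value
*would be flagged as conflicts too. All toy_ names are hypothetical.
preserve
clear
input toy_hist toy_jbstat
2 2
3 2
. 2
end
generate toy_conflict=0
recode toy_conflict 0=1 if toy_hist!=. & toy_hist!=toy_jbstat
generate toy_noguard=(toy_hist!=toy_jbstat)
list
assert toy_conflict==0 in 1
assert toy_conflict==1 in 2
assert toy_conflict==0 & toy_noguard==1 in 3
restore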

*For BMI analysis, see if you can get away with EXCLUDING these people

********************************************************************************************************************************************

*GENERATE START DATES FOR ALL THE ACTIVITIES REPORTED AT THIS SWEEP.

*Do this by first making a variable for HOW MANY PAST SPELLS at this sweep, INCLUDING THE CURRENT ONE.
*(Because here the -0 activity is a PAST SPELL TOO, use Nplus1.)
generate d_No_Pastspells=.
*Code to 0 for people whose FED-FORWARD ACTIVITY is still the current one, i.e. who reported NO NEW SPELLS AT ALL:
replace d_No_Pastspells=0 if d_empchk==1 | d_notempchk==1
*Then Nplus1 for the rest:
foreach num of numlist 0/6 {
local Nplus1=`num'+1
replace d_No_Pastspells=`Nplus1' if d_PastYearACTIVITY`num'!=. & d_PastYearACTIVITY`num'>0
}
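
*A sketch of the counting logic on made-up data (toy_ names are hypothetical): the loop leaves the
*counter at one more than the HIGHEST slot holding a valid (>0) activity, and at missing when
*no slot holds one.
preserve
clear
input toy_act0 toy_act1 toy_act2
2 3 -8
-8 -8 -8
1 . .
end
generate toy_n=.
foreach num of numlist 0/2 {
local Nplus1=`num'+1
replace toy_n=`Nplus1' if toy_act`num'!=. & toy_act`num'>0
}
list
assert toy_n==2 in 1
assert toy_n==. in 2
assert toy_n==1 in 3
restore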
tab d_No_Pastspells if d_PRESENT==1, missing
*Who are the people without any value at all?

*The people with a . fall into two groups:
*People not present at the last wave, hence no fed-forward status:
*tab b_PRESENT if d_No_Pastspells==., missing
*And people who were there at a but don't have valid information for the employment history module, for whatever reason:
*tab d_empchk d_notempchk if d_No_Pastspells==. & b_PRESENT==1, missing

*A bunch are explained by missing/proxy info from a_ such that they never started the Pastspells loop at b:
tab d_ff_jbstat if d_No_Pastspells==. & d_PRESENT==1, missing

*For the others:
list d_ff_jbstat d_empchk d_notempchk d_nxtst d_nxtstelse d_PastYearACTIVITY0 d_nextstat1 d_nextelse1 W2_EXCL_flag_flag if d_No_Pastspells==. & d_PRESENT==1 & d_ff_jbstat>0
*These are people with a terminated loop because they 'didn't know' or missing for an activity type, had inapplicable for both d_empchk and d_notempchk
*(No longer includes people for whom it is unclear whether a period of empl was E or SE, since assigned their own code now)

*Cannot treat these as defined periods of mystery activity and move on, because NO USABLE INFO LATER IN SEQUENCE.

*CANNOT USE THEM - FLAG THEM:
gen Truncated_emploop_flag=0
replace Truncated_emploop_flag=1 if d_No_Pastspells==. & d_ff_jbstat>0 & d_nxtst==-8 & d_nxtstelse==-8
replace Truncated_emploop_flag=1 if d_nxtst==-1 | d_nxtst==-9 | d_nxtstelse==-1 | d_nxtstelse==-9
replace Truncated_emploop_flag=1 if (d_empchk==2 | d_notempchk==2 | d_empchk==-1 | d_notempchk==-1 | d_empchk==-9 | d_notempchk==-9) & d_nxtst==-8 & d_nxtstelse==-8
replace Truncated_emploop_flag=1 if (d_empchk==2 | d_notempchk==2) & (d_nxtst==-1 | d_nxtstelse==-1 | d_nxtst==-2 | d_nxtstelse==-2 | d_nxtst==-9 | d_nxtstelse==-9)

tab Truncated_emploop_flag
*13

*Having done that, can now GENERATE START DATE VARIABLES for each activity, equal to the end date of the ones before.
*First the -0 activity, which takes the end date of the fed-forward act:
generate d_statbgd0=d_empstendd if d_No_Pastspells>=0 & d_No_Pastspells!=.
generate d_statbgm0=d_empstendm if d_No_Pastspells>=0 & d_No_Pastspells!=.
generate d_statbgy0=d_empstendy4 if d_No_Pastspells>=0 & d_No_Pastspells!=.
*Then the others:
foreach num of numlist 1/6 {
local Nminus1=`num'-1
generate d_statbgd`num'=d_statendd`Nminus1' if d_No_Pastspells>=`num' & d_No_Pastspells!=.
generate d_statbgm`num'=d_statendm`Nminus1' if d_No_Pastspells>=`num' & d_No_Pastspells!=.
generate d_statbgy`num'=d_statendy4`Nminus1' if d_No_Pastspells>=`num' & d_No_Pastspells!=.
}

*Now make some ym_ and mdy_ format variables for all of these:
foreach num of numlist 0/6 {
generate d_ym_spellstart`num'=ym(d_statbgy`num', d_statbgm`num')
format d_ym_spellstart`num' %tm
generate d_mdy_spellstart`num'=dofm(d_ym_spellstart`num')
format d_mdy_spellstart`num' %td
}
*Do some checks.
set more off
*Most of them add up as they should:
*For the -0 activity:
*Do any look wrong?
list d_No_Pastspells d_empchk d_notempchk d_empstendd d_empstendm d_empstendy4 d_ym_ffact_enddate d_statbgd0 d_statbgm0 d_statbgy0 d_PastYearACTIVITY0 if d_ym_spellstart0!=d_ym_ffact_enddate & d_No_Pastspells>0 & d_No_Pastspells!=.
*Nope!

*And the later ones:
foreach num of numlist 1/6 {
local Nminus1=`num'-1
count if d_ym_spellstart`num'!=d_ym_spellend`Nminus1'
*Check ones which are wrong:
list d_statendd`Nminus1' d_statendm`Nminus1' d_statendy4`Nminus1' d_ym_spellend`Nminus1' d_statbgd`num' d_statbgm`num' d_statbgy`num' d_ym_spellstart`num' if d_ym_spellstart`num'!=d_ym_spellend`Nminus1' & d_ym_spellstart`num'!=.
*WAHEY!
}