Hello,
After appending the two indresp files I have 8,905 observations with gen1==1. To get there I combined b_pacob with its feed-forward variable b_ff_pacob (and likewise for b_macob).
However, since I want to work with both waves, I need to keep only those individuals who responded in both waves. Doing this deletes 28,815 observations.
Moreover, as I understand it, to estimate with the longitudinal weight I have to keep only the wave 2 observations. This deletes another 38,388 observations.
This is why I end up with so few observations.
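As a sanity check on these counts, the arithmetic can be reproduced with Stata's display command. This is only a sketch; the figures are copied from the tab and keep output in the log.

```stata
* 105,591 appended rows, minus rows of single-wave respondents, minus wave 1 rows
display 105591 - 28815 - 38388
* rows belonging to two-wave respondents, halved to keep wave 2 only
display (105591 - 28815) / 2
```

Both lines return 38388, which matches the total in the final tab gen1, so the row counts themselves are consistent. The drop in gen1 from 8,905 to 158 is a separate matter from the number of rows kept.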
Below is my Stata output in more detail.
. use pidp b_psu b_strata b_indscus_lw b_macob b_pacob b_ff_pacob b_ff_macob b_ukborn b_sclfsato ///
using "$dirdata\b_indresp", clear
. gen wave=2
. rename b_pacob b_pacob_new
. gen b_pacob=.
(54597 missing values generated)
. replace b_pacob=b_pacob_new if b_ff_pacob<0
(49675 real changes made)
. replace b_pacob=b_ff_pacob if b_pacob_new<0
(47410 real changes made)
. rename b_macob b_macob_new
. gen b_macob=.
(54597 missing values generated)
. replace b_macob=b_pacob_new if b_ff_macob<0
(49660 real changes made)
. replace b_macob=b_ff_macob if b_macob_new<0
(47397 real changes made)
.
. drop b_ff_pacob b_pacob_new b_ff_macob b_macob_new
.
. renpfix b_
. compress
wave was float now byte
pacob was float now byte
macob was float now byte
. save "$dirresults\bind_junk", replace
file C:\Users\gmontr\Desktop\Dropbox\Research\bind_junk.dta saved
.
.
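The feed-forward step above could also be written as a loop, which avoids repeating the variable names by hand. Note that the macob step in the log assigns b_pacob_new, which looks like a slip for b_macob_new. The sketch below assumes that negative codes mean "no valid answer" and that the wave 2 answer should take priority over the fed-forward one; pacob_comb and macob_comb are hypothetical names, not part of the data.

```stata
* Sketch only: run on the raw b_indresp variables, before any rename.
* pacob_comb and macob_comb are hypothetical names, not from the log.
foreach v in pacob macob {
    * take the wave 2 answer when it is a valid (non-negative) code
    gen `v'_comb = b_`v' if b_`v' >= 0 & !missing(b_`v')
    * otherwise fall back on the fed-forward value, if that one is valid
    replace `v'_comb = b_ff_`v' if missing(`v'_comb) & b_ff_`v' >= 0 & !missing(b_ff_`v')
}
```

Unlike the commands in the log, this leaves the combined variable missing only when neither source holds a valid code, and it never copies a negative code across.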
. use pidp a_hidp a_psu a_strata a_macob a_pacob a_ukborn a_sclfsato ///
using "$dirdata\a_indresp", clear
.
. gen wave=1
.
. renpfix a_
. append using "$dirresults\bind_junk"
. compress
wave was float now byte
. save "$dirresults\abind_long", replace
file C:\Users\gmontr\Desktop\Dropbox\Research\abind_long.dta saved
. tsset pidp wave
panel variable: pidp (unbalanced)
time variable: wave, 1 to 2
delta: 1 unit
.
. mvdecode _all, mv(-9/-1)
ukborn: 46836 missing values generated
pacob: 50004 missing values generated
macob: 49756 missing values generated
sclfsato: 22609 missing values generated
.
. summ macob pacob ukborn sclfsato
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       macob |     55835    11.35709    25.02096          1         97
       pacob |     55587    11.62329    25.18556          1         97
      ukborn |     58755    2.051247    1.609738          1          5
    sclfsato |     82982    5.231713    1.475936          1          7
.
. *Immigrant groups
. * 1st gen
. gen gen1=0
. replace gen1=1 if macob>4 & pacob>4 & ukborn==5 & ukborn!=. & macob!=. & pacob!=.
(8905 real changes made)
.
. tab gen1
       gen1 |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |     96,686       91.57       91.57
          1 |      8,905        8.43      100.00
------------+-----------------------------------
      Total |    105,591      100.00
.
. *Keep only individuals from wave 2, who responded in both waves
. bysort pidp: gen q = _N
. *browse pidp wave q
. keep if q==2
(28815 observations deleted)
. keep if wave==2
(38388 observations deleted)
.
. tab gen1
       gen1 |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |     38,230       99.59       99.59
          1 |        158        0.41      100.00
------------+-----------------------------------
      Total |     38,388      100.00
. *Statistics representative of UK population
. svyset, clear
. svyset psu [pweight=indscus_lw], strata(strata)
      pweight: indscus_lw
          VCE: linearized
  Single unit: missing
     Strata 1: strata
         SU 1: psu
        FPC 1: <zero>
. svy, subpop (if gen1==1):mean sclfsato
(running mean on estimation sample)
all observations in subpop() subpopulation have zero weights
r(461);
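
The r(461) message says that every case in the gen1 subpopulation has a weight of zero. A minimal diagnostic sketch follows, assuming the data are as they stand just before the svy command. It counts the gen1 cases with a positive longitudinal weight and checks whether ukborn is observed on the wave 2 rows. Whether ukborn is asked only at a respondent's first interview is an assumption here, not something confirmed in this post; gen1_ever is a hypothetical name.

```stata
* Sketch only: run just before the svy command.
* How many gen1 cases are there, and how many carry a positive weight?
count if gen1==1
count if gen1==1 & indscus_lw>0 & !missing(indscus_lw)
* Distribution of the longitudinal weight within the subpopulation
summarize indscus_lw if gen1==1, detail
* Is ukborn observed on the wave 2 rows at all?
tab ukborn, missing
* If ukborn turns out to be mostly missing at wave 2, a person-level flag
* could be built before the two keep commands instead, for example:
* bysort pidp: egen gen1_ever = max(gen1)
```

If the second count is zero, the error comes from the weights of the 158 remaining cases rather than from the svyset call. In that case, defining gen1 per person across both waves before dropping the wave 1 rows may recover the cases that were classified from their wave 1 answers.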