Support #2119
Svy Commands and Fixed Effects Regressions
Description
Hi there,
I'm using UKHLS panel data for my MSc Behavioural Science dissertation, in which I explore the impact of perceived neighbourhood social cohesion (NSC_index) on life satisfaction (lfsato). However, I can't work out how to run a fixed effects regression that accounts for the complex survey design of UKHLS.
UKHLS recommends using the svy suite of commands so I have set up my do-file as follows:
// DECLARE COMPLEX SURVEY DESIGN
use UKHLS_long_acfil_cleaned_usable.dta
// Set correct weights
svyset, clear
svyset l_psu [pweight = l_indscus_lw], strata(l_strata) singleunit(scaled)
My first question is: have I done this correctly? Should l_psu be pidp instead, given that the individual is the smallest unit I am looking at? And is singleunit(scaled) correct?
Then, I declare the panel data set up:
// DECLARE PANEL DATA SET UP
// Use the xtset command to tell Stata that the data have a panel structure, with pidp as the unique identifier and wave as the time variable
sort pidp wave
xtset pidp wave
I am now trying to run fixed effects regressions to work out whether a change in perceived neighbourhood social cohesion leads to a change in life satisfaction. However, the command I would normally use for fixed effects regressions (xtreg) is not compatible with the svy prefix. Does anyone know of a command that could do this?
I have since come up with the following options:
//OPTIONS TO ACCOUNT FOR COMPLEX DESIGN/WEIGHTS
- svy: reg lfsato NSC_index_nm i.wave (this leads to really high estimates as it doesn't account for individual fixed effects)
- svy: reg lfsato NSC_index_nm i.wave, absorb(pidp)
- xtreg lfsato NSC_index_nm i.wave [pweight=l_indscus_lw], fe vce(cluster pidp)
- areg lfsato NSC_index_nm i.wave [pweight=l_indscus_lw], absorb(pidp) cluster(pidp)
- reghdfe lfsato NSC_index_nm i.wave [pweight=l_indscus_lw], absorb(pidp) vce(cluster pidp strata) // ChatGPT told me to add strata, and that this command should then mimic the svyset command?
In summary, my key questions are:
1. Have I declared the complex survey design correctly? Should l_psu be pidp instead? And is singleunit(scaled) correct?
2. What syntax should I use to run a fixed effects regression that accounts for the complex survey design of UKHLS?
Thank you in advance for any advice you can provide.
Emma