Merging UKHLS waves and optimal weight

Hello everyone,

I am writing my master's thesis on the integration/assimilation of immigrants in the UK, focusing on two areas: economic integration and cultural assimilation. For this, I plan to use UKHLS waves 1-10 (or 2-10, in order to use the sample that participated in the BHPS & IEMB), specifically the indresp.dta file from each wave. I have some questions, which I set out here:

1- What is the best way to combine all waves into one data set? Option 1: merge all individual-level files, plus xwavedat, on pidp; Option 2: simply append all individual-level files and then add xwavedat?
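
To make the two options concrete, this is roughly what I have in mind. It is only a sketch: I am assuming the files sit in the working directory under the usual w_indresp.dta names, and the "wave" counter is a variable I create myself, not something in the data.

```stata
* Sketch only: file locations and the "wave" variable are assumptions.

* Option 1 (wide): keep the wave prefixes, one row per person.
use "b_indresp.dta", clear
merge 1:1 pidp using "c_indresp.dta", nogenerate
* ... repeat the merge for waves d to j ...
merge 1:1 pidp using "xwavedat.dta", keep(master match) nogenerate

* Option 2 (long): strip the wave prefix, stack the waves, then add xwavedat.
tempfile long
local n = 2
foreach w in b c d e f g h i j {
    use "`w'_indresp.dta", clear
    rename `w'_* *                      // b_sex -> sex, c_sex -> sex, ...
    generate wave = `n'                 // hypothetical wave indicator
    if `n' > 2 append using `long'      // stack onto the waves read so far
    save `long', replace
    local ++n
}
merge m:1 pidp using "xwavedat.dta", keep(master match) nogenerate
```

Option 1 gives one row per person with prefixed variables, while Option 2 gives one row per person per wave, which seems closer to what panel commands expect.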

2- In terms of the optimal weight: given that I want to do a longitudinal analysis of waves 2-10 and I intend to use the 5min survey, I am doing the following: svyset i_psu[pw=i_ind5mub_lw], strata(i_strata). Is this correct? If I then decide to use waves 1-10, I should do svyset i_psu[pw=i_ind5mus_lw], strata(i_strata), right?

3- Now a more theoretical question: ideally I would like to compare some variables (from the 5min survey) for natives and immigrants. How can I be sure a question was asked of both groups? When tabulating some variables (e.g. bysort a_ukborn: tab a_resjobdeny3), it seems this question was only asked of the non-UK born. Is this right? Additionally, some variables appear to have been asked only of the IEMB, but when tabulating such a variable (e.g. tab f_mabroad) I get 89.79% of observations with the value 'Only available for IEMB'. How can I analyse those observations? Do I need to access the data in a different way? Is it in a different file?

4- How should I interpret proxy variables? Can I use the values they contain?

Thanks in advance and I apologise for such 'newbie' questions,

