Understanding Society User Support: Issues https://iserredex.essex.ac.uk/support/ 2024-03-26T16:09:20Z
Support #2077 (Feedback): Using income variables https://iserredex.essex.ac.uk/support/issues/2077 2024-03-26T16:09:20Z Mhairi Webster
<p>Hello,</p>
<p>I would like to use the derived income variables (w_fimnnet_dv). Could you let me know which access level on the UK Data Service they are released under? They don't appear to be available in the dataset I am using (Understanding Society: Waves 1-8, 2009-2017 and Harmonised BHPS: Waves 1-18, 1991-2009. 11th Edition. UK Data Service. SN: 6614, <a class="external" href="http://doi.org/10.5255/UKDA-SN6614-12">http://doi.org/10.5255/UKDA-SN6614-12</a>).</p>
<p>Many thanks, <br />Mhairi Webster</p>
Support #2076 (Feedback): Issues with xx_hadcvvac variables in COVID-19 data collection https://iserredex.essex.ac.uk/support/issues/2076 2024-03-13T21:01:15Z Laura L
<p>Good evening,</p>
<p>I am currently analysing data from the <em>xx_indresp_w</em> datasets of the COVID-19 data collection, specifically from wave 9 (ci), wave 8 (ch) and wave 7 (cg). According to the documentation, the <em>xx_hadcvvac</em> questions (about having received the COVID-19 vaccine, asked in each survey wave) should be asked only of respondents who have not already answered that they received 1 or 2 doses of vaccine in previous months (answer codes 1 and 2). However, by cross-tabulating the answers to the <em>xx_hadcvvac</em> questions for waves 7 and 9 among wave 9 respondents (left-joining the datasets by respondent ID <em>pidp</em>, i.e. matching all respondents in wave 9 with those who were also in wave 7):</p>
<pre><code>table(ci_hadcvvac = wave_9$ci_hadcvvac, cg_hadcvvac = wave_9$cg_hadcvvac)</code></pre>
<p>with <em>wave_9</em> the left-joined dataset, I obtain the following table:</p>
<pre><code>           cg_hadcvvac
ci_hadcvvac  -9  -8  -2    1    2    3    4
         -8   0  10   0  133    9  492 4835
         -2   2   0   2    0    0    0    4
          1   0   0   0    4    1    1  133
          2   0   3   1 <strong>1663  116</strong>   36 2538
          3   0   0   0    0    0    1    5
          4   0   0   0    2    0    3  322</code></pre>
<p>As you can see from the numbers in bold (taken as examples), some respondents who reported being vaccinated in wave 7 appear to have been asked the question again in wave 9. Am I missing some information?</p>
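<p>For readers who want to reproduce the check outside R, the left join and cross-tabulation described above can be sketched in plain Python. This is only an illustration under stated assumptions: the records are made up, the function name <em>left_join_crosstab</em> is hypothetical, and wave 9 respondents with no wave 7 record are counted under <em>None</em> rather than dropped.</p>

```python
from collections import Counter


def left_join_crosstab(wave9_rows, wave7_rows):
    """Left-join wave 7 onto wave 9 by pidp, then count answer pairs."""
    # Index the wave 7 answers by respondent ID.
    wave7_by_pidp = {row["pidp"]: row["cg_hadcvvac"] for row in wave7_rows}
    counts = Counter()
    for row in wave9_rows:
        # None marks wave 9 respondents who have no wave 7 record.
        cg = wave7_by_pidp.get(row["pidp"])
        counts[(row["ci_hadcvvac"], cg)] += 1
    return counts


# Made-up records, purely for illustration (not real survey data).
wave9 = [
    {"pidp": 1, "ci_hadcvvac": 2},
    {"pidp": 2, "ci_hadcvvac": 2},
    {"pidp": 3, "ci_hadcvvac": -8},
    {"pidp": 4, "ci_hadcvvac": 4},
]
wave7 = [
    {"pidp": 1, "cg_hadcvvac": 1},
    {"pidp": 2, "cg_hadcvvac": 1},
    {"pidp": 3, "cg_hadcvvac": 4},
]

table = left_join_crosstab(wave9, wave7)
for (ci, cg), n in sorted(table.items(), key=lambda kv: str(kv[0])):
    print(f"ci_hadcvvac={ci} cg_hadcvvac={cg} n={n}")
```

<p>Keeping the unmatched respondents visible makes it easier to see how many wave 9 cases have no wave 7 answer at all.</p>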
<p>Thank you very much in advance for the support.</p>
<p>Best regards, <br />Laura</p>
Support #2075 (Feedback): Using UKHLS to look at trends across calendar months https://iserredex.essex.ac.uk/support/issues/2075 2024-03-13T15:36:15Z James Laurence
<p>Hi there,</p>
<p>I am interested in looking at calendar month trends in whether someone wants to move home or not (which is available in every wave): lkmove. Ideally, I would like to look at trends using all waves (1-13). However, if it is easier to look at trends from some other start point, e.g. 2016 or 2017, then I am flexible. I am also flexible as to whether the BHPS sample is included or not. This will be a cross-sectional analysis, so I hope to treat each calendar month as a cross-section (I won’t be doing any longitudinal analysis).</p>
<p>I have been reading the helpful notes on ‘Running analysis on a calendar year or month’ (<a class="external" href="https://www.understandingsociety.ac.uk/documentation/mainstage/user-guides/main-survey-user-guide/how-to-use-weights-analysis-guidance-for-weights-psu-strata/">https://www.understandingsociety.ac.uk/documentation/mainstage/user-guides/main-survey-user-guide/how-to-use-weights-analysis-guidance-for-weights-psu-strata/</a>). However, I have some questions and would like to check that what I have done so far looks right.</p>
<p>I have been using the w_month and wave variables to generate a new date variable of year-month. To capture calendar year, I have used the wave and w_month variables in the following manner:</p>
<pre><code>gen year = 2009 if wave==1 &amp; (month&gt;0 &amp; month&lt;13)
replace year = 2010 if wave==1 &amp; (month&gt;12 &amp; month&lt;25)
replace year = 2010 if wave==2 &amp; (month&gt;0 &amp; month&lt;13)
replace year = 2011 if wave==2 &amp; (month&gt;12 &amp; month&lt;25)
replace year = 2011 if wave==3 &amp; (month&gt;0 &amp; month&lt;13)
…
replace year = 2021 if wave==13 &amp; (month&gt;0 &amp; month&lt;13)
replace year = 2022 if wave==13 &amp; (month&gt;12 &amp; month&lt;25)</code></pre>
<p>To measure calendar month, I have recoded the w_month variable, combining the two monthly measures into one. The w_month variable tells us whether someone was sampled in January of the year 1 sample or January of the year 2 sample. I’ve now combined these into a single category of whether someone was sampled in January. For example, ‘jan yr1’ and ‘jan yr2’ are now just ‘jan’; ‘feb yr1’ and ‘feb yr2’ are now just ‘feb’, etc.</p>
<p>With these new calendar year and calendar month variables, I have now created a new measure of calendar year-month, which looks like this (I hope this is correct so far):</p>
<pre><code>2009 Jan = 1<br />2009 Feb = 2<br />2009 Mar = 3<br />2009 Apr = 4<br />2009 May = 5<br />2009 June = 6<br />2009 July = 7<br />…<br />2022 June = 162<br />2022 July = 163<br />2022 Aug = 164<br />2022 Sep = 165<br />2022 Oct = 166<br />2022 Nov = 167<br />2022 Dec = 168</code></pre>
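<p>As a cross-check on the recoding above, the same mapping can be written as one small function. This is a minimal sketch in Python, assuming (as in the Stata lines above) that wave 1, sample months 1-12 correspond to 2009 and that the sample month runs from 1 to 24; the function name <em>calendar_year_month</em> is hypothetical. It works from the sample month only, not the actual interview date.</p>

```python
def calendar_year_month(wave, sample_month):
    """Map (wave, sample month 1-24) to (year, calendar month, running index).

    The running index counts months from 2009 Jan = 1.
    """
    if not 1 <= sample_month <= 24:
        raise ValueError("sample_month must be between 1 and 24")
    # Months 1-12 of wave w fall in 2009 + (w - 1); months 13-24 one year later.
    year = 2009 + (wave - 1) + (1 if sample_month > 12 else 0)
    # Collapse 'jan yr1' and 'jan yr2' into a single calendar month.
    month = (sample_month - 1) % 12 + 1
    index = (year - 2009) * 12 + month
    return year, month, index


print(calendar_year_month(1, 1))    # first month of wave 1
print(calendar_year_month(13, 18))  # June in year 2 of wave 13
print(calendar_year_month(13, 24))  # last month of wave 13
```

<p>A single formula like this avoids maintaining one <em>replace</em> line per wave and makes it easy to verify the end points of the index.</p>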
<p>I understand that, whatever weight I choose to use, I need to correct it because Northern Ireland is only sampled in issue months 1-12 (and not 13-24). Therefore, I will apply the following adjustment to the weight (gen adj=1, replace adj=0.5 if w_country==4, gen weight=w_xxxyyus_lw*adj) as outlined in the online notes.</p>
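<p>The adjustment quoted above amounts to halving the weight for Northern Ireland cases. A minimal sketch in Python follows, assuming country code 4 denotes Northern Ireland as in the Stata line above; the function name and its default arguments are hypothetical.</p>

```python
def adjusted_weight(weight, country, ni_code=4, ni_factor=0.5):
    """Scale the weight by 0.5 for Northern Ireland cases, else leave it unchanged."""
    adj = ni_factor if country == ni_code else 1.0
    return weight * adj


print(adjusted_weight(1.2, 4))  # Northern Ireland case
print(adjusted_weight(1.2, 1))  # any other country code
```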
<p>However, where I’ve become a little lost is what weights to initially use. In the notes, it states due to exceptions in sample selection ‘we recommend use of the us_lw weight in analysis’. Given my intention to look at calendar months up to wave 13, does this mean I should use the m_indpxus_lw weight? Is this the case, even if I just want to look at the data cross-sectionally (treat every calendar month as a cross-sectional picture of lkmove)? Because it seems that if I use m_indpxus_lw then it substantially reduces the sample size (due to these longitudinal weights requiring someone to have participated in every wave). Is it possible to use the cross-sectional weights for my aims, while excluding the BHPS and IEMB, as is suggested that one needs to do for this kind of calendar month analysis in the online notes? Or, do I need to use longitudinal weights for my intended analysis?</p>
<p>I was also just trying to get my head around the issue of scaling discussed in the online notes: ‘The weights provided are not designed directly for pooling data across waves as they are scaled to a mean value of 1.0 within each wave, and therefore produce different weighted sample sizes in each wave’, under the section ‘Pooling data from different waves for cross-sectional analysis.’ Firstly, I just wanted to confirm this applies to my case of doing monthly trends?</p>
<p>And secondly, if so, from what I can see, the syntax kindly provided is intended to produce an accurate weight to look at the variable jbstat for the calendar year 2011, using months 13-24 of wave 2 and 1-12 of wave 3. At the end, we get the weight variable weight2011, to use for weighting calendar year 2011. In my situation, I would like to do a longer running trend of values of lkmove by months. Would I need to create these weights for each calendar year I look at? So, for 2014, I would need to create a new cross-sectional weight using e_indpxub_xw and f_indpxub_xw (waves 5 and 6). For 2015, I would need to create a new cross-sectional weight using f_indpxub_xw and g_indpxub_xw (waves 6 and 7). For 2016, I would need to create a new cross-sectional weight using g_indpxub_xw and h_indpxub_xw (waves 7 and 8). And to follow this all the way to my last calendar year. Then, to look at monthly trends, treating the data as pooled cross-sectional, I would have my data in long-format and have a new weight variable made up of all these new calendar year weights I’ve created?</p>
<p>I was also wondering if it would be possible to include monthly lkmove data from the calendar year 2022 (using wave 13 of the UKHLS mainstage). As I understand things, previous calendar years (e.g., 2018) are composed of samples from two waves (waves 9 and 10 of the mainstage). However, the calendar year 2022 is composed only of the sample from wave 13. Is it still possible to look at calendar month trends in lkmove for 2022? If so, would I need to make other sample restrictions to the other calendar years, for example, drop the IEMB sample from the trends? And would I need to make other adjustments to the weights? Or is it not yet possible to look at monthly trends for 2022 until wave 14 comes out? I think this is mentioned in the online notes: ‘The analysis sample is only representative when all 24 monthly samples are combined in equal measure.’ Does this point refer to my question?</p>
<p>I am also interested in potentially looking at quarterly trends (Jan-Mar, Apr-Jun, etc.), instead of monthly trends (using the x_quarter variable). To do so, can I take the same approach as above? So, create a new time variable which is years divided into quarters (e.g., 2013 Jan-Mar, 2013 Apr-Jun, 2013 July-Sep, 2013 Oct-Dec, 2014 Jan-Mar, 2014 Apr-June…2022 Jul-Sep, 2022 Oct-Dec). Do I need to do anything different with the weights?</p>
<p>I hope this all makes sense.</p>
<p>Thanks so much in advance.</p>
<p>James</p>
Support #2073 (Feedback): Data file https://iserredex.essex.ac.uk/support/issues/2073 2024-03-08T16:20:51Z Luisa Edwards
<p>What is the date period for the Wave 13 indresp file, and was this post-Covid?<br />I.e. would it be possible to compare a monthly COVID data set to the whole of Wave 13, looking at the post-Covid environment?</p>
Support #2072 (Feedback): tracking spouse after the household dissolutions https://iserredex.essex.ac.uk/support/issues/2072 2024-03-06T20:57:10Z Seok Woo Kwon kwonsw@gmail.com
<p>Hi, I am wondering if the study follows ALL household members after household dissolutions and not just the household head.<br />Thanks for your help in advance.</p>
Support #2071 (Feedback): Choosing weight where some variables are self-completion and others are... https://iserredex.essex.ac.uk/support/issues/2071 2024-03-06T17:23:35Z Molly Rowe
<p>Hi,</p>
<p>I am trying to select the correct weight, and am having some difficulty with what to choose for the instrument/which question(aires) part. I believe that some of my variables were part of the self-completion section of the questionnaire, while others were obtained by interview (if that's possible?). Therefore, I'm not sure whether to select self-completion (sc) or interview (in) for the Yy part of the weighting.</p>
<p>Any help would be greatly appreciated!</p>
Support #2070 (Feedback): Creating Chronology when using COVID-19 and main panel data https://iserredex.essex.ac.uk/support/issues/2070 2024-03-06T11:29:09Z Isaac Hance
<p>I want to create a variable in Stata that allows me to ensure I am viewing each individual's responses in order when using the COVID panel and main panel merged together. This is complex because of the overlap in waves: some COVID waves fall within the data collection period of multiple main survey waves. I am planning to combine _intdatem and _intdatey, but cannot seem to do it in a way that lets me sort.</p>
Support #2069 (Feedback): Match children information with parental information https://iserredex.essex.ac.uk/support/issues/2069 2024-03-06T08:17:49Z Giovanni Greco
<p>Good morning Users.<br />For my Master's thesis, I am using data on children, and I need parental information (their household ID and their household income) to be matched into the children's data. The family matrix is probably of great help, but I am struggling to figure out how to do it in Stata.<br />Has anyone had a similar challenge?<br />Thank you in advance.</p>
Support #2068 (Feedback): Wave 16 biological data https://iserredex.essex.ac.uk/support/issues/2068 2024-03-05T12:49:48Z Eleanor Winpenny ew470@cam.ac.uk
<p>Hi,<br />Do you have any estimate of when the wave 16 biological data is likely to be released? It would be really helpful to know whether I can include it in a grant application.</p>
Support #2067 (Feedback): data access https://iserredex.essex.ac.uk/support/issues/2067 2024-03-04T17:02:55Z mike polkey
<p>Dear UKHLS</p>
<p>I am writing from the Royal Brompton Hospital (now part of GSTT) and Imperial College in London.</p>
<p>We would very much like to access the UKHLS database to extend a research theme that began 2 years ago. In the first stage we developed a model which predicts the likelihood of having obstructive sleep apnoea. In the next stage we want to relate it to economic activity. HSE data only gives this at an occupation or industry level, but our reading of your published papers is that you have individual-level data. The other data we would need are PSQI, height, weight, history of cardiovascular disease, age, gender and occupation.</p>
<p>I reviewed the FAQ; the lead student here did his MSc with us at Imperial but has now returned to Japan, so he would not be able to analyse the data in the UK.</p>
<p>Please do reply by phone if easier; 07801553468</p>
Support #2065 (Feedback): How to manage longitudinal data analysis after excluding sample based o... https://iserredex.essex.ac.uk/support/issues/2065 2024-03-04T15:09:36Z Marina Kousta
<p>I am conducting a (longitudinal) diff-in-diff analysis for a policy evaluation where the date of policy introduction is important. I have a few questions below:</p>
<p>1) As my date of interest falls in the middle of a single wave, I could split up wave X into two parts indicating the before and after. Is this enough, so that I can use a single wave for the analysis, OR would you say it is preferable that I also use more waves to more accurately represent the year for the before and after treatment? (The reason I am asking is that I read the following on your website: "As some samples are fielded in the first 12 months (BHPS and General Population-Northern Ireland samples), some in months 13-24 (IEMB sample) and some across all 24 months (General Population-Great Britain and EMB samples), just using data from the same wave to compare the two consecutive years will result in comparing different samples. Similarly, just using data from year 1 or year 2 of a wave to conduct cross-sectional analyses of that year will result in analysing samples that are not-representative. So, to correctly do these types of analyses, data from two waves need to be combined. For example, for 2019, use data from year 2 of Wave 10 and year 1 of Wave 11.")</p>
<p>2) To split up any given wave into two separate parts, which variable would you recommend? I have seen many variables in the dataset indicating the month of interview, year, etc., and others relating to the sample, but I am unsure which would be the most accurate. Moreover, I am confused because some waves appear to extend across three calendar years, but the year of interview variable only reflects year 1 and year 2; there is no mention of year 3.</p>
<p>3) Which weights would you recommend using in this case?</p>
<p>Many thanks in advance for any help you can provide.</p>
<p>Best,<br />Marina</p>
Support #2064 (Feedback): calendar year dataset - longitudinal analysis https://iserredex.essex.ac.uk/support/issues/2064 2024-03-04T14:44:42Z Marina Kousta
<p>I am conducting an analysis for which I need to use the provided calendar year datasets. I have the following questions:<br />1) You state on the website that the calendar year datasets are not intended for longitudinal analysis. Why is that, and is there a way to overcome this? (I am asking because I want to conduct a longitudinal analysis.)<br />2) Do you also recommend avoiding longitudinal analysis when we manually construct the calendar year datasets ourselves (by merging the waves)?<br />3) If I go ahead with either 1) or 2), would you recommend avoiding the provided longitudinal weights?</p>
<p>Many thanks in advance for your help.</p>
<p>Best wishes,<br />Marina</p>
Support #2062 (Feedback): hcondncode38 variable https://iserredex.essex.ac.uk/support/issues/2062 2024-02-29T15:24:44Z Emma Kirwan
<p>Hello,</p>
<p>I'm a little confused by the hcondncode38 variable and am hoping you can help me.</p>
<p>I read in the questionnaire Universe that this question, whether they have ever been diagnosed with a condition (in this case I am interested in clinical depression), is asked of new entrants. It is coded 0 for 'not mentioned' and 1 for 'yes, mentioned' in the data file. But I'm wondering how a participant can have a response of 1 for Wave 10 and 0 for Wave 11. How should I interpret this? Does it mean the participant has reported a diagnosis but, because they have not mentioned it in the subsequent wave, it is coded as 0 'not mentioned'? In that case, shouldn't it be inapplicable?<br />Is there a variable I can use that indicates whether a participant has ever reported a diagnosis of depression?</p>
<p>Many thanks in advance.</p>
Support #2060 (Resolved): Design weights taken account of in enumeration weights? https://iserredex.essex.ac.uk/support/issues/2060 2024-02-27T13:21:37Z Rosie Cornish
<p>I think the answer to this is yes, but can you confirm that the household enumeration weights (e.g. a_hhdenus_xw) take account of the design weights - i.e. they are the product of the design weight and a household response weight?</p>
Support #2058 (Resolved): Using longitudinal weights when combining Covid-19 waves and mainstage ... https://iserredex.essex.ac.uk/support/issues/2058 2024-02-22T16:48:24Z James Laurence
<p>Hi there,</p>
<p>I was just hoping to get some more advice regarding correctly weighting my analysis combining the mainstage and Covid-19 waves of the UKHLS. You kindly helped with a previous weighting issue I had for treating the data as repeated cross-sections. However, I am also hoping to conduct some fixed effects panel data analysis of the combined mainstage and Covid-19 waves (web survey only).</p>
<p>As a basic set-up, I am combining wave 9 of the UKHLS mainstage survey (the last mainstage survey that doesn’t cover the pandemic) with waves 1 to 9 of the COVID-19 survey. The data are in long format. As I would like to do some fixed effects longitudinal analysis, I believe I need to use the longitudinal weights. From my reading, I need to choose the longitudinal weight from the last wave of the survey I will be using – in this case wave 9 of the Covid-19 survey: ci_betaindin_lw</p>
<p>Applying this weight [ci_betaindin_lw] will give me a balanced panel, restricting the sample to everyone who participated in all 9-waves of the Covid-19 survey. However, I would also like to analyse wave 9 of the mainstage survey as part of a longitudinal, fixed effects analysis covering mainstage wave 9 and Covid survey waves 1-9. Is this possible? If so, is one approach to feed back the ci_betaindin_lw weight so that the people who were in wave 9 of the mainstage survey who were also present in all 9-waves of the Covid-19 survey have the weight value of ci_betaindin_lw? Therefore, the ci_betaindin_lw weight would cover the mainstage wave 9 sample and the Covid-19 sample.</p>
<p>In case it’s not clear, here is a made-up example of the data in long format, containing wave 9 of the mainstage survey and waves 1-9 of the Covid survey. Pidp no. 111111 was present in wave 9 of the mainstage survey and all 9 waves of the Covid survey and had a value of 1.5 for their longitudinal weight at wave 9 of the Covid survey (ci_betaindin_lw). So, my data would look like this:</p>
<p><strong>[PIDP]</strong> <strong>[WAVE] [Value of ci_betaindin_lw]</strong><br />111111 Mainstage wave 9 <em>Missing Value</em><br />111111 COVID wave 1 1.5<br />111111 COVID wave 2 1.5<br />111111 COVID wave 3 1.5<br />111111 COVID wave 4 1.5<br />111111 COVID wave 5 1.5<br />111111 COVID wave 6 1.5<br />111111 COVID wave 7 1.5<br />111111 COVID wave 8 1.5<br />111111 COVID wave 9 1.5</p>
<p>Is just feeding back the value of ci_betaindin_lw (1.5) what I need to do? So, it would now look like:</p>
<p><strong>[PIDP]</strong> <strong>[WAVE] [Value of ci_betaindin_lw]</strong><br />111111 Mainstage wave 9 <strong>1.5</strong><br />111111 COVID wave 1 1.5<br />111111 COVID wave 2 1.5<br />111111 COVID wave 3 1.5<br />111111 COVID wave 4 1.5<br />111111 COVID wave 5 1.5<br />111111 COVID wave 6 1.5<br />111111 COVID wave 7 1.5<br />111111 COVID wave 8 1.5<br />111111 COVID wave 9 1.5</p>
<p>If so, could this method apply if I wanted to include more mainstage waves of data? So, if I wanted to include waves 6, 7, 8 and 9 of the mainstage survey alongside waves 1-9 of the Covid survey, would I just feed back each individual's weight value for ci_betaindin_lw so that the individual has that weight value for mainstage waves 6, 7, 8 and 9?</p>
<p>I may be completely misunderstanding how to use the longitudinal weights, or have missed something crucial meaning you can't apply the Covid longitudinal weights to the pre-Covid mainstage waves. If so, apologies in advance, and any advice would be hugely appreciated.</p>
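<p>To make the mechanics of "feeding back" concrete, here is a minimal sketch in Python using the made-up example above. It only shows how the value could be copied across a person's rows in long format; whether doing so is statistically appropriate is exactly the question being asked. The function name <em>feed_back_weight</em>, the row layout and the second respondent (222222) are hypothetical.</p>

```python
def feed_back_weight(rows, weight_key="ci_betaindin_lw"):
    """Copy each person's non-missing weight onto all of that person's rows."""
    # Find the first non-missing weight recorded for each pidp.
    weight_by_pidp = {}
    for row in rows:
        value = row.get(weight_key)
        if value is not None:
            weight_by_pidp.setdefault(row["pidp"], value)
    # Rows for people with no weight at all stay missing (None).
    return [dict(row, **{weight_key: weight_by_pidp.get(row["pidp"])}) for row in rows]


# Made-up long-format data: one mainstage row plus nine Covid rows for 111111,
# and one mainstage-only respondent who has no Covid longitudinal weight.
rows = [{"pidp": 111111, "wave": "Mainstage wave 9", "ci_betaindin_lw": None}]
rows += [
    {"pidp": 111111, "wave": f"COVID wave {n}", "ci_betaindin_lw": 1.5}
    for n in range(1, 10)
]
rows.append({"pidp": 222222, "wave": "Mainstage wave 9", "ci_betaindin_lw": None})

filled = feed_back_weight(rows)
for row in filled:
    print(row["pidp"], row["wave"], row["ci_betaindin_lw"])
```

<p>The same function would fill mainstage waves 6, 7, 8 and 9 if those rows were present, since it keys only on pidp.</p>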
<p>Best wishes,</p>
<p>James</p>