Understanding Society User Support: Issues | https://iserredex.essex.ac.uk/support/ | 2024-02-27T13:21:37Z
Understanding Society User Support - Support #2060 (Resolved): Design weights taken account of in... | https://iserredex.essex.ac.uk/support/issues/2060 | 2024-02-27T13:21:37Z | Rosie Cornish
<p>I think the answer to this is yes, but can you confirm that the household enumeration weights (e.g. a_hhdenus_xw) take account of the design weights - i.e. they are the product of the design weight and a household response weight?</p>
Understanding Society User Support - Support #2058 (Resolved): Using longitudinal weights when co... | https://iserredex.essex.ac.uk/support/issues/2058 | 2024-02-22T16:48:24Z | James Laurence
<p>Hi there,</p>
<p>I was just hoping to get some more advice regarding correctly weighting my analysis combining the mainstage and Covid-19 waves of the UKHLS. You kindly helped with a previous weighting issue I had for treating the data as repeated cross-sections. However, I am also hoping to conduct some fixed effects panel data analysis of the combined mainstage and Covid-19 waves (web survey only).</p>
<p>As a basic set-up, I am combining wave 9 of the UKHLS mainstage survey (the last mainstage survey that doesn’t cover the pandemic) with waves 1 to 9 of the COVID-19 survey. The data are in long format. As I would like to do some fixed effects longitudinal analysis, I believe I need to use the longitudinal weights. From my reading, I need to choose the longitudinal weight from the last wave of the survey I will be using – in this case wave 9 of the Covid-19 survey: ci_betaindin_lw</p>
<p>Applying this weight [ci_betaindin_lw] will give me a balanced panel, restricting the sample to everyone who participated in all 9 waves of the Covid-19 survey. However, I would also like to analyse wave 9 of the mainstage survey as part of a longitudinal, fixed effects analysis covering mainstage wave 9 and Covid survey waves 1-9. Is this possible? If so, is one approach to feed back the ci_betaindin_lw weight, so that people who were in wave 9 of the mainstage survey and were also present in all 9 waves of the Covid-19 survey have the weight value of ci_betaindin_lw for their mainstage row too? The ci_betaindin_lw weight would then cover both the mainstage wave 9 sample and the Covid-19 sample.</p>
<p>In case it’s not clear, here is a made-up example of the data in long format, containing wave 9 of the mainstage survey and waves 1-9 of the Covid survey. Pidp no. 111111 was present in wave 9 of the mainstage survey and all 9 waves of the Covid survey, and had a value of 1.5 for their longitudinal weight at wave 9 of the Covid survey (ci_betaindin_lw). So, my data would look like this:</p>
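<table>
<tr><th>PIDP</th><th>WAVE</th><th>Value of ci_betaindin_lw</th></tr>
<tr><td>111111</td><td>Mainstage wave 9</td><td><em>Missing Value</em></td></tr>
<tr><td>111111</td><td>COVID wave 1</td><td>1.5</td></tr>
<tr><td>111111</td><td>COVID wave 2</td><td>1.5</td></tr>
<tr><td>111111</td><td>COVID wave 3</td><td>1.5</td></tr>
<tr><td>111111</td><td>COVID wave 4</td><td>1.5</td></tr>
<tr><td>111111</td><td>COVID wave 5</td><td>1.5</td></tr>
<tr><td>111111</td><td>COVID wave 6</td><td>1.5</td></tr>
<tr><td>111111</td><td>COVID wave 7</td><td>1.5</td></tr>
<tr><td>111111</td><td>COVID wave 8</td><td>1.5</td></tr>
<tr><td>111111</td><td>COVID wave 9</td><td>1.5</td></tr>
</table>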
<p>Is just feeding back the value of ci_betaindin_lw (1.5) what I need to do? So, it would now look like:</p>
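<table>
<tr><th>PIDP</th><th>WAVE</th><th>Value of ci_betaindin_lw</th></tr>
<tr><td>111111</td><td>Mainstage wave 9</td><td><strong>1.5</strong></td></tr>
<tr><td>111111</td><td>COVID wave 1</td><td>1.5</td></tr>
<tr><td>111111</td><td>COVID wave 2</td><td>1.5</td></tr>
<tr><td>111111</td><td>COVID wave 3</td><td>1.5</td></tr>
<tr><td>111111</td><td>COVID wave 4</td><td>1.5</td></tr>
<tr><td>111111</td><td>COVID wave 5</td><td>1.5</td></tr>
<tr><td>111111</td><td>COVID wave 6</td><td>1.5</td></tr>
<tr><td>111111</td><td>COVID wave 7</td><td>1.5</td></tr>
<tr><td>111111</td><td>COVID wave 8</td><td>1.5</td></tr>
<tr><td>111111</td><td>COVID wave 9</td><td>1.5</td></tr>
</table>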
<p>If so, could this method apply if I wanted to include more mainstage waves of data? So, if I wanted to include waves 6, 7, 8 and 9 of the mainstage survey alongside waves 1-9 of the Covid survey, would I just feed back an individual's weight value for ci_betaindin_lw so that the individual has that weight value for mainstage waves 6, 7, 8 and 9?</p>
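<p>The "feeding back" step described above can be sketched as a plain data operation. This is only a minimal illustration in Python: the row layout, wave labels and the function name backfill_weight are hypothetical, and the sketch shows the mechanics of copying the weight, not whether doing so is statistically appropriate.</p>

```python
# Hypothetical long-format rows: one dict per person-wave.
rows = [
    {"pidp": 111111, "wave": "mainstage_9", "ci_betaindin_lw": None},
    {"pidp": 111111, "wave": "covid_1", "ci_betaindin_lw": 1.5},
    {"pidp": 111111, "wave": "covid_9", "ci_betaindin_lw": 1.5},
    {"pidp": 222222, "wave": "mainstage_9", "ci_betaindin_lw": None},
]


def backfill_weight(rows, weight="ci_betaindin_lw"):
    """Copy each person's non-missing weight onto all of that person's rows."""
    per_person = {}
    for r in rows:
        if r[weight] is not None:
            per_person[r["pidp"]] = r[weight]
    # People with no weight on any row (not in the balanced panel) stay missing.
    return [{**r, weight: per_person.get(r["pidp"])} for r in rows]


filled = backfill_weight(rows)
```

<p>The same function covers the case of several mainstage waves, because every row for a given pidp receives the one weight value regardless of which wave it belongs to.</p>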
<p>I may be completely misunderstanding how to use the longitudinal weights, or have missed something crucial that means you can't apply the Covid longitudinal weights to the pre-Covid mainstage waves. If so, apologies in advance, and any advice would be hugely appreciated.</p>
<p>Best wishes,</p>
<p>James</p>
Understanding Society User Support - Support #2040 (Resolved): Survey Weights | https://iserredex.essex.ac.uk/support/issues/2040 | 2024-01-25T10:27:40Z | Martha Tindall
<p>Hi</p>
<p>I am conducting an analysis and am struggling to determine the best weights to use, and was hoping you could give me some guidance. My analysis uses data from the years 2018 to 2021 (inclusive) to estimate a TWFE linear model. My model includes main effects and an interaction term involving a binary variable for pre-pandemic and during-pandemic. I have the following questions regarding weighting.</p>
<p>1. Currently my pandemic cut-off is March 2020. Given that the term of interest involves time, is it necessary to start the year 2018 in March and extend the data to March 2022 to ensure there is equal representation of sample months in each group, or is it okay to just go from January 2018 to December 2021 (keeping the cut-off in March 2020)?</p>
<p>2. I wish to use an unbalanced panel design, as when I use longitudinal weights my sample becomes just 12% of what it would be using an unbalanced panel. My question is how do I choose these weights? Guidance on the Understanding Society website is for creating balanced panels and using _lw weights; however, in my situation this is not possible. Is it appropriate to apply the cross-sectional weight for each observation in a given wave, or is there something else I should be doing?</p>
<p>3. On the Understanding Society website you mention rescaling of weights for analysing by calendar year. First, is this required in my situation? Second, do you provide guidance for doing so in R, as the only advice available is for Stata, which I am not familiar with?</p>
<p>Thank you in advance for your time and please let me know if you need any more information from me.</p>
<p>Martha</p>
Understanding Society User Support - Support #2012 (Resolved): longitudinal weight | https://iserredex.essex.ac.uk/support/issues/2012 | 2023-12-14T15:03:18Z | Margherita Agnoletto
<p>Dear Understanding Society Team,</p>
<p>I am currently examining the relationship between flexible work arrangements (FWA) and some employees' outcomes.</p>
<p>Given that questions about FWA are asked every two waves, I have chosen to conduct a longitudinal analysis (FE) using waves 2, 4, 6, 8, and 10. Some of my outcomes come from the self-completion questionnaire. <br />As I understand, it is recommended to use the appropriate longitudinal weight from the last wave in my analysis (i.e. i_indinus_lw). However, I observe a significant loss of observations. <br />Given that my panel is unbalanced, could I use the corresponding longitudinal weight from the last available wave for each individual? For instance, if an individual 'i' has information until wave 8, I propose imputing the appropriate longitudinal weight from wave 8. Similarly, if individual 'k' has information until wave 6, I suggest imputing the weight from wave 6.</p>
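<p>The proposal above (giving each individual the longitudinal weight from the last wave in which they are observed) could be sketched as follows. This is a hedged illustration in Python only: the record layout, the key names and the function last_wave_weight are hypothetical, and treating a missing or zero weight as "not available" is an assumption of the sketch. It illustrates the mechanics, not whether the resulting weights are valid.</p>

```python
# Hypothetical records: one per person-wave, carrying the longitudinal
# weight released for that wave (None if the person has no weight there).
records = [
    {"pidp": 1, "wave": 6, "lw": 0.8},
    {"pidp": 1, "wave": 8, "lw": 1.2},
    {"pidp": 2, "wave": 6, "lw": 0.9},
    {"pidp": 2, "wave": 8, "lw": None},
]


def last_wave_weight(records):
    """Map each pidp to the weight from the latest wave with a positive weight."""
    best = {}  # pidp -> (wave, weight)
    for r in records:
        if r["lw"] is None or r["lw"] <= 0:
            continue  # assumption: missing or zero weights are skipped
        current = best.get(r["pidp"])
        if current is None or r["wave"] > current[0]:
            best[r["pidp"]] = (r["wave"], r["lw"])
    return {pidp: weight for pidp, (_, weight) in best.items()}


weights = last_wave_weight(records)
```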
<p>Thank you for your attention.</p>
<p>Kind regards</p>
Understanding Society User Support - Support #2006 (Resolved): Longitudinal analysis using calend... | https://iserredex.essex.ac.uk/support/issues/2006 | 2023-12-12T13:52:21Z | Marina Kousta
<p>Hello,</p>
<p>I am reaching out to kindly request help on how to conduct longitudinal analysis using calendar year datasets.<br />1) Although online you state that the published calendar year data are meant to be used for cross-sectional analysis, does that also apply when we create our own calendar year datasets? Or is it meant as guidance only for the pre-made calendar year data that you release? If it applies regardless, is there some way for us to still conduct longitudinal analysis after creating our own calendar year data?<br />2) Although you recommend using w_month (sample month) to create calendar year data, would it still be ok to use the interview date instead, when the exact date is of great importance to the research question itself (i.e. when testing the introduction or removal of a social policy)?</p>
<p>Many thanks in advance for your time and consideration.</p>
<p>Best wishes,<br />Marina</p>
Understanding Society User Support - Support #2004 (Resolved): Selection of weights | https://iserredex.essex.ac.uk/support/issues/2004 | 2023-12-11T16:11:12Z | Joanna Clifton-Sprigg (J.M.Clifton-Sprigg@bath.ac.uk)
<p>Hello,</p>
<p>I am looking to use information on new parents (newmum/newdad), specifically dates of leave taken when child was born, in a difference in difference approach around the shared parental leave reform (2015).</p>
<p>Essentially, I will be comparing cohorts of parents who had a child before & after the reform. I will not be following specific parents longitudinally, at least not for the first part of the project.</p>
<p>I would like to run this analysis in calendar years, not waves, given that the reform happened in April 2015 & I will be comparing those with children born pre-April 2015 and post.</p>
<p>I have pooled waves 2-12 data and set this up in a long format. Now I am wondering what weights to apply.</p>
<p>1) Am I correct in thinking in this scenario cross-sectional weights will work? I would like to preserve as big a sample as possible as even without weighting sample size is a challenge.</p>
<p>2) If I can use cross-sectional weights, how can I apply them to this pooled data file, which includes waves 2-12? It is not clear to me from the user guide.</p>
<p>3) At which stage do I adjust for the calendar year analysis?</p>
<p>Thank you.</p>
Understanding Society User Support - Support #1985 (Resolved): Representativeness of housing tenu... | https://iserredex.essex.ac.uk/support/issues/1985 | 2023-10-24T13:13:42Z | Eoghan O'Brien
<p>I am looking at wave 11 responses in the hhresp table for the breakdown of housing tenure (tenure_dv) at the household level.</p>
<p>The screenshots attached include the % of each category (unweighted and weighted using "hhdenui_xw").</p>
<p>Comparing these figures with census results for tenure status in England and Wales (% of households by tenure), it appears that private renters (in USoc, "Rented private unfurnished" and "Rented private furnished") are under-represented (11.7% when weighted) relative to the census figure for England and Wales in 2021 (20.3%). I have tried limiting the USoc sample to just England and Wales households, but it does not materially change the results.</p>
<p>Link to census data here: <a class="external" href="https://www.ons.gov.uk/peoplepopulationandcommunity/housing/bulletins/housingenglandandwales/census2021">https://www.ons.gov.uk/peoplepopulationandcommunity/housing/bulletins/housingenglandandwales/census2021</a></p>
<p>Any info on why I may be finding this discrepancy would be very much appreciated.</p>
Understanding Society User Support - Support #1982 (Resolved): reference person weights | https://iserredex.essex.ac.uk/support/issues/1982 | 2023-10-12T11:19:12Z | Amelia Watts (amelia.watts678@outlook.com)
<p>Dear Olena/support team,</p>
<p>I'm selecting reference persons from households across waves to form a panel. Can the individual longitudinal weights for these respondents in the last wave be used as suboptimal weights in the analysis?</p>
<p>Many thanks, <br />Amelia</p>
Understanding Society User Support - Support #1975 (Resolved): Weights - Cross-sectional Analysis... | https://iserredex.essex.ac.uk/support/issues/1975 | 2023-09-19T09:55:04Z | Caitlin Schmid
<p>Good morning,</p>
<p>Using the main survey, I aim to run a cross-sectional analysis on a number of variables to analyse sex differences between adults and their variation across Local Authority Districts. To increase the sample sizes, I want to pool UKHLS Waves 11 and 12. Do I require tailored weights, or can I proceed with the two provided cross-sectional adult weights of the respective waves (_indinui_xw)?</p>
<p>Many thanks and best wishes,</p>
<p>Caitlin</p>
Understanding Society User Support - Support #1913 (Resolved): Weighting with analysis of a datas... | https://iserredex.essex.ac.uk/support/issues/1913 | 2023-06-05T16:01:28Z | Laura Jones (laura.jones@nesta.og.uk)
<p>Dear US team,</p>
<p>My question is about weighting when carrying out analysis with infants (0-2).</p>
<p>I want to carry out an analysis estimating the percentage of one and two year olds in England living in families who will be eligible for the new childcare policy announced in the Spring budget.<br />Eligibility depends on their parent's income.</p>
<p>My plan was to create a dataset of 0-2 year olds in England by using the information from j_indall, filter on country of residence (England only), then add information on their parent's income from j_indresp.</p>
<p>If I then weighted using j_psnenui_xw would this be the correct way to provide accurate population estimates of 0-2 year olds living in England? Or is a different weight required?</p>
<p>Thanks in advance for any help.</p>
<p>Laura</p>
Understanding Society User Support - Support #1908 (Resolved): Weights using the BHPS Consolidate... | https://iserredex.essex.ac.uk/support/issues/1908 | 2023-05-26T21:16:07Z | Natalia Carralero
<p>Hello. I am studying differences between single and partnered parents. To do so, I am using the British Household Panel Survey Consolidated Marital, Cohabitation and Fertility Histories (1991-2009) to identify my sample of single/non-single parents, and then merging it with the BHPS individual questionnaire to get the relevant variables.<br />My question is, which weights should I be using? I was thinking of indin91_lw, but I am not entirely sure. <br />Also, which type of weights are they? Frequency or analytic weights? <br />Thank you!</p>
Understanding Society User Support - Support #1907 (Resolved): Weighting advice for information f... | https://iserredex.essex.ac.uk/support/issues/1907 | 2023-05-26T10:33:43Z | Peter Humphreys
<p>Hi there</p>
<p>So I'm using the indresp dataset and looking at adult respondents. My population is respondents who are classed as economically inactive in wave 12 of the UKHLS (I recoded some variables to create a definition of economic inactivity) and I'm looking to calculate some descriptive statistics on variables for when these respondents were last in work (sector of last job worked, hours worked per week, average wage, etc.)</p>
<p>To do this analysis, would the correct weight be the indinus_lw value from wave 12?</p>
Understanding Society User Support - Support #1904 (Resolved): Using weights when variables have ... | https://iserredex.essex.ac.uk/support/issues/1904 | 2023-05-17T09:59:53Z | Richard Belcher
<p>Dear Olena,</p>
<p>I am running a pooled cross sectional individual level analysis (waves 1-9), using cross-sectional weights, but I am worried that by removing cases where sf-12 responses are -9, my sample is no longer nationally representative.</p>
<p>I have selected weights that are appropriate for how the questions behind the variables I want to use were administered. E.g. I am using the self-completion questionnaire cross-sectional weights (waves 2+), as sf-12 is my dependent variable of interest in later models (with the weight appropriate for each wave). After aggregating the cross-wave data, there are a number of cases with non-zero weights that have missing value codes attached to them (or are NA because I merged in household-level data, which is occasionally not collected). It is understandable that errors and non-response happen during the survey process. Am I safe to assume that some of this is random, e.g. that the lack of a household interview is random, so that removing responses without that information wouldn't affect the weighting? I am, however, worried that some may not be random, and that there may be some demographic or regional bias to the -9 codes in the sf-12 variable, which would prevent my sample from being nationally representative when weighted. I have 96% of the non-zero weighted sample remaining after removing those with errors; most of the reduction (3%) comes from "sf12mcs_dv" responses with the code -9.</p>
<p>Thanks for your help,</p>
<p>All the best,</p>
<p>Richard</p>
Understanding Society User Support - Support #1903 (Resolved): Weighting guidance for intergenera... | https://iserredex.essex.ac.uk/support/issues/1903 | 2023-05-16T16:34:06Z | Esme Lilly
<p>Hi, I am doing research into the role of education in intergenerational income mobility, and I had some questions on tailoring my own weights. <br />My sample is restricted to individuals born between 1975-1995 with at least one income observation aged 25+ and at least one observed parental household income when aged 0-16, with only one observation per household (youngest eligible sibling), from <strong>the original and booster BHPS samples</strong>. Children are matched to their parents through the alterego file. My sample spans all 30 waves of the BHPS & USoc.</p>
<p>My main variables of interest are parental household income and offspring individual income, and the highest educational attainment of the offspring. To measure income, I take an average of income observations at ages 25+ and of parental income observations at ages 0-16, making it a pooled cross-section.<br />As my sample is already quite small based on these sample restrictions (N=1,392), I want to retain as many observations as possible. As my analysis is not longitudinal, the longitudinal weight provided is sub-optimal for me, because I want to include respondents even if they haven’t responded in the most recent wave or have missed waves. I am not interested in change over time.</p>
<p>I have been reading the FAQs and other questions on the forum to help with tailoring my own weights, but I still had a few questions at the different stages of tailoring my own weights. I followed the advice from <a class="issue tracker-3 status-5 priority-7 priority-highest closed" title="Support: Weights, BHPS and USoc pooled (Closed)" href="https://iserredex.essex.ac.uk/support/issues/658">#658</a> and on the online course and weighting FAQ.<br />First, to create my base weight, I took the cross-sectional weight for everyone with at least one income observation aged 25+, born between 1975-1995, as this is the sub-population.</p>
<ol>
<li>1) Is it ok that the cross-sectional weights I take are USoc cross-sectional weights, despite me only including offspring from the BHPS samples?</li>
<li>2) Why do some people have a cross-sectional weight of zero in the BHPS and then a non-zero weight in USoc? If I’m trying to maximise my sample, can I take the first weight they have observed past the age of 25 that is non-zero?</li>
</ol>
<p>However, my colleagues suggested that I should actually do the weighting based on parental characteristics. That is to say, the base sample should be all parents who responded in the original and booster samples, and then model the nonresponse based on the substantial attrition of how many offspring have an income observation aged 25+. My problem with this is that the base weight wouldn’t account for potential parents who went on to have children but left the survey beforehand. My questions are:</p>
<ol>
<li>3) Is it ok to use people who answer the survey 25+ as the base weight, or would it be better to use people who had children from the original and booster samples as my base weight? What would the difference between these two be?</li>
</ol>
<p>Finally, after rescaling the weights and taking mortality into account, I adjust for non-response. I fit a logistic regression model and tested the following predictors: sex, region, race, migrant status, quintile of income and education, and the wave they first responded over the age of 25. Variables of income, education and region are from the wave they are first observed over 25. When excluding the first wave they respond over 25, all other predictors are highly significant, but when including it, they are all less significant and sex is not significant.</p>
Understanding Society User Support - Support #1902 (Resolved): weights individual files waves 10 ... | https://iserredex.essex.ac.uk/support/issues/1902 | 2023-05-15T13:20:37Z | Aelen Valen
<p>Hi,</p>
<p>I am trying to merge individual files across waves 10 and 11 into wide format to create a 2019 calendar year dataset.<br />I used the method from "Box 1: Example syntax for pooled analysis for cross-sectional estimation relating to calendar year 2011, with weight re-scaling" in <a class="external" href="https://www.understandingsociety.ac.uk/sites/default/files/downloads/documentation/user-guides/mainstage/weighting_faqs.pdf">https://www.understandingsociety.ac.uk/sites/default/files/downloads/documentation/user-guides/mainstage/weighting_faqs.pdf</a></p>
<p>ge wts=0 <br />replace wts=indpxui_xw if month>=13 & month<=24 <br />ge ind=1 <br />sum ind [aw=indpxui_xw] if month>=1 & month<=12 <br />gen jwtdtot=r(sum_w) <br />sum ind [aw=indpxui_xw] if month>=1 & month<=12 <br />gen kwtdtot=r(sum_w) <br />replace wts=indpxui_xw*(jwtdtot/kwtdtot) if month>=1 & month<=12</p>
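<p>As quoted, both <code>sum</code> commands in the snippet above use the same condition (month 1 to 12), so jwtdtot and kwtdtot are equal and the rescaling factor is 1. The Python sketch below shows one reading of the intended rescaling, under stated assumptions: the earlier wave (here called "j") contributes its sample months 13-24, the later wave ("k") contributes its sample months 1-12, and the later wave's weights are scaled so that its weighted total matches the earlier wave's. The row layout, key names and function name are hypothetical, and this is not a translation of the official Box 1 syntax.</p>

```python
def rescale_pooled_weights(rows):
    """Return copies of rows with a pooled calendar-year weight 'wts' added.

    Each row is a dict with hypothetical keys:
      'wave'   -- 'j' (earlier wave) or 'k' (later wave)
      'month'  -- sample month within the wave, 1 to 24
      'weight' -- that wave's cross-sectional weight
    """
    def in_j_part(r):
        # Assumption: the calendar year is the second year of wave j.
        return r["wave"] == "j" and 13 <= r["month"] <= 24

    def in_k_part(r):
        # Assumption: the calendar year is the first year of wave k.
        return r["wave"] == "k" and 1 <= r["month"] <= 12

    j_total = sum(r["weight"] for r in rows if in_j_part(r))
    k_total = sum(r["weight"] for r in rows if in_k_part(r))
    if k_total == 0:
        raise ValueError("no weighted cases in the later wave's part of the year")

    out = []
    for r in rows:
        if in_j_part(r):
            wts = r["weight"]
        elif in_k_part(r):
            wts = r["weight"] * (j_total / k_total)
        else:
            wts = 0.0  # outside the calendar year of interest
        out.append({**r, "wts": wts})
    return out


example = [
    {"wave": "j", "month": 13, "weight": 1.0},
    {"wave": "j", "month": 20, "weight": 3.0},
    {"wave": "j", "month": 5, "weight": 2.0},
    {"wave": "k", "month": 2, "weight": 1.0},
    {"wave": "k", "month": 5, "weight": 1.0},
]
pooled = rescale_pooled_weights(example)
```

<p>After rescaling, the two halves of the pooled year carry the same weighted total, so neither wave dominates the calendar-year estimate simply because its weights are on a different scale.</p>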
<p>For the purpose of the research I am working on, I am using the equivalised household income and other variables referring to parental occupation, education and place of birth.</p>
<p>Since I am using it together with EUSILC 2019 for different EU countries, I was comparing the weights with the weights in EUSILC. While the sum of the weights in the latter equals on average 80% of the real population in each country, the sum of the weights in the dataset I created for UK 2019 (by merging waves 10 and 11) gives a number far lower than the census 2019 UK population.</p>
<p>Could you please help me understand how those weights are constructed, which characteristics of the population they consider, whether they are comparable to the ones in EUSILC, and whether the procedure I followed to merge the two waves is correct? <br />Many thanks in advance for the support!</p>