Understanding Society User Support: Issues | https://iserredex.essex.ac.uk/support/ | 2022-10-27T09:34:20Z | Redmine
Understanding Society User Support - Support #1794 (Feedback): Weights | https://iserredex.essex.ac.uk/support/issues/1794 | 2022-10-27T09:34:20Z | Caroline Kienast von Einem
<p>Hi team,</p>
<p>I have a question about applying survey weights: <br />When I apply the weights, do I have to do this via specific commands, e.g. the svy prefix in Stata, or is it also possible to multiply my variable of interest by the weight to create a new weighted variable that I could then use with commands that cannot be combined with survey weights directly?</p>
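<p>To make the two options in the question concrete, here is a minimal sketch on made-up numbers (the values and weights are hypothetical, not UKHLS data). It compares a weighted mean with the plain mean of a pre-multiplied variable; the two only coincide when the weights happen to average exactly one, and the pre-multiplied variable carries no information about the survey design for standard errors.</p>

```python
# Toy data: three values of a variable and hypothetical survey weights.
values = [10.0, 20.0, 30.0]
weights = [0.5, 1.0, 2.5]


def weighted_mean(x, w):
    """Weighted mean: sum(w * x) / sum(w)."""
    return sum(wi * xi for wi, xi in zip(w, x)) / sum(w)


def mean_of_products(x, w):
    """Unweighted mean of the pre-multiplied variable w * x."""
    products = [wi * xi for wi, xi in zip(w, x)]
    return sum(products) / len(products)


wm = weighted_mean(values, weights)      # divides by the sum of weights (4.0)
mp = mean_of_products(values, weights)   # divides by the number of cases (3)
print(wm, mp)
```

<p>In this toy case the two numbers differ because the weights sum to 4 rather than to the number of cases.</p>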
<p>Thank you for your help.</p>
<p>Best wishes, <br />Caroline</p>
Understanding Society User Support - Support #1786 (Feedback): Weights for longitudinal study | https://iserredex.essex.ac.uk/support/issues/1786 | 2022-10-17T16:41:52Z | Connor Gascoigne
<p>Hi Olena,</p>
<p>I am currently looking at the effect of government policy on the mental health of people in England, Scotland, and Wales. I drop Northern Ireland (NI) since a dataset I combine with at a later stage does not include information for NI. As part of that, I am performing a longitudinal analysis using Waves 1-11 of the UKHLS to produce an estimate for each individual in the survey. I wish to take the individual-level estimates and aggregate them to produce national and regional level estimates. To make sure the aggregated estimates properly account for the survey design, non-response, and any additional stratification, I plan on using the survey weights to produce the initial estimates.</p>
<p>I have seen there are two types of weights I can use: longitudinal and cross sectional.</p>
<p>For the longitudinal weights, I believe I would take the most recent survey's weight. For me, this would be k_indinus_lw. I then attach this weight to all the individuals for all waves (i.e., the weight for an individual is their k_indinus_lw weight for all waves). From this arise my first questions:</p>
<p>(Q1). I believe the weights are calculated to include individuals in NI as well. Since I do not consider these individuals in my analysis, will removing them from the data set affect the weighting? Alternatively, do I have to alter the weights when I remove the respondents from NI?</p>
<p>(Q2). If a respondent lives in, say, Scotland, then I will include their response in the analysis. If between waves 6 and 7 they move to NI, then due to the way I sort the data I will remove their responses from wave 7 onwards. Much like in (Q1), do I need to account for this by altering the longitudinal weights?</p>
<p>A main benefit of the longitudinal weights is the creation of a balanced dataset. From (Q2) (and similar examples where an individual’s change in response means I drop them), I create an unbalanced dataset. This got me wondering whether it would be better to use the cross-sectional weights and then pool them to create my own set of weights. This is because it would be useful to still include an individual’s response even if they do not respond to all the surveys, and this is essentially what I am doing for the example in (Q2). If this is the case:</p>
<p>(Q3). Because the naming convention changes for Wave 1, 2-5 and 6+, could I confirm whether the weights I would need to make a pooled weight would be a_indinus_xw, b_indinub_xw, c_indinub_xw, d_indinub_xw, e_indinub_xw, f_indinui_xw, g_indinui_xw, h_indinui_xw, i_indinui_xw, j_indinui_xw, and k_indinui_xw?</p>
<p>(Q4). If these are the correct cross-sectional weights, what is the best way to go about making a pooled weight? I work in R and have my data organised into long format where each individual has one row per wave and a column for each of the above weights. Due to this, in each weighting column there is an NA for all rows except the rows relating to the wave for that weight.</p>
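<p>The long-format layout described in (Q4) can be collapsed into a single weight column by picking, for each row, the column that belongs to that row's wave. The sketch below is in Python rather than R and uses toy rows; the <code>person_id</code> and <code>wave</code> field names are hypothetical, and the weight names are the ones listed in (Q3), which are still awaiting confirmation. It only collapses the columns; whether any rescaling across waves is needed is exactly the open question.</p>

```python
def weight_column(wave):
    """Name of the cross-sectional weight for a wave letter, as listed in (Q3)."""
    if wave == "a":
        stem = "indinus"
    elif wave in ("b", "c", "d", "e"):
        stem = "indinub"
    else:  # waves f to k
        stem = "indinui"
    return f"{wave}_{stem}_xw"


def coalesce_weight(row):
    """Return the weight stored in the column that matches the row's wave."""
    return row.get(weight_column(row["wave"]))


# Toy long-format rows: one row per person per wave, None where R would show NA.
rows = [
    {"person_id": 1, "wave": "a", "a_indinus_xw": 1.2, "b_indinub_xw": None},
    {"person_id": 1, "wave": "b", "a_indinus_xw": None, "b_indinub_xw": 0.8},
]

pooled = [coalesce_weight(row) for row in rows]
print(pooled)
```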
<p>I apologise for such an involved question, and I hope I have managed to explain myself in a manner that is understandable - if I need to better explain myself, then I am more than happy to do so. If you can give me any guidance at all, I would really appreciate it!</p>
<p>Thank you in advance for any help!</p>
<p>Kind regards,<br />Connor</p>
Understanding Society User Support - Support #1743 (Feedback): Averaging regional data to obtain ... | https://iserredex.essex.ac.uk/support/issues/1743 | 2022-08-05T10:14:30Z | Carolin Schmidt (cs2100@cam.ac.uk)
<p>Hi there,</p>
<p>I am using wave 6 to study household heads' homeownership probabilities. I am looking at native Brits and immigrants (I came up with an immigrant dummy for every household head).</p>
<p>I would now like to generate a control variable for each of my household heads: the variable should reflect the proportion of immigrants in the UK region where the person resides (that is, every household head in e.g. London will have the same immigrant share attached, etc.). I am wondering how I should calculate that average: does it have to be weighted (i.e. egen immishare = wtmean(immigrant), weight(indscui_xw) by(region), using the gwtmean package, which calculates weighted statistics)? I would think so, because without weighting I would have an average immigrant share based on the (not necessarily representative) raw data. However, if I calculate a weighted mean, would I not effectively double-weight the data, because the regression itself would be weighted too?</p>
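<p>For illustration, the weighted regional share described above is the sum of weight times the immigrant dummy divided by the sum of weights within each region. The sketch below is a Python stand-in for the Stata line on toy data; the field names <code>region</code>, <code>immigrant</code> and <code>weight</code> and all values are hypothetical. It shows the computation only and does not settle the double-weighting question.</p>

```python
from collections import defaultdict


def weighted_share_by_region(rows):
    """Weighted mean of the 0/1 immigrant dummy within each region."""
    numerator = defaultdict(float)
    denominator = defaultdict(float)
    for row in rows:
        numerator[row["region"]] += row["weight"] * row["immigrant"]
        denominator[row["region"]] += row["weight"]
    return {region: numerator[region] / denominator[region] for region in denominator}


# Toy household heads with a hypothetical cross-sectional weight.
heads = [
    {"region": "London", "immigrant": 1, "weight": 2.0},
    {"region": "London", "immigrant": 0, "weight": 1.0},
    {"region": "London", "immigrant": 0, "weight": 1.0},
    {"region": "Wales", "immigrant": 0, "weight": 1.0},
    {"region": "Wales", "immigrant": 1, "weight": 3.0},
]

shares = weighted_share_by_region(heads)

# Attach the regional share to every household head in that region.
for head in heads:
    head["immishare"] = shares[head["region"]]
print(shares)
```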
<p>I am unsure how to proceed and would appreciate any help.</p>
<p>Best wishes,<br />Carolin</p>
Understanding Society User Support - Support #1664 (Resolved): IEMB sample when combining data to... | https://iserredex.essex.ac.uk/support/issues/1664 | 2022-03-02T18:15:01Z | Dorothee Schneider
<p>Dear UKHLS team,</p>
<p>I am trying to understand, based on Q11 and Q12 of the Weighting FAQ document, whether I can include the IEMB sample when pooling the data into financial years for cross-sectional analysis. Our population of interest is adults aged 18+ in England who have a common mental disorder (proxied by GHQ-12 score). I hope we can include the IEMB sample to increase the number of observations when analysing by ethnicity.</p>
<p>I've pooled the data following Q12: months 4 to 15 from wave n, months 16 to 24 from wave n-1 and months 1-3 from wave n+1. Q12 says "data ... can be combined for cross-sectional analysis, provided that each of the 24 monthly samples is included in the analysis base an equal number of times". In the case of financial years, each has 12 sample months with IEMB (months 13 to 15 from wave n, and months 16 to 24 from wave n-1) and 12 sample months without IEMB (months 4 to 12 from wave n, and months 1 to 3 from wave n+1), so is equivalent to the original waves. My assumption was that (after adjusting the weights following the code on p10) I can use all subsamples including IEMB.</p>
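<p>The pooling rule described above can be written as a small lookup. This is only a sketch of the allocation as stated in this post (months 4 to 15 from wave n, 16 to 24 from wave n-1, 1 to 3 from wave n+1); labelling each pooled financial year by the wave number n is an assumption made here for illustration, and the weight adjustment from the FAQ is not reproduced.</p>

```python
from collections import Counter


def financial_year_index(wave, sample_month):
    """Pooled financial year (labelled by wave number n) for a wave and sample month."""
    if not 1 <= sample_month <= 24:
        raise ValueError("sample_month must be between 1 and 24")
    if sample_month <= 3:
        return wave - 1   # months 1-3 of wave n+1 belong to year n
    if sample_month <= 15:
        return wave       # months 4-15 of wave n belong to year n
    return wave + 1       # months 16-24 of wave n-1 belong to year n


# Check that the pooled year labelled 5 contains each monthly sample exactly once.
counts = Counter(
    month
    for wave in (4, 5, 6)
    for month in range(1, 25)
    if financial_year_index(wave, month) == 5
)
print(sorted(counts.items()))
```

<p>Each of the 24 monthly samples appears once in the pooled year, which is the condition quoted from Q12.</p>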
<p>However, in Q11, which is about calendar years or months, the advice is to exclude the IEMB because it is only part of months 13 to 24. For the BHPS and IEMB samples the advice is to use longitudinal weights to exclude them, but it seems one can use the Northern Ireland sample. Why is this? I don't understand the difference between these samples.</p>
<p>I also don't understand the example given for Northern Ireland weight adjustment: "please note that if you use months 13-24 you are excluding Northern Ireland from your analysis. If you use months 1-12 Northern Ireland will be over-represented without an additional adjustment to the weight. Here is the Stata syntax for adjustment if you use month 1-12: (...)." This sounds like using only sample months 1-12 (i.e. year 1) without months from year 2, which I thought I understood from Q12 shouldn't be done? Else, if it means using months 1-12 as part of a dataset pooling year 1 + year 2 sample months from different waves, then why do the Northern Ireland cases need extra adjustment? I seem to be missing something which might also help me understand if I can include the IEMB in my analysis sample.</p>
<p>Best wishes<br />Dorothee</p>
Understanding Society User Support - Support #1624 (Resolved): Weights for subsample | https://iserredex.essex.ac.uk/support/issues/1624 | 2022-01-06T14:49:27Z | Ashley Burdett
<p>Hello,</p>
<p>I am trying to estimate the fraction of people that transition to their first relationship (cohabitation or marriage) by age using the BHPS.</p>
<p>To do this I have constructed an unbalanced panel containing observations for individuals who have never had a relationship (marriage or cohabitation) before. Specifically, I use observations for individuals who did not report a relationship in the marital history datasets but provided a full response to the wave 2 main survey. I also include observations for individuals who aged into the sample during the panel, to increase my sample size.</p>
<p>I include observations for these individuals up until either they form their first relationship, they have a missing observation or the survey ends (2008).</p>
<p>Using this sample, I simply calculate the fraction of individuals observed at each age that transition to their first relationship at that given age.</p>
<p>My question is how do I appropriately incorporate weights into this analysis? I have tried numerous ways of approaching this problem and get very different results each time.</p>
<p>Many thanks in advance for your help.</p>
<p>All the best,</p>
<p>Ashley</p>
Understanding Society User Support - Support #1239 (Resolved): Using weights on a subsample of UKHLS | https://iserredex.essex.ac.uk/support/issues/1239 | 2019-09-07T12:31:55Z | Amanda Moorghen
<p>Hi,<br />I am running analysis (logit) on a subsample of UKHLS - wave 6 only, people under the age of 30.</p>
<p>I am using the following weights: <br />svyset f_psu [pweight=f_indinui_xw], strata(f_strata) singleunit(centered)</p>
<p>I wanted to check whether this is the correct approach. I am unsure whether the weights should be used in the same way for a subsample of UKHLS as when analysing the full sample.</p>
<p>Thanks<br />AM</p>
Understanding Society User Support - Support #987 (Resolved): Weighting of sub-sample | https://iserredex.essex.ac.uk/support/issues/987 | 2018-06-26T22:51:48Z | Ante Bab2242@cam.ac.uk
<p>Dear Sir or Madam,</p>
<p>I would like to compare the means of several variables (e.g. income, education) in a sub-sample after data cleansing with those in the initial sample, to test the representativeness of the sub-sample. If all variables are from the same wave (i.e. wave 4 of the UKHLS), cross-sectional weights can be applied. However, the sub-sample contains two variables that were not surveyed in wave 4, so they were carried forward from waves 1 and 3. In this case, should the variables for the comparison be weighted with the longitudinal weights of the last wave (i.e. wave 4), or should cross-sectional weights be used (i.e. cross-sectional weights from waves 1 and 3 for the two carried-forward variables, and cross-sectional weights from wave 4 for the remaining variables)? The variables are from household level questionnaires and self-completion interviews, so that the lowest level of hierarchy is 1, which would suggest using d_indscus_lw if longitudinal weights are appropriate. Do you agree?</p>
<p>Thank you for your help.</p>
<p>Best regards<br />Ante</p>
Understanding Society User Support - Support #932 (Resolved): Weighting | https://iserredex.essex.ac.uk/support/issues/932 | 2018-03-02T13:09:17Z | Elena Cora Magrini (e.magrini@centreforcities.org)
<p>Hi,</p>
<p>I am using data from different waves to see how job status changes over time (will be mostly working with variable jbstat) in different areas, mostly trying to compare cities with the rest of the country (we got access to place information through the special license).</p>
<p>It's just going to be a very simple model, in which it's very likely we will drop Northern Ireland. I assumed the only weight we would be interested in is something related to the assumption that people living in cities are the same as those who don't, but I am not sure how to do that.</p>
<p>Do I need to do any weighting? And if so, which one do I need to use?</p>
Understanding Society User Support - Support #886 (Closed): Zero weights and statistical power | https://iserredex.essex.ac.uk/support/issues/886 | 2017-12-04T17:45:15Z | Eric Emerson (eric.emerson@lancaster.ac.uk)
<p>Hi</p>
<p>I'm interested in data contained in the harassment modules (in Waves 1, 3, 5 and 7), but am concerned about the significant reduction in statistical power arising from the increasing proportion of respondents who are assigned values of 0 in w_ind5mus_xw. I understand from a previous thread (<a class="issue tracker-3 status-3 priority-5 priority-high2" title="Support: weights for pooled cross-sections over waves (a)-(f) (Resolved)" href="https://iserredex.essex.ac.uk/support/issues/877">#877</a>) that ..... 'The provision of weights requires the ability to estimate probabilities of continuing to respond over multiple waves. This is true of cross-sectional weights as well as longitudinal ones, as they are derived from the longitudinal ones (how this was done is described in section 3.8.3.10 of the User Guide). In consequence, a person in a household where there is no person who has been enumerated at every wave up to wave w will get a weight of zero. Such people should not be given a weight, as the weights for all other sample members are calculated in a way that compensates for these "missing" people.'</p>
<p>However, the 'compensation' also appears to result in a significant loss of statistical power. Taking as base the unweighted number of respondents who provide a valid answer to the 'attacked' items, the weighted population size has reduced from 92% of actual respondents in W1 (7418/8072) to just 27% in W7 (2711/9973). The resulting reduction in power is of concern and, given the rationale outlined above, will continue to increase over time as the % of households in which someone has been enumerated at every wave continues to diminish. It also seems rather wasteful of people's time that the responses of the majority of participants are, through the weighting process, assigned to a statistical waste bin!</p>
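<p>As a quick arithmetic check of the two percentages quoted above, using only the counts given in this post:</p>

```python
# Weighted population size as a share of unweighted respondents, from the post.
w1_share = 7418 / 8072   # Wave 1
w7_share = 2711 / 9973   # Wave 7

w1_percent = round(w1_share * 100)
w7_percent = round(w7_share * 100)
print(w1_percent, w7_percent)
```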
<p>Be very grateful if you could suggest any ways round this problem.</p>
<p>Many thanks</p>
<p>Eric</p>
Understanding Society User Support - Support #498 (Closed): weight youth self-completion + adult | https://iserredex.essex.ac.uk/support/issues/498 | 2016-02-04T10:02:59Z | Carolina Zuccotti (carolina.zuccotti@eui.eu)
<p>Hello,<br />I would like to follow individuals (14-15 yrs) who completed the self-completion youth questionnaire into the adult questionnaire (16+). I am interested in the questions on parental involvement and how this affects their adult outcomes.<br />How should I weight this?<br />Let's say that I consider 14-15 yrs individuals in wave 1 and I follow them in wave 2 (and/or 3).<br />Many thanks,<br />Carolina</p>
Understanding Society User Support - Support #456 (Closed): comparing across waves | https://iserredex.essex.ac.uk/support/issues/456 | 2015-11-27T16:55:32Z | Carolina Zuccotti (carolina.zuccotti@eui.eu)
<p>Hello,<br />I wanted to know if it is possible to compare the effect of a variable in wave 1 with its effect in wave 5.<br />For example, does education have a stronger effect on the probability of employment in 2009/2010 than in 2013/14?<br />To the naked eye, there seems to be a difference in the effect across waves. However, do you know if there might be a way to actually test this?<br />I would need to pool waves, I assume. In that case, how should I weight the cases?<br />Many thanks in advance.<br />Carolina</p>
Understanding Society User Support - Support #440 (Closed): Longitudinal Regression Analysis Weights | https://iserredex.essex.ac.uk/support/issues/440 | 2015-11-01T18:55:41Z | Esther Afolalu (e.f.afolalu@warwick.ac.uk)
<p>Hello. I am working on the Understanding Society database, looking specifically at the self-completion questionnaire data for the sleep and health questions. I am carrying out a longitudinal regression analysis to explore the association between change in individual sleep status and health outcomes from wave 1 to wave 4, controlling for a number of other variables. I just wanted to double-check which longitudinal weight I should apply to the regression analysis. I am thinking ‘d_indscus_lw’? And for descriptive statistics to describe the initial sample at wave 1, would I just use the ‘a_indscus_xw’ weight?</p>
<p>Also, if I wanted to incorporate nurse assessment CRP biomarker data at Wave 2 as a mediator, or examine the association from Wave 1 sleep status to Wave 2 biomarker status, which weight would I apply in this case: 'b_indnsus_lw'? And lastly, is there a weight that is applicable for looking at the association from Wave 2 biomarker status to Wave 4 sleep?</p>
<p>Thank you,<br />Esther.</p>
Understanding Society User Support - Support #414 (Closed): Weights for unbalanced panel | https://iserredex.essex.ac.uk/support/issues/414 | 2015-09-10T17:44:42Z | Ewan Carr (ewan.carr@ucl.ac.uk)
<p>My question is very similar to <a href="https://www.understandingsociety.ac.uk/support/issues/393#change-1207" class="external">this one</a>, posted last month.</p>
<p>I am estimating a random intercept logistic regression model, drawing upon all available data in the BHPS and US samples. Since I am using repeated measures data, from multiple waves, my understanding was that I would need to apply longitudinal weights to this analysis (specifically, "the weights from the last wave of any longitudinal sequence"; A5-1).</p>
<p>However, I want to use all available data. I do not want to limit the analysis to individuals with full information (i.e. complete cases) between BHPS wave 1 and US wave 4.</p>
<p>In the response given here, it was suggested that the cross-sectional weights should be used in this situation:</p>
<blockquote>
<p><cite>The last of these (unequal inclusion probabilities) is what weights are designed to deal with. <strong>I suggest that for each observation you use the relevant cross-sectional weight.</strong> That should correct for design probabilities and non-response.</cite></p>
</blockquote>
<p>However, this seems to go against the advice given in the User Guide (i.e., to use the longitudinal weights).</p>
<p>Can I confirm, therefore, that this approach (i.e. to apply the relevant cross-sectional weights) is a suitable strategy when treating the BHPS/US as an unbalanced panel?</p>
<p>Many thanks in advance.</p>
Understanding Society User Support - Support #412 (Closed): Weights for BHPS and Understanding So... | https://iserredex.essex.ac.uk/support/issues/412 | 2015-09-09T13:04:23Z | Andreas Wiedemann (awiedem@mit.edu)
<p>Hello,</p>
<p>I’ve merged the BHPS with the BHPS-subset of Understanding Society to create a longitudinal panel of BHPS respondents up until 2012 (i.e. I use the BHPS portion of Understanding Society). I am not entirely sure which weights I should use for the analysis. I’ve read the documentation of both datasets, but it is still not clear which weights are best for my purpose. My goal is to re-create the same underlying population in both datasets, either for the UK or GB. Most importantly, however, I want to be consistent across these two datasets in order to analyze trends in, e.g., income over a time span covering both datasets. Most of my variables of interest are at the household level, but some are at the individual level. <br />Should I use the longitudinal BHPS weights (indin91_lw for individuals or the cross-sectional hhdenbh_xw for households)? And do I have to use weights only in the Understanding Society part or also in the BHPS part of my panel?</p>
<p>Many thanks for your help,<br />Andreas</p>
Understanding Society User Support - Support #291 (Closed): USOC cross-section household weight f... | https://iserredex.essex.ac.uk/support/issues/291 | 2014-07-29T10:19:38Z | Alex Hurrell (alexhurrell1@gmail.com)
<p>I'm looking for the BHPS-sample cross-sectional household weight (n_hhdenbh_xw) for USOC wave 3. It is available in wave 2 but I can't find it in wave 3. Does it exist? If not, is it possible to derive it?</p>
<p>Many thanks,<br />Alex</p>