Understanding Society User Support - Support #357: cluster variable</h1> <article> <h1>Understanding Society User Support - Support #357: cluster variable</h1> <p>2015-04-13T08:06:02Z</p> <ul><li><strong>Status</strong> changed from <i>New</i> to <i>In Progress</i></li><li><strong>% Done</strong> changed from <i>0</i> to <i>50</i></li></ul><p>The weights are needed for the point estimates, while the PSU and strata are needed for the confidence intervals. More on this can be found in the Understanding Society user guides and training course materials.<br /><a class="external" href="https://www.understandingsociety.ac.uk/documentation/mainstage">https://www.understandingsociety.ac.uk/documentation/mainstage</a><br /><a class="external" href="https://www.understandingsociety.ac.uk/2015/03/12/stata-training-course-online">https://www.understandingsociety.ac.uk/2015/03/12/stata-training-course-online</a><br />On behalf of the team,<br />Jakob</p> </article> <article> <h1>Understanding Society User Support - Support #357: cluster variable</h1> <p>2015-04-13T09:50:18Z</p> <ul></ul><p>Most statistical packages assume that the data are a simple random sample. If they are not, as in the case of the BHPS, the sample design needs to be specified in order to estimate the standard errors (and hence the confidence intervals) correctly. The PSU and strata variables are household-level variables, so they are provided in the whhsamp file. 
But they should be used in individual-level analysis as well.</p> <p>So, to summarize, you should merge the wpsu and wstrata variables from the whhsamp file into the windresp file and then specify this information in your analysis (e.g., in the svyset statement if using Stata).</p> <p>Depending on your sample, you may additionally want to take into account that individuals are clustered within households.</p> </article> <article> <h1>Understanding Society User Support - Support #357: cluster variable</h1> <p>2015-04-14T07:45:06Z</p> <ul></ul><p>Dear Alita,</p> <p>Thank you for your answer.</p> <p>It is not totally clear to me how this weighting procedure for longitudinal data would work if I used a subsample of the data. Would it still work?<br />The meaning of the longitudinal weights is to make "as if" there were no attrition?</p> <p>In addition to that, as you mentioned, it seemed to me that there have been 3 stages in the selection of the final respondents set. Yet, I can only find one "strata" variable. Which are the others?</p> <p>Thank you in advance, <br />Elisa</p> </article> <article> <h1>Understanding Society User Support - Support #357: cluster variable</h1> <p>2015-04-15T10:20:18Z</p> <ul></ul><p>If the documentation specifies that a weight is for the Original "Essex" Sample, you must use it only with the Original "Essex" Sample or its sub-samples (such as only men, or only 16-59 year olds). You cannot use the Extension/Boost samples with this weight. See Table 25 of the Volume A User Guide to see which weights are to be used with which sample.</p> <p>"The meaning of the longitudinal weights is to make "as if" there were no attrition?" <br />Yes, basically it makes the estimates representative of the population who were alive and residing in the UK during the sample period.</p> <p>"In addition to that, as you mentioned, it seemed to me that there have been 3 stages in the selection of the final respondents set. 
Yet, I can only find one "strata" variable. Which are the others?"</p> <p>First, postcode sectors were selected; then from those approximately 33 addresses were selected; within these, up to 3 dwelling units were randomly selected (if there was more than one dwelling unit, which was rare); and from each dwelling unit up to 3 households were selected (if there was more than one household, which was also rare). So the wpsu variable represents this clustering. If you are doing individual-level analysis, in households with more than one person there will be additional clustering - the household identifier represents the clustering at the household (HH) level.</p> <p>The wstrata variable represents stratification, not clustering.</p> <p>Alita</p> </article> <article> <h1>Understanding Society User Support - Support #357: cluster variable</h1> <p>2015-04-16T09:22:17Z</p> <ul></ul><p>Dear Alita,</p> <p>I am using the entire sample (with all the boosting) and I guess I will consider a subsample (age 40+). Since the analysis is longitudinal, I guess I should use longitudinal weights: I had a look at Table 25 and it is not clear to me what "wLRWTUK1 from latest wave in longitudinal sequence" means. In addition to that, I noticed that several of the longitudinal weights have value=0. Is this ok?</p> <p>This is the svyset command I am using:</p> <p>svyset psu [ pweight=xrwtuk1], strata (strata)</p> <p>Is this ok, or since I am doing an analysis at individual level, should I specify "hid" like this?</p> <p>svyset psu [ pweight=longitudinalweights], strata (strata) || hid</p> <p>As a result I got this:</p> <p>. 
svyset psu [ pweight=finlwght], strata (strata) || hid<br />Note: stage 1 is sampled with replacement; all further stages will be ignored</p> <pre><code>pweight: finlwght<br /> VCE: linearized<br /> Single unit: missing<br /> Strata 1: strata<br /> SU 1: psu<br /> FPC 1: &lt;zero&gt;</code></pre> <p>Thank you for your help.</p> <p>Elisa</p> </article> <article> <h1>Understanding Society User Support - Support #357: cluster variable</h1> <p>2015-04-16T12:12:52Z</p> <ul></ul><p>In addition, my variable of interest is "save": sometimes the variable takes the value -7 because the observation was a proxy response and the question was not answered.<br />My first attempt was to drop all the save==-7 observations. Now I am wondering whether the longitudinal pweight would take this into account: I mean, I could simply leave the save==-7 observations in the sample, and applying the longitudinal weight would eliminate these values.</p> <p>Is this true?</p> <p>Kind regards, Elisa</p> </article> <article> <h1>Understanding Society User Support - Support #357: cluster variable</h1> <p>2015-04-21T13:02:19Z</p> <ul></ul><p>Response to #5</p> <p>If you are using 40+ year olds from all 4 samples taken together, then use wLRWTUK1. If the last wave of data you are using is wave 18, then the weight you should use is rLRWTUK1. But note that this longitudinal weight, rLRWTUK1, is non-zero for all those who responded continuously from wave 11 to wave 18, and zero otherwise. So anyone who failed to respond in even one of these waves will have a zero weight. The weight is also zero for proxy and telephone respondents. Yes, if you left proxy and telephone respondents in (save=-7) then they will be effectively "dropped" from the analysis as their weight=0. But it may be better to drop them from your sample before you start the analysis, so that your sample descriptives do not describe the wrong sample. See section V of the Vol. 
A User guide.</p> <p>hid is not unique across waves and so cannot be used in its current form in longitudinal analysis.</p> <p>More on "Analyzing Correlated (Clustered) Data": <a class="external" href="http://www.ats.ucla.edu/stat/stata/library/cpsu.htm">http://www.ats.ucla.edu/stat/stata/library/cpsu.htm</a></p> </article> <article> <h1>Understanding Society User Support - Support #357: cluster variable</h1> <p>2015-04-29T10:48:01Z</p> <ul></ul><p>Dear Alita, reading Volume A of the BHPS user guide I have noticed that the sample for the BHPS was obtained through a 3-stage selection procedure. Yet, I am aware of just one sampling unit variable (PSU), which I believe to be the stage 1 sampling unit.</p> <p>So, as I wrote, I declared the data to be survey data in this way:</p> <p>svyset psu [pweight=finxwght], strata (strata)<br />Note: stage 1 is sampled with replacement; all further stages will be ignored</p> <p>pweight: finlwght<br /> VCE: linearized<br /> Single unit: missing<br /> Strata 1: strata<br /> SU 1: psu<br /> FPC 1: &lt;zero&gt;</p> <p>This works in principle, but if I run regressions using svy, Stata does not compute standard errors and gives this warning: "Note: missing standard error because of stratum with single sampling unit". I am not fully sure, but I guess that the problem comes from the fact that I could not specify that the survey had a 3-stage design.</p> <p>So, my question is: where can I find the stage 2 and stage 3 sampling units? And are there any variables for the finite population correction at each stage?</p> <p>Kind regards, <br />Elisa</p> </article> <article> <h1>Understanding Society User Support - Support #357: cluster variable</h1> <p>2015-05-11T23:18:06Z</p> <ul></ul><p>"Note: missing standard error because of stratum with single sampling unit". This happened because you included the Northern Ireland boost sample. This is the only BHPS sample to have a simple random sample design. So, for all those in this sample psu=-8 and strata=-8. 
Stata interprets this as a stratum with a single PSU and hence cannot compute standard errors.</p> <p>Option 1: replace psu=hid to trick Stata into thinking that there are many PSUs in the one NI stratum<br />Option 2: use the singleunit option of svyset</p> </article> <article> <h1>Understanding Society User Support - Support #357: cluster variable</h1> <p>2015-05-22T11:29:06Z</p> <ul><li><strong>Status</strong> changed from <i>In Progress</i> to <i>Closed</i></li><li><strong>% Done</strong> changed from <i>50</i> to <i>100</i></li></ul> </article> </main></body></html>