Support #1198
Weighting data Wave 8 h_indresp
Added by Sarah H over 5 years ago. Updated over 2 years ago.
100%
Description
Hi
Understanding Society provides instructions on pages 65 to 71 (Section 3.3) for choosing the correct variable to weight the data.
I have selected n_indpxui_xw because I'm using the individual dataset and not excluding proxy responses.
However, the naming convention on page 71 suggests that n_indpxus_xw would be the correct weight? This variable is not available in the dataset that I'm using. The variable n_indpxus_lw is available, but LW stands for longitudinal and I'm only analysing one wave, so my analysis is not longitudinal. XW weights, by contrast, are for cross-sectional analysis, i.e. using data from a single wave.
n_indpxui_wv is the only cross-sectional weight available in the dataset, so I have selected it as the weight to use. Can you confirm that this is correct, please?
best wishes
Sarah
Files
weighting and ethicity.docx (17 KB) | Sarah H, 07/03/2019 04:38 PM
syntax understanding society ethnicity.docx (108 KB) | Sarah H, 07/05/2019 09:57 AM
Updated by Stephanie Auty over 5 years ago
- Category set to Weights
- Assignee set to Olena Kaminska
- Private changed from Yes to No
Updated by Olena Kaminska over 5 years ago
Sarah,
Yes, I can confirm that n_indpxui_xw is the correct weight to use in your analysis. Thank you for pointing this issue out to us, and apologies for the confusion.
Thanks,
Olena
Updated by Sarah H over 5 years ago
Hi
When I use this weight I get an error message >Warning # 3211
On at least one case, the value of the weight variable was zero, negative, or
missing. Such cases are invisible to statistical procedures and graphs which
need positively weighted cases, but remain on the file and are processed by
non-statistical facilities such as LIST and SAVE.
However, the variables do not necessarily have zeros in them. So do we assume that this data cannot be weighted and therefore cannot be generalizable?
Updated by Olena Kaminska over 5 years ago
Dear Sarah,
Yes, this message is correct but you don't need to worry about it. Understanding Society has a much more complex design than most surveys - hence zero weights. Having said that, any analysis that uses weights will be generalizable to a population and you should just ignore this message.
Best,
Olena
Updated by Sarah H over 5 years ago
Hi Olena
I'm applying this weight to an analysis restricted to respondents who are resident in England only. Is this still correct?
Best wishes
Sarah
Updated by Sarah H over 5 years ago
Hi Olena
When I apply the weight, shouldn't I expect the counts to shift and the frequencies to be higher? They do not change when the weight is applied.
Best wishes
Sarah
Updated by Olena Kaminska over 5 years ago
Sarah,
Which syntax are you using? Our weights are not frequency weights so please do not use fweights command. Our weights are probability weights and pweights should be used.
Hope this helps,
Olena
Updated by Sarah H over 5 years ago
Hi Olena
I am using n_indpxui_xw using the h_indresp database (individuals and proxies) looking at England geographical area only
Updated by Olena Kaminska over 5 years ago
Sarah,
I was referring to the syntax in Stata, for example the svy command and pw=n_indpxui_xw within this command. Is this how you use our weights?
Thanks,
Olena
Updated by Sarah H over 5 years ago
I am using SPSS. Does this change anything that you have responded above in terms of error message and weighting variable?
Updated by Olena Kaminska over 5 years ago
Sarah,
I suggest that you use the proportions, and not the frequencies, that SPSS gives you. If you need to get population frequencies, just multiply the SPSS weighted proportions by the population total.
Hope this helps,
Olena
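A minimal sketch of the advice above, in Python rather than SPSS: weighted proportions are each category's share of the total weight, and population frequencies follow by multiplying those proportions by a population total. The data, the variable names and the population total are hypothetical, and skipping non-positive weights is an assumption that mirrors what Warning # 3211 describes.

```python
from collections import defaultdict


def weighted_proportions(categories, weights):
    """Return each category's share of the total weight.

    Cases with a zero, negative, or missing (None) weight are skipped,
    which is what the SPSS warning quoted earlier says happens.
    """
    totals = defaultdict(float)
    for category, weight in zip(categories, weights):
        if weight is None or weight <= 0:
            continue
        totals[category] += weight
    grand_total = sum(totals.values())
    return {category: total / grand_total for category, total in totals.items()}


# Hypothetical toy data: contract type and a cross-sectional weight.
contract = ["permanent", "temporary", "permanent", "permanent", "temporary"]
weight = [1.5, 0.5, 1.0, 0.0, 1.0]

proportions = weighted_proportions(contract, weight)

# Hypothetical population total, used only to scale proportions to counts.
POPULATION_TOTAL = 1_000_000
population_counts = {c: p * POPULATION_TOTAL for c, p in proportions.items()}

print(proportions)
print(population_counts)
```

The proportions do not depend on how the weights are scaled, which is why they are safer to report than weighted counts.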
Updated by Alita Nandi over 5 years ago
- Assignee changed from Olena Kaminska to Sarah H
Updated by Sarah H over 5 years ago
Hi
I think this has gone on to a different topic. To confirm: if I use the n_indpxui_xw weight in SPSS, does this make the data representative? And should I ignore the error message 3211?
Updated by Alita Nandi over 5 years ago
- Assignee changed from Sarah H to Olena Kaminska
Updated by Sarah H over 5 years ago
Thank you Olena.
To follow on from this, I've conducted some analysis using the weight, and results I would expect to find are not coming out, e.g. an ethnicity difference in employment permanency. This was significant for my sample when the data were analysed unweighted, but weighted there is no significance. I am wondering if I have done something wrong, such as in cleaning the variables (throughout my analysis I've excluded non-responses/missing/NA/don't know).
Please let me know should you require any further information
Best wishes,
Sarah
Updated by Olena Kaminska over 5 years ago
Sarah,
What was the sample size in your analysis before and after weighting, and what was the p-value?
Thank you,
Olena
Updated by Sarah H over 5 years ago
Hi
Fishers exact test performed.
Sample size before weighting n=962 p=.027
Sample size after weighting n=1015 p=.631
Best wishes
Sarah
Updated by Olena Kaminska over 5 years ago
Sarah,
This doesn't sound right: I expect the total with weights to be smaller than in unweighted analysis. I also don't expect such a large difference between p-values. Are you sure that the definition of the variables is the same between the models? You should not take out nonrespondents etc. with weights - the weights do the job for you. You should have the same coding of all the variables as well to make a fair comparison.
Best wishes,
Olena
Updated by Sarah H over 5 years ago
The definition of the variable is the same. However, the variable I'm using is ethnicity and, as there are so many categories, I've created a new variable with only 3 ethnicities so that a test could be conducted.
With regard to your second point, I've performed another test in SPSS with two variables: 1) samples of interest (2 groups) and 2) contract type (e.g. perm/temp), which was not cleaned. The weighting variable had not removed the inapplicable, don't know and refusal categories from the table, which is why I've cleaned all the variables I'm using so that these don't appear in the tests.
Updated by Olena Kaminska over 5 years ago
Try to exclude the variable reflecting samples of interest. The weight takes the samples into account correctly on your behalf. It is very easy to get non-representative results if you exclude some of the samples.
Updated by Sarah H over 5 years ago
Dear Olena
Have I missed something? I am only looking at a very specific sub sample of the dataset (e.g. specific age and caring responsibility). Are you suggesting that I cannot use weights for this? Thus it can't be generalizable to this group.
Best wishes
Sarah
Updated by Olena Kaminska over 5 years ago
Sarah,
I see. No, your subgroup definition is fine and yes, our data will represent this subgroup. I misunderstood thinking that you may have selected some of the samples (like general population sample or ethnic minority boost). So, this isn't a problem.
The difference between weighted and unweighted results must be explained by the difference in definitions of the variables then.
Hope this helps,
Olena
Updated by Sarah H over 5 years ago
Hi Olena
Thanks for your message. No I've used the exact same variables as I am using the same dataset but just applied the weights.
With regard to removing responses such as don't know, not applicable etc., could I enquire about this? I have cleaned the variables to remove these (by creating a new variable and coding them as missing). You suggested earlier that this would have been done automatically; however, it does not happen automatically.
Best wishes
Sarah
Updated by Olena Kaminska over 5 years ago
Sarah,
The weighting deals only with unit nonresponse. Don't knows and refusals can be treated as valid answers or can be recoded as missing, so yes, you should recode these yourself. Have you used the same recoded variables in your unweighted analysis too? It may be that this recoding influences your analysis and p-value, and that the change is largely due to your coding of item missingness and has nothing to do with weighting.
Olena
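A rough sketch of the recoding step discussed above, in Python rather than SPSS. It assumes that negative values (-9, -8, -7, -2, -1) mark missing, inapplicable, proxy, refusal and don't know responses; that convention should be checked against the codebook for each variable. The variable names and values are hypothetical.

```python
# Assumed convention: negative codes mark non-substantive answers.
# Verify these codes against the codebook before relying on them.
MISSING_CODES = {-9, -8, -7, -2, -1}


def recode_missing(values):
    """Replace the assumed missing codes with None, leaving other values as-is."""
    return [None if value in MISSING_CODES else value for value in values]


def complete_cases(rows):
    """Keep only rows in which no analysis variable is missing."""
    return [row for row in rows if None not in row]


# Hypothetical raw values for two analysis variables.
ethnicity_raw = [1, 2, -1, 3, -9, 1]
contract_raw = [1, -8, 2, 1, 1, -2]

ethnicity = recode_missing(ethnicity_raw)
contract = recode_missing(contract_raw)

# The same recoded variables feed both the weighted and the unweighted
# analysis, so that any difference between them is due to the weight alone.
analysis_rows = complete_cases(list(zip(ethnicity, contract)))
print(analysis_rows)
```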
Updated by Sarah H over 5 years ago
Hi Olena
Yes I have used all the same variables when I did the analysis weighted/unweighted.
However, as a test I have run an experiment with my subgroup, cross-tabulating it with the ethnicity variable that I have not cleaned. The results are very different, which I find surprising. Would this explain the difference in the p-values? It does not explain the huge difference in percentages, though.
Updated by Olena Kaminska over 5 years ago
Sarah,
Unfortunately I don't think I know the reason for the difference between weighted and unweighted analysis that you observe. I can only say that the difference you observe is definitely wrong: there is no situation in which the sample size can go up when you use a weight. The p-value change looks very suspicious too. My guess is that weighting has little to do with the difference you observe.
I hope this helps,
Olena
Updated by Olena Kaminska over 5 years ago
Sarah,
I just want to clarify my previous comment. It relates to your earlier results of Fisher exact test.
I only now noticed your attachment of the weighted and unweighted distribution by ethnic group, and that's all fine and as expected. I am afraid this doesn't explain the earlier difference.
Thanks,
Olena
Updated by Sarah H over 5 years ago
Hi Olena
This is most strange. I've re-done the test and it appears that N does go up for the weighted data in the Fisher exact test, whereas in all my other weighted analyses it goes down (the sample size drops). Do you have any other explanation as to why this is? I'm a bit concerned about the difference with weighting. Can you confirm that this has nothing to do with me coding out non-responses?
Best wishes
Sarah
Updated by Olena Kaminska over 5 years ago
Sarah,
I am puzzled myself. But if you have a syntax that you can share I will be happy to look.
Thanks,
Olena
Updated by Sarah H over 5 years ago
Hi Olena
Thank you. I've attached three pages in a word document. The first page has the syntax and the second and third the outputs with the Fishers Exact test
Best wishes
Sarah
Updated by Olena Kaminska over 5 years ago
Sarah,
I see the problem. The way you specified weights in SPSS, they are assumed to be frequency weights. Our weights are probability weights, centred around 1. What SPSS seems to do is give those who have a weight over 1 a weight of 1, and give the other half of people, with weights below 1, a weight of 0. In other words, your results come from an unweighted non-random half of the sample and the value you obtain is wrong.
You should use Complex Sample module in SPSS. Please specify cluster and strata in addition to the weight in your analysis. Here is the link to more information:
https://www.spss.ch/upload/1071150823_SPSS%2012%20Complex%20Samples.pdf
Best wishes,
Olena
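To illustrate why treating fractional probability weights as frequency weights is a problem, here is a small Python sketch. The rounding rule (round to the nearest whole number) is a hypothetical stand-in for whatever a frequency-weighted procedure does internally, not a confirmed description of SPSS, and the weights are invented.

```python
def weighted_n(weights):
    """Sum of the weights as supplied (fractions kept)."""
    return sum(weights)


def rounded_n(weights):
    """Sum of the weights after rounding each one to a whole number.

    int(w + 0.5) rounds half up for non-negative weights. This is a
    hypothetical rule used only for illustration.
    """
    return sum(int(w + 0.5) for w in weights)


# Invented weights centred around 1.
weights = [0.3, 0.4, 0.6, 0.9, 1.4, 2.4]

dropped = sum(1 for w in weights if int(w + 0.5) == 0)

print("N with fractional weights:", weighted_n(weights))
print("N after rounding to whole cases:", rounded_n(weights))
print("cases that vanish (rounded to zero):", dropped)
```

Under this assumed rule, low-weight cases drop out entirely and the rest count as whole cases, so both N and any test statistic built on cell counts can change in ways that have nothing to do with the survey design.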
Updated by Sarah H over 5 years ago
Thank you Olena for this information
Can I confirm that if I do not do hypothesis testing and only cross tabulation that weighting can still be correct in SPSS (e.g. not using complex sampling module)
Best wishes
Sarah
Updated by Olena Kaminska over 5 years ago
Sarah,
No, cross-tabulation will be wrong without weighting.
But if you are not presenting confidence intervals (though you should), you can ignore clustering and stratification.
Best wishes,
Olena
Updated by Sarah H over 5 years ago
Hi Olena
What is the cluster variable I should use? I have used the strata variable
Best wishes
Sarah
Updated by Olena Kaminska over 5 years ago
Sarah,
Cluster variable is called psu.
Best wishes,
Olena
Updated by Sarah H over 5 years ago
Hi
Thank you. I'm getting an error message that says my weight is being ignored, and a warning message on the output that 'the weight is being ignored'. Would you be able to help me work out what I've done wrong? I have also copied the syntax below.
* Sampling Wizard.
CSPLAN SAMPLE
/PLAN FILE='C:\Users\sh33496\OneDrive - The Open University\Data\Secondary Data '+
'Analysis\Understanding Society\Datasets\understandingsocietyfinaltrial123.csplan'
/PLANVARS SAMPLEWEIGHT=SampleWeight_Final_ PREVIOUSWEIGHT=h_indpxui_xw
/PRINT PLAN
/DESIGN STAGELABEL='ussamplefinaltrial' STRATA=h_strata CLUSTER=h_psu
/METHOD TYPE=SIMPLE_WOR ESTIMATION=DEFAULT
/RATE VALUE=1
/STAGEVARS INCLPROB CUMWEIGHT.
CSSELECT
/PLAN FILE='C:\Users\sh33496\OneDrive - The Open University\Data\Secondary Data '+
'Analysis\Understanding Society\Datasets\understandingsocietyfinaltrial123.csplan'
/CRITERIA STAGES=1 SEED=RANDOM
/CLASSMISSING EXCLUDE
/PRINT SELECTION.
best wishes and many thanks for your assistance
Sarah
Updated by Alita Nandi over 5 years ago
- Assignee changed from Olena Kaminska to Sarah H
Dear Sarah,
Sorry for the delay in getting back to you. We generally provide support and advice on data (& weights) issues only. We provide guidance on syntax related to data management. Your query seems like a problem with SPSS syntax relating to how weights need to be specified.
You could look for solutions online. You could also send a query to our JISC mail group which has been set up for Understanding Society data users to discuss analysis (incl syntax related to analysis) issues. If you want to sign up please send an email to UKHLS-REQUEST@JISCMAIL.AC.UK
Best wishes,
Alita
Updated by Sarah H over 5 years ago
Dear Alita
I come to you with another question that I hope you can help with
I just want to confirm what strata and PSU stand for. I'm still struggling with complex sampling and going back to basics.
best wishes
Sarah
Updated by Alita Nandi over 5 years ago
When a population is divided into mutually exclusive and exhaustive parts and then samples are chosen from each of these parts - known as strata - then we have stratified sampling. When a population is divided into mutually exclusive and exhaustive parts but samples are chosen from some of these parts - known as clusters or primary sampling units (PSU), then we have clustered sampling. These are very simplified explanations. There are different types of clustering and stratification. For example there are explicit and implicit stratification, multi-stage clustering where you will have primary and secondary clusters etc. You will find explanations of these concepts in many standard Statistics books. For example, Levy and Lemeshow "Sampling of Populations: Methods and Applications"
Hope this helps,
Alita
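The two definitions above can be sketched with a toy population in Python. This is a simplified, single-stage illustration with invented region and postcode names; real designs such as the one behind this dataset combine stratification and clustering across several stages.

```python
import random


def stratified_sample(strata, n_per_stratum, rng):
    """Draw units from every stratum, so each part is represented."""
    return {name: rng.sample(units, n_per_stratum) for name, units in strata.items()}


def cluster_sample(clusters, n_clusters, rng):
    """Select only some clusters (PSUs) and keep every unit in the chosen ones."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: list(clusters[name]) for name in chosen}


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Invented population: the same people grouped two different ways.
strata = {
    "north": ["n1", "n2", "n3", "n4"],
    "south": ["s1", "s2", "s3", "s4"],
    "east": ["e1", "e2", "e3", "e4"],
}
clusters = {
    "postcode_A": ["n1", "n2", "s1"],
    "postcode_B": ["n3", "s2", "e1"],
    "postcode_C": ["n4", "s3", "e2"],
    "postcode_D": ["s4", "e3", "e4"],
}

by_stratum = stratified_sample(strata, n_per_stratum=2, rng=rng)
by_cluster = cluster_sample(clusters, n_clusters=2, rng=rng)

print(by_stratum)  # every stratum appears
print(by_cluster)  # only the selected clusters appear
```

People in the same cluster tend to resemble each other, which is why an analysis that ignores the cluster variable usually understates standard errors.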
Updated by Sarah H about 5 years ago
Hi
what are the implications for the data analysis if I conduct the analysis without weighting the data using complex sampling method/ module?
Best wishes
Sarah
Updated by Understanding Society User Support Team almost 4 years ago
Here is information on why weights should be used.
https://www.understandingsociety.ac.uk/documentation/mainstage/user-guides/main-survey-user-guide/why-use-weights
Updated by Understanding Society User Support Team over 2 years ago
- Category set to Survey design
- Status changed from Feedback to Resolved
- % Done changed from 80 to 100