Two questions about weights
I have two questions about weights that perhaps you can help me with.
First, the documentation states that the right weight depends on the 'lowest' questionnaire that you are using. For example, if you are using information from the adult interview and the adult & proxy interview, you should use the adult interview weights. Can you explain why these weights vary, or point me to a source that explains it? I am basically wondering what to do if we want to include some questions that are not on the proxy questionnaire, but I am happy to make some assumptions for proxies.
Second, in the harmonised BHPS there seems to be one (set of) weights if we want to include proxies and non-respondents - lewght - and one if we want to include only those who gave a full interview - lrwght. What should I do if we want to analyse full respondents and proxies, but not non-respondents?
Grateful for any help.
Updated by Olena Kaminska almost 3 years ago
Thank you for your question.
The derivation of all our weights is explained in the Understanding Society User Guide. If you use even one question that is asked only in the full interview, you are technically limiting your analysis to people who gave a full interview, so you need to adjust for nonresponse accordingly and use the full interview weight. However, if you plan to correct for the missing information in the full-interview questions, so that your analysis sample includes both full and proxy respondents (check the numbers after running your model), then you should use the proxy weight.
On your second question, lewght is a longitudinal weight for enumerated people. It does cover proxies and nonrespondents to the full questionnaire, but it applies only to questions asked in the household grid and the household questionnaire, for which all enumerated people are 'respondents'. Use this weight only if you draw information from the hhresp and indall files and use nothing from the proxy or full interviews. If you use any information from the personal interviews you should use lrwght, which covers people who gave full interviews.
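To make the rule above concrete, here is a minimal sketch in Python of how the choice between the two longitudinal weights could be written down. The function name and the file name "indresp" (standing in for a personal interview file) are assumptions for illustration only; lewght, lrwght, hhresp and indall are the names used in this reply.

```python
# Files whose questions are answered for every enumerated person
# (household grid and household questionnaire), as named in the reply.
ENUMERATION_LEVEL_FILES = {"hhresp", "indall"}


def choose_longitudinal_weight(files_used):
    """Return the name of the longitudinal weight suggested by the rule above.

    files_used: names of the data files the analysis draws variables from.
    This is an illustrative helper, not part of any official tool.
    """
    files = set(files_used)
    if not files:
        raise ValueError("list at least one data file")
    # Only household grid / household questionnaire information:
    # every enumerated person counts as a respondent.
    if files <= ENUMERATION_LEVEL_FILES:
        return "lewght"
    # Anything from a personal interview restricts the analysis to
    # people who gave a full interview.
    return "lrwght"


print(choose_longitudinal_weight(["hhresp", "indall"]))
print(choose_longitudinal_weight(["indall", "indresp"]))  # "indresp" is an assumed file name
```

The sketch only encodes the "lowest instrument" logic; it does not cover the case of full respondents plus proxies, for which no ready-made longitudinal weight is described here.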
If you want to include proxies in your analysis, you can model nonresponse yourself. I suggest using the enumeration weight from wave 1 as a starting point and modelling nonresponse from there. We are not yet ready to advise on the exact modelling, but we hope to develop guidelines in the future.
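As a rough illustration of what "modelling nonresponse from a starting weight" can look like, the sketch below uses a simple weighting-class adjustment in Python. This is one generic textbook approach, not the Understanding Society method, and it is not official guidance. The field names (base_weight, cell, responded) and the example cells are hypothetical: base_weight stands in for the wave 1 enumeration weight, and responded would be true for anyone with a full or proxy interview.

```python
from collections import defaultdict


def adjust_for_nonresponse(people):
    """Weighting-class nonresponse adjustment (illustrative only).

    people: list of dicts with hypothetical keys
        'base_weight' - starting weight (e.g. a wave 1 enumeration weight)
        'cell'        - adjustment class chosen by the analyst
        'responded'   - True if the person is in the analysis sample
    Returns one adjusted weight per person, in the same order.
    """
    total = defaultdict(float)      # base weight of everyone in the cell
    responding = defaultdict(float) # base weight of respondents in the cell
    for p in people:
        total[p["cell"]] += p["base_weight"]
        if p["responded"]:
            responding[p["cell"]] += p["base_weight"]

    adjusted = []
    for p in people:
        cell = p["cell"]
        if not p["responded"] or responding[cell] == 0:
            # Nonrespondents (and degenerate cells) get zero weight.
            adjusted.append(0.0)
        else:
            # Respondents are scaled up to represent the whole cell.
            adjusted.append(p["base_weight"] * total[cell] / responding[cell])
    return adjusted


people = [
    {"base_weight": 1.0, "cell": "16-34", "responded": True},
    {"base_weight": 1.0, "cell": "16-34", "responded": False},
    {"base_weight": 2.0, "cell": "35+", "responded": True},
    {"base_weight": 2.0, "cell": "35+", "responded": True},
    {"base_weight": 1.0, "cell": "35+", "responded": False},
]
weights = adjust_for_nonresponse(people)
print(weights)
```

Within each cell the adjusted weights of respondents sum to the base weights of everyone enumerated in that cell. In practice the cells, or a response propensity model replacing them, would need to be chosen using variables that predict both response and the outcomes of interest.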
Hope this helps,