Averaging regional data to obtain control variables for individuals
I am using wave 6 to study household heads' homeownership probabilities. I am comparing native Brits and immigrants (I created an immigrant dummy for every household head).
I would now like to generate a control variable for each household head that reflects the proportion of immigrants in the UK region where the person resides (that is, every household head in e.g. London will have the same immigrant share attached, and so on). I am wondering how I should calculate that average. Does it have to be weighted, i.e. egen immishare = wtmean(immigrant), weight(indscui_xw) by(region), using the gwtmean package, which calculates weighted statistics? I would think so, because without weighting I would get an average immigrant share based on the raw data, which are not necessarily representative. However, if I calculate a weighted mean, would I not effectively double-weight the data, because the regression itself would be weighted too?
I am unsure how to proceed and would appreciate any help.
Updated by Olena Kaminska over 1 year ago
You don't need our weight (a probability weight) when creating variables. Sometimes people use an analytical weight in this situation, for example when some groups should have a higher importance in the resulting variable than others. This is strictly research and theory driven and has nothing to do with sampling and nonresponse (which our weights control for).
My suggestion is to ignore weighting in the egen command and to use our weight in any command that provides estimation. Alternatively, you could add an analytical weight in the egen command, but our weight is unlikely to be suitable here. More likely you would need to use one of the variables in the dataset that corresponds directly to the theoretical reasoning behind this analytical weighting stage.
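As a rough illustration of the two options above, here is a minimal Stata sketch. The variables immigrant, region and indscui_xw are taken from the question; homeowner, age and theory_w are hypothetical placeholders for the outcome, a further control and a theory-driven analytical weight, so please adapt them to your own data.

```stata
* Sketch only: homeowner, age and theory_w are hypothetical variable names.

* Option 1: unweighted regional immigrant share, identical for every
* household head living in the same region
egen immishare = mean(immigrant), by(region)

* The cross-sectional weight enters only at the estimation stage
probit homeowner i.immigrant immishare age [pweight = indscui_xw]

* Option 2: regional share using a theory-driven analytical weight,
* computed by hand as sum(w * x) / sum(w) within region
egen double num = total(theory_w * immigrant), by(region)
egen double den = total(theory_w * !missing(immigrant)), by(region)
generate double immishare_aw = num / den
drop num den
```

In both cases the survey weight is applied once, in the estimation command, so the data are not double-weighted.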
Hope this helps,