Alita Nandi wrote:
Hello,
Thanks for your query. I have passed on your concern to the income team.
About your question, "In this case, is it harmful to top-code the final total monthly income given that not all components are effectively top-coded and there are some very extreme values?": the answer depends on what you are trying to do in your analysis. For example, if you are using income as an explanatory variable, you could use income quantiles. The very high incomes (including top-coded ones) will then all fall in the same highest income category, so it will not matter whether only the components or the total income is top-coded.
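To make the point about quantile categories concrete, here is a minimal Python sketch on simulated data (not Understanding Society data). The lognormal incomes, the cap at roughly the 99th percentile, and the helper names `quintile_labels` and `top_code` are all assumptions for illustration; the actual top-coding thresholds are not stated in this thread.

```python
import random


def quintile_labels(values):
    """Assign each value a quintile label 1..5 based on its rank."""
    n = len(values)
    # Stable sort of the indices, so ties keep their original order.
    order = sorted(range(n), key=lambda i: values[i])
    labels = [0] * n
    for rank, i in enumerate(order):
        labels[i] = rank * 5 // n + 1
    return labels


def top_code(values, cap):
    """Replace every value above `cap` with `cap`."""
    return [min(v, cap) for v in values]


random.seed(0)
# Simulated right-skewed monthly incomes (assumed distribution).
incomes = [random.lognormvariate(7.5, 0.8) for _ in range(1000)]

# Hypothetical threshold: the 10 largest values (top 1%) get capped.
cap = sorted(incomes)[989]
capped = top_code(incomes, cap)

# Compare quintile membership before and after top-coding.
same = quintile_labels(incomes) == quintile_labels(capped)
print("Quintile membership unchanged by top-coding:", same)
```

Because the cap sits well inside the top quintile, every capped observation stays in the highest category, which is why top-coding the components or the total makes no difference when income enters the model as quantile groups.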
But note that top-coding is done only to reduce disclosure risk, and in many analyses it is not a problem. If your analysis needs precise income values, you should apply for the Special Licence version of the data, in which these variables are not top-coded: https://beta.ukdataservice.ac.uk/datacatalogue/studies/study?id=6931
Best wishes,
Understanding Society User Support
Thanks for your prompt reply.
I will use income as my dependent variable in conditional and unconditional quintile regressions. My guess was that if some of these values (say, the top 0.5%) are already very extreme in the top-coded version, then in the full version the right-skewness can only be more pronounced. With quintiles of income as outcomes, these values should still fall within the same quintile, as you pointed out.
That is the sense in which I was asking whether it is harmful or not.
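As a rough check of this reasoning, the following Python sketch compares sample deciles and means before and after capping the top 0.5% of simulated incomes. The data, the cap, and the variable names are assumptions for illustration only, and the sketch covers unconditional sample quantiles, not a fitted quantile regression.

```python
import random
import statistics

random.seed(1)
# Simulated right-skewed monthly incomes (assumed distribution).
incomes = [random.lognormvariate(7.5, 0.8) for _ in range(2000)]

# Hypothetical threshold: cap the 10 largest values (top 0.5%).
cap = sorted(incomes)[1989]
capped = [min(v, cap) for v in incomes]

# Nine decile cut points for each version of the variable.
deciles_raw = statistics.quantiles(incomes, n=10)
deciles_capped = statistics.quantiles(capped, n=10)

# The mean is pulled by the extreme tail; the deciles are not.
mean_raw = statistics.mean(incomes)
mean_capped = statistics.mean(capped)

print("Deciles identical:", deciles_raw == deciles_capped)
print("Mean lower after top-coding:", mean_capped < mean_raw)
```

In this setup the cut points up to the 90th percentile come from observations below the cap, so they do not move, while the mean does. Quantiles above the capped share, or models whose fitted quantiles reach the cap, would not be protected in the same way.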
Thanks again for your support!