Support #2033
Derived variable on income / earnings of each benefit unit
Added by Thomas Stephens almost 2 years ago. Updated 29 days ago.
80%
Description
Hi,
I notice that Understanding Society has a useful derived variable which splits households by their benefit unit (buno_dv); https://www.understandingsociety.ac.uk/documentation/mainstage/variables/buno_dv/. This appears to be aligned to how the DWP and Family Resources Survey define a benefit unit.
Is there a derived variable on income strictly for those in each benefit unit? This would ideally include several additional variables, distinguishing between income from all sources, earnings, investments, benefit income, pension, etc. I am aware of the household income derived variables and individual income derived variables, and have used them in my other analysis, but I can't seem to find the same set of derived variables for the benefit unit. Perhaps this exists and I have missed it.
If it hasn't been created, any advice / code on how it has been constructed in other cases would be very welcome of course.
Thanks in anticipation and let me know if any questions or ambiguities.
Best wishes,
Tom
Updated by Understanding Society User Support Team almost 2 years ago
- Category set to Income
- Status changed from New to In Progress
- % Done changed from 0 to 10
Many thanks for your enquiry. The Understanding Society team is looking into it and we will get back to you as soon as we can. We aim to respond to simple queries within 48 hours and more complex issues within 7 working days.
Best wishes,
Understanding Society User Support Team
Updated by Understanding Society User Support Team over 1 year ago
- Status changed from In Progress to Feedback
- % Done changed from 10 to 80
- Private changed from Yes to No
Sorry for the delay in getting back to you. Here is the response from our income team.
For complete households you can sum the individual incomes in the benefit unit to get the benefit unit total. The Stata code for this is something like this:
use pidp l_hidp l_buno_dv using l_indall, clear
merge 1:1 pidp using l_indresp, nogen keep(3) keepus(l_fimngrs_dv)
mvdecode _all, mv(-9/-1)
bys l_hidp l_buno_dv: egen l_fimngrs_dv_bu=sum(l_fimngrs_dv)
bys l_hidp l_buno_dv: keep if _n==1
drop l_fimngrs_dv pidp
isid l_hidp l_buno_dv
su
Similarly, you can do this for the subcomponents of income that are provided, such as total benefit income or total earnings. Note that you will not be able to recreate benefit unit totals of individual benefits (e.g. benefit unit child benefit income or universal credit income), as these amounts are not on the released datasets, at least not in a cleaned-up form with imputed values; only the raw survey reports are available in the benefit.dta file. Also note that you will be ignoring council tax deductions, which occur at the household level and are not deducted from individual incomes (they are deducted in the household-level variable fihhmnnet3_dv).
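For anyone working in R (the language used later in this thread) rather than Stata, the same grouping step can be sketched for several income components at once. This is only a minimal sketch on invented data: it assumes the wave prefix has been stripped from the variable names, uses fimngrs_dv and fimnlabgrs_dv as the components, and the `_bu` suffix and toy values are illustrative rather than anything in the released files.

```r
library(dplyr)

# Toy person-level data (values invented for illustration).
# -9 stands in for one of the survey's negative missing-value codes.
persons <- tibble(
  pidp          = 1:5,
  hidp          = c(10, 10, 10, 20, 20),
  buno_dv       = c(1, 1, 2, 1, 1),
  fimngrs_dv    = c(1500, 800, 600, 2000, -9),
  fimnlabgrs_dv = c(1200, 0, 600, 1800, -9)
)

income_vars <- c("fimngrs_dv", "fimnlabgrs_dv")

bu_income <- persons |>
  # Recode only the codes -9..-1 to NA (as mvdecode does in the Stata code),
  # so that other negative values are not treated as missing
  mutate(across(all_of(income_vars), ~ ifelse(.x %in% -9:-1, NA, .x))) |>
  group_by(hidp, buno_dv) |>
  # One row per benefit unit, with each component summed over its members
  summarise(
    across(all_of(income_vars), ~ sum(.x, na.rm = TRUE), .names = "{.col}_bu"),
    .groups = "drop"
  )

print(bu_income)
```

Summing with `na.rm = TRUE` treats a missing amount as zero, which matches how Stata's egen sum handles missing values; whether that is appropriate depends on how non-respondents are dealt with.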
One thing to consider is that there can be joint reporting of benefits and unearned sources (e.g. both partners report child benefit; we identify this and give it to only one of them), which could introduce an issue. When summing at the household level it doesn't matter: the household total will be correct. But at the benefit unit level it will only work if both of the joint recipients are in the same benefit unit; if not, it will be slightly random which benefit unit gets the amount. [In principle this should not be an issue, as both recipients should always be in the same benefit unit. But there may be small violations where there are reporting errors and respondents in different benefit units but in the same household report the same benefit (and so it will be a bit random which benefit unit gets assigned the amount). That should really be checked.]
For incomplete households (i.e. households where not all members completed an interview) things are more tricky. See variable hhresp_dv. You will not observe the imputed income amounts for non-respondents, as these are not available in the data. So you will have to drop such households, which is a kind of selection; depending on your research question, you might want to create some weights to adjust for it.
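As a rough illustration of the "drop incomplete households" step, the sketch below flags households in which every adult in the enumeration file also appears in the respondent file. It is a sketch on invented data, not the definition behind hhresp_dv: the age cut-off of 16, the `interviewed` flag and the toy data frames are all assumptions made for the example, and in practice the released hhresp_dv variable would be the first thing to check.

```r
library(dplyr)

# Toy enumeration file (one row per household member) and respondent file
indall <- tibble(
  pidp   = 1:5,
  hidp   = c(10, 10, 10, 20, 20),
  age_dv = c(40, 38, 12, 50, 45)
)
indresp <- tibble(pidp = c(1, 2, 4))  # person 5 was not interviewed

complete_households <- indall |>
  filter(age_dv >= 16) |>                          # assumed adult cut-off
  mutate(interviewed = pidp %in% indresp$pidp) |>  # hypothetical flag
  group_by(hidp) |>
  summarise(all_adults_interviewed = all(interviewed), .groups = "drop") |>
  filter(all_adults_interviewed)

print(complete_households)
```

Benefit unit totals would then be computed only for the households kept here, bearing in mind the selection issue described above.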
Hope this helps. If you have further questions please let us know.
Updated by Understanding Society User Support Team over 1 year ago
- Status changed from Feedback to Resolved
- % Done changed from 80 to 100
Updated by Thomas Stephens 30 days ago
Hi,
Firstly thanks so much for this response, very helpful. Sorry to bring this up again but I've been doing some checks and the buno_dv variable isn't behaving in the way I thought it would, so wanted to clarify with you / the income team.
My understanding from surveys like the Family Resources Survey is that a benefit unit should strictly consist of a single adult or a couple (spouses or co-habiting partners) plus their dependent children. Therefore the number of people in the benefit unit reporting earnings should almost always be a maximum of 2. I guess a small exception would be cases where e.g. a 16-year-old reports a small amount of income, or where children remain dependent when older, though I think in the case of e.g. the FRS even then you usually, if not always, get up to two earnings sources.
I've been using 'R' rather than Stata to replicate your income team's suggestion (see code below FYI), and I am coming up with many cases where more than 2 respondents within each benefit unit report earnings. When I check this in a more granular way, by converting it to a 'wide' format (see bottommost syntax), I can see lots of benefit units have people of various ages reporting income, so not just e.g. 16 year-olds plus their parents.
It's obviously fine if 'buno_dv' doesn't mean the same thing in USoc, but I just wanted to clarify, in case I'm doing something wrong here? At the moment the only workaround I have seems to involve using person number and spouse number (w_pno, w_sppno) to link the couples' earnings that way.
Best wishes,
Tom
- Create a unique variable combining hidp, buno, wave (useful for comparison/merging later):
Dat$variables$Comb_hidpbunowave <- paste(Dat$variables$hidp, Dat$variables$buno_dv, Dat$variables$wave, sep = "0") |> as.numeric()
- Create 'long' format data combining benefit unit income:
BUMergeDat <- Dat$variables |> #* This temporarily takes variables outside of all-waves survey design object I've created ('Dat')
group_by(hidp, buno_dv, wave) |> #* Grouping variables (note 'wave' is a variable I created that is simply the wave number)
mutate(
BU_fimnlabgrs_dv_adj_nominus = sum(fimnlabgrs_dv_adj_nominus), #* BU earnings (adjusted by me to convert minus earnings to 0);
BU_n_adults = sum(DV_Age_All_Num >= 16, na.rm = TRUE), #* Number of adults in benefit unit.
BU_n_adults_with_earnings = sum(DV_Age_All_Num >= 16 & fimnlabgrs_dv_adj_nominus > 0, na.rm = TRUE) #* Number of adults reporting earnings in benefit unit (earnings > 0, not == 0).
) |>
slice(1) |>
ungroup() |>
select(pidp, hidp, buno_dv, wave, Comb_hidpbunowave,
fimnlabgrs_dv_adj_nominus, BU_fimnlabgrs_dv_adj_nominus,
ncouple_dv_adj, BU_n_adults, BU_n_adults_with_earnings) #* Select specific variables to remain in the dataframe.
sum(duplicated(BUMergeDat$Comb_hidpbunowave)) #* Check for sum of duplicated values; should be zero.
- To check, create wider version of Dat which also shows age, individual earnings, etc.:
Dat_Wider_Filter <- Dat$variables |> select(Comb_hidpbunowave, ndepchl_dv, age_dv, fimnlabgrs_dv_adj_nominus) #* Keep the combined ID, as it is used below
Dat_Wider <- Dat_Wider_Filter |> group_by(Comb_hidpbunowave) |>
mutate(grp = row_number()) |> #* Number the people within each benefit unit and wave
ungroup() |>
gather(var, val, -Comb_hidpbunowave, -grp) |>
unite("var_grp", var, grp, sep ='') |>
spread(var_grp, val, fill = '')
sum(duplicated(Dat_Wider$Comb_hidpbunowave)) #* Check for duplicated benefit unit IDs; should be zero.
Updated by Thomas Stephens 29 days ago
Hi,
Just to update you, I now seem to have found a workaround which gets this right: I think the only way to do it is to exploit the linking variables w_pno and w_sppno, constructing spouse/co-habiting partners' incomes where they both report each other's pnos AND where they report either living with their spouse (livesp_dv = 1) or living as if they are spouses (livewith = 1 | livewith = 3).
The existing benefit unit variable created does correctly place spouses/co-habitees in the same benefit unit within each household, so it's useful for creating this variable, but it's just that other adults, and other adults' earnings (sometimes much older adults), seem to be included in these benefit units. So you can't just sum individual income at 'benefit unit' level, without capturing other adults beyond spouses/co-habiting couples.
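A minimal sketch of the mutual-link check described above, on invented data: each person is joined to the household member their sppno points at, and the pair is treated as a couple only if that person points back. The column names `earn`, `mutual` and `couple_earn` and the toy values are illustrative assumptions, and the additional livesp_dv / livewith condition is left out for brevity.

```r
library(dplyr)

# Toy person-level data (values invented for illustration)
persons <- tibble(
  hidp  = c(10, 10, 10, 20, 20),
  pno   = c(1, 2, 3, 1, 2),
  sppno = c(2, 1, NA, 2, NA),  # person 2 in household 20 does not point back
  earn  = c(1200, 800, 900, 1500, 700)
)

# Lookup table of potential partners within the same household
partners <- persons |>
  select(hidp, partner_pno = pno, partner_sppno = sppno, partner_earn = earn)

linked <- persons |>
  # Attach the record of the person each respondent names as their partner
  left_join(partners, by = c("hidp", "sppno" = "partner_pno")) |>
  mutate(
    # A couple only if the named partner names this person in return
    mutual      = !is.na(partner_sppno) & partner_sppno == pno,
    couple_earn = ifelse(mutual, earn + partner_earn, earn)
  )

print(linked)
```

In this toy example only persons 1 and 2 in household 10 are combined; the third adult in that household keeps their own earnings even if buno_dv places them in the same benefit unit.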
Once you've created that, you can then combine this with info on other household/family characteristics to create a complete variable on benefit unit income - e.g. roughly speaking, individual earnings can be used for single-adult benefit units; household earnings for two-adult households where the two adults are a couple; summed-up individual earnings, without any of the above adjustment, for cases where there are 2 adults in the benefit unit and they are a couple. Basically sppno is needed for cases which don't fit these examples.
There is however one issue I've noticed, which you might want to think about as a potential improvement in future updates: I noticed sppno_dv mostly only exists for live-in legal spouses (livesp_dv = 1). There's a lot of missingness in this variable for people co-habiting as if spouses (livewith = 1 / 3), even though for purposes of benefits calculation their combined income should be counted. I'm finding most of the missingness can be resolved through other adjustments (i.e. where we know there are just these two in the benefit unit; see my paragraph above), but in future this potentially calls for the pno to be asked for of this second group too, as a separate and distinct variable.
Perhaps I'm missing something obvious but this seems to me the only workaround, but I'll keep reviewing it. I don't think this needs a response - and I appreciate you're not responsible for the syntax of derived variables - but just flagging in case useful.
Best wishes,
Tom
Updated by Understanding Society User Support Team 29 days ago
- Status changed from Resolved to In Progress
- % Done changed from 100 to 80
Dear Tom,
Thank you for the detailed report, this is very helpful. We'll have a look into this.
Best wishes,
Piotr Marzec
UKHLS User Support