Support #1554

hsval - units

Added by Edward Pinchbeck almost 3 years ago. Updated over 1 year ago.

Status: Resolved
Priority: Normal
Category: Data inconsistency
Start date: 06/22/2021
% Done: 100%


Description

Hello,

I would like to use the hsval variable (About how much would you expect to get for your home if you sold it today?). However, the variable is often reported in different units (e.g. £, £/100,000) even within the same individual. See below for an actual example in the data.

pidp     wave  hvalue
XXXXXXX  a     165000
XXXXXXX  b     170000
XXXXXXX  c     150000
XXXXXXX  d     160
XXXXXXX  e     160000
XXXXXXX  f     170000

Is there any way to identify the units other than by manual inspection and educated guesswork? As you can imagine, not all cases are as straightforward as the one above.

Many thanks
Ted

#1

Updated by Understanding Society User Support Team almost 3 years ago

  • Status changed from New to Feedback
  • % Done changed from 0 to 80
  • Private changed from Yes to No

We don't have any additional information that would help you understand why some reported hsval values are so low. I checked the distributions across waves: the 1st percentile is around 50,000 in most waves, and around half of the cases below it report values under 10,000. The number of cases with hsval < 10000 ranges from 27 to 41 in waves 1-7 and from 115 to 181 in waves 8-10. You will need to decide what to do with these cases. As the numbers are very low, you could drop them, or keep them and add a 0-1 indicator variable to see whether the coefficients you are interested in differ for these cases. Depending on what you are interested in, you could also categorise hsval into quantiles, in which case these cases will fall in the lowest quantile.
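
The two options mentioned here (a 0-1 indicator for very low values, or grouping hsval into quantiles) could look roughly like the Python sketch below. It is only an illustration: the sample values are made up, the 10,000 cut-off is borrowed from the counts quoted in this reply, and the function names are hypothetical rather than part of any Understanding Society tooling.

```python
from statistics import quantiles

LOW_THRESHOLD = 10_000  # assumed cut-off, borrowed from the "hsval < 10000" counts


def low_value_flag(hsval):
    """Return 1 if the reported value is below the assumed cut-off, else 0."""
    return 1 if hsval < LOW_THRESHOLD else 0


def quantile_group(values, n=4):
    """Assign each value to a quantile group 1..n (1 = lowest)."""
    cuts = quantiles(values, n=n)  # n - 1 cut points
    return [1 + sum(v > c for c in cuts) for v in values]


# Hypothetical reported values, not real survey data
sample = [165000, 170000, 150000, 160, 160000, 170000, 250000, 90000]
flags = [low_value_flag(v) for v in sample]
groups = quantile_group(sample)
```

With this sample, only the value 160 is flagged, and it lands in the lowest quantile group, so either approach keeps the case in the analysis without letting it be read as a genuine house price.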

Best wishes,
Understanding Society User Support Team

#2

Updated by Edward Pinchbeck almost 3 years ago

OK, thank you. It is true that there are few cases like my example above. However, there are many other cases where the value appears to be truncated in some other way, which results in extreme implied changes between waves. While some of these may be genuine, I suspect that most are not. For example,

pidp        wave  hvalue
1224299891  19    350000
1224299891  20    350000
1224299891  21    350000
1224299891  22    325000
1224299891  23    35000
1224299891  24    350000
1224299891  25    380000

I can come up with some ways to deal with this, but thought it may be helpful to feed this back.
All best
Ted

#3

Updated by Understanding Society User Support Team almost 3 years ago

Thanks Ted. I see what you mean. I can see in this example it does look like they misreported hsval in Wave 23 (UKHLS W5?). I will discuss this with the data team and see if they have any additional advice.

One way I can think of to identify large changes that are likely due to reporting error is to check cases where the person is still living at the same address. In those cases, if hsval changes between waves by a larger percentage than the national average change in house prices, that would be a red flag.
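
As a rough illustration of this red-flag idea, the Python sketch below flags consecutive-wave changes for non-movers that exceed a chosen threshold. The record layout, the `moved` field, the function name and the 50% threshold are all assumptions made for the example; the threshold simply stands in for whatever benchmark of national house price change is judged appropriate. The values are the ones from Ted's example above, with `moved` assumed to be 0 throughout.

```python
def flag_suspect_changes(records, max_abs_change=0.5):
    """Flag consecutive-wave changes in hsval above max_abs_change for non-movers.

    records: list of dicts with keys pidp, wave, hsval, moved (hypothetical layout;
    moved = 1 means the person changed address since the previous wave).
    Returns a list of (pidp, wave, proportional change) tuples.
    """
    by_person = {}
    for r in sorted(records, key=lambda r: (r["pidp"], r["wave"])):
        by_person.setdefault(r["pidp"], []).append(r)

    flagged = []
    for pidp, rows in by_person.items():
        for prev, cur in zip(rows, rows[1:]):
            consecutive = cur["wave"] == prev["wave"] + 1
            # Skip gaps between waves, movers, and missing or zero values
            if not consecutive or cur["moved"] or not prev["hsval"] or not cur["hsval"]:
                continue
            change = (cur["hsval"] - prev["hsval"]) / prev["hsval"]
            if abs(change) > max_abs_change:
                flagged.append((pidp, cur["wave"], round(change, 3)))
    return flagged


values = {19: 350000, 20: 350000, 21: 350000, 22: 325000,
          23: 35000, 24: 350000, 25: 380000}
records = [{"pidp": 1224299891, "wave": w, "hsval": v, "moved": 0}
           for w, v in values.items()]
flagged = flag_suspect_changes(records)
```

For this example the sketch flags waves 23 and 24, that is, the drop to 35000 and the return to 350000, while the smaller changes at waves 22 and 25 pass.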

#4

Updated by Edward Pinchbeck almost 3 years ago

Fab, thanks. Yes Wave 23 is UKHLS W5.

#5

Updated by Edward Pinchbeck almost 3 years ago

Hello again. Just following this up: I wonder if the data team had any further advice on this?

Also, could you provide specific advice on how I can best tell whether individuals are still living at the same address? Would this be by using plnew for BHPS and origadd for UKHLS?

Many thanks
Ted

#6

Updated by Understanding Society User Support Team almost 3 years ago

Hello,

Please take a look at this User Forum post about the variables we provide to identify movers.
https://iserredex.essex.ac.uk/support/issues/1539

Best wishes,
Understanding Society User Support Team

#7

Updated by Edward Pinchbeck almost 3 years ago

Thanks, that was helpful.

Regarding the hsval variable, further inspection of the UKHLS and BHPS data suggests that several hundred (or more) of the entries for non-movers are likely misreported. Is there any process by which I could request that this field is investigated and/or cross checked against the source material?

Many thanks
Ted

#8

Updated by Understanding Society User Support Team almost 3 years ago

Thanks - I have passed on your request to the data team.

#9

Updated by Edward Pinchbeck over 2 years ago

Hello. Thank you. When convenient, please could you let me know if the data team have been able to look at this yet?
All best
Ted

#10

Updated by Understanding Society User Support Team over 2 years ago

Sorry for the delay in getting back to you. The data team have looked into this, and their investigation revealed that, while a soft check does take place, no variables are created from it: a message appears asking the respondent to check whether their reported value is correct and, if not, to go back and correct it.

#11

Updated by Edward Pinchbeck over 2 years ago

Good morning. No problem.

So, if I have understood correctly, this soft check occurs when hsval < 50000 or hsval > 1000000, and it seems to have been in place since wave 1 (2009-11). However, there is no other ex post check on, or validation of, the values reported.

I am sorry for the follow up but I wish to use this field in my research and I have some concerns that a non-trivial amount of the values contain errors despite the soft check. Many of these appear to be just missing (or extra) zeros e.g. my example above: pidp 1224299891 UKHLS W5 hsval should be 350000 rather than 35000.
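
To make the missing (or extra) zeros idea concrete, here is a minimal Python sketch that tests whether multiplying or dividing a value by a power of ten brings it close to the median of the same household's values in other waves. The function name, the candidate factors and the 25% tolerance are assumptions chosen for illustration, and any suggested correction would still need manual review.

```python
from statistics import median


def suggest_rescale(value, neighbours, factors=(10, 100, 1000), tolerance=0.25):
    """Suggest a power-of-ten correction that brings value close to the reference.

    neighbours: values reported in other waves (assumed to be mostly correct).
    Returns the rescaled value, or None if value is already plausible
    or no factor brings it within tolerance of the median.
    """
    ref = median(neighbours)

    def close(x):
        return abs(x - ref) / ref <= tolerance

    if close(value):
        return None
    for f in factors:
        for candidate in (value * f, value / f):
            if close(candidate):
                return candidate
    return None


# Values from the two examples earlier in this thread
other_waves = [350000, 350000, 350000, 325000, 350000, 380000]
fix_w5 = suggest_rescale(35000, other_waves)
fix_d = suggest_rescale(160, [165000, 170000, 150000, 160000, 170000])
no_fix = suggest_rescale(325000, other_waves)
```

Here `fix_w5` is 350000 and `fix_d` is 160000, while `no_fix` is None because 325000 is already within tolerance of the other waves.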

Is there any process by which I could request your data team investigate this further, ultimately with a view to correcting obvious errors or flagging more difficult cases, or is there nothing further you can do here?

Thanks,
Ted

#12

Updated by Understanding Society User Support Team over 2 years ago

Thanks Ted. I understand. I have passed on your request to the data team. If you have written any code to automate the process of identifying such cases and would be happy to share it with us, I can pass that on to the data team as well.
Best wishes

#13

Updated by Edward Pinchbeck over 2 years ago

Hi, sorry for the delay. I've been away.

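After compiling the full BHPS and UKHLS data and retaining the HH head, I created wave and mover variables, then used Stata code along the following lines:

* Monthly interview date; fall back to the interview start date if missing
gen ym = ym(intdaty_dv, intdatm_dv)
replace ym = ym(istrtdaty, istrtdatm) if mi(ym)
format ym %tm
* Set wave 1 to September 1991
replace ym = tm(1991m9) if wave == 1
* Keep only positive, non-missing house values
gen hvalue = cond(hsval < . & hsval > 0, hsval, .)
* Proportional change between consecutive waves for non-movers
bys pidp (ym): gen d_hval = (hvalue - hvalue[_n-1]) / hvalue[_n-1] if moved == 0 & wave == wave[_n-1] + 1
sum d_hval, det
* Largest rise and largest fall per person
bys pidp: egen maxdelta = max(d_hval)
bys pidp (ym): egen mindelta = min(d_hval)
* Browse people with a rise of 200% or more, or a fall of more than 50%
br pidp hvalue d_hval wave moved if (maxdelta >= 2 | mindelta < -.5) & !mi(maxdelta)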

#14

Updated by Understanding Society User Support Team over 2 years ago

Thanks - I will pass this onto the data team

#15

Updated by Understanding Society User Support Team over 2 years ago

  • Assignee set to Understanding Society User Support Team

#16

Updated by Understanding Society User Support Team over 2 years ago

  • Status changed from Feedback to In Progress

#17

Updated by Understanding Society User Support Team over 1 year ago

  • Category set to Data inconsistency
  • Status changed from In Progress to Resolved
  • % Done changed from 80 to 100
