Support #2251

open

jbsic07_cc Missing Data Query

Added by Michael Francis 2 days ago. Updated about 9 hours ago.

Status:
Feedback
Priority:
Urgent
Category:
Data inconsistency
Start date:
05/13/2025
% Done:

50%


Description

Hi there,

I am using a combination of SIC and SOC codes to delineate between various employment and Covid groups. However, I've noticed that the proportion of individuals who responded in wave n but have not been assigned a SIC code (coded -8, inapplicable) has increased massively. For example, among respondents who have an interview date, the number of inapplicable (-8) responses comes to 25,885 in wave n, compared with only 14,241 in wave m, 14,797 in wave l and 14,693 in wave k.
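For anyone wanting to reproduce this kind of check, here is a minimal pandas sketch of the tabulation described above. It assumes each wave's individual-response file has already been loaded into a DataFrame with the wave prefix stripped from the variable names; `has_interview_date` is a hypothetical boolean column standing in for whatever interview-date check is used, and the toy rows are invented purely for illustration (they are not UKHLS data).

```python
import pandas as pd

INAPPLICABLE = -8  # the "inapplicable" code quoted in the post


def inapplicable_by_wave(waves):
    """Count and share of -8 SIC codes among interviewed respondents, per wave.

    `waves` maps a wave letter to a DataFrame holding `jbsic07_cc` and a
    boolean `has_interview_date` column (hypothetical name, see lead-in).
    """
    rows = []
    for wave, df in waves.items():
        interviewed = df[df["has_interview_date"]]
        n_total = len(interviewed)
        n_inapplicable = int((interviewed["jbsic07_cc"] == INAPPLICABLE).sum())
        rows.append({
            "wave": wave,
            "interviewed": n_total,
            "inapplicable": n_inapplicable,
            "share": n_inapplicable / n_total if n_total else float("nan"),
        })
    return pd.DataFrame(rows).set_index("wave")


# Toy data only: four invented rows per wave.
toy = {
    "m": pd.DataFrame({
        "jbsic07_cc": [47, -8, 85, 86],
        "has_interview_date": [True, True, True, False],
    }),
    "n": pd.DataFrame({
        "jbsic07_cc": [-8, -8, 85, 47],
        "has_interview_date": [True, True, True, True],
    }),
}

summary = inapplicable_by_wave(toy)
print(summary)
```

Reporting the share alongside the raw count helps separate a genuine rise in -8 codes from a change in the number of interviewed respondents between waves.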

I am going to use the 'samejob' variable to see if this can help, but please could somebody check whether this is an actual mistake in the data or in the processing of the data itself? The website variable page also shows a lot more missing data (inapplicable, -8) in wave n, for both the general jbsic07_cc and the special licence version.

If someone could point to a better solution that would also be super helpful!

Many thanks in advance,

Michael Francis


Files

clipboard-202505151137-58dmi.png (296 KB) Michael Francis, 05/15/2025 11:37 AM
Actions #1

Updated by Understanding Society User Support Team about 21 hours ago

  • Status changed from New to Feedback
  • % Done changed from 0 to 50
  • Private changed from Yes to No

Hello Michael,

For a similar issue, please refer to Issue #2248, which you can access here: https://iserredex.essex.ac.uk/support/issues/2248.

I hope this information is helpful.

Best wishes,
Roberto Cavazos
Understanding Society User Support Team

Actions #2

Updated by Michael Francis about 9 hours ago

Hi Roberto,

Thanks for the reference - this clarifies what the issue is. However, it does not really resolve the issue in my case, as the UK Government's definition of key workers is a cross-reference matrix of SOC 2010 and SIC 2007. I am also using the HMRC definition of furloughed industries, which only uses SIC 2007. Without an accurate representation of SOC/SIC, the group estimates could become unstable.

In terms of a solution, would you recommend using the samejob flag? Once weighted, this seems to produce estimates quite similar to the previous wave (see attached image).

Alternatively, I could use jbsic07_1_cc and just look at the first job for wave n?

Another option, as you explained, would be to flag the individuals with multiple jobs and use only the first job for their SIC classification. It would be good to get some guidance on a specific method, as it is otherwise not possible to use the latest wave (n) reliably. I'm looking at the recovery since Covid, so this really is the crux of my project.
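To make the options above concrete, here is one possible pandas sketch of a fallback rule. It is not an endorsed method, only an illustration under stated assumptions: both frames are indexed by the person identifier `pidp` with wave prefixes stripped, a `samejob` value of 1 is assumed to mean "same job as at the previous interview", positive values are assumed to be valid SIC codes, and the order of the fallbacks (first-job code before carry-forward) is an arbitrary choice. The toy rows are invented.

```python
import pandas as pd

INAPPLICABLE = -8
SAME_JOB = 1  # assumed coding of samejob; check the variable documentation


def fill_sic(current, previous):
    """Fill -8 codes in the current wave, recording where each value came from.

    current:  rows indexed by pidp with jbsic07_cc, jbsic07_1_cc, samejob.
    previous: rows indexed by pidp with jbsic07_cc (the preceding wave).
    """
    out = current.copy()
    # Align the previous-wave code to current respondents (NaN if not present).
    prev_sic = previous["jbsic07_cc"].reindex(out.index)
    missing = out["jbsic07_cc"] == INAPPLICABLE

    # Fallback 1: valid first-job code reported in the current wave.
    use_first_job = missing & (out["jbsic07_1_cc"] > 0)
    # Fallback 2: same job as before and a valid code in the previous wave.
    use_previous = (
        missing & ~use_first_job
        & (out["samejob"] == SAME_JOB) & (prev_sic > 0)
    )
    unresolved = missing & ~use_first_job & ~use_previous

    out["sic_filled"] = out["jbsic07_cc"]
    out.loc[use_first_job, "sic_filled"] = out.loc[use_first_job, "jbsic07_1_cc"]
    out.loc[use_previous, "sic_filled"] = prev_sic[use_previous].astype("int64")

    out["sic_source"] = "reported"
    out.loc[use_first_job, "sic_source"] = "first_job"
    out.loc[use_previous, "sic_source"] = "previous_wave"
    out.loc[unresolved, "sic_source"] = "unresolved"
    return out


# Toy data only: pidp 4 is absent from the previous wave.
wave_n = pd.DataFrame(
    {
        "jbsic07_cc": [47, -8, -8, -8],
        "jbsic07_1_cc": [47, 85, -8, -8],
        "samejob": [1, 1, 1, 1],
    },
    index=pd.Index([1, 2, 3, 4], name="pidp"),
)
wave_m = pd.DataFrame(
    {"jbsic07_cc": [47, 86, 62]},
    index=pd.Index([1, 2, 3], name="pidp"),
)

filled = fill_sic(wave_n, wave_m)
print(filled[["sic_filled", "sic_source"]])
```

Keeping a `sic_source` column means the weighted estimates can be compared with and without the imputed cases, which is one way to see how sensitive the group estimates are to the chosen rule.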

Also, in order to externally validate the study results, e.g., with the LFS, it is necessary for the UKHLS SIC codes to be accurate and based on the same definition as the current job definition in the previous waves a-m.

Many thanks in advance,

Michael Francis
