Support #2251

open

jbsic07_cc Missing Data Query

Added by Michael Francis 23 days ago. Updated 14 days ago.

Status:
Feedback
Priority:
Urgent
Category:
Data inconsistency
Start date:
05/13/2025
% Done:
80%


Description

Hi there,

I am using a combination of SIC and SOC codes to delineate between various employment and Covid groups. However, I've noticed that the proportion of respondents in wave n who have not been assigned a SIC code (coded -8) has increased massively. For example, among respondents with an interview date, the number coded inapplicable (-8) in wave n is 25,885, compared with 14,241 in wave m, 14,797 in wave l and 14,693 in wave k.
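For anyone wanting to reproduce these counts, here is a minimal sketch in Python. It assumes the data have been reshaped to long format (one record per person-wave) with hypothetical keys `wave`, `has_interview_date` and `jbsic07_cc`; the real UKHLS files are wave-prefixed (e.g. `m_jbsic07_cc`), so the field names are placeholders.

```python
from collections import Counter

INAPPLICABLE = -8  # UKHLS missing-value code for "inapplicable"


def count_inapplicable_by_wave(records):
    """Count respondents with an interview date whose jbsic07_cc is -8, per wave.

    `records` is assumed to be an iterable of dicts with the hypothetical keys
    'wave', 'has_interview_date' and 'jbsic07_cc' (one dict per person-wave).
    """
    counts = Counter()
    for rec in records:
        # Only count people who were actually interviewed in that wave.
        if rec["has_interview_date"] and rec["jbsic07_cc"] == INAPPLICABLE:
            counts[rec["wave"]] += 1
    return dict(counts)


# Toy records for illustration only (not real UKHLS data).
sample = [
    {"wave": "m", "has_interview_date": True, "jbsic07_cc": 47},
    {"wave": "m", "has_interview_date": True, "jbsic07_cc": -8},
    {"wave": "n", "has_interview_date": True, "jbsic07_cc": -8},
    {"wave": "n", "has_interview_date": True, "jbsic07_cc": -8},
    {"wave": "n", "has_interview_date": False, "jbsic07_cc": -8},
]
print(count_inapplicable_by_wave(sample))  # {'m': 1, 'n': 2}
```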

I am going to use the 'samejob' variable to see if this helps, but could somebody please check whether this is a genuine error in the data or in its processing? The variable page on the website also shows far more missing data (inapplicable, -8) in wave n, for both the general jbsic07_cc and the Special Licence version.

If someone could point to a better solution that would also be super helpful!

Many thanks in advance,

Michael Francis


Files

clipboard-202505151137-58dmi.png (296 KB), Michael Francis, 05/15/2025 11:37 AM
soc2020volume2thecodingindexexcel12102023.xlsx (4.28 MB), Understanding Society User Support Team, 05/16/2025 11:01 AM

Actions #1

Updated by Understanding Society User Support Team 22 days ago

  • Status changed from New to Feedback
  • % Done changed from 0 to 50
  • Private changed from Yes to No

Hello Michael,

For a similar issue, please refer to Issue #2248, which you can access here: https://iserredex.essex.ac.uk/support/issues/2248.

I hope this information is helpful.

Best wishes,
Roberto Cavazos
Understanding Society User Support Team

Actions #2

Updated by Michael Francis 21 days ago

Hi Roberto,

Thanks for the reference; this clarifies what the issue is. However, it does not really resolve the issue in my case, as the UK Government's definition of key workers is a cross-reference matrix of SOC2010 and SIC2007. I am also using the HMRC definition of furloughed industries, which uses only SIC2007. Without accurate SOC/SIC codes, the group estimates could become unstable.
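To show why missing SIC codes matter for a matrix-based definition, here is a minimal sketch of a key-worker lookup in Python. It assumes the matrix is held as a set of (SOC2010, SIC2007) pairs; the pairs below are placeholders, not entries from the official matrix, and `is_key_worker` is a hypothetical helper.

```python
from typing import Optional, Set, Tuple


def is_key_worker(soc10: int, sic07: int, key_pairs: Set[Tuple[int, int]]) -> Optional[bool]:
    """Look up a (SOC2010, SIC2007) pair in a key-worker cross-reference matrix.

    Returns None when either code is a negative UKHLS missing-value code,
    so missing codes are not silently counted as non-key workers.
    """
    if soc10 < 0 or sic07 < 0:
        return None
    return (soc10, sic07) in key_pairs


# Placeholder pairs for illustration only; NOT entries from the official matrix.
KEY_PAIRS = {(1111, 1), (2222, 2)}

print(is_key_worker(1111, 1, KEY_PAIRS))   # True
print(is_key_worker(1111, 2, KEY_PAIRS))   # False
print(is_key_worker(1111, -8, KEY_PAIRS))  # None
```

Returning None rather than False keeps respondents with a -8 SIC code out of the non-key-worker group, which is exactly where the wave n problem would otherwise bias the estimates.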

In terms of a solution, would you recommend using the samejob flag? Once weighted, it seems to produce estimates quite similar to those for the previous wave (see attached image).

Alternatively, should I use jbsic07_1_cc and look only at the first job in wave n?

Another option, as you explained, would be to flag individuals with multiple jobs and use only their first job for the SIC classification. It would be good to get guidance on a specific method, as otherwise the latest wave (n) cannot be used reliably, and since I'm looking at the recovery since Covid, it's really the crux of my project.

Also, to validate the study results externally (e.g. against the LFS), the UKHLS SIC codes need to be accurate and based on the same current-job definition as in waves a-m.

Many thanks in advance,

Michael Francis

Actions #3

Updated by Understanding Society User Support Team 20 days ago

Hi Michael,

What we’ve found so far is that jbsoc00 appears to be the best option, as it has fewer missing values. I think you could use the ONS look-up tables to convert SOC2000 to SOC2010 codes and then apply the cross-reference matrix you mentioned (I’m attaching the file).

The analysis on the best way to address these issues is still ongoing. Once we have a reference guide or recommendation, we could share it with users as a tentative solution, at least until the next information release. I’ll keep you updated as soon as we know more.

I think it’s a good idea to use the samejob variable to help identify those who actually changed jobs (and therefore their SOC might change) versus those with only apparent changes. Just keep in mind that this question is only asked when the respondent has the same employer (jbsamr = 1), which means there could be people working for the same employer but in a different role or position, so their classification may still change. I think you can handle this by considering all possible combinations of the cjob, jbsamr, and samejob variables.
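The combinations described above can be sketched in Python as a small classifier. This assumes UKHLS yes/no coding (1 = yes, 2 = no) and negative missing-value codes for jbsamr and samejob; cjob is left out because its coding is not described in this thread, so including it would mean adding a further branch after checking the documentation. The function name is hypothetical.

```python
def job_change_status(jbsamr: int, samejob: int) -> str:
    """Classify job continuity from jbsamr and samejob.

    Assumes yes/no coding (1 = yes, 2 = no) and negative missing codes.
    samejob is only asked when jbsamr == 1, so it is ignored otherwise.
    """
    if jbsamr == 2:
        return "new employer"  # classification may change
    if jbsamr == 1:
        if samejob == 1:
            return "same job"  # fed-forward codes should be stable
        if samejob == 2:
            return "same employer, different job"  # SOC/SIC may still change
        return "same employer, job unknown"
    return "unknown"  # jbsamr missing or inapplicable


print(job_change_status(1, 1))  # same job
print(job_change_status(1, 2))  # same employer, different job
```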

Regarding jbsic07, when the main-job classification is missing you could consider filling it in with the classification from the multiple-jobs variables for respondents with jbhas == 1. At the same time, you could apply the approach described above.
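A minimal sketch of that fill-in rule in Python, for a single respondent. It assumes the multiple-jobs codes (jbsic07_1_cc, jbsic07_2_cc, ...) are passed in order and that the first valid one stands in for the main job; codes 1 to 99 are treated as valid, matching the `inrange(..., 1, 99)` condition used later in this thread. The function names are hypothetical.

```python
from typing import Sequence


def is_valid_sic(code: int) -> bool:
    """True for a substantive code in 1-99; negative UKHLS missing codes fail."""
    return 1 <= code <= 99


def fill_main_sic(jbsic07_cc: int, jbhas: int, multijob_sic: Sequence[int]) -> int:
    """Fill a missing main-job SIC code from the multiple-jobs codes.

    multijob_sic is assumed to hold jbsic07_1_cc, jbsic07_2_cc, ... in order;
    the first valid one is used. Otherwise the original value is returned.
    """
    if is_valid_sic(jbsic07_cc) or jbhas != 1:
        return jbsic07_cc  # already valid, or jbhas != 1: leave unchanged
    for code in multijob_sic:
        if is_valid_sic(code):
            return code
    return jbsic07_cc  # nothing usable; keep the original missing code


print(fill_main_sic(-8, 1, [-8, 47]))  # 47
```

Note that a later reply in this thread reports that the multiple-jobs series is itself partially incorrect in wave n, so this rule would need the same caution.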

I hope this information is helpful.

Best wishes,
Roberto Cavazos
Understanding Society User Support Team

Actions #4

Updated by Michael Francis 17 days ago

Hi Roberto,

Thanks for your help with this. I tried the lookup from SOC 2000 to SOC 2010; however, the issue is that I'm using the main user access version of the UKHLS, and I would need the Special Licence 5-digit SOC for this to work properly. For example, SOC 2000 minor group 244 maps to several different SOC 2010 groups, so I need the granularity of the SL data.
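The one-to-many problem can be made explicit when applying a lookup table. Below is a minimal Python sketch, assuming the table is loaded as a dict from each SOC2000 code to a list of candidate SOC2010 codes; the entries are placeholders, not values from the ONS tables, and `convert_soc` is a hypothetical helper.

```python
from typing import Dict, List, Optional, Tuple


def convert_soc(code: int, lookup: Dict[int, List[int]]) -> Tuple[Optional[int], bool]:
    """Map a SOC2000 code to SOC2010 via a lookup table.

    Returns (code, ambiguous): a unique match gives (soc2010, False); a
    one-to-many match gives (None, True) so it can be resolved separately;
    a code missing from the table gives (None, False).
    """
    targets = lookup.get(code, [])
    if len(targets) == 1:
        return targets[0], False
    return None, len(targets) > 1


# Placeholder entries for illustration only; NOT taken from the ONS tables.
LOOKUP = {100: [110], 200: [210, 220]}

print(convert_soc(100, LOOKUP))  # (110, False)
print(convert_soc(200, LOOKUP))  # (None, True)
```

Flagging ambiguous cases rather than picking a target arbitrarily shows how many respondents are affected; at coarser (minor group) levels more codes will fall into the ambiguous bucket.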

The process for getting Special Licence access is quite bureaucratic and takes a while. Is there anything that can be done in the meantime? I've now corrected the SIC 2007 codes for those in the same job and will follow the same process for SOC, but it's important that a workaround exists for all versions of the UKHLS available to researchers.

Thanks again,

Michael Francis

Actions #5

Updated by Understanding Society User Support Team 14 days ago

  • % Done changed from 50 to 80

Dear Michael,

The only possible fix for the n_jbsic07_cc variable is to backfill it using data from Wave 13. This should have been done before the data release in November 2024 but, unfortunately, it wasn’t, owing to an oversight on our part. However, we have since discovered that the multiple-jobs variable series (n_jbsic07_x_cc, e.g. n_jbsic07_1_cc, n_jbsic07_2_cc and so on) is also partially incorrect. As a result, this fix can only be applied to respondents who had only one job in both Wave 13 and Wave 14. In such cases, the fix should be applied as follows:

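* Start from the released wave n (Wave 14) code
gen n_fulljbsic07_cc = n_jbsic07_cc
* Backfill from wave m (Wave 13) under the conditions above, keeping only valid codes (1-99)
replace n_fulljbsic07_cc = m_jbsic07_cc if n_jobcodechk01 == 1 & m_multijobs == 1 & n_multijobstotal == 1 & inrange(m_jbsic07_cc, 1, 99)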
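For users working outside Stata, the same backfill rule can be sketched in Python for one respondent. This assumes each respondent's wide-format variables are held in a dict keyed by the variable names used in the Stata commands above; the values in the example are illustrative, not real data.

```python
def backfill_sic(row: dict) -> int:
    """Apply the wave m -> wave n SIC backfill rule to one respondent.

    `row` is assumed to contain n_jbsic07_cc, m_jbsic07_cc, n_jobcodechk01,
    m_multijobs and n_multijobstotal as integers.
    """
    if (
        row["n_jobcodechk01"] == 1
        and row["m_multijobs"] == 1
        and row["n_multijobstotal"] == 1
        and 1 <= row["m_jbsic07_cc"] <= 99  # same test as inrange(m_jbsic07_cc, 1, 99)
    ):
        return row["m_jbsic07_cc"]
    return row["n_jbsic07_cc"]  # conditions not met: keep the released value


example = {"n_jbsic07_cc": -8, "m_jbsic07_cc": 47, "n_jobcodechk01": 1,
           "m_multijobs": 1, "n_multijobstotal": 1}
print(backfill_sic(example))  # 47
```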

Regarding the lookup tables – the Special Licence provides access to the 4-digit version of SOC, which will make your task somewhat easier, though it won’t eliminate the issue of multiple possible matches for a single SOC2000 code. Unfortunately, we don’t have a better solution at this time.

The discrepancy between jbsoc00 and jbsoc10 (i.e. missing values in the latter) arises from individuals in continuous employment who haven’t changed jobs since the introduction of SOC2010 in Wave 3. Please see the text of our reply to a similar query from a few years ago:

we use dependent interviewing (DI) to minimise spurious changes in occupations, so instead of asking for a new description of the current job every wave we ask whether the job is still the same as last time interviewed. If it is still the same, the previous occupational code is fed forward. When a new classification scheme is introduced, such as SOC 2010 in w3, there are no values from the previous wave that could be fed forward, hence we see the data gap [-9] in W3 (or later) for those cases who continue to have the same job as they had before. 'Before' in this context may actually refer to a job the respondent held in the BHPS already (DI started in BHPS W15). Plugging this data gap requires resource-intensive coding of the freetext data.
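The mechanism in the quoted reply can be illustrated with a small Python simulation of feed-forward for one respondent under a newly introduced scheme. It is a sketch of the logic described, not the actual processing code; the input format and the code 1234 are hypothetical.

```python
MISSING = -9  # the data-gap code mentioned in the quoted reply


def feed_forward(history):
    """Simulate dependent interviewing for one classification scheme.

    `history` is a list of (same_job, fresh_code) tuples, one per wave since
    the scheme was introduced. fresh_code (coded from a new job description)
    is used only when the job changed; otherwise the previous wave's code is
    fed forward. In the first wave there is nothing to feed forward, so a
    same-job respondent gets MISSING, which then persists until a job change.
    """
    codes = []
    previous = MISSING
    for same_job, fresh_code in history:
        current = previous if same_job else fresh_code
        codes.append(current)
        previous = current
    return codes


# Same job for three waves after the new scheme, then a job change.
print(feed_forward([(True, None), (True, None), (True, None), (False, 1234)]))
# [-9, -9, -9, 1234]
```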

We understand that this is inconvenient, and we will consider releasing a simplified 1:1 correspondence table between SOC2000 and SOC2010. However, I can’t guarantee when this might happen. I’m sorry for what I understand may be a somewhat disappointing response.

Best wishes,
Piotr Marzec
UKHLS User Support