Project

General

Support #2248

open

Some queries re issues with SOC 2000 and 2010 variables

Added by Thomas Stephens 29 days ago. Updated 14 days ago.

Status:
Feedback
Priority:
Normal
Category:
Data documentation
Start date:
04/29/2025
% Done:
50%


Description

Good afternoon,

I'm currently updating some analysis for Waves 13-14, and I noticed from your variable guide that Understanding Society is investigating issues with previous waves' SOC 2000 and SOC 2010 data, which you identified following changes to the approach to gathering new SOC 2020 data.

This makes sense; I just had a few questions:

1. Does this issue also affect Standard Industrial Classification data, e.g. jbsic07? I currently use the condensed version (jbsic07_cc) to infer workplace health and safety. There is no mention of this issue affecting this variable, but the routing for this question is fairly similar to the old SOC (i.e. it's only asked of people who have changed jobs) so it would make sense if it were also affected. Any reassurance here would be greatly appreciated!

2. Could you provide any info on the number of SOC codes affected by this issue, and which respondents are affected? E.g. will a variable be made available which flags affected cases, or could some code be provided to compute it?

3. An obvious question, but could you provide any further info on the timeline to resolve this?

As this doesn't affect the latest (Wave 13-14) SOC 2020 codes, it likely won't affect my planned analysis too much, but I would especially appreciate reassurance regarding the SIC point.

Thanks in anticipation for any clarification the team can provide.

Best wishes,

Tom


Files

clipboard-202505091059-g4pum.png (65.2 KB), Understanding Society User Support Team, 05/09/2025 10:59 AM
clipboard-202505091100-rkpjd.png (5.93 KB), Understanding Society User Support Team, 05/09/2025 11:00 AM
clipboard-202505091100-qmocs.png (9.5 KB), Understanding Society User Support Team, 05/09/2025 11:00 AM

Actions #1

Updated by Understanding Society User Support Team 28 days ago

  • Category changed from Questionnaire content to Data documentation
  • Status changed from New to In Progress
  • % Done changed from 0 to 10
  • Private changed from Yes to No

Many thanks for your enquiry. The Understanding Society team is looking into it and we will get back to you as soon as we can. We aim to respond to simple queries within 48 hours and more complex issues within 7 working days.

Best wishes,
Understanding Society User Support Team

Actions #2

Updated by Understanding Society User Support Team 27 days ago

  • Status changed from In Progress to Feedback
  • % Done changed from 10 to 50

Hello Tom,

1. Yes, the issue affecting jbsoc is the same for jbsic07. We observed a significantly higher rate of change between waves in the industry coding (approximately 5-6 times higher than in previous waves). This is due to the absence of dependent interviewing: the data were collected and coded from scratch, so much of this change is artificial, mirroring the situation with jbsoc.
2. Currently, a variable reflecting this correction isn't included. However, we may consider adding it in our next release.
3. Our aim is to have these corrections implemented for the next wave release, scheduled for November 2025.
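The "5-6 times higher" comparison is a comparison of wave-on-wave change rates. As a rough illustration only, the sketch below computes the share of respondents with a valid code in two consecutive waves whose code differs. It assumes each wave has been loaded into a dict keyed by a person identifier (e.g. pidp) and treats negative values (-8 inapplicable, -9 missing) as missing; the team's own calculation may use a different base, for example only people in continuous employment.

```python
def change_rate(prev_wave, curr_wave, var):
    """Share of respondents with a valid code in both waves whose code changed.

    prev_wave / curr_wave: {person_id: {variable_name: value}}.
    Negative values (e.g. -8 inapplicable, -9 missing) count as missing.
    Returns NaN if no respondent has a valid code in both waves.
    """
    common = [
        pid
        for pid, rec in curr_wave.items()
        if pid in prev_wave
        and rec.get(var, -9) > 0
        and prev_wave[pid].get(var, -9) > 0
    ]
    if not common:
        return float("nan")
    changed = sum(curr_wave[pid][var] != prev_wave[pid][var] for pid in common)
    return changed / len(common)


# Made-up records: person 3 has no valid Wave 12 code, so is excluded.
w12 = {1: {"jbsic07_cc": 10}, 2: {"jbsic07_cc": 20}, 3: {"jbsic07_cc": -8}}
w13 = {1: {"jbsic07_cc": 10}, 2: {"jbsic07_cc": 21}, 3: {"jbsic07_cc": 7}}
print(change_rate(w12, w13, "jbsic07_cc"))  # 0.5
```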

Regarding the recoding of jbsoc20: it covers individuals in continuous employment for whom a genuine job change was identified after comparing the jbsoc text fields, as well as those who were not in continuous employment. Consequently, jbsoc20 is not complete (approximately 9,500 observations are coded -9). I would therefore recommend using either jbsoc00 (the best option) or jbsoc10 (which still has a significant number of -9s, but fewer than jbsoc20).
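Following that recommendation in code could look like the minimal sketch below, which picks the first valid code in a preference order of jbsoc00_cc, then jbsoc10_cc. It assumes each respondent is a dict of variables and that negative values are missing codes. The variable name is returned alongside the code because SOC 2000 and SOC 2010 are different classifications and should not be pooled without a crosswalk.

```python
# Preference order following the advice above: SOC 2000 first, then SOC 2010.
# jbsoc20_cc is left out by default; pass a different tuple to change this.
DEFAULT_PREFERENCE = ("jbsoc00_cc", "jbsoc10_cc")


def best_soc(record, preference=DEFAULT_PREFERENCE):
    """Return (variable, code) for the first valid SOC code, else (None, None)."""
    for var in preference:
        code = record.get(var, -9)
        if code > 0:  # negative values are missing-value codes
            return var, code
    return None, None


print(best_soc({"jbsoc00_cc": -9, "jbsoc10_cc": 21}))  # ('jbsoc10_cc', 21)
```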

I hope this information is helpful.

Best wishes,

Roberto Cavazos
Understanding Society User Support Team

Actions #3

Updated by Thomas Stephens 24 days ago

Hi Roberto,

Thanks. Super helpful. I've now apprised myself more of this, reading your reply and also the revisions set out by Understanding Society (here: https://www.understandingsociety.ac.uk/documentation/mainstage/user-guides/main-survey-user-guide/revisions-to-the-main-current-job-occupation-jbsocc-in-wave-13/).

I have some more questions, just to be sure I've understood this issue correctly.

For context, I've been using data on workers' SIC 2007, cross-referenced with Labour Force Survey data, to build a matrix of occupational health and safety and introduce it into Understanding Society. I do this by building a profile of workplace accidents and illnesses in comparable years of the LFS and then attaching health and safety data based on workers' SIC in the corresponding wave. My data use Waves 4, 6, 8, 10 and 12 of UKHLS, but I wanted to update this to Wave 14. In addition, I now want to build a matrix of workers' job quality by SOC 2020; for this I probably only need some good-quality SOC 2020 data.
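The linkage described here can be sketched as two steps: compute a weighted rate per SIC group from LFS records, then attach that rate to each Understanding Society respondent by their SIC code. This is a minimal illustration, not the method actually used; the LFS field names ("sic", "accident", "weight") are hypothetical, and it assumes the LFS SIC codes have already been recoded to the same categories as jbsic07_cc.

```python
from collections import defaultdict


def weighted_rates(lfs_records, group_var, outcome_var, weight_var="weight"):
    """Weighted share of records with outcome_var == 1 within each group code.

    Records whose group code is missing or negative are skipped.
    """
    num = defaultdict(float)
    den = defaultdict(float)
    for rec in lfs_records:
        group = rec.get(group_var)
        if group is None or group < 0:
            continue
        w = rec.get(weight_var, 1.0)
        den[group] += w
        if rec.get(outcome_var) == 1:
            num[group] += w
    return {g: num[g] / den[g] for g in den if den[g] > 0}


def attach_rates(usoc_records, rates, key_var, new_var):
    """Copy each record's group-level rate into new_var (None if no match)."""
    for rec in usoc_records:
        code = rec.get(key_var, -9)
        rec[new_var] = rates.get(code) if code > 0 else None
    return usoc_records


# Made-up LFS records: "sic", "accident" and "weight" are hypothetical names.
lfs = [
    {"sic": 1, "accident": 1, "weight": 2.0},
    {"sic": 1, "accident": 0, "weight": 2.0},
    {"sic": 2, "accident": 0, "weight": 1.0},
]
rates = weighted_rates(lfs, "sic", "accident")  # {1: 0.5, 2: 0.0}
usoc = [{"jbsic07_cc": 1}, {"jbsic07_cc": -8}]
attach_rates(usoc, rates, "jbsic07_cc", "accident_rate")
print([r["accident_rate"] for r in usoc])  # [0.5, None]
```

Respondents with a negative jbsic07_cc receive None rather than a rate, so the missingness discussed in this thread carries through to the linked variable.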

It seems these issues may make both of these things more difficult, but I might still be able to find a way through, depending on the severity of the issue and especially on whether prior waves' data are affected. With this in mind, can I check that:

1. For the most part, this issue isn't deemed to affect pre-Wave 13 data on jbsic07, jbsoc00 and jbsoc10? My initial worry was that the high rate of change had led the UKHLS team to doubt the accuracy of prior waves' data, but in fact it seems more that the absence of dependent interviewing has created a lot of, as you say, "artificial" changes in occupation amongst people whose jobs actually remained the same? The analysis linked above appears to suggest as much, because once you filter out spurious changes you seem to end up with a rate of job change similar to prior waves (~15%).

2. For Waves 13-14, can any jbsoc20 data that are not coded as missing (-9) be relied on as a reasonably accurate reflection of that person's current job? I.e. someone coded -9 presumably reported an occupation change which has since been identified as spurious, meaning they don't have a SOC 2020 code (and unfortunately it can't be recovered).

3. For SIC 2007 in Waves 13-14, amongst those in paid work, there seems to have been a huge jump in inapplicables (-8); previously I found no inapplicables in this sub-group in prior waves, but -9 is fairly stable. This suggests the UKHLS team have assigned spurious SICs to -8; is that right? This prompts the same question as #2: can those who aren't coded -8 be relied on?

4. For SIC 2007, is it possible that some of these Wave 13-14 workers' data are recoverable from prior waves? I.e. in cases where their change in jobs is spurious, I can presumably write some code which carries forward their prior wave's SIC 2007? I would need to know which of these -9s are genuinely spurious, though, which I could presumably find using workers' self-reported job change data.
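Point 4 amounts to a conditional carry-forward. A minimal sketch, assuming a hypothetical same-job indicator (here `same_job == 1`) derived from respondents' self-reported job-change questions, and that the previous wave's code is itself reliable. It only fills codes that are currently negative and returns the ids it changed, so imputed values can be flagged in the analysis.

```python
def carry_forward(prev_wave, curr_wave, var, same_job_var):
    """Fill missing current-wave codes from the previous wave.

    A code is copied only when the current value is negative (missing or
    inapplicable), the respondent is flagged as being in the same job
    (same_job_var == 1; a hypothetical indicator), and the previous-wave
    value is valid. Returns the ids of filled records for flagging.
    """
    filled = []
    for pid, rec in curr_wave.items():
        prev = prev_wave.get(pid)
        if prev is None:
            continue
        if (
            rec.get(var, -9) < 0
            and rec.get(same_job_var) == 1
            and prev.get(var, -9) > 0
        ):
            rec[var] = prev[var]
            filled.append(pid)
    return filled


w12 = {1: {"jbsic07_cc": 10}, 2: {"jbsic07_cc": 20}, 3: {"jbsic07_cc": -9}}
w13 = {
    1: {"jbsic07_cc": -8, "same_job": 1},  # same job: filled from Wave 12
    2: {"jbsic07_cc": -8, "same_job": 0},  # reported a change: left as is
    3: {"jbsic07_cc": -8, "same_job": 1},  # no valid Wave 12 code: left as is
}
print(carry_forward(w12, w13, "jbsic07_cc", "same_job"))  # [1]
```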

Any advice on the above would be greatly appreciated!

Best wishes,

Tom

Actions #4

Updated by Understanding Society User Support Team 20 days ago

Hello Tom

1. This issue regarding job details only arose from Wave 13 onwards. This is because Wave 13 was the first time we asked respondents about multiple jobs, and all respondents in paid employment answered the jbsoc question(s), not just those whose job descriptions had changed. Before Wave 13, this was managed via the jbsoc00chk question, which wasn't included in Wave 13. Therefore, this isn't a problem in earlier waves.

2. Generally, the jbsoc20 classification is more reliable. However, our initial reviews have identified instances where respondents with multiple jobs have the same SOC code repeated across all their job slots. For example, if m_multijob = 2, then m_jbsocxx_1_cc is identical to m_jbsocxx_2_cc. Furthermore, these repeated codes don't consistently align with either the main job or the first reported job. This same situation is also observed in the soc10 and soc20 variables. This raises an additional concern: for individuals with multiple jobs (multijob > 1), does the main job classification truly reflect their primary occupation, or could it be related to an additional job?

3 and 4. The issue with jbsic07==-8 comes from the adjustments made in Wave 13 to accommodate respondents with multiple jobs.

It seems we could increase the number of valid observations (by around 814 in Wave 14, for example) by assigning the jbsic from the multiple job to the main job.

Of the 8 potential observations where mainjob equals 3, we currently have classifications for only 2, as shown in the table below.

Therefore, you could assign these values to increase the number of valid observations. However, it's important to note that these classifications might still exhibit the original issue.
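Both multi-job checks described in this reply can be sketched for a single respondent record. This assumes per-slot SOC variables named along the lines of jbsoc20_1_cc, jbsoc20_2_cc (the m_jbsocxx_1_cc pattern above with the wave prefix dropped) and, for the SIC fill, a hypothetical per-slot layout jbsic07_{k}_cc in which slot k corresponds to mainjob == k. Check the actual variable names and slot order in the data documentation before relying on either. As noted above, filled values may still carry the original coding issue, so the fill function reports whether it changed the record.

```python
def repeated_across_slots(record, template, n_var="multijob"):
    """True if a respondent with 2+ jobs has the same valid code in every slot."""
    n = record.get(n_var, 0)
    if n < 2:
        return False
    codes = [record.get(template.format(k), -9) for k in range(1, n + 1)]
    return all(c > 0 for c in codes) and len(set(codes)) == 1


def fill_main_from_slot(record, main_var, template, mainjob_var="mainjob"):
    """Replace an inapplicable (-8) main-job code with the code in the mainjob slot.

    Returns True if the record was changed, so filled cases can be flagged.
    """
    if record.get(main_var, -9) != -8:
        return False
    k = record.get(mainjob_var, -9)
    if k < 1:
        return False
    code = record.get(template.format(k), -9)
    if code <= 0:
        return False
    record[main_var] = code
    return True


person = {"multijob": 2, "jbsoc20_1_cc": 41, "jbsoc20_2_cc": 41}
print(repeated_across_slots(person, "jbsoc20_{}_cc"))  # True

other = {"jbsic07_cc": -8, "mainjob": 3, "jbsic07_3_cc": 12}
print(fill_main_from_slot(other, "jbsic07_cc", "jbsic07_{}_cc"), other["jbsic07_cc"])  # True 12
```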

I hope this information is helpful.

Best wishes,
Roberto Cavazos
Understanding Society User Support Team

Actions #5

Updated by Thomas Stephens 18 days ago

Hi Roberto,

Thanks, this is very helpful. I will probably recode the SICs in Wave 14 for respondents who have multiple jobs, as you suggest, and be transparent about the limitations. Missingness is still higher than I'd like for that wave, but I can await the November 2025 release and revise accordingly after that. Failing that, I might map via SOC in the LFS instead (USoc has less missingness there, so long as you use SOC 2000 and SOC 2010).

On the SOCs, I think a lot of this data might be recoverable for my purposes, because the LFS contains SOC 2010 mapped to SOC 2020 (from Q1 2021) and SOC 2000 mapped to SOC 2010 (Q1 2011 to Q1 2021). So for the data I'm using, which use SOC 2020, I can probably make some assumptions based on the SOC data available within USoc, likely relying mostly on SOC 2000 as you suggest.
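Chaining the two LFS mappings can be sketched as composing weighted crosswalks. Each crosswalk is estimated from dual-coded records as the weighted share of each source code falling into each target code, and the composed share is the sum over intermediate SOC 2010 codes. This assumes the SOC 2010 to SOC 2020 split does not depend on the original SOC 2000 code, which is a simplification because the two crosswalks come from different LFS periods. The codes and weights below are made up for illustration.

```python
from collections import defaultdict


def crosswalk_from_pairs(pairs):
    """Build {source: {target: share}} from (source, target, weight) triples,
    e.g. from LFS records coded under both classifications."""
    totals = defaultdict(float)
    weights = defaultdict(lambda: defaultdict(float))
    for src, tgt, w in pairs:
        totals[src] += w
        weights[src][tgt] += w
    return {
        src: {tgt: w / totals[src] for tgt, w in tgts.items()}
        for src, tgts in weights.items()
    }


def compose(first, second):
    """Chain two crosswalks: share(s -> t) = sum over m of share(s -> m) * share(m -> t).

    Intermediate codes missing from `second` are dropped, so the shares for a
    source code then sum to less than 1.
    """
    out = {}
    for src, mids in first.items():
        acc = defaultdict(float)
        for mid, p in mids.items():
            for tgt, q in second.get(mid, {}).items():
                acc[tgt] += p * q
        out[src] = dict(acc)
    return out


# Hypothetical codes and weights, for illustration only.
soc00_to_10 = crosswalk_from_pairs([("a", "x", 1.0), ("a", "y", 1.0)])
soc10_to_20 = crosswalk_from_pairs([("x", "P", 3.0), ("y", "P", 1.0), ("y", "Q", 4.0)])
print(compose(soc00_to_10, soc10_to_20))  # approximately {'a': {'P': 0.6, 'Q': 0.4}}
```

Keeping the full share distribution, rather than picking the single most common target code, lets downstream job-quality measures be averaged over the plausible SOC 2020 codes.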

However, before I do this, can you clarify why there are respondents in each wave who sometimes report SOC 2000, 2010 and 2020 data at the same time? E.g. in Wave 14 there are over 9,000 people in paid work (or away from a paid job) with both jbsoc00_cc > 0 and jbsoc10_cc > 0. Are these the same jobs, or is this data carried over from prior jobs which I should be wary of?
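The overlap described here can be quantified by tabulating which SOC schemes hold a valid code for each respondent. This sketch assumes records are dicts containing the three *_cc variables, with the paid-work filter supplied by the caller (the `in_work` predicate is hypothetical). It only counts patterns; on its own it cannot say whether a second code is carried forward or a parallel classification of the same job.

```python
from collections import Counter

SOC_VARS = ("jbsoc00_cc", "jbsoc10_cc", "jbsoc20_cc")


def scheme_pattern(record, soc_vars=SOC_VARS):
    """Tuple of the SOC variables that hold a valid (positive) code."""
    return tuple(v for v in soc_vars if record.get(v, -9) > 0)


def tabulate_patterns(records, in_work=lambda rec: True):
    """Count respondents by which SOC schemes are populated.

    in_work is a caller-supplied filter (e.g. employed or away from a paid job).
    """
    return Counter(scheme_pattern(rec) for rec in records if in_work(rec))


records = [
    {"jbsoc00_cc": 11, "jbsoc10_cc": 21, "jbsoc20_cc": -9},
    {"jbsoc00_cc": 11, "jbsoc10_cc": 21, "jbsoc20_cc": 31},
    {"jbsoc00_cc": -8, "jbsoc10_cc": -8, "jbsoc20_cc": 31},
]
for pattern, n in tabulate_patterns(records).most_common():
    print(pattern, n)
```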

This is probably my final question - sorry for asking so many.

Best wishes,

Tom.

Actions #6

Updated by Understanding Society User Support Team 14 days ago

Hello Tom

The different classifications (SOC00, SOC10, SOC20) are provided by the fieldwork agency, which suggests that a mix of approaches may be at play. In some cases, the SOC code may have been carried forward from a previous wave (historical carry-forward), while in others, the respondent’s job may have been reclassified under a different system (parallel classification). At this point, we don’t have the proportions or breakdowns to distinguish these cases precisely.

However, this is something we are actively considering as part of the ongoing redesign and review of the SOC variables. Once we have more information, we will publish further guidance to keep our users informed.

I hope this information is helpful.

Best wishes,
Roberto Cavazos
Understanding Society User Support Team
