Support #1680

Merging: Many to one & sorting

Added by Irina Kolegova 6 months ago. Updated 4 months ago.

Status: Resolved
Priority: Urgent
Category: Data linkage and consents
Start date: 04/08/2022
% Done: 100%

Description

Hello,

I wanted to merge youth datasets for Wave 10 (UKHLS), Wave 4 (COVID) and Wave 8 (COVID).
1) Should I use "many to one" merging, "one to one", or "one to many"?
2) Do I need to sort datasets by pidp (and pidp_c) before merging them?
3) How can I unite those variables that repeat themselves in every wave (sex, ethnicity etc.)?

Thank you

#1

Updated by Understanding Society User Support Team 6 months ago

  • Private changed from Yes to No

Many thanks for your enquiry. The Understanding Society team is looking into it and we will get back to you as soon as we can.

We aim to respond to simple queries within 48 hours and more complex issues within 7 working days.

Best wishes,
Understanding Society User Support Team

#2

Updated by Understanding Society User Support Team 6 months ago

  • Status changed from New to In Progress

#3

Updated by Understanding Society User Support Team 6 months ago

  • Status changed from In Progress to Feedback
  • % Done changed from 0 to 50

Dear Irina,

1) All of these files are individual-level files in which pidp is a unique identifier, so this is a 1:1 merge.
2) If you are using Stata version 12 or older, you need to sort the data first; with newer versions of Stata you do not need to sort.
3) In wide format you by definition get extra variables, each preceded by a wave prefix (a_ for Wave 1, b_ for Wave 2, and so on), to accommodate the additional time points. For time-invariant variables (e.g. country of birth, sex, ethnicity), however, the values will be the same in every wave. One workaround is to omit these variables when merging and add them after the merge from the xwavedat file (for more information about xwavedat, see https://www.understandingsociety.ac.uk/documentation/mainstage/user-guides/main-survey-user-guide/list-of-data-files-and-their-descriptions). Additionally, for sex specifically, each youth data file contains W_ypsex, which holds the answers to the YPSEX question from the youth questionnaire. Although this would be rare, W_ypsex may vary between waves. We leave the choice of which to use, W_sex or W_ypsex, to users.
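
As a rough illustration of points 1) to 3), the do-file sketch below runs on small made-up datasets entered with input, so it does not depend on any real file names. Everything except pidp is an assumption made for the example: the variable names j_yphappy, cd_yphappy, ch_yphappy and sex_dv, the use of cd_ and ch_ as prefixes for COVID Waves 4 and 8, and all of the values. With real data, each toy block would be replaced by a use command on the relevant youth file.

```stata
* Toy stand-in for the Wave 10 (UKHLS) youth file. Hypothetical values.
clear
input long pidp byte j_yphappy
1001 2
1002 5
1003 3
end
* Confirm that pidp uniquely identifies rows, which is what justifies 1:1.
isid pidp
tempfile youth10
save `youth10'

* Toy stand-in for the Wave 4 (COVID) youth file. Hypothetical values.
clear
input long pidp byte cd_yphappy
1001 3
1003 4
1004 1
end
isid pidp
tempfile covid4
save `covid4'

* Toy stand-in for the Wave 8 (COVID) youth file. Hypothetical values.
clear
input long pidp byte ch_yphappy
1001 2
1004 2
end
isid pidp
tempfile covid8
save `covid8'

* Toy stand-in for xwavedat, holding a time-invariant variable once per person.
clear
input long pidp byte sex_dv
1001 1
1002 2
1003 1
1004 2
end
isid pidp
tempfile xwave
save `xwave'

* 1:1 merges on pidp. With the merge 1:1 syntax there is no explicit sort step;
* Stata sorts on the key itself unless the sorted option is given.
use `youth10', clear
merge 1:1 pidp using `covid4', generate(m_covid4)
merge 1:1 pidp using `covid8', generate(m_covid8)

* Add the time-invariant variable after the merge, keeping everyone already
* in the merged file and dropping xwavedat-only rows.
merge 1:1 pidp using `xwave', keepusing(sex_dv) keep(master match) nogenerate

* The result should still have exactly one row per pidp.
isid pidp
list, clean
```

The generate() options keep a separate match indicator for each COVID wave, which makes it easy to see which young people responded in which files before deciding whom to keep.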

Best wishes,
Understanding Society User Support Team

#4

Updated by Understanding Society User Support Team 4 months ago

  • Status changed from Feedback to Resolved
  • % Done changed from 50 to 100
