Support #856
Status: closed
How to deal with contradictory records
Description
Dear Understanding Society team,
I am writing to seek your suggestions about contradictory data.
The first example is within a single wave. I ran a cross-tabulation of a_qfhigh (highest qualification ever achieved) against a_fenow (age when leaving school / never went to college or university). I assumed that respondents who had attained a university degree would not appear in the "never went to college or university" category of a_fenow. However, this is not the case.
. tab a_qfhigh a_fenow if a_dvage>=25

                      |                    still in further education                    |
highest qualification |   missing  inapplica  don't kno   write in  never wen  at colleg |     Total
----------------------+------------------------------------------------------------------+----------
              missing |         1         17          0          0          0          0 |        18
              refused |         0          3          0          2          1          0 |         6
           don't know |         0         18          7         19         38          2 |        84
university higher deg |         0         16          3      3,805         53        236 |     4,113
 1st degree level inc |         0         14          3      5,458        203        296 |     5,974
diploma in higher edu |         0          4          2      2,089        440        142 |     2,677
teaching qualificatio |         0          0          2        574        107         23 |       706
nursing or other medi |         0          1          2        556        335         37 |       931
              a level |         0         11          3      1,443      1,156        152 |     2,765
  welsh baccalaureate |         0          0          0          0          2          0 |         2
international baccala |         0          1          0         26         10          5 |        42
             as level |         0          1          1        104         75         13 |       194
higher grade/advanced |         0          1          0        175        173         17 |       366
 certificate of sixth |         0          0          0         60         49          6 |       115
         gcse/o level |         0         18         10      3,198      5,037        262 |     8,525
                  cse |         0          1          1        529      1,360         41 |     1,932
standard/ordinary (o) |         0          1          0        183        430         21 |       635
other school (inc. sc |         0          7          0        355        967         35 |     1,364
    none of the above |         1        392         16      1,676     10,915        287 |    13,287
----------------------+------------------------------------------------------------------+----------
                Total |         2        506         50     20,252     21,351      1,575 |    43,736
The column shown as "never wen" is the truncated label for "never went to college/university".
As the table above shows, 203 + 53 = 256 respondents with a university degree (first degree or higher degree) reported that they never went to college or university.
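A minimal sketch of how such within-wave contradictions could be flagged rather than overwritten. It runs on a handful of made-up toy rows, not the real file. The numeric codes (a_qfhigh 1 = higher degree, 2 = first degree; a_fenow 2 = never went to college/university) are assumptions that should be checked against the value labels, and contra_edu is a hypothetical variable name.

```stata
* Toy rows standing in for the wave 1 adult data; all values are made up.
* ASSUMED codes (verify with -label list-):
*   a_qfhigh: 1 = university higher degree, 2 = 1st degree
*   a_fenow : 2 = never went to college/university
clear
input long pidp byte a_qfhigh byte a_fenow byte a_dvage
1  1 1 40
2  2 2 35
3  1 2 52
4 12 2 61
5  2 3 27
end

* Flag degree holders aged 25+ who report never attending college/university.
* contra_edu is a hypothetical name; the original answers are left untouched.
generate byte contra_edu = inlist(a_qfhigh, 1, 2) & a_fenow == 2 & a_dvage >= 25

* Count the flagged cases (toy persons 2 and 3 here).
count if contra_edu == 1
```

Keeping a separate flag, instead of recoding to missing straight away, means the same analysis can be run with and without the contradictory cases.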
The second example concerns the cross-wave youth data. I assumed that a respondent who answered yes to "ever smoked cigarettes" in wave 1 would not answer no to the same question in wave 2. However, this is what I got:
tab a_ypevrsmo b_ypevrsmo

  ever smoke |
  cigarettes |   ever smoke cigarettes at all
      at all |   missing        yes         no |     Total
-------------+---------------------------------+----------
     missing |         0          1         15 |        16
         yes |         2         74         39 |       115
          no |        27        209      2,416 |     2,652
-------------+---------------------------------+----------
       Total |        29        284      2,470 |     2,783
39 respondents who said in wave 1 that they had ever smoked said in wave 2 that they had never smoked.
This gets more complicated when I link waves 1-6 together.
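A rough sketch of how the same check could be extended across several waves once the data are in long format (one row per person per wave). The rows below are made up for illustration. The codes 1 = yes and 2 = no are assumptions to be verified against the value labels, and wave, ypevrsmo (without the wave prefix), prior_yes and contra_smo are hypothetical names.

```stata
* Toy long-format rows: one row per person (pidp) per wave; values are made up.
* ASSUMED codes (verify with -label list-): 1 = yes, 2 = no.
clear
input long pidp byte wave byte ypevrsmo
1 1 2
1 2 1
1 3 1
2 1 1
2 2 2
2 3 1
3 1 2
3 2 2
3 3 2
end

* Number of "yes" answers given in earlier waves, within person, in wave order.
* sum() is a running sum; subtracting the current row leaves only prior waves.
bysort pidp (wave): generate int prior_yes = sum(ypevrsmo == 1) - (ypevrsmo == 1)

* A "no" after any earlier "yes" contradicts the "ever smoked" wording.
generate byte contra_smo = ypevrsmo == 2 & prior_yes > 0

* Count the flagged person-wave rows (toy person 2 at wave 2 here).
count if contra_smo == 1
```

A "no" followed by a later "yes" is not flagged, since a respondent can start smoking between waves.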
I understand that these are measurement or recall errors. I could simply recode the contradictory records as missing. Each set of recodes may seem trivial, but added up over waves they are not a small number. I am not sure how I am supposed to deal with them.
Many thanks for your time,
Regards,
Min