Support #1444

COVID-19 telephone survey household composition data

Added by Lizzie Augarde 2 months ago. Updated about 2 months ago.

Data inconsistency



Hi, I am working with the COVID-19 survey data. I've noticed that the household composition data for the May 2020 telephone survey look strange. For cb_hhcompa, cb_hhcompb and cb_hhcompc, all respondents have a value of "0+". Please can you clarify what this means? Given that the respondents are a range of ages, it seems surprising that none would have children in the household. Is this the correct interpretation?



Updated by Alita Nandi about 2 months ago

  • Status changed from New to Feedback
  • Assignee changed from Alita Nandi to Lizzie Augarde
  • % Done changed from 0 to 80
  • Private changed from Yes to No


If you merge this file with the 2019 data file, jk_indall_cv, which includes the household grid information collected for each individual in 2019, you will see that most of them report that they do not have any of their own children in the household. Here is the Stata syntax to check that.

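// "datafolder" is a placeholder for the folder where you have stored the data

// load the May 2020 telephone survey respondents
use pidp cb_hhcompa using "datafolder\cb_indresp_t.dta", clear
keep if cb_hhcompa==0
// keep unmatched telephone respondents (1) and matched cases (3)
merge 1:1 pidp using "datafolder\mainstage_data_2019\jk_indall_cv.dta", keep(1 3)
// fre is a user-written command (ssc install fre); tabulate also works
fre jk_nchild_dv if _merge==3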

Hope this answers your question. If not, please let us know.
Best wishes,
Understanding Society User Support Team


Updated by Lizzie Augarde about 2 months ago


Thank you very much for this. I'm afraid I'm still a bit confused. When I run these commands, there are 120 respondents whose pidp does not merge (_m==1). Also, some respondents do report having their own children in the household in the 2019 data, but all except 2 respondents in the May 2020 survey are coded as "0+" on all 3 household composition variables relating to under 18s. In addition, some respondents in the May 2020 survey are aged 16-18, so I think they should have the cb_hhcompb variable coded differently. I have attached Stata output to clarify.

Please can you advise what "0+" means and why all respondents are coded this way?

Thank you,

Lizzie Augarde
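
For readers following along, the check described above can be sketched roughly as follows. This is only an illustration and not the attached output: it reuses the "datafolder" placeholder and the file and variable names already quoted in this thread, and it assumes the code is run from a do-file (the /// line continuation does not work in the interactive command window).

```stata
// "datafolder" is the same placeholder used earlier in this thread
use pidp cb_hhcompa cb_hhcompb using "datafolder\cb_indresp_t.dta", clear

// keep unmatched May 2020 respondents (1) and matched cases (3);
// bring in only the 2019 count of own children in the household
merge 1:1 pidp using "datafolder\mainstage_data_2019\jk_indall_cv.dta", ///
    keep(1 3) keepusing(jk_nchild_dv)

// how many May 2020 respondents did not match to the 2019 file
tabulate _merge

// for matched cases, compare the 2019 child count with the May 2020 code
tabulate jk_nchild_dv cb_hhcompa if _merge==3, missing
```

Unlike the earlier syntax, this sketch does not drop anyone before merging, so any respondent with a value other than 0 on cb_hhcompa stays visible in the cross-tabulation.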


Updated by Alita Nandi about 2 months ago

The unmatched cases are those who were not interviewed in 2019 (as part of either Wave 10 or Wave 11). So, I matched with main survey Wave 9 and Wave 10 (not yet released) data. I could match 709 of the 718 May telephone respondents, and 53 of them reported at least one co-resident child in at least one of Wave 9 or Wave 10. We are looking into why hhcompa-c is zero for all of these cases.
