Dear Support group,

I am measuring clinical depression and would be grateful for your advice on a couple of questions. I apologise for asking for this to be treated as a priority, but your answer may also have implications for a paper I am co-authoring within the Understanding Society EU Referendum project, and the submission deadline is approaching.

As I am interested in objective depression, I have been using the questions H_COND17 and H_CONDS17 to create a measure of depression. I assign the value 1 to respondents who report in H_CONDS17 that they still have depression (H_CONDS17=Yes). Because I am interested in the effects of depression, it matters less whether the person was diagnosed with depression at some point in his/her life (i.e. H_COND17=Yes) than whether the person is depressed at the time of the interview. I assign the value 0 to respondents who report in H_COND17 that they have never been diagnosed with depression (H_COND17=No).

So far I have been using data from waves 1 and 3 to 6, as I noticed that these two variables are available in all waves except wave 2, where a slightly different question is asked instead: H_CONDN17. In turn, this question is not available in all waves and is sometimes asked together with the previous two questions (e.g.,

My questions are therefore the following. Do you know the reason for this variation and, more importantly, can I "maximise" my number of depressives by creating a measure of depression that combines both sets of questions (i.e., H_COND17 and H_CONDS17 on the one hand, and H_CONDN17 on the other) and makes use of all available waves (i.e. 1 to 6)?

My idea was to do the following:

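* Start with the measure set to missing for everyone
gen depression = .

* 1 = still has depression (hconds17) or depression reported in hcondn17
replace depression = 1 if hconds17==1 | hcondn17==1

* 0 = never diagnosed (hcond17) or depression not reported in hcondn17
replace depression = 0 if hcond17==0 | hcondn17==0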

However, I wonder how problematic it is to mix questions that are not available in all waves, as this is certainly a point that reviewers will raise. I would really appreciate your thoughts on this.

Many thanks and best wishes,


