Missing/Inapplicable Data on Physical Activity

Dear Support Team,

Hello! I am currently looking into the data on people's physical activity. I see that in waves 7, 9, 11, and 12, the survey groups all kinds of physical activity into three categories: walking, moderate, and vigorous activity. The variables for these categories are extremely helpful for my research.

However, many of these variables have so many missing/inapplicable values that they are hardly usable. Examples include wwmin, wwhrs, mwhrs, vwhrs, etc.

My questions are: 1) Why are the majority of values missing/inapplicable, given that there is a "Don't know" option for respondents to choose? 2) Do you have any suggestions for alternative variables on people's physical activity (ideally available in more than one wave)?

Thank you very much for your help!


The high number of missing values on the variables you listed is expected, as these questions were asked only when the corresponding xdhrs and xdmin both equaled -1 "don't know" (where x is w, m, or v). Taking wwhrs as an example, the universe for this variable is:

If (WDAY > 0) // R walked in last 7 days
And If (WDHRS = DK & WDMIN = DK) // Hours and minutes of walking are not known

So the information on usual weekly activity comes from a combination of wday, wdhrs, and wdmin. Only when both wdhrs and wdmin equal -1 is wwhrs asked, and it can then be used as a proxy.
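As an illustration of the routing above, here is a minimal sketch of how a user might combine these items into a single "usual weekly walking minutes" value. It rests on several assumptions that are not stated in this thread: that wdhrs/wdmin record usual time per walking day (as in the IPAQ short form), that wwhrs/wwmin record total weekly time, that -1 is "don't know", and that other negative values are missing/inapplicable codes. The function name is hypothetical, and the moderate (m) and vigorous (v) items would presumably follow the same pattern.

```python
DONT_KNOW = -1  # assumed code for "don't know"


def weekly_walking_minutes(wday, wdhrs, wdmin, wwhrs=None, wwmin=None):
    """Hypothetical helper: usual weekly walking minutes, or None.

    Assumes wdhrs/wdmin are time per walking day and wwhrs/wwmin are
    total time per week (the proxy asked only when both daily items
    are "don't know"). Other negative values are treated as missing.
    """
    if wday is None or wday < 0:
        return None  # number of walking days is itself missing
    if wday == 0:
        return 0  # did not walk in the last 7 days, so no follow-ups
    if wdhrs == DONT_KNOW and wdmin == DONT_KNOW:
        # Only this branch is routed to the weekly proxy questions.
        if wwhrs is None or wwmin is None or wwhrs < 0 or wwmin < 0:
            return None
        return wwhrs * 60 + wwmin
    if wdhrs < 0 or wdmin < 0:
        return None  # daily time partly missing; not routed to the proxy
    # Days per week multiplied by usual minutes per walking day.
    return wday * (wdhrs * 60 + wdmin)


print(weekly_walking_minutes(5, 0, 30))           # daily items known
print(weekly_walking_minutes(3, -1, -1, 2, 15))   # weekly proxy used
```

The design point is that wwhrs/wwmin are only meaningful in the branch where both daily items are "don't know". Everywhere else they are expected to be inapplicable, which is why most of their values appear missing.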
Furthermore, these questions are part of the International Physical Activity Questionnaire (IPAQ), so you may want to read more about this tool, e.g. here:

I hope this helps.

Best wishes,
Piotr Marzec
UKHLS User Support


Updated by Pengyu Li about 1 year ago

Dear Piotr,

Your answer was incredibly helpful.

Thank you very much for your kind help and support.

Best Wishes,

Pengyu Li


