Support #1173

Clustering

Added by OLAYIWOLA OLADIRAN over 1 year ago. Updated over 1 year ago.

Status: Feedback
Priority: Immediate
Category: Data analysis
Start date: 03/27/2019
% Done: 90%


Description

I am trying to run some analysis using individual and household characteristics and have to deal with clustering issues. Presently, I am clustering at the pidp level on the premise that errors are likely to be correlated within individuals. Do you think clustering at pidp makes sense? I am also trying to cluster at the household level (hidp), but household IDs change every year, and it will be difficult to keep track of all these changes for a large number of observations. Would you recommend clustering at the hidp level instead, despite the fact that these IDs change every year?

#1

Updated by Alita Nandi over 1 year ago

  • Status changed from New to Feedback
  • Assignee changed from Alita Nandi to OLAYIWOLA OLADIRAN
  • % Done changed from 0 to 50
  • Private changed from Yes to No

Hello,

Please provide some more information about your analysis.

We can say that the data is clustered and stratified, and the variables representing the PSU and strata are called "psu" and "strata" in the file "xwavedat". The same variables, with their wave prefixes ("w_psu" and "w_strata"), are also available in each wave file. If you are pooling data across waves, then you can cluster at the PIDP level, as that will include any PSU-level clustering.
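When waves are pooled, the wave-specific variables first have to be brought under common names. The following is a minimal sketch in plain Python, assuming the wave prefixes are `a_` for wave 1 and `b_` for wave 2 and that `pidp` carries no prefix; the records, values and helper names (`strip_prefix`, `pool_waves`) are hypothetical and not part of the Understanding Society files.

```python
# Hypothetical per-wave records. Assumption: each wave's variables carry a
# wave prefix ("a_" for wave 1, "b_" for wave 2); pidp has no prefix.
WAVE_PREFIXES = {1: "a_", 2: "b_"}

wave_files = {
    1: [
        {"pidp": 101, "a_hidp": 5001, "a_psu": 7, "a_strata": 2},
        {"pidp": 102, "a_hidp": 5001, "a_psu": 7, "a_strata": 2},
    ],
    2: [
        {"pidp": 101, "b_hidp": 6001, "b_psu": 7, "b_strata": 2},
    ],
}


def strip_prefix(record, prefix):
    """Return a copy of record with the wave prefix removed from its keys."""
    out = {}
    for key, value in record.items():
        new_key = key[len(prefix):] if key.startswith(prefix) else key
        out[new_key] = value
    return out


def pool_waves(wave_files, prefixes):
    """Stack wave-specific records into one long list with a 'wave' field."""
    pooled = []
    for wave in sorted(wave_files):
        for record in wave_files[wave]:
            row = strip_prefix(record, prefixes[wave])
            row["wave"] = wave
            pooled.append(row)
    return pooled


pooled = pool_waves(wave_files, WAVE_PREFIXES)
for row in pooled:
    print(row)
```

Each pooled row then has `pidp`, `hidp`, `psu`, `strata` and `wave`, so the same individual (here pidp 101) appears once per wave in which they were observed.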

The household IDs are only unique within a wave and cannot be used to link households across waves, because household composition changes from wave to wave.
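To illustrate the difference between the two identifiers in a pooled file, here is a minimal sketch with hypothetical rows. It assumes, for illustration only, that the same hidp value can turn up in another wave for an unrelated household; the point is that hidp is only meaningful together with the wave, whereas pidp identifies the same person in every wave.

```python
from collections import Counter

# Hypothetical pooled rows. hidp is only unique within a wave, so hidp 5001
# in wave 2 is NOT treated as the same household as hidp 5001 in wave 1.
pooled = [
    {"wave": 1, "pidp": 101, "hidp": 5001},
    {"wave": 1, "pidp": 102, "hidp": 5001},
    {"wave": 1, "pidp": 104, "hidp": 5002},
    {"wave": 1, "pidp": 105, "hidp": 5002},
    {"wave": 2, "pidp": 101, "hidp": 6001},
    {"wave": 2, "pidp": 102, "hidp": 6001},
    {"wave": 2, "pidp": 103, "hidp": 5001},
]

# Individual clusters: pidp is stable across waves.
pidp_clusters = Counter(row["pidp"] for row in pooled)

# Household key: only valid when paired with the wave.
household_keys = Counter((row["wave"], row["hidp"]) for row in pooled)

# Grouping on hidp alone would wrongly merge different households.
naive_households = Counter(row["hidp"] for row in pooled)

print(len(pidp_clusters))       # 5
print(len(household_keys))      # 4
print(naive_households[5001])   # 3
```

Individuals 101 and 102 each contribute two observations to their pidp cluster, while a (wave, hidp) key can only group people who shared a household in that particular wave.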

Best wishes,
Alita

#2

Updated by OLAYIWOLA OLADIRAN over 1 year ago

Thank you for your response.
We are using a multinomial model to analyse the factors that may be responsible for the housing tenure choices (ownership, rental and public housing) of individuals, using both individual and household characteristics. We are using a pooled cross-section (waves 1-8, but not in panel form). Do you feel clustering at the pidp level is necessary, especially since we are using a pooled cross-sectional setup?

#3

Updated by Alita Nandi over 1 year ago

If you use a pooled cross-sectional setup, then there will be more than one observation for some individuals, so the error terms will not be independent across all observations in your dataset: they will be correlated within individuals. Hence the need to cluster on pidp.
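The effect of ignoring repeated observations per person can be seen in a small simulation. This is a minimal sketch under loud assumptions: it uses a linear regression rather than the multinomial logit discussed here, simulated data rather than Understanding Society variables, and the basic (CR0) cluster-robust sandwich estimator without small-sample corrections. The function name `ols_with_cluster_se` is hypothetical.

```python
import math
import random
from collections import defaultdict


def ols_with_cluster_se(y, x, groups):
    """OLS of y on [1, x]; returns slope, conventional SE and CR0 cluster SE."""
    n = len(y)
    sx, sxx = sum(x), sum(v * v for v in x)
    sy, sxy = sum(y), sum(a * b for a, b in zip(x, y))
    det = n * sxx - sx * sx
    # (X'X)^{-1} for X = [1, x]
    inv = [[sxx / det, -sx / det], [-sx / det, n / det]]
    # beta = (X'X)^{-1} X'y
    b0 = inv[0][0] * sy + inv[0][1] * sxy
    b1 = inv[1][0] * sy + inv[1][1] * sxy
    resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]

    # Conventional SE: assumes independent, homoskedastic errors.
    sigma2 = sum(u * u for u in resid) / (n - 2)
    se_conv = math.sqrt(sigma2 * inv[1][1])

    # "Meat": sum over clusters of (X_g' u_g)(X_g' u_g)'.
    score = defaultdict(lambda: [0.0, 0.0])
    for xi, u, g in zip(x, resid, groups):
        score[g][0] += u
        score[g][1] += xi * u
    meat = [[0.0, 0.0], [0.0, 0.0]]
    for s in score.values():
        for i in range(2):
            for j in range(2):
                meat[i][j] += s[i] * s[j]

    # Sandwich (X'X)^{-1} M (X'X)^{-1}; only the slope element is needed.
    v11 = sum(inv[1][i] * meat[i][j] * inv[j][1]
              for i in range(2) for j in range(2))
    return b1, se_conv, math.sqrt(v11)


# Simulated pooled data: 200 hypothetical individuals, each observed in 8 waves,
# with a characteristic and an error component that persist within the person.
random.seed(1)
y, x, groups = [], [], []
for pidp in range(200):
    xi = random.gauss(0, 1)
    alpha = random.gauss(0, 1)
    for wave in range(8):
        x.append(xi)
        y.append(0.5 * xi + alpha + random.gauss(0, 0.1))
        groups.append(pidp)

b1, se_conv, se_clust = ols_with_cluster_se(y, x, groups)
print(f"slope={b1:.3f}  conventional SE={se_conv:.3f}  pidp-clustered SE={se_clust:.3f}")
```

With errors this strongly correlated within individuals, the pidp-clustered standard error comes out several times larger than the conventional one, which treats the eight observations per person as independent. In practice the estimation software's own option would be used instead, for example Stata's `vce(cluster pidp)` with `mlogit`.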

#4

Updated by OLAYIWOLA OLADIRAN over 1 year ago

Thank you very much.

#5

Updated by Stephanie Auty over 1 year ago

  • % Done changed from 50 to 90
