Support #1173
Clustering
100%
Description
I am trying to run some analysis using individual and household characteristics and have to deal with clustering issues. At present, I am clustering at the pidp level on the premise that errors are likely to be correlated within individuals. Do you think clustering at pidp makes sense? I have also tried to cluster at the household level (hidp), but household IDs change every year and it would be difficult to keep track of all these changes for a large number of observations. Would you recommend clustering at the hidp level instead, even though these IDs change every year?
Updated by Alita Nandi almost 6 years ago
- Status changed from New to Feedback
- Assignee changed from Alita Nandi to OLAYIWOLA OLADIRAN
- % Done changed from 0 to 50
- Private changed from Yes to No
Hello,
Please provide some more information about your analysis.
We can say that the data are clustered and stratified. The variables representing the PSU and strata are called "psu" and "strata" in the file "xwavedat". These variables are also available in each wave file with their wave prefixes, as "w_psu" and "w_strata". If you are pooling data across waves then you can cluster at the pidp level as that will include any PSU level clustering.
The household IDs are only unique within a wave and cannot be used across waves - this is because household composition changes across waves.
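A minimal sketch of what this means when pooling waves, assuming toy per-wave records rather than the real wave files. The person identifier pidp is carried across waves and defines the clusters, while hidp is paired with its wave label so that it is never compared across waves. The data values, the wave labels "a" and "b", and the helper name `pool_waves` are hypothetical.

```python
from collections import defaultdict

# Toy stand-ins for two wave files (values are made up for illustration).
# pidp identifies the same person in every wave; hidp is only valid within its wave.
wave_files = {
    "a": [{"pidp": 1, "hidp": 101}, {"pidp": 2, "hidp": 101}],
    "b": [{"pidp": 1, "hidp": 205}, {"pidp": 2, "hidp": 206}],
}

def pool_waves(files):
    """Stack wave files into one long list with one row per person-wave."""
    pooled = []
    for wave, rows in files.items():
        for row in rows:
            pooled.append({
                "wave": wave,
                "pidp": row["pidp"],
                # Pair hidp with the wave so it is never matched across waves.
                "household_key": (wave, row["hidp"]),
            })
    return pooled

pooled = pool_waves(wave_files)

# Group person-wave rows by pidp: each group is one cluster of repeated observations.
clusters = defaultdict(list)
for row in pooled:
    clusters[row["pidp"]].append(row["household_key"])
```

In this toy data the two people share a household in the first wave and live apart in the second, so a household-level cluster cannot be followed over time, whereas the pidp cluster can.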
Best wishes,
Alita
Updated by OLAYIWOLA OLADIRAN almost 6 years ago
Thank you for your response.
We are using a multinomial model to analyse the factors that may be responsible for the housing tenure choices (ownership, rental and public housing) of individuals, using both individual and household characteristics. We are using a pooled cross-section (so we use waves 1-8, but not in panel form). Do you feel clustering at the pidp level is necessary, especially since we are using the pooled cross-sectional setup?
Updated by Alita Nandi almost 6 years ago
If you use a pooled cross-sectional setup then there will be more than one observation for some individuals, so the error terms will not be independently distributed across all observations in your dataset: they will be clustered within individuals. Hence the need to cluster on pidp.
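To illustrate why this matters, here is a minimal sketch that compares a naive standard error with one clustered on pidp. It uses a simple sample mean rather than a multinomial model, and made-up data, purely to show the mechanics: residuals are summed within each person before being squared. No small-sample correction is applied, and the function name `mean_with_se` is hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical pooled data: (pidp, outcome), e.g. outcome = 1 if the person owns their home.
# Persons 1 and 2 appear in three waves each; person 3 appears once.
rows = [(1, 1), (1, 1), (1, 1), (2, 0), (2, 0), (2, 0), (3, 1)]

def mean_with_se(rows):
    """Return the sample mean with a naive and a pidp-clustered standard error."""
    n = len(rows)
    mean = sum(y for _, y in rows) / n
    resid = [(pid, y - mean) for pid, y in rows]

    # Naive variance: treats all n residuals as independent.
    naive_var = sum(e * e for _, e in resid) / n**2

    # Cluster-robust variance: sum residuals within each pidp first, then square.
    cluster_sums = defaultdict(float)
    for pid, e in resid:
        cluster_sums[pid] += e
    cluster_var = sum(s * s for s in cluster_sums.values()) / n**2

    return mean, math.sqrt(naive_var), math.sqrt(cluster_var)

mean, naive_se, cluster_se = mean_with_se(rows)
```

Because each person's outcome repeats across waves in this toy data, the clustered standard error comes out larger than the naive one. The seven rows carry less independent information than seven unrelated observations would.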
Updated by OLAYIWOLA OLADIRAN almost 6 years ago
Thank you very much.
Updated by Understanding Society User Support Team over 2 years ago
- Status changed from Feedback to Resolved
- Priority changed from Immediate to Low
- % Done changed from 90 to 100
Updated by Understanding Society User Support Team over 1 year ago
- Category changed from Data analysis to Weights