David,

You are right that the wave 3 cross-sectional weight is a combined BHPS + GPS + EMB weight, so applying it to the BHPS sample alone will give biased results.

There are two options for you:

1. Simple but crude: use the BHPS longitudinal weight for wave 3. The longitudinal weight also represents the cross-sectional population; apart from small differences due to recent immigrants, longitudinal and cross-sectional weights should give similar results.

2. Use the cross-sectional weight, but rescale it (and the weights for all other waves) so that each wave has the same weighted sample size. The explanation is below.

The aim of pooling data from different waves is often to represent events over a period (e.g. the number of events in the last 20 years). This works fine if each wave has the same number of people. As you know, even the BHPS does not have the same number of people in every wave (there is a boost in 1999, for example). Each single wave, once weighted with its cross-sectional weight, represents the population in that year. However, waves with a larger number of people will contribute more to your estimates than waves with fewer people. So even before wave 3, if one uses pooled BHPS data to study events in GB over the last 20 years, the years before 1999 would be underrepresented, and events after 1999 would therefore make a larger contribution to your estimate.
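Here is a minimal Python sketch with made-up numbers that makes the imbalance concrete. The wave labels, sizes and event rates are hypothetical, not BHPS figures. The pooled weighted mean equals the wave-specific weighted means averaged with weights proportional to each wave's weighted sample size. A larger wave therefore pulls the pooled estimate towards its own value.

```python
def pooled_weighted_mean(waves):
    """Weighted mean over the pooled data.

    `waves` maps a wave label to (weighted_sample_size, weighted_mean).
    Each wave contributes in proportion to its weighted sample size.
    """
    total = sum(size for size, _ in waves.values())
    return sum(size * mean for size, mean in waves.values()) / total


def equal_year_mean(waves):
    """Simple average of the wave means: every year gets the same importance."""
    return sum(mean for _, mean in waves.values()) / len(waves)


# Hypothetical figures: a smaller pre-boost wave and a larger post-boost wave.
waves = {"before boost": (1000, 0.10), "after boost": (2000, 0.20)}
print(round(pooled_weighted_mean(waves), 4))  # 0.1667: pulled towards the larger wave
print(round(equal_year_mean(waves), 4))       # 0.15
```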

It is easy to correct for this:

1) calculate the weighted sample size for each wave (the sum of the weight variable gives you this; note that the weight variable should have a mean of one);
2) take the average of the weighted sample sizes across the waves you use;
3) divide this average by the weighted sample size of each wave to get that wave's scaling factor;
4) multiply each cross-sectional weight by the scaling factor for its wave, and use this product as the new weight for the pooled data.

This ensures that each wave has the same weighted sample size, and therefore each year has the same importance in your estimate. For example, suppose one wave has a weighted sample size of 1000 and another has 2000. The average is 1500, so the scaling factor is 1500/1000 = 1.5 for the first wave and 1500/2000 = 0.75 for the second. The new weighted sample size (you could check this) is then the same in both waves: 1000 × 1.5 = 2000 × 0.75 = 1500.
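The four steps could be sketched in Python roughly as follows. This is an illustration only. It assumes the pooled data are available as (wave, cross-sectional weight) pairs, one per respondent per wave. The function name `pooled_weights` and this input layout are hypothetical and are not taken from any BHPS or Understanding Society file.

```python
from collections import defaultdict


def pooled_weights(records):
    """Rescale cross-sectional weights so every wave has the same weighted size.

    `records` is a list of (wave, weight) pairs, one per respondent per wave.
    Returns (new_weights, factors): new weights in the same order as `records`,
    and the scaling factor used for each wave.
    """
    # Step 1: weighted sample size of each wave = sum of its weights.
    totals = defaultdict(float)
    for wave, weight in records:
        totals[wave] += weight
    # Step 2: average weighted sample size across the waves used.
    average = sum(totals.values()) / len(totals)
    # Step 3: scaling factor = average / the wave's own weighted sample size.
    factors = {wave: average / total for wave, total in totals.items()}
    # Step 4: new weight = cross-sectional weight * its wave's scaling factor.
    new_weights = [weight * factors[wave] for wave, weight in records]
    return new_weights, factors


# The example above: 1000 respondents in wave 1, 2000 in wave 2, all weights 1.
records = [(1, 1.0)] * 1000 + [(2, 1.0)] * 2000
new_weights, factors = pooled_weights(records)
print(factors)  # {1: 1.5, 2: 0.75}
```

With real data the weights will vary between respondents. Each wave's weights are multiplied by a single factor, so the relative weighting within a wave is unchanged; only the balance between waves shifts.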

Treat the new BHPS + GPS + EMB sample in the same way: the scaling factor will be small (below 1) for this wave, and the scaling factors for the BHPS-only waves will be above 1. After this correction, your analysis will have higher precision than if you did not use the GPS and EMB data, and it will correctly and evenly represent all years. Finally, this method also corrects for differences in sample size due to non-response, so it should be used with pooled data even when there are no sample boosts.
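A second sketch uses invented weighted sample sizes and outcome means, not real BHPS or Understanding Society figures. It shows what happens when one wave is much larger because it combines BHPS, GPS and EMB. Its scaling factor falls below 1 and the BHPS-only waves get factors above 1. The pooled estimate then becomes the simple average of the wave means, so each year counts equally.

```python
def scaling_factors(weighted_sizes):
    """Scaling factor per wave: average weighted size / the wave's weighted size."""
    average = sum(weighted_sizes.values()) / len(weighted_sizes)
    return {wave: average / size for wave, size in weighted_sizes.items()}


# Hypothetical weighted sample sizes and outcome means for three waves.
sizes = {
    "BHPS-only wave A": 8000,
    "BHPS-only wave B": 7000,
    "BHPS + GPS + EMB wave": 45000,
}
means = {
    "BHPS-only wave A": 0.10,
    "BHPS-only wave B": 0.12,
    "BHPS + GPS + EMB wave": 0.20,
}

factors = scaling_factors(sizes)
for wave, factor in factors.items():
    print(f"{wave}: {factor:.3f}")  # 2.500, 2.857, 0.444

# After rescaling, every wave has the same weighted size, so the pooled mean
# equals the simple average of the wave means.
scaled = {wave: sizes[wave] * factors[wave] for wave in sizes}
pooled = sum(scaled[wave] * means[wave] for wave in sizes) / sum(scaled.values())
simple = sum(means.values()) / len(means)
print(round(pooled, 4), round(simple, 4))  # 0.14 0.14
```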

Hope this helps,

Olena