BCS70 response and missingness

BCS70 response

The table below presents statistics about response in BCS70 at every major sweep from birth.

Of the 16,589 cohort members who participated in the first sweep, 2,763 (16.7%) have participated in all 11 major sweeps.

Of all 18,038 cohort members, 11,300 (62.6%) have taken part in at least half of the 11 major sweeps (that is, six or more).

Table: Participation in BCS70 from birth to 51 years

Sweep           Total cohort   Dead    Emigrants   Eligible sample   Participants   % of eligible sample
Birth – 1970    16,606         0       0           16,606            16,589         99.9
Age 5 – 1975    16,959a        567     0           16,392            13,135         80.1
Age 10 – 1980   17,799a        589     0           17,210            14,870         86.4
Age 16 – 1986   18,038a        622     0           17,416            11,615         66.7
Age 26 – 1996   18,038         716     35          17,287            9,003          52.1
Age 30 – 2000   18,038         766     235         17,037            11,261         66.1
Age 34 – 2004   18,038         819     432         16,787            9,665          57.6
Age 38 – 2008   18,038         881     456         16,701            8,874          53.1
Age 42 – 2012   18,038         965     433         16,640            9,841          59.1
Age 46 – 2016   18,038         985     466         16,587            8,581          51.7
Age 51 – 2021   18,038         1,047   242         16,749            8,016          47.9

a The original sample was supplemented by migrants born in 1970.
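
The figures in the table are linked by simple arithmetic: the eligible sample is the total cohort minus cohort members who have died or emigrated, and the response percentage is participants divided by the eligible sample. The short Python sketch below reproduces the published percentages from the published counts. The function and variable names are illustrative only and are not part of any CLS software.

```python
# Counts copied from the participation table:
# (total cohort, dead, emigrants, participants)
SWEEPS = {
    "Birth 1970": (16606, 0, 0, 16589),
    "Age 5 1975": (16959, 567, 0, 13135),
    "Age 10 1980": (17799, 589, 0, 14870),
    "Age 16 1986": (18038, 622, 0, 11615),
    "Age 26 1996": (18038, 716, 35, 9003),
    "Age 30 2000": (18038, 766, 235, 11261),
    "Age 34 2004": (18038, 819, 432, 9665),
    "Age 38 2008": (18038, 881, 456, 8874),
    "Age 42 2012": (18038, 965, 433, 9841),
    "Age 46 2016": (18038, 985, 466, 8581),
    "Age 51 2021": (18038, 1047, 242, 8016),
}


def eligible_sample(total, dead, emigrants):
    """Cohort members who could have been surveyed at a sweep."""
    return total - dead - emigrants


def response_rate(total, dead, emigrants, participants):
    """Participants as a percentage of the eligible sample, to 1 decimal place."""
    return round(100 * participants / eligible_sample(total, dead, emigrants), 1)


rates = {sweep: response_rate(*counts) for sweep, counts in SWEEPS.items()}

for sweep, rate in rates.items():
    print(f"{sweep}: {rate}%")
```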

Figure 1: BCS70 response (as % of eligible sample) over time

Missingness in BCS70

BCS70 variables

We have used a systematic data-driven approach to identify variables that are important predictors of non-response at each BCS70 sweep between age 5 and age 46. Future work will extend this analysis to include non-response at the age 51 sweep.

A list of these variables is provided in the appendix of the Handling missing data in the CLS cohort studies user guide.

These variables can then be considered for inclusion in analyses, for example as auxiliary variables in multiple imputation, in order to maximise the plausibility of the missing at random (MAR) assumption.
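
To illustrate why auxiliary variables matter, here is a minimal sketch in Python using simulated data only. It is not the CLS analysis and does not use BCS70 variables: the variable z stands in for a hypothetical early-life predictor of non-response, y for a later outcome, and the response probabilities are invented. Because response depends only on z, y is missing at random given z. The sketch compares a complete-case mean with a multiply imputed mean that uses z, pooling across imputations with Rubin's rules. Parameter uncertainty is approximated by bootstrapping respondents, which is a simplification of what dedicated imputation software does.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b, residual_sd)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    resid = [y - (a + b * x) for x, y in zip(xs, ys)]
    sd = (sum(r * r for r in resid) / (len(xs) - 2)) ** 0.5
    return a, b, sd


def simulate(n, rng):
    """Toy cohort: z is an auxiliary variable, y a later outcome (both invented)."""
    rows = []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        y = 2.0 * z + rng.gauss(0.0, 1.0)
        # Response depends only on z, so y is MAR given z.
        responded = rng.random() < (0.9 if z > 0 else 0.3)
        rows.append((z, y, responded))
    return rows


def multiply_impute_mean(rows, m, rng):
    """Estimate the mean of y from m imputed datasets, pooled with Rubin's rules."""
    observed = [(z, y) for z, y, r in rows if r]
    estimates, variances = [], []
    for _ in range(m):
        # Bootstrap respondents so each imputation reflects parameter uncertainty.
        boot = rng.choices(observed, k=len(observed))
        a, b, sd = fit_line([z for z, _ in boot], [y for _, y in boot])
        # Keep observed y; draw missing y from the fitted model plus noise.
        completed = [y if r else a + b * z + rng.gauss(0.0, sd) for z, y, r in rows]
        estimates.append(statistics.fmean(completed))
        variances.append(statistics.variance(completed) / len(completed))
    pooled = statistics.fmean(estimates)
    within = statistics.fmean(variances)
    between = statistics.variance(estimates)
    total_var = within + (1 + 1 / m) * between
    return pooled, total_var ** 0.5


rng = random.Random(1970)
rows = simulate(5000, rng)

# In the simulation we know every y, so the full-sample mean is the benchmark.
full_mean = statistics.fmean(y for _, y, _ in rows)
cc_mean = statistics.fmean(y for _, y, r in rows if r)
mi_mean, mi_se = multiply_impute_mean(rows, m=20, rng=rng)

print(f"full sample mean:     {full_mean:.3f}")
print(f"complete-case mean:   {cc_mean:.3f}")
print(f"multiply imputed mean: {mi_mean:.3f} (SE {mi_se:.3f})")
```

In this toy setting the complete-case mean is pulled upwards because respondents tend to have higher z, while the imputed mean lies much closer to the full-sample benchmark. With real data the benchmark is unknown, which is why selecting good predictors of non-response matters.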

Read more about the process in the preprint: A data-driven approach to address missing data in the 1970 British birth cohort.

Example: restoring the composition of BCS70 at age 46

We have been able to restore the composition of the BCS70 sample at age 46 to be more representative of the study’s target population. We did this by including predictors of non-response at age 46 as auxiliary variables in multiple imputation analyses.

For example, we were able to replicate:

  • the original distribution of paternal social class observed at the birth survey
  • the distribution of cognitive ability at age 5

Figure 2: Social class of mother’s husband at birth before and after adjustment for missing data

In the imputation phase of this analysis, we included the predictors of non-response at age 46, and used social class at birth only for cohort members who participated at age 46.

Figure 3: Mean cognitive ability at age 5 before and after adjustment for missing data

In the imputation phase of this analysis, we included the predictors of non-response at age 46, and used cognitive ability at age 5 only for cohort members who participated at age 46.

Useful documents

Katsoulis, M., Narayanan, M., Dodgeon, B., Ploubidis, G., & Silverwood, R. (2024)
A data driven approach to address missing data in the 1970 British birth cohort
medRxiv
Read the full paper

Handling missing data

Find out more about the approaches that CLS recommends for the handling of missing data, and resources available, on the Handling missing data page.

Contact us

Centre for Longitudinal Studies
UCL Social Research Institute

20 Bedford Way
London WC1H 0AL

Email: clsdata@ucl.ac.uk
