Applied statistical methods

Background

Our applied statistical methods research programme helps users tackle some of the important challenges in using longitudinal data, including handling missing data, making causal inferences, and dealing with measurement error.

We bring together ideas and methods from a number of disciplines, such as statistics, econometrics, psychometrics, epidemiology and computer science.

We publish applied methodological papers in peer-reviewed journals and are developing a series of step-by-step user guides and training to help users apply these methods in their own research, using widely available software such as Stata and R.

Here you can find out more about our work on applied statistical methods.

Our missing data strategy

All users of longitudinal data need to handle the issue of missing data in their research since some non-response is inevitable. Strategies for how to deal with missing data depend on the nature of non-response.

Find out more in our recent webinar:

Missing data webinar 2023

Our methods

We know different types of people tend to drop out of our studies at different times, depending on their individual circumstances and characteristics. We can take account of that in the methods we use.

We have developed approaches to ‘missingness’ that capitalise on the rich data cohort members provided over the years before they left the study, in order to reduce bias.

We use well-known methods such as multiple imputation, inverse probability weighting and full information maximum likelihood.

These rely on the assumption that the data are missing at random (MAR), implying that systematic differences between the missing values and the observed values can be explained by observed data. Most studies employing MAR methods rely on a largely arbitrary selection of variables used as predictors of ‘missingness’, and the extent to which the plausibility of the MAR assumption is maximised for a given dataset is not known.
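To illustrate why the MAR assumption matters, here is a minimal sketch of inverse probability weighting on simulated data. It is not the CLS implementation: the variable names, the response probabilities (0.9 and 0.4) and the outcome model are all hypothetical, chosen only so that response depends on a fully observed characteristic `x`. Under that assumption, the complete-case mean is biased, while weighting each respondent by the inverse of their estimated response probability should bring the estimate close to the full-cohort mean.

```python
import random


def simulate_cohort(n=20000, seed=0):
    """Simulate (x, y, responded) rows in which response depends only on x (MAR)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = 1 if rng.random() < 0.5 else 0        # characteristic observed for everyone
        y = 10.0 + 5.0 * x + rng.gauss(0.0, 1.0)  # outcome measured at a later sweep
        p_respond = 0.9 if x == 1 else 0.4        # hypothetical response probabilities
        responded = rng.random() < p_respond
        rows.append((x, y, responded))
    return rows


def complete_case_mean(rows):
    """Mean of y among respondents only, ignoring non-response."""
    ys = [y for _, y, responded in rows if responded]
    return sum(ys) / len(ys)


def ipw_mean(rows):
    """Mean of y among respondents, weighted by 1 / estimated response probability."""
    totals = {0: 0, 1: 0}
    responders = {0: 0, 1: 0}
    for x, _, responded in rows:
        totals[x] += 1
        responders[x] += int(responded)
    # Estimate the response probability within each level of x.
    p_hat = {x: responders[x] / totals[x] for x in totals}
    weighted_sum = sum(y / p_hat[x] for x, y, responded in rows if responded)
    weight_total = sum(1.0 / p_hat[x] for x, _, responded in rows if responded)
    return weighted_sum / weight_total


rows = simulate_cohort()
full_mean = sum(y for _, y, _ in rows) / len(rows)  # unknowable in real data
print("full cohort:  ", round(full_mean, 2))
print("complete case:", round(complete_case_mean(rows), 2))
print("IPW:          ", round(ipw_mean(rows), 2))
```

In real data the response model would typically include many predictors of non-response rather than a single binary variable, and the weights only remove bias to the extent that those predictors make MAR plausible.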

Download the Handling missing data in the CLS cohort studies user guide for detailed guidance on how to handle missing data in your own research, including a detailed worked example.

Causal inference

Causal inference in observational data is far from straightforward. However, the wealth of information we have collected about cohort members across the whole of their lives, and even about the circumstances of their birth, gives data users the opportunity to select rich controls for multivariable adjustment.

There are also circumstances in which causal identification may be achieved with approaches such as instrumental variable modelling/Mendelian randomisation, regression discontinuity, and fixed effects/correlated random effects methods.

We are developing a programme of work on causal inference in our cohorts, which will use a range of methods. We will use directed acyclic graphs to represent assumptions about potential confounders, and apply techniques such as negative controls and simulations of unmeasured confounders to test the degree to which omitted variables might lead to bias.
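As a rough illustration of what simulating an unmeasured confounder can show, the sketch below generates an exposure and an outcome that share a hidden cause `u`, then fits an unadjusted regression. Everything here is an assumption made for the example (the linear data-generating model, the `confounder_strength` parameter, the true effect of 1.0); it is not the method used in the CLS programme. In this particular setup the unadjusted slope is biased upwards by s²/(s² + 1), where s is the confounder strength.

```python
import random


def ols_slope(x, y):
    """Slope from a simple least-squares regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def naive_slope(confounder_strength, true_effect=1.0, n=20000, seed=1):
    """Unadjusted exposure-outcome slope when a confounder u is left out."""
    rng = random.Random(seed)
    exposure, outcome = [], []
    for _ in range(n):
        u = rng.gauss(0.0, 1.0)  # unmeasured confounder
        a = confounder_strength * u + rng.gauss(0.0, 1.0)
        y = true_effect * a + confounder_strength * u + rng.gauss(0.0, 1.0)
        exposure.append(a)
        outcome.append(y)
    return ols_slope(exposure, outcome)


# Vary the strength of the omitted variable and watch the estimate drift
# away from the true effect of 1.0.
for strength in (0.0, 0.5, 1.0):
    print(strength, round(naive_slope(strength), 2))
```

Repeating this over a grid of plausible strengths gives a sense of how strong an omitted variable would need to be to explain away an observed association.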

Measurement error

Data from self-reported measures can be biased by processes driven by cohort members’ personalities and circumstances. Data from objective measures can also be affected by instrument error, for example limits on the precision of the blood pressure device used by the nurse, or laboratory variation in blood analysis. Additional sources of error arise when comparing data from multiple studies, as different groups can vary in how they interpret the same question and in their response tendencies.
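One well-known consequence of such errors is attenuation: random error in a predictor pulls its estimated association with an outcome towards zero. The sketch below is a simplified illustration under classical measurement error, not a latent variable model; the function name, the true slope of 2.0 and the error standard deviations are hypothetical. With a true score of unit variance, the expected slope is the true slope multiplied by the reliability 1 / (1 + error_sd²).

```python
import random


def observed_slope(error_sd, true_slope=2.0, n=20000, seed=2):
    """Slope of the outcome on an error-prone measure of the true score."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        true_score = rng.gauss(0.0, 1.0)
        measured = true_score + rng.gauss(0.0, error_sd)  # classical random error
        outcome = true_slope * true_score + rng.gauss(0.0, 1.0)
        xs.append(measured)
        ys.append(outcome)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


for error_sd in (0.0, 0.5, 1.0):
    reliability = 1.0 / (1.0 + error_sd ** 2)
    print(error_sd, round(observed_slope(error_sd), 2), "expected about", 2.0 * reliability)
```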

In our work on measurement error, we use the latest extensions of the generalised latent variable modelling framework to specify complex error structures. This allows us to investigate the measurement properties of key measures in our cohort data, including those of physical health, mental health and cognition, and to establish measures that are equivalent within and between cohorts.

Two ongoing projects funded by CLOSER investigate the measurement properties of mental health and cognitive ability measures in British birth cohorts.

Research projects and outputs

Data documentation

Handling missing data in the CLS cohort studies - User Guide

This user guide aims to describe and illustrate a straightforward approach to missing data handling, while detailing some more general considerations around missing data along the way.

Download

Publication

Improving the plausibility of the missing at random assumption in the 1958 British birth cohort: A pragmatic data driven approach – CLS working paper 2020/6

21 February 2019

This paper presents a systematic data-driven approach to identify predictors of non-response at each sweep of the 1958 National Child Development Study (NCDS) and demonstrates…

Download

Publication

A data driven approach to understanding and handling non-response in the Next Steps cohort – CLS working paper 2020/5

21 February 2019

This paper presents a systematic data-driven approach to identify predictors of non-response at wave 8 (age 25-26 years) in Next Steps and demonstrates that including…

Download

Large longitudinal studies: design and methodology

21 February 2019

George Ploubidis, Research Director and Chief Statistician at CLS, presented at this event at the Royal Statistical Society.

The CLS missing data strategy

21 February 2019

Here you can watch a recording of a presentation on the CLS missing data strategy, given as part of the CLOSER Longitudinal Methodology Series.

Harmonisation of mental health measures in British birth cohorts

21 February 2019

This research project aims to harmonise existing mental health measures over the life course in five British birth cohorts.

Assessment and harmonisation of cognitive measures in British birth cohorts

21 February 2019

This research project will help researchers to more accurately explore the links between cognitive ability, social background and education.

Featured scientific publications

Mostafa, T., Narayanan, M., Pongiglione, B., Dodgeon, B., Goodman, A., Silverwood, R.J., & Ploubidis, G.B. (2021).

Missing at random assumption made more plausible: evidence from the 1958 British birth cohort

Journal of Clinical Epidemiology

Read the full paper

Ploubidis, G. B., Sullivan, A., Brown, M., & Goodman, A. (2017).

Psychological distress in mid-life: evidence from the 1958 and 1970 British birth cohorts.

Psychological Medicine, 47(2), 291-303.

Read the full paper

Wiggins, R., Brown, M., & Ploubidis, G.B. (2017).

A measurement evaluation of a six item measure of quality of life (CASP6) across different modes of data collection in the 1958 National Child Development Survey (NCDS) Age 55 years.

CLS Working Paper Series. London.

Read the full paper

Contact us

Centre for Longitudinal Studies
UCL Social Research Institute

20 Bedford Way
London WC1H 0AL

Email: clsdata@ucl.ac.uk
