Applied statistical methods

Background

Our applied statistical methods research programme supports and enables users to tackle some of the important challenges in using longitudinal data, including handling missing data, making causal inferences, and dealing with measurement error. We bring together ideas and methods from a number of disciplines, such as statistics, econometrics, psychometrics, epidemiology and computer science.

We publish applied methodological papers in peer-reviewed journals and are developing a series of step-by-step user guides and training to help users apply these methods in their own research, using widely available software such as Stata and R.

Here you can find out more about our work on applied statistical methods.

Our missing data strategy

All users of longitudinal data need to handle the issue of missing data in their research since some non-response is inevitable. Strategies for how to deal with missing data depend on the nature of non-response.

We know that different types of people tend to drop out of our studies at different times, depending on their individual circumstances and characteristics, and we take account of this in the methods we use. Our approaches to ‘missingness’ capitalise on the rich data that cohort members provided over the years before they left the study, and use this information to reduce bias.

We use well-known methods such as multiple imputation, inverse probability weighting and full information maximum likelihood. These rely on the assumption that the data are missing at random (MAR), which implies that systematic differences between the missing values and the observed values can be explained by observed data. Most studies employing MAR methods rely on a largely arbitrary selection of the variables used as predictors of ‘missingness’, so it is not known how far the plausibility of the MAR assumption has been maximised for a given dataset. We have implemented a systematic data-driven approach to predict non-response and are currently developing a user guide which will provide detailed guidance on how to adopt this approach in your own research.
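
To illustrate why the MAR assumption matters, here is a minimal sketch of inverse probability weighting on simulated data. It is not the CLS approach itself: the cohort, the baseline characteristic `x`, the outcome `y` and the response rates are all hypothetical values chosen for illustration, and the code uses only the Python standard library.

```python
import random

random.seed(42)

# Hypothetical cohort: x is a baseline characteristic observed for everyone,
# y is a later outcome observed only for cohort members who respond.
n = 20000
cohort = []
for _ in range(n):
    x = 1 if random.random() < 0.5 else 0
    y = 10.0 + 5.0 * x + random.gauss(0, 2)
    # MAR: the chance of responding depends only on the observed x (assumed rates).
    p_respond = 0.9 if x == 1 else 0.4
    responded = random.random() < p_respond
    cohort.append((x, y, responded))

# Mean of y in the full cohort (unknowable in practice, known here by design).
true_mean = sum(y for _, y, _ in cohort) / n

# Complete-case analysis: simply drop the non-respondents.
respondents = [(x, y) for x, y, r in cohort if r]
complete_case_mean = sum(y for _, y in respondents) / len(respondents)

# Estimate the response probability within each level of x.
resp_rate = {}
for level in (0, 1):
    group = [r for x, _, r in cohort if x == level]
    resp_rate[level] = sum(group) / len(group)

# Weight each respondent by the inverse of their estimated response probability.
weights = [1.0 / resp_rate[x] for x, _ in respondents]
ipw_mean = sum(w * y for w, (_, y) in zip(weights, respondents)) / sum(weights)

print(f"full cohort mean:   {true_mean:.2f}")
print(f"complete-case mean: {complete_case_mean:.2f}")
print(f"IPW mean:           {ipw_mean:.2f}")
```

In this simulation the complete-case mean is pulled towards the group that responds more often, while the weighted mean sits close to the full cohort mean. The correction only works because the variable driving non-response is observed and used to build the weights, which is why the choice of predictors of ‘missingness’ matters.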

Causal inference

Causal inference in observational data is far from straightforward. However, the wealth of information we have collected from cohort members across the whole of their lives, and even about the circumstances of their birth, gives data users the opportunity to select rich controls for multivariable adjustment.

There are also circumstances in which causal identification may be achieved with approaches such as instrumental variable modelling/Mendelian randomisation, regression discontinuity, and fixed effects/correlated random effects methods.

We are developing a programme of work on causal inference in our cohorts, which will use a range of methods. We will use directed acyclic graphs to represent assumptions about potential confounders, and apply techniques such as negative controls and simulations of unmeasured confounders to test the degree to which omitted variables might lead to bias.
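
As a rough illustration of simulating an unmeasured confounder, the sketch below generates an exposure and an outcome that share a hidden common cause, then measures how far an unadjusted regression slope drifts from the true effect. The function names, the linear data-generating model and the parameter values are assumptions made for this example, not part of the CLS programme of work.

```python
import random


def ols_slope(xs, ys):
    """Slope from a simple least-squares regression of ys on xs."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def simulate_bias(confounder_strength, true_effect=1.0, n=20000, seed=1):
    """Bias in the unadjusted slope when a confounder u is left out.

    u affects both the exposure x and the outcome y with the same
    (hypothetical) strength; the model never sees u.
    """
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        u = rng.gauss(0, 1)  # unmeasured confounder
        x = confounder_strength * u + rng.gauss(0, 1)
        y = true_effect * x + confounder_strength * u + rng.gauss(0, 1)
        xs.append(x)
        ys.append(y)
    return ols_slope(xs, ys) - true_effect


for strength in (0.0, 0.5, 1.0):
    print(f"confounder strength {strength}: bias {simulate_bias(strength):+.3f}")
```

Under this particular model the expected bias is c² / (c² + 1) for confounder strength c, so it is zero when the omitted variable has no effect and grows as the omitted variable becomes more strongly related to both exposure and outcome. Varying the strength in this way gives a sense of how strong an omitted variable would need to be to change a conclusion.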

Measurement error

Data from self-reported measures can be biased due to processes driven by cohort members’ personalities and circumstances. Data from objective measures may be affected by instrument error too, for example the precision of the blood pressure device used by the nurse or laboratory variations in blood analysis. Additional sources of error arise when comparing data from multiple studies, as there can be variation in how different groups interpret the same question and in response tendencies.
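
One consequence of measurement error can be shown with a short simulation. The sketch below assumes classical (random, independent) error in a predictor, which is only one of the error structures described above, and all names and parameter values are hypothetical. Under that assumption the estimated regression slope is attenuated by the reliability of the measure, that is, the share of observed variance that is true variance.

```python
import random


def ols_slope(xs, ys):
    """Slope from a simple least-squares regression of ys on xs."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


rng = random.Random(7)
n = 20000
true_slope = 2.0  # assumed effect of the true predictor on the outcome
error_sd = 1.0    # assumed standard deviation of the measurement error

# True predictor values (variance 1) and an outcome that depends on them.
true_x = [rng.gauss(0, 1) for _ in range(n)]
y = [true_slope * t + rng.gauss(0, 1) for t in true_x]

# What is actually recorded: the true value plus random instrument error.
observed_x = [t + rng.gauss(0, error_sd) for t in true_x]

slope_true_x = ols_slope(true_x, y)
slope_observed_x = ols_slope(observed_x, y)

# Reliability = true variance / (true variance + error variance).
reliability = 1.0 / (1.0 + error_sd ** 2)

print(f"slope using true values:     {slope_true_x:.2f}")
print(f"slope using observed values: {slope_observed_x:.2f}")
print(f"true slope x reliability:    {true_slope * reliability:.2f}")
```

With these settings half of the observed variance is error, so the slope estimated from the observed values is roughly half the true slope. Systematic errors, such as those linked to personality or to differences between groups, do not follow this simple pattern and need explicit modelling of the error structure.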

In our work on measurement error, we use the latest extensions of the generalised latent variable modelling framework to specify complex error structures. This allows us to investigate the measurement properties of key areas of our cohort data, including physical health, mental health and cognition, and to establish measures that are equivalent within and between cohorts.

Two ongoing projects funded by CLOSER investigate the measurement properties of mental health and cognitive ability measures in British birth cohorts.

Research projects and outputs

Large longitudinal studies: design and methodology

George Ploubidis, Research Director and Chief Statistician at CLS, presented at this event at the Royal Statistical Society.

The CLS missing data strategy

Here you can watch a recording of a presentation on the CLS missing data strategy, given as part of the CLOSER Longitudinal Methodology Series.

Harmonisation of mental health measures in British birth cohorts

This research project aims to harmonise existing mental health measures over the life course in five British birth cohorts.

Assessment and harmonisation of cognitive measures in British birth cohorts

This research project will help researchers to more accurately explore the links between cognitive ability, social background and education.

Featured scientific publications

Ploubidis, G. B., Sullivan, A., Brown, M., & Goodman, A. (2017).
Psychological distress in mid-life: evidence from the 1958 and 1970 British birth cohorts.
Psychological Medicine, 47(2), 291-303.

Wiggins, R., Brown, M., & Ploubidis, G. B. (2017).
A measurement evaluation of a six item measure of quality of life (CASP6) across different modes of data collection in the 1958 National Child Development Survey (NCDS) Age 55 years.
CLS Working Paper Series. London.

Contact us

Centre for Longitudinal Studies
UCL Institute of Education

20 Bedford Way
London WC1H 0AL

Email: clsfeedback@ucl.ac.uk