Applied statistical methods

Background

CLS’ applied statistical methods research programme supports and enables users to tackle some of the important challenges in using longitudinal data, including:

  • missing data
  • causal inference
  • measurement error and
  • survey mode effects.

We bring together ideas and methods from a number of disciplines, such as statistics, econometrics, psychometrics, epidemiology and computer science.

We publish applied methodological papers in peer-reviewed journals and are developing a series of step-by-step user guides and training to help users apply these methods in their own research, using widely available software such as Stata and R.

Here you can find out more about our work on applied statistical methods.

Handling missing data

All users of longitudinal data need to handle the issue of missing data in their research since some non-response is inevitable. Strategies for how to deal with missing data depend on the nature of non-response.

Find out more in our recent webinar:

Missing data webinar 2023

Our methods

We know different types of people tend to drop out of our studies at different times, depending on their individual circumstances and characteristics. We can take account of that in the methods we use.

We have developed approaches to ‘missingness’ that capitalise on the rich data cohort members provided in the years before they left the study, in order to reduce bias.

We use well-known methods such as multiple imputation, inverse probability weighting and full information maximum likelihood.

These rely on the assumption that the data are missing at random (MAR), implying that systematic differences between the missing values and the observed values can be explained by observed data. However, most studies employing MAR methods rely on a largely arbitrary selection of variables as predictors of ‘missingness’, so it is not known how far the plausibility of the MAR assumption has been maximised for a given dataset.
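As a minimal illustration of the MAR idea, the Python sketch below simulates an outcome whose missingness depends only on an observed binary characteristic, then compares a complete-case mean with an inverse-probability-weighted mean. Everything in it is an assumption made for illustration (the variable names, the response probabilities, the sample size and the use of Python); it is not the worked example from the CLS user guide.

```python
import random

random.seed(2024)
N = 20000  # hypothetical sample size

# Simulate an observed binary characteristic and an outcome that depends on it.
records = []
for _ in range(N):
    group = 1 if random.random() < 0.5 else 0
    outcome = 10 + 5 * group + random.gauss(0, 1)
    # MAR by construction: the chance of responding depends only on the observed group.
    p_respond = 0.4 if group == 1 else 0.9
    responded = random.random() < p_respond
    records.append((group, outcome, responded))

# Benchmark: the mean we would see with no missing data.
full_mean = sum(y for _, y, _ in records) / N

# Complete-case analysis: use respondents only, ignoring who dropped out.
respondents = [(g, y) for g, y, r in records if r]
cc_mean = sum(y for _, y in respondents) / len(respondents)

# Estimate the response probability within each observed group.
p_hat = {}
for g in (0, 1):
    n_group = sum(1 for gg, _, _ in records if gg == g)
    n_resp = sum(1 for gg, _ in respondents if gg == g)
    p_hat[g] = n_resp / n_group

# Weight each respondent by the inverse of their estimated response probability.
weights = [1 / p_hat[g] for g, _ in respondents]
ipw_mean = sum(w * y for w, (_, y) in zip(weights, respondents)) / sum(weights)

print(f"full-data mean:     {full_mean:.3f}")
print(f"complete-case mean: {cc_mean:.3f}")
print(f"IPW mean:           {ipw_mean:.3f}")
```

Because the simulated missingness is fully explained by the observed group, weighting recovers the full-data mean much more closely than the complete-case analysis does. In real data the predictors of ‘missingness’ must be chosen, which is where the plausibility of the MAR assumption becomes the central question.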

Download the Handling missing data in the CLS cohort studies user guide for detailed guidance on how to handle missing data in your own research, including a detailed worked example.

Causal inference

Causal inference in observational data is far from straightforward. However, the wealth of information we have collected on cohort members across the whole of their lives, and even on the circumstances of their birth, gives data users the opportunity to select rich controls for multivariable adjustment.

There are also circumstances in which causal identification may be achieved with approaches such as instrumental variable modelling/Mendelian randomisation, regression discontinuity, and fixed effects/correlated random effects methods.

We are developing a programme of work on causal inference in our cohorts, which will use a range of methods. We will use directed acyclic graphs to represent assumptions about potential confounders, and apply techniques such as negative controls and simulations of unmeasured confounders to test the degree to which omitted variables might lead to bias.
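To make the idea of simulating an unmeasured confounder concrete, here is a minimal Python sketch under stated assumptions. It generates a confounder that affects both exposure and outcome, then compares the exposure estimate when the confounder is omitted with the estimate when it is adjusted for. The effect sizes, sample size and helper functions (`slope`, `residuals`) are hypothetical choices for illustration, not part of the CLS programme of work.

```python
import random

random.seed(1958)
N = 5000
TRUE_EFFECT = 1.0        # assumed causal effect of exposure on outcome
CONFOUNDER_EFFECT = 2.0  # assumed effect of the confounder on outcome


def mean(values):
    return sum(values) / len(values)


def slope(x, y):
    """Ordinary least squares slope from regressing y on a single predictor x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def residuals(y, x):
    """Residuals of y after a simple linear regression on x."""
    b = slope(x, y)
    a = mean(y) - b * mean(x)
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


# The confounder drives both the exposure and the outcome.
confounder = [random.gauss(0, 1) for _ in range(N)]
exposure = [u + random.gauss(0, 1) for u in confounder]
outcome = [
    TRUE_EFFECT * x + CONFOUNDER_EFFECT * u + random.gauss(0, 1)
    for x, u in zip(exposure, confounder)
]

# Naive estimate: the confounder is treated as unmeasured and omitted.
naive_estimate = slope(exposure, outcome)

# Adjusted estimate: partial out the confounder from exposure and outcome
# (Frisch-Waugh-Lovell), which matches including it as a covariate.
adjusted_estimate = slope(
    residuals(exposure, confounder), residuals(outcome, confounder)
)

print(f"true effect:       {TRUE_EFFECT:.2f}")
print(f"naive estimate:    {naive_estimate:.2f}")
print(f"adjusted estimate: {adjusted_estimate:.2f}")
```

Varying `CONFOUNDER_EFFECT` and the strength of the confounder's link to the exposure shows how strong an omitted variable would need to be to change a conclusion.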

Measurement error

Data from self-reported measures can be biased due to processes driven by cohort members’ personalities and circumstances. Data from objective measures may also be affected by instrument error, for example the precision of the blood pressure device used by the nurse or laboratory variations in blood analysis. Additional sources of error arise when comparing data from multiple studies as there can be variation in how different groups interpret the same question and in response tendencies.
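One well-known consequence of random measurement error in an exposure is attenuation of its estimated association with an outcome. The Python sketch below illustrates this with simulated data under the classical error model (observed value = true value + independent noise). The slope, error variance and variable names are assumptions chosen for illustration, and the sketch is far simpler than the latent variable models described in the next paragraph.

```python
import random

random.seed(1970)
N = 5000
TRUE_SLOPE = 0.8  # assumed association between the true exposure and the outcome
ERROR_SD = 1.0    # assumed standard deviation of the measurement error


def slope(x, y):
    """Ordinary least squares slope from regressing y on a single predictor x."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


# True exposure (variance 1) and an error-prone measurement of it.
true_x = [random.gauss(0, 1) for _ in range(N)]
observed_x = [x + random.gauss(0, ERROR_SD) for x in true_x]
y = [TRUE_SLOPE * x + random.gauss(0, 0.5) for x in true_x]

slope_true = slope(true_x, y)          # what we would estimate without error
slope_observed = slope(observed_x, y)  # what we estimate from the noisy measure

# Under the classical error model the slope shrinks by the reliability ratio:
# var(true) / (var(true) + var(error)).
reliability = 1 / (1 + ERROR_SD ** 2)

print(f"slope using true exposure:     {slope_true:.3f}")
print(f"slope using observed exposure: {slope_observed:.3f}")
print(f"expected attenuated slope:     {TRUE_SLOPE * reliability:.3f}")
```

Systematic errors, such as those linked to personality or to differences between groups in interpreting a question, do not follow this simple pattern and need more flexible error structures.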

In our work on measurement error, we use the latest extensions of the generalised latent variable modelling framework to specify complex error structures. This allows us to investigate the measurement properties of key areas of our cohort data, including physical health, mental health and cognition, and to establish measures that are equivalent within and between cohorts.

Two ongoing projects funded by CLOSER investigate the measurement properties of mental health and cognitive ability measures in British birth cohorts.

Survey mode effects

Each of CLS’ cohort studies contains elements of mixed mode data collection, in which interviews are carried out face-to-face, by telephone, by video and/or via web survey.

The potential advantages of mixed mode data collection are lower costs, increased efficiency, and higher participation rates.

However, participants’ responses may differ systematically between the survey modes used – this is termed “mode effects”. For instance, whether a survey item is presented aurally or visually can influence responses, and sensitive information may be reported more accurately when given anonymously.

If unaccounted for, mode effects may lead to bias in analyses.
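The Python sketch below shows how a mode effect can bias a simple estimate, and how a basic sensitivity analysis can explore the consequences. It simulates a sensitive item that is under-reported in face-to-face interviews, then recomputes the pooled mean under a range of assumed mode effects. The size of the mode effect, the share of web respondents, the grid of values and the function name `adjusted_mean` are all hypothetical; this is one simple form of sensitivity analysis and not necessarily the approach set out in the CLS user guide.

```python
import random

random.seed(7)
N = 10000
TRUE_MODE_EFFECT = -0.5  # assumed under-reporting in face-to-face interviews

true_scores, reported, modes = [], [], []
for _ in range(N):
    score = random.gauss(3.0, 1.0)
    mode = "web" if random.random() < 0.6 else "face_to_face"
    shift = TRUE_MODE_EFFECT if mode == "face_to_face" else 0.0
    true_scores.append(score)
    modes.append(mode)
    reported.append(score + shift)

true_mean = sum(true_scores) / N
naive_mean = sum(reported) / N  # ignores the mode effect entirely


def adjusted_mean(assumed_effect):
    """Pooled mean after removing an assumed additive mode effect
    from the face-to-face responses."""
    corrected = [
        r - assumed_effect if m == "face_to_face" else r
        for r, m in zip(reported, modes)
    ]
    return sum(corrected) / len(corrected)


# Sensitivity analysis: how does the estimate move across plausible mode effects?
sensitivity = {
    delta: adjusted_mean(delta) for delta in (-1.0, -0.75, -0.5, -0.25, 0.0)
}

print(f"true mean:  {true_mean:.3f}")
print(f"naive mean: {naive_mean:.3f}")
for delta, estimate in sensitivity.items():
    print(f"assumed mode effect {delta:+.2f} -> adjusted mean {estimate:.3f}")
```

In practice the true mode effect is unknown, so the value of the exercise lies in checking whether substantive conclusions hold across the range of mode effects considered plausible.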

User guide

To help data users work with mixed mode data, we have developed a comprehensive Handling Mode Effects user guide.

This guide:

  • provides frameworks and relevant empirical evidence to help researchers think about the possible consequences of mode effects in their own analyses
  • describes methods for handling mode effects, including their strengths and limitations
  • highlights sensitivity analysis as a particularly promising approach
  • provides walkthroughs for these methods with code in R and Stata
  • contains recommendations data users may want to follow in their own work.

Download the Handling Mode Effects user guide.

Research projects and outputs

User guide

Handling mode effects in the CLS cohort studies user guide (Nov 2024)

26 November 2024

A user guide which provides guidance and recommendations for handling mode effects in CLS’ cohort studies through applied examples, using data from the National Child…

User guide

Handling missing data in the CLS cohort studies - User Guide

1 May 2024

This user guide aims to describe and illustrate a straightforward approach to missing data handling, while detailing some more general considerations around missing data along…

Featured scientific publications

Silverwood, R.J., Calderwood, L., Henderson, M., Sakshaug, J.W. & Ploubidis, G.B. (2024)
A data-driven approach to understanding non-response and restoring sample representativeness in the UK Next Steps cohort
Longitudinal and Life Course Studies

Narayanan, M.K., Dodgeon, B., Katsoulis, M., Ploubidis, G.B. & Silverwood, R.J. (2024)
How to mitigate selection bias in COVID-19 surveys: evidence from five national cohorts
European Journal of Epidemiology

Katsoulis, M., Narayanan, M.K., Dodgeon, B., Ploubidis, G.B. & Silverwood, R.J. (2024)
A data driven approach to address missing data in the 1970 British birth cohort
medRxiv

Rajah, N., Calderwood, L., De Stavola, B.L., Harron, K., Ploubidis, G.B. & Silverwood, R.J. (2023)
Using linked administrative data to aid the handling of non-response and restore sample representativeness in cohort studies: the 1958 national child development study and hospital episode statistics data
BMC Medical Research Methodology

Goodman, A., Brown, M., Silverwood, R.J., Sakshaug, J.W., Calderwood, L., Williams, J. & Ploubidis, G.B. (2022)
The Impact of Using the Web in a Mixed-Mode Follow-up of a Longitudinal Birth Cohort Study: Evidence from the National Child Development Study
Journal of the Royal Statistical Society

Mostafa, T., Narayanan, M., Pongiglione, B., Dodgeon, B., Goodman, A., Silverwood, R.J. & Ploubidis, G.B. (2021)
Missing at random assumption made more plausible: evidence from the 1958 British birth cohort
Journal of Clinical Epidemiology

Contact us

Centre for Longitudinal Studies
UCL Social Research Institute

20 Bedford Way
London WC1H 0AL

Email: clsdata@ucl.ac.uk
