All users of longitudinal data need to address missing data in their research, since some non-response is inevitable. The appropriate strategy depends on the nature of the non-response.
We know that different types of people tend to drop out of our studies at different times, depending on their individual circumstances and characteristics, and we can take this into account in the methods we use. We have developed approaches to handling ‘missingness’ that capitalise on the rich data cohort members provided over the years, before they left the study, in order to reduce bias.
We use well-known methods such as multiple imputation, inverse probability weighting and full information maximum likelihood. These rely on the assumption that the data are missing at random (MAR), meaning that systematic differences between the missing values and the observed values can be explained by the observed data. Most studies employing MAR methods rely on a largely arbitrary selection of variables as predictors of ‘missingness’, and it is not known how far that selection maximises the plausibility of the MAR assumption for a given dataset. We have implemented a systematic, data-driven approach to predicting non-response and have published a user guide that explains in detail how to adopt this approach in your own research using NCDS.
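To illustrate why MAR-based weighting can reduce bias, here is a minimal sketch of inverse probability weighting on a tiny hypothetical dataset (not NCDS data, and not the method from the user guide). It assumes a single baseline variable, `baseline_group`, observed for everyone at an earlier sweep, and that non-response depends only on that variable (MAR given `baseline_group`). The probability of responding is estimated as the observed response rate within each group, a simple stand-in for the richer response models a data-driven approach would fit; each respondent is then weighted by the inverse of that probability. All names here are hypothetical.

```python
from collections import defaultdict

# Hypothetical records: (baseline_group, outcome). outcome is None for non-respondents.
# baseline_group stands in for a variable observed for everyone at an earlier sweep.
# By construction, group A responds less often and has lower outcomes than group B.
records = [
    ("A", 10.0), ("A", 10.0), ("A", None), ("A", None),
    ("B", 20.0), ("B", 20.0), ("B", 20.0), ("B", 20.0),
]


def response_rates(data):
    """Estimate P(respond | baseline_group) as the observed response rate per group."""
    totals = defaultdict(int)
    responded = defaultdict(int)
    for group, outcome in data:
        totals[group] += 1
        if outcome is not None:
            responded[group] += 1
    return {g: responded[g] / totals[g] for g in totals}


def complete_case_mean(data):
    """Naive mean using respondents only, ignoring why others are missing."""
    observed = [y for _, y in data if y is not None]
    return sum(observed) / len(observed)


def ipw_mean(data):
    """Weight each respondent by 1 / P(respond | group), then take the weighted mean."""
    rates = response_rates(data)
    weighted_sum = 0.0
    total_weight = 0.0
    for group, outcome in data:
        if outcome is None:
            continue  # non-respondents contribute only through the response rates
        weight = 1.0 / rates[group]
        weighted_sum += weight * outcome
        total_weight += weight
    return weighted_sum / total_weight


print(complete_case_mean(records))  # 16.666666666666668 (biased towards group B)
print(ipw_mean(records))            # 15.0 (matches the full-sample mean here)
```

The complete-case mean over-represents group B, whose members respond more often. Weighting group A respondents by 2 (response rate 0.5) recovers the mean of the full hypothetical sample, but only because non-response here depends solely on an observed variable; if missingness also depended on unobserved factors, the weights would not remove the bias.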