Diabetes research enhanced with new harmonised datasets

Data release
29 September 2025

Harmonised data on diabetes from five UK cohort studies are now available for the scientific community to download from the UK Data Service.

The UCL Centre for Longitudinal Studies (CLS) has harmonised data on diabetes collected from study participants born in 1946, 1958, 1970, 1989-90 and 2000-02. These new harmonised datasets bring together information on the prevalence of diabetes and on diabetes type, drawn from participants’ own reports in adulthood and from parents’ and doctors’ reports in childhood.

Harmonisation is a process of recoding or standardising variables so that survey data are comparable across studies, or across multiple sweeps of the same study.
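As a rough illustration of what recoding to a common scheme involves, the short Python sketch below maps two differently coded answers onto one shared coding. The study names, raw codes and the harmonise function are hypothetical and are not taken from the cohort files or the CLS user guide.

```python
# Hypothetical raw codings: each study records an answer to
# "have you ever had diabetes?" in its own way. None of these codes
# come from the actual cohort datasets.
RAW_TO_HARMONISED = {
    "study_a": {1: 1, 2: 0, 9: None},       # 1 = yes, 2 = no, 9 = not answered
    "study_b": {"Y": 1, "N": 0, "": None},  # letter codes, blank = not answered
}


def harmonise(study, raw_value):
    """Map a study-specific code to a shared scheme.

    Shared scheme (an assumption for this sketch):
    1 = yes, 0 = no, None = missing or unrecognised code.
    """
    return RAW_TO_HARMONISED[study].get(raw_value)


responses = [("study_a", 1), ("study_a", 2), ("study_b", "Y"), ("study_b", "")]
harmonised = [harmonise(study, value) for study, value in responses]
print(harmonised)  # [1, 0, 1, None]
```

Once every study uses the same coding, responses can be pooled or compared directly.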

These harmonised datasets will allow researchers to combine and compare data from longitudinal studies, increasing the statistical power of analyses and enhancing cross-cohort research about people’s experiences of diabetes across generations.

What’s included in the new datasets?

The researchers created two kinds of harmonised variable: sweep-specific variables, based on the diabetes questions asked at a particular sweep, and cross-sweep variables, which combine information from several sweeps. The data come from the following UK cohort studies:

  • The MRC National Survey of Health and Development (MRC NSHD) – 5,362 people born in England, Scotland and Wales in a single week of 1946.
  • The 1958 National Child Development Study (NCDS) – 17,415 people born in England, Scotland and Wales in a single week of 1958.
  • The 1970 British Cohort Study (BCS70) – 17,198 people born in England, Scotland and Wales in a single week of 1970.
  • Next Steps – 16,000 people in England born in 1989-90.
  • The Millennium Cohort Study (MCS) – 19,517 children born in England, Scotland, Wales and Northern Ireland in 2000-02.

The sweep-specific harmonised variables cover two main areas:

  • Direct questions about lifetime prevalence of diabetes: whether the study participant has ever had diabetes, or whether they have had diabetes since the previous sweep (in which case the responses can be combined with those from previous sweeps to derive lifetime prevalence).
  • Direct questions about current diabetes: whether the study participant has diabetes at the time of the sweep, or has had it in the last 12 months.

In both cases, information from direct questions on diabetes was supplemented with reports of diabetes in response to any available questions about longstanding illness in the same sweep.
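The logic described above can be sketched in a few lines of Python. This is an illustration only: the function names are hypothetical, and the handling of missing answers is an assumption rather than the rule documented in the CLS user guide.

```python
def sweep_indicator(direct_report, longstanding_illness_report):
    """Combine two sources of information from the same sweep.

    Each input is True (diabetes reported), False (not reported) or
    None (question not asked or not answered). Assumed rule: a report
    from either source counts; the result is None only if both are missing.
    """
    answers = [a for a in (direct_report, longstanding_illness_report)
               if a is not None]
    if not answers:
        return None
    return any(answers)


def lifetime_prevalence(previous_lifetime, since_last_sweep):
    """Derive 'ever had diabetes' from a 'since the previous sweep' answer.

    Assumed rule: a positive report at either point gives True; otherwise
    the result is None if either piece of information is missing.
    """
    if previous_lifetime or since_last_sweep:
        return True
    if previous_lifetime is None or since_last_sweep is None:
        return None
    return False


# A participant with no direct report but diabetes listed as a
# longstanding illness is counted as reporting diabetes at that sweep.
print(sweep_indicator(False, True))      # True
print(lifetime_prevalence(True, False))  # True
```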

The new datasets also include variables that combine information from across survey sweeps to indicate whether participants had ever reported diabetes and what type of diabetes they reported. These include:

  • Cumulative ‘ever’ reported diabetes indicators: these variables include information from all sweeps where questions on diabetes have been asked. Multiple cumulative indicator variables were created reflecting whether a cohort member had ever reported diabetes up to a certain time-point.
  • Diabetes type: this variable includes information about type of diabetes reported.
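A cumulative 'ever' indicator of the kind listed above could be derived along the following lines. This is a minimal Python sketch under stated assumptions: the ages, the cumulative_ever function and the treatment of missing sweeps are all hypothetical and do not reproduce the derivation used by CLS.

```python
def cumulative_ever(reports_by_age):
    """Return, for each sweep age, whether diabetes was ever reported up to then.

    reports_by_age maps a sweep age to True (reported), False (not reported)
    or None (missing). Assumed rule: the indicator stays None until at least
    one sweep has a valid answer, and stays True once diabetes is reported.
    """
    ever = {}
    seen_report = False
    seen_any_answer = False
    for age in sorted(reports_by_age):
        report = reports_by_age[age]
        if report is not None:
            seen_any_answer = True
            seen_report = seen_report or report
        ever[age] = seen_report if seen_any_answer else None
    return ever


# Hypothetical participant: missing at 26, no diabetes at 33,
# diabetes reported at 42, no report at 50.
reports = {26: None, 33: False, 42: True, 50: False}
print(cumulative_ever(reports))
# {26: None, 33: False, 42: True, 50: True}
```

Each value in the result corresponds to one cumulative indicator variable, reflecting reports up to that time-point.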

The datasets include only variables harmonised from questions that were administered to entire cohorts. They do not include information from biomarkers or linked health data.

In the two youngest cohorts (Next Steps and MCS), questions on diabetes were asked only in the Covid-19 surveys. The information harmonised across multiple survey sweeps therefore focuses on the three oldest cohorts (NSHD, NCDS and BCS70).

Why these new data are important

Dr Laura Gimeno (UCL Centre for Longitudinal Studies) said: “Diabetes is an increasingly common health-related condition with an estimated 4.3 million people in the UK living with diagnosed diabetes.

“By improving the accuracy and comparability of the data collected on the condition, researchers are better placed to understand when diabetes tends to occur during people’s lives, the potential risk factors and subsequent impacts on health. Through cross-cohort research, they may also be able to identify whether more recent generations are more susceptible to the disease.

“With diabetes linked to a range of serious complications including stroke and cardiovascular disease, these new datasets can provide the foundation for new analyses to help improve public health and wellbeing.”

How to access the data

The NCDS, BCS70, Next Steps and MCS harmonised datasets are available from the UK Data Service (UKDS) website under an end user licence agreement.

To access the NSHD dataset, download and complete the UKDS Special Licence application form. Once the form has been reviewed by UKDS and approved by the NSHD Data Sharing Committee, the data will be available to download. Find out more on the UK Data Service website.

The NSHD diabetes dataset is also available from the MRC Unit for Lifelong Health and Ageing at UCL (LHA), which manages the NSHD. Analysts who wish to use the diabetes data alongside other information held for the 1946 cohort must use this route. The research project must first be approved by the NSHD Data Sharing Committee. Full details on how to access the data can be found on the NSHD Skylark website. Once a data access form has been approved and a data sharing agreement is in place, the data can be accessed via the NSHD data sharing website.

Find out more

Further information about the harmonised diabetes datasets is available in the CLS user guide – Harmonised indicators of self-reported diabetes in five British cohort studies.



Media enquiries

Ryan Bradshaw
Editorial Content Manager

Phone: 020 7612 6516
Email: r.bradshaw@ucl.ac.uk

Contact us

Centre for Longitudinal Studies
UCL Social Research Institute

20 Bedford Way
London WC1H 0AL

Email: clsdata@ucl.ac.uk
