Millennium Cohort Study data included in first-ever DNA dataset for childhood research

4 March 2025

For the first time, large-scale DNA sequence data from three UK birth cohort studies have been released, creating a unique resource to explore the relationship between genetic and environmental factors in child health and development.

The dataset includes high-resolution DNA sequencing data for over 37,000 children and parents who are taking part in the Millennium Cohort Study (MCS), based at CLS, as well as two other UK birth cohort studies: Children of the 90s (ALSPAC) and Born in Bradford (BiB).

The data release is led by the Wellcome Sanger Institute, and supported by the Medical Research Council (MRC) and the Economic and Social Research Council (ESRC).

This work is supported by the ongoing efforts of Population Research UK, a UK-wide initiative led by teams at UCL and the University of Bristol, which aids longitudinal population studies by working to coordinate and connect the current research landscape.

High-quality genomic data

Now available on the European Genome-phenome Archive (EGA), these high-quality genomic data can be used in combination with the existing longitudinal health and survey information provided by participating families in MCS, ALSPAC and BiB. These combined data resources offer the scientific community the opportunity to gain valuable insights in areas ranging from population genetics to the social sciences.

For example, they could be used to investigate the impact of genetic variation on neurodevelopmental conditions or childhood obesity, and how these are influenced by environmental factors.

Sequencing birth cohorts

Longitudinal research follows large numbers of participants over multiple years, repeatedly examining them at regular time points through, for example, blood tests, body measurements, and health questionnaires, to detect changes over time.

Previously, large DNA sequence datasets have typically focused on children with rare conditions or adult population cohorts. This new data release focuses on sequencing ‘birth cohorts’, which are population-based cohorts of people followed from birth through to adolescence or early adulthood.

To produce this latest data release, researchers at the Sanger Institute carried out exome sequencing, which covers all of the roughly 20,000 genes in the human genome, on samples from 7,667 children and 6,925 parents from MCS, 8,436 children and 3,215 parents from Children of the 90s (ALSPAC), and 8,784 children and 2,875 parents from BiB.
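As a quick check on these figures, the short Python sketch below tallies the sample counts quoted above for each cohort and overall. The dictionary layout and function names are illustrative only and are not part of the data release.

```python
# Exome-sequenced sample counts as quoted in the article.
# The structure and names here are illustrative assumptions.
COHORT_COUNTS = {
    "MCS": {"children": 7667, "parents": 6925},
    "ALSPAC": {"children": 8436, "parents": 3215},
    "BiB": {"children": 8784, "parents": 2875},
}


def cohort_totals(counts):
    """Return the combined number of children and parents per cohort."""
    return {name: sum(groups.values()) for name, groups in counts.items()}


def grand_total(counts):
    """Return the number of sequenced samples across all cohorts."""
    return sum(cohort_totals(counts).values())


if __name__ == "__main__":
    for name, total in cohort_totals(COHORT_COUNTS).items():
        print(f"{name}: {total:,}")
    print(f"All cohorts: {grand_total(COHORT_COUNTS):,}")
```

The combined total comes to 37,902, which is consistent with the "over 37,000 children and parents" stated earlier.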

These three UK longitudinal birth cohort studies are internationally recognised, and data from these cohorts have already been used to study the contribution of common genetic variants to phenotypes ranging from childhood obesity to parental nurturing behaviours, anxiety and depression.

For example, using Children of the 90s data, researchers found that a variant in a gene called MC4R is associated with increased weight across childhood. Studies like this could help design effective weight management interventions and change the way society views obesity.

The team has made the anonymised data as accessible as possible to approved researchers, including drafting a data note (available on Wellcome Open Research) and other materials to help support its use by those who are less familiar with large-scale sequencing data.

In the coming months, this DNA sequence data resource will be expanded to encompass all participants in these cohorts as well as additional cohorts. The value of these data will be enhanced by harmonising them across the different cohorts, providing a more powerful resource than could be achieved by one study in isolation.

Professor Emla Fitzsimons, Director of the Millennium Cohort Study, said: “I am delighted that genetic data from the Millennium Cohort Study (MCS) is part of this groundbreaking DNA database. While previous genetic data resources have primarily focused on adult population cohorts, this new dataset will provide crucial insights into how genetic factors influence health and development throughout childhood and adolescence.

“By integrating genetic data from MCS participants with longitudinal information from questionnaires, health records, and cognitive assessments, this database offers an unparalleled opportunity to explore the lives of Generation Z.

“The inclusion of whole-exome sequencing data further strengthens our ability to investigate the genetic and environmental determinants of complex traits and diseases across the life course within a nationally representative population. This resource will be invaluable for researchers examining how genetic variation interacts with environmental factors to shape young people’s mental and physical health, educational attainment, socioeconomic outcomes, and overall well-being.”

Professor Matthew Hurles, Director of the Wellcome Sanger Institute, said: “Great science is built on collaboration and this release would not have been possible without the engagement of the families themselves, the hard work of teams managing these longitudinal studies, sustained investment in these cohorts, especially from Wellcome and the Medical Research Council, the sequencing and data analysis power of the Wellcome Sanger Institute, and the support of Population Research UK. We aim to continue to build on this resource and provide high-quality, accessible genomic data for researchers worldwide. This initiative further exemplifies the vast potential of bringing together the UK’s life science assets including committed research participants, researchers, governmental and charitable funding agencies, and genomic and computational capabilities.”

Media coverage

BBC Radio 4 – Today programme

Independent – New DNA database could unlock secrets behind childhood disease

Further information

Find out more about the data release on the Wellcome Sanger Institute website.

The data are available for free to approved researchers worldwide, via the European Genome-phenome Archive (EGA). To access, please visit the EGA website.

The EGA study accession numbers are:

MCS (study: EGAS00001007789): dataset EGAD00001015372

ALSPAC (study: EGAS00001005273): dataset EGAD00001015371

BiB (study: EGAS00001006978): dataset EGAD00001015370
