Global Genome Project

One of the major faultlines Covid-19 has revealed has been institutional. Poor leadership and outdated systems meant that many Western nations were ill-prepared and too slow off the mark in responding. By contrast, nations such as Taiwan and South Korea, which took the potential risk seriously and have technologically sophisticated cultures, were quick and agile. The divergence in health outcomes has been stark. As John Micklethwait and Adrian Wooldridge set out in The Wake Up Call, London and Seoul are cities of similar size. Yet the former has seen more than 6,000 people lose their lives; the latter, fewer than 20.

Both places are now living with the virus, and will likely continue to do so until a vaccine is deployed. But while most nations still grapple with Covid-19 in some form, some positive scientific stories are emerging. One of these is genomics, which, in its simplest terms, is the study of an organism's complete set of DNA or, put another way, the QR code of our genes. After decades of progress, albeit slower than many anticipated, the reduced cost of sequencing and the ability to sequence in parallel rather than sequentially mean this field has played a significant epidemiological role since the first reports of a novel pneumonia emerged in Wuhan, China, in December 2019.

By 5 January, the first viral genome had been obtained from a patient, revealing that the virus belonged to the Coronaviridae family. It was then submitted to the US National Center for Biotechnology Information's (NCBI) genetic sequence database, GenBank, and released on the open-access site Virological, while the Chinese Center for Disease Control and Prevention (China CDC) provided SARS-CoV-2 genome sequences on the public-access GISAID database. This meant scientists and researchers around the world could begin studying and analysing the virus. Today, GISAID holds nearly 95,000 sequences, shared by around half the countries in the world, which are increasing our understanding of the virus and its spread and aiding the development of diagnostics and, perhaps more importantly, the vaccines we desperately need.

Large open-access datasets such as this have enabled open-source projects such as Nextstrain, which was originally developed to build a real-time understanding of influenza, to show the pathogen's evolution and mutations as well as the spread of the epidemic. Through this work, researchers are able to reconstruct transmission chains, trace origins and estimate the virus's evolutionary rate. Nextstrain's latest report, for example, sets out some of the detail so far, including:

In Asia, there were numerous between-country transmissions early in the pandemic. As in most regions, this has now shifted to within-country links.
The virus spread extremely quickly through Europe, with heavy mixing of samples making it hard to distinguish how it was introduced. Travel restrictions have since meant that more distinct variations can be identified.
A similar pattern is apparent in Africa, where travel restrictions have curtailed mixing. The USA, on the other hand, which has not really implemented domestic travel restrictions, continues to show mixing.

There are some limitations to this data: the bulk of it is still being generated in developed nations. However, where surveillance infrastructure is incomplete or limited, sequencing even a handful of genomes can provide valuable insights into transmission, origins and much more. Scaling up every nation's ability to do this should therefore be a priority, especially in the developing world, given the relatively low cost of sequencing today.

Chapter 1

Development of the field

How we got here is one of the great stories of international collaboration in the natural sciences in recent history, even if it has also been subject to both over-hype and dashed expectations over the years. The Human Genome Project was first announced by the US, with France, Germany, the UK and Japan joining the initial consortium in 1996. By the time the full sequence was announced, 18 countries had contributed. Heralding the results, President Bill Clinton said it was “the language in which God created life,” while our Executive Chairman, Tony Blair, stated that it was the “first great technological triumph of the 21st Century,” adding that “every so often in the history of human endeavour there comes a breakthrough that takes mankind across the frontier and into a new era.”

The project cost around $3 billion and, as the National Institutes of Health has set out, it has already fuelled the discovery of more than 1,800 disease genes; researchers can today find a gene suspected of causing an inherited disease in a matter of days; and at least 350 biotechnology-based products resulting from the Human Genome Project are currently in clinical trials. The Cancer Genome Atlas, which sprang out of this research, now holds more than 2.5 petabytes of data collected from more than 30,000 patients. Nor was the impact confined to health: as the MIT economists Jonathan Gruber and Simon Johnson have laid out, by 2012 human genome sequencing accounted for “an estimated 280,000 jobs” in the US alone.

Whereas the first genome cost $3 billion 20 years ago, firms such as Oxford Nanopore, 23andMe and Sophia Genetics now offer various forms of genotyping, a less complete type of analysis, for around $1,000. However, early hopes that the resulting biomedical advances would arrive at the pace of advances in computing hardware have not been borne out, not least in the field of drug discovery. But with the significant decrease in costs as these technologies have been commercialised, as well as advances in artificial intelligence and machine learning to make sense of the sheer scale of the data collected, there has been a renewed push by many countries to collect genetic data.

Chapter 2

Millions of genomes

In 2018, Australia committed $500 million over the next decade to the Genomics Health Futures Mission, the nation's first national human genome project. This will focus on “new and expanded clinical flagship studies to tackle rare diseases, rare cancers and complex conditions,” as well as “new clinical trials and technology applications allowing Australian patients to benefit from the latest medical research.”

Similarly, China launched a 100,000 Genomes Project in 2017, having identified biotechnology and genomics among the strategic emerging industries in the country's 13th Five-Year Plan, while European nations are making significant efforts both individually and as part of the European Union's wider plans. Specifically, as part of its Digital Single Market strategy, the bloc has announced the 1+ Million Genomes Initiative. Its objectives are to have one million sequenced genomes by 2022, to link access to databases and to provide sufficient scale for clinical research. To join up these efforts, the EU is also looking at using AI to “support an initiative on linking genomics.”

In the US, the NIH's All of Us Research Program aims to enrol one million people to build its genetic database, while the UK is one of the front runners in this area. Genomics England, established in 2013, set out the landmark 100,000 Genomes Project to sequence the genomes of patients with rare diseases and cancers.

Last year, the Health Secretary, Matt Hancock, raised this ambition further. The aim is for the NHS and UK Biobank to sequence five million genomes over the next five years, while the National Genomic Medicine Service sits at the heart of the NHS's efforts to deliver personalised medicine. At a conference in November 2019, he also said that his “ambition is that eventually every child will be able to receive whole genome sequencing along with the heel prick test.” As it stands, Genomics England holds over 100,000 genomes and over 2.5 billion clinical data points, with researchers using this data to identify genetic mutation signatures associated with certain cancers. This could prove transformational in delivering on a plan to diagnose three-quarters of all cancers at stage 1 or 2 by 2028.

As part of its response to the current crisis, the government also announced the Covid-19 Genomics UK (COG-UK) consortium, whose partners include the Sanger Institute. The project has sequenced more than 30,000 SARS-CoV-2 genomes, making the UK the biggest producer of genome data on the virus. A study produced by the consortium has also shed light on the role inbound international travel played in transmission in the UK, identifying more than 1,300 separate introductions of the virus, predominantly from Europe.

Chapter 3

Global genome project

This increased focus on genomics is part of a broader attempted shift towards personalised and preventative healthcare. Rather than focusing on treating illness, it is about keeping people healthy. For example, 80% of rare diseases have a genetic component, so understanding DNA offers the opportunity to identify what is responsible, potentially opening the way to far more precisely targeted treatment, or even eradication.

The ultimate objective of pursuing science and discovery in this way is to expand not only lifespans but also quality of life. As we have written previously, this is the human grand challenge progressives should be pursuing today. However, achieving it needs to be a collective endeavour. If the success of genomic epidemiology has rested on open access and data sharing, far more effort should go into wider collaboration in this field. Some of this has already begun.

For example, the UK has signed a Memorandum of Understanding with France, under which cooperation between Genomics England and France's Médecine Génomique programme focuses specifically on standardisation, on the basis that the countries “share the ambition of building and operating the most advanced and competitive Genomic Healthcare and Research in the World.” There is also an international policy and research organisation, the Global Alliance for Genomics and Health, which is funded and sponsored by various national genome bodies as well as private sector organisations.

Separately, the P3G2 research project is dedicated to developing the infrastructure to optimise cross-border access to, and use of, biobank data. In particular, it is looking at the ethical and governance frameworks of biobanks, which exist in more than 20 countries around the world today. Thinking needs to accelerate in this area, and the UK is well placed to be at the forefront of the conversation. A key element must be ensuring that developing nations are not left behind, with a deep commitment to inclusiveness and diversity of data. The declining cost of sequencing should mean that, in future, everyone in the world is able to obtain their own genetic information and have it form part of their health profile.

Figure 1

Global network of biobanks

https://www.nature.com/articles/s41591-019-0727-5

However, if genetics alone is not going to be the answer, the remit of these organisations must expand beyond it. The multi-omics fields of transcriptomics, epigenomics, proteomics, microbiomics and more have the potential to increase our understanding of biology at a systems level, and of the role that environmental, social and lifestyle factors play. For example, in her work on precision nutrition, my colleague Hermione Dace has written that:

“Breakthroughs in areas of science such as metabolomics and microbiomics, and new technologies such as apps and wearables enable us to more easily measure the factors that affect nutrition. These devices, in combination with algorithms, theoretically enable us to tailor nutritional advice to each person based on their individual profile.”

Together, all these areas present a unique opportunity to delve ever deeper into the individual elements that influence our health, and into how our bodies as whole systems, along with the external environment, influence it too.

It is important that these areas work in unison to better understand the flow of information from the original cause of disease, whether genetic, environmental or developmental, to its consequences. A National Multi-omics Consortium has been developed in the UK to look at this issue but, as we will explore in more detail soon, the ambition should be a globalomics group to accelerate our understanding.

With the breakthroughs in AI and machine learning, we should be better placed to tackle the sheer scale of biomedicine in this way, but nations need to commit to building the right type of data infrastructure to enable it. Given both the successes and the limitations of genomics alone, multi-omics should be the next frontier for health.
