Whether in sickness or in health, the human body is a data-generating machine. In NHS consultation rooms, hospital beds and telemedicine portals across the country, information is gathered about a person’s health, history, outcomes and course of treatment at unimaginable scale. Effective use of these data allows clinicians to deliver personalised medicine – where treatment is customised to an individual patient – in a way not previously possible. However, from hurried notes on the back of a patient’s EEG to in-pharmacy vaccinations that never make it onto the medical record, the collection of this detail still leaves a lot to be desired.
Moreover, updated clinical-data requirements, multiple coding languages and fragmented processing systems have fractured the UK’s health-data landscape, making information sharing across settings and between allied professionals unnecessarily difficult. The data sets policymakers rely upon are rife with inconsistencies and inaccuracies as a result. Despite this, the government has grand designs for a data-enabled NHS; its “Data Saves Lives” policy paper commits to long-term investments in IT modernisation and secure research environments. Such news is welcome if the UK is to achieve its ambitions for more personalised forms of care.
However, technological reforms to the NHS entail the redundancy of preceding systems and new pressures for those on the front line of delivery. Care must be taken to ensure that the efficiencies innovation can offer do not come at the expense of the doctor-patient relationship.
This paper outlines the four main data barriers that undermine the UK’s prospects for personalised medicine:
Disconnection: a fragmented data landscape
Quality control: unrepresentative, poorly codified and unaudited data sets
Reidentification risks: the disclosure of sensitive information
Bureaucratic burden: additional duties for overstretched staff
Intentionally tempering tech evangelism with human values, this paper recommends the following to address these data barriers.
To address issues of data disconnection:
Legislation must simplify data-access requests and bring consistency across documents where possible.
Policymakers must find ways of ringfencing deidentified health data away from general data-sharing restrictions such as the General Data Protection Regulation (GDPR).
Government actors must focus their efforts on creating a work environment that is conducive to public-private-academic coalitions that focus on health, where innovation is centrally subsidised, legal liability is shared and public scrutiny is assured.
Regulatory bodies must finalise and publicise a well-defined accreditation process for Trusted Research Environment (TRE) status.
To address issues of quality control:
Researchers must declare their methods of data management to ensure errors have not been made. There must be a formal body or reporting mechanism responsible for oversight.
The experience of the private sector in big-data analytics needs to be leveraged for advanced clinical informatics.
The synthetic-data agenda must be taken with a pinch of salt: if the primary role of emergent technologies is to automate our second medical opinions, they cannot be based upon flawed reference-data sets.
Actors such as the Health Research Authority, NHS Digital and the UK Health Security Agency must catalogue and promote synthetic-data sets for those requiring instant access to training-set data.
Researchers must not assume technology-enabled insights are superior to those collected from the field.
To address reidentification risks:
Government actors must sanction public consultations to confirm an acceptable risk threshold for data reidentification.
Researchers need to conduct regular ethical audits and motivated intruder tests to ensure prior approvals and security measures have not been breached.
To reduce bureaucratic burden:
Health-care decision-makers must include clinician burden and other human-centred metrics when evaluating new tech-enabled solutions for the NHS.
The private sector must aim to design or leverage products and technologies that reduce the bureaucratic burden on clinicians; prioritise and present only the most salient information for their consideration; automate a second medical opinion when a potential oversight has been detected; and prevent runaway expenditure.
The centralised nature of the NHS, which boasts digital assets such as unique patient IDs, unified health records and ample opportunities for onward data linkage, puts it at an advantage for achieving more personalised forms of health care. Patients could see everything from their personal medical history to their genotypic medical ancestry incorporated into clinical decision-making. Such bespoke offerings improve health outcomes while tackling inequalities, helping to ensure medical misogyny, racism and ableism do not persist.
However, there must be considerable reform to the UK’s health-data landscape for the NHS to unlock its potential in this regard. This paper presents the four data challenges that are obstructing the personalised-medicine agenda, as well as the actions capable of addressing them. These actions do not amount to tinkering around the edges or a “tech will fix it” complacency; they demand sincere cultural overhaul and sustained investment to back both the people who help us and the resources that help them.
Implementing these changes will not be easy and may in fact prove disruptive at first. However, we must trust in the positive externalities that only become visible with time – knowing that savings generated from avoided events and continuity of care are harder to quantify than those resulting from outsourcing care to private vendors or shaving minutes from already threadbare consultation times.
Health data are inherently multimedia, ranging from medical imaging to free-text information. This complexity is exacerbated by missing data, errors at the point of data entry and the diversity of clinical codification systems used to allocate numerical codes to medical events and prescriptions. Significant time is therefore invested into cleaning and harmonising disparate data sets into a common working language. Even so, some important data types for the personalised-medicine agenda, including data generated by remote monitoring devices, remain almost impossible to integrate into medical records due to technical incompatibilities.
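The harmonisation step described above can be illustrated with a minimal sketch. Everything here is a hypothetical stand-in: the system names, codes and target concepts are invented for illustration (real pipelines map between vocabularies such as Read codes, ICD-10 and SNOMED CT via maintained crosswalk tables).

```python
# Minimal sketch of harmonising records from different clinical coding
# systems into one common working vocabulary. All system names, codes and
# target concepts below are illustrative stand-ins, not real clinical codes.

# Hypothetical crosswalk: (source system, source code) -> common concept
CROSSWALK = {
    ("legacy_gp", "C10"): "DIABETES-T2",
    ("hospital", "E11"): "DIABETES-T2",
    ("hospital", "I10"): "HYPERTENSION",
}

def harmonise(record):
    """Map one raw record onto the common vocabulary, flagging failures."""
    concept = CROSSWALK.get((record["system"], record["code"]))
    return {**record, "concept": concept, "mapped": concept is not None}

records = [
    {"patient": "A", "system": "legacy_gp", "code": "C10"},
    {"patient": "B", "system": "hospital",  "code": "E11"},
    {"patient": "C", "system": "hospital",  "code": "Z99"},  # no known mapping
]
harmonised = [harmonise(r) for r in records]

# Patients A and B now share one concept despite different source codes;
# patient C is flagged for review rather than silently dropped.
```

The point of the flag is the last line of the comment: unmapped codes are surfaced for human attention, since silently discarding them is one source of the missing-data problem described above.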
Data access or sharing is also constrained by extensive regulations and security-clearance requirements. While useful for obstructing bad-faith actors, these restrictions curtail high-priority research. The time taken for researchers to navigate access requests can exceed granting periods, for example. These application procedures are also unclear, inconsistent, and dependent on multiple gatekeepers, each with their own set of expectations. The changeability of regulatory requirements also makes research and development precarious: work funded prior to even minor amendments is subject to auditing or recontracting. This instability disincentivises the public-private-academic partnerships that are best placed for high-impact and high-trust research. Until this legal liability is hedged in some way, or long-term grants are capable of responding to regulatory changes, the longitudinal, interdisciplinary and large-scale research that personalised medicine relies upon will be disrupted or deprioritised in favour of safer bets.
All this makes for a health-data landscape that is not so much siloed as it is bordered: much like international travel in a pandemic, for data to be accessed, transferred or linked, it must first overcome changing data-protection policies and foreign clinical languages upon arrival. The UK Health Data Alliance’s “Tube Map” – a visualisation of the convoluted pathways to data approval – makes this parallel especially apt. Such incoherence undermines the interoperability of health-data research and our chances of personalising health-care delivery. Meanwhile, overly narrow access permissions obstruct discovery. In a more collaborative data ecosystem, by comparison, unforeseen findings, like the predictive value of diabetic retinopathy for cardiovascular disease, might become the everyday, powering us towards personalised medicine in the process.
The cost of doing nothing to address this disconnect is too great: overbearing data restrictions can be frustrating to the point of insulting to our sickest patients, especially those volunteering their data to see their lives extended or even saved. For example, research by the Brain Tumour Charity found that nearly all patients (98 per cent of those surveyed) would be happy to share their health data to improve brain-tumour care, with nearly 99 per cent of this subset still willing to do so despite knowing the reidentification risk would be high. While privacy remains the bedrock of health-data stewardship – and it is clear that public attitudes towards sharing data (even among clinicians) still remain mixed – this example demonstrates that working to honour patient wishes for their data must be at the forefront of medical ethics. When patients give their time, trust and even tissue, it must be acted upon with urgency and integrity – not garrotted by bureaucracy without good cause.
Consistency must be improved between different types of data-access requests and sharing protocols.
The TRE or Secure Data Environment (SDE) agenda (providing approved researchers with remote access to a single, secure location for specific health-data sets) must maintain its momentum to simplify the NHS data landscape.
TREs or SDEs must be audited by an official body to ensure their activities reduce data monopolies and gatekeeping.
Accredited TREs must collaborate and harmonise via common working practices (including open code and common data and governance models) to facilitate data access and processing within or between them.
The data freedoms instituted under the Control of Patient Information (COPI) notices during the Covid-19 pandemic need to be maintained, with a view to ringfencing deidentified health data away from generalist constraints in the future (for example, GDPR).
The Office for National Statistics’ Five Safes Framework can be used to tailor forthcoming regulatory changes based on data sensitivity and the opportunities and risks inherent in its usage.
If actioned, these recommendations will help secure the coherent and collaborative research environment that is integral to personalised medicine.
“Bad data in, bad data out” is the defining mantra of data science. Despite the growing sophistication of our computational abilities, it remains difficult to rehabilitate input data that are not reflective of the problem we are trying to solve, the outcome we are trying to predict or the population we are trying to learn about.
The high stakes of health care make low-quality data potentially life threatening. Stubborn inaccuracies and instances of unrepresentativeness, incompleteness and inconsistency have affected the value of UK health-data sets for clinical research. For example, there is currently no formal oversight for how research groups collect, clean, curate (defining and constructing health variables or events), analyse or interpret their data. This has led to major embarrassments when such data are the sole contributor to sensitive decision-making; the prioritisation of healthy individuals for vaccination due to wildly inaccurate BMI data entry and the loss of tens of thousands of patient results due to Microsoft Excel spreadsheet limitations are just two Covid-specific examples of this.
Furthermore, at present, health data fail to capture the profiles, experiences and health risks of certain demographics. This is a result of poor codification (capturing the UK’s myriad ethnicities in a manner acceptable to all groups is an ongoing challenge) and differential levels of engagement with health-care services. Comorbidity scores for complex patients also do little to untangle precise health needs, and our heavy reliance on aggregate terms such as “BAME” or “immunocompromised” has the unintended consequence of erasing subgroup phenomena. All this sees the inverse care law replicated in the UK’s health-data landscape whereby health provision, or in this instance data representativeness and completeness, occurs in inverse proportion to need.
This leads to “evidence-based” insights that cannot always be trusted or generalised – to the extreme detriment of the personalised-medicine agenda. Even reference-data sets can be problematic benchmarks. Census data, considered the gold standard, are themselves vulnerable to misrepresentation, especially in exceptional circumstances (as seen during the pandemic) or when providing data on low-engagement groups.
Although sometimes portrayed as a cure-all for data-quality concerns, synthetic data (which are artificially generated to imitate real-world reference sets) have their own limitations. Far from rectifying flaws in reference-data sets, synthetic data have been seen to amplify inaccuracies or smooth over the abnormalities and biases that would otherwise be interrogated as subtrends. Synthetic data find themselves on especially shaky ground when used to model outcomes in those with complex or rare conditions; here there are insufficient sampling numbers or training data to model distributions or predict outcomes confidently. Moreover, the fidelity of these synthetic-data sets is not subject to review and there is no accreditation system to differentiate a robust imitation from a weak one. All this makes it unlikely that predictions for synthetic data-dominated research will be realised in health care any time soon. Until synthetic data are sophisticated enough to act as a remedy or replacement for poor-quality health data, the focus must be on promoting synthetic-data sets as alternatives for researchers waiting to access the real-world data they are based on. A disappointing consolation prize for some, these become a pragmatic choice in fast-evolving health emergencies (where population-level data are unavailable) or for those unlikely to obtain the security clearances necessary for handling real-world data.
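The rare-condition failure mode described above is easy to demonstrate. In this sketch (all numbers fabricated for illustration), a naive synthetic generator fits a single distribution to a reference set that contains a small high-value subgroup; the imitation all but erases that subgroup.

```python
import random
import statistics

random.seed(0)  # deterministic for illustration

# Fabricated reference set: most patients cluster around a biomarker value
# of 5, while a rare-condition subgroup (5% of patients) clusters around 20.
majority = [random.gauss(5.0, 1.0) for _ in range(950)]
rare = [random.gauss(20.0, 1.0) for _ in range(50)]
reference = majority + rare

# Naive synthetic generator: fit one normal distribution to everything.
mu = statistics.mean(reference)
sigma = statistics.pstdev(reference)
synthetic = [random.gauss(mu, sigma) for _ in range(1000)]

# Count how visible the rare subgroup (values above 15) is in each set:
# clearly present in the reference data, almost entirely smoothed away
# in the synthetic imitation.
reference_rare = sum(v > 15 for v in reference)
synthetic_rare = sum(v > 15 for v in synthetic)
```

Real synthetic-data generators are far more sophisticated than a single fitted distribution, but the underlying risk is the same: whatever the model fails to capture about small subgroups simply disappears from the imitation.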
Finally, it is important to emphasise that our appetites for automation must not come at the expense of routine public-engagement efforts. Sincere and sustained local outreach, consultation and public-health campaigns can achieve as much for health-data improvement as emerging technologies – especially when collaborating with trusted community or celebrity figures to address trust deficits.
Either an existing official body takes responsibility for the oversight of data cleaning and variable curation or a new one is established; in conjunction with the Open Science Agenda, researchers must publish their work in these areas and subject themselves to audit where necessary.
Researchers must utilise best practice from private-sector big-data analytics to interrogate subtrends in artificially aggregated groups such as “BAME” or “the immunosuppressed”, particularly those that affect the underrepresented or clinically vulnerable.
Public-health campaigns and awareness raising must still be prioritised, especially when it comes to promoting the value of sharing health data for the NHS.
Once actioned, these recommendations will see that only the right data informs treatment, policy and even personal lifestyle choices.
The terms anonymised, pseudonymised and encrypted are often used interchangeably despite the fact that they relate to very different masking procedures. For clarity: data are only said to be anonymised if reidentification is made impossible. Pseudonymisation, by comparison, still runs some risk of reidentification, as personal information can be revealed by accessing the original encryption scheme or by cross-referencing against additional data sets with overlapping features. The latter process, known as triangulation, can see reidentification occur completely by accident: the reidentification of Netflix subscribers’ personal information using IMDb data is one well-known example of this.
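The triangulation risk can be made concrete with a toy example. Every record below is fabricated; the point is only that a "deidentified" data set can be linked to a public one through overlapping quasi-identifiers.

```python
# Toy illustration of triangulation. Names have been removed from the
# health data set, but quasi-identifiers (birth year, sex, partial
# postcode) remain - and also appear in a public data set.
health = [
    {"birth_year": 1967, "sex": "F", "postcode": "SW1", "diagnosis": "asthma"},
    {"birth_year": 1982, "sex": "M", "postcode": "LS6", "diagnosis": "diabetes"},
]
public = [  # e.g. an electoral roll or scraped social-media profiles
    {"name": "Jane Doe", "birth_year": 1967, "sex": "F", "postcode": "SW1"},
]

QUASI = ("birth_year", "sex", "postcode")

def reidentify(health_rows, public_rows):
    """Return (name, diagnosis) pairs where quasi-identifiers match uniquely."""
    hits = []
    for h in health_rows:
        matches = [p for p in public_rows
                   if all(p[q] == h[q] for q in QUASI)]
        if len(matches) == 1:  # a unique match pins the record to one person
            hits.append((matches[0]["name"], h["diagnosis"]))
    return hits

print(reidentify(health, public))  # → [('Jane Doe', 'asthma')]
```

No encryption has been broken here: the sensitive diagnosis is exposed purely by cross-referencing features the two data sets share, which is why pseudonymisation alone cannot be equated with anonymisation.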
Although it might be tempting to dismiss the risk of reidentified health data, it is important to contextualise just how devastating its consequences can be when data fall into the wrong hands. While we can all appreciate the ammunition that the medical record of a political leader or CEO might amount to, we must not underestimate the prospective harms of data terrorism for the general public. The widespread leak of Australian health records as part of a recent ransomware attack is the perfect, and tragic, example of this. Patients had their most sensitive health information cemented into public record, including incidences of miscarriage, abortion and other procedures or diagnoses they might have otherwise wished to conceal from family, friends and employers.
However, by scrubbing data of its distinguishing features we also remove its prospects for personalising care. Thus, although the risks of reidentification must be mitigated to maintain the good faith of the public, efforts to do so must not disrupt the data linkages, sophisticated analytics and resultant innovations from which the public also benefits.
Opportunities must be created for dialogue between the research community, private sector and public to determine an acceptable risk threshold for reidentification.
Researchers must conduct regular motivated intruder tests to establish whether a pseudonymised health-data set is secure against reidentification and new methods of attack. This test interrogates encryption and firewalls at the level of a data-competent individual with sufficient motivation to reidentify information for malicious purposes. This test could be part of a wider package of ethical audit to ensure that work has not strayed beyond the scope of its initial approvals.
Inspired by Google’s use of Model Cards to publicly evaluate their machine-learning algorithms, a central repository for assigning “reidentification risk scores” to major health-data sets needs to be developed. Researchers could then enter their intended configuration of health-data sets into a portal to return an estimate of how likely it is to triangulate and reveal sensitive information.
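One plausible basis for such a risk score, sketched below under loudly labelled assumptions, is the share of records that are unique on a chosen set of quasi-identifiers (a simplification of k-anonymity). The field names and records are invented for illustration; a real repository would use far richer methods.

```python
from collections import Counter

def risk_score(rows, quasi_identifiers):
    """Fraction of records whose quasi-identifier combination is unique
    (k = 1), a crude proxy for reidentification risk."""
    key = lambda r: tuple(r[q] for q in quasi_identifiers)
    combos = Counter(key(r) for r in rows)
    unique = sum(1 for r in rows if combos[key(r)] == 1)
    return unique / len(rows)

# Hypothetical records: two share a quasi-identifier combination, one is
# unique on it and therefore risky to release.
rows = [
    {"age_band": "40-49", "sex": "F", "region": "London"},
    {"age_band": "40-49", "sex": "F", "region": "London"},
    {"age_band": "80-89", "sex": "M", "region": "Orkney"},
]
print(risk_score(rows, ["age_band", "sex", "region"]))  # → 0.3333333333333333
```

A portal of the kind envisaged could run a score like this over the researcher's proposed combination of data sets before access is granted, with the acceptable threshold set by the public consultations recommended above.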
Collectively, these recommendations create the foundation of security measures and a ceiling for risk acceptability that the UK health-data landscape needs before it can pursue personalised medicine in earnest.
Ensuring care remains responsive and personable at the patient level is as important to the personalised-medicine agenda as access to the diverse and joined-up data sets that improve patient–treatment matching. Actions taken under this remit must therefore aim to be relationship enhancing and not relationship eroding.
However, in 2020, the BBC reported that a revolving door of digital innovation and corresponding digital redundancy in the NHS had left doctors logging into as many as 15 different systems in a day to perform their work. These digital reforms can quickly amount to a “technological paradox” in settings as sensitive as the NHS – a term that describes the seen and unseen ways in which the costs of implementing new technologies (training staff, debugging inevitable crashes and ensuring interoperability) outweigh the savings accrued.
One such negative externality is the way in which new digital tools can create distance and disconnect between health-care professionals and their patients. For example, the upkeep of electronic health records is currently dependent on manual data entry by those on the front lines of care. Such duties take doctors away from their patients or compromise their ability to be present in consultations. This corrodes the personability of care.
This is a phenomenon we can all relate to. We all appreciate how damaging technologies can be to both our concentration and sense of closeness with one another. The difference, however, is the fatal consequences such distractibility and disconnection can have in medical settings. With heads buried in screens for as much as a third of in-person consultation time and the full duration of telemedical appointments, doctors are less likely to pick up on subtle safeguarding cues, such as long sleeves on an unseasonably hot day or the tremor that can act as the smoking gun to everything from Parkinson’s to withdrawal.
This new bureaucratic burden is as dissatisfying for patients as it is for physicians: prioritising digital-centred care over human-centred care can leave patients feeling unheard or dismissed outright as conversational flow is interrupted to prioritise data capture. Mechanical forms of medical consultation undermine the doctor-patient relationship that has attracted talent to the profession and justifies physicians’ gruelling work environment. They also compromise the trust patients need to feel safe to share personal information or subject themselves to physical examination. Such things cannot be automated.
Digital reform in the NHS must not increase bureaucratic duties for health-care workers: new offerings must either reduce clinicians’ workload directly or allocate sufficient funding for the clinical-support staff and in-house data talent that will take responsibility for their ongoing success.
Depending on patient and clinician acceptability and consent, new ways of integrating large language models and natural-language processing methods into the clinical workflow must be found. Here, clinical communications and referral letters could be outsourced to products such as ChatGPT – generating efficiency savings and reducing overall pressures on care. Meanwhile, consultation recordings that are parsed into constituent codes or even personalised bedside manner statistics (turn-taking, interrupting and so on) could release clinicians from burdensome data-entry responsibilities and improve patient satisfaction by extension.
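As a purely illustrative sketch of the parsing step, and nothing like the LLM-grade pipelines the recommendation envisages, a consultation transcript could be mapped to clinical codes by term matching. The term list and code values here are invented.

```python
# Toy stand-in for consultation-transcript coding. A production system
# would use NLP/LLM methods; the terms and codes below are hypothetical.
TERM_TO_CODE = {
    "chest pain": "SYM-001",
    "shortness of breath": "SYM-002",
    "dizziness": "SYM-003",
}

def code_transcript(transcript):
    """Return the sorted set of codes whose trigger terms appear."""
    text = transcript.lower()
    return sorted({code for term, code in TERM_TO_CODE.items() if term in text})

notes = "Patient reports chest pain and some shortness of breath on exertion."
print(code_transcript(notes))  # → ['SYM-001', 'SYM-002']
```

Even this trivial version shows the intended division of labour: the clinician talks with the patient while the machine handles data capture, rather than the reverse.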
The NHS is an institution built upon relationships, not transactions. These recommendations aim to honour this and, if actioned, would allow the NHS to leverage the opportunities technology offers without compromising the human-centred care that is at the heart of patient satisfaction.
Interrelated issues of data disconnection, quality concerns, reidentification risks and bureaucratic burden are obstructing the UK’s potential for personalised medicine. Nevertheless, the data assets, capabilities and infrastructure that are still mostly unique to the UK health-care system give it an exceptional advantage in this domain – especially compared to fully privatised comparators. If the recommendations made in this paper are acted upon with urgency, there is every chance we might realise a personally prescriptive NHS in our lifetimes.
However, data are not a cure-all, especially not from the patient perspective, and medicine can never be a purely algorithmic affair. Patients display a stubborn bias for in-person care. Their self-perceived exceptionalism, namely that their unique needs could not possibly be met by an artificial intelligence, makes them averse to these offerings even when presented with incontrovertible evidence that they produce better outcomes. Brick-and-mortar general practice will always be preferred to remote consultation, and the value of the holistic – via offerings including social prescribing and continuity of care – should not be dismissed.
In sum, the pursuit of personalising medicine cannot compromise what makes health care feel personal for patients. Data reforms must not be motivated to “hack” our human-centred preferences but should instead work with them. This will ensure the relationships that distinguish health care from health service are enhanced by technology, not replaced by it. In this light, new products and services become helpful tools for the doctor’s medicine bag and can offer the second medical opinion that assures safe practice for all – without becoming the only voice in the consultation room.