Abstract
In this article, we examine some of the expectations, frictions and uncertainties involved with the assetization of de-identified NHS patient data by (primary care) research services in the UK. Pledges to Electronic Health Record (EHR) data-driven research attempt to reconfigure public health data as an asset for realizing multiple values across healthcare, research and finance. We introduce the concept of ‘asymmetrical divergence’ in public health data assetization to study the various practices of configuring and using this data, both as a continuously generated resource to be extracted and as an asset to be circulated in the knowledge economy. As data assetization and exploitations grow bigger and more diverse, the capitalization of these datasets may constitute EHR data-driven research in healthcare as an attractive technoscientific activity, but one limited to those actors with specific sociotechnical resources in place to fully exploit them at the required scale.
I believe, however, that as we seek to unlock a new age of enterprise, we might need to go further in exploring ways of unlocking new growth without increasing public spending. As with a business in a cash crisis, we need to shore up the profit and loss account by reducing waste, as the Government have so quickly done. Equally, as with real business growth, we need to look creatively at our balance sheet and think about our assets and our competitive advantage. Everyone in government, in every Department at every level, should be asking themselves, ‘What can we sell to the rest of the world, in order to repair our damaged public finances?’ … The third is the National Health Service. I know from my own experience that we are sitting on billions of pounds worth of patient data. Let us think about how we can unlock the value of those data around the world. (Freeman, 2010)
It was November 2010, in the aftermath of the last global financial crisis, and the UK House of Commons was debating the country’s growth policy. George William Freeman MP (Con), who four years later was appointed Minister for Life Sciences at the Department of Health and the Department for Business, Innovation and Skills, unwrapped his own vision for the country’s economic growth. Problematizing the state as a ‘business in a cash crisis’ that needs to enter ‘a new age of enterprise’, the MP asked public services to start looking for public assets they could sell to potential investors so as to restore public finances.
While national policy expectations and narratives of wealth creation out of NHS patient data are not new, they have intensified after the 2008–09 financial crisis as corporate biomedical innovation has been outsourced to the academic sector (Robinson, 2018) and personal data has emerged in the economic literature as the world economy’s new asset class (BIGT, 2003; Department of Health, 2011; HM Government, 2018; World Economic Forum, 2011). At the same time, national health systems are ‘morally obliged’ to become data-driven so as to stop ‘flying blind’ and start ‘saving lives’ and taxpayers’ money (Du Preez, 2015). They are now expected to become not only more (cost-)effective in personalized healthcare but also expand their wealth-creating role as investors in biomedical Research and Development (Department of Health, 2011; Kelsey and Cavendish, 2014).
The UK has been consistently advertised as the home to one of the best and biggest healthcare datasets in the world (HM Government, 2018). Its universal, largely computerized and centrally managed public healthcare system (the NHS) has created volumes of linked datasets from longitudinal Electronic Health Records (EHRs) across all levels of care. In fact, the demand for and availability of NHS patient data for research into anything from pharmacovigilance and drug prescribing and safety to standards of care and trial recruitment continues to grow year after year (NHS Digital, 2018). Commercial models are currently being debated in the pursuit of lawful and publicly acceptable contractual arrangements between the NHS and the private sector for creating and realizing the values of this public asset (Harwich and Lasko-Skinner, 2018; HM Treasury, 2018). Recently, the economic value of curated NHS patient data was estimated at £9bn per year for both the NHS and the patients (Wayman and Hunerlach, 2019). These valuations neglect the investments in technoscientific and market infrastructures required to reap such financial and epistemic benefits.
Our aim here is to explore how narratives of innovation and normative pledges to the interrelatedness of health and wealth attempt to mobilize and configure public health data as an asset for realizing multiple values across scientific research, healthcare and economics (Hogarth, 2017; Welsh and Wynne, 2013). For this, we take as our case study the work of research services in England that aggregate and release NHS patient data to actors in healthcare and beyond, such as the Clinical Practice Research Datalink (CPRD), QResearch, The Health Improvement Network (THIN), CALIBER and ResearchOne. Understanding ‘valuation as a social practice’ (Helgesson and Lee, 2017: 533), we focus on the intertwined scientific, economic, social and material expectations and logics such research services translate and enact. These expectations enable them to realize and extract the values of this public asset for their organizational sustainability and for other actors across society (Birch, 2017a; Brown and Michael, 2003; Star, 1985). Responding to Birch’s (2019) call for more empirical investigations of how ‘things are turned into assets’ (Muniesa et al., 2017) and controlled for the extraction of rents, such as licenses or fees, we show that the assetization of NHS patient data by research services is a complex and laborious process. It involves the configuration of competing and complementary frictions, as well as uncertainties around coding practices, regulation, acceptability, supplies of datasets, technoscientific capital and user demand, among other things.
NHS patient data assetization by these research services serves four main purposes: maintaining organizational sustainability, developing their capability to continue assetizing data, developing the epistemic and human capital of this field (e.g. development of new scientific methods and training of new data scientists) and, consequently, strengthening their role in the valorization, performativity and financialization of EHR data-driven research. For this, we introduce the concept of asymmetrical divergence to explore their expectations and valuations as well as their normative, scientific and economic discourses and practices for assetizing public health data (Birch et al., 2020; Muniesa et al., 2017). In this way, we elucidate the asymmetries around the sociotechnical and financial infrastructures that are configured for this assetization, including the control of data flows for research, the public’s participation in decision-making and the various knowledge assets (e.g. phenotypes, biomarkers, quality improvement reports, recruitment pools of research participants, clinical risk prediction algorithms and scientific publications) that are made (im)possible for healthcare, academic and biomedical networks of actors.
This article draws mostly from 27 interviews conducted in 2016 with seven GPs who were involved in clinical commissioning, information governance, medical ethics teaching or academic EHR data-driven research, nine citizens who reported having opted out from and/or had campaigned against programmes of NHS patient data capitalization (i.e. care.data) and eleven health data researchers. The last group of participants comprised statisticians, epidemiologists, data architects and research facilitators who have worked with data from and/or for one of the more established research services in England that have been collecting and curating de-identified NHS patient data from a network of contributing GP practices across the country to support EHR data-driven research (Vezyridis and Timmons, 2016). These research services included:
QResearch, a partnership between the IT supplier EMIS Health and (since 2019) the University of Oxford (previously the University of Nottingham), which holds data from 1500 (out of approximately 9800) GP practices in the UK,
the CPRD, the oldest research service, with a history dating back to 1987, which extracts and curates data from GP practices using IMS Health’s computer system and (since 2018) also from EMIS Health,
THIN by Cegedim SA, which extracts data from over 550 GP practices that use the Vision primary care software (approximately 6% of the UK population),
CALIBER at University College London, which has held a license with CPRD since 2012, and
ResearchOne by TPP and the University of Leeds, which (since 2013) houses data from both primary and secondary care providers using the company’s SystemOne software.
All interviews focused mainly on the opportunities and the technical, social and ethical challenges of realizing the benefits of NHS patient data-driven research, particularly in English primary care. Health data researchers and GPs were also asked more specific questions around the challenges of developing and maintaining such research services, including issues of sociotechnical infrastructure and information governance, as well as of conducting observational studies with NHS patient data. We supplemented these interviews with documents, reports and online material from these research services’ websites. The first author was also a non-participant observer of team meetings at some of these research services, national health data analytics workshops and public consultations on the ethics of NHS patient data exploitations, and completed university training courses for data researchers related to the opportunities and challenges of conducting observational studies with EHR primary care data in the UK.
In the next section, we approach data assetization from the sociology of expectations and science and technology studies (STS) (Birch, 2019; Birch et al., 2020; Borup et al., 2006). This forms our analytical approach for studying the performative promises, normativities and sociotechnical practices involved in these situated processes of NHS patient data capitalization (Muniesa et al., 2017). Following this, we examine expectations and uncertainties as well as frictions and risks in the process of transforming NHS patient datasets into research assets. We focus not only on the scientific but also on the economic, political, social and ethical valuations these research services have to navigate. We conclude by speculating on the future of EHR data-driven research and argue that asymmetrical divergence may indeed foster innovation in EHR data-driven research and development (Kleinman and Vallas, 2001). However, it is also creating unequal configurations of access to knowledge production and public scrutiny as these research services compete for data and funding. More centrally controlled flows of data and assets for the extraction of rents may, in the end, benefit only those networked actors that have the socio-material resources (including financial capital) and knowledge expertise to capitalize on these public datasets for specified purposes (Birch, 2019; Muniesa et al., 2017).
Theorizing research expectations, service valuations, and data assetizations
From an STS perspective, expectations are both products and producers of innovation, mobilizing and institutionalizing people and resources (Borup et al., 2006; Brown, 2003; Brown and Michael, 2003; Tutton, 2011). Scholars in this field have argued that expectations produce, in their own material-discursive way, the necessary ‘dynamism and momentum’ (Brown and Michael, 2003) as well as the ‘incentives and obligations’ (Brown, 2003) for human and non-human actors to come together and ‘wishfully enact’ (Tutton, 2011) particular versions of the future.
Research that examined the role of biomedical initiatives in the creation of new forms of scientific, social, political and economic expectations has demonstrated, for example, how human tissue, medical data and genomics are assetized for scientific collaborations and circulations in knowledge economies (Cooper and Waldby, 2014; Dagiral and Peerbaye, 2016; Tarkkala et al., 2018). Geiger and Gross (2019) have shown how consumer genomics firms mobilize a ‘platform business model’ in order to assetize genomics information via specific processes of ‘accumulation’, ‘augmentation’ and ‘obscuration’ of related uncertainties. These platforms simultaneously maintain constant flows of data and values between sellers and customers across different markets. Barrett et al. (2016) have shown how specific socio-material configurations enacted by digital platforms create multiple epistemic, ethical and financial expectations from online health communities. Likewise, we have shown (Timmons and Vezyridis, 2017; Vezyridis and Timmons, 2017) how such expectations (re)configure a range of socio-material practices and relationships between patients, hospitals, universities and biomedical industries in the process of valuating and assetizing medical waste and patient data.
A main purpose of this article is to understand how these research services enact NHS patient data assetizations in order to frame, sustain and expand the ‘liminal space’ between hopes and promises and concrete products and assets (Brown, 2003; Hogarth, 2017). For this, we conceptualize assetization as a sociotechnical process whereby data and knowledge are transformed by organizations into closely governed resources for the extraction of use and exchange values (Birch et al., 2020). We treat any scientific, ethical and economic values around NHS patient data assetization not as something ‘stable and predefined, but rather as something grappled with, articulated, and made in concrete practices’ (Dussauge et al., 2015: 2). Values, as Birch (2017a: 462, 466) asserts, are ‘immanent or latent in material things (e.g. commodity) and/or discursive claims (e.g. hope)’ and they require ‘active, ongoing, and performative management’. Both conceptually and empirically, as Brown (2003: 5) argues, expectations and values eventually become both ‘inseparable’ and ‘tradable’, forming ‘the basis of exchange relationships within “communities of promise”’.
Expectations of EHR data-driven research for speculative valuations of health and wealth have, therefore, their own important role in how GPs, NHS patients and the data they co-produce are reimagined and valued for EHR data-driven research within a high-stakes biomedical knowledge economy (Birch et al., 2020; Dussauge et al., 2015; Wienroth et al., 2019). They are fundamental in mobilizing various state, commercial and academic actors as well as capital for the assetization of NHS patient data and the enactment of a new health data access market (Birch et al., 2020; Brown, 2003; Vezyridis and Timmons, 2017).
During our fieldwork, we noted how health data scientists valued the availability of EHR data to answer different types of research questions. Their anticipatory discourses highlighted the unique opportunities now available to researchers to identify causes of diseases, complication and management rates and prescribing practices from the mining of data. They were often excited about the potential for life sciences in the UK to lead the development of new, EHR data-driven, practices for the prevention and treatment of diseases or for conducting more ‘pragmatic’ randomized controlled trials to assess the effectiveness of medications (Powell et al., 2017). And research services appear to have assumed the technoscientific and economic role of materializing and capitalizing on the multiple scientific, normative and economic promises and values of EHR data-driven biomedicine (Birch, 2019; Brown and Michael, 2003; Martin, 2015). They do so by enacting the negotiated and contested infrastructural work, that is, the situated narratives, heterogeneous relationships and socio-material practices, necessary to bring this imagined future forward (Brown, 2003; Dagiral and Peerbaye, 2016).
Thus, for the present study we understand the capitalization of these public datasets as a sociotechnical performance that translates imaginaries and turns resources into assets in order to realize the expectations of specific actors for value and capital, that is, future earnings whether that is ‘money, or something comparable’ (Birch, 2019; Birch et al., 2020; Dussauge et al., 2015; Muniesa et al., 2017: 12). Following Muniesa et al. (2017), we treat assetization as a practice that extends organizational boundaries to include social, technological, regulatory and economic infrastructuring of institutions, practices and people for the production of those resources and assets deemed appropriate for capitalization. Research services studied here mediate and act ‘across boundaries between different scales, levels, times and communities’ (Borup et al., 2006: 293; Dagiral and Peerbaye, 2016), that is, between patients, GP practices, IT suppliers and researchers, for new, contemporary and future, ‘secondary data uses’.
As we show below, GPs (or other healthcare professionals) and the data they co-produce from consultations and treatments are disentangled from the local socio-material networks and transformed, via EHRs, from objects of clinical labour to objects and means of scientific inquiry and economics (Denis and Goëta, 2017). It is through specific relational socio-material infrastructures and market arrangements that they are turned into valued tangible (e.g. aggregate databases, risk calculators) and intangible (e.g. epidemiological and computational expertise) knowledge assets for financialized EHR data-driven research and development (Birch et al., 2020; Dagiral and Peerbaye, 2016; Muniesa et al., 2017; Wienroth et al., 2019).
Research services navigate through various political, economic and epistemological promises and valuations of NHS patient data-driven research. As they pursue their organizational sustainability and/or profitability for the capitalization of NHS patient data, they are being hybridized (Kleinman and Vallas, 2001) through various creative configurations of norms and practices (Lilley and Papadopoulos, 2014). They ‘asymmetrically converge’ their market, academic and public healthcare expectations and logics in the process of transforming NHS patient data into ‘promissory assets’ for monetary circulations in knowledge economies (Cooper and Waldby, 2014; Kleinman and Vallas, 2001; Martin, 2015). As they practice different modi operandi to financialize their operations, which increasingly resemble those of other digital platforms (Birch et al., 2020; Geiger and Gross, 2019), different economic and non-economic outcomes are made possible and impossible (Dussauge et al., 2015; Helgesson and Lee, 2017).
Central to our study, therefore, is how, during the process of assetization, such research services embed specific value assumptions, structure practices and (re-)configure relations of co-production with GPs and patients. We also ask how they order and re-order data flows, legitimize data uses and validate data users (Dagiral and Peerbaye, 2016; Martin, 2015) for multiple purposes: creating their asset-based incomes via the licensing of curated datasets (Birch, 2019), supporting the further development of healthcare knowledge products and services (e.g. scientific publications and quality improvement reports) (Dagiral and Peerbaye, 2016), and pursuing their own research and other innovation projects (e.g. analytic scripts and phenotypes for GP practices and the research community). To explore and describe the ‘diverging registers of value’ (Dussauge et al., 2015) and the (in)commensurate valuations these research services are making, the asymmetries in knowledge practices (Tsoukas, 1997) and the configurations of sociotechnical networks involved in the assetization of NHS patient data, we introduce the situated concept of asymmetrical divergence.
We use the above concept to consider, first, asymmetries between the operational (including financial) and the scientific and public health logics and valuations that affect assetization (Dussauge et al., 2015; Kleinman and Vallas, 2001). Second, we use it to consider whether the co-producers and providers of this public data, i.e. patients and healthcare professionals (GPs), have enough information and power to participate in the shaping and direction of this assetization (Brown, 2003). Third, we consider whether data assetization can diverge because some of the processes involved can take place at different scales between these research services based on resources that are available to them, for example, funding, access to data, human and technical resources, as well as networks of data providers and customers. Following that, we reflect on the kind of new or existing asymmetries and inequalities of access to resources, information and knowledge research services may produce and reproduce as they assetize NHS patient data (Brown, 2003).
Expectations of networked materialities and alternative epistemologies for divergent assetizations
Researchers and some GPs shared expectations that collecting and analyzing data from healthcare at a large scale potentially creates huge opportunities for the improvement of human health. The data tracking of individual patient events (e.g. treatments), as they navigate the healthcare system, was portrayed as a new and unique approach to reconstruct medical histories and follow patient journeys ‘from cradle to the grave’. They asserted their determination to study ‘pretty much every disease that there is out there’ and make sense of the complexity of human health and illness based on the unprecedented availability of (decontextualised) information stored in EHRs (Tsoukas, 1997). By mobilizing economies of scale and converging diverse datasets, from the many isolated and small data repositories across (primary and secondary) care into fewer centralized ones, they anticipated the reduction of the time, cost and the uncertainties involved with research and the provision of healthcare. By narrowing temporal, spatial and epistemological asymmetries in the collection and analysis of data, the speed of knowledge production was expected to increase, as multiple reuses of these datasets from greater pools of participants and for a variety of purposes were being made possible:
[E]verything in a unified database, everything in one place, so it’s accessible, it is structured, is very convenient … because you don’t know what might be important later on down the line and to collect everything from time zero is a more efficient way of collecting data than collecting your data, then finding out that you need like seven new different measurements, and to go back and collect that would be very difficult, very costly so collecting everything with the hope that one day it might be useful to someone is a very bad idea (laughs) but it’s efficient. (Researcher 2)
While for more ‘traditional’ epidemiological studies of rare or long-term conditions and treatments, researchers have to collect data from hundreds or thousands of individual participants, with routinely collected EHR data the number of research participants can be millions. It is the size, breadth and representativeness of the populations covered that constituted, in the eyes of these stakeholders, these datasets as unique resources for conducting observational studies in a much faster, cheaper and more ‘pragmatic’ way than ever before (Harwich and Lasko-Skinner, 2018; Powell et al., 2017). NHS patient data was, thus, valuated and valorized by ascertaining the effectiveness of EHR data-driven research in the modelling of healthcare by other means: a very promising alternative approach to research and innovation.
The assertion that ‘size is an advantage’ that science and society cannot just ignore constituted a strong motivation for the EHR data-driven research communities. However, the significant issues of data quality and the susceptibility of these datasets to many biases were acknowledged by data researchers (we will return to these issues in the next section). The growing concerns around the epistemological misconception that bigger is necessarily better, and critiques of the reproducibility of EHR data-driven research were accepted (Lipworth et al., 2017). However, the risks and uncertainties that this asymmetrical quality of EHRs introduces in EHR data-driven research (e.g. research bias) were considered epistemological challenges (rather than actual barriers to EHR data use) that do not have to be addressed in full before pursuing the establishment of a research service for the assetization of NHS patient data.
There is no uniform approach to be followed. For instance, some research services have proceeded more cautiously in this area and have been trying to recruit GP practices with ‘good’ coding practices, in order to then carefully curate the extracted data. Others have worried less about data quality and more about finding a way to gather everything in one place first and deal with issues of data quality later. Researchers in the field overwhelmingly anticipated that such challenges would be overcome soon enough, as datafication of healthcare and coding standardization across the NHS continue to grow (Brown, 2003).
[W]e’ve got this wealth of data that we can get hold of; we don’t know quite how to use it. We must use it because there must be stuff in there that is going to be really valuable basically, and from [the IT supplier’s] side, they knew that and they want it to be used. (Researcher 9)
The above motivation that there must be a way forward with this data because now we can, which Fourcade and Healy (2016: 16) aptly assert is ‘the ceremonial aspect of the data imperative’, was based not only on arguments of size, availability and effectiveness but also on an anticipatory uncertainty, for example, around (already) problematic numbers participating in research. Welsh and Wynne (2013: 543) have argued that ‘scientific authorities … declare the public meaning of technoscientific innovations and controversies to be matters of risk or science’. Here, we noted declarations around data assetization, and by extension around the support of the work of such research services, as a matter of concern in overcoming risks within the life sciences, that is, low trial recruitment rates (Powell et al., 2017). The ‘reworking [of] epistemological asymmetries’ (Brown, 2003: 18) was, therefore, framed as a necessity both for society and the rest of the technoscientific communities ‘because ten years from now we won’t have any other cohorts remaining’ and the future of population studies will be compromised. Consequently, for data researchers the development of new methodologies for EHR data-driven research is materializing their expectations of ‘getting all that we can out of this data’ and continuing to advance biomedical knowledge and improve healthcare against an uncertain epistemological future.
At the same time, there are also other stakeholders that come with their own expectations for the assetization of this public asset. For example, there is the role of, and expectations by, state actors to drive and facilitate, via investment and regulatory frameworks, the assetization and capitalization of NHS patient data for the benefit of the national economy (Timmons and Vezyridis, 2017). As one data researcher noted:
Government Ministers have been quite open about it, wanting to make the UK and UK data sources world leaders in this kind of research, which usually means they want it to be a money spinner. (Researcher 5)
For suppliers of primary care computer systems, contributing to a research service can materialize unique, albeit diversified, expectations. While our participants talked about certain research services that just want to capitalize EHR data and ‘license just about everything it’s possible to license’, thus expanding their presence in the biomedical research data market, they also identified opportunities for the other stakeholders. For example, they discussed how the establishment of ‘knowledge transfer partnerships’ with academic institutions that can analyse these datasets has allowed them to advance their portfolio of services they provide to GP practices that purchase their system. GP practices have now been given a unique opportunity to contribute to epidemiological research and also to the development of new digital clinical decision support tools to improve healthcare. In exchange, they offer the research services new (and asymmetrical) advantages for the assetization of NHS patient data, when compared to smaller research services that rely on bespoke data extractions for research. Through their arrangements with contributing GP practices and other healthcare organizations, whole networks of local data providers and infrastructures for data extractions can now be maintained, making the continuous updating of research databases a relatively seamless and near-real time process. These databases can then be managed by either the IT supplier, the IT supplier in conjunction with the academic or governmental partner, or by the academic partner.
For instance, CPRD and THIN provide quality improvement reports to collaborating GP practices. CALIBER has developed a pool of more than 50 computable phenotypes for researchers to identify and analyse the EHR of patients with particular conditions. QResearch has developed a number of risk calculators (e.g. for fractures, cancer, stroke, diabetes) (Hippisley-Cox et al., 2017). These and other research assets can then be provided to the research and GP communities (and some of them to the public) either free of charge and/or as licensed products via private limited companies (e.g. ClinRisk, 2020). In the process of realizing the values of EHR data-driven research, research studies and publications that have used research services’ data are also getting assetized and valorized under specific targets of ‘knowledge transfer’, increasing their reputational and financial value in this technoscientific market (Barman, 2002; Vezyridis and Timmons, 2016). For instance, QResearch has supported more than 165 peer-reviewed scientific publications as of March 2019, THIN more than 600 and CPRD more than 2200, with targets of numbers of supported studies set annually (see MHRA, 2016).
Thus, there is not a single problematization but multiple problematizations (Callon, 1984) for the assetization of NHS patient data and the divergent assets that can be made possible to fulfil expectations and valuations of this asset. On the other hand, expectations, no matter how effective they might first appear in mobilizing stakeholders and capital (e.g. human, technical, financial), do not necessarily secure stable socio-material entanglements across science and/or the economy. While they become indispensable parts of data valuations and assetizations within particular communities, the gap between hype and reality has to be maintained by such research services in order to keep capitalizations going (Brown, 2003).
Socio-material frictions and organizational uncertainties
In this section, we look into how research services and researchers attempt to internalize and render manageable the various uncertainties and asymmetries involved in the production, assetization, and use of patient data (Dagiral and Peerbaye, 2016; Denis and Goëta, 2017; Powell et al., 2017; Star, 1985). We explore how processes of assetization reveal and mitigate epistemological, economic, social and material ‘frictions’ (Edwards, 2010) in data production and analysis. They also shape how medical phenomena are expected to be coded so as to increase the quality and value of this asset for research (Petersen et al., 2019).
During our interviews, we were repeatedly exposed to data researchers’ frustration with and, in some cases, limited understanding of, the way GPs code medical phenomena in EHRs. They often expressed their surprise, for example, at the fact that the codes GPs use to record even the same disease could vary substantially from one GP practice to another. They emphasized the impact such inadequate coding practices have on the overall quality and integrity of the datasets, as well as the time and effort needed to mitigate discrepancies in preparation for analysis. As one researcher highlighted: [A] woman with prostate cancer, okay, fine, I have to remove this observation, it’s kind of most obvious, but … when you’re thinking whether this person has osteoporosis or not, whether they have prescriptions, maybe these prescriptions are not right, maybe they’re for cancer, not for osteoporosis …. When it’s definitely bad data, you remove the observation. For example, if it’s 50 people altogether when you use a sample of 50,000, that’s okay, so it doesn’t affect the conclusion. (Researcher 11)
Researchers and GPs attributed the problem with data inconsistency and redundancy mainly to the fact that the data is not collected primarily for research purposes. As they explained, data is collected to do the GP’s clinical and ‘business’ work (Petersen et al., 2019). Coding is a practice highly contingent on the resources and effort (e.g. staff, time, information technology and skills) that each GP practice allocates for this purpose (Verheij et al., 2018). This is why research services often aim at recruiting mostly ‘big’ GP practices with ‘good’ management that can contribute quality patient data. GP coding practices are also sensitive to the particularities and (unintentional) influence of the particular GP computer system in use (Verheij et al., 2018). One experienced researcher described this: I’m sure you’ve seen those patterns of prescribing trends and how they differ nationally when actually it’s just picked up that different clinical systems order their drugs differently, so obviously clinicians pick the drugs at the top of the screen and it’s just picked up a difference in the clinical systems, rather than GPs’ underlying behaviour, which was the same. (Researcher 10)
Moreover, GP computer systems allow the recording of structured (‘coded’) and unstructured (‘free text’) patient health and administrative data to diverse levels of granularity but without the use of a widely adopted health coding scheme (e.g. ICD-10). Between regions within the UK, different classification systems or versions of the same system (e.g. Read codes, versions 2 and 3) have been adopted by IT suppliers and, thus, GPs. To overcome this geographical clustering of systems and datasets (Kontopantelis et al., 2018), ‘simply a side effect of where IT suppliers have been able to sell their software’, a research service has to go to great lengths to map these coding schemes used by healthcare providers across the country into more ‘formalized and universal structures’ (Denis and Goëta, 2017).
Thus, extracted datasets are not just clinical artefacts ready for research after technical translations of variables. And the availability of more EHR data does not necessarily make the NHS more ‘transparent’ to outsiders for research purposes. Our interviews underline the situatedness of coding practices and how entangled they are with other diverse local sociotechnical practices, normativities and valuations beyond the clinical aspects of healthcare provision (Denis and Goëta, 2017). To address and mitigate the uncertainty introduced by such ‘knowledge asymmetries’ (Tsoukas, 1997) between the observer (i.e. data researcher) and the observed (i.e. healthcare professional), research services also work on the introduction and mapping of more advanced classification systems (e.g. SNOMED CT). In the future, greater standardization of coding of even more data is expected to allow the assetization of more symmetrical and, therefore, more researchable data (Dagiral and Peerbaye, 2016).
Until then, data researchers continue to be uncertain, and often not transparent in their scientific publications, as to whether any patterns in disease prevalence and trends they observe in the data should be attributed to the actual disease, demographic shifts, the patient management software, changes of clinical coding guidelines, preferences for information disclosure and coding during patient consultations, or the particular pay-for-performance scheme, such as the Quality and Outcomes Framework (QOF), used to incentivize specific coding practices (Verheij et al., 2018). QOF has, in fact, been instrumental in improving data coding and collection across GP practices contracted by the NHS. Introduced in 2004, this point-based reward and incentive national programme for GPs, which currently constitutes up to 10% of their income, aims at standardizing and improving the care they provide by having them meet specific quality targets in, for example, coronary heart disease, heart failure, stroke, diabetes mellitus, chronic obstructive pulmonary disease, cancer, dementia, depression and obesity. However, QOF has not actually achieved its primary purpose (Forbes et al., 2016), as it conflated high performance with high quality of the provided care: A few years ago they brought in QOF around depression, so that if you coded somebody as having depression, you then had to set on a pathway and had to do certain things by certain times, and if you didn’t, you lost money. So there was a huge drop in anybody being diagnosed, coded as depression, even though that’s what they may have been treated for. So you can have knock-on effects. People also might ask not to be coded, because of the stigma, you don’t want something in your medical record that says something that you think might have an adverse effect on, I don’t know, insurance. So I think things aren’t ever black and white, are they? (Researcher 9)
It is because of such intricacies of the entangled socio-material and economic practices of data coding for divergent purposes and often conflicting local assetizations and capitalizations (e.g. QOF remuneration) that researchers and GPs emphasized the considerable scientific and technical expertise required to use NHS patient data for research. Consequently, this data assetization is not only about investments and maintenance of technologies of data storage, curatorial practices and analytical methods. It is also about investments in interdisciplinary teams of ‘knowledge workers’ (Kleinman and Vallas, 2001) who are not only experts in computer and data science but also in healthcare in order to make sense of the knowledge and practice that shape the coding of medical phenomena and, thus, the local production and use of datasets.
Among data researchers there was a wide acknowledgment of ‘the huge shortage of skills across the board’ to fully realize the values of these datasets. While university training was thought to be improving, issues around its quality and cost as well as ‘letting the younger generations know that [health data science] is a career path that’s available’ remain a challenge. They were worried that pitfalls may occur in the future if training and recruitment are not addressed, particularly as more data providers are joining these (or developing their own) research services and more researchers ‘with less and less skill’ from other data-driven (e.g. financial) industries are getting access to these datasets. Other experienced researchers underlined the asymmetries of capability some (smaller) research services have been facing as they compete for expertise and resources within this field. As EHR data-driven research is expanding and datasets collected ‘grow dramatically’ in size and variety (e.g. genomics), the acquisition and maintenance, for example, of ‘high performance computer systems’ to analyze these datasets becomes a significant challenge.
The skills required to design and model the systems are the same skills that are required to model systems in finance. And the salaries of science, they’re much lower, so it’s really difficult to find the right people to do the job. Plus the resources, a lot of the software, hardware … they’re under license, for instance … and when you want to use some good software packages, in order to be able to do your work, in the academic and public sector, they’re not readily available, and that puts a barrier up of the things that you can do. (Researcher 2)
The process of assetizing NHS patient data, therefore, reveals and involves several noteworthy asymmetries of information and practice. There are asymmetries regarding the maintenance of local socio-material infrastructures and processes, including GP practices, NHS patients and IT suppliers, capable enough to facilitate the production of datasets of acceptable levels of quality for research. The mitigation of such asymmetries and uncertainties often involves finding ways to overcome other processes of assetization and capitalization of data and coding (e.g. QOF, computer systems and their market distribution, consultation and coding preferences). Lastly, it requires the maintenance of appropriate and capable research infrastructure, such as resources and skills, to clean, prepare and guarantee quality datasets for use.
Politics, economics, and societal valuations for asymmetrical asset flows
As we started identifying research services’ own normative understandings of how and by whom these datasets should be capitalized, we became interested in exploring the way they are gradually maintaining an essential, mediating role (Callon, 1984) in the assetization and circulation of data. Here, our focus is on the particular political, economic, social and professional environment within which they operate but also enact to normatively increase the responsibility of other stakeholders (e.g. GPs, patients, regulators) for the disentanglement of NHS patient data from its local sites of production. We also explore some of the scientific, organizational, societal and ethical asymmetries of this assetization, produced as research services are re-entangled with data producers and other infrastructures required for the continuous extraction of data and the management of expectations and values (Birch et al., 2020; Brown, 2003).
As we showed previously, there has been considerable interest from all stakeholders to share and analyse NHS patient data for diverse purposes. However, notions of the perceived usefulness of EHR data-driven research were not shared uniformly across stakeholders, particularly when compared with more established types of scientific inquiry. For instance, data researchers were frustrated by the fact that few people outside their academic communities fully ‘understood’, and prioritized in their work, the production and sharing of good quality data. This was often due to the fact that, as some GPs explained, the data GP practices provide to research services was not expected to come back to them in a form ‘useful’ to their practice, for example, clinical guidelines for treating patients. While GPs generally acknowledged some of the capabilities of EHR data-driven research for improved healthcare, such as for healthcare planning and identification of public health trends, some of them were also more cautious about placing this type of research among the so-called ‘cutting edge medicine’: Certainly the capabilities of [EHR data-driven research] and the scope to look at information is great. I mean I think if you’re looking at genuine treatments for diseases, the bottom line comes down to the hard graft with looking at molecules and cancer and such. It comes down to laboratory technicians testing on cells and in petri dishes and microscopes … to try and work out and find drugs that are effective. (GP 3)
Moreover, the competitive environment of expectations and anticipations in which the capitalization of NHS patient data takes place does not appear to be maintained only between the various groups of stakeholders that may benefit. The process of assetizing intangible resources, within an increasingly marketized environment, reveals specific kinds of competition within particular groups of stakeholders. For instance, even within the data research communities there can be a combination of factors that makes the sharing of this data a difficult task across their networks. Issues such as personal agendas and career aspirations, fears of litigation, intellectual property rights and competition for research funds (often attributed to the marketization of higher education and public healthcare) were thought to impede the establishment of a culture of openness in data sharing for the wider benefit of science and society (Longo and Drazen, 2016).
One particularly problematic situation that stood out during interviews was around the extent to which a research service could support an increasing number of applications for data access. The capabilities of research services to support other data researchers, especially non-affiliated ones, with data and expertise are not identical. Whether it is fees or research outputs, some form of mutually beneficial collaboration and/or exchange has to take place to compensate for the work involved in preparing datasets for analyses.
It’s true that people that have access to the databases do tend to set themselves up as gatekeepers and be a bit choosy about who they work with. Sometimes that’s in furtherance of their own careers, other times it’s because the data are complicated and they can’t support every person to do a project, so inevitably you support the ones that you’re interested in. (Researcher 5)
Contrary to some common expectations that any research team with a project approved by the research services’ own Independent Scientific Advisory Committee for its scientific benefits and information governance compliance could access data, some of these research services have their own additional restrictions on eligibility. Various asymmetries of collaboration between research services and the wider academic and biomedical research communities shape divergent forms of NHS patient data assetization. For example, CPRD can provide data to any research team from across the world as long as some of its members have experience working in UK primary care. QResearch, on the other hand, provides access to data mainly to teams from UK universities, with at least one member registered with the General Medical Council, while pharmaceutical companies are excluded unless the research project is about drug safety.
At the same time, the capitalization of NHS patient data under an industrialized environment means that data researchers and research services are expected to internalize and enact the financial logics of EHR data-driven research in the UK. Research services operate usually, but not exclusively, on a not-for-profit basis. While they do not usually charge directly for the datasets, they do charge for the support they provide to researchers (e.g. infrastructure, application and project set-up, training, data preparation and release). They may charge fixed or variable fees for individual projects or for their annual commercial and non-commercial licenses of data access and support. For instance, on its new pricing model CPRD charges £75,000 for its non-commercial (i.e. academic, government, charity) multi-study annual license and more than four times that for the commercial one. CALIBER charges a fixed fee of £25,000 per approved project. In cases where researchers want to link GP records with hospital records and mortality data, these research services have to go through the statutory trusted third party responsible for the handling and linking of NHS patient records, for an additional fee (NHS Digital, 2018).
Public information on how these cost-recovery charges are calculated is limited. This flexibly interpretive valorization of competitive access to quality data, however, allows a research service to generate income towards its organizational financial sustainability. It mitigates any asymmetries in resources and capital available to a research service compared with other research services, such as direct financial support from research charities and commercial or governmental partners, research grants, personnel and infrastructure hosting (e.g. by a university), rents for corporate, governmental or academic office spaces, and any other special financial arrangements with GP practices for the number of active patients they maintain in their database.
As demand for up-to-date datasets is constant, data assetization requires not only new circulations of data to more users but also continuous data production, facilitated by an appropriate regulatory framework for the lawful handling of sensitive NHS patient data. In England, this framework revolves around a rather outdated legislative piece of executive power that allows the Secretary of State for Health to override the common law duty of confidentiality, and the need for patient consent, for specified medical purposes (e.g. medical research) (Section 251 of the NHS Act 2006 and Health Service Regulations 2002). While it has facilitated an unprecedented expansion of the so-called ‘secondary use’ of NHS patient data, interview subjects still problematized it as an unavoidable but also necessary burden within organizational and scientific routines, a bureaucratic activity separate from the wider societal and historical context of unconsented medical research and part of the costs involved with the assetization of these datasets.
Some researchers understood information governance in this area to be a ‘hygienic’ practice in ethics (Vezyridis and Timmons, 2019) of critical importance for minimizing moral frictions and, thus, maintaining ‘social sustainability’ (Brown and Michael, 2003; Tupasela, 2017) with the public and the healthcare professionals that co-produce the data. Others, acknowledging their limited understanding of the complicated legal framework surrounding the use of these datasets, suggested the introduction of new disciplines within research teams (e.g. information governance experts) to advise researchers on the legal and ethical legitimacy of research designs. Others directed their frustration towards regulators who could ‘tighten up on things’ for faster and cheaper access to curated datasets and, therefore, to knowledge production, especially in an industry still operating under a fragmented framework of multiple public and private providers acting as ‘owners’ of these datasets: [W]henever we go to for a question to one of these [data providers], they say that they have to do the linkage for us … so if you go to NHS they say they have to do the linkage, if you go to the GP data, they say they have to do it, and other sources like clinical registries for cancer they are not easy to access and even if you [have] a very good high quality protocol then the problem is how should we link the data, because they will not hand it to you. (Researcher 4)
Intertwined financial and regulatory asymmetries of access become more evident as data researchers struggle to conduct their work within a competitive technoscientific environment that requires them to be at the forefront of scientific knowledge production (Brown and Michael, 2003). While they ‘understand’ that research services ‘sell their product’, the ‘whacking costs’ for buying bespoke extractions and linked datasets, particularly by those not affiliated with these research services, have created considerable and asymmetrical financial barriers to research (Gilbert et al., 2015). In the dynamic field of healthcare, where practices and objects of study are constantly in flux (e.g. prescriptions), data is not only ‘expensive to work with because of all the processes that are involved to generate it, keep it clean and keep it secure’ but it becomes ‘old’ quickly. Speed of knowledge production becomes of paramount importance for EHR data-driven research: It all costs money and if you have money, then it’s easier. … I’ve been doing lots of studies without funding, because it’s all for hot topics, and then by the time you get funding, it’s gone, the public is not interested. But then if you have funding you can buy linkages to other sources, like a cancer registry or other registers. (Researcher 11)
Lastly, and notwithstanding some criticism on issues of medical confidentiality (Brown et al., 2010), these research services have not attracted the public outcry provoked by other exploitations of these public datasets, namely NHS England’s defunct care.data (see Freeman, 2016; Vezyridis and Timmons, 2017), and Alphabet’s DeepMind unlawful contractual data arrangements with certain London NHS Trusts (see Powles and Hodson, 2017). Citizens interviewed were generally supportive of research that benefits the common good but were increasingly skeptical of NHS patient data exploitations that often take place away from the public eye, especially after the aforementioned debacles (Skovgaard et al., 2019; Vezyridis and Timmons, 2019). However, they emphasized the great challenge they face, as NHS patients, to get access to their EHR in order to understand what is in there that such research services continuously assetize. They were frustrated that information about such research services is ‘basically non-existent’, highlighting the NHS’s characteristic information ‘paternalism’ around NHS patient data capitalizations. This is not to say that such research services do not advocate openness and transparency about their operations. However, there is limited information to be found about which GP practices contribute to them with NHS patient data and/or which research projects they have approved or rejected for data access. As such, it is only through scientific publications that an interested citizen could conclude how NHS patient data is used by these research services, while having to rely solely on their GP practice to let them ‘know whether [they are] on these databases’ and how they could opt out of a research service.
Thus, research services’ simultaneous assetization of highly complex data and management of data asset flows becomes a competitive and contradictory exercise in organizational stability (Denis and Goëta, 2017; Geiger and Gross, 2019; Tupasela, 2017). In the process, they incorporate new and existing asymmetries in the capitalization of NHS patient data and healthcare research: prioritization of EHR data-driven research over other disciplines, competition for funding and scientific career development, cumbersome information governance frameworks, gated access to data and linkages, competition for human and technological capital within and beyond the sector, variable distribution of benefits, limited public awareness and engagement.
Conclusion
In this article, we examined some of the expectations, frictions and uncertainties involved in the assetization and capitalization of NHS patient data by UK research services. Drawing on the sociology of expectations and economic STS literature, we brought into focus the way these heterogeneous assemblages (Callon, 1984) reconfigure practices, responsibilities and accountabilities for materializing the specific promises of innovative data-driven healthcare research (Brown, 2003; Petty and Heimer, 2011). We showed that the assetization of NHS patient data is both the outcome and the driver of various competing epistemic and economic expectations and valuations (Birch, 2019; Brown, 2003). Divergent assets for healthcare and biomedical research are then produced within a specific competitive and regulated environment increasingly entangled into financial industry expectations, logics and practices (Martin, 2015; Robinson, 2018). In this way, research services maintain a balance between establishing and expanding the health data science disciplinary field, securing their organizational and financial sustainability, supporting biomedical and healthcare innovation, while also maintaining their public acceptability.
NHS patient data is configured and used both as a continuously generated resource ready for extraction as well as an asset for circulation in an asset-based biomedical knowledge economy and the enactment of a range of other ontologies (Dagiral and Peerbaye, 2016; Denis and Goëta, 2017). These include observational studies, patient recruitment for research, pragmatic trials, clinical risk predictions, income and so on. As producers, users and brokers of such assets, at the intersection of public healthcare, academia and the biomedical industry (Timmons and Vezyridis, 2017), these organizational entities are gradually becoming ‘obligatory passage points’ (Callon, 1984) for how these public healthcare datasets are (socially, politically, and economically) valuated and EHR data-driven research and development is conducted (Birch et al., 2020).
These research services are constantly attempting to realize and capitalize on their multiple epistemic, reputational, platform, ethical and financial values out of asymmetrical and divergent data assetizations (Barrett et al., 2016). They converge and translate expectations and promises of innovative EHR data-driven research as a pragmatic and more effective alternative to the study of population health for healthcare improvement. At the same time, they actively participate in the reconfiguration of healthcare as a datafied technoscientific and ‘transparent’ practice (Tsoukas, 1997), the NHS as both the means and the subject of data labour, and patient-citizens as a scientific, corporate and moral resource for continuous data co-production (with the NHS) (Brown, 2003). The numbers of contributing GP practices, sizes of populations covered, data fields, time periods included and speeds of database updates with new NHS patient data are valuated and valorized. Information governance, including de-identification of EHRs, consent and public engagement are mobilized as an epistemic and political economic apparatus (Birch, 2017b) of institutionalized altruism and volunteerism towards uncontested data extractions and exploitations.
In effect, research services have assumed the overall role of ‘de-risking’ the costly assetization of these datasets for actors in the academic and biomedical industries (Brown, 2003; Cooper and Waldby, 2014; Robinson, 2018). They internalize the uncertainties and risks involved, which are externalized by the NHS (Robinson, 2018), in valuating, creating, qualifying and mobilizing the data assets, as well as the capital necessary for shaping renewed promises of health and wealth (Tarkkala et al., 2018). They are pragmatic in the frictions and challenges they face around datasets’ completeness and integrity, access delays and costs (Powell et al., 2017), clustered geographical coverage of populations, varied cultures of collaboration and data sharing or inadequate public engagement. At the same time, however, they participate in the externalization of other risks and uncertainties: costly datasets and infrastructural incapacity, inadequate standardization of coding practices and extractions, burdensome information governance for data extraction and linkage, limited public or professional acceptability, restricted funding and investments, and inadequate supplies of data scientists (Denis and Goëta, 2017; Kleinman and Vallas, 2001). In this way, they attempt to commit stakeholders to ‘extensified’ data valuations and assetizations across society (Borup et al., 2006; Lilley and Papadopoulos, 2014; Robinson, 2018).
Lastly, we found that this assetization of NHS patient data is asymmetrical in several noteworthy ways. First, although research services attempt to maintain their organizational and financial sustainability, the rents for these assets are seen as prohibitive for researchers conducting observational studies (Gilbert et al., 2015). They now have to compete for funding in order to acquire increasingly diverse and complex datasets, produced out of various asymmetries of knowledge and coding practices (Barman, 2002; Tsoukas, 1997). There is also limited information and opportunity for GP practices and the public to participate in the normative shaping of NHS patient data capitalization. Finally, as these research services embark on their own scientific endeavors and compete for data and customers, they asymmetrically diverge in their assetization of NHS patient data. They operate under different agreements with IT or data suppliers and experience challenges of access to resources, including human capital. In an attempt to differentiate their service for intertwined epistemic and market sustainability and dominance (Barman, 2002), they not only provide access to carefully curated datasets but also to other diverse digital assets of objectified knowledge about healthcare (Birch, 2019; Tsoukas, 1997) (e.g. disease-specific code lists, phenotypes, biomarkers, quality improvement reports, recruitment pools of research participants, clinical risk prediction algorithms, analytic scripts, data dictionaries, scientific publications).
Data assetization continues to grow bigger and more diverse in size, complexity and scope (e.g. genomics, wearables and medical imaging) under a political-economic environment of industrialized scientific competition and production of expectations, assets and capital (Brown, 2003). The capitalization of NHS datasets by such hybridized e-infrastructures (Kleinman and Vallas, 2001) is replacing restrictions of access to dispersed data from the many individual organizational silos (e.g. GP practices) to fewer and bigger silos where data is now aggregated out of various closed-source, proprietary clinical systems and converged into licensed databases. Such restrictions around NHS patient data-driven research may, in the end, stabilize an oligopolistic ‘data access market’, attractive only to those actors that have the capital and the socio-material infrastructure to undertake this kind of research at the required scale and depth (Powles and Hodson, 2017).
We suggest checks and balances regarding research services’ ontological, epistemological and ethical roles as biomedical knowledge makers and brokers should be considered. Transparent, accountable, inclusive and equitable research agendas and knowledge-making practices should be elaborated for the benefit of society at large. This is especially urgent at a time when the role of EHR data-driven research is gradually moving beyond the concept of ‘secondary use’ and onto the longitudinal surveillance and management of population health and the contemporary planning of health systems.
Acknowledgements
We are grateful to the study participants for their time and patience. We would also like to thank the SSS reviewers and the editors Adam Hedgecoe and Sergio Sismondo.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This study was funded by the European Commission (H2020-MSCA-IF-EF-2014-659478).
