Introduction
Two very high-profile studies, reported in two of the world’s most prestigious medical journals, were recently retracted. Both articles related to coronavirus disease (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), which is driving an unprecedented urgency in the dissemination of research findings to inform medical doctors, medical scientists and others across the globe.
The articles in context
One article, retracted by the Editors of the New England Journal of Medicine at the authors’ request, described a study of the ‘potential harmful effect of angiotensin-converting-enzyme (ACE) inhibitors and angiotensin-receptor blockers (ARBs)’ in hospitalised COVID-19 patients (Mehra et al., 2020c: 1). The authors’ rationale for the retraction lay in their inability to validate the primary data sources underpinning the study because ‘not all the authors could gain access to the raw data and the raw data could not be made available to a third-party auditor’ (Mehra et al., 2020b: 1).
The other article, in The Lancet, reported on a study involving 96,032 subjects from a registry-database comprising data from 671 hospitals across six continents, for the period 20 December 2019 to 14 April 2020. The authors indicated that their study provided ‘the most robust real-world evidence to date on the usefulness’ of ‘hydroxychloroquine or chloroquine with or without a macrolide for treatment of COVID-19’ (Mehra et al., 2020e: 2).
Three authors were common to both studies and one of these (Desai) is reported to be founder and chief executive of Surgisphere Corporation (Surgical Outcomes Collaborative), an American healthcare data analytics company that provided the data for both studies (Davey, 2020a; Davey et al., 2020; Fouquet and Langreth, 2020; Mehra et al., 2020e: 2).
On 28 May 2020, an international group of 183 eminent clinicians, medical scientists and ethicists wrote an open letter to the authors and the Editor of The Lancet outlining their concerns about methodological and data integrity aspects of the hydroxychloroquine study, including apparent lack of ethics review (Watson et al., 2020). They pointed out, for example, that the Australian data on in-hospital COVID-19 deaths during the period under study were inconsistent with official government statistics; Mehra et al. (2020d) submitted a correction to the Australian mortality statistic on 29 May. The international experts called for transparency of the aggregated patient data and their sources, independent validation of the analyses and confirmation that any mined data had been ethically and legally obtained and patient privacy maintained.
On 2 June, the Lancet Editors published an ‘Expression of Concern’ alerting readers to the surfacing of ‘important scientific questions…about the data reported in the paper’ (2020: 1). On 4 June, the authors (excepting Desai) reported having commissioned an independent, private review to evaluate the origins of the Surgisphere database elements, confirm the completeness of the database and replicate the analyses. These reviewers withdrew after Surgisphere advised that it could not transfer the relevant data owing to potential violation of ‘client agreements and confidentiality requirements’, whereupon Mehra et al. (2020a: 1) promptly retracted the article.
The issues
This discussion focuses, via the lens of professional health information management in Australia, on hospital practices and some aspects of the health administrative data categories reported in the (hydroxychloroquine and chloroquine) study. It is agnostic regarding the merits of the research, whose clinical, methodological and analytical aspects have been comprehensively addressed by medical, statistical and scientific experts (Watson et al., 2020). General aspects have been reported elsewhere (Davey, 2020a, 2020b; Davey et al., 2020; Ledford and Van Noorden, 2020; Wise, 2020).
A conundrum of data provenance
When describing the Surgisphere database, Mehra et al. explained that ‘Real-world data are collected through automated data transfers that capture 100% of the data from each healthcare entity at regular, predetermined intervals…’ (2020e: 2). Reporting for Bloomberg, Fouquet and Langreth (2020) stated that Surgisphere advised its data came from: “a registry, with data obtained from electronic health records” of a “very specific group of hospitalized patients with COVID-19.” The company “directly integrates with the EHRs of our hospital customers,” and “has permission to include these hospitals’ [electronic health record] EHR data in its query-able registry/database of real-world, real-time patient encounters.”
It is useful to note that Australian hospital medical records and the data therein and derived therefrom, whether in electronic, digitised (scanned) or paper format, are owned by the hospital. They are not the property of clinicians, HIMs, administrators, IT workers, patients or anybody else. They cannot be mined or ‘exported’ – in part or in full, for commercial gain or gratis – to any external agency or person except for legitimate clinical treatment and follow-up, legal mandate and otherwise per legislation underpinning statutory reporting. Medical record and other derived data may be extracted for clinical teaching and for internal and external research purposes, including for approved clinical registries and trials; strict requirements apply, even to the level of data elements provided and their intended use. Importantly, their use externally is subject initially to explicit approval by the institutional Human Research Ethics Committee (HREC).
Ethics approval
The authors stated that ‘The data collection and analyses are deemed exempt from ethics review’ (Mehra et al., 2020e: 3).
Requests for data in Australian hospitals are reviewed by the HREC, which would not allow ongoing, open-ended access to the EHR by researchers. Hospitals and their HIMs are obliged to ensure legislative compliance concerning privacy and access to records and data. Analytics-type access by internally accredited doctors to the ‘back end’ of the EHR is similarly regulated by institutional health information governance policy and ethics requirements. External requests are channelled via gatekeepers, specifically the HIMs and, depending on the hospital, Business Intelligence and/or Research-Ethics units, who act collaboratively to ensure requests are directed to the HREC for consideration.
The authors reported that ‘Collection of a 100% sample from each healthcare entity is validated against financial records and external databases to minimise selection bias’ (Mehra et al., 2020e: 3). This level of validation via financial records would be unfeasible in the Australian context as hospital medical records do not hold financial-billing and clinical costing data.
Privacy and de-identification
According to Bloomberg, Dr Desai stated: “We are not responsible for the source data, thus the labor intensive task required for exporting the data from an EHR (electronic health record), converting it into the format required by our data dictionary, and fully de-identifying the data is done by the healthcare partner. Surgisphere does not reconcile languages or coding systems” (Fouquet and Langreth, 2020).
The hospital sources
The Guardian’s journalists identified and contacted five Melbourne hospitals and two Sydney hospitals whose cooperation, they estimated, ‘would have been essential for the Australian patient numbers in the database to be reached’. They reported that none of these hospitals had heard of Surgisphere or been involved with any such database (Davey et al., 2020).
Administrative data items
Patients who are subjects in HREC-approved trials and other studies, whether conducted externally or internally, are asked individually for non-standard data items. This occurs through the assessment and documentation undertaken by clinician-researchers when a patient is enrolled in an approved registry, trial or other study, or during the study.
On the basis of the now retracted paper, it seems that some data items reported for subjects from Australia could not have been obtained from hospital standard data or extracted from hospital medical records. Items such as ‘race or ethnicity’ are not collected at admission, with the exception of a routine question to all patients by Admissions Department clerks on whether they identify as Aboriginal or Torres Strait Islander (part of health authorities’ ongoing efforts to enhance data quality to inform monitoring and improvement of the health status of Indigenous Australians). Similarly, ‘continent of origin’ and ‘age’ are not collected routinely. The data items collected and reported internally and to government Departments of Health are ‘country of birth’ and ‘date of birth’. ‘[C]ontinent’ could be derived for this study from the source of each data tranche, and ‘age’ (actual, or for a standardised birthday) could be calculated from date of birth and the dates of admission and discharge or death, which appear to have been collected.
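The derivations described above can be sketched in a few lines. This is a minimal illustration only, assuming that a date of birth and an admission date are available for each record; the function names, the lookup table and its continent labels are hypothetical and are not taken from the retracted paper or from any hospital system.

```python
from datetime import date


def age_at(date_of_birth: date, reference: date) -> int:
    """Return completed years of age on the reference date (e.g. admission)."""
    if reference < date_of_birth:
        raise ValueError("reference date precedes date of birth")
    years = reference.year - date_of_birth.year
    # Subtract a year if the birthday has not yet occurred in the reference year.
    if (reference.month, reference.day) < (date_of_birth.month, date_of_birth.day):
        years -= 1
    return years


# Hypothetical lookup: the country of the contributing site (the 'data tranche')
# mapped to a continent label. Labels are illustrative assumptions only.
SITE_COUNTRY_TO_CONTINENT = {
    "Australia": "Australia",
    "United States of America": "North America",
}


def continent_for_site(site_country: str) -> str:
    """Derive a continent label from the site country; never guess."""
    return SITE_COUNTRY_TO_CONTINENT.get(site_country, "Unknown")
```

Note that a continent derived this way describes where the hospital is located, not the patient’s ‘continent of origin’, which is a different data item.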
Disease classification
The article appears to imply that codes were extracted direct from EHRs as the authors report having collected ‘[u]nderlying comorbidities based on International Classification of Diseases, [T]enth [R]evision, [C]linical [M]odification codes present in either the inpatient or outpatient electronic health record’ (Mehra et al., 2020e: 3) to obtain a range of specified diagnoses, clinical history factors, therapeutic interventions and health-related behaviours. Australian hospitals do not code Outpatient Department diagnoses and, depending on the state, Emergency Departments use a cut-down version of the ICD-10-AM code-set or SNOMED – neither of which is ICD-10-CM.
It is not clear from the article how the code translations were effected. If, as implied, raw data were extracted from the ‘back end’ of the EHR system, presumably in the form of EHR-embedded SNOMED or other clinical terminology codes, these would then have needed to be translated into ICD-10-CM. Raw data extraction also presents a risk that the authors could have used clinicians’ documentation that may have been speculative rather than representing evidence-based diagnoses. If, contrary to the procedure as reported by the authors, the coded data were not ‘extracted’ direct from the EHR then it might be inferred that, in the Australian data, the original ICD-10-AM codes were obtained and mapped to ICD-10-CM. If this were the case, it is worth noting that the use of ICD-10-CM in a multi-country study is potentially problematic: this version of the ICD is used in the United States of America and some other countries. In contrast, Canada has its own modification (ICD-10-CA), France has CIM-10 FR and many countries use the World Health Organization’s base, skeletal version of the ICD, which is far more limited than the modifications. Australia has used ICD-10-AM, its own relatively complex modification of the ICD, since 1998; various editions of ICD-10-AM are also used in other countries, for example, Ireland, Singapore and Saudi Arabia (IHPA, 2019).
There are substantial differences between ICD-10-AM and ICD-10-CM, and cross-mapping requires the classificatory skills and experience of specialist HIMs familiar with both code-sets and associated standards. In the absence of current, expertly designed auto-mapping tables that account for within-country, idiosyncratic coding rules, standards and different editions, coded data would lack integrity; essentially, they could be ‘garbage’ data. The article does not mention translation of the coded data to ICD-10-CM from a range of different versions of the ICD. Regardless of the classificatory status (SNOMED or ICD-based) of the coded data used for the study, it is usual research reporting protocol to explain the classification (coding and recoding) procedures, systems and versions as these may affect data quality and, potentially, results.
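To illustrate why edition-aware mapping tables matter, the following is a minimal sketch of a cross-mapping step. It assumes a mapping table keyed on source classification, edition and code. All names are hypothetical, and the codes shown are placeholders rather than real ICD codes. It does not represent the procedure used in the retracted study, which the article did not describe.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SourceCode:
    system: str   # source classification, e.g. "ICD-10-AM"
    edition: str  # edition of the source classification
    code: str     # code as assigned at the hospital


# Hypothetical mapping table. Placeholder codes only, not real ICD codes.
MAPPING_TABLE = {
    SourceCode("ICD-10-AM", "ed-X", "SRC-001"): "TGT-001",
    SourceCode("ICD-10-AM", "ed-X", "SRC-002"): "TGT-002",
}


@dataclass
class MappingResult:
    source: SourceCode
    target: Optional[str]  # None when no mapping exists
    needs_review: bool     # True when a specialist coder must decide


def map_codes(records):
    """Map each source code; flag unmapped codes instead of guessing."""
    results = []
    for src in records:
        target = MAPPING_TABLE.get(src)
        results.append(MappingResult(src, target, needs_review=target is None))
    return results


def mapping_report(results):
    """Summarise how many codes were mapped and how many need manual review."""
    total = len(results)
    unmapped = sum(r.needs_review for r in results)
    return {"total": total, "mapped": total - unmapped, "unmapped": unmapped}
```

Because the lookup key includes the edition, a code recorded under a different edition is flagged for expert review rather than silently mapped. Reporting the unmapped count is one way a study could document how its classification procedures affected data quality.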
Implications
Improbable claims about direct, automated data transfers from confidential hospital EHRs to supply ‘big’ proprietary health databases are increasingly appearing in the grey literature. These need to be tempered with the realities of confidential patient information and its management, and the actual data items held. The fact remains that in Australia, medical research, including COVID-19 research, is highly prioritised and robustly supported by hospitals. Notwithstanding this, hospital patient records and derivative data are subject to protective privacy laws and institutional ethical and health information management policies.
Conclusions
Two lessons that might be learned from the example of the retracted study data reported by Mehra et al. (2020e) are that hospital HRECs should routinely include the senior, expert health data custodian (Director of Health Information Services/Chief Health Information Manager) in their memberships to provide informed advice; and medical researchers should consider including a qualified Health Information Manager as a data-expert member of the research team.
Footnotes
Declaration of conflicting interests
The author declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author received no financial support for the research, authorship, and/or publication of this article.
