Evidence of Gender Differences in the Diagnosis and Management of Coronavirus Disease 2019 Patients: An Analysis of Electronic Health Records Using Natural Language Processing and Machine Learning

Abstract

Background:

The impact of sex and gender on the incidence and severity of coronavirus disease 2019 (COVID-19) remains controversial. Here, we aim to describe the characteristics of COVID-19 patients at disease onset, with special focus on the diagnosis and management of female patients with COVID-19.

Methods:

We explored the unstructured free text in the electronic health records (EHRs) within the SESCAM Healthcare Network (Castilla-La Mancha, Spain). The study sample comprised the entire population with available EHRs (1,446,452 patients) from January 1st to May 1st, 2020. We extracted patients' clinical information upon diagnosis, progression, and outcome for all COVID-19 cases.

Results:

A total of 4,780 patients with a confirmed diagnosis of COVID-19 were identified. Of these, 2,443 (51%) were female, who were on average 1.5 years younger than male patients (61.7 ± 19.4 vs. 63.3 ± 18.3, p = 0.0025). There were more female COVID-19 cases in the 15–59-year-old interval, with the greatest sex ratio (95% confidence interval) observed in the 30–39-year-old range (1.69; 1.35–2.11). Upon diagnosis, headache, anosmia, and ageusia were significantly more frequent in females than males. Chest X-ray imaging and blood tests were performed less frequently in females (65.5% vs. 78.3% and 49.5% vs. 63.7%, respectively), both p < 0.001. Regarding hospital resource use, females were hospitalized (44.3% vs. 62.0%) and admitted to the intensive care unit (2.8% vs. 6.3%) less frequently than males, both p < 0.001.

Conclusion:

Our results indicate important sex-dependent differences in the diagnosis, clinical manifestation, and treatment of patients with COVID-19. These results warrant further research to identify and close the gender gap in the ongoing pandemic.

Introduction

As of July 2020, the World Health Organization (WHO) has declared that the coronavirus disease 2019 (COVID-19) pandemic is far from controlled. The cumulative number of confirmed COVID-19 cases across 216 countries worldwide amounts to over 11,874,226; 545,481 confirmed deaths have been reported to date.¹ Daily numbers of both infections and casualties are reaching record highs in many countries, with many already experiencing ‘second waves’ after lockdowns lift.²

Since COVID-19 was initially identified on December 31, 2019 in Wuhan (Hubei Province, China),³ many unknowns have remained regarding the epidemiology, clinical characteristics, prognosis, and management of the disease.⁴ Although substantial efforts have been aimed at improving our clinical understanding of the disease, less is known about the gendered impact of the current pandemic. Indeed, investigating sex- and gender-related issues in health care is an ongoing and unmet need,⁵ and it is considered a research priority issue within the WHO's Sustainable Development Goals, a strategic opportunity to promote human rights, and achieve health for all.⁶

Characterizing the extent to which COVID-19 impacts women and men differently is of vital importance to better understand the consequences of the pandemic and to design equitable health policies and effective therapeutic strategies. In this line, recent evidence suggests that there are indeed sex differences in the clinical outcomes of COVID-19.^7–9 Some hypotheses underscore the influence of hormonal factors,¹⁰ immune response,¹¹ differential distribution of the angiotensin-converting enzyme 2 (ACE-2) receptors, and smoking habits,¹² among others.¹³

To further characterize the gendered impact of COVID-19, here, we aimed to address whether the frequency and severity of COVID-19 affect women differently than men. In addition, we sought to explore the factors underlying these differences. To achieve these goals, we used natural language processing (NLP) and artificial intelligence to explore the unstructured, free-text clinical information captured in the electronic health records (EHRs) of a large series of test-confirmed COVID-19 cases.

Methods

This study is part of the BigCOVIData initiative¹⁴ and was conducted in compliance with legal and regulatory requirements.¹⁵ This study was classified as a “non-post-authorization study” (EPA) by the Spanish Agency of Medicines and Health Products (AEMPS), and was approved by the Research Ethics Committee at the University Hospital of Guadalajara (Spain). We have followed the STrengthening the Reporting of OBservational studies in Epidemiology (STROBE) guidance for reporting observational research.¹⁶

Study design, data source, and patient population

This was a retrospective, multicenter study using secondary free-text data from patients' EHRs within the SESCAM Healthcare Network in Castilla-La Mancha, Spain. Data were retrieved from all available departments, including inpatient hospital, outpatient hospital, and emergency room, for virtually all types of provided services in each participating hospital. The study period was January 1, 2020–May 1, 2020.

The study database was fully anonymized and aggregated, so it did not contain patients' personally identifiable information. Given that clinical information was handled in an aggregate, anonymized, and irreversibly dissociated manner, patient consent regulations do not apply to the present study.

The study sample included all patients in the source population with test-confirmed COVID-19 (mainly polymerase chain reaction [PCR]+ but also IgG/IgM+).

Extracting free text from EHRs: EHRead®

To meet the study objectives, we used EHRead, a technology developed by SAVANA that applies NLP, machine learning, and deep learning to access and analyze the unstructured, free-text information written by health professionals in EHRs. The process used for the extraction of clinical data by EHRead has been previously described.¹⁷ In brief, all extracted clinical terms are standardized according to a unique terminology. This custom-made terminology is based on the Systematized Nomenclature of Medicine-Clinical Terms (SNOMED-CT) and includes more than 400,000 medical concepts, acronyms, and laboratory parameters aggregated over the course of 5 years of free-text mining. The clinical entities detected in the unstructured free text are then classified based on EHRs' sections using a combination of regular expression rules and machine learning models. Deep learning classification methods, which rely on word embeddings and context information, are also used to determine whether the clinical information is expressed in terms of negative, speculative, or affirmative statements.

Internal validation

For particular cases where extra specifications are required (e.g., to differentiate COVID cases from other mentions of the term related to fear of the disease or potential contact), the detection output was manually reviewed in more than 5,000 reports to avoid any ambiguity associated with free-text reporting. All NLP deep learning models used here were validated using the standard training/validation/testing approach; we used a 75/12/13 split ratio in the available annotated data (between 2,000 and 3,000 records, depending on the model) to ensure efficient generalization on unseen cases. For the linguistic validation of analyzed variables regarding COVID-19 mentions, signs/symptoms (e.g., dyspnea, tachypnea, pneumonia), laboratory values (e.g., ferritin, lactate dehydrogenase [LDH]), and treatments (e.g., hydroxychloroquine, cyclosporine, Lopinavir/Ritonavir), we obtained F-scores (the harmonic mean between precision and recall) >0.80 in all cases. However, the validation of “PCR-confirmed COVID-19” returned an F-score of 0.64; although the precision in the identification of this concept was very high (0.90), the recall value was 0.5. This means that even though our model accurately identifies PCR+ cases (i.e., very low number of false positives), the prevalence data reported here may be underestimated. Importantly, out of a subsample of 964 manually reviewed clinical reports (532 from males and 432 from females), a total of 158 PCR+ cases (16.4%) were missed by the system. The proportion of female patients among the detected and missed cases was 46.2% and 38.0%, respectively; a chi-square test of independence revealed no significant differences between the two groups (p = 0.07). These data indicate that there was not a clear bias toward females in the proportion of undetected cases.
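The F-score reported for “PCR-confirmed COVID-19” follows directly from the precision and recall given above. The following minimal Python sketch illustrates the arithmetic; the function name `f_score` is ours and is used for illustration only, not taken from the EHRead pipeline.

```python
def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1-score)."""
    if precision + recall == 0:
        return 0.0  # both metrics are zero, so the harmonic mean is defined as 0
    return 2 * precision * recall / (precision + recall)


# Precision and recall reported in the text for "PCR-confirmed COVID-19"
pcr_f = f_score(precision=0.90, recall=0.5)
print(round(pcr_f, 2))  # 0.64
```

Because the harmonic mean is pulled toward the lower of the two values, a recall of 0.5 keeps the F-score well below the precision of 0.90.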

Data analyses

We generated frequency tables to display the information regarding comorbidities, symptoms, and other categorical variables. Continuous variables (e.g., age) were described using summary tables containing mean, standard deviation, median, minimum and maximum values, and quartiles for each variable. To test for statistically significant differences between males and females, we used Yates-corrected chi-square tests for categorical variables (percentages) and analysis of variance for normally distributed continuous variables. Sex ratios (SRs) and their 95% confidence intervals (CIs) of several epidemiological and clinical indicators are presented. To determine whether the SRs of confirmed COVID-19 cases significantly varied across time, we performed linear regression analyses to test the null hypothesis that the slope is equal to zero. Sex differences in COVID-19-related clinical outcomes (i.e., confirmed cases, hospitalization, and intensive care unit [ICU] admission) were further confirmed via multivariate analyses, adjusting for age. All statistical inferences were performed at the 5% significance level using two-sided tests or 95% CIs.
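As an illustration of the categorical comparison described above, the following Python sketch implements a Yates-corrected chi-square test for a 2 × 2 table using only the standard library. It is a sketch under stated assumptions rather than the software used in the study: the function name `yates_chi_square` is hypothetical, and the example counts are taken from the headache row of Table 2.

```python
import math


def yates_chi_square(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Yates-corrected chi-square test for the 2x2 table [[a, b], [c, d]].

    Returns (statistic, p-value) for 1 degree of freedom.
    Assumes every row and column total is greater than zero.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    # Continuity correction: shrink |ad - bc| by n/2, never below zero
    diff = max(abs(a * d - b * c) - n / 2, 0.0)
    chi2 = n * diff ** 2 / (row1 * row2 * col1 * col2)
    # Upper tail of the chi-square distribution with 1 df: erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Headache (Table 2): 277 of 2,443 females vs. 166 of 2,337 males
chi2, p = yates_chi_square(277, 2443 - 277, 166, 2337 - 166)
print(f"chi2 = {chi2:.2f}, p = {p:.2g}")
```

For this row the sketch returns a p-value well below 0.001, in line with the value reported in Table 2.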

Results

From a source population of 2,045,385 individuals, we extracted and analyzed the clinical information of 1,446,452 patients with available EHRs from January 1st to May 1st, 2020. Among these, we then retrieved the clinical information upon diagnosis, progression, and outcome for 4,780 patients with a test-confirmed diagnosis of COVID-19, of whom 2,443 (51%) were women. The patient flowchart for female and male patients is depicted in Figure 1. The female/male SRs (95% CI) for hospitalization and ICU admission were 0.49 (0.43–0.55) and 0.60 (0.44–0.80), respectively. To further confirm the sex-dependent differences in the clinical outcomes related to COVID-19, we performed a multivariate analysis of the explored outcomes adjusted for age. These analyses revealed that the higher risk for hospital admission and ICU use in men was sustained after controlling for patients' age, with female/male SRs (95% CI) of 0.73 (0.65–0.83) and 0.48 (0.31–0.76), respectively. Regarding confirmed diagnosis, sex-dependent differences remained nonsignificant in the multivariate analysis (female/male SR of 0.88, 95% CI: 0.69–1.13).

FIG. 1.

Patient flowchart. Flowchart depicting the total number of inhabitants in the source population, the number (%) of patients with available electronic health records analyzed, the number of patients diagnosed with COVID-19, and of those, the number of hospitalizations and intensive care unit admissions. ♂ = male patients; ♀ = female patients. *Confirmed cases based on laboratory results (mainly PCR+ but also IgG/IgM+). COVID-19, coronavirus disease 2019; PCR, polymerase chain reaction; IgG, immunoglobulin G; IgM, immunoglobulin M.

Isolated COVID-19 cases were already identified in the SESCAM system in January and February 2020, yet they remained scarce until the first week of March 2020. Shortly after, confirmed cases rose exponentially and reached a daily maximum at the end of March/early April 2020. This peak in newly reported cases was followed by a slow decrease; by early May 2020, confirmed cases had fallen to near-zero levels (Fig. 2A). As shown in Figure 2B, the proportion of COVID-19 cases in females remained stable from the beginning of the outbreak up to the plateau; by the end of the study period, the number of diagnosed female patients relative to male patients markedly increased. Linear regression analyses showed that the SR of confirmed cases (newly identified cases in females over new cases in males) significantly increased over time, p < 0.001.

FIG. 2.

Epidemiological curve and SRs showing COVID-19 cases within the study period. (A) Epidemiological curve showing test-confirmed COVID-19 cases (i.e., PCR+/IgG/IgM+) across time within the study period in male (blue) and female (green) patients. (B) SRs depicting the variation of confirmed COVID-19 cases over time within the study period, calculated as the number of diagnosed female patients over male patients. The dotted red line indicates a SR of 1 (i.e., equal proportion of diagnosed male and female patients); a SR >1 indicates higher proportion of diagnosed female patients over male patients. As indicated by the linear regression plot, the SR increases over time, indicating a growing number of diagnosed women (relative to men). *p < 0.001 (slope). Shaded gray area indicates the 95% confidence interval. SR, sex ratio.

Female COVID-19 patients were on average 1.5 years younger than males (61.7 ± 19.4 vs. 63.3 ± 18.3, p = 0.0025). In addition, there were more female patients in the 15–59-year-old interval (Fig. 3), with the greatest SR (95% CI) observed in the 30–39-year-old interval (1.686; 1.351–2.113) (Table 1).

FIG. 3.

Age and sex distribution of COVID-19 patients. Age distribution of incident cases of COVID-19 in females (left) and males (right) in the study population for the period between January 1, 2020 and May 1, 2020.

Table 1.

Number of Coronavirus Disease 2019 Cases by Age Group and Sex

Age (years-old)	Total population^a		Cases (per 100,000)
Age (years-old)	Female	Male	Female	Male	Sex ratio^b	95% CI	p-Value^c
Total	1,018,707	1,026,678	239.7	227.6	1.054	0.995–1.115	0.0741
<15	148,133	157,505	11.5	17.1	0.672	0.358–1.225	0.2486
15–29	156,432	168,664	67.1	40.3	1.663	1.229–2.267	0.0012
30–39	128,166	136,230	156.0	92.5	1.686	1.351–2.113	<0.001
40–49	159,660	169,961	217.3	172.4	1.261	1.079–1.474	0.0039
50–59	150,689	157,227	329.2	280.5	1.173	1.032–1.335	0.0159
60–69	108,557	109,862	342.7	409.6	0.837	0.729–0.960	0.0121
70–79	85,197	73,926	400.2	562.7	0.711	0.616–0.821	<0.001
>79	81,970	53,301	689.3	968.1	0.712	0.632–0.803	<0.001

^a Total population of Castilla-La Mancha (Spain).

^b A SR of 1 indicates equal proportion of male and female patients; a SR >1 indicates higher proportion of female patients than male patients.

^c p-Values from Yates-corrected chi-square test on percentage difference of female versus male COVID-19 patients.

CI, confidence interval; SR, sex ratio; COVID-19, coronavirus disease 2019.

We did not observe any sex-dependent differences in the overall number of COVID-19 cases per 100,000 individuals; the prevalence rates for female and male patients were 239.7 and 227.6, respectively, with a corresponding SR (95% CI) of 1.054 (0.995–1.115), p = 0.0741 (Table 1). The data shown in Table 1 indicate an age-dependent increase in reported cases in both males and females, with patients aged >79 years the most affected (968.1 cases per 100,000 in men and 689.3 in women) and a corresponding SR (95% CI) of 0.712 (0.632–0.803), p < 0.001.
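To make the SR calculation concrete, the sketch below computes the female/male ratio of case rates and an approximate 95% CI in Python. The method used to derive the CIs in Table 1 is not stated in this article; the log-normal (Katz) approximation used here is our assumption, and the function name `sex_ratio_ci` is hypothetical. Case counts (2,443 females and 2,337 males) and population totals are taken from the text and Table 1.

```python
import math


def sex_ratio_ci(cases_f: int, pop_f: int, cases_m: int, pop_m: int,
                 z: float = 1.96) -> tuple[float, float, float]:
    """Female/male ratio of case rates with an approximate CI.

    Assumption: log-normal (Katz) interval for a ratio of proportions;
    the article does not specify which CI method was used.
    """
    sr = (cases_f / pop_f) / (cases_m / pop_m)
    # Standard error of log(SR) for a ratio of two independent proportions
    se_log = math.sqrt(1 / cases_f - 1 / pop_f + 1 / cases_m - 1 / pop_m)
    lower = math.exp(math.log(sr) - z * se_log)
    upper = math.exp(math.log(sr) + z * se_log)
    return sr, lower, upper


# Total row of Table 1 (populations) with the case counts given in the text
sr, lo, hi = sex_ratio_ci(2443, 1_018_707, 2337, 1_026_678)
print(f"Cases per 100,000: F = {2443 / 1_018_707 * 1e5:.1f}, "
      f"M = {2337 / 1_026_678 * 1e5:.1f}")
print(f"SR = {sr:.3f} (95% CI {lo:.3f} to {hi:.3f})")
```

With these inputs the sketch yields values close to the total row of Table 1, although this agreement does not establish which method the authors actually used.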

Table 2.

Clinical Manifestations of Coronavirus Disease 2019 Upon Diagnosis

	Female, n (%) (N = 2,443)	Male, n (%) (N = 2,337)	Total, n (%) (N = 4,780)	Sex ratio^a	95% CI	p-Value^b
Age (years-old)
Mean (SD)	61.7 (19.4)	63.3 (18.3)	62.5 (18.9)			0.0025
Signs and symptoms, n (%)
Patients with no symptoms	705 (28.9)	486 (20.8)	1,191 (24.9)	1.387	1.22–1.579	<0.001
Cough	1,094 (44.8)	1,199 (51.3)	2,293 (48.0)	0.873	0.79–0.964	0.0080
Fever	878 (36.0)	1,169 (50.0)	2,047 (42.8)	0.719	0.647–0.797	<0.001
Dyspnea	759 (31.1)	914 (39.1)	1,673 (35.0)	0.794	0.71–0.888	<0.001
Respiratory crackles	472 (19.3)	627 (26.8)	1,099 (23.0)	0.720	0.631–0.822	<0.001
Diarrhea	385 (15.8)	350 (15.0)	735 (15.4)	1.052	0.901–1.23	0.5467
Headache	277 (11.3)	166 (7.1)	443 (9.3)	1.596	1.307–1.953	<0.001
Myalgia	230 (9.4)	207 (8.9)	437 (9.1)	1.063	0.874–1.294	0.5757
Lymphopenia	147 (6.0)	186 (8.0)	333 (7.0)	0.756	0.604–0.945	0.0163
Rhonchus	133 (5.4)	179 (7.7)	312 (6.5)	0.711	0.563–0.896	0.0044
Chest pain	158 (6.5)	153 (6.5)	311 (6.5)	0.988	0.785–1.243	0.9635
Anosmia	153 (6.3)	109 (4.7)	262 (5.5)	1.342	1.044–1.731	0.0254
Tachypnea	74 (3.0)	133 (5.7)	207 (4.3)	0.533	0.397–0.71	<0.001
Wheezing	69 (2.8)	86 (3.7)	155 (3.2)	0.768	0.555–1.059	0.1250
Skin symptoms	39 (1.6)	34 (1.5)	73 (1.5)	1.097	0.689–1.753	0.7834
Rhinitis	24 (1.0)	24 (1.0)	48 (1.0)	0.957	0.538–1.701	0.9938
Ageusia	31 (1.3)	15 (0.6)	46 (1.0)	1.966	1.073–3.766	0.0403
Sore throat	27 (1.1)	18 (0.8)	45 (0.9)	1.431	0.789–2.657	0.2993
Dysphagia	12 (0.5)	20 (0.9)	32 (0.7)	0.577	0.272–1.173	0.1746
Neuralgia	16 (0.7)	13 (0.6)	29 (0.6)	1.175	0.561–2.507	0.8025
Hemoptysis	9 (0.4)	12 (0.5)	21 (0.4)	0.721	0.29–1.723	0.5919
Ophthalmologic symptoms	9 (0.4)	9 (0.4)	18 (0.4)	0.957	0.368–2.487	1
Splenomegaly	3 (0.1)	6 (0.3)	9 (0.2)	0.491	0.098–1.924	0.4641
Hepatomegaly	2 (0.1)	5 (0.2)	7 (0.1)	0.400	0.051–1.943	0.4159
Respiratory rate (bpm)
N	249	339	588
Mean (SD)	23.3 (14.4)	23.9 (12.6)	23.7 (13.4)
Patients (n, %) with high RR (>20)	105 (42.3)	167 (49.3)	272 (46.3)	0.856	0.637–1.148	0.3356
Radiological findings, n (%)
Chest X-ray	1,600 (65.5)	1,829 (78.3)	3,429 (71.7)	0.837	0.766–0.914	<0.001
No abnormalities	552 (34.5)	450 (24.6)	1,002 (29.2)	1.402	1.217–1.615	<0.001
Any abnormality	1,048 (65.5)	1,379 (75.4)	2,427 (70.8)	0.869	0.782–0.965	0.0091
Bilateral infiltrates	902 (56.4)	1,213 (66.3)	2,115 (61.7)	0.850	0.762–0.948	0.0039
Ground-glass opacities	277 (17.3)	380 (20.8)	657 (19.2)	0.833	0.704–0.986	0.0378
Interstitial pattern	152 (9.5)	175 (9.6)	327 (9.5)	0.993	0.790–1.246	0.9972
Alveolar bilateral infiltrates	60 (3.8)	88 (4.8)	148 (4.3)	0.780	0.556–1.088	0.1683
Unilateral infiltrates	7 (0.4)	15 (0.8)	22 (0.6)	0.540	0.203–1.295	0.2393
Arterial blood gases, n (%)
pH
N	476	646	1,122
Mean (SD)	7.4 (0.1)	7.4 (0.1)	7.4 (0.1)
Patients (n, %) with pH >7.42	298 (62.6)	449 (69.5)	747 (66.6)	0.901	0.746–1.087	0.2982
pO₂ (mmHg)
N	553	778	1,331
Mean (SD)	72.1 (24.8)	70.5 (26.9)	71.2 (26.1)
Patients (n, %) with pO₂ <60	144 (26.0)	249 (32.0)	393 (29.5)	0.814	0.644–1.026	0.0924
pCO₂ (mmHg)
N	443	622	1,065
Mean (SD)	35.8 (7.3)	34.5 (7.9)	35.1 (7.7)
Patients (n, %) with pCO₂ >45	46 (10.5)	42 (6.8)	88 (8.4)	1.538	0.993–2.387	0.0661
O₂ Sat (%)
N	1,188	1,336	2,524
Mean (SD)	94.1 (5.7)	93.3 (5.6)	93.7 (5.6)
Patients (n, %) with O₂ Sat <94	385 (32.4)	528 (39.5)	913 (36.2)	0.820	0.704–0.955	0.0122

^a A SR of 1 indicates equal proportion of male and female patients; a SR >1 indicates higher proportion of female patients than male patients.

^b p-Values from Yates-corrected chi-square test of difference between percentage of patients (female vs. male) presenting with the sign/symptom. All tests were performed individually for each variable sign/symptom.

RR, respiratory rate; SD, standard deviation; bpm, breaths per minute.

Regarding symptoms upon diagnosis, headache, anosmia, and ageusia were significantly more frequent in women than men (p < 0.001, p = 0.0254, and p = 0.0403, respectively; Table 2). Interestingly, chest X-ray imaging and blood tests were performed less frequently in females (65.5% vs. 78.3% and 49.5% vs. 63.7%, respectively), both p < 0.001. Regarding hospital resource use, female COVID-19 patients were hospitalized (44.3% vs. 62.0%) and admitted to the ICU (2.8% vs. 6.3%) less frequently than males, both p < 0.001.

As expected, comorbidities upon COVID-19 diagnosis were more often reported in men: whereas 78.9% of female patients had at least one of the studied comorbidities at diagnosis, this percentage was 87.4% in males (p = 0.0183) (Table 3). However, depressive disorders and asthma were significantly more frequent in females, with associated SRs (95% CI) of 2.030 (1.616–2.565) and 1.743 (1.363–2.241), respectively.

Table 3.

Comorbidities of Coronavirus Disease 2019 Patients Upon Diagnosis

	Female (N = 2,443)	Male (N = 2,337)	Total (N = 4,780)	Sex ratio^a	95% CI	p-Value^b
Any comorbidity, n (%)	1,928 (78.9)	2,043 (87.4)	3,971 (83.1)	0.903	0.830–0.982	0.0183
Hypertension	843 (34.5)	1,027 (43.9)	1,870 (39.1)	0.785	0.705–0.874	<0.001
Heart disease	681 (27.9)	908 (38.9)	1,589 (33.2)	0.718	0.640–0.804	<0.001
Ischemic heart disease	72 (2.9)	243 (10.4)	315 (6.6)	0.284	0.215–0.370	<0.001
Heart failure	145 (5.9)	151 (6.5)	296 (6.2)	0.919	0.726–1.162	0.5164
Diabetes	349 (14.3)	502 (21.5)	851 (17.8)	0.665	0.573–0.771	<0.001
Kidney disease	295 (12.1)	463 (19.8)	758 (15.9)	0.610	0.521–0.713	<0.001
Chronic kidney disease	123 (5.0)	202 (8.6)	325 (6.8)	0.583	0.461–0.733	<0.001
Obesity	279 (11.4)	244 (10.4)	523 (10.9)	1.094	0.913–1.311	0.3545
Cancer	200 (8.2)	310 (13.3)	510 (10.7)	0.617	0.512–0.743	<0.001
Hematologic malignancies	46 (1.9)	55 (2.4)	101 (2.1)	0.801	0.537–1.189	0.3142
Prostate cancer	—	83 (3.6)	83 (1.7)	—	—	—
Breast cancer	50 (2.0)	1 (0.0)	51 (1.1)	41.852	9.322–974.645	<0.001
Colon cancer	15 (0.6)	30 (1.3)	45 (0.9)	0.481	0.25–0.885	0.0261
Lung cancer	6 (0.2)	36 (1.5)	42 (0.9)	0.163	0.061–0.361	<0.001
Depressive disorder	240 (9.8)	113 (4.8)	353 (7.4)	2.030	1.616–2.565	<0.001
Cerebrovascular disease	152 (6.2)	182 (7.8)	334 (7.0)	0.799	0.639–0.998	0.0545
Ischemic stroke	68 (2.8)	108 (4.6)	176 (3.7)	0.603	0.441–0.819	0.0015
COPD	64 (2.6)	266 (11.4)	330 (6.9)	0.231	0.173–0.303	<0.001
Asthma	186 (7.6)	102 (4.4)	288 (6.0)	1.743	1.363–2.241	<0.001
Autoimmune disease	111 (4.5)	43 (1.8)	154 (3.2)	2.463	1.737–3.556	<0.001
Obstructive sleep apnea syndrome	42 (1.7)	66 (2.8)	108 (2.3)	0.610	0.409–0.898	0.0157
Alzheimer Disease	48 (2.0)	39 (1.7)	87 (1.8)	1.176	0.768–1.812	0.5201
Epilepsy	30 (1.2)	42 (1.8)	72 (1.5)	0.684	0.423–1.095	0.141
Chronic liver disease	26 (1.1)	41 (1.8)	67 (1.4)	0.608	0.366–0.993	0.0605
Parkinson Disease	37 (1.5)	27 (1.2)	64 (1.3)	1.309	0.796–2.18	0.3473
Bronchiectasis	18 (0.7)	45 (1.9)	63 (1.3)	0.385	0.216–0.656	<0.001
Immunodeficiency disorder	29 (1.2)	30 (1.3)	59 (1.2)	0.925	0.55–1.552	0.8668
HIV	6 (0.2)	12 (0.5)	18 (0.4)	0.485	0.165–1.265	0.2042

^a A SR of 1 indicates equal proportion of male and female patients; a SR >1 indicates higher proportion of female patients than male patients.

^b p-Values from Yates-corrected chi-square test of difference between percentage of patients (female vs. male) diagnosed with each condition or disease. All tests were performed individually for each comorbidity.

COPD, chronic obstructive pulmonary disease; HIV, human immunodeficiency virus.

According to the laboratory parameters upon COVID-19 diagnosis, men had lymphopenia and worse renal function (as per creatinine and urea values but not glomerular filtration rate) significantly more often than women (Table 4). In contrast, all liver function parameters, as well as D-dimer and all acute phase reactants (except for higher C-reactive protein levels in men), were evenly distributed by sex.

Table 4.

Laboratory Parameters of Coronavirus Disease 2019 Patients Upon Diagnosis

	Female (N = 2,443)	Male (N = 2,337)	Total (N = 4,780)	Sex ratio^a	95% CI	p-Value^b
Patients with blood test n (%)	1,210 (49.5)	1,489 (63.7)	2,699 (56.5)	0.777	0.707–0.855	<0.001
Hematology
White blood cell ( × 10³/mm³)
N	749	939	1,688
Mean (SD)	11.2 (45.3)	11.2 (49.0)	11.2 (47.4)
Patients (n, %) with high white blood cell count (>9.5 males/ >11.1 females)	178 (23.8)	249 (26.5)	427 (25.3)	0.896	0.722–1.111	0.3448
Neutrophil ( × 10³/mm³)
N	350	443	793
Mean (SD)	5.6 (3.0)	5.7 (2.9)	5.7 (2.9)
Patients (n, %) with high neutrophil count (>6.1 males/ >7.5 females)	218 (37.3)	314 (41.9)	532 (39.9)	0.892	0.727–1.093	0.2930
Lymphocyte ( × 10³/mm³)
N	520	692	1,212
Mean (SD)	1.5 (1.5)	1.5 (1.8)	1.5 (1.6)
Patients (n, %) with low lymphocyte count (<1.1)	375 (46.0)	578 (57.0)	953 (52.1)	0.807	0.688–0.947	0.0094
Liver function
Bilirubin (mg/dL)
N	352	497	849
Mean (SD)	0.7 (0.8)	0.8 (0.8)	0.8 (0.8)
Patients (n, %) with high levels (>1.2)	23 (6.5)	61 (12.3)	84 (9.9)	0.535	0.318–0.870	0.0167
ALT (U/L)
N	913	1,171	2,084
Mean (SD)	40.3 (102.0)	52.1 (57.5)	46.9 (80.3)
Patients (n, %) with high levels (>55 male/ >53 female)	162 (17.7)	305 (26.0)	467 (22.4)	0.682	0.552–0.839	<0.001
AST (U/L)
N	735	922	1,657
Mean (SD)	49.9 (247.2)	52.8 (45.6)	51.5 (168.0)
Patients (n, %) with high levels (>40 male/ >37 female)	275 (37.4)	454 (49.2)	729 (44.0)	0.760	0.635–0.908	0.0029
GGT (U/L)
N	198	315	513
Mean (SD)	74.1 (82.2)	112.7 (156.4)	97.8 (134.0)
Patients (n, %) with high levels (>64 male/ >36 female)	124 (62.6)	154 (48.9)	278 (54.2)	1.281	0.952–1.722	0.1173
Renal function
Creatinine (mg/dL)
N	1,015	1,280	2,295
Mean (SD)	1.0 (0.8)	1.2 (1.3)	1.1 (1.1)
Patients (n, %) with high levels (>1.3)	142 (14.0)	285 (22.3)	427 (18.6)	0.629	0.505–0.780	<0.001
Urea (mg/dL)
N	879	1,129	2,008
Mean (SD)	50.8 (47.4)	53.6 (38.8)	52.4 (42.8)
Patients (n, %) with low levels (<20)	75 (8.5)	33 (2.9)	108 (5.4)
Patients (n, %) with high levels (>48)	265 (30.1)	422 (37.4)	687 (34.2)	0.807	0.675–0.963	0.0195
Glomerular filtration rate (mL/min/1.73 m²)
N	304	372	676
Mean (SD)	60.2 (30.7)	62.6 (30.8)	61.5 (30.7)
Patients (n, %) with low rate (<60)	111 (36.5)	125 (33.6)	236 (34.9)	1.087	0.807–1.463	0.6368
Coagulation, inflammatory, and tissue damage markers
D-Dimer (mg/L)
N	831	1,006	1,837
Mean (SD)	461 (773.2)	492 (851.2)	478 (816.7)
Median (min–max)	14.1 (0–4,860)	7.2 (0–4,976)	8.9 (0–4,976)
(Q1–Q3)	0.6–649	0.7–737.8	0.7–691
Patients (n, %) with high levels (>0.49)	683 (82.2)	843 (83.8)	1,526 (83.1)	0.981	0.856–1.124	0.8078
C-reactive protein (mg/L)
N	1,173	1,439	2,612
Mean (SD)	51.4 (77.3)	70.6 (92.0)	62 (86.2)
Median (min–max)	18 (0–524)	29 (0–690)	22.9 (0–690)
(Q1–Q3)	4.7–64.8	8.0–96.8	6.0–79.8
Patients (n, %) with high levels (>8)	763 (65.0)	1,071 (74.4)	1,834 (70.2)	0.874	0.775–0.986	0.031
Ferritin (ng/mL)
N	365	470	835
Mean (SD)	520 (762.5)	1037.2 (1211.7)	811.1 (1070.2)
Median (min–max)	362 (4–9,559)	745 (11–19,391)	524.7 (4–19,391)
(Q1–Q3)	164.0–606.0	434.8–1299.2	271–1054.5
Patients (n, %) with high levels (>250 male/ >120 female)	298 (81.6)	417 (88.7)	715 (85.6)	0.920	0.752–1.126	0.45
LDH (U/L)
N	885	1,133	2,018
Mean (SD)	373.1 (484.1)	402.8 (259.1)	389.8 (374.9)
Median (min–max)	302 (1–13,260)	336 (16–4,276)	319 (1–13,260)
(Q1–Q3)	230–434	243–494	236–466
Patients (n, %) with high levels (>243)	619 (69.9)	847 (74.8)	1,466 (72.6)	0.936	0.817–1.072	0.3548
Fibrinogen (mg/dL)
N	394	499	893
Mean (SD)	544.8 (201.2)	593.4 (235.3)	571.9 (222.1)
Median (min–max)	530.5 (24–1,496)	577.9 (156–1,579)	548 (24–1,579)
(Q1–Q3)	370–674.8	370–763	370–720
Patients (n, %) with high levels (>400)	276 (70.1)	343 (68.7)	619 (69.3)	1.019	0.829–1.253	0.8988
Procalcitonin (ng/mL)
N	404	543	947
Mean (SD)	0.7 (4.5)	0.9 (3.5)	0.8 (4)
Median (min–max)	0.1 (0–71.2)	0.1 (0–50.1)	0.1 (0–71.2)
(Q1–Q3)	0.1–0.2	0.1–0.3	0.1–0.2
Patients (n, %) with high levels (>0.05)	308 (76.2)	485 (89.3)	793 (83.7)	0.854	0.704–1.035	0.1174

^a A SR of 1 indicates equal proportion of male and female patients; a SR >1 indicates higher proportion of female patients than male patients.

^b p-Values from Yates-corrected chi-square test of difference between percentage of patients (female vs. male) in either outcome group (high levels). All tests were performed individually for each parameter.

ALT, alanine transaminase; AST, aspartate transaminase; GGT, gamma-glutamyl transpeptidase; LDH, lactate dehydrogenase.

Regarding treatments received by COVID-19 patients (Table 5), our results indicate that, except for chloroquine, the SR for all treatments analyzed was <1. Notably, most of these treatments were used significantly less often in female patients with COVID-19.

Table 5.

Treatments Used in Coronavirus Disease 2019 Patients

	Female, n (%) (N = 2,443)	Male, n (%) (N = 2,337)	Total, n (%) (N = 4,780)	Sex ratio^a	95% CI	p-Value^b
Antibacterials
Azithromycin	1,340 (54.9)	1,465 (62.7)	2,805 (58.7)	0.875	0.797–0.961	0.0054
Azithromycin + hydroxychloroquine	1,015 (41.6)	1,220 (52.2)	2,235 (46.8)	0.796	0.720–0.880	<0.001
Ceftriaxone	736 (30.1)	1,033 (44.2)	1,769 (37.0)	0.682	0.610–0.761	<0.001
Levofloxacin	205 (8.4)	298 (12.8)	503 (10.5)	0.658	0.546–0.793	<0.001
Amoxicillin	169 (6.9)	209 (8.9)	378 (7.9)	0.774	0.626–0.955	0.0192
Clarithromycin	35 (1.4)	36 (1.5)	71 (1.5)	0.930	0.580–1.491	0.8542
Doxycycline	10 (0.4)	22 (0.9)	32 (0.7)	0.439	0.197–0.909	0.0392
Antithrombotic agents	1,459 (59.7)	1,662 (71.1)	3,121 (65.3)	0.840	0.767–0.919	<0.001
Vitamin K antagonists	125 (5.1)	151 (6.5)	276 (5.8)	0.792	0.620–1.010	0.069
Heparins	776 (31.8)	1,032 (44.2)	1,808 (37.8)	0.719	0.645–0.802	<0.001
Platelet aggregation inhibitors	313 (12.8)	519 (22.2)	832 (17.4)	0.577	0.496–0.671	<0.001
Direct factor Xa inhibitors	81 (3.3)	93 (4.0)	174 (3.6)	0.833	0.614–1.129	0.2696
Direct thrombin inhibitors	7 (0.3)	18 (0.8)	25 (0.5)	0.377	0.145–0.874	0.0353
Enzymes	3 (0.1)	6 (0.3)	9 (0.2)	0.491	0.098–1.924	0.4641
Antimalarials
Hydroxychloroquine	1,207 (49.4)	1,478 (63.2)	2,685 (56.2)	0.781	0.71–0.859	<0.001
Chloroquine	33 (1.4)	29 (1.2)	62 (1.3)	1.088	0.657–1.81	0.8388
Antivirals
Ritonavir	439 (18.0)	656 (28.1)	1,095 (22.9)	0.640	0.560–0.731	<0.001
Darunavir and cobicistat	24 (1.0)	31 (1.3)	55 (1.2)	0.742	0.429–1.267	0.3337
Darunavir	0 (0)	5 (0.2)	5 (0.1)			-
Mucolytics
Acetylcysteine	572 (23.4)	626 (26.8)	1,199 (25.1)	0.876	0.771–0.994	0.0431
Immunosuppressants
Glucocorticoids	682 (27.9)	1,019 (43.6)	1,701 (35.6)	0.640	0.572–0.716	<0.001
Tocilizumab	37 (1.5)	89 (3.8)	126 (2.6)	0.399	0.267–0.583	<0.001
Selective immunosuppressants	25 (1.0)	54 (2.3)	79 (1.7)	0.444	0.271–0.709	<0.001
Ciclosporin	1 (0)	6 (0.3)	7 (0.1)	0.178	0.007–1.081	0.1166
Immunostimulants
Interferon beta 1b	40 (1.6)	60 (2.6)	100 (2.1)	0.639	0.423–0.954	0.0359

^a A SR of 1 indicates equal proportion of male and female patients; a SR >1 indicates higher proportion of female patients than male patients.

^b p-Values from Yates-corrected chi-square test of difference between percentage of patients prescribed with the therapeutic agents (male vs. female). All tests were performed individually for each treatment.

Discussion

Using a big data approach and from a population perspective, we have identified important sex-dependent differences in the clinical manifestation, diagnosis, management, and hospital resource use associated with COVID-19. Specifically, female teenagers and young adult women were significantly more affected by COVID-19 than their male counterparts in the same age ranges. In addition, our results indicate that headache, as well as ear, nose, and throat (ENT) symptoms, were significantly more frequent in female COVID-19 patients. Regarding medical outcomes, both hospitalization and ICU admission were less common in females than males. Unfortunately, basic diagnostic tests such as blood tests or imaging were used less often in women.

Our results provide further evidence of the inherent gender bias in the health system, which is thought to originate in medical school and impacts all aspects of health care.^18,19 Although this bias is well established in the context of cardiovascular,²⁰ respiratory,^21–23 and infectious diseases (particularly, sexually transmitted diseases²⁴), the impact of sex and gender in the ongoing COVID-19 pandemic is just beginning to be unraveled.^8,25,26 Beyond mechanistic and molecular studies,^5–9 more subtle and general events may already play a role in the sex-dependent management of COVID-19 patients.^27,28 One key question is whether COVID-19 affects women's reproductive health; in other coronavirus-related infectious diseases such as the severe acute respiratory syndrome and the Middle East respiratory syndrome, pregnancy has been identified as a risk factor for developing severe complications.^29,30 Finally, ovarian hormones influence inflammation, immunity, and many other aspects of women's health,^13,31 as well as the expression of ACE-2 receptors, which seem to play a role in the progression of COVID-19.³² These effects are lost after menopause (due to ovarian insufficiency), which in most women occurs around 50 years of age. Interestingly, as shown in Table 3, mood disorders (e.g., depression) and asthma were more frequent in women than in men among COVID-19 patients. These results warrant further research on the effects of menopause in COVID-19-related health outcomes.

The increased vulnerability of women to COVID-19 is also associated with occupational risks. It is well established that most frontline health care professionals are women, which puts them at a higher risk for infection and negative clinical outcomes.³³ Further, women are more likely to serve as the primary caregivers within a household, thus becoming more exposed to the disease. This becomes worrying in disadvantaged populations and resource-poor communities, as well as countries without the benefits of a universal, free-for-all health care system.

Strengths and limitations

The main strengths of our research include immediacy, large sample size, and direct access to real-world evidence. Of note, our methodology ensures absence of any bias in patient selection as our hypothesis that gender impacts diagnosis and management of COVID-19 was assessed a posteriori. The observed change in the SR of confirmed cases at the tail of this first wave of the pandemic should be further confirmed in other cohorts and geographical locations.³⁴ Finally, it is unlikely that our conclusions are impacted by the limitations of pay- or copay-systems, as Spain enjoys a universal, free-for-all health care system.

Our results should be interpreted in light of the following limitations. First, this was an observational, retrospective study; therefore, any causal inferences based on the present results must be carefully interpreted. Second, given the variation in COVID-19 severity, it is possible that the free-text information available in EHRs is not homogeneous across patients seen at different points of care (i.e., primary-to-tertiary care). For instance, care providers could have been more likely to further explore (and report more often) milder symptoms in women, who in turn are more likely to be seen in primary care; conversely, the more severe symptoms reported in men may be related to the fact that they were more likely to be hospitalized or admitted to the ICU. Third, it is possible that women were more likely than men to report ENT symptoms.³⁵ Finally, as indicated in the methods section, our reported COVID-19 prevalence rates are probably lower than the true rates, as some cases might be missed by the system due to heterogeneous reporting in EHRs. However, the observed low recall metrics in variables related to the identification of PCR-confirmed patients do not affect the quality of the descriptive results since our precision metrics for these concepts were optimal.
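As an illustration of the precision and recall argument in the last limitation, the following minimal Python sketch computes both metrics from confusion counts. The function name and the counts are hypothetical and are not taken from the study's internal validation; the sketch only shows how a detection system can miss many true cases (low recall) while nearly every case it does flag is correct (high precision).

```python
from typing import Tuple


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Return (precision, recall) from true-positive, false-positive,
    and false-negative counts."""
    if tp + fp == 0 or tp + fn == 0:
        raise ValueError("precision or recall is undefined for these counts")
    precision = tp / (tp + fp)  # share of flagged cases that are real
    recall = tp / (tp + fn)     # share of real cases that were flagged
    return precision, recall


# Hypothetical counts for illustration only (NOT the study's data):
# 90 correctly detected cases, 2 false detections, 60 missed cases.
precision, recall = precision_recall(tp=90, fp=2, fn=60)
print(f"precision = {precision:.3f}, recall = {recall:.3f}")
```

High precision means that the retrieved cohort contains few false positives. Whether the retrieved cases are representative of the missed ones is a separate assumption that this sketch does not address.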

Implications for future research

The well-established gender bias in cardiovascular,²⁰ respiratory,^21–23 and other diseases should be further investigated in COVID-19 patients. Despite recent regulations and partial improvements, the attention paid to sex and gender differences in biomedical and health research is far from optimal.³⁶ As pointed out in recent reviews, occupational gender segregation makes women particularly vulnerable to COVID-19 since two-thirds of the health and social care workforce worldwide are women.³⁷ Crucially, any gender bias in the use of diagnostic testing and imaging, as evidenced in our research from a country with universal, free-for-all health care, might be magnified in less privileged settings.

Conclusion

The biological, behavioral, social, and systemic factors underlying the differences in how women and men may experience COVID-19 and its consequences cannot be oversimplified.³⁸ Regrettably, most research studies are systematically failing to offer comparisons between women and men, girls and boys, and people with diverse gender identities.³⁹ Based on the results presented here, we conclude that women were more heavily impacted by COVID-19 than men (specifically teenagers and young adults). In addition, women presented different symptoms at disease onset, clinical outcomes, and treatment patterns. These results warrant further research to identify and close the gender gap in the diagnosis and treatment of COVID-19.

Footnotes

Acknowledgments

We thank all the Savaners for helping accelerate health science with their daily work. We also thank SESCAM (Healthcare Network in Castilla-La Mancha) for its participation in the study and for supporting the development of cutting-edge technology in real time.

Authors' Contributions

J.A., I.H.M., J.L.I., A.P., M.S., and J.B.S. had the original idea of the study and developed the concept protocol; A.P., S.L., S.M., I.S., and I.Z. developed the analytical plan and conducted the statistical analyses; A.P., J.A., J.L.I., J.B.S., Y.G., S.L., S.M., I.H.M., C.D.R.-B., and I.Z. interpreted the results; C.D.R.-B. and J.B.S. wrote and edited the article; C.D.R.-B. and I.Z. are responsible for figures and data visualization; all authors contributed to drafting and interpretation, and they approved the final version.

Author Disclosure Statement

Savana employees contributed to the design, data analysis, and writing of the present study. All authors declare that there are no other direct or indirect potential conflicts to disclose.

Funding Information

The BigCOVIData study was funded by Savana.

References

1. WHO coronavirus disease (COVID-19) pandemic. Available at: https://www.who.int/emergencies/diseases/novel-coronavirus-2019 Accessed July 9, 2020.

2. Mahase. Covid-19: Medical leaders call for rapid review to prepare for second wave. BMJ, 2020; 369:m2529.

3. Emergencies preparedness, response. Pneumonia of unknown cause—China. Available at: https://www.who.int/csr/don/05-january-2020-pneumonia-of-unkown-cause-china/en/ Accessed January 5, 2020.

4. Harris, Carson, Baillie, Horby, Nair. An evidence-based framework for priority clinical research questions for COVID-19. J Glob Health, 2020; 10:011001.

5. Mello, Jagsi. Standing up against gender bias and harassment—a matter of professional ethics. N Engl J Med, 2020; 382:1385–1387.

6. Gupta, Oomman, Grown, et al. Gender equality, norms, and health steering committee. Gender equality and gender norms: Framing the opportunities for health. Lancet, 2019; 393:2550–2562.

7. Klein, Dhakal, Ursin, Deshpande, Sandberg, Mauvais-Jarvis. Biological sex impacts COVID-19 outcomes. PLoS Pathog, 2020; 16:e1008570.

8. Maleki Dana, Sadoughi, Hallajzadeh, et al. An insight into the sex differences in COVID-19 patients: What are the possible causes? Prehosp Disaster Med, 2020:1–4 [Epub ahead of print]; DOI: 10.1017/S1049023X20000837.

9. Gebhard, Regitz-Zagrosek, Neuhauser, Morgan, Klein. Impact of sex and gender on COVID-19 outcomes in Europe. Version 2. Biol Sex Differ, 2020; 11:29.

10. Grandi, Facchinetti, Bitzer. The gendered impact of coronavirus disease (COVID-19): Do estrogens play a role? Eur J Contracept Reprod Health Care, 2020; 25:233–234.

11. Takahashi, Wong, Ellingson, et al. Sex differences in immune responses to SARS-CoV-2 that underlie disease outcomes. medRxiv, 2020 [Epub ahead of print]; DOI: 10.1101/2020.06.06.20123414.

12. Leung, Yang, Tam, et al. ACE-2 expression in the small airway epithelia of smokers and COPD patients: Implications for COVID-19. Eur Respir J, 2020; 55:2000688.

13. Gargaglioni, Marques. Let's talk about sex in the context of COVID-19. J Appl Physiol (1985), 2020; 128:1533–1538.

14. Available at: www.bigcovidata.savanamed.com Accessed July 14, 2020.

15. Hernandez Medrano ITG, Belda, Urena, et al. Re-using electronic health records with artificial intelligence. Int J Interact Multimed Artif Intell, 2017; 4:8–12.

16. von Elm, Altman, Egger, Pocock, Gøtzsche, Vandenbroucke JP; STROBE Initiative. The Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement: Guidelines for reporting observational studies. J Clin Epidemiol, 2008; 61:344–349.

17. Luis Izquierdo, Ancochea, Savana COVID-19 Research Group, Soriano. Clinical characteristics and prognostic factors for ICU admission of patients with COVID-19 using machine learning and natural language processing. J Med Internet Res, 2020; 22:e21801.

18. Dijkstra, Verdonk, Lagro-Janssen. Gender bias in medical textbooks: Examples from coronary heart disease, depression, alcohol abuse and pharmacology. Med Educ, 2008; 42:1021–1028.

19. Gopalakrishnan, Ragland, Tak. Gender differences in coronary artery disease: Review of diagnostic challenges and current treatment. Postgrad Med, 2009; 121:60–68.

20. Bugiardini, Estrada, Nikus, Hall, Manfrini. Gender bias in acute coronary syndromes. Curr Vasc Pharmacol, 2010; 8:276–284.

21. Martinez, Raparla, Plauschinat, et al. Gender differences in symptoms and care delivery for chronic obstructive pulmonary disease. J Womens Health (Larchmt), 2012; 21:1267–1274.

22. Aryal, Diaz-Guzman, Mannino. COPD and gender differences: An update. Transl Res, 2013; 162:208–218.

23. Assayag, Morisset, Johannson, Wells, Walsh SLF. Patient gender bias on the diagnosis of idiopathic pulmonary fibrosis. Thorax, 2020; 75:407–412.

24. Kane, Guillaume AWD, Evans, et al. Gender differences in CDC guideline compliance for STIs in emergency departments. West J Emerg Med, 2017; 18:390–397.

25. Walter, McGregor. Sex- and gender-specific observations and implications for COVID-19. West J Emerg Med, 2020; 21:507–509.

26. Pinho-Gomes, Peters, Thompson, et al. Where are the women? Gender inequalities in COVID-19 research authorship. BMJ Glob Health, 2020; 5:e002922.

27. Munch. Gender-biased diagnosing of women's medical complaints: Contributions of feminist thought, 1970–1995. Women Health, 2004; 40:101–121.

28. Barsky, Peekna, Borus. Somatic symptom reporting in women and men. J Gen Intern Med, 2001; 16:266–275.

29. Wong, Chow, Leung, et al. Pregnancy and perinatal outcomes of women with severe acute respiratory syndrome. Am J Obstet Gynecol, 2004; 191:292–297.

30. Alfaraj, Al-Tawfiq, Memish. Middle East respiratory syndrome coronavirus (MERS-CoV) infection during pregnancy: Report of two cases and review of the literature. J Microbiol Immunol Infect, 2019; 52:501–503.

31. Al-Lami, Urban, Volpi, Algburi AMA, Baillargeon. Sex hormones and novel corona virus infectious disease (COVID-19). Mayo Clin Proc, 2020; 95:1710–1714.

32. Buckley, Cheng JWM, Desai. Cardiovascular pharmacology in the time of COVID-19: A focus on angiotensin-converting enzyme 2. J Cardiovasc Pharmacol, 2020; 75(6):526–529.

33. Langer, Meleis, Knaul, et al. Women and health: The key for sustainable development. Lancet, 2015; 386:1165–1210.

34. Altman, Royston. The hidden effect of time. Stat Med, 1988; 7:629–637.

35. Enck, Klosterhalfen. Does sex/gender play a role in placebo and nocebo effects? Conflicting evidence from clinical trials and experimental studies. Front Neurosci, 2019; 13:160.

36. Nieuwenhoven, Klinge. Scientific excellence in applying sex- and gender-sensitive methods in biomedical and health research. J Womens Health (Larchmt), 2010; 19:313–321.

37. King, Hewitt, Crammond, Sutherland, Maheen, Kavanagh. Reordering gender systems: Can COVID-19 lead to improved gender equality and health? Lancet, 2020; 396:80–81.

38. Gausman, Langer. Sex and gender disparities in the COVID-19 pandemic. J Womens Health (Larchmt), 2020; 29:465–466.

39. López-Alcalde, Stallings, Cabir Nunes, et al. Consideration of sex and gender in Cochrane reviews of interventions for preventing healthcare-associated infections: A methodology study. BMC Health Serv Res, 2019; 19:169.
