Abstract
The correlation between CRISPR-Cas systems and plasmid-mediated bacterial antibiotic resistance is attracting increasing attention. However, no reports currently exist on the relationship between CRISPR-Cas systems and the carriage of blaNDM or plasmids in E. coli. Here, molecular characterization and phylogenetic analysis of 639 E. coli strains isolated from humans in China were carried out. Based on sequence similarity, the type I-E CRISPR-Cas systems in E. coli can be grouped into two distinct clades, which we refer to for descriptive purposes within this study as type I-E-S1 and I-E-S2; the type I-E-S2 system is further divided into I-E-S2a and I-E-S2b based on the presence of cas8e and cas11. ST167 (phylogroup A) and ST410 (phylogroup C) E. coli carried the type I-E-S1 and I-E-S2b systems, respectively. Compared with strains carrying the I-E-S1 system, strains with the I-E-S2a system had significantly lower blaNDM, IncX3 plasmid, and IncF plasmid positive rates (p < 0.05), whereas strains with the I-E-S2b system had significantly higher blaNDM and IncF plasmid positive rates (p < 0.05). The blaNDM and IncF plasmid positive rates of strains carrying the I-E-S2a system were significantly lower than those of strains carrying the I-E-S2b system (p < 0.001). These results suggest that the I-E-S1, I-E-S2a, and I-E-S2b type CRISPR-Cas systems are associated with, and may favor, the spread of blaNDM and IncX3 plasmids. We found significant differences in the cas gene sequences of the I-E-S1 and I-E-S2 type CRISPR loci. This is the first further classification of the type I-E CRISPR-Cas systems in E. coli isolated in China, and it reveals their strong correlation with blaNDM, phylogenetic groups, and multilocus sequence types.
This work paves the way for a deeper understanding of the role that CRISPR-Cas systems play in the rise of resistant E. coli ST167 and ST410.
Introduction
Escherichia coli, an opportunistic pathogen, can cause various infections. 1 Recently, the increasing isolation of carbapenem-resistant E. coli (CREC) in China and globally has raised major public health concerns because clinical treatment options are limited. 2 Therefore, the resistance mechanism of CREC is of global concern. E. coli has seven phylogenetic groups: A, B1, B2, C, D, E, and F. 3 Based on previous reports, CREC are more common in phylogroups A and C, and carbapenem-sensitive E. coli (CSEC) are more common in phylogroup B2, which is significantly associated with extraintestinal infections in humans.4–7 Multilocus sequence typing (MLST) is a bacterial molecular typing technique based on nucleic acid sequencing. Its core principle is to amplify specific internal fragments (approximately 450–500 bp) of multiple (usually seven) housekeeping genes in the bacterial genome by PCR, determine their nucleotide sequences, and analyze the mutation sites to determine the allele combination (sequence type, ST) of the strain, which supports bacterial classification, identification, and studies of genetic evolution. 8 MLST has shown that the most common CREC types isolated in China are ST167 and ST410, belonging to phylogenetic groups A and C, respectively.9–12 New Delhi metallo-β-lactamase (NDM) is a metallo-β-lactamase (MBL) produced by bacteria that can hydrolyze almost all β-lactam drugs except aztreonam, mediating bacterial resistance to penicillins, cephalosporins, and carbapenems. 13 Carbapenemase production, particularly NDM, is the primary mechanism contributing to carbapenem resistance in Chinese CREC strains. 14 In 2017, an analysis of CREC monitoring data from 25 hospitals in 14 provinces in China showed that > 70% of CREC strains carried blaNDM. 15 Previous studies have shown that CREC mainly relies on plasmid-mediated transmission to acquire or transfer resistance genes. 10 blaNDM was mainly carried by IncX3 plasmids, followed by IncF plasmids, in China and globally.10,16
Clustered regularly interspaced short palindromic repeat (CRISPR) and CRISPR-associated (Cas) systems provide sequence-directed adaptive immunity to foreign elements. 17 CRISPR-Cas systems cut foreign DNA in a programmable and sequence-specific manner that is detrimental to the long-term persistence of foreign genes (phages, plasmids, and mobile genetic elements) and is implicated in antibiotic resistance.18–22 Recently, increasing attention has been paid to the correlation between CRISPR-Cas systems and bacterial antibiotic resistance, especially the acquisition of plasmid-mediated carbapenem resistance genes and the immune defence against plasmids mediated by the CRISPR-Cas systems.23–25
The relationship between the CRISPR-Cas systems and carbapenem resistance has been demonstrated in Klebsiella pneumoniae. 21 K. pneumoniae carbapenemase (KPC)-positive clinical isolates carry significantly fewer CRISPR-Cas systems than KPC-negative isolates (12/247 vs. 78/212, p < 0.0001); similar results have been obtained from the GenBank database (9/72 vs. 62/131, p < 0.0001). Subsequent molecular experiments have demonstrated that the CRISPR-Cas systems in K. pneumoniae impede conjugation of blaKPC-harboring IncF plasmids. 21 It should be noted that the canonical classification of CRISPR-Cas systems is continually refined, with the most current and widely adopted standard being the 2025 update by Makarova et al. 26 Type I-E is the main CRISPR-Cas system subtype in K. pneumoniae and E. coli.27,28 Spacers and the nucleases encoded by cas genes are the most important components of the immune function of the type I-E CRISPR-Cas system. The immune pathway can be divided into three stages: (1) Cas proteins recognize and capture short DNA fragments (protospacers) from foreign nucleic acids and integrate them into the host CRISPR array as spacers; (2) the CRISPR array is transcribed into pre-CRISPR RNA (pre-crRNA), which is further cut and processed to produce CRISPR RNA (crRNA) that binds Cas proteins to form a surveillance complex; (3) guided by the crRNA spacer, the surveillance complex searches the cell for a target protospacer, that is, a sequence complementary to the spacer, and triggers cleavage and subsequent degradation of the target through Cas3, providing immunity against invading DNA. 28 Studies have found that K. pneumoniae strains lacking the CRISPR-Cas system acquire the blaKPC-carrying IncF plasmid more readily 29 ; however, no evidence is available for a similar link between plasmids carrying resistance genes and the CRISPR-Cas systems in E. coli.
The CRISPR-Cas system in E. coli has been partially characterized.29–31 Approximately 50% of clinical CSEC strains harbor type I-E CRISPR-Cas systems, which were found in multiple phylogenetic groups and ST strains. 4 A few studies have suggested that the CRECs carry significantly more CRISPR-Cas systems (29/35) than CSECs, suggesting that the CRISPR-Cas systems in CREC have lost such endogenous barriers to horizontal gene transfer.5,6 However, few reports exist on the structure and distribution characteristics of the CRISPR-Cas systems in E. coli, as well as the relationship between the CRISPR-Cas systems and blaNDM or IncX3/IncF plasmid carriage in E. coli.
Our aim was to explore the CRISPR-Cas systems in E. coli and their correlation with blaNDM and the common resistance plasmids (IncX3 and IncF). Moreover, the molecular features of E. coli isolated from humans in China were characterized and a phylogenetic analysis was carried out. The correlation between CRISPR-Cas systems and phylogroups as well as ST types was also explored. According to our findings, the CRISPR-Cas system in E. coli can be divided further into several subtypes, and strains with different CRISPR-Cas subtypes show distinct ST distributions and distinct patterns of blaNDM and plasmid carriage.
Materials and Methods
Identification and WGS of clinical isolates
All clinical E. coli isolates from inpatients included in this study were retrospectively collected from strains previously stored in the clinical diagnostic departments of five grade-A hospitals in Guangdong Province, Zhejiang Province, and Beijing City between 2018 and 2021. All isolates were identified as E. coli using a VITEK 2 Compact system (BioMérieux, Marcy l’Etoile, France). The year of isolation and specimen type of each isolate were obtained from electronic medical records. WGS was performed for all E. coli collected in the present study. All genome sequencing data for this work were deposited at the National Center for Biotechnology Information (NCBI) under BioProject accession no.
Search strategy of E. coli
The EnteroBase database was searched for uploaded strains using the keywords “Escherichia coli,” “China,” and “human,” and the whole genome sequencing (WGS) data of the strains and their basic information (including isolation location, specimen source, and isolation time) were downloaded and analyzed.
The inclusion criteria for strains were as follows: (1) the strain was E. coli isolated from humans; (2) the strain was isolated between 2010 and 2021; (3) WGS data for the strain were available.
Exclusion criteria: (1) The isolation location is not from China; (2) WGS data of the strain are not available; (3) Duplicate strains. The collected E. coli were designated “L-” to distinguish them from the reported strains (e.g., L-A1).
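The inclusion and exclusion criteria above can be read as a simple record filter. The following Python sketch is illustrative only: the `StrainRecord` fields and the use of a repeated `strain_id` to define a duplicate are assumptions made for this example and are not taken from the EnteroBase export format or from the authors' workflow.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class StrainRecord:
    # Hypothetical metadata fields; real EnteroBase exports use other names.
    strain_id: str
    country: Optional[str]
    host: Optional[str]
    year: Optional[int]
    has_wgs: bool


def select_strains(records: Iterable[StrainRecord]) -> List[StrainRecord]:
    """Apply the stated criteria: human host, isolated in China,
    isolated 2010-2021, WGS available, duplicates removed."""
    seen = set()
    kept = []
    for rec in records:
        if rec.strain_id in seen:  # assumption: duplicate = repeated strain_id
            continue
        if rec.host != "human" or rec.country != "China":
            continue
        if rec.year is None or not (2010 <= rec.year <= 2021):
            continue
        if not rec.has_wgs:
            continue
        seen.add(rec.strain_id)
        kept.append(rec)
    return kept


if __name__ == "__main__":
    demo = [
        StrainRecord("s1", "China", "human", 2015, True),
        StrainRecord("s1", "China", "human", 2015, True),  # duplicate
        StrainRecord("s2", "China", "human", 2009, True),  # too early
    ]
    print([r.strain_id for r in select_strains(demo)])
```

Missing metadata (a `None` year, country, or host) is treated as a failure to meet the criteria, which is one conservative reading of the text.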
Genome data analysis
The genome sequence of each strain was annotated using Prokka v1.14.6. 33 Mlst v2.22.1 was used to determine the ST of each strain. 34 ABRicate v1.0.1 was used to identify the resistance genes and plasmid replicons in the NCBI and PlasmidFinder databases. 35 The phylogroups were verified online at http://clermontyping.iame-research.center/. 36
Identification of CRISPR-Cas systems and CRISPR spacer target analysis
CRISPRCasFinder v4.3.1 was used to identify cas genes and CRISPR loci.37–39 Only CRISPR loci with an evidence level of 4 or ≥1 and associated with cas genes were considered credible and included in subsequent analyses. 6 Strains were considered to carry a CRISPR-Cas system when at least five cas genes and one complete CRISPR locus were detected. To detect possible spacer targets, CRISPRTarget (https://github.com/davidchyou/CRISPRTarget) was used to predict protospacers by searching the phage and plasmid databases. An expected value (E-value) ≤0.0001 was used to identify matches. Spacer sequences were considered similar when they were at least 90% identical.
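The carriage rule and the E-value cutoff described above can be sketched as two small predicates. This is a minimal illustration, not the authors' pipeline: the function names and the `(target, e_value)` tuple layout are hypothetical, and parsing of real CRISPRCasFinder or CRISPRTarget output is deliberately left out.

```python
from typing import Iterable, List, Tuple

MIN_CAS_GENES = 5     # threshold stated in the text
MAX_E_VALUE = 1e-4    # E-value cutoff stated in the text


def carries_crispr_cas(cas_genes: Iterable[str], n_complete_crispr_loci: int) -> bool:
    """True when at least five distinct cas genes and at least one
    complete CRISPR locus were detected in the strain."""
    return len(set(cas_genes)) >= MIN_CAS_GENES and n_complete_crispr_loci >= 1


def credible_hits(hits: Iterable[Tuple[str, float]]) -> List[Tuple[str, float]]:
    """Keep protospacer hits whose E-value is at or below the cutoff.
    Each hit is a hypothetical (target_name, e_value) pair."""
    return [(name, e) for name, e in hits if e <= MAX_E_VALUE]


if __name__ == "__main__":
    genes = ["cas1", "cas2", "cas3", "cas5", "cas6", "cas7"]
    print(carries_crispr_cas(genes, 1))
    print(credible_hits([("plasmid_x", 1e-6), ("phage_y", 0.01)]))
```

Counting distinct gene names rather than raw annotations is an assumption; it avoids a duplicated annotation pushing a strain over the five-gene threshold.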
Phylogenetic analysis of genome data, cas genes, and CRISPR loci
Phylogenetic analysis of all genome data was based on single nucleotide polymorphism analysis. E. coli strain YD786 (CP013112.1) was used as a reference. 28 All assembly sequencing reads were mapped to the GENBANK-formatted reference using Snippy v3.2 with default settings. 40 Gubbins v2.3.4 was used to identify and remove recombination regions in the single nucleotide polymorphism output, 41 and the phylogenetic tree was built and visualized using FastTree and iTOL.42–44
To construct phylogenetic trees of all the cas genes in strains carrying CRISPR-Cas systems, the same Prokka-annotated amino acid files were used to extract the target genes. The sequences of the CRISPR loci included in the analysis were extracted from the CRISPRCasFinder v4.3.1 output for phylogenetic construction to observe the association. All sequences of cas genes and CRISPR loci were aligned in the same transcriptional direction to avoid additional errors. Muscle v5 was used to generate multiple sequence alignments with maxiters of 2 for all sets. 45 IQtree v2.2.0 was used to find the best-fit substitution model and build a maximum-likelihood-based phylogenetic tree. 44 Based on the sequence similarity of eight cas genes (cas1, cas2, cas3, cas5, cas6, cas7, cas8e, cas11), the type I-E CRISPR-Cas systems were classified as I-E-S1 and I-E-S2 subtypes, with over 90% similarity within the same subtype. The I-E-S2 subtype was further divided into I-E-S2a and I-E-S2b subclusters according to their compositional and structural features (I-E-S2b lacks the cas8e and cas11 genes).
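The subtype assignment described above can be expressed as a short decision function. In this hedged sketch the clade label ("S1" or "S2") is assumed to have been read off the cas gene phylogeny beforehand; the function name and the "unresolved" fallback for strains carrying only one of cas8e and cas11 are illustrative assumptions, since the text does not describe such a case.

```python
from typing import Iterable

TYPE_IE_CAS_GENES = (
    "cas1", "cas2", "cas3", "cas5", "cas6", "cas7", "cas8e", "cas11",
)


def classify_type_ie_subtype(clade: str, cas_genes: Iterable[str]) -> str:
    """Map a phylogenetic clade label plus cas gene content to the
    descriptive subtype names used in this study."""
    genes = set(cas_genes)
    marker_genes = {"cas8e", "cas11"}
    if clade == "S1":
        return "I-E-S1"
    if clade == "S2":
        if marker_genes <= genes:        # both present: full 8-gene locus
            return "I-E-S2a"
        if not (marker_genes & genes):   # both absent: 6-gene locus
            return "I-E-S2b"
        return "I-E-S2 (unresolved)"     # assumption: case not covered in the text
    raise ValueError(f"unknown clade label: {clade!r}")


if __name__ == "__main__":
    six_genes = [g for g in TYPE_IE_CAS_GENES if g not in ("cas8e", "cas11")]
    print(classify_type_ie_subtype("S2", TYPE_IE_CAS_GENES))
    print(classify_type_ie_subtype("S2", six_genes))
```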
Statistical analysis
Continuous variables are described as the mean. Student’s t-test was used to compare normally distributed continuous variables. Categorical variables are described as the number of isolates and percentages (%), and were compared using the chi-squared or Fisher’s exact test. For each fourfold (2 × 2) table, N is the total number of cases and T is the expected count of each cell: (1) when N > 40 and all T > 5, the chi-squared test was used; (2) when N > 40 and 1 ≤ T < 5 for any cell, the continuity-corrected chi-squared test was used; (3) when N < 40 or T < 1 for any cell, Fisher’s exact test was used. P values were calculated for all associations. A two-sided p-value of < 0.05 was considered statistically significant. All statistical analyses were performed using IBM SPSS Statistics 26 (IBM Corp., Armonk, NY, USA).
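The test-selection rule for fourfold tables can be sketched in plain Python. This is an illustration of the rule rather than a reproduction of the SPSS analysis. It assumes T denotes the expected cell count under independence, treats the boundary cases N = 40 and T = 5 (which the text leaves unspecified) as eligible for the uncorrected test, and uses the identity that the upper tail of a chi-squared distribution with one degree of freedom equals erfc(sqrt(x/2)).

```python
import math
from typing import Tuple


def expected_counts(a: int, b: int, c: int, d: int) -> Tuple[float, ...]:
    """Expected cell counts of the 2x2 table [[a, b], [c, d]] under independence."""
    n = a + b + c + d
    r1, r2, c1, c2 = a + b, c + d, a + c, b + d
    return (r1 * c1 / n, r1 * c2 / n, r2 * c1 / n, r2 * c2 / n)


def choose_test(a: int, b: int, c: int, d: int) -> str:
    """Pick the test according to total N and the smallest expected count T."""
    n = a + b + c + d
    t_min = min(expected_counts(a, b, c, d))
    if n < 40 or t_min < 1:
        return "fisher_exact"
    if t_min < 5:
        return "chi_square_corrected"
    return "chi_square"


def chi_square_2x2(a: int, b: int, c: int, d: int,
                   corrected: bool = False) -> Tuple[float, float]:
    """Pearson chi-squared statistic (optionally continuity-corrected) and
    two-sided p-value for a 2x2 table; requires non-zero margins."""
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if corrected:
        diff = max(0.0, diff - n / 2)
    stat = n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return stat, math.erfc(math.sqrt(stat / 2))


if __name__ == "__main__":
    # blaNDM-positive/negative counts: I-E-S2a 93/245, I-E-S2b 66/81
    table = (93, 245 - 93, 66, 81 - 66)
    print(choose_test(*table), chi_square_2x2(*table))
```

Applied to the blaNDM counts reported in the Results for I-E-S2a (93/245) and I-E-S2b (66/81), the rule selects the uncorrected chi-squared test and the statistic is about 46.2, in line with the reported p < 0.001. Tables with a zero margin always fall under Fisher's exact test in `choose_test`, so `chi_square_2x2` is never needed for them.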
Results
Strains, molecular characteristics, and the epidemiology characteristics of CRISPR-Cas systems in E. coli
Overall, 164 E. coli were isolated from admitted patients. In addition, as of July 20, 2023, the keywords “Escherichia coli,” “China,” and “human” were used to search for uploaded strains in the EnteroBase database. After screening with the inclusion and exclusion criteria, 475 strains of E. coli were randomly included in the analysis (Supplementary Table S1). In the phylogenetic tree constructed from the WGS data of all 639 E. coli, strains from the same source were dispersed, indicating the diversity of E. coli in this study (Fig. 1). The E. coli were primarily isolated from stool, urine, blood, sputum, and other specimens (Table 1). Strains were divided into blaNDM-carrying E. coli (NDM-EC) and non-blaNDM-carrying E. coli (non-NDM-EC). Regarding phylogenetic groups, the most common group in NDM-EC was group A (42.91%, 124/289), followed by group C (23.88%, 69/289) and group B1 (13.84%, 40/289). The most common group in non-NDM-EC was group A (31.34%, 105/335), followed by group B1 (22.99%, 77/335) and group B2 (18.21%, 61/335). Group C was more common in NDM-EC than in non-NDM-EC (23.88% vs. 5.67%, p < 0.001), whereas group B2 was more prevalent in non-NDM-EC than in NDM-EC (18.21% vs. 2.42%, p < 0.001) (Table 2).

Phylogenetic tree of all 639 E. coli. A circular phylogenomic tree of all strains (n = 639) was inferred using YD786 (
Characteristics of E. coli strains included in the present study
Data are number (n); percentage (%).
Strains collected and sequenced.
Strains collected from 25 studies in China or EnteroBase database.
Characteristics of NDM-EC and non-NDM-EC strains included in the present study
Data are number (n); percentage (%). Only the seven most common ST types (ST410, ST167, ST405, ST354, ST101, ST131, and ST617) among the strains included in this study are listed in the table.
For ST types, the common ST types in NDM-EC were ST410 (22.84%, 66/289), ST167 (21.15%, 64/289), and ST405 (4.15%, 12/289), while the common ST types in non-NDM-EC were ST131 (6.87%, 23/335), ST10 (5.97%, 20/335), ST38 (4.48%, 15/335), and ST410 (4.48%, 15/335) (Table 2).
The cas genes and the CRISPR loci detected in the 639 E. coli are presented in Supplementary Table S2.
In this study, when E. coli was detected to have at least five cas genes and a complete CRISPR locus, the strain was considered to carry the CRISPR-Cas system.
8, 6, 7, 5, and ≤3 cas genes were identified in 391, 82, 21, 1, and 144 strains, respectively, and 507 (79.34%) strains were identified to carry the CRISPR-Cas system. 488 strains (76.37%) carry only type I-E CRISPR-Cas systems. Additionally, 19 strains (2.97%) carry the type I-F CRISPR-Cas systems, among which 7 strains (1.10%) harbor both type I-E and I-F CRISPR-Cas systems. All these dual-system strains belong to group B1, with their MLST distributed among rare sequence types such as ST448 and ST2083. The remaining 12 strains carrying only type I-F CRISPR-Cas systems all belong to group B2, exhibiting ST95. Given that Type I-E CRISPR-Cas systems represent the predominant subtype, we next focus our analysis on this system. The structures of type I-E and I-F CRISPR-Cas systems detected in this study are shown in Supplementary Fig. S1A, B.
Epidemiology characteristics of type I-E CRISPR-Cas systems in diverse types of E. coli
Among the 639 strains of E. coli, the carriage rate of type I-E CRISPR-Cas systems differed between phylogenetic groups (Table 3). The carriage rate was 100.00% in group C (88/88) and group G (6/6) strains, 81.5% (190/233) in group A strains, 89.7% (105/117) in group B1 strains, 97.2% (70/72) in group D strains, 57.1% (4/7) in group E strains, and 78.0% (32/41) in group F strains. However, none of the group B2 strains carried type I-E CRISPR-Cas systems.
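As a quick arithmetic check, the per-phylogroup carriage rates can be recomputed from the counts given above. The sketch below is a worked example only; the dictionary layout and function name are illustrative choices, and group B2 is omitted because its denominator is not stated in this paragraph.

```python
from typing import Dict, Tuple

# (type I-E carriers, total strains) per phylogroup, as reported in the text
TYPE_IE_COUNTS: Dict[str, Tuple[int, int]] = {
    "A": (190, 233),
    "B1": (105, 117),
    "C": (88, 88),
    "D": (70, 72),
    "E": (4, 7),
    "F": (32, 41),
    "G": (6, 6),
}


def carriage_rates(counts: Dict[str, Tuple[int, int]]) -> Dict[str, float]:
    """Percentage of strains carrying the system in each group, one decimal."""
    return {group: round(100 * k / n, 1) for group, (k, n) in counts.items()}


if __name__ == "__main__":
    for group, rate in carriage_rates(TYPE_IE_COUNTS).items():
        print(f"group {group}: {rate}%")
    print("total carriers:", sum(k for k, _ in TYPE_IE_COUNTS.values()))
```

The carrier counts sum to 495, which matches the number of type I-E-positive strains used as the denominator in the subtype analysis.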
Type I-E CRISPR-Cas systems, cas genes, and CRISPR loci in strains
Data are number (n); percentage (%). Only the seven most common ST types (ST410, ST167, ST405, ST354, ST101, ST131, and ST617) among the strains included in this study are listed in the table.
The carriage rate of type I-E CRISPR-Cas systems also differed between strains of different ST types. All ST410 (81/81), ST167 (71/71), ST405 (20/20), ST354 (15/15), and ST617 (11/11) strains carried the type I-E CRISPR-Cas system, while none of the ST131 (0/33) strains did. Among the 384 strains of the other 148 ST types, 290 (75.52%) carried type I-E CRISPR-Cas systems.
Statistical analysis found that the carrying rate of blaNDM was significantly higher in strains carrying type I-E CRISPR-Cas systems than those not carrying type I-E CRISPR-Cas systems (52.3% vs. 22.0%, p < 0.0001).
The IncX3 plasmid positive rate was significantly higher in strains carrying the type I-E CRISPR-Cas system than in those not carrying it (36.4% vs. 16.0%, p < 0.001). The IncF plasmid positive rate was similar between the two groups (83.2% vs. 88.6%, p > 0.05) (Table 3).
Type I-E CRISPR-Cas systems in E. coli were divided into two subtypes
A phylogenetic analysis of the cas gene and CRISPR locus sequences in E. coli carrying the type I-E CRISPR-Cas systems was performed. The phylogenetic trees suggest that the cas loci and CRISPR loci of strains in the same phylogroup or ST are often clustered, indicating strong homology (Fig. 2–3). Notably, the cas loci split into two distinct clades. The average similarity of all eight cas genes in strains belonging to the same clade was at least 93.0% (Supplementary Table S3), whereas no similarity was observed between strains of different clades (0%). To facilitate discussion within this study, we designate these clades as type I-E-subtype1 (I-E-S1) and type I-E-subtype2 (I-E-S2) (Fig. 2). Similarly, phylogenetic trees based on each cas gene (cas1, cas2, cas3, cas5, cas6, cas7, cas8e, and cas11) suggest that cas genes of the same subtype cluster tightly (Supplementary Fig. S2).

Phylogenetic tree of Cas genes of E. coli with five or more Cas genes. Rectangles were used to indicate the presence of Cas genes, and the green and yellow regions represent clade 1 and clade 2, respectively. The 19 colored bars represent phylogroups, the presence of eight Cas genes in I–E CRISPR-Cas systems, and the ten most common MLSTs (ST410, ST167, ST131, ST10, ST38, ST405, ST354, ST1193, ST48, and ST101) in this tree.

Phylogenetic tree of CRISPR loci in E. coli carrying CRISPR loci. The ten most common MLSTs (ST410, ST167, ST131, ST10, ST38, ST405, ST354, ST1193, ST48, and ST101) are highlighted with differently colored regions. The colored circle represents the phylogroups of E. coli. The length of the purple bar around the tree represents the number of spacers.
Characteristics of two subtypes of type I-E CRISPR-Cas system in E. coli
Type I-E-S1 CRISPR-Cas systems
The type I-E-S1 systems were identified in 169/495 (34.1%) E. coli carrying the type I-E CRISPR-Cas systems. Except for 9 strains, all carried a complete set of eight cas genes (Fig. 2, Supplementary Table S2). Table 3 presents the molecular characterization of E. coli with the type I-E-S1 systems. The majority (151/169, 89.3%) of the E. coli strains with the type I-E-S1 CRISPR-Cas systems belonged to phylogroup A, while 13 (7.7%) belonged to phylogroup F. For the ST type, 71 (42.0%) strains belonged to ST167, while the other 98 strains were scattered among 35 different ST types. In addition, 77 (45.6%) strains carried IncX3 plasmids, 147 (87.0%) strains carried IncF plasmids, and 100 (59.2%) strains carried blaNDM (Table 3).
To confirm the correlation between the type I-E-S1 systems and blaNDM or plasmids, the carriage of blaNDM and IncX3/IncF plasmids in strains with and without the type I-E CRISPR-Cas systems was further analyzed (Table 4). Compared with the type I-E CRISPR-Cas-negative strains, I-E-S1 carriers exhibited significantly higher blaNDM carriage (59.2% vs. 22.0%, p < 0.001) and IncX3 plasmid prevalence (45.6% vs. 16.0%, p < 0.001). However, IncF plasmid rates were comparable between I-E-S1 and CRISPR-Cas-negative groups (87.0% vs. 88.6%, p > 0.05; Supplementary Table S4).
Carbapenemase-producing types and plasmid profile characteristics of strains carrying the I-E-S1, I-E-S2a, and I-E-S2b subtypes and strains not carrying the CRISPR-Cas system
Data are number (n); percentage (%). S1, Subtype1; S2, Subtype2; S2a, Subtype2a; S2b, Subtype2b.
The chi-squared or Fisher’s exact test with a two-tailed hypothesis was used to calculate the p value. For each fourfold (2 × 2) table, N is the total number of cases and T is the expected count of each cell: (1) when N > 40 and all T > 5, the chi-squared test was used; (2) when N > 40 and 1 ≤ T < 5 for any cell, the continuity-corrected chi-squared test was used; (3) when N < 40 or T < 1 for any cell, Fisher’s exact test was used.
Differences with p < 0.05 are considered significant and shown in bold for emphasis.
Type I-E-S2 CRISPR-Cas systems
A total of 326 (65.9%) E. coli carried the type I-E-S2 systems, of which 104 (32.0%), 88 (27.0%), 69 (21.2%), 39 (12.0%), 19 (5.8%), 5 (1.5%), and 2 (0.6%) strains belonged to phylogroups B1, C, D, A, F, G, and E, respectively (Table 3). The most common ST was ST410 (81, 24.8%), followed by ST405 (20, 6.1%), ST354 (15, 4.6%), ST617 (11, 3.4%), and ST101 (7, 2.1%); 192 strains (58.9%) belonged to 81 other ST types.
Type I-E-S2 systems could be further categorized into two subtypes based on the structural features of cas loci. Among the 326 strains, 245 strains carried all 8 cas genes, and 81 strains carried only 6 cas genes (lacking cas8e and cas11 genes). Therefore, the I-E-S2 type CRISPR-Cas system is subdivided into I-E-Subtype2a (I-E-S2a, carrying 8 cas genes) and I-E-Subtype2b (I-E-S2b, carrying 6 cas genes) based on the deletion of cas8e and cas11 genes (Fig. 2, Supplementary Table S2).
Phylogroup and ST distributions differed between strains harboring the type I-E-S2a and I-E-S2b systems. The 245 strains carrying the type I-E-S2a systems were dispersed across seven phylogroups, with 104 (42.4%), 69 (28.2%), 39 (15.9%), 19 (7.8%), 7 (2.9%), 5 (2.0%), and 2 (0.8%) strains belonging to phylogroups B1, D, A, F, C, G, and E, respectively. However, all 81 strains carrying the type I-E-S2b systems belonged to phylogroup C.
Similarly, the STs of strains carrying the type I-E-S2a systems included ST405 (20 strains, 8.2%), ST354 (15 strains, 6.1%), ST617 (11 strains, 4.5%), and ST101 (7 strains, 2.9%), with another 173 strains (70.6%) belonging to 80 other STs, whereas all strains carrying the type I-E-S2b systems belonged to ST410 (100%; Table 3).
The carriage rates of blaNDM by strains carrying I-E-S2a and I-E-S2b CRISPR-Cas systems were 38.0% (93/245) and 81.5% (66/81), respectively. Regarding the IncX3 plasmid carrying rate, it was 28.6% (70/245) and 40.7% (33/81) in strains carrying I-E-S2a and I-E-S2b type CRISPR-Cas systems, respectively. IncF plasmids were detected in 75.5% (185/245) and 98.8% (80/81) of strains carrying I-E-S2a and I-E-S2b CRISPR-Cas systems, respectively.
Statistical comparison of the subtypes revealed stark contrasts in blaNDM carriage (Table 4). Strains harboring the I-E-S2b subtype exhibited an 81.5% blaNDM prevalence, significantly surpassing the I-E-S2a subtype (38.0%, p < 0.001). The I-E-S2a subtype showed moderately elevated blaNDM carriage compared to CRISPR-Cas-negative strains (38.0% vs. 22.0%, p < 0.05). The I-E-S2b subtype demonstrated the highest blaNDM prevalence, exceeding CRISPR-Cas-negative strains by nearly 4-fold (81.5% vs. 22.0%, p < 0.001).
For IncX3 plasmids, prevalence was comparable between the I-E-S2a and I-E-S2b subtypes (28.6% vs. 40.7%, p > 0.05). Both I-E-S2a (28.6% vs. 16.0%, p < 0.001) and I-E-S2b (40.7% vs. 16.0%, p < 0.001) exhibited significantly higher IncX3 carriage than CRISPR-Cas-negative strains, with I-E-S2b showing roughly 2.5-fold enrichment (Table 4). For IncF plasmids, the I-E-S2b subtype demonstrated near-universal prevalence (98.8%), significantly surpassing both the I-E-S2a subtype (75.5%, p < 0.001) and CRISPR-Cas-negative strains (88.6%, p < 0.05) (Table 4).
Comparative analysis of E. coli with type I-E-S1 and I-E-S2 CRISPR-Cas systems
Among the 639 E. coli strains, 64.8% (151/233) of group A and 31.7% (13/41) of group F strains carried the I-E-S1 type CRISPR-Cas system, while all group C strains, 95.8% (69/72) of group D, 88.9% (104/117) of group B1, 83.3% (5/6) of group G, and 46.3% (19/41) of group F strains carried the type I-E-S2 CRISPR-Cas system.
Similarly, for the ST type, all ST167 strains carry the I-E-S1 type CRISPR-Cas system, and all ST405, ST354, ST617, and 58.3% (7/12) of the ST101 strains carry the I-E-S2a type CRISPR-Cas system. All ST410 strains carry the I-E-S2b type CRISPR-Cas system.
The differences in blaNDM and IncX3/IncF plasmid carrying rates between strains carrying I-E-S1 and I-E-S2 type systems were further analyzed. Comparative analysis revealed divergent blaNDM carriage patterns between CRISPR subtypes (Table 4): blaNDM prevalence was significantly higher in I-E-S1 strains than I-E-S2a strains (59.2% vs. 38.0%, p < 0.05). Conversely, I-E-S1 strains showed lower blaNDM carriage compared to I-E-S2b (59.2% vs. 81.5%, p < 0.05, Table 4). For IncX3 plasmids, prevalence was significantly higher in I-E-S1 strains than I-E-S2a strains (45.6% vs. 28.6%, p < 0.05), but no significant difference was observed between I-E-S1 vs. I-E-S2b strains (45.6% vs. 40.7%, p > 0.05). For IncF plasmids, the I-E-S2a subtype exhibited significantly lower plasmid carriage compared with I-E-S1 (75.5% vs. 87.0%, p < 0.05), and the I-E-S2b subtype demonstrated near-universal prevalence (98.8% vs. 87.0%, p < 0.05).
Spacers in strains with diverse phylogroups, ST, and CRISPR-Cas systems
The numbers of spacers in the CRISPR loci of strains with the same ST type were similar, but this similarity was less evident within the same phylogroup. Additionally, the CRISPR loci of CRISPR-Cas systems of the same type clustered together (Fig. 3). Strains in phylogroups A and C had fewer spacers than those in other phylogroups; notably, no spacers in phylogroup C strains were identified as homologous to plasmids or phages (Table 5; Supplementary Fig. S3). Among the common ST types, the mean numbers of spacers in ST410, ST167, ST405, and ST617 were similar and lower than those in ST354 and ST101 (16.75, 15.96, 15.80, and 15.36 vs. 20.33 and 25.58, respectively) (Supplementary Fig. S3); no CRISPR loci were detected in the ST131 strains. Among the 639 strains of E. coli, spacer sequences of 19 isolates (3.0%) showed > 90% similarity to genomic regions of IncF plasmids, whereas no significant matches were detected between the spacer sequences and IncX3-type plasmids in any strain. These 19 strains displayed dispersed molecular characteristics, with none belonging to predominant clonal groups (Supplementary Table S6). Notably, none of the spacers in the ST167 and ST410 strains were detected as homologous to plasmids or phages, while 2.0% to 15.0% of the spacers of the other four ST types were detected as homologous to plasmids or phages (Table 5). Furthermore, among strains harboring the type I-E CRISPR-Cas systems, ST410 isolates had markedly fewer spacers and significantly lower sequence homology to plasmids, phages, and E. coli chromosomes than strains of other MLST lineages (p < 0.0001; Supplementary Table S5).
Spacers of the type I-E CRISPR-Cas systems in E. coli carrying CRISPR loci
Data are number (n); percentage (%).
Total number of spacers for all isolates in each cluster.
Spacers were matched to CRISPR Target databases. 46
Spacers in the type I-E-S1 and I-E-S2 CRISPR-Cas systems were further compared (Table 5). The I-E-S1 systems had an average of 19.5 spacers, with 6.7% and 4.9% homologous to plasmids and phages, respectively. The type I-E-S2a and I-E-S2b systems had on average 22.36 and 16.75 spacers, respectively. Among them, 13.2% and 4.2% spacers in the type I-E-S2a systems were homologous to plasmids and phages and no homologs of the spacers in type I-E-S2b systems were identified.
Furthermore, the spacers of the I-E-S1 and I-E-S2 type CRISPR-Cas systems were compared. Strains with the type I-E-S1 system had significantly fewer spacers on average than strains with the type I-E-S2a system (19.50 vs. 22.36, p < 0.05) and significantly more than strains with the type I-E-S2b system (19.50 vs. 16.75, p < 0.001). Strains with the type I-E-S2a system carried more spacers than strains with the type I-E-S2b system (22.36 vs. 16.75, p < 0.001) (Table 5).
Regarding the proportion of spacers homologous to plasmids, there was no significant difference between strains with the type I-E-S1 system and strains with the type I-E-S2a system (6.7% vs. 13.2%, p > 0.05). Strains with the type I-E-S1 system had more plasmid-homologous spacers than strains with the type I-E-S2b system (6.7% vs. 0.0%, p < 0.05). Notably, none of the spacers in the ST167 and ST410 strains showed homology to plasmids, phages, or E. coli chromosomes, while for the other four ST types, 2.0%–15.0% of the spacers were detected as homologous to plasmids, phages, or E. coli chromosomes (Table 5).
Discussion
The CRISPR-Cas systems defend bacteria against mobile genetic elements, including plasmids and bacteriophages. This study explored the molecular characterization and phylogenetic analysis of the CRISPR-Cas system in human E. coli isolated from China. The results showed that 77.46% (495/639) of E. coli carried the I-E type CRISPR-Cas system, indicating that the I-E type CRISPR-Cas system dominates E. coli, which is consistent with previous studies.4,5 In contrast, strains harboring type I-F CRISPR-Cas systems are rarely observed and sporadically distributed among rare MLST types, suggesting they have a limited contribution to the transmission of CREC resistance.
Furthermore, the results suggest that NDM-EC carried type I-E CRISPR-Cas systems more often than non-NDM-EC (89.6% vs. 68.1%, p < 0.001), in contrast to K. pneumoniae, in which KPC-negative strains carry more CRISPR-Cas systems than KPC-positive strains. 29 Two factors may jointly explain this phenomenon. First, we found that the carriage rate of type I-E CRISPR-Cas systems in E. coli is related to ST types and phylogenetic groups. All group C strains, which are common in NDM-EC, carried type I-E CRISPR-Cas systems (100%); in contrast, none of the group B2 strains, which are common in non-NDM-EC, carried type I-E CRISPR-Cas systems. Concurrently, we observed that the most prevalent non-NDM-EC strains belong to ST131, and none of these strains harbor the type I-E CRISPR-Cas system. Notably, ST131 strains represent the predominant clone within group B2. Previous global studies have consistently demonstrated that ST131 strains isolated from diverse geographical regions lack any CRISPR-Cas systems.29,47,48 This finding further supports the strong correlation between the presence of CRISPR-Cas systems and the MLST profiles as well as the phylogroup background of bacterial strains. Second, this study also found that carriage of the type I-E CRISPR-Cas system by E. coli may, to a certain extent, favor the persistence of blaNDM, although further studies with larger numbers of strains are needed. We speculate that this may be because blaNDM is usually located on IncX3 plasmids, which carry a histone-like nucleoid-structuring (H-NS-like) protein that can effectively inhibit CRISPR-Cas system activity.
Our analysis revealed two major variants within type I-E systems, labeled I-E-S1 and I-E-S2 for clarity in this report. The type I-E-S2 systems are further subdivided into I-E-S2a and I-E-S2b systems based on the presence of cas8e and cas11. Previous studies have also divided the type I-E CRISPR-Cas systems in K. pneumoniae into types I-E and I-E* systems based on the arrangements and sequence similarity of cas genes. The number of plasmids and resistance genes in strains with the type I-E* CRISPR-Cas systems were significantly lower than in strains with the type I-E systems, suggesting that the different subtypes of CRISPR-Cas systems in K. pneumoniae exert different immunological activities.49,50 Given the fixed arrangement of the cas genes in the type I-E cas loci in E. coli, the type I-E CRISPR-Cas systems in E. coli were categorized further into the type I-E-S1 and type I-E-S2 subtypes based on differences in the cas gene type. It is crucial to contextualize these findings within the broader classification framework. Recent efforts have standardized the naming of CRISPR-Cas systems based on evolutionary and mechanistic criteria. 26 The S1/S2 subtypes in this study highlight significant sequence divergence and gene content differences within the type I-E family in clinical E. coli. While these groups may represent functionally distinct lineages, their formal relationship to established subtypes requires further phylogenetic analysis across a broader bacterial dataset. Notably, strains harboring different subtype systems had distinct ST and phylogroup distributions. In NDM-EC, except for strains from group A, strains from all other phylogenetic groups mainly carried the I-E-S2 type CRISPR-Cas system. Among the seven common ST-type strains, only the ST167 strain carries the I-E-S1 type CRISPR-Cas system. 
Because ST167 is the main clonotype within group A, the high positivity rate of the I-E-S1 type CRISPR-Cas system among group A strains can be attributed to ST167. The other common STs (such as ST405, ST354, ST101, and ST617) all carry the I-E-S2a type CRISPR-Cas system, with the exception of ST410, in which all strains carry the I-E-S2b type.
We also observed that the type I-E-S1, I-E-S2a, and I-E-S2b CRISPR-Cas systems may have different activities in E. coli. The cas gene sequences in the cas loci of type I-E-S1 and I-E-S2 differed significantly. Previous studies have suggested that sequence differences in cas genes contribute to functional differentiation of CRISPR-Cas systems.49,50 Compared with the type I-E-S2a systems, the type I-E-S2b systems lack cas8e and cas11. The Cas8e protein recognizes the target sequence and recruits the Cas3 protein to cleave it. Cas11 stabilizes the non-target DNA strand and supports CRISPR-Cas system function.28 The loss of cas8e and cas11 compromises the immune function of the type I-E-S2b CRISPR-Cas systems, thereby allowing horizontally acquired genes to accumulate and persist in the bacterium. Here, the differences in the activities of the three types of CRISPR-Cas systems are primarily reflected in two aspects: (1) differences in the preferences of strains with different CRISPR-Cas systems for carrying blaNDM and plasmids, and (2) differences in the spacers and in the homology of spacers with plasmids and phages. The number of spacers and the protospacers inferred from sequence homology can, to some extent, reflect the activity of the CRISPR-Cas system.6
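The distinction between I-E-S2a and I-E-S2b described above rests on whether cas8e and cas11 are present. A minimal sketch of that rule is given below. It assumes a locus has already been assigned to the I-E-S2 clade by sequence similarity and that its cas genes have been annotated with standard names; the function name `classify_s2_locus` and the "unresolved" label for partial losses are hypothetical and not part of the study's workflow.

```python
# Genes whose presence separates I-E-S2a from I-E-S2b, as stated in the text.
DISCRIMINATING_GENES = frozenset({"cas8e", "cas11"})


def classify_s2_locus(cas_genes):
    """Label an I-E-S2 locus from the cas gene names annotated in it.

    Hypothetical helper: assumes the locus already belongs to the I-E-S2
    clade and that gene names follow the standard type I-E nomenclature.
    """
    present = {gene.strip().lower() for gene in cas_genes}
    found = DISCRIMINATING_GENES & present
    if found == DISCRIMINATING_GENES:
        return "I-E-S2a"      # both cas8e and cas11 present
    if not found:
        return "I-E-S2b"      # both cas8e and cas11 absent
    return "unresolved"       # only one of the two found; inspect manually


# Canonical type I-E cas gene set, with and without cas8e/cas11.
full_locus = ["cas3", "cas8e", "cas11", "cas7", "cas5", "cas6e", "cas1", "cas2"]
reduced_locus = ["cas3", "cas7", "cas5", "cas6e", "cas1", "cas2"]

print(classify_s2_locus(full_locus))
print(classify_s2_locus(reduced_locus))
```

Returning a separate label when only one of the two genes is found avoids forcing an ambiguous or poorly annotated locus into either subtype.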
Here, comparison of the positivity rates of blaNDM and plasmids with those of strains lacking type I-E systems showed that the type I-E-S1 systems favored the presence of blaNDM and IncX3 plasmids and the type I-E-S2b systems favored the presence of blaNDM and IncF plasmids, whereas no significant differences were observed for strains with the type I-E-S2a systems. Similarly, when the three subtypes were compared with one another, strains carrying the type I-E-S1 systems were more likely to carry blaNDM than strains carrying the type I-E-S2a systems; strains carrying the type I-E-S1 systems were the most likely to carry IncX3 plasmids; and strains carrying the type I-E-S2b systems were the most likely to carry IncF plasmids. These findings suggest that the presence of the type I-E-S1 and I-E-S2b CRISPR-Cas systems is more favorable for the spread of resistance than the presence of the I-E-S2a system. The mechanisms underlying these different preferences remain to be investigated. Fewer spacers were observed in the type I-E-S1 and I-E-S2b systems than in the I-E-S2a systems. Additionally, the homology to plasmids and phages detected in the spacers of the type I-E-S1 systems was significantly lower than that of the type I-E-S2a systems, leading us to speculate that the I-E-S2a systems may have stronger immune activity. Furthermore, no homology to plasmids or phages was observed in the spacers of the type I-E-S2b systems, suggesting that the I-E-S2a systems may possess the strongest immune activity, while the I-E-S2b systems may exhibit none because of structural differences. Although statistical differences were observed, these results should be interpreted with caution because of the limited number of CRISPR-Cas-negative strains. Consequently, the observed associations require validation in larger cohorts.
In contrast to earlier studies, our analysis provides a contemporary and clinically grounded perspective on the role of CRISPR-Cas systems in facilitating antimicrobial resistance spread in E. coli, with specific emphasis on the clonal expansion of ST167/ST410, the dissemination of IncX3/IncF plasmids, and the carriage of blaNDM.
We hypothesize that the particular CRISPR-Cas subtypes carried by ST167 and ST410 strains may partly explain their emergence as dominant clonotypes. First, the spacers in ST167 and ST410 strains differed significantly from those in other STs. Compared with other STs, ST167 and ST410 strains possess fewer spacers, and their spacers exhibit no detectable homology to plasmids, phages, or E. coli chromosomes. This observation suggests that the type I-E CRISPR-Cas systems in ST410 and ST167 strains may have remained non-functional in blocking the invasion of foreign genetic elements since their initial chromosomal integration, thereby resulting in stagnant spacer acquisition. Second, the CRISPR systems carried by ST167 and ST410 strains appear to coexist more readily with the dissemination of antibiotic resistance. Specifically, ST167 strains harbor the I-E-S1 system, which is conducive to the presence of blaNDM and its common vector, the IncX3 plasmid. In contrast, ST410 strains carry the I-E-S2b system, which is more permissive for the persistence of blaNDM and its prevalent vehicle, the IncF plasmid, than the I-E-S2a system carried by other strains. Future studies are required to confirm the role of the CRISPR-Cas systems in the prevalence of ST167 and ST410, particularly studies of CRISPR-Cas-mediated immunity against plasmids in E. coli.
Conclusion
Our study explored the molecular characterization of type I-E CRISPR-Cas systems in E. coli isolated in China and divided them into two subtypes. We characterized the correlation between the type I-E CRISPR-Cas systems and blaNDM in E. coli and found that carriage of these systems was highly correlated with phylogroup and ST. Furthermore, the I-E-S1 and I-E-S2b systems appear more prone to coexist with the dissemination of blaNDM and the IncX3/IncF plasmids. Our findings could facilitate further research on the role of the type I-E CRISPR-Cas system in promoting ST167 and ST410 as the dominant clonotypes of E. coli.
Ethics Approval and Consent to Participate
This study was approved by the medical ethics committee of the First Affiliated Hospital of Guangzhou Medical University (GMU).
Availability of Data and Materials
All the data used in this study are publicly accessible in the EnteroBase database.
Authors’ Contributions
Conceptualization: X.Y. and Y.G. Data curation: C.Z., N.H., and Y.L. Visualization: C.Z., J.C., L.Y., Y.W., and J.W. Project administration: S.X. Writing—original draft: Y.G. and J.L. Writing—review and editing: Y.G., J.L., and A.H.
Footnotes
Author Disclosure Statement
The authors declare no competing interest.
Funding Information
This study was financially supported by the National Natural Science Foundation of China (Project No. 8217082227 and Project No. 81861138056).
Supplemental Material
References
