Abstract
Cytomegalovirus (CMV) has long been thought to have an association with glioblastoma multiforme (GBM), although the exact role of CMV and any subsequent implications for treatment have yet to be fully understood. This study addressed whether IGH complementarity determining region-3 (CDR3)-CMV protein chemical complementarity, with IGH CDR3s representing both tumor-resident and blood-sourced IGH recombinations, was associated with overall survival (OS) distinctions. IGH recombination sequencing reads were obtained from (a) the Clinical Proteomic Tumor Analysis Consortium (CPTAC) tumor RNAseq files; and (b) The Cancer Genome Atlas (TCGA) blood exome-derived files. The Adaptive Match web tool was used to calculate chemical complementarity scores (CSs) based on hydrophobic interactions, and those scores were used to group GBM cases and assess survival probabilities. We found a higher OS probability for cases whose hydrophobic IGH CDR3-CMV protein chemical complementarity scores (Hydro CSs) were in the upper 50th percentile for several CMV proteins, including UL99 and UL123, as well as for CSs based on known B cell epitopes representing these proteins. We also identified multiple immune signature genes, including CD79A and TNFRSF17, for which higher RNA expression was associated with higher Hydro CSs. Results were consistent with the idea that stronger immunoglobulin responses to CMV are associated with better OS probabilities for GBM.
Introduction
Glioblastoma multiforme (GBM) is the most common and aggressive form of primary adult brain malignancy (Yang et al., 2022). Cytomegalovirus (CMV), a double-stranded DNA virus in the human herpesvirus family with the ability to establish latent infection, is suspected to play a role in GBM pathogenesis as either a potential causative factor or comorbidity (Yang et al., 2022). Potential roles for CMV include effects on angiogenesis, effects on immune pathways, and enhancement of the expression of tumor stem cell factors (Rahman et al., 2019). Studies have sought to determine whether CMV infections in the GBM setting are associated with overall survival (OS), although results do not yet permit conclusions regarding a precise CMV-OS connection. One study found that low-grade CMV infection is strongly associated with long-term survival, with 40% of GBM patients whose tumors contained less than 25% CMV-infected cells surviving longer than 18 months, as opposed to only 8% of patients with greater than 25% CMV-infected cells (Rahbar et al., 2012). Another study found that CMV immunoglobulin G (IgG) seropositivity was significantly associated with worse OS in GBM patients, suggesting that prior systemic CMV infection may negatively affect patient outcomes (Foster et al., 2019).
GBM is a difficult cancer to treat and is resistant to many therapies (Dapash et al., 2021). The standard of care since 2005 has been the Stupp Protocol, which consists of surgical resection, radiation therapy, and concomitant and adjuvant temozolomide chemotherapy (Stupp et al., 2005). Immunotherapies face challenges to success because of high levels of regulatory B and T cells, as well as immunosuppressive myeloid cells (Dapash et al., 2021). However, immunotherapies and alternative treatment methods are currently under investigation or are being evaluated in ongoing clinical trials. These include polyclonal CMV-specific T cells, immune checkpoint blockades, CAR T cell therapy, oncolytic virotherapy, vaccines, and the utilization of ultrasound to improve drug delivery across the blood–brain barrier (Ghazi et al., 2012; Rong et al., 2022). Limitations, including immunosuppression, remain a challenge for many of these therapies, but several have shown clinical promise (Rong et al., 2022; Weathers et al., 2020).
In this study, we used an IGH CDR3-CMV protein chemical complementarity scoring algorithm as an approach to further understand the relationship between glioblastoma, CMV, Igs, and patient outcomes. The chemical complementarity analyses were based on the CDR3 amino acid (AA) sequences indicated by IGH recombination reads extracted from two datasets: the Clinical Proteomic Tumor Analysis Consortium (CPTAC)-GBM tumor-derived RNAseq files and The Cancer Genome Atlas (TCGA)-GBM blood exome files.
Methods
Isolation of the IGH recombination reads from the CPTAC and TCGA datasets. The process of mining adaptive immune receptor (IR) recombination reads from genomics files representing tissues has been extensively described and benchmarked (Gill et al., 2016; Tong et al., 2017; Chobrutskiy et al., 2018; Patel et al., 2021). The IGH recombination reads were obtained from genomics files, in turn obtained from the Genomic Data Commons (GDC) (Jensen et al., 2017). The data representing the IGH recombination reads used in this report are included in the supporting online material (Supplementary Tables S1, S2). The CPTAC RNAseq files (phs001287) were accessed via the National Institutes of Health (NIH) database of Genotypes and Phenotypes (dbGaP) under approved protocol number 31752. The TCGA-GBM blood-exome (WXS, whole exome sequence) files (phs000178) were accessed according to NIH dbGaP approved protocol number 6300.
Use of the Adaptive Match web tool. The Adaptive Match web tool (adaptivematch.com) uses a chemical complementarity scoring algorithm (Chobrutskiy et al., 2021; Eakins et al., 2022; Pakasticali et al., 2023) to calculate chemical complementarity scores (CSs) for each amino acid (AA) sequence alignment of a CDR3-candidate antigen pairing, and returns the highest score for each pairing. Chemical CSs can be based on hydrophobic or electrostatic interactions; however, only Hydro CSs were used in this report. In addition to outputting the raw CSs, the Adaptive Match web tool outputs survival distinctions based on the maximal value of those CSs for a given case, along with related statistical analysis information, such as Cox regression and Kaplan–Meier (KM) logrank p-values. Any survival distinctions based on KM analyses addressed in this report were verified by processing the raw CSs using Microsoft Excel functions, followed by the application of two additional survival analysis software packages/approaches, as indicated later in this Methods section. In addition, the Adaptive Match web tool accepts RNAseq values as input and outputs the correlations of the RNAseq values with the maximal CS, for a given candidate antigen, for each of the case IDs in the set submitted to adaptivematch.com. (See Supplementary Tables S3, S4, S5, S6, S7, S8, S9 for example adaptivematch.com input and output files. Note: the adaptivematch.com input files, Supplementary Tables S3, S4, S5, S6, must be in csv format and maintain their current internal structure for use at adaptivematch.com. Instructions on file formats for input files can be found at adaptivematch.com.)
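The "score every alignment, keep the highest" idea described above can be illustrated with a minimal sketch. This is not the Adaptive Match algorithm, whose scoring rules are given in the cited references; the scoring rule here (counting aligned positions where both residues are hydrophobic on the Kyte-Doolittle scale) and the function names are assumptions made only for illustration.

```python
# Illustrative sketch only: NOT the Adaptive Match scoring rule.
# Assumption: a residue is "hydrophobic" if its Kyte-Doolittle hydropathy is > 0,
# and an alignment's score is the number of positions where both residues are hydrophobic.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def is_hydrophobic(residue):
    return KYTE_DOOLITTLE.get(residue, 0.0) > 0

def max_hydro_score(cdr3, antigen):
    """Slide the shorter sequence along the longer one (ungapped) and
    return the highest alignment score found."""
    short, long_ = sorted((cdr3, antigen), key=len)
    if not short:
        return 0
    best = 0
    for offset in range(len(long_) - len(short) + 1):
        window = long_[offset:offset + len(short)]
        score = sum(
            1 for a, b in zip(short, window)
            if is_hydrophobic(a) and is_hydrophobic(b)
        )
        best = max(best, score)
    return best

def case_max_score(cdr3s, antigen):
    """Maximal score over all CDR3s recovered for one case."""
    return max((max_hydro_score(c, antigen) for c in cdr3s), default=0)
```

Under these assumptions, each case is reduced to a single number per candidate antigen (the case-level maximum), which is the quantity later used for grouping cases.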
CMV epitopes. CMV protein epitopes were obtained from iedb.org (Immune Epitope Database, Chronister et al., 2022), filtering for at least one positive B cell assay, linear peptides, and a specified CMV protein (Results).
Scatterplots. Microsoft Excel was used to generate scatterplots based on the gene expression (RNAseq RSEM (RNA-Seq by Expectation Maximization) values) data from Supplementary Table S6, obtained from cbioportal.org. These plots were generated based on the adaptivematch.com output, linking the maximal CSs to the RNAseq values on a case-by-case basis (Supplementary Table S9).
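The case-by-case linking of maximal CSs to RNAseq values amounts to a correlation with a fitted straight line. A minimal sketch of that calculation follows, assuming two equal-length lists of paired values (one pair per case); the function name is hypothetical, and the report itself used adaptivematch.com output and Microsoft Excel rather than this code.

```python
import math

def pearson_and_fit(xs, ys):
    """Return (Pearson r, slope, intercept) for paired values,
    e.g. xs = maximal Hydro CS per case, ys = RNAseq value per case."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    r = sxy / math.sqrt(sxx * syy)
    slope = sxy / sxx                    # least-squares regression line
    intercept = mean_y - slope * mean_x
    return r, slope, intercept
```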
Survival analyses. Input data for survival analysis for TCGA-GBM were obtained from cbioportal.org (Firehose legacy) (Gao et al., 2013; Cerami et al., 2012). Input data for CPTAC-GBM were obtained from the GDC (Jensen et al., 2017). KM plots were generated using IBM SPSS (Statistical Product and Service Solutions, formerly Statistical Package for the Social Sciences), version 28. See Supplementary Tables S10, S11, S12, S13, S14, S15, S16, S17, S18, S19 for input and output data.
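For readers unfamiliar with what a KM plot encodes, the following is a minimal sketch of the Kaplan-Meier product-limit estimator. It is not the SPSS procedure used in this report; the function name and the input layout (a list of follow-up times and a parallel list of event flags, 1 for death and 0 for censored) are assumptions for illustration.

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct event time.

    times:  follow-up time per case
    events: 1 if the case died at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all cases sharing the same time point.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        # Deaths and censored cases both leave the risk set after time t.
        n_at_risk -= leaving
    return curve
```

Censored cases contribute to the number at risk up to their last follow-up time but never lower the curve themselves, which is why the estimate drops only at event times.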
Results
Survival probability distinctions represented by Hydro CSs for IGH CDR3 AA sequences and CMV proteins. To determine whether chemical complementarity between IGH CDR3 AA sequences and CMV proteins was associated with OS distinctions, we assessed CSs for the GBM IGH CDR3 AA sequences and the AA sequences of multiple CMV proteins previously identified as antibody antigens (Greijer et al., 1999; Wang et al., 2006). Case IDs representing the upper and lower 50th percentile groups for Hydro CSs were then assessed via a KM analysis. We thus identified several CMV proteins for which the upper and lower 50th percentile groups, based on the IGH CDR3-CMV protein Hydro CSs, represented distinct OS probabilities for both the CPTAC tumor RNAseq-based IGH CDR3s and the TCGA blood WXS-derived IGH CDR3s (Methods): UL99 and UL123 (Fig. 1); UL38 was similarly identified (data not shown). In all cases, the upper 50th percentile of the CSs was associated with a better OS probability.
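The grouping and comparison step described above can be sketched as a median split followed by a two-group logrank test. This is an illustrative sketch, not the software used in this report; the function names are hypothetical, and the handling of cases whose score equals the median (assigned here to the lower group) is an assumption, since the report does not state how ties were treated.

```python
import math
from statistics import median

def median_split(scores):
    """scores: dict of case ID -> maximal Hydro CS.
    Returns (upper_ids, lower_ids). Assumption: ties at the median go to the lower group."""
    cut = median(scores.values())
    upper = [case for case, s in scores.items() if s > cut]
    lower = [case for case, s in scores.items() if s <= cut]
    return upper, lower

def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group logrank test. Returns (chi-square statistic, p-value), 1 degree of freedom."""
    all_times = list(times_a) + list(times_b)
    all_events = list(events_a) + list(events_b)
    event_times = sorted({t for t, e in zip(all_times, all_events) if e})
    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        n_a = sum(1 for x in times_a if x >= t)   # at risk in group A
        n_b = sum(1 for x in times_b if x >= t)   # at risk in group B
        d_a = sum(1 for x, e in zip(times_a, events_a) if x == t and e)
        d_b = sum(1 for x, e in zip(times_b, events_b) if x == t and e)
        n, d = n_a + n_b, d_a + d_b
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = observed_minus_expected ** 2 / variance
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

In use, the case IDs returned by median_split would be mapped to their follow-up times and event flags, and the two resulting groups passed to logrank_test.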

Kaplan–Meier (KM) analyses comparing overall survival (OS) probabilities for case IDs representing GBM and the upper and lower 50th percentiles for Hydro CSs for IGH CDR3s and the indicated CMV proteins.
Survival probability distinctions represented by Hydro CSs of IGH CDR3 AA sequences and CMV epitopes, as fragments of full-length proteins. To determine whether chemical complementarity between IGH CDR3 AA sequences and previously identified epitopes representing the previously indicated CMV proteins was associated with OS probability distinctions, we assessed CSs for the IGH CDR3 AA sequences and the AA sequence epitopes of the CMV proteins UL99 and UL123 (iedb.org) (Chronister et al., 2022) (Supplementary Table S20). Case IDs representing the upper and lower 50th percentiles of Hydro CSs were assessed by KM analyses. We thus identified CMV epitopes that represented a significant survival probability distinction when the Hydro CSs were based on both the CPTAC-GBM, tumor-resident, RNAseq file-sourced CDR3s and the TCGA-GBM blood WXS-sourced CDR3s: IEDB-6203 (Table 1, UL99 epitope; CPTAC, Fig. 2A; TCGA, Fig. 2B) and IEDB-51541 (Table 1, UL123 epitope; CPTAC, Fig. 2C; TCGA, Fig. 2D). We also assessed previously identified epitopes representing UL38 but found no statistically significant survival probability distinctions. We therefore randomly fragmented UL38 and assessed those fragments for OS probability distinctions. We identified a UL38 fragment, referred to in this report as UL38-3 (Table 1), for which the IGH CDR3-UL38-3 Hydro CSs represented a statistically significant distinction with regard to OS probabilities, for the CSs calculated using the IGH CDR3s from both the CPTAC RNAseq (Fig. 2E) and TCGA exome (Fig. 2F) datasets.
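One way to randomly fragment a protein sequence is sketched below. The report does not specify how the UL38 fragments were generated, so the fixed fragment length, the number of fragments, the seeding, and the function name are all assumptions for illustration, and the demonstration sequence is a placeholder rather than the UL38 sequence.

```python
import random

def random_fragments(protein, n_fragments, length, seed=0):
    """Draw n_fragments contiguous fragments of a fixed length at random start positions.

    Returns a list of (start_index, fragment) tuples. A fixed seed keeps the draw reproducible.
    Fragments may overlap or repeat, since starts are drawn independently.
    """
    if length <= 0 or length > len(protein):
        raise ValueError("fragment length must be between 1 and the protein length")
    rng = random.Random(seed)
    last_start = len(protein) - length
    starts = [rng.randrange(last_start + 1) for _ in range(n_fragments)]
    return [(s, protein[s:s + length]) for s in starts]

# Placeholder sequence for demonstration only (not UL38).
DEMO_SEQUENCE = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"
demo = random_fragments(DEMO_SEQUENCE, n_fragments=3, length=10, seed=1)
```

Each fragment could then be treated as a candidate antigen and scored against the IGH CDR3s in the same way as a full-length protein or a known epitope.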

KM analyses comparing OS probabilities for case IDs representing GBM and the upper and lower 50th percentiles for Hydro CSs for the IGH CDR3 AA sequences and CMV protein epitopes.
AA Sequences of CMV Protein Epitopes and Fragments
AA, amino acid; CMV, cytomegalovirus.
Correlation between CPTAC RNA expression of immune signature genes and Hydro CSs for the IGH CDR3 AA sequences and CMV epitopes. To determine whether the expression of immune signature genes was associated with hydrophobic complementarity scoring, we analyzed the correlation between IGH CDR3-CMV protein epitope Hydro CSs and the tumor RNA expression of multiple immune signature and apoptosis-related genes (Table 2). We found that CPTAC-GBM case IDs with greater tumor RNA expression of multiple immune signature genes, notably CD79A (Fig. 3), TNFRSF17 (Fig. 4), CTLA4, and CASP8, were associated with higher Hydro CSs for the CDR3s and the indicated epitopes (Table 2).

Correlation between tumor RNA expression of CD79A and Hydro CSs. Each point represents a case from the CPTAC-GBM database. Scatterplot with linear regression line correlating RNA expression of the following:

Correlation between tumor RNA expression of TNFRSF17 and Hydro CSs. Each point represents a case from the CPTAC-GBM database. Scatterplot with linear regression line correlating RNA expression of the following:
Correlation Between CPTAC RNA Expression of Immune Signature Genes and Hydro CSs Calculated from the IGH CDR3 AA Sequences and CMV Epitope IEDB-6203, Epitope IEDB-51541, and Fragment UL38-3, Respectively
CDR3, complementarity-determining region 3; CPTAC, Clinical Proteomic Tumor Analysis Consortium; CSs, complementarity scores.
Discussion
We identified several CMV proteins, including UL99, UL123, and UL38, as well as epitopes or fragments of these proteins, where hydrophobic chemical complementarity with IGH CDR3 AA sequences was associated with a statistically significant increase in OS probability across two separate datasets: CPTAC-GBM (tumor-derived RNAseq-based) and TCGA-GBM (blood-derived WXS-based). Specifically, this set was limited to CMV proteins for which both the full-length protein and an epitope or sub-fragment gave results that were consistent between the IGH CDR3s obtained from the CPTAC dataset and those obtained from the TCGA dataset. Several other CMV proteins, including UL42 and US18, were also representative of a higher OS probability with a higher Hydro CS using the IGH CDR3s obtained from both the CPTAC and the TCGA datasets (data not shown). However, we were not able to identify sub-fragments of these proteins that also represented survival distinctions. Finally, numerous other CMV proteins were statistically significant with regard to higher CSs and better OS probabilities using the IGH CDR3s from either the CPTAC-GBM RNAseq dataset or the TCGA-GBM WXS dataset, but in these cases, the results were not consistent across the two datasets.
We also identified several immune signature genes, including CD79A, TNFRSF17, CTLA4, and CASP8, for which higher tumor expression was associated with higher IGH-based Hydro CSs using the gene expression values and the IGH-based CSs from the CPTAC-GBM RNAseq dataset.
These results suggest that stronger immune responses to CMV may correlate with better survival outcomes, which is consistent with a previous study that showed that lower-grade CMV infections, potentially indicating a better immune response to CMV, were associated with better long-term GBM survival (Rahbar et al., 2012). The mechanism behind these better survival outcomes is unknown, but our results suggest one component may be an improved IGH response after priming by a more successful response to an original CMV infection. Alternatively, improved survival could be related to reduced immunosuppression. Another study found that some individuals with CMV protein-positive tumors do not have detectable serum CMV IgG antibodies (Rahbar et al., 2015). They speculated that this discordance may be due to B cell tolerance to CMV, either because of early transmission of the virus to infants or immunosuppression during tumor development.
Another recent study found that CMV-specific IgG titers were higher in patients with GBM than in healthy donors; however, they also found that higher titers in patients with GBM were not associated with prolonged survival time (Rahbar et al., 2015). It is possible, then, that the potential strength or specificity of Ig-antigen binding is a better predictor of OS than Ig titer alone. These latter results would appear to be in contradiction to the results presented in this report; however, the great differences in the methods used to assess Ig-CMV interactions could be a reason for such apparent discrepancies.
To the best of our knowledge, no studies have sought to determine whether anti-CMV antibodies show promise as a GBM treatment. However, a 2018 study that enrolled 14 patients with glioblastoma, as well as patients with other brain or pancreatic cancers, showed measurable yet heterogeneous humoral immune reactivity against CMV pp65 epitopes, suggesting that antiviral immune responses are integral to host cancer defense (Meng et al., 2018).
Given the indication of the importance of IGH targeting of several specific CMV proteins based on the previous results, one future avenue of research could involve further investigation into other methods of inhibition or targeting of these proteins. Various methods have been used to detect CMV proteins in glioblastoma tissue samples, and multiple studies have detected UL123 (aka regulatory protein IE1) in 93–100% of CMV-positive glioblastoma (Rahbar et al., 2015). UL123 has been found to have several cancer-related effects on cells, including dysregulating Cyclin E expression, activating telomerase, inducing angiogenesis via interleukin-8, inhibiting apoptosis, and inducing chromosome aberrations (Dziurzynski et al., 2012). UL123 also plays a role in the reactivation of CMV from latency, and it has been suggested that GBM may produce an inflammatory environment that is optimal for this reactivation (Dziurzynski et al., 2012; Maleki et al., 2020). Our results are consistent with important UL123 functions in GBM tumor development, thereby supporting the idea that these UL123 functions could be specifically targeted in therapies.
This study is limited in that it relies on correlations and has not yet led to verifications of CDR3-antigen binding in vivo or in vitro. Future experiments could involve such in vivo or in vitro binding assessments and, if successful, anti-CMV Igs could be considered for treatment. Having noted the preceding concern, it is also possible that bioinformatics approaches, as used here, could indicate effective IGH CDR3-CMV antigen interactions not detectable with conventional in vitro or in vivo approaches. Another limitation is the retrospective nature of this study, which raises the question of whether a prospective study, in which potential confounding variables were taken into consideration in advance, would lead to similar results.
Footnotes
Acknowledgments
The authors thank University of South Florida research computing, Ms. Corinne Walters for extensive administrative support with dataset access procedures, and the taxpayers of the state of Florida.
Authors’ Contributions
T.R.H.: Conceptualization, formal analysis, methodology, visualization, and writing—reviewing and editing; J.J.S.: Formal analysis, methodology, and software; A.C.: Methodology and visualization; B.I.C.: Methodology, software, and visualization; G.B.: Project administration, resources, supervision, and writing—reviewing and editing.
Author Disclosure Statement
Authors have nothing to declare.
Funding Information
No funding was received for this article.
Supplementary Material
Supplementary Table S1
Supplementary Table S2
Supplementary Table S3
Supplementary Table S4
Supplementary Table S5
Supplementary Table S6
Supplementary Table S7
Supplementary Table S8
Supplementary Table S9
Supplementary Table S10
Supplementary Table S11
Supplementary Table S12
Supplementary Table S13
Supplementary Table S14
Supplementary Table S15
Supplementary Table S16
Supplementary Table S17
Supplementary Table S18
Supplementary Table S19
Supplementary Table S20
