Abstract
Background:
Breast cancer (BC) is the most common malignancy in women worldwide, yet validated tools to accurately assess patient prognosis and response to available chemotherapy regimens are still lacking.
Method:
We collected ultrasound images and transcriptome data of BC from our breast center and a public database. Key ultrasound features were identified using the support vector machine (SVM) algorithm and correlated with prognostic genes. Long-term survival-related genes were identified through differential expression analysis, and a prognostic evaluation model was established using Cox regression. In addition, VPS28, one of the model genes, was identified as a promising biomarker for BC.
Results:
Using univariate logistic regression and SVM algorithms, we identified 12 ultrasound features significantly associated with chemotherapy response. Correlation analysis linked 401 genes to these features, differential expression analysis narrowed these to 37 survival-related genes, and Lasso and multivariate Cox regression models then yielded five signature genes. This signature not only facilitates the stratification of patients into risk-specific treatment pathways but also predicts their chemotherapy response, thus supporting personalized medicine in clinical settings. Notably, VPS28, one of the signature genes, emerged as a significant biomarker associated with poor prognosis, greater tumor invasiveness, and differing expression across demographic groups.
Conclusion:
In this study, we used ultrasound genomics to reveal a signature that can provide an effective tool for prognostic assessment and prediction of chemotherapy response in patients with BC.
Introduction
As the most common malignant tumor in women worldwide, breast cancer (BC) has long been a major focus of research on treatment and diagnostic methods. 1 Traditional BC treatments mainly include surgery, radiotherapy, chemotherapy, and endocrine therapy. 2–4 Although these treatments can significantly improve patient survival, tools to effectively assess patient prognosis and response to existing chemotherapy regimens are still lacking.
Ultrasonomics is a recently emerging technology that analyzes quantitative data in ultrasound images and extracts a large number of ultrasound features through deep learning techniques. These features, which include morphological, texture, and waveform features, can be used to accurately describe the biological characteristics and behavior of tumors. In the evaluation of neoadjuvant chemotherapy, ultrasonomics can provide timely and accurate feedback on treatment effect by comparing pre- and postchemotherapy images and documenting changes in the tumor and its microenvironment in detail. 5 Researchers have achieved high accuracy in predicting HER2+ and HER2− BC subtypes using ultrasonomics. 6 This capability could identify more aggressive BC subtypes early in the diagnostic process and justify early clinical intervention. Ultrasound genomics, which integrates ultrasound imaging features with transcriptome sequencing data, can exploit both image and genetic information to reflect tumor characteristics more accurately. However, few studies have applied ultrasound genomics to predicting prognosis and chemotherapy response in BC patients.
In this study, we extracted and analyzed features from ultrasound images and correlated them with transcriptome data to identify a set of genes associated with BC ultrasound features. We then constructed a signature model based on this gene set, which showed value for prognostic assessment and efficacy prediction in BC and may support more accurate and effective clinical decisions for BC patients.
Materials and Methods
Data collection and preparation
We collected ultrasound image data of chemotherapy-sensitive and nonsensitive patients from the university clinic breast center and obtained RNA sequencing data from a public database. All image data were normalized before further processing. For the RNA sequencing data, we retained only mRNA data and removed genes expressed in fewer than 50% of samples. Clinical data from the breast center and the public database were matched using propensity score matching. The workflow is shown in Figure 1.
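The mRNA filtering rule can be made concrete with a small sketch. It assumes that a gene counts as "expressed" in a sample when its value is above zero (the source does not state the detection threshold) and keeps genes detected in at least half of the samples, so genes detected in fewer than 50% of samples are removed. The function name and toy gene names are hypothetical.

```python
from typing import Dict, List


def filter_low_prevalence_genes(
    expr: Dict[str, List[float]],
    min_fraction: float = 0.5,
    detection_threshold: float = 0.0,
) -> Dict[str, List[float]]:
    """Keep genes whose expression exceeds detection_threshold in at least
    min_fraction of samples; genes detected in fewer samples are dropped.

    expr maps a gene symbol to its expression values, one value per sample.
    Assumption: "expressed" means a value strictly above detection_threshold.
    """
    kept: Dict[str, List[float]] = {}
    for gene, values in expr.items():
        if not values:
            continue  # no samples: nothing to evaluate
        detected = sum(1 for v in values if v > detection_threshold)
        if detected / len(values) >= min_fraction:
            kept[gene] = values
    return kept


if __name__ == "__main__":
    # Hypothetical toy matrix: 3 genes x 4 samples.
    toy = {
        "GENE_A": [5.1, 3.2, 0.0, 4.4],  # detected in 3/4 samples: kept
        "GENE_B": [0.0, 0.0, 1.2, 0.0],  # detected in 1/4 samples: removed
        "GENE_C": [0.0, 2.0, 0.0, 1.0],  # detected in 2/4 samples: kept (exactly 50%)
    }
    print(sorted(filter_low_prevalence_genes(toy)))  # ['GENE_A', 'GENE_C']
```

Keeping the cut-off at "at least 50%" mirrors the Methods wording, which removes only genes expressed in fewer than half of the samples.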

Figure 1. Workflow of this study.
Ultrasound feature selection and building of the model
We used 3D Slicer to identify and delineate sensitive and nonsensitive tumor areas and then extracted ultrasound features from these images. We applied univariate logistic regression and support vector machine (SVM) algorithms to identify important features and used multivariate logistic regression to construct an ultrasound feature-related signature.
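The source does not describe how the SVM ranked features. One common approach, used at each step of SVM recursive feature elimination (SVM-RFE), is to train a linear SVM on standardized features and rank features by the magnitude of their weights. The following is a minimal pure-Python sketch of that ranking step only (no recursion and no univariate logistic prefilter), assuming labels of +1 for chemotherapy-sensitive and -1 for nonsensitive patients. All function names and hyperparameters (lam, lr, epochs) are illustrative assumptions, not the authors' settings.

```python
import math
from typing import List, Sequence, Tuple


def standardize(X: Sequence[Sequence[float]]) -> List[List[float]]:
    """Z-score each column so that SVM weight magnitudes are comparable."""
    n, p = len(X), len(X[0])
    cols = []
    for j in range(p):
        col = [row[j] for row in X]
        mu = sum(col) / n
        sd = math.sqrt(sum((v - mu) ** 2 for v in col) / n) or 1.0  # guard constant columns
        cols.append([(v - mu) / sd for v in col])
    return [[cols[j][i] for j in range(p)] for i in range(n)]


def train_linear_svm(X: Sequence[Sequence[float]], y: Sequence[int],
                     lam: float = 0.01, lr: float = 0.1,
                     epochs: int = 500) -> Tuple[List[float], float]:
    """Full-batch subgradient descent on the L2-regularized hinge loss:
    lam/2 * ||w||^2 + mean(max(0, 1 - y_i * (w . x_i + b))), with y_i in {+1, -1}.
    """
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the L2 penalty
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # hinge loss is active for this sample
                for j in range(p):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def rank_features_by_svm_weight(X: Sequence[Sequence[float]],
                                y: Sequence[int]) -> List[int]:
    """Feature indices sorted from most to least important by |w_j|."""
    w, _ = train_linear_svm(standardize(X), y)
    return sorted(range(len(w)), key=lambda j: abs(w[j]), reverse=True)


if __name__ == "__main__":
    # Synthetic data: feature 0 separates the classes; features 1 and 2 have
    # identical distributions in both classes (pure noise).
    y = [1 if i < 10 else -1 for i in range(20)]
    X = [[y[i] * (1 + (i % 5) * 0.2), ((i * 7) % 10) / 10, ((i * 3) % 10) / 10]
         for i in range(20)]
    print(rank_features_by_svm_weight(X, y)[0])  # 0
```

Standardizing first matters: without it, a feature measured on a larger numeric scale would receive a smaller weight for the same discriminative power, distorting the ranking.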
Feature-related genes and correlation with the ultrasound feature-related signature
We used the Kaplan–Meier (K–M) method to screen for survival-related genes and considered results with p < 0.05 significant. Patients were divided into low and high groups based on the median value of the ultrasound feature-related signature. We then performed a correlation analysis between the prognostic genes and the high group, and genes with p < 0.05 and r > 0.1 were retained for the next step.
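Taken literally, the correlation step correlates each gene's expression with a 0/1 indicator of membership in the high-signature group (a point-biserial correlation) and keeps genes with r > 0.1 and p < 0.05, so negatively correlated genes are excluded. A hedged sketch under those assumptions follows; the p-value uses the Fisher z approximation rather than an exact t test, and the function and gene names are hypothetical.

```python
import math
from statistics import NormalDist, median
from typing import Dict, Sequence, Tuple


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def approx_corr_pvalue(r: float, n: int) -> float:
    """Two-sided p-value from the Fisher z-transformation (normal approximation)."""
    if n <= 3:
        return 1.0
    if abs(r) >= 1.0:
        return 0.0
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def select_signature_correlated_genes(
    expr: Dict[str, Sequence[float]],
    signature_scores: Sequence[float],
    r_min: float = 0.1,
    p_max: float = 0.05,
) -> Dict[str, Tuple[float, float]]:
    """Keep genes positively correlated with high-signature-group membership."""
    cut = median(signature_scores)
    high = [1.0 if s > cut else 0.0 for s in signature_scores]  # 1 = high group
    selected = {}
    for gene, values in expr.items():
        r = pearson_r(values, high)  # point-biserial correlation
        p = approx_corr_pvalue(r, len(values))
        if r > r_min and p < p_max:
            selected[gene] = (r, p)
    return selected


if __name__ == "__main__":
    scores = [float(i) for i in range(40)]  # hypothetical signature scores
    expr = {
        "GENE_A": [i + (i % 3) for i in range(40)],   # rises with the score
        "GENE_B": [float(i % 2) for i in range(40)],  # unrelated to the score
        "GENE_C": [-float(i) for i in range(40)],     # negatively correlated
    }
    print(sorted(select_signature_correlated_genes(expr, scores)))  # ['GENE_A']
```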
Differentially expressed gene analysis between short- and long-term overall survival
After the above analyses, we identified candidate ultrasound-related genes and used them, together with the tumor sample barcodes, to create a new expression matrix. We then divided patients into two groups according to whether their survival time was longer than 5 years and performed differentially expressed gene (DEG) analysis between the two groups. Genes with an absolute log fold change (|logFC|) > 1 and p < 0.05 were defined as DEGs.
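Splitting at 5 years raises a practical question the source does not address: patients censored before 5 years have unknown 5-year status. The sketch below assumes such patients are excluded and that survival time is recorded in days, and it treats the DEG p-value as an input from whatever test was used (the source does not name it). The fold-change direction (short- vs. long-term group), the pseudocount, and all names are likewise assumptions.

```python
import math
from statistics import mean
from typing import Optional, Sequence

FIVE_YEARS_DAYS = 5 * 365.25  # assumption: survival time is recorded in days


def survival_group(time_days: float, event: int) -> Optional[str]:
    """Return 'long' (survival > 5 years), 'short' (death within 5 years), or None.

    Assumption: patients censored (event == 0) before 5 years have unknown
    5-year status and are excluded by returning None.
    """
    if time_days > FIVE_YEARS_DAYS:
        return "long"
    if event == 1:
        return "short"
    return None


def log2_fold_change(long_vals: Sequence[float], short_vals: Sequence[float],
                     pseudocount: float = 1.0) -> float:
    """log2 ratio of mean expression, short-term vs. long-term group (assumed direction)."""
    return math.log2((mean(short_vals) + pseudocount) / (mean(long_vals) + pseudocount))


def is_deg(logfc: float, p_value: float,
           logfc_min: float = 1.0, p_max: float = 0.05) -> bool:
    """DEG rule from the Methods: |logFC| > 1 and p < 0.05 (both strict)."""
    return abs(logfc) > logfc_min and p_value < p_max


if __name__ == "__main__":
    # Hypothetical patients: (survival time in days, event: 1 = death, 0 = censored)
    cohort = [(2500, 0), (400, 1), (900, 0), (1900, 1)]
    print([survival_group(t, e) for t, e in cohort])  # ['long', 'short', None, 'long']
    fc = log2_fold_change(long_vals=[1.0, 1.0], short_vals=[3.0, 3.0])
    print(fc, is_deg(fc, p_value=0.01))  # 1.0 False (|logFC| must exceed 1)
```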
Construction of the prognostic gene model
We used Lasso regression to select important genes and then entered these genes into a multivariate Cox regression model to build the prognostic gene-related signature. Each patient's risk score was calculated as the sum, across signature genes, of the product of the gene's expression value and its regression coefficient (risk score = Σ coefficient_i × expression_i). The log-rank test was used to evaluate survival differences between risk groups.
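The scoring rule and the group comparison can be sketched as follows. The risk score is the coefficient-weighted sum of expression; patients above the median score are labeled high-risk (a median cut-off is assumed here, matching the median split used earlier in the Methods); and a standard two-group log-rank test compares survival. The log-rank chi-square statistic has 1 degree of freedom, so its p-value equals erfc(sqrt(chi2 / 2)). Gene names and coefficients in the usage example are placeholders, not the fitted values of the signature.

```python
import math
from statistics import median
from typing import Dict, List, Sequence, Tuple


def risk_scores(patients: Sequence[Dict[str, float]],
                coefs: Dict[str, float]) -> List[float]:
    """Risk score per patient = sum over signature genes of coefficient x expression."""
    return [sum(beta * patient[gene] for gene, beta in coefs.items())
            for patient in patients]


def split_by_median(scores: Sequence[float]) -> List[int]:
    """Label 1 (high risk) if the score is above the median, else 0 (low risk)."""
    cut = median(scores)
    return [1 if s > cut else 0 for s in scores]


def log_rank_test(times: Sequence[float], events: Sequence[int],
                  groups: Sequence[int]) -> Tuple[float, float]:
    """Two-group log-rank test. events: 1 = death, 0 = censored; groups: 0/1.

    Returns (chi-square statistic, p-value) with 1 degree of freedom.
    """
    observed_minus_expected = 0.0
    variance = 0.0
    for t in sorted({ti for ti, ei in zip(times, events) if ei == 1}):
        at_risk = [g for ti, g in zip(times, groups) if ti >= t]
        n = len(at_risk)
        n1 = sum(at_risk)  # group-1 patients still at risk at time t
        d = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        d1 = sum(1 for ti, ei, g in zip(times, events, groups)
                 if ti == t and ei == 1 and g == 1)
        observed_minus_expected += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = observed_minus_expected ** 2 / variance
    return chi2, math.erfc(math.sqrt(chi2 / 2))


if __name__ == "__main__":
    # Placeholder coefficients and gene names (not the fitted signature).
    coefs = {"GENE_1": 0.5, "GENE_2": -0.25}
    patients = [{"GENE_1": 2.0, "GENE_2": 4.0}, {"GENE_1": 4.0, "GENE_2": 0.0}]
    print(risk_scores(patients, coefs))  # [0.0, 2.0]
    # Toy cohort: high-risk patients die early, low-risk patients are censored late.
    times = list(range(1, 9)) + list(range(10, 18))
    events = [1] * 8 + [0] * 8
    groups = [1] * 8 + [0] * 8
    chi2, p = log_rank_test(times, events, groups)
    print(p < 0.001)  # True
```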
Prediction of chemotherapy response
We downloaded all Food and Drug Administration (FDA)-approved chemotherapy drugs from a public database and then used an R package to perform drug sensitivity analysis. Drugs with p < 0.05 were considered sensitive, and the top sensitive drugs were visualized using violin plots.
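The drug sensitivity step is not specified beyond the p < 0.05 threshold. Assuming the R package returns a predicted IC50 per patient and drug (lower meaning more sensitive), a typical comparison between risk groups is a two-sided Wilcoxon rank-sum (Mann–Whitney U) test. The pure-Python sketch below uses the normal approximation without a tie correction to the variance, so it suits only moderately sized groups; the function names and IC50 values are hypothetical.

```python
import math
from statistics import NormalDist
from typing import List, Optional, Sequence, Tuple


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Two-sided Mann-Whitney U test (normal approximation, no tie correction).

    Returns (U statistic for sample x, p-value).
    """
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks: List[float] = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        # Tied values share the average of their 1-based ranks.
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1
        i = j + 1
    rank_sum_x = sum(r for r, (_, label) in zip(ranks, pooled) if label == 0)
    n1, n2 = len(x), len(y)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u_x - mu) / sigma
    return u_x, 2 * (1 - NormalDist().cdf(abs(z)))


def compare_drug_sensitivity(low_risk_ic50: Sequence[float],
                             high_risk_ic50: Sequence[float],
                             alpha: float = 0.05) -> Tuple[Optional[str], float]:
    """Hypothetical helper: which group has lower predicted IC50 (more sensitive)?"""
    u, p = mann_whitney_u(low_risk_ic50, high_risk_ic50)
    if p >= alpha:
        return None, p
    # U below its null mean means low-risk values tend to be smaller.
    expected_u = len(low_risk_ic50) * len(high_risk_ic50) / 2
    return ("low-risk" if u < expected_u else "high-risk"), p


if __name__ == "__main__":
    low = [float(v) for v in range(1, 11)]    # hypothetical predicted IC50 values
    high = [float(v) for v in range(11, 21)]
    print(compare_drug_sensitivity(low, high)[0])  # low-risk
```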
Identification of a novel model gene and validation of its clinical value
We reviewed the literature and identified a novel biomarker for BC among the model genes. We validated its expression in normal and tumor tissues using both unpaired and paired samples. In addition, we compared survival between the low- and high-expression groups of this gene. Furthermore, we analyzed the relationship between its expression and age, race, menopausal status, and American Joint Committee on Cancer (AJCC) stage.
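The paired tumor-versus-normal comparison can be illustrated with an exact sign-flip permutation test on the mean paired difference, which makes no distributional assumptions. This is a swapped-in illustration: the source does not say which test was used, and the function name and values are hypothetical. Because it enumerates all 2^n sign patterns, it is practical only for small numbers of pairs (roughly n ≤ 20); for the unpaired comparison a rank-sum test would be a typical choice.

```python
from itertools import product
from typing import Sequence, Tuple


def paired_sign_flip_test(tumor: Sequence[float],
                          normal: Sequence[float]) -> Tuple[float, float]:
    """Exact two-sided sign-flip permutation test for paired samples.

    Under the null hypothesis each paired difference is equally likely to be
    positive or negative, so all 2^n sign patterns are equally likely.
    Returns (mean difference tumor - normal, p-value).
    """
    diffs = [t - n for t, n in zip(tumor, normal)]
    if not diffs:
        raise ValueError("need at least one pair")
    observed = abs(sum(diffs))
    extreme = 0
    for signs in product((1, -1), repeat=len(diffs)):
        if abs(sum(s * d for s, d in zip(signs, diffs))) >= observed - 1e-12:
            extreme += 1  # at least as extreme as the observed statistic
    return sum(diffs) / len(diffs), extreme / 2 ** len(diffs)


if __name__ == "__main__":
    tumor = [5, 6, 7, 8, 9, 10, 11, 12]  # hypothetical paired expression values
    normal = [4] * 8
    print(*paired_sign_flip_test(tumor, normal))  # 4.5 0.0078125
```

With all eight differences positive, only the all-plus and all-minus sign patterns reach the observed total, giving p = 2/256.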
Results
Ultrasound feature selection and model construction
After delineating the target area, we input the mask file and the original ultrasound image into the software to extract image features. We then used univariate logistic regression and SVM algorithms to identify ultrasound features related to chemotherapy response. The results showed that 12 features were associated with chemotherapy response (Fig. 2A). Next, we fitted a multivariate logistic regression model to build the ultrasound feature-related signature.

Figure 2. SVM identifies 12 important features of ultrasound images.
Identification of ultrasound genomics-related genes
We divided patients into low- and high-risk groups according to the ultrasound feature-related signature and performed a correlation analysis with 1965 prognostic genes obtained from batch survival analysis. A total of 401 genes were associated with ultrasound features; the top 20 are shown in Figure 2B. We then analyzed DEGs between the short- and long-term survival groups within this gene set and identified 37 DEGs associated with survival duration (Fig. 2C).
Ultrasound genomics-related signatures could guide patient stratification
We used the R package “caret” to randomly split patients into training and test datasets. We then performed Lasso regression and multivariate Cox regression analysis to identify five candidate signature genes and their coefficients (Fig. 2D–F). The dot plot and heatmap of the model in the training dataset are shown in Figure 3A. Survival analysis indicated that high-risk patients had a poor outcome, whereas low-risk patients had a favorable prognosis (Fig. 3B). The results in the test dataset were consistent with those in the training dataset (Fig. 3C and D).
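A random training/test split is usually made reproducible with a fixed seed and is often stratified by event status so that both sets have similar event rates. The sketch below is a hypothetical Python equivalent of such a split; the 70/30 ratio, the seed, and the stratification by event are assumptions, since the source reports none of them.

```python
import random
from typing import Hashable, List, Sequence, Tuple


def stratified_split(patient_ids: Sequence[Hashable], events: Sequence[int],
                     train_fraction: float = 0.7,
                     seed: int = 2024) -> Tuple[List[Hashable], List[Hashable]]:
    """Split patients into training and test sets within each event stratum.

    events: 1 = death observed, 0 = censored. Shuffling uses a seeded
    generator, so repeated calls with the same inputs give the same split.
    """
    rng = random.Random(seed)
    train: List[Hashable] = []
    test: List[Hashable] = []
    for status in sorted(set(events)):
        stratum = [pid for pid, e in zip(patient_ids, events) if e == status]
        rng.shuffle(stratum)
        k = round(len(stratum) * train_fraction)
        train.extend(stratum[:k])
        test.extend(stratum[k:])
    return train, test


if __name__ == "__main__":
    ids = list(range(100))                        # hypothetical patient IDs
    events = [1 if i < 30 else 0 for i in ids]    # 30% event rate
    train, test = stratified_split(ids, events)
    print(len(train), len(test))  # 70 30
```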

Figure 3. Dot plot and heatmap of the five signature genes in the training dataset.
The ultrasound genomics signature could predict chemotherapy response
Chemotherapy is a crucial component of clinical decision-making, and predicting the response to therapy supports personalized treatment. Our analysis revealed that 38 FDA-approved drugs showed significant differences in predicted sensitivity between the low- and high-risk groups; the top five are shown in violin plots (Fig. 4A–E). Low-risk patients were predicted to be more sensitive to drugs such as the conventional chemotherapeutic cisplatin and the EGFR inhibitor gefitinib, and these results are also visualized as a matrix diagram (Fig. 4F).

Figure 4. Top five chemotherapy drugs for the low-risk signature group, including temsirolimus.
VPS28 shows high expression in BC and is associated with prognosis
In our study, VPS28 was identified as a novel biomarker for BC. The body heatmap shows high expression of VPS28 in BC (Fig. 5A), a finding supported by both unpaired and paired analyses (Fig. 5B and C). Survival analysis showed that high expression of this gene was associated with poorer overall survival (OS) (Fig. 5D). However, no significant differences in disease-specific survival (DSS) or progression-free interval (PFI) were observed between the high- and low-expression groups (Fig. 5E and F).

Figure 5. VPS28 is highly expressed in breast cancer, as shown by the body heatmap and by unpaired and paired samples.
VPS28 is associated with higher tumor invasiveness
To evaluate the clinical value of VPS28, we analyzed its correlation with common clinical factors. Older patients exhibited higher VPS28 expression (Fig. 6A). Asian patients showed lower VPS28 expression than White patients and patients of other races (Fig. 6B). Postmenopausal patients also exhibited higher VPS28 expression (Fig. 6C). Moreover, VPS28 was associated with greater tumor invasiveness: patients with high expression had larger tumors (Fig. 6E), more lymph node involvement (Fig. 6F), and more advanced tumor stage (Fig. 6G).

Figure 6. Older patients have higher VPS28 expression.
Discussion
BC continues to be a leading cause of cancer-related death among women, largely due to the disease’s significant heterogeneity, propensity for metastasis, and resistance to therapy. 7 The treatment of BC involves a variety of approaches, such as surgery, chemotherapy, radiotherapy, endocrine therapy, targeted therapy, and immunotherapy, all of which require the coordinated efforts of multiple subspecialties. 8 Despite the significant improvements in overall patient outcomes due to existing treatments, further efforts are still required to tailor appropriate therapeutic strategies for this heterogeneous disease. 9 Over the past decade, the large-scale integration of genomic and transcriptomic data has revealed distinct cancer subtypes, providing a foundation for personalized precision therapy. 10,11 In our study, we introduce, for the first time, an ultrasound genomics-based model that identifies a signature able to accurately predict BC patient outcomes and their sensitivity to commonly used chemotherapy drugs.
Ultrasound genomics, which is favored for its noninvasive nature, combines the analysis of ultrasound imaging features with genomic data and integrates radiomics with advanced machine learning algorithms to enhance the detection, characterization, and prognosis of cancers. 12 It has been applied in several cancer studies. 13,14 However, research on ultrasound genomics in BC is still in its early stages. Some researchers have used ultrasound to guide BC management and found that a cut-off of ≤5 ultrasound-detected abnormal axillary nodes can accurately identify BC patients with a limited nodal burden, allowing for the potential omission of chemotherapy in favor of upfront surgery followed by gene assay testing. This approach may help guide treatment decisions, particularly between immediate surgery and neoadjuvant chemotherapy. 15 Another study demonstrated an association between vascular ultrasound features and DNA sequencing data in BC, revealing that specific ultrasound characteristics are significantly correlated with certain SNPs related to angiogenesis and prognosis. These findings suggest that vascular ultrasound could reflect underlying genomic changes and aid in the prediction of disease outcomes. 16 While previous studies have provided important insights into the prognosis and treatment guidance of BC, to our knowledge no study has yet used combined ultrasound genomics to assess treatment response in BC. In our results, the ultrasound genomics model effectively predicted the prognosis and treatment response of patients with BC in the TCGA cohort, suggesting that ultrasound genomics, and by extension radiogenomics, holds value not only in diagnosis but also in predicting treatment response, with broad applicability in personalized cancer care.
In our ultrasound genomics model, a five-gene signature comprising KLF11, FREM1, VPS28, PHKB, and CCDC9B was constructed, with VPS28 identified as the hub gene. Kruppel-like factor 11 (KLF11), a mediator of TGFβ signaling, plays a role in regulating growth inhibition in untransformed epithelial cells. In pancreatic cancer, KRAS-driven ERK pathway activation phosphorylates KLF11, disrupting its interaction with SIN3a and leading to SMAD7 transcriptional activation, which promotes tumor progression. 17 However, the specific role of KLF11 in BC remains unexplored. FRAS1-related extracellular matrix 1 (FREM1) encodes a protein involved in extracellular matrix formation and belongs to the FRAS1-related extracellular matrix protein family. 18,19 Although its role in tumors remains unclear, it may contribute to tumor progression through the extracellular matrix. Phosphorylase kinase regulatory subunit beta (PHKB) has been reported to play a role in chronic myelogenous leukemia, where it is positively regulated by the enhancer RNA Hmrhl, contributing to leukemia progression. 20 However, its involvement in BC also remains unexplored. In a recent study, coiled-coil domain containing 9B (CCDC9B) was identified as part of a six-gene risk signature that effectively predicts BC prognosis, especially in triple-negative BC (TNBC), and could help guide personalized treatment strategies. 21 This finding is consistent with our results. Although vacuolar protein sorting 28 (VPS28), the core gene of our signature, is well known for its roles in Candida albicans virulence and HIV-1 budding, 22 recent research has begun to examine its function in BC. For instance, Shi et al. 23 found that VPS28 expression is associated with poor prognosis and shorter survival in BC patients and that miR-491-5p can suppress BC progression by reducing VPS28 expression, highlighting its potential as a therapeutic target.
Another study revealed coexpression of EXOSC4 with VPS28 and Myc, with elevated serum levels that were a risk factor for OS among BC patients. 24 The differential expression of VPS28 across various subgroups in our study also suggests that VPS28 is a risk factor in BC patients. Interestingly, high VPS28 expression was associated with poorer OS but not with DSS or PFI. This observation might be explained by our finding that higher VPS28 expression is concentrated primarily in late-stage disease.
Despite advances in BC therapy, identifying the patients who could benefit from new drugs remains a critical challenge. In addition to its prognostic role, our ultrasound genomics signature can predict sensitivity to commonly used drugs such as cisplatin and gefitinib, which have been investigated in BC treatment. Notably, lenalidomide is an immunomodulatory drug with antiangiogenic properties that is used primarily in hematological malignancies; its application in BC remains at an early exploratory stage. 25 A previous study showed that lenalidomide enhances the effectiveness of cisplatin in a TNBC cell line by significantly reducing cell viability, inducing apoptosis, and modulating key molecular pathways, suggesting potential therapeutic synergy in TNBC. 26 Vorinostat, a histone deacetylase inhibitor, has shown promise in preclinical and clinical studies of BC treatment, either alone or in combination with other anticancer agents. 27,28 Together, these results suggest that our ultrasound genomics signature is a promising tool for predicting drug sensitivity and could help improve therapeutic response in BC.
Conclusion
The signature revealed by ultrasound genomics in our study can provide an effective tool for prognostic assessment and chemotherapy response prediction in patients with BC. Applying this model may improve the personalization of treatment and support more accurate and effective clinical decision-making for patients with BC.
Footnotes
Authors’ Contributions
Conceptualization: Q.L., W.W., and R.L.D.W. Methodology: Q.L., X.C., and G.W. Formal analysis: B.C., Z.G., and L.A.T.R. Data preparation: G.W. and R.Z. Writing—original draft: Q.L. and B.C. Supervision: R.L.D.W., X.C., and W.W. Writing—revision: R.L.D.W., L.A.T.R., B.C., and Z.G. Project administration and funding acquisition: W.W. and X.C. All authors have reviewed and approved the final version of the article.
Data Availability
All data can be obtained from the corresponding author.
Ethics Approval Statement
Ethics approval was obtained from the Second Affiliated Hospital of Guilin Medical University (HXXM-2022001).
Patient Consent Statement
Patient consent permission form.
Permission to Reproduce Material from Other Sources
All analysis code can be obtained from the corresponding author.
Disclosure Statement
All authors declare no conflicts of interest.
Funding Information
This study was supported by the Wu Jieping Medical Foundation (320.6750.2022-19-40).
