Abstract
OBJECTIVES:
The purpose of our study is to present a method combining radiomics with deep learning and clinical data for improved differential diagnosis of sclerosing adenosis (SA) and breast cancer (BC).
METHODS:
A total of 97 patients with SA and 100 patients with BC were included in this study. The best model for classification was selected from among four different convolutional neural network (CNN) models: Vgg16, Resnet18, Resnet50, and Densenet121. The intra-/inter-class correlation coefficient and the least absolute shrinkage and selection operator method were used for radiomics feature selection. The clinical features selected were patient age and nodule size. The overall accuracy, sensitivity, specificity, Youden index, positive predictive value, negative predictive value, and area under the curve (AUC) value were calculated for comparison of diagnostic efficacy.
RESULTS:
All the CNN models combined with radiomics and clinical data were significantly superior to the CNN models alone. The Densenet121+radiomics+clinical data model showed the best classification performance, with an accuracy of 86.80%, sensitivity of 87.60%, specificity of 86.20%, and AUC of 0.915, compared with an accuracy of 85.23%, sensitivity of 85.48%, specificity of 85.02%, and AUC of 0.870 for the CNN model alone. In comparison, the diagnostic accuracy, sensitivity, specificity, and AUC value for breast radiologists were 72.08%, 100%, 43.30%, and 0.716, respectively.
CONCLUSIONS:
A combination of the CNN-radiomics model and clinical data could be a helpful auxiliary diagnostic tool for distinguishing between SA and BC.
Introduction
Sclerosing adenosis (SA) of the breast is a rare benign disease that is mainly derived from the terminal ductal-lobular unit and characterized by proliferation of the myoepithelium and epithelium [1]. It is pathologically characterized by small and similarly shaped epithelial cells that are radially arranged, and the proliferative tissue squeezes the lobule to form a pseudo-infiltration that is likely to be misdiagnosed as invasive breast carcinoma [2, 3]. SA can also mimic breast cancer (BC) on ultrasonography (US), mammography, and MRI scans because of the nonspecific and varied imaging findings detected during the course of the disease [4–6]. For example, it is difficult to distinguish the spicules on US scans of SA that are caused by fibrous tissue or sclerotic stroma from the spicules in malignant lesions that are caused by tumour infiltration or periductal fibrosis. A previous study showed that the misdiagnosis rate of SA based on US, mammography, and MRI findings was 17.5%, 17.9%, and 35.3%, respectively [7]. Therefore, precise diagnosis of SA and BC based on imaging findings remains a challenge.
Precise preoperative diagnosis of SA and BC is crucial for decision making, for avoiding unnecessary biopsy, and, subsequently, for reducing the economic and psychological burden on patients. However, as mentioned above, correct diagnosis based on breast US imaging features by real-world radiologists has several limitations. There are significant differences in many morphological features between the two diseases on US images, and the typical characteristics of BC include a non-circumscribed margin, irregular shape, and microcalcification, among others. However, due to the wide range of overlapping features, such as calcification, echo attenuation, and irregular shape, it is difficult to capture subtle differences with the naked eye for accurate diagnosis. With the development of artificial intelligence (AI), deep learning has been widely used in medical image processing, including detection, segmentation, and classification [8–10]. Convolutional neural networks (CNNs) are the most popular deep learning architecture, and with their aid the diagnostic accuracy, efficiency, and interobserver agreement of real-world radiologists have improved [11, 12]. The addition of AI has been a key step in the evolution of US technology in precision medicine. In the analysis of breast US images, AI has shown promising results for mass classification, lymph node prediction, and molecular subtype prediction [13, 14]. However, the application of AI for the differential diagnosis of SA and BC has not been explored in detail.
Radiomics can provide high-throughput quantitative parameters from images, for example, tumour shape, intensity, wavelet textures, and other parameters. However, these shallow and low-order features can only be identified and encoded by experts [15]. In contrast to traditional radiomics, deep learning models can process the task automatically. As a promising alternative, deep learning radiomics (DLR) merges the benefits of both these methods to provide high-level self-learned features and prospective diagnostic ability for the classification and prognostic evaluation of breast masses on US images [16, 17]. Several studies have shown that combining the quantitative features extracted by radiomics with the discriminative features extracted by deep learning can be used for classification based on a small dataset of images [18, 19]. As there is very little information available about SA, such a hybrid model that utilizes small-scale data could be promising for the diagnosis of SA. In the clinical context, patient age and lesion size are important reference variables for the classification of benign and malignant breast nodules [20, 21]. For example, studies have shown that individuals with SA tend to be younger than those with BC [22]. Therefore, in the present study, we hypothesized that DLR based on breast US images combined with objective clinical variables might provide more information and better diagnostic performance for correctly classifying SA and BC. Accordingly, the purpose of this study was to explore the value of DLR combined with patient age and tumour size in distinguishing SA from BC, and to compare its diagnostic performance with that of experienced radiologists.
Material and methods
Study population
Institutional Review Board approval for this retrospective study was obtained from our hospital (Approval No. 2019KY055). The need for written informed consent was waived by the Institutional Review Board. This study included 97 cases of SA and 100 cases of BC that were treated between January 2012 and February 2021. The inclusion criteria were (1) complete conventional US examination before surgical excision, and (2) US images of good quality, without colour Doppler signal or other interference. The exclusion criteria were (1) histological features confirmed by biopsy rather than surgery, (2) SA accompanied by malignancy based on pathological examination, and (3) metastatic breast tumour.
US image acquisition and evaluation
The breast US images in all cases were acquired using US equipment manufactured by Philips (IU22; Amsterdam, the Netherlands), Hitachi (EUB 8500; Tokyo, Japan), Esaote (MyLab™Twice; Genoa, Italy), and Siemens (ACUSON S3000; Munich, Germany) with linear transducers of frequency 5–12 MHz. US images that met our inclusion criteria were selected and used for further study. The US images were evaluated by two experienced breast radiologists who were blinded to the clinical history, but not to the age, of each patient. The two radiologists had 6 and 20 years of experience with breast US interpretation. In case of disagreements, the radiologists discussed the issue and came to a consensus. In accordance with the American College of Radiology’s Breast Imaging Reporting and Data System (BI-RADS) lexicon, the breast lesions were classified into categories 3–5: lesions in category 3 were regarded as benign, and those in categories 4a, 4b, 4c, and 5 were regarded as malignant.
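The dichotomization of the radiologists' BI-RADS reading can be written as a small lookup. This is a minimal sketch: the function name `birads_to_label` is illustrative, and the category strings follow the text above.

```python
# BI-RADS categories treated as malignant in the radiologists' reading.
MALIGNANT_CATEGORIES = {"4a", "4b", "4c", "5"}
BENIGN_CATEGORIES = {"3"}


def birads_to_label(category):
    """Map a BI-RADS category (3, 4a, 4b, 4c, 5) to a binary reading."""
    key = str(category).strip().lower()
    if key in BENIGN_CATEGORIES:
        return "benign"
    if key in MALIGNANT_CATEGORIES:
        return "malignant"
    # Categories outside 3-5 were not used in this reading.
    raise ValueError(f"unsupported BI-RADS category: {category!r}")


readings = [birads_to_label(c) for c in ["3", "4a", "4C", "5"]]
```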
Radiomics feature extraction and feature selection
The workflow of our study is shown in Fig. 1. LabelImg was used to delineate the region of interest (ROI) on US images; this was performed manually by one radiologist with 6 years of experience in breast US interpretation. For each US image, the deep learning features were extracted by CNN models, and the quantitative features by radiomics. Specifically, the quantitative features included statistical, textural, and 7 Hu invariant features. A total of 73 radiomics features were extracted from the ROI and used for further analysis. The radiomics features are presented in detail in Table 1. To further improve the performance of the model, we applied feature selection in two steps. First, intra-/inter-class correlation coefficients (ICC) were determined to evaluate feature reproducibility. Next, the least absolute shrinkage and selection operator (LASSO) method was used to reduce feature redundancy and to determine the candidate set of predictive features.
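The reproducibility filter can be sketched as follows. This is a minimal illustration, assuming a two-way random-effects, absolute-agreement, single-measurement ICC (often written ICC(2,1)) and a retention threshold of 0.75; neither the ICC form nor the threshold is stated in the text, and the feature names and values are invented. The LASSO step is not shown.

```python
def icc2_1(ratings):
    """ICC(2,1) for an n x k table: n lesions (rows), k repeated delineations (columns)."""
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    # Two-way ANOVA sums of squares: between lesions, between delineations, residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


def reproducible_features(measurements, threshold=0.75):
    """Keep feature names whose ICC reaches the (assumed) threshold."""
    return [name for name, table in measurements.items()
            if icc2_1(table) >= threshold]


# Invented example: each feature measured twice on four lesions.
toy = {
    "mean_intensity": [[10.0, 10.2], [14.0, 13.9], [18.1, 18.0], [22.0, 22.3]],
    "noisy_texture": [[1.0, 3.0], [2.0, 1.0], [3.0, 2.0], [2.5, 0.5]],
}
kept = reproducible_features(toy)
```

In this toy input only the stable feature survives the filter; the surviving set would then be passed to LASSO.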

Deep learning combined with radiomics analysis workflow in this study. (1) Top part: the deep learning features were extracted by CNN models; (2) Middle part: the quantitative features were extracted by radiomics methods; (3) Bottom part: the clinical data consisted of information about the patients and the US lesions. Finally, the deep learning features, radiomics features, and clinical data were concatenated to form a new feature vector, which was utilized for the final prediction of SA and BC.
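The concatenation step of the workflow can be sketched as below. This is a minimal illustration, not the study's implementation: the function names, the vector lengths, and the z-score scaling of the clinical values are assumptions, and all numbers are invented.

```python
from statistics import mean, pstdev


def zscore_columns(rows):
    """Standardize each column so age and size share a comparable scale."""
    cols = list(zip(*rows))
    # Fall back to a scale of 1.0 when a column is constant.
    stats = [(mean(c), pstdev(c) or 1.0) for c in cols]
    return [[(x - m) / s for x, (m, s) in zip(row, stats)] for row in rows]


def fuse_features(deep, radiomics, clinical):
    """Concatenate the three feature groups into one vector for the classifier."""
    return list(deep) + list(radiomics) + list(clinical)


# Invented clinical rows: [age in years, lesion diameter in mm].
clinical_raw = [[46.0, 12.0], [61.0, 21.0], [52.0, 15.0]]
clinical_scaled = zscore_columns(clinical_raw)

deep = [0.12, -0.40, 0.88, 0.05]   # stand-in for CNN features
radiomics = [1.7, 0.3, -0.9]       # stand-in for selected radiomics features
fused = fuse_features(deep, radiomics, clinical_scaled[0])
```

The fused vector simply has the summed length of its parts; in a network this would feed the final fully connected layer.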
Details of the radiomics features
Four convolutional neural network (CNN) models (Vgg16, Resnet18, Resnet50, and Densenet121) were used to extract deep learning features, and they were compared using a 5-fold cross-validation strategy. The overall classification accuracy (ACC), sensitivity (SEN), specificity (SPE), Youden index (YI), positive predictive value (PPV), and negative predictive value (NPV) were calculated as shown below.
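The standard definitions of these indices, assumed here to be the ones intended, are:

```latex
\begin{align}
\mathrm{ACC} &= \frac{TP + TN}{TP + TN + FP + FN}, &
\mathrm{SEN} &= \frac{TP}{TP + FN}, &
\mathrm{SPE} &= \frac{TN}{TN + FP}, \\
\mathrm{YI}  &= \mathrm{SEN} + \mathrm{SPE} - 1, &
\mathrm{PPV} &= \frac{TP}{TP + FP}, &
\mathrm{NPV} &= \frac{TN}{TN + FN}.
\end{align}
```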
TP, TN, FP, and FN represent the number of true positive, true negative, false positive, and false negative results, respectively. The classification results are presented as the mean±standard deviation values over repeated runs. The receiver operating characteristic (ROC) curve and the corresponding area under the curve (AUC) value were also used as evaluation indices. The descriptive variables lesion size and patient age are presented as mean±standard deviation. The differences among the clinical characteristics of the SA and BC lesions were analysed using the Chi-square test or Fisher’s exact test. All statistical analyses were performed using the R software with R Studio version 3.5.2 and SPSS version 22.0 (IBM Corp, Armonk, NY). The statistical significance levels were two-sided, and P values <0.05 were considered to indicate statistical significance.
Clinical characteristics of the patients
This study included 97 patients with SA and 100 patients with BC who had a mean age of 46.12±12.47 years and 61.25±12.53 years, respectively. The US and clinical characteristics of these two groups of patients are summarized in Table 2. With the exception of family history, the number of lesions (single or multiple), and echo pattern (hypoechoic, hyperechoic, or mixed echo), the SA and BC groups differed significantly in terms of the other characteristics listed in the table (P < 0.05).
Comparison of ultrasound characteristics between SA and BC
The clinical features, including patient age, nodule size (longitudinal and transverse diameter), and aspect ratio, are shown in Table 3.
Clinical parameters used in the CNN models
Table 4 shows the classification results of the different CNN models and the DLR models with clinical data, all created using the same US dataset. The final diagnostic results according to the radiologists were not satisfactory with regard to certain parameters, such as the classification accuracy (72.08%), specificity (43.30%), YI (43.30%), and PPV (64.52%). The DLR models with clinical data outperformed the CNN models based on all the evaluation indices. Specifically, the Densenet121+Radiomics+Clinical data model exhibited the best classification accuracy, sensitivity, specificity, YI, PPV, and NPV, which were 86.80±2.10% (95% CI, 84.19–89.40), 87.60±3.35% (95% CI, 83.06–91.13), 86.20±3.40% (95% CI, 81.97–90.42), 73.79±3.86% (95% CI, 69.00–78.58), 87.42±4.64% (95% CI, 81.66–93.18), and 86.01±3.09% (95% CI, 80.33–90.68), respectively. Importantly, the Densenet121+Radiomics+Clinical data model outperformed the Densenet121 model and showed a 1.57%, 2.12%, 1.18%, 3.29%, 1.16%, and 2.08% improvement in classification accuracy, sensitivity, specificity, YI, PPV, and NPV, respectively. These findings indicate that the application of radiomics and clinical data improved the diagnostic performance of the deep learning models.
Evaluation of different CNN models and DLR models with clinical data for distinguishing SA from BC
Notes: R: radiomics features. C: clinical data. Absolute values or percentages with corresponding 95% confidence intervals (95% CI) indicated in parentheses.
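The intervals in Table 4 can be related to the fold-level spread as follows. This is a sketch under an assumption the text does not state: that a 95% CI is a Student t interval over the five cross-validation folds (4 degrees of freedom, critical value about 2.776). Under that assumption the reported accuracy of 86.80±2.10% gives an interval close to the reported 84.19–89.40, so it should be read as one plausible reading, not as the method used. The function name is illustrative.

```python
from math import sqrt

# Two-sided 95% Student t critical value for 4 degrees of freedom (5 folds).
T_CRIT_DF4 = 2.776


def t_interval(mean_value, sd, n_folds=5, t_crit=T_CRIT_DF4):
    """Return (low, high) of mean +/- t * sd / sqrt(n) for fold-level results."""
    half_width = t_crit * sd / sqrt(n_folds)
    return mean_value - half_width, mean_value + half_width


# Reported accuracy of the Densenet121+Radiomics+Clinical data model (percent).
low, high = t_interval(86.80, 2.10)
print(f"{low:.2f} to {high:.2f}")
```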
Comparison of the AUC value obtained by ROC curve analysis between the CNN models and DLR models with clinical data for distinguishing SA from BC showed that the Densenet121+Radiomics+Clinical data model, again, exhibited the best performance with an AUC of 0.915 (95% CI, 0.878–0.952) (Fig. 2). Moreover, all the DLR models with clinical data had a higher AUC value than the corresponding CNN models. These findings imply that deep learning combined with radiomics and clinical data can improve the differentiation of SA and BC.
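AUC has a direct rank interpretation: it is the probability that a randomly chosen BC case receives a higher score than a randomly chosen SA case, with ties counted as one half. A minimal sketch of that calculation follows, using invented scores and a hypothetical function name `auc_by_pairs`; the study's own statistical analyses were run in R and SPSS, not with this code.

```python
def auc_by_pairs(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 1 for BC (positive), 0 for SA (negative).
    scores: model output, higher meaning more likely BC.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    # Count a win for each correctly ordered pair and half a win for a tie.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented toy scores: two SA cases followed by two BC cases.
toy_auc = auc_by_pairs([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

The pairwise form is quadratic in the number of cases, which is acceptable for a cohort of this size.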

Comparison of ROC curves and AUC values between CNN models and DLR models with clinical data for distinguishing SA from BC. (a) ROC curves for Vgg16 and Vgg16 + Radiomics + Clinical data models, (b) ROC curves for Resnet18 and Resnet18 + Radiomics + Clinical data models, (c) ROC curves for Resnet50 and Resnet50 + Radiomics + Clinical data models, (d) ROC curves for Densenet121 and Densenet121 + Radiomics + Clinical data models.
In this study, we have proposed four DLR models with clinical data for the differential diagnosis of SA and BC. The Densenet121+Radiomics+Clinical data model was the best-performing model, with an AUC of 0.915 (95% CI, 0.878–0.952), a classification accuracy of 86.80±2.10% (95% CI, 84.19–89.40), a sensitivity of 87.60±3.35% (95% CI, 83.06–91.13), and a specificity of 86.20±3.40% (95% CI, 81.97–90.42). Overall, the CNN models and DLR models with clinical data performed better than the real-world experienced breast radiologists. Our results demonstrate the feasibility of using this improved approach to accomplish intractable classification tasks based on US images. To the best of our knowledge, this is the first attempt to apply DLR along with clinical data for clinically discriminating between SA and BC.
It has been confirmed that multimodal imaging of breast lesions, including CEUS, elastography, and MRI, has superior diagnostic performance compared with US alone [23]. For CEUS, the application of tumour-specific contrast media in combination with CEUS perfusion has been reported to help in diagnosing early-stage breast cancer [24]. The abundant information obtained from multimodal imaging facilitates treatment planning by clinicians. However, multimodal imaging is time-consuming and operator dependent, which limits its clinical use. Computer-assisted image analysis with machine learning techniques can compensate for this shortcoming, with more efficient and precise results than those of real-world radiologists [25–27]. A few studies have indicated the value of deep learning with multimodal breast images for preoperatively predicting the extent of axillary or sentinel lymph node involvement and thereby determining appropriate axillary treatment options for patients [16, 29]. In addition, US data combined with the BI-RADS category showed potential value for predicting breast cancer among lesions classified as BI-RADS US category III, because the diagnostic criteria for this category overlap with those of BI-RADS category IV [30, 31].
As this was a retrospective study, we collected all images that could be used. Among these, 25 of the 100 breast cancer cases and 29 of the 97 sclerosing adenosis cases underwent CEUS examination for micro-vascularization assessment. In addition, only 31 of the 97 sclerosing adenosis cases showed Adler type I–II macro-vascularization. Hence, because the imaging modalities could not be aligned across cases, we were not able to use the tumour vascularization information in this study. In a subsequent prospective study, our research group plans to carry out exploratory research based on multimodal US images, which we expect to yield better diagnostic results and better serve clinical work.
Our findings showed that the specificity for diagnosis by the radiologists in our study cohort was low. Specifically, 56.7% (55/97) of the SA lesions were incorrectly classified as malignant (based on their categorization as BI-RADS 4-5) by experienced breast radiologists. Among the 55 misdiagnosed SA lesions, 46 lesions (83.6%) had irregular shape, 32 lesions (58.2%) had irregular shape, 16 lesions (29.1%) had posterior acoustic shadowing, and 13 lesions (23.6%) had calcifications in their mass. Consistent with previous studies [5], these characteristics mimic those of BC but have a higher prevalence in SA. Liang et al. created and verified a nomogram to distinguish BCs and SAs by combining US features, the objective indicators age and tumour size, and several subjective indicators (calcification, echogenic rim, and vascularity distribution), but this is a time-consuming and somewhat subjective process [32]. In contrast, the DLR model with clinical data was used here to assist clinical decision making by integrating clinical data and radiomics features with network features, which provides information complementary to the image features and thus improves model performance.
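The radiologists' figures can be reproduced from the counts given in the text. With 100 BC and 97 SA lesions, a sensitivity of 100% means TP = 100 and FN = 0, and 55 SA lesions categorized as BI-RADS 4-5 means FP = 55 and TN = 42. The sketch below applies the standard metric definitions to those counts; the function name `diagnostic_metrics` is illustrative.

```python
def diagnostic_metrics(tp, tn, fp, fn):
    """Standard diagnostic indices from a 2x2 confusion table (as fractions)."""
    sen = tp / (tp + fn)
    spe = tn / (tn + fp)
    return {
        "ACC": (tp + tn) / (tp + tn + fp + fn),
        "SEN": sen,
        "SPE": spe,
        "YI": sen + spe - 1,      # Youden index
        "PPV": tp / (tp + fp),
        "NPV": tn / (tn + fn),
    }


# Counts derived from the text: BC is the positive class, SA the negative class.
radiologists = diagnostic_metrics(tp=100, tn=42, fp=55, fn=0)
for name, value in radiologists.items():
    print(f"{name}: {100 * value:.2f}%")
```

These counts give an accuracy of 72.08%, a specificity and Youden index of 43.30%, and a PPV of 64.52%, matching the values reported for the radiologists.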
From traditional radiomics to deep learning models, several studies have demonstrated that AI can greatly reduce the workload of radiologists and is a fast, accurate, and reproducible technology. For example, Wang et al. adopted a combined radiomics and deep learning model created from a small dataset of 111 lung adenocarcinoma patients to identify high-grade adenocarcinoma with an accuracy of 0.966 in an independent validation dataset of 28 patients [17]. As the present study had limited data, we adopted a similar mixed model that incorporates as many features as possible to provide radiologists with a solution for the accurate diagnosis of SA and BC. The results showed that the diagnostic performance was significantly improved by the mixed models. More importantly, our proposed method greatly improved the specificity and accuracy of this diagnostic task compared with radiologists, which is a major breakthrough for clinical practice and can reduce unnecessary biopsy and surgery.
It is known that BC is strongly related to age, and that larger lesions (based on maximum diameter) have a greater possibility of malignancy [21, 22]. For patients with suspicious breast lesions on US images, tumour size and patient age are routinely taken into consideration for final diagnosis by real-world radiologists. Therefore, in the present study, apart from radiomics features that were extracted from US images, four clinical parameters, including patient age and tumour size, were added to the fully connected layer of the CNN models. In the final model, radiomics features and clinical data, along with deep learning features, were used for the classification of SA and BC. As expected, the results showed that better diagnostic performance was achieved when the clinical data, radiomics features, and deep learning features were combined than when the deep learning features were used alone. In agreement with these findings, a study by Zheng et al. also showed that clinical data combined with the DLR method resulted in significantly better prediction of lymph node metastasis than any single source alone, such as axillary US, clinicopathologic data, or images only [29]. Thus, these results show that clinical data can improve the diagnostic efficiency of these models, and clinical data mining is, therefore, also useful for imaging diagnosis.
There are several limitations in this study. First, the limited data were collected retrospectively from a single centre. A larger and external validation dataset should be examined in the future to verify the generalizability of the model. Second, comprehensive diagnosis of breast nodules through multimodal ultrasound imaging is important. Therefore, the prospective and standardized collection of multimodal data is necessary for continuous improvement of the diagnostic performance of DLR models.
In conclusion, our proposed DLR approach with clinical data can improve the diagnosis of SA and BC. Importantly, the findings demonstrate the benefits of DLR models with clinical data as a supplementary diagnostic tool that would be useful to radiologists.
Funding
This work was supported by the National Natural Science Foundation of China (Grants No. 82071931, 82130057), the interdisciplinary program of Shanghai Jiao Tong University (ZH2018ZDA17), the program from Science and Technology Commission of Shanghai Municipality (No. 20Y11912400), and the 2019 clinical research innovation team of Shanghai General Hospital (No. CTCCR-2019B05).
