Abstract
BACKGROUND:
The main metastatic route for lung cancer is lymph node metastasis, and studies have shown that non-small cell lung cancer (NSCLC) has a high risk of lymph node infiltration.
OBJECTIVE:
This study aimed to compare the performance of handcrafted radiomics (HR) features and deep transfer learning (DTL) features, extracted from intratumoral and peritumoral regions on Computed Tomography (CT), in predicting lymph node metastasis status in NSCLC across different machine learning classifier models.
METHODS:
We retrospectively collected data from 199 patients with pathologically confirmed NSCLC. All patients were divided into training (n = 159) and validation (n = 40) cohorts. HR and DTL features were extracted from the intratumoral and peritumoral regions, and the best features of each type were selected. Support Vector Machine (SVM), k-Nearest Neighbors (KNN), Light Gradient Boosting Machine (Light GBM), Multilayer Perceptron (MLP), and Logistic Regression (LR) models were constructed, and the performance of the models was evaluated.
RESULTS:
Among the five models in the training and validation cohorts, the LR classifier model performed best in terms of HR and DTL features. The AUCs of the training cohort were 0.841 (95% CI: 0.776–0.907) and
CONCLUSIONS:
Compared with the radiomics signature, the DTL signature constructed based on intratumoral and peritumoral areas in CT can better predict NSCLC lymph node metastasis.
Background
Lung cancer is the second most prevalent malignancy worldwide, accounting for approximately 11.4% of new cancer cases in 2020, and the leading cause of cancer-related death, accounting for approximately 18% of all cancer deaths [1]. Non-small cell lung cancer (NSCLC) accounts for approximately 82% of all lung cancer cases [2]. The main metastatic route for lung cancer is lymph node metastasis, and studies have shown that NSCLC has a high risk of lymph node infiltration [3, 4]. Lymph node metastasis is the main cause of tumor recurrence and distant metastasis in patients after surgery. Postoperative survival rates also differ significantly among NSCLC patients with different lymph node statuses [5]. The 5-year survival rate for patients with no mediastinal or hilar lymph node metastases (N0) is approximately 56%, whereas for those with lymph node metastases (N1, N2, and N3) it is approximately 38%, 22%, and 6%, respectively [6]; that is, the higher the N stage, the lower the survival rate. Thus, preoperative determination of whether lymph node metastasis has occurred in NSCLC is important for individualized treatment and prognosis.
Conventional imaging methods for the preoperative assessment of lymph node metastases are inadequate. Computed Tomography (CT) and Magnetic Resonance (MR) rely on changes in lymph node size to determine the presence of metastases, with a short-axis lymph node diameter > 10 mm commonly used as the criterion [7]. However, the accuracy of determining whether a lymph node is metastatic based solely on imaging morphology is low. Enlarged lymph nodes with inflammatory responses are easily misdiagnosed as metastatic, so patients without lymph node metastases may undergo unnecessarily extensive intraoperative lymph node dissection. Conversely, metastatic lymph nodes < 10 mm are easily missed, with a sensitivity of 51% [5, 7], so patients who should have had their lymph nodes cleared intraoperatively may not receive this treatment. Positron Emission Tomography/Computed Tomography (PET/CT) has a higher accuracy than CT and MR; however, it has a relatively high false-positive rate, because metastases are easily confused with inflammatory processes and granulomatous infections in the early stages of lung cancer [8, 9], and its sensitivity for detecting metastases in lymph nodes < 10 mm is only 32.4% [10]. Therefore, conventional imaging methods are limited in their ability to accurately assess preoperative lymph node metastasis in NSCLC.
Solid NSCLC tumors can spread through the tracheal, vascular, and lymphatic systems into the surrounding lung parenchyma, and there are areas of subclinical infiltration around the solid tumor [11, 12]. Studies have shown that tumor spread through air spaces in lung adenocarcinoma is a significant predictor of occult lymph node metastasis and that the smaller the extent of tumor resection, the higher the probability of lymph node metastasis [13]. This subclinical stage of microscopic invasion cannot be detected by physicians using conventional imaging methods; however, the tissue surrounding the tumor is already infiltrated, and the tumor may spread distantly via lymphatics and blood vessels. Therefore, there is a need for a new examination technique that can identify areas of peritumoral subclinical infiltration at the microscopic level.
Recently, radiomics, a technique that can quantify the biology of a lesion and the heterogeneity within a tumor at a microscopic level, has been widely studied [14–16]. Radiomics breaks away from the traditional morphology-based medical imaging paradigm and provides access to diagnostic information on biological behavior that is not visible to the naked eye. This approach can be applied to the analysis of alterations in the peritumoral microenvironment. Some researchers extracted CT radiomics features from the peritumoral region of the lung and found that the inclusion of distal peritumoral radiomics features could improve the predictive power of the model [17]. However, handcrafted radiomics features are limited to predefined descriptors of tumor texture, size, volume, shape, and intensity. Therefore, new methods, such as deep learning, are required to extract more complex and higher-dimensional features, potentially improving the predictive and generalization capabilities of the model [18–20].
Deep convolutional neural networks (CNNs) have recently achieved remarkable results in computer vision tasks such as tumor grading prediction, patient prognosis, pathological classification, and organ segmentation [21, 22]. Compared to handcrafted radiomics (HR) features, deep learning features reflect information in medical images from a different perspective and at a deeper level and may add predictive value for the status of lymph node metastasis in NSCLC. However, deep learning models require a large number of images for training, and medical image sets are often limited compared with natural image sets, making it difficult to train CNN models from scratch. Deep transfer learning (DTL) has been proposed to overcome this shortcoming [23, 24]. DTL uses a model pre-trained on images from another domain and applies what the model learned during that training to a new, potentially unrelated dataset and task [25]. Currently, DTL is widely used in the field of deep learning to alleviate the limitations of small datasets. Therefore, this study aimed to investigate the prediction of NSCLC lymph node metastasis status based on HR features and DTL features from intratumoral and peritumoral regions on preoperative CT images and to compare the two, thus providing a complementary aid to the radiological assessment of patients.
Image preprocessing related work
The CT scan parameters are presented in Supplementary Table. Contrast-enhanced chest CT images of all patients were retrieved from our Picture Archiving and Communication System (PACS) and exported in Digital Imaging and Communications in Medicine (DICOM) format. First, to eliminate parameter interference from different machines, we resampled all patient CT images to 1 × 1 × 1 mm voxels. Two radiologists with over five years of experience outlined the regions of interest (ROIs) in the images, which were then combined into a volume of interest. A week later, CT images of a random sample of 30 patients were re-outlined to calculate the intraclass correlation coefficient (ICC), and features with an ICC > 0.75 were retained. In cases where tumor boundaries were unclear, decisions were made by a more experienced radiologist (>15 years).
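The ICC-based stability filter described above can be sketched as follows. This is only an illustration: the text does not state which ICC form was used, so the sketch assumes ICC(2,1) (two-way random effects, absolute agreement, single measurement), and the function names `icc_2_1` and `stable_feature_indices` as well as the synthetic data are hypothetical.

```python
import numpy as np

def icc_2_1(ratings):
    """ICC(2,1) for one feature; ratings has shape (n_subjects, k_delineations).

    Assumption: the ICC form is not given in the text, ICC(2,1) is used here.
    """
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    grand = x.mean()
    ss_rows = k * np.sum((x.mean(axis=1) - grand) ** 2)   # between subjects
    ss_cols = n * np.sum((x.mean(axis=0) - grand) ** 2)   # between delineations
    ss_err = np.sum((x - grand) ** 2) - ss_rows - ss_cols  # residual
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

def stable_feature_indices(first, second, threshold=0.75):
    """Keep the columns whose ICC between two delineations exceeds the threshold."""
    keep = []
    for j in range(first.shape[1]):
        pair = np.column_stack([first[:, j], second[:, j]])
        if icc_2_1(pair) > threshold:
            keep.append(j)
    return keep

# Synthetic example: 30 re-outlined patients, 3 hypothetical features.
rng = np.random.default_rng(0)
first = rng.normal(size=(30, 3))
second = first.copy()
second[:, 0] += rng.normal(scale=0.01, size=30)  # nearly reproducible feature
second[:, 2] = rng.normal(size=30)               # unrelated on re-delineation
kept = stable_feature_indices(first, second)
```

In this synthetic example the third feature is unrelated between the two delineations and is dropped, while the two reproducible features are kept.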
Methods
This retrospective study was conducted in accordance with the Declaration of Helsinki and approved by the Ethical Committee of the First Hospital of Jiaxing (NO. 2022-LY-474), which waived the requirement for informed consent from individual patients.
Intra-peritumoral segmentation
The peritumoral region was defined by expanding the tumor boundary outward by 0–4 mm using the “SimpleITK” package in Python 3.6. Radiomic features were extracted from both intratumoral and peritumoral areas after pre-processing. The extent of the peritumoral area was chosen based on previous studies, most of which used a peritumoral extension within 10 mm.
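As a rough illustration of how a 0–4 mm peritumoral ring can be derived from a tumor mask, the sketch below dilates a binary mask and subtracts the original tumor. It is not the authors' SimpleITK code: it uses plain NumPy, assumes 1 mm isotropic voxels, and approximates the expansion with repeated 6-connected dilation (a city-block rather than Euclidean margin). The names `dilate_once` and `peritumoral_ring` are hypothetical.

```python
import numpy as np

def dilate_once(mask):
    """One step of 6-connected binary dilation on a 3D boolean mask."""
    padded = np.pad(mask, 1, mode="constant", constant_values=False)
    out = padded.copy()
    for axis in range(3):
        # Shift by one voxel in both directions along each axis and merge.
        out |= np.roll(padded, 1, axis=axis)
        out |= np.roll(padded, -1, axis=axis)
    return out[1:-1, 1:-1, 1:-1]  # crop the padding again

def peritumoral_ring(tumor_mask, margin_mm=4, spacing_mm=1.0):
    """Voxels within margin_mm of the tumor (city-block distance), tumor excluded."""
    tumor = np.asarray(tumor_mask, dtype=bool)
    expanded = tumor.copy()
    for _ in range(int(round(margin_mm / spacing_mm))):
        expanded = dilate_once(expanded)
    return expanded & ~tumor

# Toy example: a single-voxel "tumor" in the middle of a 21^3 volume.
tumor = np.zeros((21, 21, 21), dtype=bool)
tumor[10, 10, 10] = True
ring = peritumoral_ring(tumor, margin_mm=4)
```

A Euclidean margin would need a distance transform or a spherical structuring element; the iterated dilation is used here only to keep the sketch dependency-free.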
Before DTL feature extraction, the images were pre-processed. The “largest tumor cross-section” was automatically selected on the axial slices, and the two slices above (+1, +2) and the two slices below (–1, –2) were also taken, for a total of five cross-sectional images. These were resized to 224×224 pixels to match the input requirements of the CNN model. A ResNet50-based CNN model was constructed for the intra-peritumoral ROI images.
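The slice selection step can be sketched as below, assuming the mask is stored as a NumPy array with the axial index first. The function name `select_slices` is hypothetical, and clamping indices at the volume boundary is an assumption; the text does not say how tumors near the first or last slice were handled. Resizing to 224×224 is not shown.

```python
import numpy as np

def select_slices(mask, offsets=(-2, -1, 0, 1, 2)):
    """Return axial indices of the largest tumor cross-section and its neighbors.

    mask: 3D array (axial, rows, cols), nonzero inside the ROI.
    Indices are clamped to the valid range (an assumption, see lead-in).
    """
    mask = np.asarray(mask, dtype=bool)
    areas = mask.reshape(mask.shape[0], -1).sum(axis=1)  # ROI area per slice
    center = int(np.argmax(areas))
    last = mask.shape[0] - 1
    return [min(max(center + o, 0), last) for o in offsets]

# Toy example: the ROI is largest on axial slice 4.
roi = np.zeros((10, 8, 8), dtype=bool)
roi[3, 2:4, 2:4] = True   # 4 pixels
roi[4, 2:6, 2:6] = True   # 16 pixels
roi[5, 2:5, 2:5] = True   # 9 pixels
levels = select_slices(roi)
```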
Handcrafted radiomic feature extraction and deep learning feature integration
First, radiomic features were extracted from manually delineated intratumoral regions and automatically segmented peritumoral regions. These features fall into three groups: (I) geometry, (II) intensity, and (III) texture. Texture features were derived using methods such as the gray-level co-occurrence matrix, gray-level run length matrix, gray-level size zone matrix, and neighborhood gray-tone difference matrix. Next, deep transfer learning (DTL) was employed to extract DTL features from CT images of the intra-peritumoral regions. A ResNet50 model pre-trained on the ImageNet dataset was used as the base model [26]. The model takes a 224×224 input image and passes it through five convolutional stages that progressively refine the feature maps, ending in an average pooling layer that outputs 2048 features. ResNet50 incorporates residual learning to mitigate problems such as vanishing gradients and accuracy degradation in deeper networks, thus improving efficiency, accuracy, and speed.
For HR features, features with ICC > 0.75 were retained first, and the retained intratumoral and peritumoral HR features were then fused. For DTL features, we fused the features extracted at the five intra-peritumoral slice levels. The HR and DTL feature data were then standardized to zero mean and unit variance (z-score normalization); that is, the mean of each feature was subtracted from its values, and the result was divided by the standard deviation of that feature.
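A minimal sketch of the fusion and standardization steps follows, assuming that fusion means column-wise concatenation of feature matrices and that the mean and standard deviation are estimated on the training cohort and reused for validation data (neither detail is spelled out in the text). The function names and random data are hypothetical; the feature counts are those reported in this article.

```python
import numpy as np

def fuse(*blocks):
    """Concatenate per-region (or per-slice) feature matrices column-wise."""
    return np.concatenate(blocks, axis=1)

def zscore_fit(train):
    """Per-feature mean and standard deviation from the training data."""
    mean = train.mean(axis=0)
    std = train.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant features
    return mean, std

def zscore_apply(x, mean, std):
    """z = (x - mean) / std, giving zero mean and unit variance on the fit data."""
    return (x - mean) / std

rng = np.random.default_rng(0)
n_patients = 8  # small hypothetical sample to keep the example light

# HR: 1906 intratumoral + 1906 peritumoral features per patient.
hr = fuse(rng.normal(size=(n_patients, 1906)), rng.normal(size=(n_patients, 1906)))

# DTL: 2048 features from each of the five slice levels.
dtl = fuse(*[rng.normal(size=(n_patients, 2048)) for _ in range(5)])

mean, std = zscore_fit(hr)
hr_z = zscore_apply(hr, mean, std)
```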
Radiomics and DTL signature construction
After feature fusion and filtering, we constructed machine-learning classifier models using the HR and DTL features. The classifiers, implemented in Python with Scikit-learn (https://scikit-learn.org/stable/user_guide), included the Support Vector Machine (SVM), k-Nearest Neighbors (KNN), Light Gradient Boosting Machine (Light GBM), Multilayer Perceptron (MLP), and Logistic Regression (LR) models. To prevent overfitting, five-fold cross-validation was performed in the training cohort to select the best parameters for each classifier model, and the performance of the different classifiers was then compared to select the best machine learning model. The discriminative power of the models was assessed using the receiver operating characteristic (ROC) curve and the area under the curve (AUC). Accuracy, precision, recall, specificity, and F1-score were also used as quantitative indicators.
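To make the evaluation concrete, the sketch below shows one way to assign cases to five stratified folds and to compute the AUC as the probability that a metastasis-positive case receives a higher score than a negative one. It is a NumPy-only illustration, not the Scikit-learn routines used in the study; `stratified_folds`, `auc_score`, the labels, and the scores are hypothetical.

```python
import numpy as np

def stratified_folds(y, n_folds=5, seed=0):
    """Assign each sample a fold index, keeping class proportions per fold."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    fold_of = np.empty(len(y), dtype=int)
    for label in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == label))
        fold_of[idx] = np.arange(len(idx)) % n_folds
    return fold_of

def auc_score(y_true, y_score):
    """AUC as P(score of positive > score of negative); ties count one half."""
    y_true = np.asarray(y_true).astype(bool)
    y_score = np.asarray(y_score, dtype=float)
    pos, neg = y_score[y_true], y_score[~y_true]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

# Hypothetical labels: 100 negative and 40 positive cases.
labels = np.array([0] * 100 + [1] * 40)
folds = stratified_folds(labels)

# Small worked example: 3 of the 4 positive/negative pairs are ranked correctly.
example_auc = auc_score([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

In each round of cross-validation, one fold index serves as the held-out set and the remaining four are used for fitting; the candidate parameters with the best mean held-out AUC would then be selected.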
Statistical analysis
We used independent-sample t-tests, Mann–Whitney U tests, or chi-square tests to compare the clinical characteristics of the patients. The ROC curve was used to evaluate the diagnostic effectiveness of the intra-peritumoral HR features model and the intra-peritumoral DTL features model. In addition, we used precision (the proportion of predicted positives that are true positives), recall (the proportion of actual positives that the model correctly identifies), F1-score (the harmonic mean of precision and recall), and accuracy (the proportion of all cases that are classified correctly) as quantitative indicators. The AUCs of the best machine-learning models were compared using the Delong test. Calibration curves were used to assess the agreement between the predictions of the best machine-learning models and actual observations. Finally, the clinical utility of the best models was compared using decision curve analysis (DCA). P < 0.05 was considered statistically significant.
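For reference, the standard confusion-matrix definitions of these indicators can be written out as follows. The counts in the example are hypothetical and are not results from this study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard metrics from confusion-matrix counts.

    tp/fn: metastasis-positive cases predicted positive/negative;
    tn/fp: metastasis-negative cases predicted negative/positive.
    """
    precision = tp / (tp + fp)            # predicted positives that are correct
    recall = tp / (tp + fn)               # actual positives that are found
    specificity = tn / (tn + fp)          # actual negatives that are found
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return {
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "accuracy": accuracy,
        "f1": f1,
    }

# Hypothetical counts for a 40-patient cohort.
metrics = classification_metrics(tp=8, fp=2, tn=26, fn=4)
```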
Results
Patients’ clinical characteristics
In total, 199 patients (59 patients with lymph node metastases and 140 patients without lymph node metastases) with pathologically confirmed NSCLC diagnosed at our hospital between November 2018 and July 2022 were enrolled. Of these, 138 had adenocarcinomas, 53 had squamous carcinomas, and 8 had large-cell lung carcinomas. All patients underwent puncture biopsy for lung cancer and a contrast-enhanced chest CT scan within 2 weeks preoperatively, with histological results obtained within 2 weeks postoperatively as the gold standard. The inclusion criteria were as follows: 1) surgically resected and pathologically confirmed NSCLC with definitive lymph node biopsy results and 2) complete CT imaging data recorded within 2 weeks preoperatively. The exclusion criteria were as follows: 1) history of preoperative treatment (e.g., radiotherapy), 2) coexisting extra-pulmonary malignancies, 3) lack of CT data or histological findings, and 4) poor image quality. All patients were divided into training (n = 159, November 2018 to September 2019) and validation (n = 40, September 2019 to July 2022) cohorts. Clinicopathological information, including age, sex, smoking history, maximum tumor diameter, tumor clinical stage, and pathological type, was collected for each patient. The baseline clinical characteristics of patients with NSCLC according to lymph node metastatic status are presented in Table 1.
Baseline characteristics of patients in cohorts
LN, lymph node.
Statistically significant differences in maximum tumor diameter and tumor clinical stage were observed between the lymph node metastasis-positive and lymph node metastasis-negative groups in the training cohort. Statistically significant differences were observed in the clinical stage of the tumor between the groups with positive and negative lymph node metastases in the validation cohort. Conversely, the differences between groups for the remaining clinical features were not statistically significant.
In total, 1906 HR features were extracted from each of the intratumoral and peritumoral (0–4 mm) regions, including 397 first-order features, 14 shape features, and 1495 texture features, the details of which can be found in Supplementary Fig. The original CT images were passed through wavelet, lbp-3D, log-sigma, square root, square, logarithm, exponential, and gradient filters. All features were extracted using the internal feature analysis program of PyRadiomics (http://pyradiomics.readthedocs.io). A total of 3812 features were obtained by fusing the intratumoral and peritumoral 0–4 mm HR features. For the DTL features, 2048 features were extracted from each of the five images, and all DTL features were fused to obtain 10240 features. After feature screening, the 16 best HR features and 28 best DTL features remained.

Flowcharts of radiomics and DTL features modeling process.
After obtaining the best HR features and DTL features, we constructed SVM, KNN, Light GBM, MLP, and LR models using five-fold cross-validation and compared the performances of the different models (Table 2). Of the five classifier models in the training and validation cohorts, the LR classifier model performed best. The AUC box plots of the best models for different classifier models after selection of the best parameters by five-fold cross-validation are shown in Fig. 2. Histograms and line charts of the accuracies of the different models are shown in Supplementary Fig. Plots of the LR classifier model feature weights are shown in Fig. 3.

Comparison of AUC box plots of different best classifier models in the training cohort after selection of optimal parameters by five-fold cross-validation.

Feature weighting for HR (A) and DTL (B) features.
Performance of Radiomics and DTL Signatures in Training and Validation Cohorts.
AUC, area under the curve; CI, confidence interval; DTL, deep transfer learning.
The LR model for HR features and DTL features defined the HR and DTL signatures, respectively. The ROC curves for the HR and DTL signature are shown in Fig. 4, with AUCs of 0.841 (95% CI: 0.776–0.907) and

HR and DTL signature ROC comparison. (A) ROC curve of HR vs. DTL signature for the training cohort; (B) ROC curve of HR vs. DTL signature for the validation cohort.
The results showed a higher AUC for predicting distant metastases with radiomics features of the peritumoral tissue than with those of the lung cancer lesion itself. The inclusion of peritumoral radiomics features in these studies significantly improved the predictive power of the models. In this study, radiomics features were extracted from a 4-mm region around the tumor and fused with intratumoral radiomics features. After multiple machine learning classifier models were compared, the LR classifier model showed the best predictive performance. This study not only combined peritumoral and intratumoral features but also used multiple slice levels to extract as much information as possible. The 28 best DTL features retained after screening included features from different levels, further enhancing the predictive power of the model. The DTL features achieved their best results with the LR classifier model, and these results were better than those of the radiomics feature model.
The calibration curves for the HR and DTL signatures showed good consistency between the actual results and the model predictions in the training and validation cohorts (Fig. 5). The Hosmer–Lemeshow tests for the HR and DTL signatures yielded P = 0.73 and 0.74, respectively, in the training cohort and P = 0.44 and 0.35, respectively, in the validation cohort, indicating that both models were well calibrated in both cohorts. Fig. 6 depicts the DCA curves for the HR and DTL signatures in the training and validation cohorts. Both signatures provided a net benefit across most threshold probabilities, and the net benefit of the DTL signature was higher than that of the HR signature, indicating that the DTL signature has better clinical utility than the HR signature in predicting lymph node metastasis status in NSCLC. The Delong test showed a statistically significant difference between the HR and DTL signatures (P < 0.001).
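The quantity plotted in a decision curve is the net benefit at a given threshold probability. The sketch below uses the standard definition, net benefit = TP/n − (FP/n) × pt/(1 − pt), together with the "treat all" reference strategy. The function names and the toy labels and probabilities are hypothetical and do not reproduce the curves in Fig. 6.

```python
import numpy as np

def net_benefit(y_true, y_prob, threshold):
    """Net benefit of treating every case whose predicted probability >= threshold."""
    y_true = np.asarray(y_true).astype(bool)
    predicted = np.asarray(y_prob, dtype=float) >= threshold
    n = len(y_true)
    tp = np.sum(predicted & y_true)
    fp = np.sum(predicted & ~y_true)
    return tp / n - fp / n * threshold / (1 - threshold)

def treat_all_net_benefit(y_true, threshold):
    """Reference strategy: every patient is treated as metastasis-positive."""
    prevalence = np.mean(np.asarray(y_true, dtype=float))
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)

# Toy data: two positive and three negative cases.
y = [1, 1, 0, 0, 0]
p = [0.9, 0.6, 0.7, 0.2, 0.1]
nb_model = net_benefit(y, p, threshold=0.5)
nb_all = treat_all_net_benefit(y, threshold=0.5)
```

A full decision curve evaluates these functions over a grid of thresholds; a model is considered clinically useful where its net benefit exceeds both the "treat all" value and zero (the "treat none" strategy).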

HR and DTL signature calibration curves. (A) Calibration curve of HR vs. DTL signature for the training cohort; (B) Calibration curve of HR vs. DTL signature for the validation cohort.

DCA of HR and DTL signature. (A) DCA for HR and DTL signature in the training cohort; (B) DCA for HR and DTL signature in the validation cohort.
In this study, HR and DTL features from intratumoral and peritumoral regions on CT images were used in LR classifier models to predict lymph node metastasis status in NSCLC, and both showed good predictive performance. The DTL signature was superior to the radiomics signature. Therefore, the DTL signature may be used as an adjunct to the radiological assessment of NSCLC. To the best of our knowledge, no previous study has compared HR feature models with DTL feature models for predicting lymph node metastasis status in NSCLC.
Radiomics is now widely used for the identification of benign and malignant lung nodules, pathological typing and staging of lung cancer, lung cancer gene expression, and sensitivity to immunotherapy in lung cancer. The correlation between radiomics features and biological behavior has been analyzed [27–30]. However, most studies have performed radiomics analysis of the primary lesion, ignoring peritumoral tissue. Furthermore, the microenvironment surrounding the lesion is difficult to represent using conventional imaging techniques. Peritumoral radiomics is a good way to uncover and reflect this microenvironmental heterogeneity, as radiomics can provide a macroscopic view of the microenvironmental changes surrounding a tumor [31]. Wang et al. [32] predicted lymph node metastasis in clinical stage T1 lung adenocarcinoma by extracting radiomics features and tumor volume within 15 mm of the lung tissue surrounding the lung adenocarcinoma. The AUC increased to 0.869 with the addition of peritumoral radiomics compared to that with intratumoral radiomics alone (AUC = 0.825). Dou et al. [33] used peritumoral lung cancer tissue as the ROI to construct a radiomics model to predict the rate of distant metastasis in lung adenocarcinoma. Apart from conventional radiomics approaches, ongoing advances in CT technology have enabled spectral CT imaging to provide a range of contrast data on lesions and tissues, which may substantially improve the prediction of lymph node metastasis in NSCLC [34–36].
Deep learning has received increasing attention in recent years and can help improve the performance of models [37, 38]. Relatively few studies have been conducted on deep learning for predicting lymph node metastasis in NSCLC. Tau et al. [39] evaluated the potential of deep learning using a CNN to predict lymph node metastases in newly diagnosed NSCLC by analyzing the characteristics of primary lesions on Fluorine-18 Fluorodeoxyglucose (18F-FDG) PET. The results showed an AUC of 0.80 for predicting lymph node metastasis using CNN analysis of PET images from patients with untreated NSCLC. This indicates that CNNs have some value in predicting lymph node metastasis in NSCLC; however, the high cost of PET/CT examination makes it unsuitable for widespread use. Previous research on deep learning has rarely included peritumoral imaging. Sun et al. [40] studied the performance of a CNN in predicting axillary lymph node metastasis from breast ultrasound and compared it with radiomics. The results of the CNN model constructed in the combined intra-peritumoral region were significantly improved over those of the intratumoral model alone. The predictive performance of the combined intra-peritumoral CNN model was superior to that of the combined intra-peritumoral radiomic model. In this study, to compensate for the information lost when deep learning features are extracted from a single image, more comprehensive Two-Dimensional (2D) information was extracted. The axial section with the largest tumor region was automatically selected as the “largest tumor image.” Additional DTL features were extracted from the upper two (+1, +2) and lower two (–1, –2) slices of the largest tumor image, for a total of five images. The deep learning features from the five levels were fused and fed into the different classifier models. Li et al. [41] referred to this approach as a 2.5D CNN model, which complements the information from a single-level CNN model. Choi et al. [42] also used this approach to predict IDH mutation status in gliomas using deep learning and radiomics, with an accuracy of up to 93.8% on the test set and an AUC of 0.96. These results show that the inclusion of multilevel image information can significantly improve the predictions of deep learning models by compensating for the limited information available at a single level, although good results can also be achieved at a single level.
ResNet50 is a deep learning model that learns increasingly high-level features from the input image through a series of successive linear and nonlinear layers [39]. Compared with traditional HR features, higher-order DTL features can provide further complementary information that can improve the performance of the model [43]. In this study, the DTL signature outperformed the radiomics signature in terms of accuracy, sensitivity, specificity, precision, recall, and F1 score.
This study has some limitations. First, our data came from a relatively small dataset from a single institution, so the study may not take full advantage of deep learning. Despite incorporating multiple slice levels of image information, we inevitably face challenges with data diversity. Second, the ROIs for this study were manually outlined by radiologists and were subject to human error. Prospective, multicenter, large-sample studies with multimodal imaging models are needed in future research.
Conclusions
The DTL signature constructed based on intra-peritumoral regions in CT images can better predict NSCLC lymph node metastasis.
