Abstract
OBJECTIVE:
To establish and validate a model capable of predicting lymph node metastasis (LNM) of non-small cell lung cancer (NSCLC) patients.
METHODS:
Preoperative clinical and CT imaging data of patients with NSCLC who underwent surgery were retrospectively analyzed. A model was developed in a training cohort of 290 patients. Univariate analysis followed by dichotomous logistic regression was performed to identify risk factors for lymph node metastasis, and a nomogram was constructed. In a separate validation cohort of 120 patients, the performance of the nomogram was evaluated using the area under the receiver operating characteristic curve (AUC), calibration curves, the Hosmer–Lemeshow test and decision curve analysis (DCA).
RESULTS:
The CT imaging score was an important independent risk factor for lymph node metastasis in NSCLC patients. Dichotomous logistic regression identified four other independent risk factors, namely age, the systemic inflammatory response index (SIRI), the prognostic nutritional index (PNI) and carcinoembryonic antigen (CEA), which were included in the nomogram. The nomogram yielded AUC values of 0.828 [95% confidence interval (CI): 0.778–0.877] in the training cohort and 0.816 (95% CI: 0.737–0.895) in the validation cohort. The calibration curves showed good agreement between predicted and observed risk in both cohorts. In the decision curve, the nomogram provided a higher net benefit than the treat-all and treat-none strategies at threshold probabilities of 0–0.8.
CONCLUSIONS:
A nomogram based on PNI and CT imaging signs holds promise as a novel and accurate tool for predicting LNM in NSCLC patients and guiding intraoperative lymph node dissection.
Introduction
Lung cancer is the leading cause of cancer death worldwide for both sexes combined, with 1,761,000 deaths annually, imposing a substantial public health burden [1]. Non-small cell lung cancer (NSCLC) is the most common histological type, accounting for 85% of all newly diagnosed lung cancer cases [2]. Lymph node metastasis (LNM) is the major and most common metastatic pathway of NSCLC. Many studies have shown that lymph node status plays an important role in the staging of NSCLC, and accurate staging is a crucial step in providing optimal treatment [3]. Assessment of LNM traditionally relies on computed tomography (CT), with lymph nodes whose short-axis diameter exceeds 1 cm generally regarded as metastatic [4]. However, the predictive accuracy of CT is unsatisfactory, with non-negligible false-negative and false-positive rates [5, 6]: lymph nodes with a short-axis diameter greater than 10 mm are not necessarily metastatic, whereas lymph nodes smaller than 10 mm may still harbor metastases.
Previous studies have shown that PET-CT has better diagnostic ability than CT for LNM and a high negative predictive value (NPV) [7], with a sensitivity of 80–90% and a specificity of 85–95% [5, 8]. However, PET-CT still produces a considerable number of false positives [9]. In addition, PET-CT requires more expensive equipment and may add to health-care costs as well as the radiation dose. Invasive diagnostic methods, such as mediastinoscopy and CT- or ultrasound-guided percutaneous biopsy (e.g., ultrasound-guided fine-needle aspiration), are also valuable, but they can cause thoracic hematoma, infection, pneumothorax and needle-tract tumor implantation [3].
A nomogram is a graphical representation of a complex regression model that calculates the continuous probability of a specific outcome for an individual patient. Nomograms are user-friendly and have been accepted as reliable tools for improving the accuracy of prognostic prediction in malignancies and for quantifying risk when making treatment decisions [10, 11]. A nomogram that combines CT imaging signs with hematological parameters is non-invasive, requiring neither mediastinoscopy nor percutaneous biopsy, and may compensate for the shortcomings of existing imaging methods in distinguishing benign from malignant lymph nodes.
Previous studies have identified several factors associated with lymph node metastasis in NSCLC [12–14]. Xia et al. reported that younger age increased the risk of lymph node involvement, and Ding et al. reported that tumor size, histologic differentiation, bronchial invasion and smoking history were correlated with lymph node positivity [12, 15]. Previous studies have also shown that the prognostic nutritional index (PNI) is an independent prognostic factor for survival in patients with lung cancer [16, 17], but no study has reported a correlation between preoperative PNI and LNM in patients with NSCLC. Therefore, this study aimed to investigate the value of PNI for predicting LNM in NSCLC and to construct and validate a practical and effective nomogram that combines patient characteristics with chest CT imaging signs and hematological features for the preoperative, non-invasive prediction of LNM in patients undergoing surgical resection for NSCLC.
Patients and methods
Patient selection
We retrospectively collected and analyzed preoperative clinical and CT data of patients with lung cancer who underwent lobectomy or segmentectomy with systematic lymph node dissection at the Second Affiliated Hospital and Yuying Children’s Hospital of Wenzhou Medical University between January 2015 and March 2021. All data were collected in our hospital, and all hematological indicators were obtained within one week before surgery. The hospital database was screened meticulously to identify potentially eligible patients who were: (I) aged over 18 years; (II) histopathologically confirmed to have NSCLC; (III) examined with high-resolution chest CT before surgical resection; and (IV) newly diagnosed rather than recurrent.
The exclusion criteria were: (I) multiple primary tumors; (II) no lymph nodes examined; (III) incomplete data; (IV) concurrent acute infectious disease, which can alter systemic inflammatory markers; (V) preoperative radiotherapy or chemotherapy; (VI) minimally invasive adenocarcinoma or adenocarcinoma in situ; (VII) distant metastasis; and (VIII) central lung cancer with obstructive pneumonia. Ultimately, 410 patients were identified. Patients enrolled from January 2015 to April 2019 formed the training cohort, and those enrolled from May 2019 to March 2021 formed the validation cohort. All procedures involving human participants were performed in accordance with the Declaration of Helsinki (as revised in 2013). The study was reviewed and approved by the Institutional Review Board (IRB) of the Second Affiliated Hospital and Yuying Children’s Hospital of Wenzhou Medical University (approval No. 2021-K-76-02), and individual consent for this retrospective analysis was waived.
Details of patient recruitment and selection are shown in Fig. 1.

Flowchart detailing the selection of the patients in this study.
All chest CT images were acquired with an Optima CT660 (64T) scanner (GE, USA). The scanning parameters were as follows: tube voltage, 120 kV; tube current, adjusted in real time to the patient’s body habitus but not exceeding 350 mAs; reconstruction method, filtered back-projection; slice thickness, 2.5 mm; rotation time, 0.6 s; and matrix size, 512×512.
Acquisition of chest CT imaging signs
The CT examination was performed within 30 days before surgery, and the assessed imaging features included lobulation, spiculation, pleural indentation, the vacuolation sign and tumor size. Spiculation was defined as extensions of the nodule or mass margin into the surrounding lung parenchyma, 1–2 mm wide and 2–5 mm long, that do not reach the pleural margin [18]. Lobulation was defined as a nodule or mass outline with multiple arc-shaped protrusions [19]. Pleural indentation was defined as a tent-like opacity between the nodule and the pleura, with its apex pointing to the nodule and its base on the pleura. The vacuolation sign refers to 1–2 mm low-density, air-containing areas within the nodule or mass [20]. One point was assigned for each of these signs.
Tumor size was measured on CT images. The cut-off value for tumor size was determined by receiver operating characteristic (ROC) curve analysis; the optimal cut-off was 2.5 cm. Tumor size was then converted into a binary variable: a tumor larger than 2.5 cm was coded as 1 (high risk) and contributed 1 point.
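The scoring rule described above can be written as a short function. This is a minimal Python sketch. The function and argument names are hypothetical, and it assumes that the four signs listed are the complete set of scored signs. It only restates the rule given in the text: one point per positive sign, plus one point for a tumor larger than 2.5 cm. The text later refers to high versus low imaging scores without stating a cut-off, so the sketch stops at the raw score.

```python
# Hypothetical helper: integrated CT imaging score as described in the text.
# ASSUMPTION: lobulation, spiculation, pleural indentation and vacuolation
# are the complete set of scored signs.
TUMOR_SIZE_CUTOFF_CM = 2.5  # ROC-derived cut-off reported in the study


def imaging_score(lobulation: bool, spiculation: bool,
                  pleural_indentation: bool, vacuolation: bool,
                  tumor_size_cm: float) -> int:
    """Return the integrated imaging score (0 to 5)."""
    # Each binary sign contributes 1 point when present.
    score = sum(int(sign) for sign in
                (lobulation, spiculation, pleural_indentation, vacuolation))
    # Tumor size is dichotomized: > 2.5 cm is coded as high risk (+1 point).
    if tumor_size_cm > TUMOR_SIZE_CUTOFF_CM:
        score += 1
    return score


# Example: a lobulated, spiculated 3.1 cm nodule without other signs.
print(imaging_score(True, True, False, False, 3.1))  # 3
```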
The CT imaging signs were reviewed independently by two radiologists. When their assessments differed, they discussed the case until they reached a consensus, which was recorded as the final result. Inter-observer agreement for the imaging features was assessed with the intraclass correlation coefficient (ICC), and a value > 0.75 was considered reproducible [21]. Imaging features were combined using an integration method in which each positive sign counted as one point. Table 1 shows the scoring of chest CT imaging signs.
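The agreement analysis can be reproduced with the sketch below, which computes an ICC from a subjects × raters table. The text does not say which ICC form was used. This sketch assumes ICC(2,1), which in Shrout and Fleiss's notation means two-way random effects, absolute agreement and a single rater. The example ratings are hypothetical.

```python
# Sketch of ICC(2,1): two-way random effects, absolute agreement, single rater.
# ASSUMPTION: the study does not state which ICC form it used.


def icc_2_1(ratings: list[list[float]]) -> float:
    """ratings[i][j] = score given to subject i by rater j."""
    n = len(ratings)          # number of subjects (nodules)
    k = len(ratings[0])       # number of raters (radiologists)
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n)


# Two radiologists scoring five nodules (hypothetical data).
scores = [[3, 3], [1, 1], [4, 4], [2, 3], [0, 0]]
print(round(icc_2_1(scores), 3))
```

Absolute agreement penalizes a systematic offset between raters. For example, if one reader always scores one point higher, this ICC falls below 1 even though the ranking is identical.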
Imaging features assessment diagnostic approach
For each enrolled patient, the following information was collected from the hospital information system: age, sex, body mass index (BMI), smoking history, pretreatment routine blood tests (including albumin, leukocyte count, lymphocyte count, monocyte count and platelet count) and carcinoembryonic antigen (CEA). The same variables were extracted for the validation cohort.
The indices were calculated using the equations for NLR, LMR, PLR, SII, SIRI, PNI and ALI described in a previous study [22]. The neutrophil-to-lymphocyte ratio (NLR) was defined as the neutrophil count divided by the lymphocyte count. The lymphocyte-to-monocyte ratio (LMR) was the lymphocyte count divided by the monocyte count. The platelet-to-lymphocyte ratio (PLR) was the platelet count divided by the lymphocyte count. The systemic immune inflammation index (SII) was calculated as platelet count × neutrophil count / lymphocyte count (counts in 10⁹/L). The systemic inflammatory response index (SIRI) was calculated as monocyte count × neutrophil count / lymphocyte count (counts in 10⁹/L). The prognostic nutritional index (PNI) was the sum of serum albumin (g/L) and 5 × lymphocyte count (10⁹/L). The advanced lung cancer inflammatory index (ALI) was calculated as BMI (kg/m²) × albumin (g/dL) / NLR.
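The index definitions above translate directly into code. This hypothetical Python helper assumes that cell counts are recorded in 10⁹/L and albumin in g/L. Because the text defines ALI with albumin in g/dL, albumin is divided by 10 for that index only. The example values are illustrative, not patient data.

```python
from dataclasses import dataclass


@dataclass
class BloodPanel:
    """Preoperative values; cell counts in 10^9/L (hypothetical record layout)."""
    neutrophils: float
    lymphocytes: float
    monocytes: float
    platelets: float
    albumin_g_per_l: float   # serum albumin, g/L
    bmi: float               # kg/m^2


def inflammation_nutrition_indices(p: BloodPanel) -> dict[str, float]:
    nlr = p.neutrophils / p.lymphocytes
    return {
        "NLR": nlr,
        "LMR": p.lymphocytes / p.monocytes,
        "PLR": p.platelets / p.lymphocytes,
        "SII": p.platelets * p.neutrophils / p.lymphocytes,
        "SIRI": p.monocytes * p.neutrophils / p.lymphocytes,
        # PNI: albumin in g/L plus 5 x lymphocyte count (10^9/L).
        "PNI": p.albumin_g_per_l + 5 * p.lymphocytes,
        # ALI uses albumin in g/dL, hence the division by 10.
        "ALI": p.bmi * (p.albumin_g_per_l / 10) / nlr,
    }


panel = BloodPanel(neutrophils=4.0, lymphocytes=2.0, monocytes=0.5,
                   platelets=200.0, albumin_g_per_l=42.0, bmi=22.0)
for name, value in inflammation_nutrition_indices(panel).items():
    print(f"{name}: {value:.2f}")
```

With these example values, SIRI is 1.0 and PNI is 52.0. Both fall on the high-risk side of the cut-offs reported in the Results (SIRI > 0.80, PNI ≤ 54.59).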
Statistical analysis
Statistical analysis was performed using SPSS 26.0 (IBM Corp., Armonk, NY, USA). Continuous variables were compared using the t test or Mann–Whitney U test, and categorical variables using the chi-square test or Fisher’s exact test. Cut-off values for the indices were obtained by ROC curve analysis, and these continuous variables were then converted into binary variables according to the cut-offs. Variables that were significant in univariate analysis were further analyzed by dichotomous logistic regression to identify independent risk factors for LNM in NSCLC in the training cohort. The nomogram for predicting LNM in NSCLC patients was constructed based on age at diagnosis, hematological features and CT imaging signs.
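The text does not specify how the "optimal" ROC cut-off was chosen. A common criterion is Youden's index (sensitivity + specificity - 1). The sketch below assumes that criterion and scans every observed value as a candidate cut-off. For variables where low values indicate risk, such as PNI or LMR here, the direction is flipped. The function name and example data are hypothetical.

```python
def youden_cutoff(values, labels, higher_is_positive=True):
    """Return (cutoff, J) maximizing Youden's J = sensitivity + specificity - 1.

    If higher_is_positive, values >= cutoff are called positive;
    otherwise values <= cutoff are called positive (e.g. low PNI).
    ASSUMPTION: Youden's index is the optimality criterion.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    # Flip the sign so the same ">=" rule covers both directions.
    signed = [v if higher_is_positive else -v for v in values]
    best_cut, best_j = None, -1.0
    for c in sorted(set(signed)):
        tp = sum(1 for s, y in zip(signed, labels) if s >= c and y == 1)
        tn = sum(1 for s, y in zip(signed, labels) if s < c and y == 0)
        j = tp / n_pos + tn / n_neg - 1
        if j > best_j:  # keep the first cut-off reaching the maximum
            best_cut, best_j = c, j
    return (best_cut if higher_is_positive else -best_cut), best_j


# Hypothetical marker values with LNM status (1 = metastasis).
print(youden_cutoff([1, 2, 3, 4, 5, 6], [0, 0, 0, 1, 1, 1]))
```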
The logistic regression-based nomogram was developed using the training cohort. Variables with two-tailed P < 0.05 were considered statistically significant and included in model construction. The risk score for LNM was determined from the nomogram. ROC curves were constructed to evaluate model performance in the training cohort and to verify it in the validation cohort. Calibration curves were drawn with bootstrapping (1,000 resamples) to evaluate the predictive accuracy of the nomogram, followed by the Hosmer–Lemeshow test (P > 0.05 indicating good fit). Decision curve analysis (DCA), a method for evaluating prediction models and diagnostic tests introduced by Vickers and Elkin in Medical Decision Making in 2006 [23], was used to evaluate the clinical utility of the model. These analyses were performed in R, version 4.0.3, mainly using the packages “rms”, “pROC”, “rmda” and “ResourceSelection”.
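The Hosmer–Lemeshow statistic compares observed and expected event counts across groups of predicted risk. The stdlib-only sketch below is illustrative, and it assumes equal-size groups formed after sorting by predicted probability. ResourceSelection's `hoslem.test` groups by quantiles of the fitted values, so results can differ slightly when predictions are tied. The p-value uses a chi-square distribution with g − 2 degrees of freedom, computed from the incomplete gamma series.

```python
import math


def chi2_sf(stat: float, df: int) -> float:
    """Upper-tail chi-square probability via the regularized lower
    incomplete gamma series P(a, x), with a = df/2 and x = stat/2."""
    a, x = df / 2.0, stat / 2.0
    if x <= 0:
        return 1.0
    term = total = 1.0 / a
    n = 0
    while term > total * 1e-15:
        n += 1
        term *= x / (a + n)
        total += term
    lower = math.exp(a * math.log(x) - x - math.lgamma(a)) * total
    return max(0.0, 1.0 - lower)


def hosmer_lemeshow(labels, probs, groups=10):
    """Return (statistic, df, p_value) for observed 0/1 labels and predicted
    probabilities, using equal-size groups of sorted predicted risk."""
    pairs = sorted(zip(probs, labels))
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        group = pairs[g * n // groups:(g + 1) * n // groups]
        if not group:
            continue
        expected = sum(p for p, _ in group)
        observed = sum(y for _, y in group)
        mean_p = expected / len(group)
        if 0.0 < mean_p < 1.0:  # skip degenerate groups to avoid division by zero
            stat += (observed - expected) ** 2 / (len(group) * mean_p * (1 - mean_p))
    df = groups - 2
    return stat, df, chi2_sf(stat, df)


# Hypothetical example with four risk groups (2 degrees of freedom).
labels = [0, 0, 0, 1, 1, 0, 1, 1]
probs = [0.1, 0.1, 0.3, 0.3, 0.6, 0.6, 0.9, 0.9]
print(hosmer_lemeshow(labels, probs, groups=4))
```

With the default of 10 groups there are 8 degrees of freedom. For example, `chi2_sf(14.667, 8)` is about 0.066, which agrees with the validation-cohort P value reported in the Results.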
Results
Imaging features, demographic and clinical characteristics
As shown in Fig. 1, 410 patients were enrolled in this study: 290 in the training cohort and 120 in the validation cohort. The characteristics of patients in the two cohorts are shown in Table 2. The lymph nodes of all 410 patients (222 men and 188 women) were examined by surgical pathology. For the CT imaging signs, inter-observer agreement exceeded 0.800, indicating high consistency between the two observers. The training and validation cohorts differed significantly in the lobulation sign (70 vs. 50, P < 0.001) and tumor size (3.00±1.57 cm vs. 2.55±1.59 cm, P = 0.009). No significant differences were observed between the two cohorts in the other variables (P values ranging from 0.082 to 0.957).
Demographic and clinical characteristics
CT, computed tomography; BMI, body mass index; CEA, Carcinoembryonic antigen.
In the training cohort, the optimal cut-off values for age, BMI, NLR, LMR, PLR, SII, SIRI, PNI and CEA determined by ROC curves were 65 years, 20.42 kg/m², 2.06, 4.55, 128.32, 455.63, 0.80, 54.59 and 3.26 ng/mL, respectively. All variables were then dichotomized and analyzed by univariate and multivariate regression analysis. Table 3 shows the univariate analysis results. In univariate analysis, LNM was more common in men, smokers, and patients with higher PLR, NLR, SIRI, SII and CEA and lower LMR, PNI and ALI. Tumors with higher integrated imaging scores on CT were associated with a higher likelihood of lymph node metastasis, and younger patients were more likely to have lymph node metastasis. Table 4 shows the results of dichotomous logistic regression. Age [odds ratio (OR) 2.348, 95% CI: 1.351–4.079, P = 0.002], imaging score (OR 3.949, 95% CI: 2.255–6.915, P < 0.001), SIRI (OR 2.585, 95% CI: 1.482–4.509, P = 0.001), PNI (OR 2.682, 95% CI: 1.241–5.797, P = 0.012) and CEA (OR 3.364, 95% CI: 1.948–5.812, P < 0.001) were identified as independent risk factors for LNM in NSCLC.
Univariate analysis results in the training cohort
LNM, lymph node metastasis; BMI, body mass index; PLR, platelet-to-lymphocyte ratio; NLR, neutrophil-to-lymphocyte ratio; LMR, lymphocyte-to-monocyte ratio; PNI, prognostic nutritional index; SIRI, systemic inflammatory response index; ALI, advanced lung cancer inflammatory index; SII, systemic immune inflammation index; CEA, Carcinoembryonic antigen.
Multivariate dichotomous logistic regression results in the training cohort for predicting lymph node metastasis
CI, confidence interval; SIRI, systemic inflammatory response index; PNI, prognostic nutritional index; CEA, Carcinoembryonic antigen.
Dichotomous logistic regression identified five independent predictors of lymph node metastasis: imaging score, age, SIRI, PNI and CEA (Table 4). No other variables remained significant. The formula for predicting LNM in NSCLC was $P(\mathrm{LNM}) = \frac{e^{x}}{1 + e^{x}}$, where $x = 1.373 \times \text{score} + 0.853 \times \text{age} + 0.987 \times \text{PNI} + 0.950 \times \text{SIRI} + 1.213 \times \text{CEA} - 3.279$. Younger age, low PNI, high SIRI and high CEA were each coded as 1 (high risk) and otherwise 0. A nomogram for predicting the probability of LNM in NSCLC patients was developed from the dichotomous logistic regression model (Fig. 2). It showed that the imaging score had the greatest impact on the prediction of LNM, followed by CEA, PNI, age and SIRI. To use the nomogram, the points for each variable are summed and the total is located on the total-points axis; a vertical line drawn from there down to the outcome axis gives the estimated LNM risk. The model was first validated internally in the training cohort using ROC and calibration curves. The AUC for predicting LNM was 0.828 (95% CI: 0.778–0.877) (Fig. 3A).
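The reported formula can be evaluated directly. This sketch hard-codes the coefficients and intercept given above. The `code_flags` helper is hypothetical and dichotomizes raw values using the cut-offs reported elsewhere in the text (age 65 years, PNI ≤ 54.585, SIRI 0.80, CEA 3.26 ng/mL). Two points are assumptions: whether each boundary value counts as high or low risk for age, SIRI and CEA, and that the imaging score enters as a 0/1 high-risk flag. Exponentiating each coefficient approximately reproduces the odds ratios in Table 4.

```python
import math

# Reported coefficients; every predictor is coded 1 = high risk, 0 otherwise.
COEFFICIENTS = {
    "score": 1.373,  # high integrated CT imaging score (ASSUMPTION: 0/1 flag)
    "age": 0.853,    # younger age
    "pni": 0.987,    # low PNI
    "siri": 0.950,   # high SIRI
    "cea": 1.213,    # high CEA
}
INTERCEPT = -3.279


def code_flags(age_years: float, pni: float, siri: float,
               cea_ng_ml: float) -> dict[str, int]:
    """Hypothetical dichotomization with the reported cut-offs.
    ASSUMPTION: boundary handling for age, SIRI and CEA is not stated."""
    return {
        "age": int(age_years < 65),
        "pni": int(pni <= 54.585),
        "siri": int(siri > 0.80),
        "cea": int(cea_ng_ml > 3.26),
    }


def lnm_probability(score: int, age: int, pni: int, siri: int, cea: int) -> float:
    """P(LNM) = e^x / (1 + e^x) for the linear predictor x."""
    x = (INTERCEPT + COEFFICIENTS["score"] * score + COEFFICIENTS["age"] * age
         + COEFFICIENTS["pni"] * pni + COEFFICIENTS["siri"] * siri
         + COEFFICIENTS["cea"] * cea)
    return math.exp(x) / (1.0 + math.exp(x))


# Odds ratios implied by the coefficients (compare with Table 4).
for name, beta in COEFFICIENTS.items():
    print(f"{name}: OR = {math.exp(beta):.3f}")

flags = code_flags(age_years=58, pni=50.2, siri=1.1, cea_ng_ml=5.0)
print(f"P(LNM) = {lnm_probability(score=1, **flags):.3f}")
```

Because every predictor is binary, the model can return only 32 distinct probabilities. They range from about 0.036 (no risk factors) to about 0.89 (all five present).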

The nomogram for predicting the risk of LNM in patients with NSCLC. LNM, lymph node metastasis; NSCLC, non-small cell lung cancer.

ROC curves of the model for predicting lymph node metastasis in (A) the training and (B) the validation cohorts. (A) AUC was 0.828 (95% CI: 0.778–0.877) in the training cohort. (B) AUC was 0.816 (95% CI: 0.737–0.895) in the validation cohort. ROC, receiver operating characteristic; AUC, area under the curve.
We further validated the model in the validation cohort using the same method. In the validation cohort, the model had an AUC of 0.816 (95% CI: 0.737–0.895) (Fig. 3B), indicating good discrimination. Table 5 summarizes the predictive performance of the model in the training and validation cohorts. The nomogram was also well calibrated in both the training and validation cohorts (Fig. 4A, B), with mean absolute errors of 0.012 and 0.032, respectively. The Hosmer–Lemeshow test gave a chi-square value of 5.4396 (P = 0.7037) in the training cohort and 14.667 (P = 0.066) in the validation cohort, suggesting that the logistic regression model fit well. The DCA for the nomogram is presented in Fig. 5A and B; the nomogram provided a higher net benefit than the treat-all or treat-none strategies.
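The discrimination and threshold metrics in Table 5 can be illustrated with a small stdlib sketch. The empirical AUC equals the Mann–Whitney probability that a randomly chosen LNM-positive patient receives a higher predicted risk than a randomly chosen LNM-negative patient. The example data are hypothetical, and confidence intervals are not computed (pROC, for instance, uses DeLong's method by default). The text does not state the operating point used for Table 5, so the threshold passed to `sensitivity_specificity` is an assumption.

```python
def auc_mann_whitney(scores, labels):
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(scores, labels, threshold):
    """Classify score >= threshold as predicted LNM (threshold is an assumption)."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    tn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 0)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    return tp / (tp + fn), tn / (tn + fp)


probs = [0.10, 0.40, 0.35, 0.80]
truth = [0, 0, 1, 1]
print(auc_mann_whitney(probs, truth))  # 0.75
print(sensitivity_specificity(probs, truth, 0.5))
```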
Area under the ROC curve (AUC), P value (P), 95% confidence interval of the AUC, sensitivity and specificity

Calibration curves of the nomogram for predicting LNM in NSCLC in (A) the training and (B) the validation cohorts. The x-axis represents the predicted risk of LNM, and the y-axis represents the observed (pathologically diagnosed) LNM. The diagonal dotted line represents perfect prediction by an ideal model. The solid line represents the performance of the nomogram; a closer fit to the diagonal dotted line indicates better prediction.

DCA for the model in (A) the training and (B) the validation cohorts. The y-axis shows the net benefit. It is calculated by subtracting the harms (false-positive findings) from the benefits (true-positive findings), with the harms weighted by the relative harm of a false-positive LNM diagnosis compared with that of an undetected LNM.
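The net benefit plotted in a decision curve follows Vickers and Elkin [23]. For a threshold probability p_t, net benefit = TP/n − FP/n × p_t/(1 − p_t), where the weight p_t/(1 − p_t) expresses how a false positive is valued relative to a true positive. The sketch below computes this quantity for a model and for the treat-all reference line; treat-none is always 0. It assumes that patients with predicted risk ≥ p_t are treated as LNM-positive, and the data are hypothetical.

```python
def net_benefit(probs, labels, threshold):
    """Net benefit of treating patients whose predicted risk >= threshold:
    TP/n - FP/n * threshold / (1 - threshold)."""
    n = len(labels)
    tp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


def net_benefit_treat_all(labels, threshold):
    """Reference line: every patient is treated as LNM-positive."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Hypothetical predictions; the treat-none line is 0 at every threshold.
probs = [0.9, 0.8, 0.3, 0.1]
truth = [1, 0, 1, 0]
for pt in (0.1, 0.2, 0.5):
    print(pt, round(net_benefit(probs, truth, pt), 4),
          round(net_benefit_treat_all(truth, pt), 4))
```

A model is clinically useful at a given threshold when its curve lies above both the treat-all line and zero, which is how the 0–0.8 range in the abstract is read.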
Discussion
Owing to the rapid development of screening approaches such as CT and the wide adoption of annual health checks, an increasing number of lung cancer patients are detected at an early stage [24]. Surgery is the standard of care for patients with early-stage NSCLC without distant metastasis, and lobectomy with systematic lymph node dissection is the most common procedure [25, 26]. Pulmonary and systematic lymph node dissection is widely performed in patients with suspected LNM and is believed to be beneficial, a view supported by abundant evidence [27, 28]. However, it is not uncommon for postoperative pathological examination to reveal no lymph node metastasis. In the era of minimally invasive surgery, there is a growing view that selective lymph node dissection may be a better alternative to systematic lymph node dissection. It can reduce operative time, blood loss and hospital stay, and it may lead to a better quality of life afterwards [29–31]. Moreover, compared with patients who underwent systematic lymph node dissection, those who received selective lymph node dissection had a lower incidence of perioperative complications but similar survival [32, 33]. Accurate preoperative assessment of the risk of lymph node involvement, to determine the extent of intraoperative lymph node dissection, is therefore essential.
In this study, NSCLC patients were retrospectively reviewed to develop and validate a nomogram for evaluating the risk of LNM. We established a nomogram incorporating SIRI and PNI, which showed good performance for predicting LNM according to the ROC curves, calibration curves and validation. Our nomogram may therefore be a useful tool to support surgeons’ decisions about the surgical procedure.
The relationship between nutritional status and cancer prognosis has gained attention from researchers over the last decade. Studies have shown that malnutrition often leads to poor prognosis and LNM in malignant tumors [34–36]. The PNI, first reported by Onodera in Japan, reflects nutritional status by combining the serum albumin level and lymphocyte count [37]. It is increasingly used to evaluate the immuno-nutritional status of hospitalized patients. No comprehensive study has yet examined the relationship between PNI and LNM in NSCLC. However, meta-analyses in patients with intraductal papillary mucinous neoplasm, gastric cancer and esophageal cancer have explored the prognostic role of PNI and its relationship with clinicopathological characteristics [34, 39]. In those analyses, a low PNI was significantly associated with the presence of LNM. Moreover, LNM may lower the serum albumin level and hence the PNI, as observed in gastric cancer [40]. Our results indicated that low PNI (≤54.585) was an independent risk factor for LNM, and the nomogram incorporating PNI showed good predictive performance. Taken together, we believe that the PNI-based nomogram can effectively predict LNM in NSCLC.
There is growing evidence that local immune responses and systemic inflammation promote tumor growth, progression and metastasis [41]. Recent studies have revealed that neutrophils in the peripheral blood or tumor microenvironment can produce angiogenic factors that stimulate tumor angiogenesis and metastasis [42, 43]. Monocytes are a significant component of the inflammatory infiltrate in neoplastic tissues. They produce a number of potent angiogenic and lymphangiogenic growth factors, cytokines and proteases that facilitate tumor cell invasion and metastasis [44–46]. Monocytes also express VEGF-C and VEGF-D as well as VEGF receptor-3 (VEGFR-3), which are implicated in the formation of lymphatic vessels and lymphatic metastasis [44]. L-selectin on neutrophils and monocytes may also facilitate metastasis [47].
Conversely, low lymphocyte counts weaken anticancer defenses and immune surveillance, making tumor cells more prone to metastasis [48]. In addition, L-selectin expression on tumor cells can promote metastasis to lymph nodes [49]. These findings suggest that neutrophils, monocytes and lymphocytes are all related to lymph node metastasis. SIRI is derived from the neutrophil, lymphocyte and monocyte counts measured routinely in clinical practice. It is an objective indicator of the balance between inflammation and immune response in the body, reflecting the complex interactions and potential synergy among neutrophils, monocytes and lymphocytes in the tumor microenvironment. SIRI may therefore be related to lymph node metastasis. More importantly, SIRI is a convenient, low-cost and effective inflammatory indicator for predicting LNM. This study also indicated that a high SIRI was strongly associated with LNM in patients with NSCLC: patients with high SIRI (OR 2.514, 95% CI: 1.446–4.380, P = 0.001) were more likely to have LNM than those with low SIRI. Cancer cachexia is reportedly driven by a sustained inflammatory response [50]. We therefore believe that inflammation and malnutrition jointly promote tumor progression, and our nomogram accordingly includes both nutritional and inflammatory indices.
CEA is a well-established tumor marker that indicates the occurrence and progression of lung cancer and is also directly related to tumor infiltration and metastasis. Its use for predicting LNM in patients with NSCLC has been widely studied [51–53]. Numerous studies have shown that CEA reflects the biological activities of tumor cells, such as proliferation, infiltration, invasion and migration [54–56]. In our study, preoperative serum CEA level (OR 3.364, 95% CI: 1.948–5.812) was also an independent risk factor for LNM. Thus, serum CEA level may be helpful for predicting LNM and should be considered before surgery.
Moreover, we found that imaging features, including spiculation, lobulation, the vacuolation sign and tumor size, were risk factors associated with LNM. Cong et al. and Yang et al. demonstrated that tumors with spiculation were more likely to have LNM [57, 58]. In our study, we assessed the value of these individual signs for LNM using a unified scoring method in which each sign counted as one point. Tumor size, measured on CT, was converted into a binary variable, with 1 point added for tumors larger than 2.5 cm. We analyzed the imaging scores related to LNM and sought to develop and validate a cost-efficient nomogram based on SIRI and PNI for evaluating the risk of LNM in patients with NSCLC. High imaging scores were strongly associated with LNM in NSCLC: patients with high imaging scores (OR 3.949, 95% CI: 2.255–6.915) were more likely to have LNM than those with low scores.
Some previous studies developed radiomics-based nomograms for predicting LNM [59, 60]. However, extracting radiomics features requires specialized techniques, and radiomics models may be difficult to adopt in clinical practice. In contrast, our nomogram uses routinely recorded imaging features and may therefore be more feasible in clinical practice. It also includes a commonly available factor, age, together with CT characteristics and serum indicators, to optimize overall predictive ability.
This study has several limitations. First, it was a non-randomized, retrospective study; therefore, pretreatment blood tests could not be performed at a defined baseline time point. Second, it was conducted at a single center, without external or multicenter validation. Third, the relatively small number of patients may have affected the accuracy of the model. Furthermore, our study did not include clinicopathological characteristics. Because of the inclusion and exclusion criteria and the surgical practice for central lung cancer in our hospital, only a small number of patients with central lung cancer were included. Future prospective, multicenter studies with larger samples that incorporate clinicopathological characteristics are therefore warranted.
In conclusion, we constructed a simple, practical and reliable nomogram that integrates CT imaging signs and hematological parameters to calculate the risk of LNM in NSCLC. It showed good predictive ability and clinical applicability and may therefore help clinicians in decision-making.
Declaration of competing interest
All authors have read and approved the submitted manuscript. There are no conflicts of interest. The manuscript has not been submitted or published elsewhere in whole or in part. All relevant ethical safeguards have been met.
