Abstract
Objective
This study aimed to establish a diagnostic nomogram for identifying synchronous lung metastasis at initial diagnosis in osteosarcoma patients, and to descriptively analyze overall survival patterns in patient subgroups.
Methods
A total of 1149 eligible osteosarcoma cases diagnosed between 2010 and 2015 were retrieved from the Surveillance, Epidemiology, and End Results (SEER) database. Candidate predictors were screened by univariate logistic regression (p < 0.10) and entered into backward stepwise multivariate logistic regression (retention p < 0.05) to construct a diagnostic nomogram. Discrimination was evaluated by the area under the receiver operating characteristic curve (AUC), calibration by calibration plot, and clinical utility by decision curve analysis (DCA) with bootstrapping validation. Secondary survival analysis used Kaplan-Meier curves and log-rank tests to describe overall survival patterns.
Results
Synchronous lung metastases were found in 213 patients (18.5%) at diagnosis. The nomogram incorporated age, race, sex, lymph node stage, grade, tumour size, tumour site, and histology. It achieved an AUC of 0.754 (95% confidence interval 0.719–0.799) and a bootstrap-calibrated C-index of 0.745. Calibration was satisfactory, and DCA demonstrated superior net benefit compared with traditional staging approaches, including AJCC stage, grade alone, size alone, and N stage alone. In exploratory survival analysis, patients with synchronous metastasis had significantly poorer overall survival, and primary tumour surgery was associated with prolonged median survival.
Conclusion
The diagnostic nomogram, incorporating tumour grade, site, size, and lymph node stage, provides a reliable tool for estimating the probability of synchronous lung metastasis at initial diagnosis, which may assist clinicians in guiding initial staging intensity and treatment planning. However, owing to the cross-sectional design of the SEER database, this model is not designed for predicting future metastatic progression.
Introduction
Although osteosarcoma is a rare disease, it is the most common primary bone malignancy of childhood and adolescence.1–4 Its annual incidence is estimated at 0.2–3 cases per 100,000 people.5 Approximately 20% of patients have synchronous lung metastases at the time of diagnosis.6,7 Synchronous metastasis is associated with a high mortality rate, and the presenting symptoms of osteosarcoma are often misinterpreted as trauma or growing pains, leading to neglect at onset.8 Furthermore, approximately 50% of patients who initially present without lung metastases eventually develop them during chemotherapy. Despite the development of numerous new therapies for osteosarcoma, 30%–40% of patients experience recurrence, with long-term post-relapse survival of <20%.9,10 Early diagnosis and treatment of osteosarcoma are therefore beneficial, and identifying synchronous metastasis at the time of diagnosis is crucial.
The American Joint Committee on Cancer (AJCC) and Enneking classification systems are widely used to stage osteosarcoma according to the presence of metastasis and histological grade at diagnosis.11,12 In addition to the factors adopted in these classifications, clinical factors such as age,13 tumour site,11 and histological type14 have been identified as diagnostic indicators of synchronous metastasis. No single factor can precisely identify synchronous metastasis; accurate diagnostic prediction at presentation requires integrating multiple predictors in a statistical prediction model.15–17 Nomograms complement conventional classification systems by supporting diagnostic decision-making at initial presentation for different tumours.18–20 Several nomograms have been developed for soft tissue sarcoma21,22 and osteosarcoma.23–25
The primary objective of this study was to develop a diagnostic nomogram for identifying synchronous lung metastasis at initial diagnosis in osteosarcoma patients using baseline clinical features available at presentation. The secondary objective was to descriptively analyze overall survival patterns in patient subgroups with and without synchronous metastasis, entirely separate from the diagnostic prediction model. Although survival analysis is methodologically independent from the diagnostic model, descriptive characterization of survival patterns provides essential clinical context for interpreting the diagnostic tool’s utility in practice. This study aims to provide a tool to assist clinicians in determining the necessity of comprehensive staging workup at initial presentation (e.g., chest CT evaluation), not to predict future metastatic progression. Therefore, the diagnostic factors for synchronous lung metastasis in osteosarcoma were examined in a large population-based cohort, and survival patterns were characterized descriptively.
Materials and methods
Patients
The Surveillance, Epidemiology, and End Results (SEER) database, the largest oncology patient database in the USA, was used to collect data for this retrospective study. Data were extracted using SEER*Stat software (version 8.3.5) from the official website (https://seer.cancer.gov).25 Patients diagnosed with osteosarcoma between 2010 and 2015 were identified using the following ICD-O-3 histology codes: 9180/3 (osteosarcoma, NOS), 9181/3 (chondroblastic osteosarcoma), 9182/3 (fibroblastic osteosarcoma), 9183/3 (telangiectatic osteosarcoma), 9184/3 (osteoblastoma-like), 9185/3 (small cell osteosarcoma), 9186/3 (central osteosarcoma), 9192/3 (parosteal osteosarcoma), 9193/3 (periosteal osteosarcoma), 9194/3 (high-grade surface osteosarcoma), and 9195/3 (intramedullary well-differentiated osteosarcoma).
The following variables were collected for each patient: sex (male, female); age (≤24, 25–59, ≥60 years); race [white, black, Asian or Pacific Islander (API), American Indian/Alaska Native (AI)]; metastasis status; tumour size (≤8 cm, >8 cm); tumour site (axial skeleton: spine, ribs, and pelvis; extremities: long and short bones of the four limbs); histology (osteosarcoma NOS [ICD-O-3 codes 9180–9186], parosteal osteosarcoma [ICD-O-3 code 9192]); regional lymph node stage (N0, N1); and tumour grade, categorised as low grade (well or moderately differentiated; ICD-O-3 Grades 1/2) or high grade (poorly differentiated or undifferentiated; ICD-O-3 Grades 3/4). SEER records metastasis status only at diagnosis, which precludes analysis of metachronous metastases that develop during follow-up. Only patients with complete data were included; cases with missing metastasis status, tumour size, grade, or N stage were excluded. Age was categorised into ≤24, 25–59, and ≥60 years on the basis of clinical relevance to osteosarcoma epidemiology and for consistency with prior SEER-based osteosarcoma studies. Tumour size was dichotomised at 8 cm on the basis of clinical convention, prior literature, and the ROC-derived optimal threshold identified by the Youden index.
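As an illustration of how a ROC-derived threshold can be obtained with the Youden index (J = sensitivity + specificity − 1), the following minimal Python sketch scans candidate cutoffs and keeps the one that maximises J. The function name and the tumour sizes and labels are hypothetical and are not drawn from the SEER cohort; the study's own threshold was derived in the authors' statistical software.

```python
def youden_cutoff(values, labels):
    """Return (cutoff, J) maximising J = sensitivity + specificity - 1.

    A case is classified positive when its value is strictly above the cutoff.
    labels: 1 = synchronous metastasis present, 0 = absent.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    best_cutoff, best_j = None, -1.0
    for cutoff in sorted(set(values)):
        tp = sum(1 for v, y in zip(values, labels) if v > cutoff and y == 1)
        tn = sum(1 for v, y in zip(values, labels) if v <= cutoff and y == 0)
        j = tp / positives + tn / negatives - 1.0
        if j > best_j:
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j


# Hypothetical tumour sizes (cm) and metastasis labels, for illustration only
sizes = [3.0, 5.0, 6.5, 8.0, 9.0, 10.5, 12.0, 15.0]
labels = [0, 0, 0, 0, 1, 0, 1, 1]
cutoff, j = youden_cutoff(sizes, labels)
print(cutoff, round(j, 3))
```

In this toy example the best split is "size > 8.0 cm", which gives a sensitivity of 1.0 and a specificity of 0.8.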
Construction of the nomogram
Univariate (unadjusted) logistic regression was used to evaluate sex, age, race, tumour size, tumour site, grade, histology, and N stage as factors associated with synchronous distant metastasis (DM) at diagnosis in patients with osteosarcoma. Logistic regression was selected as the appropriate statistical method for this diagnostic aim, given the cross-sectional nature of SEER metastasis data. Prognostic prediction of future metastatic progression was not attempted because the database does not record time to metastasis. Variables with p < 0.10 in univariate analysis, a deliberately liberal threshold, were entered into a multivariate logistic regression model using backward stepwise elimination with a retention criterion of p < 0.05. The final independent predictors were used to construct the nomogram, which visualises the prediction model on the basis of the fitted regression coefficients.
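The way a nomogram visualises regression coefficients can be sketched as follows: each coefficient is rescaled so that the strongest predictor is worth 100 points, and the total points are mapped back to a probability through the logistic function. The intercept, coefficients, and variable names below are hypothetical placeholders, not the fitted values of this study, and the sketch assumes binary predictors with positive coefficients.

```python
import math

# Hypothetical log-odds coefficients (NOT the study's fitted values).
# Each predictor is binary; the reference level scores 0 points.
INTERCEPT = -3.0
COEFS = {"size_gt_8cm": 0.9, "high_grade": 1.2, "axial_site": 0.4, "n1_stage": 1.5}


def points_table(coefs):
    """Rescale coefficients so the largest one corresponds to 100 points."""
    largest = max(abs(b) for b in coefs.values())
    return {name: 100.0 * b / largest for name, b in coefs.items()}


def total_points(patient, table):
    """Sum the points of every predictor that is present in the patient."""
    return sum(table[name] for name, present in patient.items() if present)


def probability(points, coefs, intercept):
    """Convert total points back to log-odds, then to a probability."""
    largest = max(abs(b) for b in coefs.values())
    log_odds = intercept + points * largest / 100.0
    return 1.0 / (1.0 + math.exp(-log_odds))


table = points_table(COEFS)
patient = {"size_gt_8cm": True, "high_grade": True,
           "axial_site": False, "n1_stage": False}
pts = total_points(patient, table)
prob = probability(pts, COEFS, INTERCEPT)
print(round(pts, 1), round(prob, 3))
```

Because the point scale is only a linear rescaling of the log-odds, reading a probability from the nomogram axis is equivalent to evaluating the logistic model directly.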
Validation of the nomogram
The area under the receiver operating characteristic curve (AUC) was used to evaluate the nomogram's discriminative ability for synchronous metastasis status at diagnosis and was reported as the primary discrimination metric for this diagnostic model. The concordance index (C-index) was reported secondarily for completeness but should be interpreted in the diagnostic rather than prognostic context. Bootstrap validation with 1,000 resamples was performed to calculate the corrected C-index, and the nomogram's calibration was assessed with a calibration curve based on the same number of resamples. The clinical utility of the nomogram was assessed by decision curve analysis (DCA) and compared with four traditional staging approaches: AJCC stage (I/II vs. III/IV), tumour grade alone (I–II vs. III–IV), tumour size alone (≤8 cm vs. >8 cm), and N stage alone (N0 vs. N1). For each traditional approach, the predicted probability of synchronous metastasis was calculated from the observed prevalence in each category, and DCA was performed to compare net clinical benefit across threshold probabilities. R software (version 3.5.1) and SPSS 23.0 (IBM Corporation, Armonk, NY, USA) were used for data analysis. A two-tailed p value < 0.05 was considered statistically significant.
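For readers less familiar with the discrimination metric, the AUC can be understood as the proportion of (metastatic, non-metastatic) patient pairs in which the metastatic patient receives the higher predicted probability, with ties counted as one half. For a binary outcome this pairwise concordance is the same quantity as the C-index. The short Python sketch below computes it directly; the predicted probabilities and labels are hypothetical.

```python
def auc(scores, labels):
    """AUC as pairwise concordance between positive and negative cases.

    Each (positive, negative) pair scores 1 if the positive case has the
    higher predicted probability, 0.5 if tied, and 0 otherwise.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    concordant = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                concordant += 1.0
            elif p == n:
                concordant += 0.5
    return concordant / (len(pos) * len(neg))


# Hypothetical predicted probabilities and observed metastasis status
scores = [0.10, 0.20, 0.35, 0.40, 0.60, 0.80]
labels = [0, 0, 1, 0, 1, 1]
value = auc(scores, labels)
print(round(value, 3))
```

Here 8 of the 9 possible pairs are ranked correctly, so the AUC is 8/9.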
Secondary survival analysis (exploratory, descriptive only)
Distinct from the primary diagnostic prediction model, the following survival analyses were performed as exploratory secondary analyses to describe baseline survival patterns in the study cohort. These analyses do not establish prognostic prediction models, given the retrospective cross-sectional design and the absence of time-to-metastasis data. Overall survival (OS) was regarded as a secondary descriptive endpoint and was defined as the duration between the initial diagnosis of osteosarcoma and cancer-related death. Survival was estimated using Kaplan–Meier (K–M) curves, and differences between groups were compared using log-rank tests.
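The Kaplan–Meier estimate used for these descriptive curves multiplies, at each observed death time, the running survival probability by the fraction of at-risk patients who survive that time; censored patients leave the risk set without contributing a death. A minimal Python sketch with hypothetical follow-up data (not SEER records) is shown below.

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct time with at least one death.

    events[i] is 1 if death was observed at times[i], 0 if censored.
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        leaving = sum(1 for ti in times if ti == t)  # deaths plus censored at t
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Hypothetical follow-up times in months; 1 = death, 0 = censored
times = [6, 12, 12, 20, 30, 42]
events = [1, 1, 0, 1, 0, 1]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(t, round(s, 3))
```

A log-rank test would then compare the observed and expected numbers of deaths between two such curves at each event time; that step is omitted here for brevity.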
Results
Patient features and univariate logistic regression
Baseline characteristics of 1149 osteosarcoma patients stratified by synchronous lung metastasis status at diagnosis.
Note: OR = odds ratio; CI = confidence interval.
Multivariate analysis and nomogram
Predictive logistic multivariate regression model parameters.
Note: β, regression coefficient; CI, confidence interval.

Nomogram for osteosarcoma patients to predict the risk of metastasis.
Validation and performance of the prediction nomogram
For the diagnostic prediction of synchronous metastasis status at diagnosis, our nomogram had a C-index of 0.759 (95% CI: 0.719–0.799), which was corrected to 0.745 by bootstrap validation. Additionally, the ROC curve (Figure 2(a)) yielded an AUC of 0.754 for discriminating synchronous metastasis status at diagnosis, demonstrating favourable discriminative ability. The calibration curve showed close agreement between the nomogram-predicted and observed probabilities of synchronous metastasis at diagnosis among osteosarcoma cases (Figure 2(b)). DCA is a relatively recent approach for evaluating the clinical utility of prediction models. The DCA demonstrated a satisfactory net benefit of our nomogram across a wide range of threshold probabilities. Our nomogram exhibited better net clinical benefit in the diagnostic identification of synchronous metastasis among osteosarcoma cases than the four traditional staging approaches (AJCC stage, tumour grade alone, tumour size alone, and N stage alone) (Figure 2(c)). Specifically, the nomogram demonstrated superior net benefit across threshold probabilities of 10%–60%, whereas individual staging variables showed limited or no net benefit in this range.
The performance evaluation of the nomogram in predicting lung metastasis of osteosarcoma. (a) ROC curve demonstrating the discriminative ability of the nomogram. (b) Calibration curve comparing the predicted and observed metastasis probabilities. (c) Decision curve analysis (DCA) showing the clinical net benefit of the nomogram compared to conventional prediction methods.
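The net benefit plotted in a decision curve is conventionally computed, at a threshold probability pt, as TP/n − (FP/n) × pt/(1 − pt), where TP and FP are the numbers of true and false positives when every patient with a predicted probability of at least pt is classified as positive. The Python sketch below evaluates this for a model and for the "treat all" reference strategy; the predicted probabilities and labels are hypothetical and do not reproduce the curves in Figure 2(c).

```python
def net_benefit(probs, labels, threshold):
    """Net benefit of classifying patients with prob >= threshold as positive."""
    n = len(labels)
    tp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1.0 - threshold)


def net_benefit_treat_all(labels, threshold):
    """Net benefit of the reference strategy that treats every patient as positive."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Hypothetical predicted probabilities and observed metastasis status
probs = [0.05, 0.10, 0.15, 0.30, 0.45, 0.60, 0.70, 0.90]
labels = [0, 0, 0, 1, 0, 1, 1, 1]
nb_model = net_benefit(probs, labels, 0.20)
nb_all = net_benefit_treat_all(labels, 0.20)
print(round(nb_model, 5), round(nb_all, 5))
```

Repeating the calculation over a grid of thresholds (for example 10% to 60%) and plotting the results gives the decision curve; a model is useful at a given threshold when its net benefit exceeds both the "treat all" value and zero, the net benefit of treating no one.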
Survival analysis for metastases
As an exploratory secondary analysis distinct from the primary diagnostic model, K–M analysis showed decreased OS in patients with synchronous metastasis at diagnosis (Figure 3(a); p < 0.001), older patients (Figure 3(b); p < 0.001), those with axial tumours (Figure 3(c); p < 0.001), and those with high-grade osteosarcoma (Figure 3(d); p < 0.001). Furthermore, OS was significantly longer in patients who received surgery at the primary site than in those who did not (Figure 3(e); p < 0.001). However, this observation must be interpreted with extreme caution because of inherent treatment-selection bias: patients without detectable metastasis at diagnosis are more likely to undergo definitive surgery, while those with extensive synchronous metastasis may be deemed unresectable. Causal inference about the survival benefit of surgery cannot be drawn from this retrospective analysis.
Kaplan–Meier survival curves of osteosarcoma for clinical risk factors. (a) Metastasis, (b) Age, (c) Tumour site, (d) Grade, (e) Surgery.
Discussion
This study developed and internally validated a diagnostic nomogram for synchronous lung metastasis at initial diagnosis in osteosarcoma using a population-based SEER cohort of 1149 patients. The overall prevalence of synchronous metastasis was 18.5%, a rate lower than that reported in a single-centre study (29.5%),26 likely reflecting differences in referral patterns and disease severity between population-based and hospital-based cohorts. The model incorporated age, race, sex, tumour size, tumour site, histology, grade, and N stage, and demonstrated moderate discriminative ability with an AUC of 0.754 (95% CI 0.719–0.799) and a bootstrap-calibrated C-index of 0.745. Calibration was satisfactory, and decision curve analysis indicated consistent net clinical benefit across threshold probabilities of 10%–60%, outperforming individual traditional staging variables including AJCC stage, tumour grade alone, tumour size alone, and N stage alone. However, the moderate AUC suggests that critical predictive information is not captured by SEER variables; notably, serum alkaline phosphatase, MRI-defined soft tissue extension, response to neoadjuvant chemotherapy, and detailed lung nodule characteristics (number, size, laterality) are all absent from the database yet likely contribute substantially to metastasis risk. Future models incorporating such variables may achieve substantially higher discriminative performance.
Our findings align with and extend the existing literature on clinical predictors of synchronous metastasis in osteosarcoma. Advanced age (≥60 years) was associated with a markedly increased synchronous metastatic rate, consistent with the distinct biological behaviour of osteosarcoma in older adults, in whom the disease often presents as a secondary malignancy with more aggressive features. Neither race nor sex was significantly associated with synchronous metastasis prevalence in our cohort, a finding that concords with prior SEER-based studies.27,28 Some single-centre series have reported higher prevalence in males and in tumours of the tibia and femur;29 these discrepancies likely reflect the fact that these are the most common primary sites rather than a true sex- or site-specific biological predisposition. Large tumour size and high tumour grade each independently predicted synchronous metastasis in our model, corroborating evidence from Munajat et al.30 and Zheng et al.,31 and consistent with the biological paradigm that larger, high-grade tumours have undergone more cell divisions and acquired a greater propensity for haematogenous dissemination.32,33 Tumour site and N stage further refined risk stratification: axial tumours carried a relatively higher probability of synchronous metastasis despite their lower absolute frequency, likely attributable to delayed detection due to deep anatomical location and proximity to vital vascular structures facilitating early haematogenous spread. The survival implications of these diagnostic factors were reflected in descriptive analyses: patients with synchronous metastasis experienced markedly poorer overall survival, and those with axial tumours and higher stage fared worst, in agreement with Lin et al.34 and other reports.35
It is important to distinguish this diagnostic tool from models predicting metachronous metastasis during chemotherapy. Approximately 50% of initially non-metastatic patients develop lung metastases during neoadjuvant treatment, yet SEER’s cross-sectional recording precludes analysis of this dynamic process. Consequently, clinicians must not apply this nomogram to guide surveillance intensity during follow-up or adjuvant therapy decisions. Compared with the existing osteosarcoma nomogram by Wang et al., which predicted cancer-specific survival,36 our model differs fundamentally in aim (diagnostic vs. prognostic) and outcome (synchronous metastasis at diagnosis vs. survival), precluding direct performance comparison. To our knowledge, this is the first diagnostic nomogram specifically designed for identifying synchronous lung metastasis at initial diagnosis in osteosarcoma.
The intended clinical application of this nomogram is to guide the intensity of initial staging at the time of osteosarcoma diagnosis, not to predict future events. In resource-limited settings or emergency departments, a patient with a high predicted probability—based on age ≥60 years, axial tumour site, tumour size >8 cm, and high grade—may be prioritised for urgent chest CT staging rather than routine scheduling. However, the nomogram is not a replacement for imaging confirmation, nor is it designed for screening asymptomatic populations or for counselling about future metastatic risk. The decision to pursue aggressive surgical resection in patients with synchronous metastasis should be made within a multidisciplinary tumour board, with full acknowledgement that the observed survival benefit of surgery in this retrospective cohort may be largely attributable to selection of patients with favourable disease profiles.
Several limitations warrant careful consideration. First, the retrospective design of the SEER database introduces inherent confounding and selection biases. Second, and critically, SEER records metastasis status as a cross-sectional variable at diagnosis only; it does not capture the timing of metastasis detection or subsequent metachronous progression. Consequently, the nomogram estimates the probability of already-present synchronous metastasis and must not be applied to predict future metastatic risk. Third, only internal bootstrap validation was performed; no independent external cohort was available. External validation in a multi-institutional, prospective setting is essential before clinical implementation. Fourth, SEER lacks several clinically relevant variables known to influence metastasis risk and survival, including response to neoadjuvant chemotherapy, specific surgical margins, serum ALP and LDH levels, detailed lung nodule characteristics on imaging, and performance status and comorbidity data—all of which may have contributed to the moderate discriminative performance of our model. Fifth, the study population was derived from the United States, and generalizability to other populations, particularly in Asia, remains to be established. Despite these constraints, the nomogram provides a useful, transparent framework that can be refined as more granular data become available.
Conclusion
In conclusion, the diagnostic nomogram established in this study, incorporating tumour grade, site, size, and lymph node stage, offers a reliable method for estimating the probability of synchronous lung metastasis at initial diagnosis in osteosarcoma patients. Within the constraints of cross-sectional SEER data, this tool may assist clinicians in guiding initial staging intensity and treatment planning at presentation. It is not a prognostic tool and should not be used to predict future metastatic progression. Prospective studies with longitudinal metastasis monitoring are warranted to develop prognostic models for metachronous metastasis.
Footnotes
Ethical considerations
The current analysis did not involve individual participant data. Consequently, separate ethical approval is not required.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: the Program Project of Science and Technology Innovation Guidance of Zhengzhou City (2024YLZDJH175); the Key Scientific Research Project of Colleges and Universities in Henan Province (26A310002); and the Plan of Science and Technology of Kaifeng City (2503167).
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data Availability Statement
The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.
Generative AI statement
The authors declare that no Generative AI was used in the creation of this manuscript.
