Radiomics-based machine learning models for prediction of medulloblastoma subgroups: a systematic review and meta-analysis of the diagnostic test performance

Abstract

Background

Medulloblastomas are a major cause of cancer-related mortality in the pediatric population. Four molecular subgroups have been identified, and they drive risk stratification, prognostic modeling, and the development of novel treatment modalities. Radiomics-based machine learning (ML) models have been shown to be effective at predicting the diagnosis, molecular class, and grade of central nervous system (CNS) tumors.

Purpose

To assess radiomics-based ML models’ diagnostic performance in predicting medulloblastoma subgroups and the methodological quality of the studies.

Material and Methods

A comprehensive literature search was performed on PubMed; the last search was conducted on 1 May 2022. Studies that predicted all four medulloblastoma subgroups in patients with histopathologically confirmed medulloblastoma and reported area under the curve (AUC) values were included. Quality was assessed with the Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) tool and the Checklist for Artificial Intelligence in Medical Imaging (CLAIM). A meta-analysis of the diagnostic performance of radiomics-based ML models for the preoperative prediction of medulloblastoma subgroups was performed.

Results

Five studies were included in this meta-analysis. Regarding patient selection, two studies had an unclear risk of bias according to QUADAS-2. The five studies had a mean CLAIM score of 23.2 and a mean compliance score of 0.57. The meta-analysis showed pooled AUCs of 0.88, 0.82, 0.83, and 0.88 for WNT, SHH, group 3, and group 4 classification, respectively.

Conclusion

Radiomics-based ML models show good classification performance in predicting medulloblastoma subgroups, with AUCs >0.80 for every subgroup. Before they can be applied in clinical practice, however, their methodological quality and stability need to improve.

Keywords

Medulloblastoma, radiomics, machine learning, deep learning, artificial intelligence

Introduction

Medulloblastoma (MB) is a small blue cell malignancy of the cerebellum that accounts for about 40% of pediatric posterior cranial fossa tumors (1,2). MBs constitute approximately 25% of all pediatric intracranial tumors (1) and are the leading cause of cancer-related death in children under 15–16 years of age (3). Despite improvements in recent years, the five-year survival rate with surgical resection plus chemoradiotherapy remains 65%–70% (4). The heterogeneity in clinical outcomes is due to the unique biological properties of the tumors (5). In recent years, four molecular subgroups have been identified: wingless (WNT), sonic hedgehog (SHH), group 3, and group 4. These subgroups have been incorporated into the World Health Organization (WHO) classification of central nervous system (CNS) tumors since 2016 (6). Today, risk stratification, prognostic modeling, and the development of novel treatment modalities are all driven by these molecular subgroups (7–10).

Several molecular analysis techniques that use tumor specimens from surgical resection have been developed for the molecular subgrouping of MBs. Because of their technical complexity and high costs, these techniques may not be feasible for routine clinical practice in many medical centers, particularly those with limited resources (11). Intratumoral heterogeneity may also cause sampling errors and, thus, misclassification (12,13). Moreover, because tumor samples are obtained during surgery and analyzed in the laboratory afterward, they do not allow clinicians to tailor their treatment strategies preoperatively.

Magnetic resonance imaging (MRI) is more widely available than complex molecular analysis methods and produces high-resolution medical images. Although it cannot replace molecular analysis of the tumor specimen, predicting MB subgroups with MRI offers a non-invasive, preoperative, and less expensive route to MB subgrouping. A few studies have attempted to correlate MB subgroups with semantic radiologic features such as contrast enhancement, hemorrhage, and cystic change (14–18). However, these human-recognized qualitative features do not capture all the multidimensional data acquired by MRI and are susceptible to inter-observer variability.

In the past decade, radiomics has emerged as a general term to describe the technologies used to analyze medical images and the imaging features obtained during the process. With the high-throughput extraction of quantitative imaging features from radiologic images, radiomics enables the characterization of imaging phenotypes (19,20). Radiogenomics, in particular, is the study of the association between the genome and radiologic images (21). Radiomics-based machine learning (ML) models have been shown to be successful in predicting diagnosis, molecular class, and grades of other CNS tumors, including gliomas (22) and meningiomas (23,24). More recently, deep learning (DL) models have been utilized in similar classification tasks without using pre-engineered radiomic features (25,26).

The aim of the present study was to perform a systematic review and meta-analysis of the diagnostic performance of radiomics-based ML models in predicting MB subgroups. In addition, we analyzed the methodological quality of the included studies. This article is reported in accordance with the Preferred Reporting Items for a Systematic Review and Meta-analysis of Diagnostic Test Accuracy Studies (PRISMA-DTA) checklist (27).

Material and Methods

Literature search

A comprehensive literature search of the PubMed database was performed using the following combination of Medical Subject Headings (MeSH) terms and keywords: ("radiomic*"[tiab] OR "radiogenom*"[tiab] OR "machine learning"[MeSH] OR "machine learning*"[tiab] OR "deep learning*" OR "artificial intelligence*") AND ("medulloblastoma"[MeSH] OR "medulloblastoma*"[tiab] OR "posterior fossa tumor*"[tiab] OR ("posterior"[tiab] AND "fossa*"[tiab] AND "tumor*"[tiab]) OR "astrocytoma"[MeSH] OR "astrocytoma*"[tiab] OR "pilocytic astrocytoma"[tiab] OR "ependymoma"[MeSH] OR "ependymoma*"[tiab]). The last search was conducted on 1 May 2022.

Study selection

Two authors (AO and BK) determined the eligibility of the articles by screening titles and abstracts. The full texts of articles considered to involve a radiomics-based ML model in patients with MB were obtained for further evaluation. The reference lists of the included studies were manually searched to identify other relevant studies.

Articles were included if they fulfilled all of the following criteria: (i) patients with histopathologically confirmed MBs; (ii) available molecular subgroup information; (iii) molecular subgroup predicted with a radiomics-based ML model; (iv) area under the curve (AUC) values reported separately for the four molecular subgroups (SHH, WNT, group 3, and group 4); and (v) original research article.

The exclusion criteria were as follows: (i) MB molecular subgroups were not predicted; (ii) a radiomics-based ML model was not utilized to predict molecular subgroups; (iii) AUC values were not separately reported for four molecular subgroups; and (iv) reviews, letters, commentaries, or errata.

Data extraction

Data were collected by the two authors (AO and BK) for the following variables: (i) study characteristics (author, year, country, number of patients, age, sex, and distribution of molecular subgroups); (ii) MRI sequences used in models; (iii) type of data used in models; (iv) classification algorithm used for subgroup prediction; (v) validation method; and (vi) AUC values for each molecular subgroup classification. The data used in the meta-analysis were obtained from the validation sets of each study. When a study reported multiple validation sets, the higher AUC value for each subgroup classification was included in the meta-analysis.

Quality assessment

The quality assessments were conducted by two authors (MK and BBO) independently according to the Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) and Checklist for Artificial Intelligence in Medical Imaging (CLAIM) (28,29). Any disagreements were resolved through discussion or with the assistance of a third author (AO).

Four domains were evaluated with QUADAS-2: (i) patient selection; (ii) index test; (iii) reference standard; and (iv) flow and timing. The patient selection domain addresses the methods used to select patients. The index test domain addresses the index test and how it was performed and analyzed. The reference standard domain addresses the reference standard and how it was conducted and interpreted. The flow and timing domain addresses whether any patients did not receive the index test or reference standard or were excluded from the confusion matrices. Risk of bias and concerns about applicability were each rated as low, high, or unclear.

CLAIM is a 42-item checklist used to assess the quality of artificial intelligence (AI) studies in medical imaging. Each item was scored as 0 (not fulfilled) or 1 (fulfilled), and the item scores were summed to obtain the CLAIM score of each study. The CLAIM compliance score was defined as the ratio of fulfilled items to applicable items for each study.
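The CLAIM score and compliance score can be sketched in a few lines of Python. This is an illustrative sketch rather than the authors' code; it assumes, as Table 1 indicates, that each study had exactly one item rated not applicable, so compliance is computed over 41 applicable items. The names `claim_compliance` and `table1` are hypothetical.

```python
from statistics import mean, stdev

CLAIM_ITEMS = 42  # total number of CLAIM checklist items

def claim_compliance(total_score: int, n_not_applicable: int) -> float:
    """Fraction of applicable CLAIM items fulfilled (each item scored 0 or 1)."""
    applicable_items = CLAIM_ITEMS - n_not_applicable
    return total_score / applicable_items

# (total CLAIM score, number of items rated not applicable), taken from Table 1.
table1 = {
    "Iv et al.": (25, 1),
    "Chen et al.": (24, 1),
    "Yan et al.": (23, 1),
    "Chang et al.": (24, 1),
    "Saju et al.": (20, 1),
}

scores = [score for score, _ in table1.values()]
compliance = {study: claim_compliance(score, na) for study, (score, na) in table1.items()}
compliance_values = list(compliance.values())

for study, value in compliance.items():
    print(f"{study:<12} CLAIM score = {table1[study][0]:>2}, compliance = {value:.2f}")

# stdev() is the sample standard deviation (n - 1 in the denominator).
print(f"Mean CLAIM score: {mean(scores):.1f} ± {stdev(scores):.2f}")
print(f"Mean compliance: {mean(compliance_values):.2f} ± {stdev(compliance_values):.2f}")
```

With the Table 1 totals, this reproduces the per-study compliance values in Table 1 and the summary of 23.2 ± 1.92 and 0.57 ± 0.05 reported in the Results, which suggests the reported spreads are sample standard deviations.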

Meta-analysis

In the meta-analysis, the standard error of each AUC was calculated using the number of patients in the corresponding subgroup and the total number of patients. The inverse variance method was used to calculate the weight of each study. For each subgroup, the results of all included studies were pooled, and an overall estimate of effect size was obtained using a random-effects model. Heterogeneity across studies was assessed for each subgroup with the Q-test, with P < 0.05 indicating the presence of heterogeneity, and with the I² statistic. I² values were interpreted as follows: heterogeneity that might not be important (0%–25%); low heterogeneity (26%–50%); moderate heterogeneity (51%–75%); and high heterogeneity (76%–100%) (30). Publication bias was not assessed because, with the small number of included studies (n = 5), funnel plots and regression tests for detecting publication bias may be inconclusive (31). Two-sided P values ≤0.05 were considered statistically significant.
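The exact standard-error formula is not stated above. One widely used option, assumed here, is the Hanley–McNeil approximation, which needs only the AUC, the number of patients in the subgroup (positives), and the number of remaining patients (negatives). The function name is hypothetical, and this is not necessarily the routine used by the authors' software. Under this assumption, the Saju et al. WNT entry in Table 3 (AUC 0.93; 7 of 38 patients) gives an SE of about 0.07, matching the table, although the formula does not reproduce every Table 3 entry.

```python
import math

def se_auc_hanley_mcneil(auc: float, n_pos: int, n_neg: int) -> float:
    """Approximate standard error of an AUC (Hanley-McNeil formula).

    n_pos: patients in the subgroup being predicted (positives)
    n_neg: all remaining patients in the validation set (negatives)
    """
    q1 = auc / (2 - auc)            # approximates P(two random positives both outrank a negative)
    q2 = 2 * auc ** 2 / (1 + auc)   # approximates P(a random positive outranks two negatives)
    variance = (
        auc * (1 - auc)
        + (n_pos - 1) * (q1 - auc ** 2)
        + (n_neg - 1) * (q2 - auc ** 2)
    ) / (n_pos * n_neg)
    return math.sqrt(variance)

# Saju et al., WNT (Table 3): AUC 0.93, 7 WNT patients out of 38 in total.
se = se_auc_hanley_mcneil(0.93, n_pos=7, n_neg=38 - 7)
weight = 1 / se ** 2  # inverse-variance weight of this study in the pooled estimate
print(f"SE = {se:.2f}, inverse-variance weight = {weight:.0f}")
```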

All statistical analyses were conducted using R version 3.4.1 (R Foundation for Statistical Computing) with the auctestr package, and MedCalc Statistical Software version 20.110 (Ostend, Belgium) (32,33).

Results

Literature search

The study selection process is illustrated in Fig. 1. The initial literature search yielded 500 articles. These 500 articles were screened based on their title and abstract, and 474 were excluded. Full texts of the remaining 26 articles were obtained and reviewed.

Fig. 1.

The study selection process.

A total of 22 articles were excluded because they did not predict MB subgroups (n = 16), did not use a radiomics-based ML model to predict MB subgroups (n = 1), did not report AUC values for all subgroups (n = 3), or were reviews, letters, commentaries, or errata (n = 2).

Screening the reference lists of the included studies identified four additional articles, which were obtained and reviewed. One of these was excluded because it did not use a radiomics-based ML model to predict MB subgroups, and two were excluded because they did not use radiologic imaging data to predict MB subgroups.

Finally, five original articles comprising 420 patients with MB were included in the study (34–38). Of these 420 patients, 289 were in the validation sets and were analyzed in the present study.

Quality assessment

A summary of the QUADAS-2 quality assessment of the included studies is shown in Fig. 2. With regard to patient selection, two studies had an unclear risk of bias because they did not report the inclusion criteria for patient enrollment (37,38). With regard to the reference standard, one study was judged to have a high risk of bias because it used both formalin-fixed and fresh frozen tissue for tissue preservation (34).

Fig. 2.

Methodological quality of the studies included in the meta-analysis according to the QUADAS-2 tool for risk of bias and applicability concerns.

A summary of the CLAIM quality assessment of the included studies is shown in Table 1. Each study had one item rated as “not applicable” in the methods domain. The mean CLAIM score of the five studies was 23.2 ± 1.92 (range = 20–25), and the mean CLAIM compliance score was 0.57 ± 0.05 (range = 0.49–0.61).

Table 1.

CLAIM assessment.

Study	Title/Abstract (n = 2)	Introduction (n = 2)	Methods (n = 28)	Results (n = 5)	Discussion (n = 2)	Other information (n = 3)	Total (n = 42)	CLAIM compliance
Iv et al.	1	2	17 (1 NA)	2	2	1	25	0.61
Chen et al.	2	2	16 (1 NA)	1	2	1	24	0.59
Yan et al.	0	2	14 (1 NA)	4	2	1	23	0.56
Chang et al.	1	2	16 (1 NA)	2	2	1	24	0.59
Saju et al.	2	2	12 (1 NA)	2	1	1	20	0.49

CLAIM, Checklist for Artificial Intelligence in Medical Imaging; NA, not applicable.

Characteristics of included studies

The patient and study characteristics are described in Table 2. Regarding MRI sequences, Iv et al. used T1-weighted (T1W) and T2-weighted (T2W) sequences together for radiomic feature extraction (34). Chen et al. and Saju et al. used contrast-enhanced T1W and T2W sequences together (35,38). Yan et al. combined T1W, contrast-enhanced T1W, T2W, and fluid-attenuated inversion recovery (FLAIR) sequences with apparent diffusion coefficient (ADC) values (36). Chang et al. used T1W, contrast-enhanced T1W, T2W, and FLAIR sequences together with diffusion-weighted imaging (DWI) (37). Three studies used support vector machines as classification algorithms (34,37,38). Yan et al. used a random forest algorithm (36), whereas Chen et al. conducted the only study that used DL for classification (35). Three studies used radiomic features alone as input to their classification algorithms (34,37,38). Yan et al. combined clinical and semantic features with radiomic features (36). Chen et al. did not use handcrafted radiomic features; instead, they applied a convolutional neural network (CNN) for feature extraction (35). Four of the included studies used various cross-validation methods for validation (34,35,37,38), whereas Yan et al. used a separate test set (36).

Table 2.

Study characteristics.

Authors	Year	No. of patients	Mean age (years)	Sex (M:F)	WNT	SHH	Group 3	Group 4	MRI sequence	Information used in classification algorithm	Classification algorithm	Validation method
Iv et al.	2019	109	8.6	64:45	19	30	24	36	T1, T2	Radiomic features	SVM	3-dataset CV and double 10-fold CV
Chen et al.	2020	113	17.7	75:38	24	27	31	31	T1C, T2	Deep learning imaging features	CNN	3-fold CV
Yan et al.	2020	122	11.6	86:36	21	20	54	27	T1, T1C, T2, FLAIR, ADC	Clinical, semantic and texture features	Random forest	Test cohort
Chang et al.	2021	38	7.8	17:21	7	12	8	11	T1, T1C, T2, FLAIR, DWI	Radiomic features	SVM	Nested leave-one-out CV
Saju et al.	2022	38	9.0	31:7	7	7	12	12	T1C, T2	Radiomic features	SVM	10-fold CV
Total	2019–2022	420	11.9	273:147	78	96	129	117	-	-	-	-

ADC, apparent diffusion coefficient; CNN, convolutional neural network; CV, cross validation; DWI, diffusion-weighted imaging; FLAIR, fluid-attenuated inversion recovery; MRI, magnetic resonance imaging; SHH, sonic hedgehog; SVM, support vector machine; T1, T1-weighted; T1C, contrast-enhanced T1-weighted; T2, T2-weighted; WNT, wingless.
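The Total row of Table 2 reports a mean age of 11.9 years. A patient-weighted average of the study means reproduces this value, whereas a simple average of the five study means gives about 10.9 years; the sketch below assumes the weighted version was intended. Variable names are illustrative.

```python
# Number of patients and mean age (years) for each included study, from Table 2.
studies = {
    "Iv et al.": (109, 8.6),
    "Chen et al.": (113, 17.7),
    "Yan et al.": (122, 11.6),
    "Chang et al.": (38, 7.8),
    "Saju et al.": (38, 9.0),
}

n_total = sum(n for n, _ in studies.values())
# Weight each study's mean age by its number of patients.
weighted_mean_age = sum(n * age for n, age in studies.values()) / n_total
# Unweighted alternative: every study counts equally regardless of size.
unweighted_mean_age = sum(age for _, age in studies.values()) / len(studies)

print(f"Total patients: {n_total}")
print(f"Patient-weighted mean age: {weighted_mean_age:.1f} years")
print(f"Unweighted mean of study means: {unweighted_mean_age:.1f} years")
```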

Meta-analysis

The AUC values and their standard errors for each subgroup classification in the included studies are reported in Table 3.

Table 3.

Diagnostic performance of the classification algorithms for the subgroups.

Authors (year)	WNT: no. of patients	WNT: SE of AUC	WNT: AUC	SHH: no. of patients	SHH: SE of AUC	SHH: AUC	Group 3: no. of patients	Group 3: SE of AUC	Group 3: AUC	Group 4: no. of patients	Group 4: SE of AUC	Group 4: AUC
Iv et al. (2019)	19	0.10	0.72	30	0.10	0.79	24	0.11	0.70	36	0.09	0.83
Chen et al. (2020)	17	0.03	0.96	18	0.03	0.96	20	0.02	0.99	19	0.03	0.96
Yan et al. (2020)	6	0.07	0.91	4	0.08	0.87	14	0.05	0.67	6	0.05	0.67
Chang et al. (2021)	7	0.08	0.82	12	0.12	0.50	8	0.10	0.72	11	0.13	0.78
Saju et al. (2022)	7	0.07	0.93	7	0.05	0.90	12	0.06	0.93	12	0.05	0.93

AUC, area under the curve; No. of patients, number of patients with the given subgroup in the validation set; SE, standard error; SHH, sonic hedgehog; WNT, wingless.
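The inverse-variance random-effects pooling described in the Methods can be sketched with the DerSimonian–Laird estimator of between-study variance. The Methods do not name the estimator, so this choice is an assumption, and the sketch is not MedCalc's implementation. Applied to the rounded WNT values in Table 3, it gives a pooled AUC of about 0.90 with Q of about 7.3, rather than the reported 0.88 and Q = 10.28; rounding of the tabulated AUCs and SEs and implementation differences presumably account for the gap, so the output is illustrative only.

```python
import math

def dersimonian_laird(estimates, std_errors):
    """Inverse-variance random-effects pooling with the DerSimonian-Laird tau^2."""
    w = [1 / se ** 2 for se in std_errors]                          # fixed-effect weights
    fixed = sum(wi * y for wi, y in zip(w, estimates)) / sum(w)
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, estimates))   # Cochran's Q
    df = len(estimates) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)                                   # between-study variance
    w_re = [1 / (se ** 2 + tau2) for se in std_errors]              # random-effects weights
    pooled = sum(wi * y for wi, y in zip(w_re, estimates)) / sum(w_re)
    se_pooled = math.sqrt(1 / sum(w_re))
    ci = (pooled - 1.96 * se_pooled, pooled + 1.96 * se_pooled)     # normal-approximation 95% CI
    return pooled, se_pooled, ci, q, tau2

# WNT column of Table 3 (rounded AUCs and SEs), in table order.
wnt_auc = [0.72, 0.96, 0.91, 0.82, 0.93]
wnt_se = [0.10, 0.03, 0.07, 0.08, 0.07]

pooled, se_pooled, ci, q, tau2 = dersimonian_laird(wnt_auc, wnt_se)
print(f"pooled AUC = {pooled:.3f}, SE = {se_pooled:.3f}, "
      f"95% CI = {ci[0]:.3f}-{ci[1]:.3f}, Q = {q:.2f}, tau^2 = {tau2:.4f}")
```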

The models for the WNT subgroup classification showed an overall pooled AUC of 0.88 (95% confidence interval [CI] = 0.86–0.96) with a standard error of 0.04 (Fig. 3a). The Q-test demonstrated heterogeneity across the studies (Q = 10.28; P = 0.036), and the Higgins I² statistic demonstrated the presence of moderate heterogeneity in the WNT subgroup classification (61.11%).

Fig. 3.

Forest plots of the medulloblastoma subgroup classifications: (a) WNT, (b) SHH, (c) Group 3, and (d) Group 4.

The models for the SHH subgroup classification showed an overall pooled AUC of 0.82 (95% CI = 0.69–0.95) with a standard error of 0.07 (Fig. 3b). The Q-test demonstrated that heterogeneity was present across the studies (Q = 22.63; P < 0.001), and the Higgins I² statistic demonstrated the presence of high heterogeneity in the SHH subgroup classification (82.32%).

The models for the group 3 subgroup classification showed an overall pooled AUC of 0.83 (95% CI = 0.70–0.95) with a standard error of 0.06 (Fig. 3c). The Q-test demonstrated that heterogeneity was present across the studies (Q = 32.93; P < 0.001), and the Higgins I² statistic demonstrated the presence of high heterogeneity in the group 3 subgroup classification (87.85%).

The models for the group 4 subgroup classification showed an overall pooled AUC of 0.88 (95% CI = 0.81–0.95) with a standard error of 0.04 (Fig. 3d). The Q-test demonstrated that heterogeneity was present across the studies (Q = 10.88; P = 0.028), and the Higgins I² statistic demonstrated the presence of moderate heterogeneity in the group 4 subgroup classification (63.25%).
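The reported heterogeneity statistics can be checked for internal consistency. With five studies per subgroup, the Q statistic has 4 degrees of freedom, its P value is the upper tail of a chi-squared distribution, and Higgins' I² equals (Q − df)/Q. This sketch uses the closed-form chi-squared tail that holds for an even number of degrees of freedom, so it needs only the Python standard library; the function names are hypothetical.

```python
import math

def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail chi-squared probability; this closed form is valid for even df only."""
    if df % 2:
        raise ValueError("closed form requires an even number of degrees of freedom")
    half = x / 2
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))

def higgins_i2(q: float, df: int) -> float:
    """Higgins I^2 in percent, truncated at zero."""
    return max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

df = 5 - 1  # five studies per subgroup
for subgroup, q in [("WNT", 10.28), ("SHH", 22.63), ("Group 3", 32.93), ("Group 4", 10.88)]:
    print(f"{subgroup:<8} Q = {q:5.2f}  P = {chi2_sf_even_df(q, df):.4f}  "
          f"I2 = {higgins_i2(q, df):.1f}%")
```

Run on the reported Q values, it gives P ≈ 0.036 and 0.028 for WNT and group 4, P < 0.001 for SHH and group 3, and I² values of 61.1%, 82.3%, 87.9%, and 63.2%, consistent with the figures reported above.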

Discussion

Radiomics-based ML models, including the more recent DL approaches, offer a promising way to overcome current obstacles and accelerate the transition to personalized medicine. To integrate patient-tailored predictions into routine clinical care, new algorithms combine data from imaging studies, molecular markers, and clinical information. Although efforts are being made to standardize the methodological approach, the suitability of these models for use outside academic research has not yet been established (39–41). CLAIM is the only framework designed specifically for AI that captures the details of model reporting in prediction model studies applying AI to medical imaging (29). We therefore used CLAIM to evaluate how the ML/DL models for predicting MB subgroups were reported in the included studies. The mean CLAIM score and compliance score of the five studies were 23.2 and 0.57, respectively. A compliance score just above 0.5, meaning that the studies fulfilled only slightly more than half of the applicable items, indicates a general lack of methodological quality. Whether this applies to ML/DL studies in general or only to the studies included in this meta-analysis is not yet known. CLAIM is a relatively new checklist, and few meta-analyses or systematic reviews have used it to evaluate AI studies. A meta-analysis of the performance of DL algorithms in predicting the isocitrate dehydrogenase mutation status of gliomas found a mean CLAIM compliance score of 0.61 (42). Hence, it may be reasonable to assume that suboptimal methodological quality is a general issue rather than one restricted to this specific field of study.

The QUADAS-2 evaluation of the studies included in the meta-analysis showed an overall low risk of bias but also brought some issues to light. Two studies that did not report the inclusion criteria for patient enrollment had an unclear risk of bias with regard to patient selection (37,38). One study that used both formalin-fixed and freshly frozen tissue for tissue preservation was judged to have a high risk of bias with regard to the reference standard (34). Different tissue preservation techniques have been shown to yield different results in genetic analyses (43).

Overall, with pooled AUC values >0.80 for all MB subgroups, radiomics-based ML/DL approaches show great potential for MB subgrouping. All studies except that of Chen et al., who employed a CNN model, used handcrafted radiomic features. This is reasonable, given that DL approaches are more “data hungry” than conventional ML algorithms and sufficiently large datasets are not available in many research settings (44). Federated learning has been identified as a promising field of research for meeting the data requirements of such models without breaching privacy regulations (45). Only Yan et al. trained a model using semantic radiological features, clinical parameters, and texture features together (36). Counter-intuitively, their AUC values for the group 3 (0.67) and group 4 (0.67) classifications were the lowest among the included studies for those subgroups, suggesting that this multimodal approach may not necessarily increase model accuracy.

The accuracy metrics reported in the selected studies were non-uniform and inconsistent. Not every study reported confusion matrices, which allow metrics such as sensitivity, specificity, and accuracy to be derived and enable more comprehensive meta-analyses. We contacted the authors of the included studies by email to request these metrics, but only one author provided them. Such incomplete reporting in radiomics-based ML studies has been noted in previous meta-analyses (23,46). Because confusion matrices were unavailable, our meta-analysis used the most commonly reported metric, the AUC.
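To illustrate what a reported confusion matrix would allow, the following sketch derives one-vs-rest sensitivity, specificity, and accuracy for each subgroup from a four-class confusion matrix. The counts are hypothetical and do not come from any included study; rows are assumed to be true subgroups and columns predicted subgroups.

```python
SUBGROUPS = ["WNT", "SHH", "Group 3", "Group 4"]

# Hypothetical confusion matrix (rows = true subgroup, columns = predicted subgroup).
# These counts are invented for illustration only.
CONFUSION = [
    [8, 1, 0, 1],
    [1, 9, 0, 0],
    [0, 1, 7, 2],
    [0, 0, 3, 7],
]

def one_vs_rest_metrics(matrix, k):
    """Sensitivity, specificity, and accuracy for class k versus all other classes."""
    total = sum(sum(row) for row in matrix)
    tp = matrix[k][k]                          # class k correctly predicted
    fn = sum(matrix[k]) - tp                   # class k predicted as another class
    fp = sum(row[k] for row in matrix) - tp    # other classes predicted as class k
    tn = total - tp - fn - fp
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / total,
    }

for k, name in enumerate(SUBGROUPS):
    m = one_vs_rest_metrics(CONFUSION, k)
    print(f"{name:<8} sens = {m['sensitivity']:.2f}  spec = {m['specificity']:.2f}  "
          f"acc = {m['accuracy']:.2f}")
```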

The ROC curve summarizes classification performance by plotting sensitivity against 1 − specificity, using the confusion matrices obtained at all classification thresholds. The AUC condenses the ROC curve into a single numerical measure of a binary classifier's performance: it reflects how well a model distinguishes between positive and negative cases. In general, AUC values of 0.9–1 are considered excellent, 0.8–0.9 good, 0.7–0.8 fair, 0.6–0.7 poor, and 0.5–0.6 failed (47). Thus, the pooled AUC values for all subgroups in our study fall within the good range. Although AUC values provide a useful overall sense of classification performance, they have several disadvantages. Their clinical interpretability is limited because they do not account for the misclassification costs associated with false-negative and false-positive findings (48). Moreover, the AUC implicitly treats sensitivity and specificity as equally important to the decision-maker, which may not always be the case for clinicians (49).
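The probabilistic reading of the AUC and the descriptive bands can be sketched as follows. The pairwise estimate (the proportion of positive–negative pairs in which the positive case receives the higher score, with ties counted as one half) equals the empirical ROC AUC. The scores in the usage example are hypothetical, and treating each band's lower bound as inclusive is an assumption, because the cited bands share endpoints.

```python
def auc_pairwise(pos_scores, neg_scores):
    """Empirical AUC: share of positive-negative pairs ranked correctly (ties count 1/2)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

def auc_band(auc):
    """Descriptive label; lower bounds inclusive (an assumption). Values < 0.6 map to 'failed'."""
    for lower, label in [(0.9, "excellent"), (0.8, "good"), (0.7, "fair"), (0.6, "poor")]:
        if auc >= lower:
            return label
    return "failed"

# Hypothetical model scores for three subgroup members and three non-members.
auc = auc_pairwise([0.9, 0.8, 0.4], [0.3, 0.5, 0.2])
print(f"AUC = {auc:.2f} ({auc_band(auc)})")

# Pooled AUCs reported in this meta-analysis.
for subgroup, pooled in [("WNT", 0.88), ("SHH", 0.82), ("Group 3", 0.83), ("Group 4", 0.88)]:
    print(subgroup, auc_band(pooled))
```

Eight of the nine hypothetical pairs are ranked correctly, so the example AUC is 8/9 ≈ 0.89, labeled good, as are all four pooled AUCs.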

The ability to differentiate MB subgroups on preoperative MRI may inform individualized treatment decisions. Treatment for MB is currently individualized and includes surgery, radiation, and chemotherapy (50). Although maximal safe resection is the first-line treatment for MB, the prognostic benefit of a greater extent of resection diminishes once molecular subgroup affiliation is taken into account (4,51). In a retrospective study, the extent of resection was not associated with overall survival in patients with group 4 MBs, but gross total resection improved progression-free survival compared with partial resection, especially in patients with disseminated disease (4). No such effect was found in WNT, SHH, or group 3 MBs, and gross total resection conferred no overall survival benefit over subtotal resection in any MB subgroup.

On the other hand, resection can be complicated by neurological deficits, particularly posterior fossa syndrome (cerebellar mutism), which affects around 25% of patients and is characterized by emotional lability and difficulty producing words (52). In younger children, radiotherapy and cytotoxic chemotherapy are associated with secondary cancers, cerebrovascular disease, cataracts, hearing loss, short stature, pituitary hormone insufficiency, and neurocognitive impairment (53,54). These factors underscore the urgency of developing more effective treatments that raise stagnant survival rates and reduce the long-term side effects of existing therapies (55). Current research areas of particular interest include establishing the effectiveness of treatment de-escalation for WNT MB and adopting alternative radiation technologies, such as proton beam therapy in place of photon-based radiation (56).

The present study has some limitations. First, relatively few papers met the inclusion criteria, resulting in a relatively small number of patients. Second, we used only PubMed as a literature search source; however, searching databases other than PubMed, specifically EMBASE, has been reported to add little (57). Third, studies with appropriate methodology that did not report results for all four subgroups were excluded. Fourth, study heterogeneity was high, although this is commonly observed in meta-analyses of radiomics-based ML studies (23,46,58,59). Fifth, AUC values were used as the common classification performance metric, which has the disadvantages mentioned earlier. Sixth, the QUADAS-2 analysis identified one item at high risk of bias and two at unclear risk. Finally, compliance with CLAIM, a relatively recent checklist, was low.

In conclusion, this study revealed that radiomics-based ML models demonstrate good classification performance in predicting MB subgroups, with AUC values >0.80 for all subgroups. These results are promising for improving the management of MB through preoperative molecular subgrouping. Before these models can be adopted in clinical practice, however, their methodological quality and stability must improve. Well-designed prospective trials are required to establish their validity, and the reporting of methods and results must be standardized.

Footnotes

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

ORCID iD

Mert Karabacak

References

Pollack

Jakacki

. Childhood brain tumors: epidemiology, current management and future directions. Nat Rev Neurol 2011;7:495–506.

Kool

Korshunov

Remke

, et al. Molecular subgroups of medulloblastoma: an international meta-analysis of transcriptome, genetic aberrations, and clinical data of WNT, SHH, group 3, and group 4 medulloblastomas. Acta Neuropathol 2012;123:473–484.

Curtin

Miniño

Anderson

. Declines in cancer death rates among children and adolescents in the United States, 1999–2014. NCHS Data Brief 2016;257:1–8.

Thompson

Hielscher

Bouffet

, et al. Prognostic value of medulloblastoma extent of resection after accounting for molecular subgroup: a retrospective integrated clinical and molecular analysis. Lancet Oncol 2016;17:484–495.

Northcott

Korshunov

Witt

, et al. Medulloblastoma comprises four distinct molecular variants. J Clin Oncol 2011;29:1408–1414.

Louis

Perry

Reifenberger

, et al. The 2016 World Health Organization classification of tumors of the central nervous system: a summary. Acta Neuropathol 2016;131:803–820.

Bavle

Parsons

. From one to many: further refinement of medulloblastoma subtypes offers promise for personalized therapy. Cancer Cell 2017;31:727–729.

Gupta

Shirsat

Jalali

. Molecular subgrouping of medulloblastoma: impact upon research and clinical practice. Curr Pediatr Rev 2015;11:106–119.

Archer

Mahoney

Pomeroy

. Medulloblastoma: molecular classification-based personal therapeutics. Neurotherapeutics 2017;14:265–273.

10.

Gajjar

Robinson

. Medulloblastoma—translating discoveries from the bench to the bedside. Nat Rev Clin Oncol 2014;11:714–722.

11.

Shuangshoti

Hjardermaal

Ahmad

, et al. Concurrence of multiple sclerosis and intracranial glioma. Report of a case and review of the literature. Clin Neuropathol 2003;22:304–308.

12.

Wang

Ramaswamy

Remke

, et al. Intertumoral and intratumoral heterogeneity as a barrier for effective treatment of medulloblastoma. Neurosurgery 2013;60:57–63.

13.

Kumar

Liu

APY

Northcott

. Medulloblastoma genomics in the modern molecular era. Brain Pathol 2020;30:679–690.

14.

Perreault

Ramaswamy

Achrol

, et al. MRI surrogates for molecular subgroups of medulloblastoma. AJNR Am J Neuroradiol 2014;35:1263–1269.

15.

Patay

DeSain

Hwang

, et al. MR imaging characteristics of wingless-type-subgroup pediatric medulloblastoma. AJNR Am J Neuroradiol 2015;36:2386–2393.

16.

Yeom

Mobley

Lober

, et al. Distinctive MRI features of pediatric medulloblastoma subtypes. AJR Am J Roentgenol 2013;200:895–903.

17.

Dasgupta

Gupta

Pungavkar

, et al. Nomograms based on preoperative multiparametric magnetic resonance imaging for prediction of molecular subgrouping in medulloblastoma: results from a radiogenomics study of 111 patients. Neuro-Oncol 2019;21:115–124.

18.

Zhao

Zhou

, et al. Distinctive localization and MRI features correlate of molecular subgroups in adult medulloblastoma. J Neurooncol 2017;135:353–360.

19.

Lambin

Rios-Velazquez

Leijenaar

, et al. Radiomics: extracting more information from medical images using advanced feature analysis. Eur J Cancer 2012;48:441–446.

20.

Gillies

Kinahan

Hricak

. Radiomics: images are more than pictures, they are data. Radiology 2016;278:563–577.

21.

Madhogarhia

Haldar

Bagheri

, et al. Radiomics and radiogenomics in pediatric neuro-oncology: a review. Neurooncol Adv 2022;4:vdac083.

22.

Jian

Jang

Manuguerra

, et al. Machine learning for the prediction of molecular markers in glioma on magnetic resonance imaging: a systematic review and meta-analysis. Neurosurgery 2021;89:31–44.

23.

Ugga

Perillo

Cuocolo

, et al. Meningioma MRI radiomics and machine learning: systematic review, quality score assessment, and meta-analysis. Neuroradiology 2021;63:1293–1304.

24.

Brunasso

Ferini

Bonosi

, et al. A spotlight on the role of radiomics and machine-learning applications in the management of intracranial meningiomas: a new perspective in neuro-oncology: a review. Life (Basel) 2022;12:586.

25.

Daugaard Jørgensen

Antulov

Hess

, et al. Convolutional neural network performance compared to radiologists in detecting intracranial hemorrhage from brain computed tomography: a systematic review and meta-analysis. Eur J Radiol 2022;146:110073.

26.

Xue

Wang

Qin

, et al. Deep learning in image-based breast and cervical cancer detection: a systematic review and meta-analysis. NPJ Digit Med 2022;5:19.

27.

McInnes

MDF

Moher

Thombs

, et al. Preferred reporting items for a systematic review and meta-analysis of diagnostic test accuracy studies: the PRISMA-DTA statement. JAMA 2018;319:388–396.

28.

Whiting

Rutjes

AWS

Westwood

, et al. QUADAS-2: a revised tool for the quality assessment of diagnostic accuracy studies. Ann Intern Med 2011;155:529–536.

29.

Mongan

Moy

Kahn

. Checklist for artificial intelligence in medical imaging (CLAIM): a guide for authors and reviewers. Radiol Artif Intell 2020;2:e200029.

30.

Higgins

JPT

Thompson

Deeks

, et al. Measuring inconsistency in meta-analyses. BMJ 2003;327:557–560.

31.

Terrin

Schmid

Lau

. In an empirical evaluation of the funnel plot, researchers could not visually identify publication bias. J Clin Epidemiol 2005;58:894–901.

32.

MedCalc Statistical Software. 2020. Available at: https://www.medcalc.org.

33.

R Core Team. R: A language and environment for statistical computing. Available at: https://www.R-project.org/.

34.

Zhou

Shpanskaya

, et al. MR imaging-based radiomic signatures of distinct molecular subgroups of medulloblastoma. AJNR Am J Neuroradiol 2019;40:154–161.

35. Chen, Fan, KK-W, et al. Molecular subgrouping of medulloblastoma based on few-shot learning of multitasking using conventional MR images: a retrospective multicenter study. Neurooncol Adv 2020;2:vdaa079.

36. Yan, Liu, Wang, et al. Radiomic features from multi-parameter MRI combined with clinical parameters predict molecular subgroups in patients with medulloblastoma. Front Oncol 2020;10:558162.

37. Chang F-C, Wong T-T, K-S, et al. Magnetic resonance radiomics features and prognosticators in different molecular subtypes of pediatric medulloblastoma. PLoS One 2021;16:e0255500.

38. Saju, Chatterjee, Sahu, et al. Machine-learning approach to predict molecular subgroups of medulloblastoma using multiparametric MRI-based tumor radiomics. Br J Radiol 2022;95:20211359.

39. Lambin, Leijenaar RTH, Deist, et al. Radiomics: the bridge between medical imaging and personalized medicine. Nat Rev Clin Oncol 2017;14:749–762.

40. Zwanenburg, Vallières, Abdalah, et al. The image biomarker standardization initiative: standardized quantitative radiomics for high-throughput image-based phenotyping. Radiology 2020;295:328–338.

41. Wolff, Moons KGM, Riley, et al. PROBAST: a tool to assess the risk of bias and applicability of prediction model studies. Ann Intern Med 2019;170:51–58.

42. Karabacak, Ozkara, Mordag, et al. Deep learning for prediction of isocitrate dehydrogenase mutation in gliomas: a critical approach, systematic review and meta-analysis of the diagnostic test performance using a Bayesian approach. Quant Imaging Med Surg 2022;12:4033–4046.

43. Gao, Gong, et al. Comparison of fresh frozen tissue with formalin-fixed paraffin-embedded tissue for mutation analysis using a multi-gene panel in patients with colorectal cancer. Front Oncol 2020;10:310.

44. Adadi. A survey on data-efficient algorithms in big data era. J Big Data 2021;8:24.

45. Darzidehkalani, Ghasemi-Rad, van Ooijen PMA. Federated learning in medical imaging: part I: toward multicentral health care ecosystems. J Am Coll Radiol 2022;S1546-1440(22)00280-0.

46. Cuocolo, Cipullo, Stanzione, et al. Machine learning for the identification of clinically significant prostate cancer on MRI: a meta-analysis. Eur Radiol 2020;30:6877–6887.

47. Metz. Basic principles of ROC analysis. Semin Nucl Med 1978;8:283–298.

48. Halligan, Altman, Mallett. Disadvantages of using the area under the receiver operating characteristic curve to assess imaging tests: a discussion and proposal for an alternative approach. Eur Radiol 2015;25:932–939.

49. Hand. Measuring classifier performance: a coherent alternative to the area under the ROC curve. Mach Learn 2009;77:103–123.

50. Packer, Vezina. Management of and prognosis with medulloblastoma: therapy at a crossroads. Arch Neurol 2008;65:1419–1424.

51. Thompson, Bramall, Herndon, et al. The clinical importance of medulloblastoma extent of resection: a systematic review. J Neurooncol 2018;139:523–539.

52. Law, Greenberg, Bouffet, et al. Clinical and neuroanatomical predictors of cerebellar mutism syndrome. Neuro Oncol 2012;14:1294–1303.

53. Jakacki, Burger, Zhou, et al. Outcome of children with metastatic medulloblastoma treated with carboplatin during craniospinal radiotherapy: a Children’s Oncology Group phase I/II study. J Clin Oncol 2012;30:2648–2653.

54. Ribi, Relly, Landolt, et al. Outcome of medulloblastoma in children: long-term complications and quality of life. Neuropediatrics 2005;36:357–365.

55. Juraschka, Taylor. Medulloblastoma in the age of molecular subgroups: a review. J Neurosurg Pediatr 2019;24:353–363.

56. Yock, Yeap, Ebb, et al. Long-term toxic effects of proton radiotherapy for paediatric medulloblastoma: a phase 2 single-arm study. Lancet Oncol 2016;17:287–298.

57. Halladay, Trikalinos, Schmid, et al. Using data sources beyond PubMed has a modest impact on the results of systematic reviews of therapeutic interventions. J Clin Epidemiol 2015;68:1076–1084.

58. Cronin, Kelly, Altaee, et al. How to perform a systematic review and meta-analysis of diagnostic imaging studies. Acad Radiol 2018;25:573–593.

59. Lee, Kim, Choi, et al. Systematic review and meta-analysis of studies evaluating diagnostic test accuracy: a practical review for clinical researchers-part II. Statistical methods of meta-analysis. Korean J Radiol 2015;16:1188–1196.