Abstract
Background
Histologic grade assessment plays an important part in the clinical decision making and prognostic evaluation of squamous cell carcinoma (SCC) of the oral tongue and floor of mouth (FOM).
Purpose
To assess the value of apparent diffusion coefficient (ADC)-based radiomics in discriminating between low- and high-grade SCC of the oral tongue and FOM.
Material and Methods
We included data from 88 patients (training cohort: n = 59; testing cohort: n = 29) who underwent diffusion-weighted imaging with a 3.0-T magnetic resonance imaging scanner before treatment. A total of 526 radiomics features were extracted from ADC maps to construct a radiomics signature with least absolute shrinkage and selection operator logistic regression. Receiver operating characteristic curves and areas under the curve (AUCs) were used to evaluate the performance of the radiomics signature.
Results
Five features were selected to construct the radiomics signature for predicting histologic grade. The ADC-based radiomics signature performed well for discriminating between low- and high-grade tumors, with AUCs of 0.83 in both cohorts. Based on the cut-off value of the training cohort, the radiomics signature achieved accuracies of 0.78 and 0.79, sensitivities of 0.65 and 0.71, and specificities of 0.85 and 0.82 in the training and testing cohorts, respectively.
Conclusion
ADC-based radiomics can be a useful and promising non-invasive method for predicting histologic grade of SCC of the oral tongue and FOM.
Introduction
Oral squamous cell carcinoma (SCC) is one of the most common cancers of the head and neck, and nearly half of cases arise from the tongue and floor of mouth (FOM) (1,2). Several studies have reported an association of histologic grade with prognosis in patients with head and neck SCCs (HNSCC) (3–5). Larsen et al. (6) reported that histologic grade was significantly associated with lymph node metastasis at the time of diagnosis. Jerjes et al. (3) reported that 90% of postoperative mortality was due to regional or distant metastasis occurring in moderately or poorly differentiated (Grade II and III) SCCs. Histologic grade assessment therefore plays an important part in the clinical decision making and prognostic evaluation of SCCs of the oral tongue and FOM.
Magnetic resonance imaging (MRI) is routinely used to evaluate head and neck cancers and provides a wealth of information regarding tissue extent, perineural spread, osseous erosion, and other factors (7). Diffusion-weighted imaging (DWI) is a widely used functional MRI sequence that is helpful for clinical differentiation of histologic grades in HNSCC using apparent diffusion coefficient (ADC) values (8,9). Radiomics has attracted increasing attention because medical images contain pathophysiologic information about intra-tumor heterogeneity, which has profound importance in oncology (10–12). Radiomics enables the conversion of medical images into high-throughput quantitative features using data characterization algorithms and can build predictive models by analyzing these features and producing clinically significant output (13). Several studies have reported that MRI radiomics features could predict p53 status and human papilloma virus infection in HNSCCs (14,15). Recently, ADC-based radiomics demonstrated potential to predict the histologic grade of glioma (16), cervical cancer (17), and bladder cancer (18) with favorable accuracies. To date, however, DWI-based radiomics analyses evaluating HNSCC histologic grade are lacking.
The aim of the present study was to investigate the value of ADC-based radiomics in discriminating between low- and high-grade SCCs of the oral tongue and FOM.
Material and Methods
Patients
The Ethics Review Board of our hospital approved the protocol of this retrospective study; written informed consent was not required. One experienced radiologist reviewed medical records collected from April 2015 to December 2018 to identify patients with SCC of the oral tongue and/or FOM in our hospital. Patients were included and excluded according to the criteria in Fig. 1. The patients were randomly divided into training and testing cohorts at a ratio of 2:1. The training cohort was used to fit the parameters of the classifier. The testing cohort was used to assess the performance and generalization of the trained model. Tumor differentiation grade was determined by two head and neck pathologists (with three and seven years of experience) based on the World Health Organization classification system (Broder’s grade) (19). For statistical purposes, all lesions were divided into low-grade (Grade I and I–II) and high-grade (Grade II and III) groups. Independent-samples t-tests or chi-square tests, as appropriate, were used to assess differences in clinical characteristics between the two cohorts and between the low- and high-grade groups.
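The 2:1 random division described above can be sketched as follows. This is only an illustration: the function name, the seed, and the use of integer patient IDs are assumptions, and the study does not state whether the split was stratified by grade.

```python
import random

def split_two_to_one(patient_ids, seed=0):
    """Shuffle patient IDs and return (training, testing) lists at roughly 2:1."""
    ids = list(patient_ids)
    # A dedicated, seeded generator keeps the split reproducible (seed is hypothetical).
    random.Random(seed).shuffle(ids)
    n_train = round(len(ids) * 2 / 3)
    return ids[:n_train], ids[n_train:]

# 88 patients, as in the study; the IDs 0..87 are placeholders.
training_ids, testing_ids = split_two_to_one(range(88))
```

With 88 patients, rounding two-thirds gives 59 training and 29 testing cases, which matches the cohort sizes reported in this study.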

Study flow chart with inclusion and exclusion criteria.
Image acquisition, image segmentation, and radiomics feature extraction
MRI was performed on a 3.0-T scanner (Ingenia; Philips Healthcare, Best, The Netherlands) using a head-and-neck array coil. The MRI protocol comprised T1-weighted (T1W) imaging, T2-weighted (T2W) imaging, DWI, and contrast-enhanced T1W imaging. DWI was undertaken using a single-shot spin-echo echo-planar imaging sequence. The imaging parameters for DWI were: repetition time = 1922 ms; echo time = 67 ms; field of view = 192 × 192 mm; b values = 0 and 1000 s/mm²; slice thickness = 5 mm; spacing between slices = 0.5 mm; and gradient directions = x, y, and z. ADC maps were derived with a mono-exponential model on the Philips Medical Systems workstation. ADC values were expressed in units of 1 × 10⁻⁶ mm²/s.
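With only two b values, the mono-exponential model S(b) = S(0) · exp(−b · ADC) has a closed-form solution per voxel. The sketch below shows that calculation; the function name and the signal intensities are illustrative assumptions, and the vendor workstation's actual implementation is not described in this study.

```python
import math

def adc_mono_exponential(s_b0, s_b1000, b0=0.0, b1=1000.0):
    """Return ADC in mm^2/s from two diffusion-weighted signal intensities.

    Assumes the mono-exponential model S(b) = S(0) * exp(-b * ADC),
    with b values given in s/mm^2.
    """
    if s_b0 <= 0 or s_b1000 <= 0:
        raise ValueError("signal intensities must be positive")
    return math.log(s_b0 / s_b1000) / (b1 - b0)

# Made-up voxel: signal falls from 1000 (b = 0) to 350 (b = 1000 s/mm^2).
adc = adc_mono_exponential(1000.0, 350.0)
adc_scaled = adc * 1e6  # same value expressed in units of 10^-6 mm^2/s
```

For this made-up voxel the result is about 1.05 × 10⁻³ mm²/s, or roughly 1050 in the scaled unit used for the ADC maps.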
Image segmentation was undertaken using the “Segment Editor” module of the open-source 3D Slicer software (www.slicer.org/). Manual segmentation was performed by a radiologist with five years of head and neck imaging experience. The three-dimensional regions of interest (ROIs) were delineated slice-by-slice to cover the whole tumor to the greatest extent possible. Given that ADC values were significantly influenced by free water molecules and we wanted to assess microscopic intra-tumor heterogeneity, visible necrotic and cystic components within the tumor were excluded by referencing T2W imaging and contrast-enhanced T1W imaging. To verify the reproducibility of inter-observer delineation, data from 50 patients were randomly selected for analysis by another radiologist with 10 years of head and neck imaging experience. The intraclass correlation coefficient (ICC) was used to determine agreement in radiomics feature measurements. We selected an ICC ≥ 0.75 to denote “acceptable reliability.”
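The reliability filter can be illustrated with a small ICC calculation. The study does not state which ICC form was used; the sketch below assumes ICC(2,1) (two-way random effects, absolute agreement, single rater), and the feature values for the two readers are made up.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `ratings` holds one row per subject and one column per rater.
    """
    n = len(ratings)
    k = len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    # Sums of squares for subjects, raters, and the residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# One hypothetical feature measured by two readers in five patients.
feature_by_reader = [[1.0, 1.1], [2.0, 2.1], [3.0, 2.9], [4.0, 4.2], [5.0, 4.8]]
icc = icc_2_1(feature_by_reader)
is_reliable = icc >= 0.75  # threshold used in this study
```

Applied per feature across the 50 re-segmented patients, a check of this kind would retain only features whose ICC meets the 0.75 threshold.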
Extraction of radiomics features was conducted using the “Radiomics” module of 3D Slicer, which is based on the open-source Python package PyRadiomics (www.radiomics.io/pyradiomics.html). The radiomics features comprised four categories: (i) size and shape; (ii) first-order histogram; (iii) texture; and (iv) wavelet. The texture features included the gray-level co-occurrence matrix (GLCM) and gray-level run length matrix (GLRLM). The wavelet features were the histogram and textural features recalculated after wavelet decomposition of the original images in three directions (x, y, z). More details about radiomics features are available at https://pyradiomics.readthedocs.io/en/latest/features.html.
Construction and validation of the radiomics signature
Considering the complexity of radiomics features, there was a risk of overfitting. The least absolute shrinkage and selection operator (LASSO) logistic regression model was used in the training cohort to identify the most valuable predictive features and build a radiomics signature. This approach minimizes binomial deviance by selecting a tuning parameter (λ); we adopted fivefold cross-validation with minimum criteria (20). Simultaneously, a formula was generated using a linear combination of the optimal features weighted by their respective coefficients. The radiomics score for each patient was calculated based on this formula. The association of radiomics features and radiomics score with histologic grade was analyzed using the Mann–Whitney U test.
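To illustrate how an L1 penalty selects features and how the radiomics score follows from the fitted coefficients, the sketch below fits a LASSO-penalized logistic regression by proximal gradient descent on synthetic data. It is not the R implementation used in this study: the function names, learning rate, iteration count, and the fixed λ of 0.05 are assumptions, and the cross-validated choice of λ is omitted for brevity.

```python
import math
import random

def radiomics_score(features, intercept, coefs):
    """Linear combination of feature values weighted by their coefficients."""
    return intercept + sum(c * f for c, f in zip(coefs, features))

def soft_threshold(value, threshold):
    """L1 proximal step: shrink toward zero, setting small values to exactly 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def fit_l1_logistic(X, y, lam, lr=0.3, n_iter=500):
    """Minimize mean logistic loss + lam * sum(|w|) by proximal gradient descent."""
    n, p = len(X), len(X[0])
    prevalence = sum(y) / n
    b = math.log(prevalence / (1 - prevalence))  # intercept starts at the base rate
    w = [0.0] * p
    for _ in range(n_iter):
        probs = [1 / (1 + math.exp(-radiomics_score(row, b, w))) for row in X]
        resid = [pi - yi for pi, yi in zip(probs, y)]
        grad_w = [sum(r * row[j] for r, row in zip(resid, X)) / n for j in range(p)]
        b -= lr * sum(resid) / n  # the intercept is not penalized
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad_w)]
    return b, w

# Synthetic data: 60 cases, 5 standardized features; only feature 0 carries signal.
random.seed(0)
X = [[random.gauss(0, 1) for _ in range(5)] for _ in range(60)]
y = [1 if row[0] + 0.5 * random.gauss(0, 1) > 0 else 0 for row in X]

intercept, coefs = fit_l1_logistic(X, y, lam=0.05)
selected = [j for j, c in enumerate(coefs) if c != 0.0]
scores = [radiomics_score(row, intercept, coefs) for row in X]
```

Features whose coefficients are shrunk to exactly zero drop out of the signature, and a sufficiently large λ removes every feature. In practice λ would be chosen by cross-validated deviance rather than fixed by hand.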
Analyses of receiver operating characteristic (ROC) curves were performed to assess the predictive ability of the radiomics signature, and the areas under the ROC curves (AUCs) were obtained. The optimal cut-off value determined in the training cohort was applied to the testing cohort to derive the accuracy, sensitivity, and specificity.
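The evaluation step can be sketched as below: the AUC is computed from score rankings, a cut-off is chosen in the training cohort, and the same cut-off is applied unchanged to the testing cohort. The study does not state how the optimal cut-off was defined; maximizing the Youden index is assumed here, and all scores and labels are made up.

```python
def auc(scores, labels):
    """AUC as the probability that a positive case outscores a negative one (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def classify_metrics(scores, labels, cutoff):
    """Return (accuracy, sensitivity, specificity); score >= cutoff is called high-grade."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cutoff)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cutoff)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    return (tp + tn) / len(labels), tp / n_pos, tn / n_neg

def youden_cutoff(scores, labels):
    """Observed score that maximizes sensitivity + specificity - 1 (lowest one on ties)."""
    best_cut, best_j = None, -1.0
    for cut in sorted(set(scores)):
        _, sens, spec = classify_metrics(scores, labels, cut)
        if sens + spec - 1 > best_j:
            best_cut, best_j = cut, sens + spec - 1
    return best_cut

# Toy radiomics scores; 1 = high-grade, 0 = low-grade.
train_scores = [-1.2, -0.8, -0.5, -0.1, 0.2, 0.4, 0.9, 1.3]
train_labels = [0, 0, 0, 1, 0, 1, 1, 1]
test_scores = [-0.9, 0.1, -0.3, 1.1]
test_labels = [0, 0, 1, 1]

train_auc = auc(train_scores, train_labels)
cutoff = youden_cutoff(train_scores, train_labels)  # fixed on training data only
test_metrics = classify_metrics(test_scores, test_labels, cutoff)
```

Fixing the cut-off on the training data before touching the testing cohort keeps the reported testing accuracy, sensitivity, and specificity free of threshold tuning on the test cases.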
All statistical analyses were conducted using R software (www.r-project.org). All reported statistical significance levels are two-sided and significance was set at 0.05. The radiomics analysis workflow is shown in Fig. 2.

Radiomics analysis workflow. Experienced radiologists delineated the region of interest (ROI) covering the whole tumor. Radiomics features were extracted from the ROI, including size- and shape-based features, first-order histogram features, textural features, and wavelet features. LASSO regression was used to select the optimal set of features and to build the radiomics signature. ROC analysis was used to evaluate the classification ability of the radiomics signature. LASSO, least absolute shrinkage and selection operator; ROC, receiver operating characteristic.
Results
Patient characteristics
Patient characteristics in the training and testing cohorts are listed in Table 1. The training cohort comprised 59 patients with 39 and 20 low- and high-grade tumors, respectively. The testing cohort comprised 29 patients with 22 and 7 low- and high-grade tumors, respectively. No significant differences were found between cohorts for any clinical characteristics (all P > 0.05). No clinical variables were significantly associated with tumor histologic grade in the training cohort (all P > 0.05).
Baseline characteristics of patients in training and testing cohorts.*
Values are given as n (%) or mean ± SD.
*The characteristics were compared between the training cohort and testing cohort.
Construction and validation of radiomics signature
In total, we extracted 526 radiomics features from ADC maps. After robustness assessment, 437 features with an ICC ≥ 0.75 remained and 43 were significantly associated with tumor grading. There was no significant difference in mean ADC on original images between low- and high-grade carcinomas (P = 0.762).
Five optimal features were selected with LASSO regression (Fig. 3). The weighting coefficients of these features used to calculate radiomics scores are shown in Table 2. We found that wLHL_GLRLM_ShortRunLowGrayLevelEmphasis was significantly higher in the low-grade group compared to the high-grade group, while the other features were significantly higher in the high-grade group (all P < 0.05) (Table 2). The AUCs of optimal features were in the range of 0.66–0.75 (Fig. 4 and Table 2).

Feature selection was performed using the LASSO binary regression model. (a) Tuning parameter (λ) selection in the LASSO model by fivefold cross-validation with the minimum criteria. (b) LASSO coefficient profiles of the 43 texture features. A coefficient profile plot was produced against the log(λ) sequence. A blue dotted line was drawn at the selected value (log(λ) = –3.05), which resulted in five nonzero coefficients. LASSO, least absolute shrinkage and selection operator.
Optimal features for predicting histologic grade.
Values are given as median (IQR).
*Five optimal radiomics features were included in the radiomics signature.
AUC, area under the curve; IQR, interquartile range; LASSO, least absolute shrinkage and selection operator.

ROC curves of the optimal features for discriminating between low- and high-grade tumors in the training cohort. AUC, area under the curve; ROC, receiver operating characteristic.
The radiomics score was significantly different between low- and high-grade tumors in the training (P < 0.001) and testing (P = 0.008) cohorts, with the former being much lower (Fig. 5). The radiomics signature yielded AUCs of 0.83 in both the training and testing cohorts (Fig. 6). On the basis of the cut-off value obtained from the training cohort, the ADC-based radiomics signature achieved a favorable classification performance, with accuracies of 0.78 and 0.79, sensitivities of 0.65 and 0.71, and specificities of 0.85 and 0.82 in the training and testing cohorts, respectively (Fig. 5 and Table 3).
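The reported rates can be checked against the cohort sizes with simple arithmetic. The confusion-matrix counts below are not reported in the text; they are inferred here as the integer counts consistent with the reported sensitivities and specificities, taking high-grade as the positive class (20 of 59 training and 7 of 29 testing cases).

```python
def rates(tp, fn, tn, fp):
    """Return (accuracy, sensitivity, specificity), each rounded to two decimals."""
    total = tp + fn + tn + fp
    return (
        round((tp + tn) / total, 2),
        round(tp / (tp + fn), 2),
        round(tn / (tn + fp), 2),
    )

# Inferred counts (assumption): training has 20 high-grade and 39 low-grade tumors.
training = rates(tp=13, fn=7, tn=33, fp=6)
# Inferred counts (assumption): testing has 7 high-grade and 22 low-grade tumors.
testing = rates(tp=5, fn=2, tn=18, fp=4)
```

Under these inferred counts the rounded values reproduce the reported accuracies (0.78, 0.79), sensitivities (0.65, 0.71), and specificities (0.85, 0.82), so the three metrics are mutually consistent with the stated cohort composition.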

Radiomics scores for each patient with regard to tumor grade classification in the training and testing cohorts.

ROC curves of radiomics signature for discriminating between low- and high-grade tumors in the training and testing cohorts. AUC, area under the curve; ROC, receiver operating characteristic.
Predictive performance of the radiomics signature in the training and testing cohorts.
AUC, area under the curve; CI, confidence interval; NPV, negative predictive value; PPV, positive predictive value.
Discussion
We developed and validated an ADC-based radiomics signature for predicting histologic grade in SCC of the oral tongue and FOM. Based on the significant difference in radiomics scores between low- and high-grade tumors, we conclude that the radiomics signature could help differentiate the two groups.
The histologic grade of SCC of the oral tongue and FOM in pretreatment evaluation has long been considered an important prognostic factor (8,21). Invasive biopsy is the “gold standard” for pretreatment evaluation of histologic grade, but it carries a risk of bleeding, infection, and other adverse events. In addition, biopsy may not reflect the pathological characteristics of the whole tumor and is often limited by sampling bias (19). Among non-invasive methods, several quantitative parameters from functional MRI have been shown to correlate well with the pathological degree of tumor differentiation. A perfusion-weighted MRI study showed that tumor blood flow in poorly differentiated and undifferentiated HNSCC was significantly higher than that in well-to-moderately differentiated lesions (22). Razek et al. (23,24) suggested magnetic resonance spectroscopy as a potential method for tumor grading of HNSCCs due to the difference in choline/creatine (Ch/Cr) at different degrees of tumor differentiation. As the most commonly used form of functional MRI, DWI has been widely shown to be helpful in the grading of HNSCCs (9,24,25). However, most quantitative parameters were derived from the mean value of pixels within an ROI, which likely fails to reflect the heterogeneity of the entire tumor. In addition, although some studies obtained favorable power for grading using mean ADC with accuracies in the range of 0.7–0.88 (9,23), no testing cohorts were constructed to validate the performances. By integrating comprehensive information regarding water diffusion and heterogeneity within the whole tumor, we found that ADC-based radiomics had moderate power in discriminating between low- and high-grade SCCs with accuracies of 0.78 and 0.79 in the training and testing cohorts, respectively.
If the present findings are reproduced in independent datasets from other centers, an ADC-based radiomics signature could be used for computer-aided grading of SCCs of the oral tongue and FOM.
In agreement with previous studies (9,25), mean ADC on original ADC maps with b = 0 and 1000 s/mm² was not significantly different between low- and high-grade tumors. The more pronounced microscopic necrotic areas and peritumoral edematous zones with increased ADC values, together with the absence of cell keratinization known to hinder water diffusion, would at least partially explain the higher-than-expected ADC in high-grade tumors (9). Among the optimal radiomics features, positive skewness on original ADC maps was demonstrated in low- and high-grade tumors and was significantly greater in the latter. Skewness describes the asymmetry of histogram distribution, with a higher positive skew indicating that the voxel values cluster toward the lower end of the histogram (26). Studies have reported significantly higher ADC skewness in high- versus low-grade renal cell carcinomas (27), as well as for malignant compared with benign endometrial tumors (26). Our observation of higher ADC skewness within high-grade SCCs reflected a predominance of lower ADC values resulting from neoplasia-related cellularity. In addition, Histogram_Entropy, GLCM_Contrast, and GLRLM_ShortRunLowGrayLevelEmphasis were significantly different between low- and high-grade tumors. The histogram, GLCM, and GLRLM correspond to first-, second-, and high-order descriptions of the pixel distribution within an ROI and can characterize global, local, and regional tumor heterogeneity on different scales (28). Presumably, these findings reflect the fact that high-grade carcinomas have greater intra-tumor heterogeneity than low-grade tumors due to increased hypoxic voids, necrosis, and edema within the tumor.
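The interpretation of skewness can be made concrete with a short calculation. The sketch below uses the moment-based definition (third central moment divided by the 1.5 power of the variance, both with 1/N normalization), which is assumed here; the voxel values are made up and are not taken from the study data.

```python
def skewness(values):
    """Moment-based skewness: third central moment over variance to the power 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Made-up ADC values (10^-6 mm^2/s): most voxels are low, with a long tail of high values.
low_clustered = [600, 650, 700, 700, 750, 800, 1400, 1900]
# Made-up values spread evenly around their mean.
symmetric = [600, 700, 800, 900, 1000, 1100, 1200]

skew_low_clustered = skewness(low_clustered)
skew_symmetric = skewness(symmetric)
```

The first histogram gives a positive skewness because the bulk of its values sit at the low end, whereas the symmetric one gives zero, which is the pattern the paragraph above attributes to high-grade tumors.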
It should be noted that radiomics is an evolving field in oncology, and reproducibility is still a problem that urgently needs to be solved; data, segmentation, and statistical reproducibility remain the biggest concerns (29). Berenguer et al. (30) reported that radiomics features were significantly influenced by imaging acquisition parameters. As this was a retrospective pilot study, only two b values (0 and 1000 s/mm²) were used for calculating ADC maps, and the radiomics signature may not be suitable for DWI with other b values. In addition, ADC-based radiomics was influenced by both diffusion and perfusion effects. Therefore, intra-tumor heterogeneity regarding water diffusion deserves to be explored by radiomics based on DWI with multiple b values. A recent study reported that 63% of ADC-based radiomics features without image normalization had good inter-observer agreement in patients with cervical cancer (31). In the present study, we found a higher proportion of radiomics features (83%) with good inter-observer agreement, which may be attributable to differences in tumor type and size and image quality. With respect to the statistical approach, a supervised LASSO method was used to construct a radiomics signature to predict histologic grade. Together with cross-validation, it helps to avoid overfitting when analyzing large numbers of radiomics features with a relatively small sample size and has been used widely in radiomics research (32,33). From 43 candidate features associated with histologic grade, we identified five potential predictors to build an ADC-based radiomics signature. The signature achieved similar performance in discriminating between low- and high-grade tumors in the training and testing cohorts. Given the comparable proportions of low- and high-grade tumors in both cohorts, the similar predictive performances indicate that the LASSO-generated radiomics signature was relatively robust.
The present study had five main limitations. First, this single-center study was relatively small, and Grade II and III tumors were not analyzed separately because of the limited number of samples. Second, in order to extract valid radiomics parameters, some lesions with too few voxels (minimum tumor diameter < 10 mm) were excluded, which inevitably led to sample bias. Therefore, the radiomics signature derived from our study may not be helpful in the grading of small tumors. Third, delineating the whole tumor volume was challenging because some lesions were infiltrative with indistinct borders, and manual labeling was complex and time consuming. In the future, we will develop an automatic segmentation approach to define the ROIs. Fourth, because no multiple-testing correction was applied, an overall type I error rate above 0.05 could not be ruled out. However, such a correction was not considered appropriate given the small sample size and exploratory purpose of our preliminary study (34). Finally, the methodology needs to be improved. Most notably, the intra- and inter-scanner variances of radiomics features deserve exploration. In addition, as machine learning techniques develop, deep learning methods have emerged. More advanced machine-learning methods such as convolutional neural networks could help optimize models.
In conclusion, our preliminary study demonstrated that an ADC-based radiomics signature can be a useful and promising non-invasive method for discriminating between low- and high-grade SCCs of the oral tongue and FOM.
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received the following financial support for the research, authorship, and/or publication of this article: This work was supported by funds from the National Scientific Foundation of China (91859202, 81771901) and Shanghai Municipal Health Commission (20194Y0104).
