Validation of Three Scoring Risk-Stratification Models for Thyroid Nodules

Abstract

Background:

To minimize potential harm from overuse of fine-needle aspiration, Thyroid Imaging Reporting and Data Systems (TIRADSs) were developed for thyroid nodule risk stratification. The purpose of this study was to validate three scoring risk-stratification models for thyroid nodules that use ultrasonography features: a web-based malignancy risk-stratification system, the model developed by the Korean Society of Thyroid Radiology, and the model developed by the American College of Radiology.

Methods:

Using ultrasonography images, radiologists assessed thyroid nodules according to the following criteria: internal content, echogenicity of the solid portion, shape, margin, and calcifications. A total of 954 patients (mean age 50.8 years; range 13–86 years) with 1112 nodules were evaluated at the authors' institute from January 2013 to December 2014. The discrimination ability of the three models was assessed by estimating the area under the receiver operating characteristic curve. Additionally, Hosmer–Lemeshow goodness-of-fit statistics (calibration ability) were used to evaluate the agreement between the observed and expected number of nodules that were benign or malignant.

Results:

Thyroid malignancy was present in 37.2% (414/1112) of nodules. According to the 14-point web-based scoring risk-stratification system, malignancy risk ranged from 4.5% to 100.0% and was positively associated with an increase in risk scores. The areas under the receiver operating characteristic curve of the validation set were 0.884 in the web-based model, 0.891 in the Korean Society of Thyroid Radiology model, and 0.875 in the American College of Radiology model. The Hosmer–Lemeshow goodness-of-fit test indicated that the web-based scoring system showed the best-calibrated result, with a p-value of 0.078.

Conclusion:

The three scoring risk-stratification models using the ultrasonography features of thyroid nodules to stratify malignancy risk showed acceptable predictive accuracy and similar areas under the curve. The web-based scoring system demonstrated the strongest agreement in calibration ability analysis. The easily accessible automated web-based scoring risk-stratification system may overcome the complexity of the various Thyroid Imaging Reporting and Data System guidelines and provide simplified guidance on personalized and optimal management in real practice.

Introduction

Although widespread use of ultrasonography (US) has exponentially increased thyroid nodule detection to about 19.0–67.0%, malignancy is found in only about 9.0–15.0% of nodules evaluated using fine-needle aspiration (FNA) (1–3). To minimize potential harm from overuse of FNA, the Thyroid Imaging Reporting and Data System (TIRADS) was developed for thyroid nodule risk stratification (4–7). Two previous studies described the US patterns of thyroid nodules and the related pattern-associated rates of malignancy, but these patterns were not applicable to some thyroid nodules with multiple US features (4,6). Kwak et al. (7) weighted each suspicious US feature with the same risk of malignancy, and each category encompassed a wide range of malignancy risk, complicating its stratification capabilities. This “categorization” of TIRADS is a pattern-oriented approach that combines the US features of thyroid nodules and subdivides them into four or five categories. Thus, each category includes a wide range of malignancy risk. Meanwhile, for malignancy risk stratification, several attempts have been made to convert this “pattern-based” approach to a “score-based” approach (8,9). Such scoring risk-stratification models can be implemented into clinical practice with varying volumes of patients and in sites with varying levels of professional expertise and can be readily linked to reporting templates (10). Moreover, they permit more personalized management with >10 different ranges of malignancy risk scores for thyroid nodules. Therefore, the Korean Society of Thyroid Radiology (KSThR) has proposed a prediction model that assigns scores to US features (8). However, the scoring risk-stratification models are underused due to their complexity, and more clinicians have adopted an approach based on pattern recognition. To overcome these drawbacks, Choi et al. (11) developed a web-based automatic scoring system.
More recently, the American College of Radiology (ACR) developed guidelines based on the allocation of scores for US features (12).

With the development of various guidelines and their widespread use, the role of US in the personalized management of patients with thyroid nodules has been further emphasized, which necessitates the validation of these scoring risk-stratification models. The purpose of this study was to validate the web-based prediction system and the scoring risk-stratification models developed by the KSThR and the ACR as risk-stratification tools for thyroid nodules.

Patients and Methods

This retrospective study was approved by the Institutional Review Board of the Chung-Ang University Hospital; the requirement for informed consent was waived for data evaluation. Written informed consent for routine thyroid US and US-guided procedures was obtained from all patients before each US examination. In this patient cohort, the US characteristics of thyroid nodules were used to validate the malignancy risk stratification of the scoring system developed by Choi et al. (11), an online automatically calculated scoring system (www.gap.pe.kr/thyroidnodule.php).

Study population

The patient cohort was retrospectively collected from patients assessed from January 2013 to December 2014. Only patients who met at least one of the following criteria were included: (i) nodule size >5 mm; (ii) patients who underwent surgery or core-needle biopsy (CNB) after thyroid US; (iii) patients who underwent US-guided FNA cytology for benign thyroid lesions at least twice within a one-year interval (Bethesda category 2); and (iv) patients who underwent initial US-guided FNA cytology and US follow-up (>12 months after US-guided FNA cytology) for benign thyroid lesions. Nodules with indeterminate or non-diagnostic results were excluded from the study unless they were followed by diagnostic FNA or surgery. Finally, 954 patients (mean age 50.8 years; range 13–86 years) with 1112 nodules were included.

US findings for potential diagnostic determinants

US images for the evaluation of thyroid nodules were obtained using an iU22 ultrasound system (Philips Healthcare, Bothell, WA) equipped with a 50 mm linear array transducer with a bandwidth of 7–12 MHz. The scanning protocol in all cases included both transverse and longitudinal real-time imaging of thyroid nodules. Thyroid radiologists with 8 and 10 years of clinical experience in performing and evaluating thyroid US data reviewed all of the US images and reached a consensus.

When analyzing the US images, the radiologists were asked to assess thyroid nodules using criteria obtained from the literature (13–16), including internal content, echogenicity of the solid portion, shape, margin, and calcifications. Although the presence of intranodular vascularity might increase the risk of malignancy, there are no consistent results regarding the association of an intranodular vascularity pattern with the risk of malignancy (17–19). Accordingly, the three scoring systems did not recommend its inclusion on the basis of inconsistent literature about its value in differentiating malignant from benign nodules. Therefore, vascularity was excluded as a criterion. The internal content of a nodule was categorized according to the ratio of the cystic to the solid portion within a nodule, that is, solid (≤10% cystic), predominantly solid (>10% cystic and ≤50% cystic), predominantly cystic (>50% cystic), and spongiform appearance. A spongiform appearance was defined as the aggregation of multiple microcystic components consisting of >50% of the total nodule volume (15), and the solid component of spongiform nodules was not assessed. The echogenicity of the solid portion was classified as hyper- or isoechogenic, hypoechogenic, or marked hypoechogenic. When the echogenicity of the nodule was similar to that of the surrounding thyroid parenchyma, it was classified as isoechogenic. Hypoechogenicity was defined as decreased echogenicity compared to the thyroid parenchyma. Marked hypoechogenicity was defined as decreased echogenicity compared to that of the strap muscles (14).
The nodule shape was categorized as follows: ovoid to round (when the anteroposterior diameter of the nodule was equal to or less than its transverse diameter on a transverse or longitudinal plane), taller-than-wide (when the anteroposterior diameter of a nodule was longer than its transverse diameter on a transverse or longitudinal plane), or irregular (when a nodule was neither ovoid to round nor taller-than-wide). Margins were classified as well-defined smooth, microlobulated or spiculated, or ill-defined (15). Calcifications were categorized as microcalcifications, macrocalcifications, rim calcifications, or none. Microcalcifications were defined as calcifications ≤1 mm in diameter and were visualized as tiny, punctate, hyperechoic foci with or without acoustic shadows. If tiny, bright reflectors with a clear-cut, comet-tail artifact were observed with conventional US, these were considered to be colloid. Macrocalcifications were defined as hyperechoic foci >1 mm, and rim calcifications were defined as nodules with peripheral curvilinear or eggshell calcifications (13,20). When nodules had both types of calcifications, that is, macrocalcifications including rim calcifications intermingled with microcalcifications, the nodule was considered to have microcalcifications.

The US feature definitions differed among the three scoring risk-stratification models. Therefore, during image analysis, each US feature was categorized according to each of the scoring model definitions; for example, irregular margin in the ACR TIRADS was regarded as spiculated margin in the web-based scoring system. Additionally, extrathyroidal extension was evaluated according to the definitions in the ACR TIRADS (12). An online automatically calculated scoring system (www.gap.pe.kr/thyroidnodule.php) was used for malignancy risk stratification according to US thyroid characteristics. This system additionally provided the malignancy rate using various guidelines, including the web-based TIRADS (11), Korean TIRADS (16), the American Thyroid Association (ATA) guidelines (5), and the TIRADS proposed by Russ et al. (21). Suspicious nodules were defined as those with a score of ≥8 with the web-based TIRADS, high suspicion with Korean TIRADS, high suspicion with the ATA guidelines, and a score of ≥4B with the TIRADS proposed by Russ et al. (21).

Reference standard

For each thyroid nodule, the final diagnosis was determined by either histopathology or radiological follow-up. For malignant nodules, the pathological diagnosis was confirmed by surgery or CNB. For benign nodules, the pathological diagnosis was confirmed by surgery or CNB, FNA repeated at least twice with benign results, or a benign result on FNA and no change or reduced size on follow-up US (>12 months).

Data and statistical analysis

Multivariate logistic regression analysis was performed to estimate the risk of thyroid cancer associated with US findings. A risk score was assigned to each US feature of a thyroid nodule, and the total risk score was calculated according to the web-based model developed by Choi et al. (11) and the models developed by the KSThR (8) and the ACR (12) (Table 1). The models were validated by measuring their discrimination and calibration abilities separately. First, the discrimination ability of the models was assessed by estimating the area under the receiver operating characteristic (ROC) curve. Second, Hosmer–Lemeshow goodness-of-fit statistics (calibration ability) were used to evaluate the agreement between the observed and expected number of nodules that were benign or malignant. p-Values of <0.05 were considered statistically significant. All statistical analyses were carried out using IBM SPSS Statistics for Windows v23.0 (IBM Corp., Armonk, NY).
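As an illustration of the calibration measure named above, the following is a minimal Python sketch of the Hosmer–Lemeshow statistic. It is not the SPSS procedure used in this study; the function name, the grouping of nodules, and the example numbers are hypothetical and chosen only to show the arithmetic.

```python
from typing import Iterable, Tuple


def hosmer_lemeshow_statistic(
    groups: Iterable[Tuple[int, int, float]]
) -> Tuple[float, int]:
    """Return (chi-square statistic, degrees of freedom).

    Each group is (n_nodules, observed_malignant, mean_predicted_risk).
    """
    statistic = 0.0
    n_groups = 0
    for n, observed, mean_risk in groups:
        if not 0.0 < mean_risk < 1.0:
            raise ValueError("mean predicted risk must lie strictly between 0 and 1")
        expected = n * mean_risk
        # Malignant and benign cells combined:
        # (O - E)^2 / E + (O - E)^2 / (n - E) = (O - E)^2 / (E * (1 - mean_risk))
        statistic += (observed - expected) ** 2 / (expected * (1.0 - mean_risk))
        n_groups += 1
    # Conventional degrees of freedom: number of groups minus 2
    return statistic, n_groups - 2


# Hypothetical example: three risk groups of 100 nodules each
example_groups = [(100, 12, 0.10), (100, 48, 0.50), (100, 88, 0.90)]
statistic, dof = hosmer_lemeshow_statistic(example_groups)
print(f"Hosmer-Lemeshow chi-square = {statistic:.3f}, df = {dof}")
```

A p-value would then be read from a chi-square distribution with the returned degrees of freedom (for example with scipy.stats.chi2.sf, if SciPy is available). A p-value above 0.05 means that no significant lack of fit was detected.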

Table 1.

Thyroid Imaging Reporting and Data System Point Allocation Scheme by Three Scoring Systems

US characteristics	Web-based point allocation	KSThR point allocation	ACR point allocation
Composition
Solid	2	0	2
Predominantly solid	0	0	1 (mixed)
Predominantly cystic	0	0	1 (mixed)
Cystic	N/A	N/A	0
Spongiform	0	0	0
Shape
Ovoid to round	0	0	0 (wider-than-tall)
Taller-than-wide	2	1	3
Irregular	0	0	N/A
Margin
Well-defined smooth	0	0	0
Spiculated/microlobulated	2	5	2 (lobulated/irregular)
Ill-defined	1	1	0
Extrathyroidal extension	N/A	N/A	3
Echogenicity
Iso- or hyperechogenicity	0	0	1
Hypoechogenicity	2	2	2
Marked hypoechogenicity	4	6	3
Anechoic	N/A	N/A	0
Calcification
No calcification	0	0	0
Microcalcification	3	2	3
Macrocalcification	0	0 (include rim)	1
Rim calcification	2	N/A	2

US, ultrasound; KSThR, Korean Society of Thyroid Radiology; ACR, American College of Radiology.
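To make the point allocation in Table 1 concrete, the sketch below sums the points for one nodule under each of the three systems. The dictionary layout, the function name, and the example nodule are illustrative assumptions. The sketch also assumes exactly one category per feature, which is a simplification.

```python
# Points transcribed from Table 1, in the order (web-based, KSThR, ACR).
# None marks a category that the system does not define (N/A in Table 1).
SYSTEMS = ("web", "ksthr", "acr")

TABLE_1 = {
    "composition": {
        "solid": (2, 0, 2),
        "predominantly solid": (0, 0, 1),
        "predominantly cystic": (0, 0, 1),
        "cystic": (None, None, 0),
        "spongiform": (0, 0, 0),
    },
    "shape": {
        "ovoid to round": (0, 0, 0),
        "taller-than-wide": (2, 1, 3),
        "irregular": (0, 0, None),
    },
    "margin": {
        "well-defined smooth": (0, 0, 0),
        "spiculated/microlobulated": (2, 5, 2),
        "ill-defined": (1, 1, 0),
        "extrathyroidal extension": (None, None, 3),
    },
    "echogenicity": {
        "iso- or hyperechogenicity": (0, 0, 1),
        "hypoechogenicity": (2, 2, 2),
        "marked hypoechogenicity": (4, 6, 3),
        "anechoic": (None, None, 0),
    },
    "calcification": {
        "no calcification": (0, 0, 0),
        "microcalcification": (3, 2, 3),
        "macrocalcification": (0, 0, 1),
        "rim calcification": (2, None, 2),
    },
}


def total_score(system: str, nodule: dict) -> int:
    """Sum the Table 1 points for one nodule under the named system."""
    column = SYSTEMS.index(system)
    total = 0
    for feature, category in nodule.items():
        points = TABLE_1[feature][category][column]
        if points is None:
            raise ValueError(f"{category!r} is not defined in the {system} system")
        total += points
    return total


# Hypothetical nodule showing the most suspicious category of every feature
example_nodule = {
    "composition": "solid",
    "shape": "taller-than-wide",
    "margin": "spiculated/microlobulated",
    "echogenicity": "marked hypoechogenicity",
    "calcification": "microcalcification",
}
for name in SYSTEMS:
    print(name, total_score(name, example_nodule))
```

For this example nodule the totals are 13 (web-based), 14 (KSThR), and 13 (ACR).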

Results

Thyroid malignancy was present in 37.2% (414/1112) of the nodules in the present series, of which 78.7% (326/414) were papillary thyroid carcinomas, 1.9% (8/414) were follicular carcinomas, 13.0% (54/414) were follicular variant papillary thyroid carcinomas, and 6.3% (26/414) were other malignancies (Supplementary Table S1; Supplementary Data are available online at www.liebertpub.com/thy). Of the benign nodules, 9.9% (69/698) were confirmed with surgery. The nodule size ranged from 0.5 to 7.0 cm (mean 14.1 ± 9.9 mm). The proportion of nodules >1.0 cm in diameter was 57.9% (644/1112). By multivariate logistic regression analysis, taller-than-wide shape, a spiculated/microlobulated margin, marked hypoechogenicity, and microcalcification were US features that were significantly different between benign and malignant nodules (Supplementary Table S2).

Table 2 presents the risk of malignancy according to the 14-point web-based scoring risk-stratification system, which was 4.5% in thyroid nodules without suspicious malignant US features (a score of 0). The malignancy risk increased as the risk score increased and peaked at 100.0% in all scoring risk-stratification models (odds ratio [OR] = 1.808 [confidence interval (CI) 1.690–1.934], p < 0.001 for the web-based model; OR = 1.815 [CI 1.691–1.949], p < 0.001 for the KSThR model; OR = 1.750 [CI 1.645–1.861], p < 0.001 for the ACR TIRADS). The area under the curve (AUC) of the web-based scoring risk-stratification system was 0.884, with a CI of 0.863–0.905 (p < 0.001). According to the KSThR scoring risk-stratification model, the risk of malignancy was 4.1% in thyroid nodules without suspicious malignant US features and ranged from 4.1% to 100.0%, with an AUC of 0.891 ([CI 0.871–0.911]; p < 0.001; Table 3). According to the ACR TIRADS, the malignancy risk ranged from 4.5% to 100.0%, with an AUC of 0.875 ([CI 0.853–0.896]; p < 0.001; Table 4). Among category 2 (not suspicious) and category 5 (highly suspicious) nodules, about 7.0% (78/1112) of all nodules were above the risk threshold. Overall, 93.0% (1034/1112) of all nodules were below the established ACR TIRADS specified risk thresholds. The ROC curves for each model are shown in Figure 1. The Hosmer–Lemeshow goodness-of-fit test indicated that the web-based scoring risk-stratification model showed the best-calibrated result, with a p-value of 0.078, indicating the strongest agreement between the observed and model-predicted number of nodules that were malignant or benign across all of the strata, whereas the others showed p-values of <0.05 (Table 5).

FIG. 1.

Performance of the web-based, KSThR, and ACR scoring systems, with areas under the receiver operating characteristic (ROC) curve of 0.884, 0.891, and 0.875, respectively. KSThR, Korean Society of Thyroid Radiology; ACR, American College of Radiology.

Table 2.

Malignancy Risk According to the Web-Based Scoring System

Risk score	Malignancy risk (%)	Total number	Number of malignancies
0	4.5%	155	7
1	13.0%	23	3
2	4.5%	156	7
3	18.6%	102	19
4	15.9%	113	18
5	30.3%	155	47
6	44.7%	38	17
7	54.8%	84	46
8	78.8%	99	78
9	74.4%	39	29
10	94.0%	67	63
11	100.0%	41	41
12	92.3%	13	12
13	100.0%	27	27
Total	37.2%	1112	414
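The discrimination reported for the web-based model can be approximated directly from the grouped counts in Table 2. The sketch below computes the AUC as the probability that a randomly chosen malignant nodule has a higher score than a randomly chosen benign nodule, counting ties as one half. The function name is hypothetical, and this is a hand calculation on the tabulated counts rather than the SPSS analysis used in the study.

```python
# (risk score, total nodules, malignant nodules), transcribed from Table 2
TABLE_2 = [
    (0, 155, 7), (1, 23, 3), (2, 156, 7), (3, 102, 19), (4, 113, 18),
    (5, 155, 47), (6, 38, 17), (7, 84, 46), (8, 99, 78), (9, 39, 29),
    (10, 67, 63), (11, 41, 41), (12, 13, 12), (13, 27, 27),
]


def auc_from_grouped_counts(rows):
    """AUC = P(malignant score > benign score) + 0.5 * P(scores tie)."""
    rows = sorted(rows)  # ascending risk score
    total_malignant = sum(m for _, _, m in rows)
    total_benign = sum(n - m for _, n, m in rows)
    benign_below = 0   # benign nodules with a strictly lower score
    concordant = 0.0   # malignant-benign pairs ranked correctly (ties count half)
    for _, n, m in rows:
        benign_here = n - m
        concordant += m * (benign_below + 0.5 * benign_here)
        benign_below += benign_here
    return concordant / (total_malignant * total_benign)


auc_web = auc_from_grouped_counts(TABLE_2)
print(f"AUC from Table 2 counts: {auc_web:.3f}")
```

With the Table 2 counts this gives approximately 0.884, in line with the reported value. The same function can be applied to the counts in Tables 3 and 4.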

Table 3.

Malignancy Risk According to the KSThR Scoring System

Risk score	Malignancy risk (%)	Total number	Number of malignancies
0	4.1%	269	11
1	11.7%	94	11
2	12.5%	176	22
3	32.0%	175	56
4	55.9%	59	33
5	73.8%	61	45
6	78.9%	38	30
7	52.4%	42	22
8	78.9%	57	45
9	100.0%	24	24
10	97.8%	46	45
11	100.0%	7	7
12	96.6%	29	28
13	100.0%	8	8
14	100.0%	27	27
Total	37.2%	1112	414

Table 4.

Malignancy Risk According to the ACR Scoring System

Risk score	Malignancy risk (%)	Total number	Number of malignancies
0	—	0	0
1	0.0%	9	0
2	8.0%	50	4
3	4.5%	265	12
4	16.9%	195	33
5	50.0%	28	14
6	18.3%	71	13
7	42.0%	131	55
8	46.2%	52	24
9	66.1%	62	41
10	79.4%	131	104
11	84.6%	26	22
12	100.0%	44	44
13	100.0%	44	44
14	100.0%	4	4
Total	37.2%	1112	414

Table 5.

Validation of the Risk Score Models Using Discrimination and Calibration Abilities

Model	AUC	CI lower	CI upper	p-Value	Hosmer–Lemeshow test p-Value
Web-based	0.884	0.863	0.905	<0.001	0.078
KSThR	0.891	0.871	0.911	<0.001	<0.001
ACR	0.875	0.853	0.896	<0.001	0.040

Data indicate the number of lesions.

AUC, area under the curve; CI, confidence interval.

The web-based scoring risk-stratification model determined the malignancy rate of thyroid nodules with suspicious US features according to various malignancy risk-stratification systems. In the present study population, 87.4% (250/286) of the nodules classified as suspicious by the web-based system were malignant, a higher positive predictive value than that of the various TIRADS guidelines applied by this system (i.e., French TIRADS 74.0% [322/435]; ATA guidelines 81.4% [293/360]; and K-TIRADS 81.7% [286/350]; Supplementary Table S3). The ATA pattern-based system could not classify about 20.6% (229/1112) of the nodules.
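The 250/286 figure for the web-based system follows from Table 2 once the suspicious cutoff (score ≥8) is applied. A minimal sketch of that arithmetic, with hypothetical variable names:

```python
# Rows of Table 2 at or above the suspicious cutoff (score >= 8):
# (risk score, total nodules, malignant nodules)
SUSPICIOUS_ROWS = [
    (8, 99, 78), (9, 39, 29), (10, 67, 63),
    (11, 41, 41), (12, 13, 12), (13, 27, 27),
]

flagged = sum(n for _, n, _ in SUSPICIOUS_ROWS)         # nodules called suspicious
true_positive = sum(m for _, _, m in SUSPICIOUS_ROWS)   # of those, malignant
ppv = true_positive / flagged                            # positive predictive value

print(f"PPV at score >= 8: {true_positive}/{flagged} = {100 * ppv:.1f}%")
```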

Discussion

The current validation study has revealed that the web-based, KSThR, and ACR scoring risk-stratification models show acceptable predictive accuracy with similar AUCs. In particular, the web-based scoring system showed the highest agreement in calibration ability. Furthermore, the web-based scoring system has the advantage of speed, because its online automatic calculation overcomes the complexity of previous scoring risk-stratification models.

Several TIRADS have been developed for malignancy risk stratification (4,6,7) that incorporate US features to categorize thyroid nodules and recommend cytological diagnosis. Two previous studies described the US patterns of thyroid nodules and the related pattern-associated rates of malignancy, but these patterns were not applicable to some thyroid nodules with multiple US features (4,6). Kwak et al. (7) used suspicious US features and related the risk of malignancy to the number of malignancies, but they assigned the same weight to each suspicious US feature, and one category (category 4c) was associated with a wide range of malignancy risk (21.0–91.9%). To integrate the combination of specific US features with different odds ratios for malignancy risk, quantitative grading systems were established that stratify malignancy risk by combining the suspicious US features possessed by thyroid nodules and categorizing the results (16). One such model, the Korean TIRADS, was recently published by the KSThR (16) and was validated prospectively in a multicenter study (22). Recently, the ATA management guidelines for thyroid nodules also stratified the risk of malignancy into five categories (5). Meanwhile, for malignancy risk stratification, several attempts have been made to convert this “pattern-based” approach to a “score-based” approach. Previously, Park et al. (9) proposed an equation for predicting the probability of malignancy based on 12 US features, but it was difficult to apply the proposed equation to each thyroid nodule in clinical practice. To overcome the associated complexity, the KSThR (8) developed a prediction model that assigned a different risk score to each suspicious US feature and obtained the malignancy risk by summing the total score. The score-based approach has the advantage of achieving more personalized management, with >10 subdivided ranges for the malignancy risk scores of the thyroid nodules.
However, in real practice, the complexity and lack of congruence of these previous systems have limited their adoption and may have been more cumbersome for those more used to pattern-based analysis. Due to a greater emphasis on personalized management of patients with thyroid nodules and to overcome the low reproducibility and practicality, Choi et al. (11) developed a web-based automatic scoring risk-stratification system using US characteristics. This system also classifies nodules according to various guidelines, such as the French TIRADS (23), ATA guidelines (5), and Korean TIRADS (16). More recently, the ACR developed TIRADS (12) by allocating points to more suspicious features, summing the points, and determining the TIRADS category of nodules. However, validation of these scoring risk-stratification models remains to be conducted.

All scoring risk-stratification models investigated in the present study proved to be acceptable tools for effective malignancy risk stratification in clinical practice, with an AUC range of 0.875–0.891. A recent study evaluating thyroid nodules according to the 2015 ATA guidelines yielded an AUC ranging from 0.721 to 0.839 with respect to thyroid nodule size (24). Moreover, the web-based scoring system showed agreement in both discrimination and calibration abilities. This may be due to the involvement of multiple centers in the design of this web-based predictive model, and the results may thus be more reproducible and generalizable. Choi et al. (11) collected data from nine affiliated hospitals and used true external validation set data from different time periods and institutions. In contrast, the KSThR (8) used split-sample development and validation sets. In addition, the strongest advantage of the web-based scoring system is that it is linked to an online risk calculator (www.gap.pe.kr/thyroidnodule.php), which permits rapid calculation and immediate estimation of the possible malignancy risk based on US findings during their interpretation after US examination and can thus guide clinical decision making regarding the need for biopsy. For more convenient use of the web-based scoring system, it is expected to be embedded in some computerized system that transfers results directly into a report or electronic medical record form. Additionally, it is believed that this web-based scoring risk-stratification system could also help to increase biopsy efficacy for sub-centimeter nodules. The malignancy rate in thyroid nodules <1 cm in size ranged from 0.0% to 100.0%, with a tendency for increased malignancy risk according to suspicious features (Supplementary Table S4). 
The most recent ATA guidelines (5) and ACR TIRADS (12) state that thyroid nodules <1 cm in size do not need to be evaluated with cytology unless aggressive features such as lymph node metastases, distant metastases, and apparent extrathyroidal extension are found. However, there are differing opinions on the management of small nodules, and some reports from Japan (25) and Korea (16) recommend using FNA for sub-centimeter nodules >5 mm with highly suspicious US patterns. Accordingly, the ACR TIRADS recognizes that some advocate active surveillance, ablation, or lobectomy for papillary microcarcinomas, and therefore biopsy of 5–9 mm highly suspicious nodules may be appropriate under certain circumstances (26–29). A recently published article addressed the risk stratification of ideal, appropriate, and inappropriate candidates for active surveillance (28). Taken together, recently published guidelines recommend a more conservative approach for sub-centimeter lesions with or without biopsy-proven papillary thyroid carcinomas.

Clinicians are often interested in developing points-based risk scoring systems and value simple scoring systems, as well as the ease with which they can be used in routine clinical practice (30). The recently developed ACR TIRADS determines the final category according to the following process: nodule detection, evaluation of the score of each US feature, summation of scores, determination of the category, size matching, and FNA decision. In the current era of mobile computing and web-based risk calculators, this scoring risk-stratification model can be more readily applied in real practice due to straightforward implementation of the automatic score-calculating system (11). In accordance with this trend, Cheng et al. (31) have recently developed an automatic score-calculating system based on the ACR TIRADS (12). Regarding the tendency for personalized medicine, the three scoring risk-stratification models provide >10 ranges of malignancy risk scores. Indeed, the score of the ACR TIRADS can be up to 16. In a recent validation study of the ACR TIRADS (10), the cancer risk of each category was 0.3%, 1.5%, 4.8%, 9.1%, and 35.0%. However, category 5 (highly suspicious), allocated a median score of ≥7, comprises a relatively low malignancy risk (35.0%) and a wide range of malignancy probability. This may be because this scoring risk-stratification model was based on a review of the literature, expert opinion, and preliminary analysis of patients, with arbitrary score assignation rather than by multivariate analysis to estimate the thyroid cancer risk associated with each US feature. For risk analysis, the odds ratios for each US feature from the logistic regression model were used (8,11), and this statistical aspect may permit more effective risk stratification with future revisions of the ACR TIRADS. Regarding the lowest malignancy risk assigned to benign nodules without any suspicious malignant features or a score of 0, a lower biopsy rate can be expected. 
According to the web-based scoring risk-stratification model, the overall risk of malignancy for score 0 nodules was <5% (11). Therefore, the number of unnecessary biopsies may be reduced. In the current study, the malignancy risk in score 0 nodules was 4.5% with the web-based scoring system. In contrast, the risk of malignancy was 7.3% in the training set and 6.2% in the validation set of the KSThR (8), which are higher than in the benign category of The Bethesda System for Reporting Thyroid Cytopathology (32).
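The ACR TIRADS steps listed above (summation of scores, determination of the category, size matching, and FNA decision) can be sketched as follows. The point ranges and size cutoffs are those of the ACR white paper (12), not results of this study, and should be checked against that source before use. Treating a total of 1 point as TR1, and the function names, are assumptions made for illustration.

```python
from typing import Optional


def acr_tirads_level(points: int) -> str:
    """Map a summed ACR point total to a TI-RADS level (TR1 to TR5)."""
    if points >= 7:
        return "TR5"  # highly suspicious
    if points >= 4:
        return "TR4"  # moderately suspicious
    if points == 3:
        return "TR3"  # mildly suspicious
    if points == 2:
        return "TR2"  # not suspicious
    return "TR1"      # benign (0 points; 1 point is handled here by assumption)


# Minimum maximal diameter (cm) at which FNA is recommended, per level.
# TR1 and TR2 have no FNA threshold.
FNA_THRESHOLD_CM = {"TR3": 2.5, "TR4": 1.5, "TR5": 1.0}


def fna_recommended(points: int, max_diameter_cm: float) -> bool:
    """Size matching: compare nodule size with the cutoff for its level."""
    threshold: Optional[float] = FNA_THRESHOLD_CM.get(acr_tirads_level(points))
    return threshold is not None and max_diameter_cm >= threshold


# Hypothetical nodule: 8 points, 1.2 cm in maximal diameter
print(acr_tirads_level(8), fna_recommended(8, 1.2))
```

The size cutoff enters only at the management step, which matches the separation of risk estimation from size described in this study.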

Recently, an artificial intelligence-adapted US machine was developed for thyroid nodule characterization (S-Detect for Thyroid). Choi et al. (33) validated the system and demonstrated that it shows satisfactory diagnostic performance (sensitivity 90.7%; specificity 74.6%) for thyroid malignancy. The thyroid US computer-aided diagnosis system using artificial intelligence investigated in this study was installed within the US system, allowing real-time decision making regarding the need for FNA (33). It is expected that future implementation of the web-based scoring risk-stratification model and the artificial intelligence-adapted US machine will guide and simplify personalized management and reduce analysis time.

The present study has some limitations. First, due to its retrospective design, a selection bias is inevitable, and it could not be determined whether the patients are representative of the general population. The usefulness of this scoring system requires confirmation by prospective studies with large cohorts representing samples from the general population. Second, the interobserver variability in the interpretation of US images between the two radiologists was not evaluated. Third, many benign nodules (90.1%) were not confirmed by surgery. Fourth, the malignancy rate of thyroid nodules included in the analysis was relatively high (37.2%), which may be due to the fact that the authors' institution is a referral center and many nodules with suspicious US features warranted biopsy; in addition, many patients with indeterminate results who were lost to follow-up were omitted. Lastly, about 42.1% of nodules measured <1 cm in this study. The fact that very small nodules were subject to biopsy can be questioned. However, the aim of this study was to test the probability for malignancy as independently as possible. As discussed above, there are differing opinions on the management of small thyroid nodules, and patient preferences must also be considered. A recently published study (34) demonstrated that biopsy should be considered for thyroid nodules <1 cm prior to active surveillance to prevent unnecessary active surveillance and patient anxiety. For these reasons, size has been included as an essential data point for the establishment of the management plan, but not for risk estimation. Previous literature has shown that the risk of malignancy is not dependent on nodule size, but the management (observation, lobectomy, and thyroidectomy) can depend on the nodule size, among other factors (13,35,36).
Future studies combining US findings and other factors such as size, clinical characteristics, and family history of cancer may refine management plans offered through this web-based system.

In summary, the easily accessible and reliable automated scoring risk-stratification system described herein will support clinical decision making, increase FNA efficacy, compensate for the complexity of various TIRADS, and enable more personalized and optimized management. The current study is the first to evaluate the diagnostic efficacy of various scoring risk-stratification models, and it is expected that it will be followed by a future prospective study with a large cohort.

Footnotes

Author Disclosure Statement

The authors declare no conflicts of interest.

References

1. Brander, Viikinkoski, Nickels, Kivisaari. 1991. Thyroid gland: US screening in a random adult population. Radiology, 181:683–687.

2. Frates, Benson, Charboneau, Cibas, Clark, Coleman, Cronan, Doubilet, Evans, Goellner, Hay, Hertzberg, Intenzo, Jeffrey, Langer, Larsen, Mandel, Middleton, Reading, Sherman, Tessler. 2005. Management of thyroid nodules detected at US: Society of Radiologists in Ultrasound consensus conference statement. Radiology, 237:794–800.

3. Tan, Gharib. 1997. Thyroid incidentalomas: management approaches to nonpalpable nodules discovered incidentally on thyroid imaging. Ann Intern Med, 126:226–231.

4. Cheng, Lee, Lin, Chuang, Chien, Liu. 2013. Characterization of thyroid nodules using the proposed thyroid imaging reporting and data system (TI-RADS). Head Neck, 35:541–547.

5. Haugen, Alexander, Bible, Doherty, Mandel, Nikiforov, Pacini, Randolph, Sawka, Schlumberger, Schuff, Sherman, Sosa, Steward, Tuttle, Wartofsky. 2016. 2015 American Thyroid Association management guidelines for adult patients with thyroid nodules and differentiated thyroid cancer: the American Thyroid Association Guidelines Task Force on Thyroid Nodules and Differentiated Thyroid Cancer. Thyroid, 26:1–133.

6. Horvath, Majlis, Rossi, Franco, Niedmann, Castro, Dominguez. 2009. An ultrasonogram reporting system for thyroid nodules stratifying cancer risk for clinical management. J Clin Endocrinol Metab, 94:1748–1751.

7. Kwak, Han, Yoon, Moon, Son, Park, Jung, Choi, Kim. 2011. Thyroid imaging reporting and data system for US features of nodules: a step in establishing better stratification of cancer risk. Radiology, 260:892–899.

8. Kwak, Jung, Baek, Choi, Jung, Kim, Lee, Moon, Park, Ryu, Shin, Son, Sung, Na DG; Korean Society of Thyroid Radiology; Korean Society of Radiology. 2013. Image reporting and characterization system for ultrasound features of thyroid nodules: multicentric Korean retrospective study. Korean J Radiol, 14:110–117.

9. Park, Lee, Jang, Kim, Yi, Lee, Kim. 2009. A proposal for a thyroid imaging reporting and data system for ultrasound features of thyroid carcinoma. Thyroid, 19:1257–1264.

10. Middleton, Teefey, Reading, Langer, Beland, Szabunio, Desser. 2017. Multiinstitutional analysis of thyroid nodule risk stratification using the American College of Radiology Thyroid Imaging Reporting and Data System. AJR Am J Roentgenol, 208:1331–1341.

11. Choi, Baek, Shim, Lee, Shong, Ha, Lee. 2015. Web-based malignancy risk estimation for thyroid nodules using ultrasonography characteristics: development and validation of a predictive model. Thyroid, 25:1306–1312.

12. Tessler, Middleton, Grant, Hoang, Berland, Teefey, Cronan, Beland, Desser, Frates, Hammers, Hamper, Langer, Reading, Scoutt, Stavros. 2017. ACR Thyroid Imaging, Reporting and Data System (TI-RADS): White Paper of the ACR TI-RADS Committee. J Am Coll Radiol, 14:587–595.

13. Moon, Baek, Jung, Kim, Kwak, Lee, Na, Park. 2011. Ultrasonography and the ultrasound-based management of thyroid nodules: consensus statement and recommendations. Korean J Radiol, 12:1–14.

14. Kim, Park, Chung, Oh, Kim, Lee, Yoo. 2002. New sonographic criteria for recommending fine-needle aspiration biopsy of nonpalpable solid nodules of the thyroid. AJR Am J Roentgenol, 178:687–691.

15. Moon, Jung, Lee, Na, Baek, Lee, Kim, Byun, Lee. 2008. Benign and malignant thyroid nodules: US differentiation—multicenter retrospective study. Radiology, 247:762–770.

16. Shin, Baek, Chung, Ha, Kim, Lee, Lim, Moon, Na, Park, Choi, Hahn, Jeon, Jung, Kim, Kwak, Lee, Park
, Sung

. 2016. Ultrasonography diagnosis and imaging-based management of thyroid nodules: revised Korean Society of Thyroid Radiology consensus statement and recommendations. Korean J Radiol, 17:370–395.

17.

, Ding

, Xu

, Song

, Huang

, Wang

. 2014. Diagnostic performances of various gray-scale, color Doppler, and contrast-enhanced ultrasonography findings in predicting malignant thyroid nodules. Thyroid, 24:355–363.

18.

Moon

, Kwak

, Kim

, Son

, Kim

. 2010. Can vascularity at power Doppler US help predict thyroid malignancy?. Radiology, 255:260–269.

19.

Zhou

, Zhou

, Zhan

, Zhou

, Dong

. 2014. Maximal, minimal, and mean pulsed Doppler parameters: which should be utilized in the diagnosis of thyroid nodules?. Clin Radiol, 69:e477–484.

20.

, Kim

, Ryoo

, Jung

. 2016. Thyroid nodules with isolated macrocalcification: malignancy risk and diagnostic efficacy of fine-needle aspiration and core needle biopsy. Ultrasonography, 35:212–219.

21.

Russ

. 2016. Risk stratification of thyroid nodules on ultrasonography with the French TI-RADS: description and reflections. Ultrasonography, 35:25–38.

22.

, Moon

, Na

, Lee

, Choi

, Kim

. 2016. A multicenter prospective validation study for the Korean Thyroid Imaging Reporting and Data System in patients with thyroid nodules. Korean J Radiol, 17:811–821.

23.

Russ

, Royer

, Bigorgne

, Rouxel

, Bienvenu-Perrard

, Leenhardt

. 2013. Prospective evaluation of thyroid imaging reporting and data system on 4550 nodules with and without elastography. Eur J Endocrinol, 168:649–655.

24.

, Gu

, Ye

, Xu

, Wu

, Shao

, Liu

, Lu

, Hua

, Shi

, Liang

, Xu

, Tang

, Liu

, Wu

. 2017. Thyroid nodule sizes influence the diagnostic performance of TIRADS and ultrasound patterns of 2015 ATA guidelines: a multicenter retrospective study. Sci Rep, 7:43183.

25.

Ito

, Oda

, Miyauchi

. 2016. Insights and clinical questions about the active surveillance of low-risk papillary thyroid microcarcinomas [Review]. Endocr J, 63:323–328.

26.

Leboulleux

, Tuttle

, Pacini

, Schlumberger

. 2016. Papillary thyroid microcarcinoma: time to shift from surgery to active surveillance?. Lancet Diabetes Endocrinol, 4:933–942.

27.

Ito

, Miyauchi

, Inoue

, Fukushima

, Kihara

, Higashiyama

, Tomoda

, Takamura

, Kobayashi

, Miya

. 2010. An observational trial for papillary thyroid microcarcinoma in Japanese patients. World J Surg, 34:28–35.

28.

Brito

, Ito

, Miyauchi

, Tuttle

. 2016. A clinical framework to facilitate risk stratification when considering an active surveillance alternative to immediate biopsy and surgery in papillary microcarcinoma. Thyroid, 26:144–149.

29.

Brito

, Hay

, Morris

. 2014. Low risk papillary thyroid cancer. BMJ, 348:g3045.

30.

Austin

, Lee

, D'Agostino

, Fine

. 2016. Developing points-based risk-scoring systems in the presence of competing risks. Stat Med, 35:4056–4072.

31.

Phillip

. 2017. TIRADS Calculator for Thyroid Nodules. Available at: www-hsc.usc.edu/∼phillimc/calc/tirads.html (accessed July 13, 2017 ).

32.

Cibas

, Ali

. 2009. The Bethesda System For Reporting Thyroid Cytopathology. Am J Clin Pathol, 132:658–665.

33.

Choi

, Baek

, Park

, Shim

, Kim

, Shong

, Lee

. 2017. A computer-aided diagnosis system using artificial intelligence for the diagnosis and characterization of thyroid nodules on ultrasound: initial clinical assessment. Thyroid, 27:546–552.

34.

, Kim

, Baek

. 2017. Detection of malignancy among suspicious thyroid nodules <1 cm on ultrasound with various Thyroid Image Reporting and Data Systems. Thyroid, 27:1307–1315.

35.

Cooper

, Doherty

, Haugen

, Kloos

, Lee

, Mandel

, Mazzaferri

, McIver

, Pacini

, Schlumberger

, Sherman

, Steward

, Tuttle

. 2009. Revised American Thyroid Association management guidelines for patients with thyroid nodules and differentiated thyroid cancer. Thyroid, 19:1167–1214.

36.

Gharib

, Papini

, Paschke

, Duick

, Valcavi

, Hegedus

, Vitti

. 2010. American Association of Clinical Endocrinologists, Associazione Medici Endocrinologi, and European Thyroid Association medical guidelines for clinical practice for the diagnosis and management of thyroid nodules. Endocr Pract, 16:1–43.