Abstract
Background:
Several thyroid ultrasound (TUS) findings have been associated with an increased risk for thyroid cancer; however, there is no consensus as to the format and style for reporting the results of TUS. The objective of this study was to discover the features indicative of malignancy in thyroid nodules based on TUS, generate an equation using these features that would be predictive of malignancy in thyroid nodules, and stratify the results of this equation into TUS categories reflecting the probability of malignancy.
Methods:
We obtained odds ratios of TUS findings indicative of malignancy and probability of malignancy for each nodule as determined by logistic regression analysis of ultrasound (US) findings in 1694 patients who had US-guided fine-needle aspiration biopsy. We then generated an equation to predict the probability of malignancy based on TUS and developed categories ranging from lowest to highest probability of malignancy. We evaluated the reliability of this equation and the categories using cytology and histopathology information regarding malignancy in the thyroid nodules.
Results:
We characterized 12 aspects of thyroid nodules as seen on TUS and developed an equation to predict Pus, the probability of a nodule being malignant based on these US findings. The equation was Pus = 1/(1 + e^(−z)), where e is the mathematical constant (approximately 2.71828) and z is the logit of a thyroid nodule being malignant. Pus was stratified into five categories based on the probability of a nodule being malignant as indicated by the findings (TUS 1, benign; TUS 2, probably benign; TUS 3, indeterminate; TUS 4, probably malignant; TUS 5, malignant). There was a significant correlation between the cytological category and the TUS 1 through TUS 5 categories (r = 0.491, p < 0.001).
Conclusions:
We propose an equation to predict the probability of malignancy in thyroid nodules based on 12 features of thyroid nodules as noted on TUS. This equation, and the stratification of its results into categories, should be useful in reporting the findings of US for thyroid nodules and in guiding management decisions.
Introduction
Ultrasound (US) is useful in detecting a focal lesion, determining whether a lesion is cystic or solid, and guiding US-guided fine-needle aspiration biopsy (USG-FNAB). Several US findings are associated with an increased risk for thyroid cancer (6–8). These include calcifications, hypoechogenicity, irregular margins, absence of a halo, predominantly solid composition, and intranodular vascularity (6,7,9–13). However, the sensitivities, specificities, and negative and positive predictive values for these criteria vary greatly among studies (14). Moreover, the US features of malignant and benign nodules overlap, and the interpretations and recommendations of radiologists vary considerably in clinical settings (10,15). In addition, there is no consensus as to what format should be used for reporting the results of thyroid US (TUS). Finally, the terminology used by radiologists is sometimes ambiguous.
We sought, therefore, to develop a model to predict thyroid malignancy based on US findings. The features we evaluated as possible predictors of malignancy were based, with some modifications, on previously published criteria. We categorized the findings according to the distribution of the probability of malignancy obtained by logistic regression analysis. Our goal was to develop a categorical reporting system that could (i) simplify the interpretation of the US findings by radiologists, (ii) reduce confusion among the decision-making physicians as to how incidental nodules should be managed, and (iii) reduce the number of unnecessary invasive procedures.
Materials and Methods
We retrospectively evaluated 2679 patients who had thyroid nodules and had undergone USG-FNAB at our tertiary medical center between July 2001 and December 2006. US had been performed in these patients with an HDI 5000 scanner (Advanced Technology Laboratories, Bothell, WA) using electronically focused near-field probes at a bandwidth of 7–12 MHz. The mean age of the patients was 48.8 years (range, 14–83 years). Institutional review board approval was obtained; informed consent was not required for this retrospective study.
The cytology reading of each USG-FNAB was classified for each nodule according to a five-category scheme based on a published category system: THY1, inadequate; THY2, benign; THY3, indeterminate; THY4, suspiciously malignant; THY5, malignant (16,17). An inadequate specimen (THY1) was defined as the presence of fewer than five clusters of follicular thyroid cells in each preparation (10). Cytological diagnoses of THY2 included normal thyroid tissue, nodular hyperplasia or other benign conditions such as thyroiditis, and benign follicular tumor. Cytological diagnoses of THY3 included atypical cellularity without surgical confirmation, the presence of a follicular neoplasm, and the presence of a Hürthle cell neoplasm. THY4 was defined as the presence of patterns suspicious for, but not diagnostic of, a malignant nodule. THY5 consisted of a cytologically or pathologically malignant nodule. When a patient had more than one cytological result, the result most likely to be malignant was assigned. Thus, if the patient had one THY1 nodule and one THY2 nodule, the assigned reading was THY2; with one THY2 and one THY3 result, THY3 was assigned. In cases where more than one USG-FNAB was done, the result of the initial USG-FNAB was assigned, because the aim of this study was related to the usefulness of US for triage of thyroid nodules. USG-FNAB could be performed for more than one nodule per patient. We labeled each specimen to identify its US and cytology findings. Because only the result most likely to be malignant was assigned to each patient, the number of USG-FNABs enrolled was the same as the number of patients.
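The per-patient assignment rule described above amounts to taking the highest THY category among a patient's readings. The following is a minimal sketch in Python, assuming the categories are ordered THY1 < THY2 < THY3 < THY4 < THY5 by suspicion of malignancy, as the two examples in the text imply. The function name `assign_patient_reading` is hypothetical and is not taken from the study.

```python
# Cytology categories ordered from least to most suspicious for malignancy.
THY_ORDER = ["THY1", "THY2", "THY3", "THY4", "THY5"]


def assign_patient_reading(readings):
    """Return the reading most likely to be malignant among a patient's nodules."""
    if not readings:
        raise ValueError("at least one cytology reading is required")
    # list.index raises ValueError for an unknown category, which is the
    # desired behavior for malformed input.
    return max(readings, key=THY_ORDER.index)


print(assign_patient_reading(["THY1", "THY2"]))  # THY2
print(assign_patient_reading(["THY2", "THY3"]))  # THY3
```

The two printed cases reproduce the examples given in the text.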
Histopathological diagnoses were not available for all thyroid nodules that were considered benign. We classified nodules as benign if they were classified on cytology as THY2. Nodules classified on cytology as THY1 or THY3 were classified as benign if histopathological examination after surgery confirmed them to be benign, and as malignant if histopathological examination confirmed them to be malignant.
The classification of THY4 and THY5 nodules as benign or malignant was based on histopathological examination. Patients with THY4 and THY5 nodules who did not have surgery were excluded from the study. Among 779 nodules with a THY1 reading, 709 were excluded from the analysis because the patients did not undergo thyroid surgery, and 70 (malignant = 16, benign = 54) were included because the diagnosis was confirmed by surgery and histopathological examination. Among 390 nodules with a THY3 reading, 256 were excluded from the analysis because the patients did not undergo thyroid surgery, and 134 (malignant = 114, benign = 20) were included because the diagnosis was confirmed by surgery and histopathological examination. Consequently, statistical analysis was performed with 1694 (63.2%) of the USG-FNAB results for the total of 2679 patients. The percentage of nodules classified as malignant was 27.4% (464/1694), and the percentage classified as benign was 72.6% (1230/1694). The malignant nodules consisted of 352 papillary carcinomas, seven follicular carcinomas, one medullary carcinoma, and four Hürthle cell carcinomas.
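The reference-standard rules above can be summarized as a small decision function, followed by a check of the reported inclusion counts and percentages. This is only a sketch of the rules as stated in the text: the function name `reference_label` and the string labels are hypothetical, and `None` is used here to mean "excluded from the analysis".

```python
def reference_label(cytology, histology=None):
    """Return 'benign', 'malignant', or None (excluded from the analysis).

    cytology:  'THY1' through 'THY5'
    histology: 'benign', 'malignant', or None when no surgery was performed
    """
    if histology not in (None, "benign", "malignant"):
        raise ValueError(f"unknown histology result: {histology!r}")
    if cytology == "THY2":
        # Benign cytology was accepted as benign without histopathology.
        return "benign"
    if cytology in ("THY1", "THY3", "THY4", "THY5"):
        # These categories required histopathological confirmation;
        # without surgery (histology is None) the nodule was excluded.
        return histology
    raise ValueError(f"unknown cytology category: {cytology!r}")


# Counts and percentages reported in the text.
print(779 - 709, 390 - 256)          # 70 134 (THY1 and THY3 nodules included)
print(round(100 * 1694 / 2679, 1))   # 63.2 (% of patients analyzed)
print(round(100 * 464 / 1694, 1))    # 27.4 (% malignant)
print(round(100 * 1230 / 1694, 1))   # 72.6 (% benign)
```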
We transferred the US data for each patient in the analysis to a picture archiving and communication system and recorded the features of their thyroid nodules. Based on previously published criteria (2,6–11,15,18–21) with some modifications, we recorded seven dichotomous US features (goiter, taller shape, echotexture, presence of microcalcifications, macrocalcifications, and eggshell calcifications, and lymph node abnormality), three discrete nondichotomous US features (margin, echogenicity, and composition), and two numerical variables (size and tallness). Numerical variables such as the anterior-to-posterior diameter and the transverse diameter were measured to one decimal place. The definitions of the US features are summarized in Table 1.
US findings were analyzed using the χ² test and t-test for univariate studies and logistic regression analysis for multivariate studies (22,23). Using logistic regression analysis, we analyzed the relationship between each US finding that was significant in the χ² test or t-test and the presence of a malignant nodule, as established by histopathological examination of the nodule after surgery. Independent variables (x variables: US findings) and the dichotomous dependent variable (y variable: a malignant nodule or a benign nodule) were compared by logistic regression analysis. Using the backward elimination method, insignificant variables were excluded from the final regression analysis. After this final analysis, we obtained the regression coefficient (β) and exp(β) of the statistically useful US findings. We employed a regression equation to predict the presence of a malignant nodule. Using this equation, we simplified the distribution of the probability of malignancy for each nodule using 95% and 99% confidence intervals (CI) and summarized the representative US findings so that they would be applicable in a clinical setting. Finally, we correlated the category system with the cytological results.
The statistical analysis, including the logistic regression analysis, was performed with a commercial software package (SPSS for Windows, v13.0; SPSS, Chicago, IL). Statistical significance was set at p < 0.05.
Results
By univariate analysis the following criteria were significantly associated (p < 0.05) with a malignant classification based on US: a halo sign; a well-circumscribed, microlobulated, and infiltrative margin; a taller shape; a pattern of echogenicity; a pattern of echotexture; a cystic or solid composition; the presence of microcalcification or a perinodular halo; and the presence of an abnormal lymph node. These were used as independent variables in logistic regression analysis. The analysis indicated that features such as a not-circumscribed margin (including an indistinct or microlobulated margin), tallness (anterior/posterior ratio), marked hypoechogenicity, hypoechogenicity, homogeneous echotexture, solid composition, presence of microcalcification, absence of a perinodular halo, and presence of an adjacent abnormal lymph node predicted a malignant classification; and homogeneous echotexture, perinodular halo, and mainly cystic composition predicted a benign classification (p < 0.05). We obtained β and exp(β) for each US finding by logistic regression analysis. A positive regression coefficient means that the risk factor increases the probability of malignancy. Exp(β) is the odds ratio (OR) of each independent variable in logistic regression analysis, with a value >1.0 suggestive of malignancy. For example, the OR for an infiltrative margin (4.076) is 4.076/1.787, or about 2.3, times the OR for a taller shape (1.787) (Table 2).
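The relationship between β, exp(β), and the comparison of two odds ratios can be checked numerically. The short sketch below uses only the two OR values quoted in the text; the variable names are illustrative, and the β values are recovered as ln(OR) rather than read from Table 2.

```python
import math

# Odds ratios quoted in the text (Table 2).
or_infiltrative_margin = 4.076
or_taller_shape = 1.787

# Because OR = exp(beta), the regression coefficient is beta = ln(OR).
beta_margin = math.log(or_infiltrative_margin)
beta_taller = math.log(or_taller_shape)

# Both coefficients are positive, so both features raise the odds of malignancy.
assert beta_margin > 0 and beta_taller > 0

# Ratio of the two odds ratios, as in the example in the text.
ratio = or_infiltrative_margin / or_taller_shape
print(round(ratio, 2))  # 2.28

# Dividing odds ratios is equivalent to subtracting coefficients on the logit scale.
assert math.isclose(math.exp(beta_margin - beta_taller), ratio)
```

Note that this ratio compares odds, not probabilities: an OR of 4.076 multiplies the odds of malignancy by 4.076 when the feature is present and the other features are held fixed.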
β, regression coefficient; CI, confidence interval.
Final regression analysis of the selected findings allowed the elaboration of a regression equation to generate logit of malignant thyroid nodule (z), as follows:
The X constants shown in this equation are defined in Table 3.
See text for the equation for determining z.
Using the value for z, the probability of a thyroid nodule being malignant based on US features was calculated as Pus = 1/(1 + e^(−z)), where e is a mathematical constant (e = 2.71828…). Using this regression equation, Pus of nodules classified as benign was 0.07 to 0.23 with 95% CI and 0.04 to 0.50 with 99% CI. Pus of nodules classified as malignant was 0.37 to 0.90 with 95% CI and 0.07 to 0.97 with 99% CI (Fig. 1). After calculating the Pus for each nodule, we stratified its distribution and summarized the representative US findings to make the analysis applicable to a clinical setting. These are referred to as the categorized US findings. Nodules with a Pus of 0 to 0.07 fell below the 99% CI for malignant nodules and were placed into the “highly specific benign” category (TUS 1). A large portion of benign nodules had a Pus of less than 0.23 and were placed into the “probably benign” category (TUS 2). If Pus was lower than 0.23, further evaluation of the nodule was not needed. Nodules for which the frequencies of benign and malignant classifications did not differ, with Pus ranging from 0.24 to 0.50, were assigned to the “indeterminate” category (TUS 3). TUS 4 (Pus, 0.51 to 0.90) included a large portion of the malignant nodules. TUS 5 (Pus, 0.91 to 1.0) was highly specific for malignant nodules, as this range lay above the 99% CI for benign nodules. Table 4 gives recommended guidelines for the management of a thyroid nodule according to its Pus as calculated from the logistic regression analysis.
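The conversion from z to Pus and the stratification into TUS categories described above can be sketched as follows. The logit z is taken as an input, since it must be computed from the regression equation and the constants in Table 3. The function names are hypothetical, and the handling of values falling between the published ranges (for example, between 0.23 and 0.24) is an assumption: each category is treated as extending up to and including its stated upper bound.

```python
import math


def pus(z):
    """Probability of malignancy from the logit z: Pus = 1 / (1 + e^(-z))."""
    # Two algebraically equivalent forms are used to avoid overflow
    # in math.exp for large negative z.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


# Upper Pus bound for TUS 1 through TUS 4; anything above 0.90 is TUS 5.
# Assumption: boundaries are inclusive upper bounds.
TUS_UPPER_BOUNDS = [(0.07, 1), (0.23, 2), (0.50, 3), (0.90, 4)]


def tus_category(p):
    """Map a probability of malignancy to a TUS category (1 through 5)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    for upper, category in TUS_UPPER_BOUNDS:
        if p <= upper:
            return category
    return 5


print(pus(0.0))                # 0.5
print(tus_category(pus(0.0)))  # 3
```

A logit of zero corresponds to even odds, which falls at the upper edge of the indeterminate category under the inclusive-bound assumption.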

Fig. 1. The probability (Pus) of 1694 thyroid nodules being malignant, as calculated by logistic regression using ultrasound features of the nodules. The probabilities are depicted as benign or malignant (malignancy) on the x-axis, based on the histopathology or cytology evaluation of the nodule. See text for information regarding the probability equation and the method for using histopathology and cytology for nodule classification. The box bar graphs show the 95% confidence interval (CI) for the probability of malignancy of benign and malignant nodules, and the error bars show the 99% CI.
Probability is the range for Pus, the probability of a nodule being malignant based on an equation derived from US features.
Cytology categories: THY1, inadequate; THY2, benign; THY3, indeterminate; THY4, suspiciously malignant; THY5, malignant.
TUS, thyroid ultrasound; US, ultrasound; FNAB, fine-needle aspiration biopsy.
Figure 2 shows the mean cytological result for the categorized US findings. There is a significant linear correlation between the two findings (r = 0.491, p < 0.001).
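For readers who want to reproduce this kind of analysis, a linear correlation coefficient between the TUS category and the cytology category can be computed as sketched below. The text does not state which coefficient was used, so Pearson's r is an assumption here, and the paired values in the example are hypothetical toy data, not study data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical toy data: TUS category (1-5) and THY category (2-5) per nodule.
tus = [1, 2, 2, 3, 4, 5]
thy = [2, 2, 3, 3, 5, 5]
r = pearson_r(tus, thy)
assert 0.0 < r <= 1.0  # the toy data are positively correlated
```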

Fig. 2. Error bars show means and 95% CIs of the cytological results (2: benign; 3: indeterminate; 4: suspiciously malignant; 5: malignant; 1: inadequate cases, not included in this study) for the proposed classification (TUS 1 through TUS 5) of thyroid nodules based on their ultrasonographic features.
Discussion
There is general agreement that US features indicating a high risk for malignancy should be an indication for an FNAB and even further treatment such as surgery. However, some investigators insist that US evaluations for thyroid nodules lack absolute specificity due to the considerable overlapping of the findings for malignant and benign nodules (10,15). Improper interpretations of US findings may lead to economic and legal problems. A false-positive interpretation may lead to increased emotional and medical costs, as it will trigger the request for additional diagnostic studies including biopsies. A false-negative study, however, may miss detection of a thyroid cancer at an early stage when it could be treated using minimally invasive surgery.
We considered it important to establish clear guidelines that allow physicians to readily understand the clinical implications of a US report for thyroid nodules. Such guidelines would be analogous to the Breast Imaging Reporting and Data System (BIRADS) of mammography. A categorical reporting system could be advantageous in managing nodules with inadequate or intermediate results from USG-FNAB and in providing clear guidelines for further management.
Logistic regression analysis models the relationship between an observed outcome (pathologically confirmed nodule status) and variables that are putatively predictive (US findings) (22,23). The incorporation of US findings provided a predictive model based on a logit formula. Applying logistic regression analysis with OR calculation allows the construction of prediction rules to estimate the probability of a lesion being malignant. In particular, such a set of prediction rules could ultimately be used to stratify each lesion into one of the “biopsy,” “follow-up carefully,” or “benign” categories with high reliability and reproducibility. Thus, instead of ambiguous terms such as “probably malignant,” explicit probability estimates for given lesions can be obtained and used in any future decision analysis. The five-category diagnostic system of US based on the probability of malignancy could provide guidelines for the optimal use of FNAB and for proper and efficient management of incidental nodules.
Reliance on screening US to determine which patients with a thyroid nodule can be observed is dependent on a low false-negative rate. False-negative results were thought to be mainly due to misinterpretations of the US findings, which implies that the known US features indicating a high risk of malignancy might be insufficient to distinguish every small malignant nodule. Our five-category system allows some nodules of intermediate risk to be labeled as TUS 3, highlighting a risk for cancer but probably not requiring immediate excision. In general, TUS 3 corresponds to the US finding of homogeneous echotexture, hypoechogenicity, a circumscribed margin, and a solid or taller appearance. These findings had a weak OR for malignancy in logistic regression analysis (1 < OR < 2). In our opinion, the use of an indeterminate category allows the radiologists or clinicians performing US to express their concern that there is an abnormality present that requires correlation with the cytological results, but not necessarily extensive evaluation, particularly in the absence of complete clinical information. The TUS 3 category, not indicative of either benign or malignant, may have advantages as a separate diagnostic category, with implications for clinical management of patients. Long-term follow-up studies are needed to confirm this.
Effective categorization can help determine which nodules need an invasive procedure, such as an FNAB or surgery, and can save time and effort by removing the need to redescribe the US features of a benign-appearing or unchanged lesion in a follow-up study. In particular, atypical cellularity as a cytological result, even in Hashimoto's thyroiditis, is confusing to clinicians when trying to make management decisions. According to our results, there is a significant linear trend between the cytological results and the categorized US findings. This concordance is expected to provide diagnostically and therapeutically meaningful information to clinicians. When there is discordance between the two findings, a repeated cytological evaluation should be performed to reduce false-positive results. With multiple nodules, the categorical reporting system may be more useful. Irrespective of how many nodules exist, those categorized as high grade can be attended to, avoiding unnecessary efforts on the obviously benign-appearing ones. Another advantage of the category system is its use in long-term follow-up studies. It overcomes the difficulty in understanding the complex description of some US readings. Even labeling a small nodule can be easier and more convenient in a follow-up study, because the nodule can be compared with its previous status. Moreover, a change in the US features of a lesion can be reported easily by noting the change from the previous category.
This study, however, has some limitations. First, the population included is not free of selection bias, and we were unable to determine whether our patients were representative of the general population. The cancer rate in the study population might have been slightly overestimated due to the selection of nodules by a previous palpation or US examination. The selection of cases made it impossible to calculate the sensitivity, specificity, and prevalence of malignant nodules in the general population. However, a large population was included to compensate for selection bias. Second, the US findings were not defined according to an authorized consensus but by our own definitions, even though our definitions were based on previously published criteria. There are various nomenclatures for the same finding, and standardization of nomenclature should precede further development of a Thyroid Imaging Reporting and Data System.
In summary, we devised a US category system to stratify thyroid nodules according to the probability of a malignancy calculated by a regression equation. Although the usefulness of this category system requires confirmation by a prospective study with a general population, our results could provide helpful guidelines in deciding the optimal strategies for management of thyroid nodules.
Disclosure Statement
The authors declare that no competing financial interests exist.
