Abstract
Background:
Fine-needle aspiration (FNA) with ultrasonography guidance is one of the optimal techniques for the diagnostic evaluation of thyroid nodules. A significant subset of thyroid FNAs continues to be inadequate for interpretation, which potentially leads to increased costs from repeat aspirations. Numerous studies have been published regarding the influence of rapid onsite evaluation (ROSE) by cytopathologists on thyroid FNAs, some indicating that FNA is more likely to be adequate for interpretation with ROSE, while others refute this idea. To our knowledge, no meta-analysis of the literature on this subject has been undertaken.
Methods:
We searched MEDLINE and EMBASE using the following search string: (needle biopsy) AND (assessment or onsite OR onsite or immediate or rapid)/title or abstract. There were no restrictions on study design, language, anatomic site, or time period. Only studies comparing two arms (with/without ROSE) at a single site were eligible for inclusion. Potentially relevant studies were subjected to a citation search (forward search) and reference search (backward search) using SCOPUS. Statistical calculations were performed using Stata Release 12. Meta-analysis was completed using a random-effects model as implemented in the metan routine in Stata.
Results:
An initial search obtained 2179 studies from MEDLINE and EMBASE, and screening yielded 71 potentially relevant studies. A focused review of this subset resulted in seven full studies and one abstract that met our inclusion criteria. Our citation search using SCOPUS yielded no new studies. Overall, the average adequacy rate was 83% without ROSE compared to 92% with ROSE. Visual inspection of the data suggested that the improvement in adequacy due to ROSE may be related to the adequacy rate without ROSE. Metaregression analysis showed that the change in the adequacy rate was strongly correlated (t=−12.7, p<0.001) with the non-ROSE adequacy rate. In addition, the non-ROSE adequacy rate explained all but 10% of the residual between-study variability in the change in adequacy rates due to ROSE.
Conclusions:
ROSE is generally associated with an improvement in adequacy, but the impact of ROSE depends heavily on the initial adequacy rate. Sites with lower initial adequacy rates can benefit the most from the implementation of ROSE.
Introduction
Numerous studies have been published regarding the influence of ROSE on FNA specimen adequacy in evaluating thyroid nodules. Some studies indicate that FNA is more likely to be adequate for interpretation with ROSE (1,5–9), while others refute this idea (10). To our knowledge, this body of literature has never been systematically reviewed. We therefore performed a systematic review and meta-analysis of the literature on ROSE for FNA procedures on thyroid nodules to determine the impact of ROSE on specimen adequacy. Our review included only those articles with two study arms (an FNA subgroup with ROSE and an FNA subgroup without ROSE), as such studies sufficiently limit possible confounding factors (e.g., institutional variations in procedural technique and different operator/cytopathologist personnel) and lend themselves to optimal statistical analysis.
Materials and Methods
Literature search
We followed guidelines for systematic reviews of diagnostic accuracy studies (Fig. 1) (11,12). We searched MEDLINE and EMBASE on November 6, 2011, using the following search string: (needle biopsy) AND (assessment or onsite OR onsite or immediate or rapid)/title or abstract. There was no restriction on study design, language, anatomic site, or time period of study. Only studies comparing two arms (with ROSE vs. without ROSE) at a single institutional site were eligible for inclusion. The titles and abstracts of the resulting set of studies were screened in two stages by the two authors (B.L.W. and R.L.S.). In the first stage, we included studies that reported any outcome (e.g., accuracy, adequacy, cost, and complication rate) associated with ROSE. We screened studies a second time to exclude studies that did not involve a comparison of two separate arms (with ROSE vs. without ROSE) at a single institution. A citation search (forward search) and reference search (backward search) were conducted using SCOPUS on February 6, 2012. Duplicates were removed, and the titles and abstracts of these additional studies were screened for potentially relevant studies. Full-text articles were then obtained for all potentially relevant studies. Studies from this set were included if they contained data comparing the effect of ROSE on any outcome associated with FNA of the thyroid.

Fig. 1. Progress of the search. An initial search of MEDLINE and EMBASE produced 2179 unique studies. A preliminary screen of titles and abstracts produced 71 potentially relevant studies. This set of studies was used as the basis for a reference (backward) and citation (forward) search in SCOPUS, which produced 2 additional studies. We evaluated the full text of 37 potentially relevant studies and included 8 studies.
Data extraction
Each of the eight included studies was independently assessed by the two authors (R.L.S. and B.L.W.) using a standardized data extraction form.
Statistical analysis
Statistical calculations were performed using Stata Release 12. Meta-analysis of adequacy was completed using a random-effects model as implemented in the metan routine in Stata. Tests for heterogeneity were conducted using the inconsistency (I²) statistic (11). Relative risk is defined as the per-case probability of an inadequate sample with ROSE divided by the per-case probability of an inadequate sample without ROSE. Statistical significance for relative risk is evaluated relative to a value of 1 (no effect).
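The pooling described above can be sketched in a few lines of Python. This is only an illustration, not the authors' Stata code: it assumes the DerSimonian-Laird estimator (a common choice for random-effects pooling; the text does not state which estimator was used), and the per-study counts and all function names are hypothetical.

```python
import math

def log_rr_and_var(inadeq_rose, n_rose, inadeq_ctrl, n_ctrl):
    """Log relative risk of an inadequate sample (ROSE vs. no ROSE) and its variance."""
    rr = (inadeq_rose / n_rose) / (inadeq_ctrl / n_ctrl)
    # Standard large-sample variance of a log relative risk.
    var = 1 / inadeq_rose - 1 / n_rose + 1 / inadeq_ctrl - 1 / n_ctrl
    return math.log(rr), var

def dersimonian_laird(effects, variances):
    """Random-effects pooled effect, its standard error, tau^2, and I^2 (%)."""
    w = [1 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    # Cochran's Q measures the spread of study effects around the fixed-effect mean.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c)          # between-study variance
    w_star = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return pooled, se, tau2, i2

# Hypothetical counts: (inadequate with ROSE, n with ROSE,
#                       inadequate without ROSE, n without ROSE)
studies = [(8, 100, 17, 100), (10, 200, 60, 200), (12, 150, 14, 150)]
effects, variances = zip(*(log_rr_and_var(*s) for s in studies))

pooled, se, tau2, i2 = dersimonian_laird(effects, variances)
rr_pooled = math.exp(pooled)
ci_low = math.exp(pooled - 1.96 * se)
ci_high = math.exp(pooled + 1.96 * se)
z = abs(pooled) / se                       # tested against log(1) = 0

print(f"pooled RR = {rr_pooled:.2f} [{ci_low:.2f}, {ci_high:.2f}], "
      f"z = {z:.2f}, I^2 = {i2:.0f}%")
```

Pooling is done on the log scale because the log relative risk is approximately normally distributed; a pooled relative risk below 1 indicates fewer inadequate samples with ROSE.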
Comparison of study arms for other factors potentially contributing to adequacy
The included studies were evaluated for information with the potential to independently impact adequacy, including needle size, number of passes per case, criteria for adequacy, and preparation methods in both the ROSE and non-ROSE groups.
Results
Literature search
An initial search of MEDLINE and EMBASE yielded 2179 unique studies (Fig. 1). Screening of abstracts and titles provided a set of 73 potentially relevant studies. Citations and references of the set of potentially relevant studies provided an additional 1031 studies that were screened for relevancy and provided two additional eligible studies. A secondary screen of the 73 potentially relevant studies yielded 37 eligible studies. Full-text review of the 37 eligible studies resulted in seven studies and one abstract that met our inclusion criteria.
Characteristics of included studies
We identified seven full studies and one abstract with a total of 6253 cases (Table 1) (1,5–7,9,10,13,14). All included studies compared two cohorts, with and without ROSE, at a single site. All studies were conducted in the United States, and all but one were retrospective [the Moberly et al. (7) study did not report its study design]. Most studies used US guidance. The study by Ghofrani et al. (9) reported data on two separate cohorts: lesions sampled by palpation guidance and lesions sampled by US guidance. We analyzed these as two separate datasets. The Redman et al. study (1) contained both US-guided and nonguided cases, but the data were not separable. The percentage of US-guided biopsies was not reported in three studies (5,6,13). In most studies, ROSE was performed by a cytopathologist; however, in three studies, ROSE was performed by both cytotechnologists and cytopathologists (1,6,9).
ROSE, rapid onsite evaluation; US, ultrasonography.
Effect of ROSE on adequacy
Cohorts using ROSE generally had a higher adequacy rate than cohorts without ROSE (Figs. 2 and 3 and Table 2). Without ROSE, the average adequacy rate was 83% (Table 2) and ranged from 66% to 94%. With ROSE, the average adequacy rate was 92% and ranged from 76% to 96%. In a pooled analysis that included all studies, the average relative risk of an inadequate sample with ROSE was 0.44 [95% confidence interval: 0.26, 0.73] (Table 2); that is, the risk of an inadequate sample with ROSE was on average 44% of the risk without ROSE. This result was statistically significant (z=3.2, p=0.002); however, there was significant statistical heterogeneity in this group of studies (I²=85.0%, p=0.002).
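As a rough consistency check on the pooled estimate, the relative reduction in risk and an approximate z statistic can be recovered from the reported relative risk and confidence interval. The sketch below uses only the three reported values and assumes a symmetric 95% interval on the log scale; the variable names are illustrative.

```python
import math

# Values reported in the text for the pooled analysis.
rr, ci_low, ci_high = 0.44, 0.26, 0.73

# A relative risk of 0.44 means the risk with ROSE is 44% of the risk
# without ROSE, i.e., a relative reduction of 1 - RR.
relative_reduction = 1.0 - rr

# Assumption: the 95% CI is symmetric on the log scale, so its width
# equals 2 * 1.96 standard errors of log(RR).
se_log_rr = (math.log(ci_high) - math.log(ci_low)) / (2 * 1.96)
z = abs(math.log(rr)) / se_log_rr

print(f"relative reduction = {relative_reduction:.0%}, approximate z = {z:.1f}")
```

The back-calculated z is close to the reported value of 3.2; a small difference is expected because the inputs are rounded to two decimal places.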

Fig. 2. Comparison of the adequacy rate before and after the implementation of rapid onsite evaluation (ROSE). We divided the studies into two groups based on the adequacy rate without ROSE: low adequacy (<75%) and high adequacy (>75%). The dotted line indicates equivalence or no change due to ROSE (adequacy rate before ROSE=adequacy rate after ROSE). Points above the line indicate an improvement in adequacy following the implementation of ROSE. With the exception of the O'Malley study (10), all study results are above the line.

Fig. 3. The estimated relative risk for each study is designated by a square. The lines extending from the squares indicate the 95% confidence interval for the relative risk. The overall average is indicated by a diamond.
CI, 95% confidence interval.
Impact of ROSE on adequacy rate
Two studies showed much greater improvement with ROSE than the others (6,13). These studies were unusual because the adequacy rate without ROSE was much lower than in the other studies, and the change with ROSE was much greater (Table 2 and Fig. 2). Visual inspection of Figure 3 suggested that the improvement in adequacy due to ROSE may be related to the adequacy rate without ROSE. Metaregression analysis showed that the change in the adequacy rate was strongly correlated (t=−12.7, p<0.001) with the non-ROSE adequacy rate (Fig. 4). In addition, the non-ROSE adequacy rate explained all but 10% of the residual between-study variability in the change in adequacy rates due to ROSE.

Fig. 4. The results of a metaregression analysis showing the relationship between the non-ROSE adequacy rate and the change in adequacy rate (adequacy rate with ROSE−adequacy rate without ROSE). Each circle represents a study. The size of the circle represents the size of the study. The dashed line represents the best-fit line as determined by metaregression.
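The idea behind the metaregression can be illustrated with a weighted least-squares fit of the change in adequacy on the non-ROSE adequacy rate. This is a simplified sketch, not the analysis reported here: it weights studies by size rather than fitting a random-effects metaregression, and the adequacy rates, study sizes, and function name are hypothetical values chosen only to fall within the ranges reported in the text.

```python
def weighted_linear_fit(x, y, w):
    """Weighted least-squares slope and intercept for y = intercept + slope * x."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return slope, ybar - slope * xbar

# Hypothetical per-study data (not taken from the included studies).
baseline = [0.66, 0.72, 0.85, 0.90, 0.94]   # adequacy rate without ROSE
with_rose = [0.90, 0.92, 0.93, 0.94, 0.95]  # adequacy rate with ROSE
sizes = [300, 250, 800, 600, 1200]          # cases per study, used as weights

# Outcome of the regression: change in adequacy attributable to ROSE.
change = [r - b for r, b in zip(with_rose, baseline)]
slope, intercept = weighted_linear_fit(baseline, change, sizes)

def predict(base):
    """Predicted change in adequacy for a given non-ROSE adequacy rate."""
    return intercept + slope * base

print(f"slope = {slope:.2f}; predicted gain at 66% baseline = {predict(0.66):.2f}, "
      f"at 94% baseline = {predict(0.94):.2f}")
```

A negative slope corresponds to the pattern described in the text: the lower the adequacy rate without ROSE, the larger the gain with ROSE.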
Comparison of study arms for potential confounding factors
Minimal data were provided on the needle size used, and no data were available regarding whether the needle gauge differed between the ROSE and non-ROSE groups in any of the studies. The most commonly stated criterion for adequacy was the widely implemented requirement of six groups of 10 or more follicular cells; however, half of the studies did not include information relating to adequacy criteria. Only two studies [Redman et al. (1) and O'Malley et al. (10)] compared the mean number of passes between the ROSE and non-ROSE groups, and their combined findings suggest no clear trend in how ROSE affects the number of passes. In two of the studies, bedside preparation differed substantially between the ROSE and non-ROSE groups (no smears were made in the non-ROSE groups), while the remaining studies had either no data on the subject or similar preparation methods between groups (Table 3).
Discussion
Many factors can affect adequacy rates in FNAs at any anatomic site, such as the experience of the aspirator, procedural variations, and lesion characteristics (e.g., size, cystic vs. solid). With regard to operator experience, Ghofrani et al. (9) showed that inexperienced radiologists had an 8.2% nondiagnostic rate as compared to a 5.4% nondiagnostic rate for their experienced counterparts. In terms of the possible impact of specimen preparation, a recent study by Abraham et al. (15) demonstrated that bedside (onsite)-prepared slides resulted in fewer nondiagnostic samples overall (16.7%) than solution-based samples, in which the aspirate is placed directly in a liquid preservative (25.3%). Furthermore, when solution-based samples were used, 21-gauge needles provided improved adequacy over 25-gauge needles. Another facet of procedural variation that may affect the adequacy of thyroid FNAs is the degree of sampling, which is usually expressed as the number of passes made. Our dataset had limited data on variation in the number of passes between the ROSE and non-ROSE groups, with the Zhu and Michael study (6) showing that overall adequacy improved with passes numbering up to six. Multiple studies have determined that cystic nodules have an inherently higher rate of inadequacy (16,17). This is reflected in the O'Malley et al. study (10), where a 16% nondiagnostic rate was present for solid nodules as compared to a 30% nondiagnostic rate for predominantly cystic nodules. Because many of these factors naturally vary between institutions, one would expect significant variation in FNA adequacy rates between facilities even for the same anatomic site. Therefore, the effect size (change in adequacy with ROSE) may be small relative to the inherent variation in adequacy between institutions. Thus, a meta-analysis comparing studies based on multiple institutions (some implementing ROSE and others not) would be unlikely to estimate the ROSE effect accurately.
In our analysis, we only selected studies that were based on a side-by-side comparison (ROSE vs. no ROSE) at a single institution, so that variation due to other factors would be minimized. This is a strong design, and where possible, future studies should employ this type of design. On the other hand, our study is limited by the fact that there were relatively few studies of this type, and consequently, our sample size was small. Another limitation to our series is that the reference standard for diagnosis (histology based on excision) was either poorly defined or only applied to an unspecified subset of cases. Because of this, evaluating the impact of ROSE on diagnostic accuracy could not be achieved.
Our results show that ROSE is generally associated with an improvement in adequacy. With the exception of the O'Malley et al. study (10), there was consistent improvement across studies. On average, the risk of an inadequate sample with ROSE was 44% of the risk without ROSE (relative risk 0.44). Although ROSE was associated with a statistically significant increase in adequacy overall, its impact depends on the initial adequacy rate: sites with high adequacy rates have less opportunity for improvement than sites with low adequacy rates, and improvement was greatest in studies where the initial adequacy rate was low.
Currently, the literature indicates that standardizing the methodology in thyroid FNAs leads to improved specimen quality and fewer nondiagnostic/unsatisfactory cases. One aspect of procedural standardization relates to the experience level and training of the operators. In a review of 1043 consecutive breast FNAs, Ljung et al. (18) demonstrated that those aspirates performed by 69 physicians without formal training in an FNA technique with a median experience of two FNAs per year was 36.9%, versus 2.2% for seven formally trained physicians with at least 100 FNAs per year of experience. Similarly, though more directly related to thyroid sampling, Sidiropoulos et al. (19) showed that when thyroid FNAs were performed by a single operator with smears and needle rinses being prepared by a single trained nurse, the unsatisfactory rate was reduced from 11% to 4% as compared to a multiple operator group at the same institution. In fact, one of our included studies [Ghofrani et al. (9)] alluded to the effect of operator experience and relative impact of ROSE. Their results showed that ROSE significantly reduced the nondiagnostic sampling rate (from 13.0% to 4.5%) among FNAs from less-experienced radiologists compared with a more modest effect when the more-experienced radiologists were obtaining the samples. However, this study did not specifically define what level of experience was needed to move into the more experienced group.
Our assessment of the studies for other factors possibly contributing to adequacy found limited data on whether needle size, adequacy criteria, or number of passes differed between the ROSE and non-ROSE groups (Table 3). Data on bedside preparation were also limited; however, two of the studies [Jing et al. (13) and Zhu and Michael (6)] stated that the non-ROSE group had no smears prepared, and that the needle rinses were placed directly into a fluid fixative for ThinPrep® and cell block preparation. Because this eliminates the ability of the cytopathologist to evaluate a smear on a given case, it may have negatively affected the adequacy rates in the non-ROSE groups. Finally, with specific regard to the O'Malley et al. (10) study, where no improvement was observed with ROSE, our assessment of needle size, number of passes, criteria for adequacy, and preparation method between the ROSE and non-ROSE groups yielded no clear explanation for why this study's results were not in line with the other findings.
In addition to properly training and streamlining the staff involved in specimen collection, recent recommendations for optimal specimen preparation include the use of air-dried and alcohol-fixed smears prepared for Romanowsky and Papanicolaou staining, respectively. Additionally, depending on the case type/complexity, supplemental combinations of the following may be used: liquid-based (Surepath or ThinPrep) or cytospin preparations, cellblocks (formalin or saline), RPMI or balanced saline rinses for flow cytometric evaluation where appropriate, and possibly sterile material for microbiology. For cyst fluid-only specimens, only one or two smears are recommended, with the remainder being processed as either cytospins or liquid-based preparations (cellblock is an option for clotted specimens or those with tissue fragments) (20). Even recommendations aimed at standardizing specimen preparation methods are subject to variation, reflecting that individual cases may require different approaches. As conveyed by Pitman et al. (20), immediate evaluation of the material allows the opportunity to obtain additional material for cellblocks and/or ancillary studies. This is a rare occasion in thyroid FNAs, essentially confined to suspected cases of medullary carcinoma requiring immunostaining for calcitonin and to discriminating florid Hashimoto's thyroiditis from lymphoma with flow cytometry. To this point, Berner et al. (8), in their comparison of thyroid FNAs between different institutions, found that the fraction of unsatisfactory cases was reduced from 24% to 9.5% for those specimens taken in the presence of a cytopathologist. Furthermore, they found that the percentage of malignant cases went from 2.2% to 4.1% when a cytopathologist was involved, indicating an increased sensitivity for the test. Such findings might suggest that one straightforward method for optimizing specimen preparation is to involve a cytopathologist through ROSE.
Another aspect of standardization for thyroid FNAs is the use of a uniform terminology for diagnostic interpretations. This has largely occurred via the six-tiered diagnostic system that was suggested at the National Cancer Institute conference in Bethesda, MD (The Bethesda System for Reporting Thyroid Cytopathology) (21). The six categories are Benign, Atypical, Follicular Neoplasm, Suspicious for Malignancy, Malignant, and Nondiagnostic. Each category includes an estimate of the associated risk of malignancy (21). Since this is a relatively new implementation, studies are still ongoing to evaluate its impact on management. Of the articles evaluated in our study, three of the four studies that mentioned specific criteria for adequacy utilized criteria akin to the Bethesda System for both the ROSE and non-ROSE groups, and thus it seems unlikely that its uniform application would significantly change the adequacy data between the study groups. Furthermore, one recent study comparing groups before and after The Bethesda System noted no significant disparity in the nondiagnostic rates between the groups (22).
In summary, our study minimizes, but does not eliminate, confounding factors. Although the number of studies is small, the results are statistically significant, and our data show that ROSE significantly improves specimen adequacy, especially at institutions where the initial adequacy rate is low. Its impact on adequacy is positive, although more modest, at institutions with a high (>80%) initial adequacy rate. These results indicate that ROSE should be widely implemented for FNAs of the thyroid to improve diagnostic utility and help reduce repeat procedures due to inadequate/nondiagnostic results. However, selected institutions with experienced operators, trained specimen preparation staff, and a low nondiagnostic rate may consider not using ROSE, as its benefit may not weigh favorably against the additive costs.
Footnotes
Disclosure Statement
No competing financial interests exist.
