Abstract
Background:
Current methods based on fine-needle aspiration biopsy (FNAB) are not sufficient to distinguish among follicular thyroid lesions: follicular adenoma (FA), follicular thyroid carcinoma (FTC), and the follicular variant of papillary thyroid cancer (FVPTC). Furthermore, none of the immunohistochemical markers currently available is sensitive or specific enough to be used in the clinical setting, so a diagnostic hemithyroidectomy is often required. The aim of this study was to identify proteins of value for differential diagnosis between benign and malignant thyroid follicular lesions.
Methods:
This retrospective analysis is based on an assessment of the immunoexpression of 19 proteins on 81 benign thyroid lesions (FA) and 50 malignant tumors (FTC/FVPTC). The resulting expression profile allowed the design of a scoring system model to improve the differential diagnosis of benign and malignant thyroid lesions. The model was validated using an independent series of 69 FA and 40 FTC and an external series of 40 nodular hyperplasias, and was further tested in a series of 38 FNAB cell blocks.
Results:
A model based on the nuclear and cytoplasmic expression of APLP2, RRM2, and PRC1 discriminated between benign and malignant lesions with 100% sensitivity in both main and validation groups, with specificities of 71.3% and 50.7%, respectively. For the nodular hyperplasia series, specificity reached 94.8%. Finally, in FNAB samples, the sensitivity was 100% and the specificity was 45% for discrimination between benign and malignant lesions.
Conclusions:
These findings suggest that the identified APLP2, RRM2, and PRC1 signature could be useful for distinguishing between benign (FA) and malignant (FTC and FVPTC) tumors of the thyroid follicular epithelium.
Introduction
Preoperatively, it is difficult to distinguish between FA, FTC, and FVPTC (2). Follicular tumors are often misdiagnosed due to the low sensitivity of fine-needle aspiration biopsy (FNAB) cytology, the methodology currently used to diagnose follicular tumors (3–5). For these tumor types, morphologic evaluation is often subjective, despite the standardized nomenclature developed by the Bethesda System for Reporting Thyroid Cytopathology (6). Moreover, biopsy assesses only a limited fragment of the sample (7), and capsular and vascular invasion, the hallmarks used to discriminate benign from malignant follicular tumors, cannot be assessed with FNAB.
This limitation frequently leads to potentially unnecessary surgical procedures to complete the histological examination of the capsule and vasculature (8). In addition, the histological diagnosis of follicular lesions from a resected tumor can be challenging due to the presence of incomplete or indeterminate capsular invasion, vascular invasion, or technical difficulties encountered during processing (9,10). Furthermore, improved diagnosis would maximize the benefits of invasive therapies in high-risk patients, while minimizing potential complications and side effects in low-risk patients. Therefore, an accurate preoperative diagnosis of FTC is essential for the choice of a preoperative plan and to avoid unnecessary surgical procedures.
Although molecular markers such as the gene expression classifier Afirma® represent a recent and promising tool for differentiating thyroid neoplasms, there is ample room for improvement in both research and clinical applications. Moreover, of the markers reported to date, none have proven optimal for differentiating benign (FA) from malignant follicular lesions (FTC and FVPTC). However, techniques such as microarray analysis of gene expression patterns have provided a way to identify molecular events connected to different tumor types and to specific cancer processes such as metastasis, proliferation, and angiogenesis (11). Furthermore, many studies have demonstrated the potential of expression arrays to identify molecular signatures associated with distinct clinical features, which could help improve differential diagnosis and identify novel targets for therapy (12,13).
A previous study analyzed the gene expression profiles of thyroid carcinomas using cDNA microarrays, which provided a prognostic molecular signature of 30 transcripts (14). The genes of the signature were mainly clustered in the following pathways: MAP kinase signaling, TGF-β signaling, focal adhesion and cell motility, activation of actin polymerization, and cell cycle. As these molecular pathways are altered in the process of carcinogenesis, it was hypothesized that there could also be a differential expression between benign and malignant neoplasms of the thyroid. Therefore, to identify a set of proteins that might improve differential diagnosis between benign and malignant thyroid lesions, the protein expression of this signature was evaluated by immunohistochemistry in a group of follicular thyroid lesions.
Materials and Methods
Case selection and tissue microarray design
Tissue microarrays were constructed from formalin-fixed paraffin-embedded biopsy samples. A series of 131 surgically removed thyroid tumors were used, which were obtained from four university hospitals that belong to the Consortium for the Study of Thyroid Cancer (CECaT; Arnau de Vilanova, Sabadell, Clinic, and Germans Trias i Pujol). Of these tumors, 81 were FA, and 50 were malignant tumors (15 FTC and 35 FVPTC). For validation, an additional set of 149 tumor biopsies (40 FTC, 69 FA, and 40 nodular hyperplasias [NH]), and 38 cell blocks from FNAB material (9 FTC, 4 FVPTC, 12 FA, and 13 NH) were collected from two participating CECaT centers (Vall d'Hebron and Arnau de Vilanova). A large amount of clinical data was available for all patients included in the study.
Tumors were classified as FTC, FVPTC, FA, and NH according to the criteria proposed by the World Health Organization for the classification of tumors of the endocrine system by three pathologists experienced with thyroid pathology (X.M., P.G., and C.I.). Representative tumor regions were marked in the corresponding paraffin blocks. In the FTC group, there was indisputable evidence of vascular and capsular invasion. Strict criteria were used for the classification of FVPTC; the tumors exhibited a follicular pattern of growth with characteristic cytological features. For classification as FA, the presence of a capsule and the absence of hyperplastic changes in the adjacent thyroid tissue were required. A tissue array device (Beecher Instruments, Silver Spring, MD) was used to construct the tissue microarray (TMA). Three cylinders (diameter ≤1 mm) from two different tumor areas were included for each case, and when possible, two cylinders from non-tumor areas were included as well. FNAB were classified according to the Bethesda System (Supplementary Table S1; Supplementary Data are available online at
Immunohistochemistry analysis
Twenty-two of the 30 transcripts were identified as prognostic classifier-encoding proteins (14). Most of these genes were clustered in the MAPK and TGF-β signaling pathways, or they were involved in focal adhesion or regulating the actin cytoskeleton and cell cycle. In this study, the expression of 19 proteins was assessed; it was not possible to optimize the protocol for the remaining three markers. Cell blocks and biopsies were sectioned at a thickness of 3 μm and dried for 1 h at 60°C. Dewaxing, rehydration, and antigen retrieval were achieved via heat treatment in a PT Link (Dako, Glostrup, Denmark) at pH 9 or 6. Before staining the sections, endogenous peroxidase activity was blocked. In the model design phase, immunohistochemistry (IHC) evaluation included the 19 antibodies listed in Table 1. In the validation phase, IHC evaluation included only the antibodies that had enough power to differentiate between benign and malignant lesions: APLP2, PRC1, and RRM2. After incubation, the reaction was visualized with EnVision™ FLEX (Dako) using diaminobenzidine chromogen as the substrate. The sections were counterstained with hematoxylin. Appropriate external and internal positive and negative controls were used. Antigen preservation was determined by vimentin and Ki-67 immunostaining. IHC evaluation was performed by a pathologist and a researcher using uniform, pre-established criteria. Immunoreactivity was graded semi-quantitatively by considering both the percentage and the intensity of the staining. Histological scores (H scores) for the nucleus and cytoplasm were generated for each sample and ranged from 0 to 300. The score was obtained by applying the following formula: H score = 1 × (% light staining) + 2 × (% moderate staining) + 3 × (% strong staining). Because each TMA included six different tumor cylinders for each case, the evaluation was performed only after examining all samples.
Moreover, 40 hyperplastic nodules (NH) and 38 FNAB were used to test the goodness of fit of the generated models.
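The H score formula given above can be illustrated with a minimal sketch. The function name `h_score`, its parameter names, and the input check are illustrative assumptions and are not part of the study's protocol; only the weights 1, 2, and 3 and the 0–300 range come from the text.

```python
def h_score(pct_light: float, pct_moderate: float, pct_strong: float) -> float:
    """H score = 1*(% light) + 2*(% moderate) + 3*(% strong staining).

    Percentages refer to stained cells in one compartment (nucleus or
    cytoplasm), so the result ranges from 0 to 300.
    """
    percentages = (pct_light, pct_moderate, pct_strong)
    # Assumed sanity check: the three classes cannot exceed 100% of cells.
    if any(p < 0 for p in percentages) or sum(percentages) > 100:
        raise ValueError("percentages must be non-negative and sum to at most 100")
    return 1 * pct_light + 2 * pct_moderate + 3 * pct_strong


# Hypothetical core: 20% light, 30% moderate, and 10% strong staining.
example = h_score(20, 30, 10)  # 20 + 60 + 30 = 110
```

A core with 100% strong staining reaches the maximum of 300, and an unstained core scores 0.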
Statistical analysis
All H scores corresponding to an individual biopsy (three to six per case) were averaged to obtain a unique H score for the patient. The median and interquartile ranges (IQR; showing the first and third quartiles) were used to describe non-normally distributed quantitative variables. The supervised random forest analysis helped to identify a reduced subset of the most important proteins using the mean decrease in the Gini index as a criterion to identify proteins whose expression allowed sample classification as benign or malignant. Next, recursive partitioning classification trees and logistic regression analyses were performed to identify cutoff points for the proteins whose expression maximized the discrimination capability of the model while maintaining a statistically significant contribution to the logistic regression model according to the likelihood ratio test. Finally, the logistic regression model coefficients were translated into an integer scoring system for the expression of PRC1, RRM2, and APLP2 (i.e., the proteins with discriminative potential). A significance level of 0.05 was used. The data were analyzed using R (
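The first two steps of this analysis, averaging the core-level H scores of each case and summarizing a variable by its median and quartiles, can be sketched as follows. This is a Python illustration and not the R code used in the study; the function names are hypothetical, and the quartile convention (the default "exclusive" method of Python's `statistics.quantiles`) is an assumption that may differ slightly from R's default.

```python
from statistics import mean, median, quantiles


def case_h_score(core_scores):
    """Average the H scores of all evaluable cores (three to six) of one case."""
    if not core_scores:
        raise ValueError("at least one evaluable core is required")
    return mean(core_scores)


def median_iqr(values):
    """Return (median, first quartile, third quartile) of a variable."""
    # quantiles(..., n=4) returns the three cut points Q1, Q2, Q3.
    q1, _, q3 = quantiles(values, n=4)
    return median(values), q1, q3


# Hypothetical case with three evaluable tumor cores.
per_case = case_h_score([10, 20, 30])
```

The per-case averages would then feed the variable selection and cutoff search described above, which are not reproduced here.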
Results
Clinical characteristics of patients
In the main tissue series, 35 (27.3%) patients were men; the median age was 45 years (IQR 34–59 years), and the median tumor size was 3.5 cm (IQR 2.5–4.5 cm). In the validation group, 57 (52.3%) patients were men; the median age was 51 years (IQR 34–64 years), and the median tumor size was 3.9 cm (IQR 3.0–4.7 cm). In the NH series, seven (17.5%) patients were men; the median age was 56 years (IQR 44–67 years), and the median tumor size was 3.3 cm (IQR 1.9–4 cm). Finally, in the FNAB series, 10 (26.3%) patients were men; the median age was 45 years (IQR 38–54 years), and the median tumor size was 4.0 cm (IQR 3.0–5.0 cm).
Differential diagnostic model
After statistical analysis, three proteins (APLP2, PRC1, and RRM2) were selected to design a differential diagnosis model based upon their capacity to distinguish between benign and malignant lesions. The other 16 proteins were excluded from the study because they did not discriminate between cases according to benignity or malignancy (Supplementary Table S2 and Supplementary Fig. S1). Nuclear expression of APLP2 was more abundant in benign than in malignant lesions, whereas nuclear expression of RRM2 and cytoplasmic expression of PRC1 were stronger in malignant tumors (Fig. 1). With this set of proteins, a scoring system model was generated: 4 × (APLP2 n = 0) + 2 × (RRM2 n > 5) + 1 × (PRC1 c ≥ 40). This model assigns a different weight to each protein and scores each case according to its protein expression profile. Briefly, APLP2 nuclear staining with an H score equal to zero received four points; if the H score was non-zero, no points were added. RRM2 nuclear staining with an H score greater than five received two points, while an H score of five or lower received no points. PRC1 cytoplasmic staining with an H score ≥40 earned one point; otherwise, no points were added. One benign and one malignant case were discarded during the analysis due to technical reasons. A score was recorded to distinguish definitively benign tumors (scores 0–1) from highly probable malignant tumors (scores 6–7). The sensitivity obtained by a score greater than one was 100%, with a specificity of 71.3% (Table 2). In other words, none of the malignant cases produced a score of benignity (i.e., <2), and only 1/81 (1.2%) FA was classified as malignant (score 6–7). Receiver operating characteristic (ROC) analysis revealed an area under the curve of 0.925 (confidence interval [CI] 0.885–0.966; Fig. 2A).
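The scoring rule described above can be written as a short sketch. The function and parameter names are hypothetical; the weights and cutoffs are those stated in the text, and the three score ranges (0–1, 2–5, 6–7) follow the table note reported with the results. The inputs are assumed to be the per-case averaged H scores.

```python
def tissue_score(aplp2_nuclear: float, rrm2_nuclear: float, prc1_cytoplasmic: float) -> int:
    """Score = 4*(APLP2 n = 0) + 2*(RRM2 n > 5) + 1*(PRC1 c >= 40)."""
    score = 0
    if aplp2_nuclear == 0:       # absent nuclear APLP2: four points
        score += 4
    if rrm2_nuclear > 5:         # nuclear RRM2 H score above 5: two points
        score += 2
    if prc1_cytoplasmic >= 40:   # cytoplasmic PRC1 H score of 40 or more: one point
        score += 1
    return score


def interpret(score: int) -> str:
    """Map a score to the three ranges reported with the model."""
    if score <= 1:
        return "benign"
    if score <= 5:
        return "indeterminate"   # likely benign or malignant
    return "malignant"


# Hypothetical case: no nuclear APLP2, RRM2 n = 12, PRC1 c = 55.
label = interpret(tissue_score(0, 12, 55))  # 4 + 2 + 1 = 7
```

Because APLP2 carries four of the seven possible points, a case can reach the malignant range (6–7) only when nuclear APLP2 is absent.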

[Figure: immunohistochemical expression of APLP2 in follicular carcinoma (caption truncated)]

[Figure: receiver operating characteristic curves (caption truncated)]
A score of 0–1 identified benign tumors, 2–5 likely benign or malignant tumors, and 6–7 malignant tumors.
FTC, follicular thyroid carcinoma; FVPTC, follicular variant of papillary thyroid cancer; FA, follicular adenoma.
Validation of the differential diagnostic model
A series of 109 follicular tumors was collected and used to test the goodness of fit of the generated model. One of the FTC tumors was discarded from the analysis due to technical reasons. Use of the same cutoff point in this series resulted in a sensitivity of 100% and a specificity of 50.7% (Table 3). Again, none of the malignant cases produced a score in the benign range (<2), and in this case, none of the 69 FA cases was classified as malignant (score 6–7). ROC analysis indicated an area under the curve of 0.983 ([CI 0.967–0.999]; Fig. 2B). The NH series used for external validation reached a specificity of 94.8%.
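For reference, sensitivity and specificity at the cutoff used here (a score greater than one counts as a positive result) can be computed as in the sketch below. The function name and the example data are invented for illustration and do not reproduce the study series.

```python
def sensitivity_specificity(scores, is_malignant, cutoff=1):
    """Return (sensitivity, specificity), calling a case positive when score > cutoff."""
    pairs = list(zip(scores, is_malignant))
    tp = sum(1 for s, m in pairs if m and s > cutoff)        # malignant, flagged
    fn = sum(1 for s, m in pairs if m and s <= cutoff)       # malignant, missed
    tn = sum(1 for s, m in pairs if not m and s <= cutoff)   # benign, cleared
    fp = sum(1 for s, m in pairs if not m and s > cutoff)    # benign, flagged
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both malignant and benign cases are required")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical data: three benign and three malignant cases.
scores = [0, 1, 2, 6, 7, 3]
labels = [False, False, False, True, True, True]
sens, spec = sensitivity_specificity(scores, labels)
```

In this toy example every malignant case scores above one, so sensitivity is 1.0, while one of the three benign cases also scores above one, giving a specificity of 2/3.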
In the FNAB series, the H score did not show a direct proportional relationship with the TMA H score. The RRM2 and PRC1 expressions differed depending on the type of sample (i.e., cell block from a FNAB or TMA). Therefore, a different model based on cytoplasmic expression of RRM2 and PRC1 was constructed: 1 × (PRC1c > 20) + 2 × (RRM2c < 100). Briefly, the cytoplasmic expression of PRC1 and RRM2 received one point and two points, respectively (Supplementary Fig. S2). Seven (18%) of the samples were discarded from the analysis due to an insufficient number of cells valid for assessment. A score was recorded to distinguish benign FNAB (score 1) from malignant FNAB (score 3). The sensitivity obtained by a score of one was 100%, with a specificity of 45%. Interestingly, the two FTC classified with a score of two corresponded to a diagnosis of minimally invasive FTC and encapsulated (minimally invasive) FTC. Moreover, none of the malignant cases yielded a score of benignity (Supplementary Table S3).
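The FNAB model stated above can be sketched in the same way as the tissue model. The function and parameter names are hypothetical, and the cutoffs and their directions are transcribed exactly as printed in the formula, without further interpretation.

```python
def fnab_score(prc1_cytoplasmic: float, rrm2_cytoplasmic: float) -> int:
    """Score = 1*(PRC1c > 20) + 2*(RRM2c < 100), as printed in the text."""
    score = 0
    if prc1_cytoplasmic > 20:    # cytoplasmic PRC1 H score above 20: one point
        score += 1
    if rrm2_cytoplasmic < 100:   # cytoplasmic RRM2 H score below 100: two points
        score += 2
    return score


# Hypothetical cell block: PRC1c = 30 and RRM2c = 50 meet both conditions.
example_fnab = fnab_score(30, 50)  # 1 + 2 = 3
```

With two binary terms, the possible scores are 0, 1, 2, and 3.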
Discussion
In the present study, significant differences in the patterns of IHC expression of the newly proposed markers, APLP2, PRC1, and RRM2, were identified in clinical samples comprising follicular neoplasms. A sensitive model based on the expression of these protein markers was designed and then validated in an independent series to assess its diagnostic utility for discriminating between benign and malignant lesions, yielding a potentially powerful diagnostic tool.
Prior to a previous study (14), APLP2, RRM2, and PRC1 had not been associated with thyroid cancer, and to the best of the authors' knowledge, this is the first report of the potential diagnostic value of these proteins in cases of thyroid cancer. Several IHC markers have been proposed for use alone or in combination to improve the accuracy of the diagnosis of follicular epithelial carcinomas. Some of these proteins are involved in either cell adhesion (E-cadherin and syndecan-1), cell differentiation, or malignant transformation (galectin-3), and can distinguish between epithelial malignancies (cytokeratin-19) and FA, as is the case for Hector Battifora mesothelial 1 (HBME-1) (15–17). Moreover, many more IHC markers have been studied, such as PPARgamma (18), TLR-4 (19), FASN, phospho-c-Met (20), thyroid peroxidase (21), cytokeratin-19 (22,23), HESC5:3 (24), and HMGA2 (25), among others. In general, a single marker is not optimal in terms of sensitivity and specificity. Therefore, panels of two or more antibodies are typically more effective and improve the accuracy of the diagnosis (26–28). Unfortunately, none of the markers studied to date has been confirmed to be clinically applicable for diagnosis or prognosis (9,26). In this study, better sensitivity was achieved compared with other studies (26,29). Moreover, the results were validated using an independent series.
Tests such as the Afirma Gene Expression Classifier, which identifies benign nodules with indeterminate FNAB (3), have been developed based on molecular markers. Although its sensitivity is 92% and its specificity is 52%, there is no clear consensus regarding the capacity of the Expression Classifier to reduce diagnostic surgeries (30,31). Also, a recent study revealed a high rate of false-positive results in Hürthle cell–predominant nodules (32). More recently, other molecular tests have been developed to detect carcinomas from indeterminate FNAB, such as the ThyroSeq v2 (multi-gene next-generation sequencing panel) (33), and ThyraMIR/ThyGenX (a microRNA-based expression classifier and seven-gene mutation panel) (34). These tests showed sensitivities of 90% and 89%, and specificities of 92% and 85%, respectively. However, it is too early to assess their performance, and clinical follow-up is still recommended in those patients with negative results in molecular tests (35). One possibility for improving the performance of the different tests, such as the one described here, is to combine them in the future.
APLP2, PRC1, and RRM2 were identified based on an initial mRNA signature that discriminated among thyroid carcinomas of the follicular epithelium (14). Using these three novel markers, a scoring system model was generated: 4 × (APLP2 n = 0) + 2 × (RRM2 n > 5) + 1 × (PRC1 c ≥ 40), which had a sensitivity of 100% in both the design and validation phases and specificities of 71.3% and 50.7%, respectively. Notably, none of the patients in the design group or the validation group with malignant follicular lesions produced a score in the benign range (0/88). Furthermore, none of the patients in the validation group and only one patient in the design group with FA was classified with a score indicating malignancy (1/149). In addition, this model was applied to a series of benign tumors (NH) for external validation and reached a specificity of 94.8%. These results suggest that this model could provide an accurate differential diagnosis of resected tumors. The H score obtained in FNAB samples differed from that of the TMAs. Therefore, a specific model for FNAB was developed. Furthermore, this model yielded excellent results. Interestingly, none of the patients in the FNAB series with malignant follicular lesions showed a score in the benign range (0/31). Moreover, the only two (18%) cases with malignant lesions and a score of two were minimally invasive FTC.
The three novel markers reported herein belong to distinct signaling pathways: PRC1 (protein regulator of cytokinesis 1) is involved in cytokinesis (36,37); APLP2 (amyloid beta [A4] precursor-like protein 2) is a protein related to class I molecules of the major histocompatibility complex, which has been implicated in the pathogenesis of Alzheimer's disease (38); and RRM2 (ribonucleotide reductase M2 polypeptide) is the small subunit of ribonucleotide reductase, which is essential for DNA synthesis and cell proliferation (39). Interestingly, these three proteins have been reported to be altered in different cancers. For instance, PRC1 overexpression has been reported in breast, colon, and pancreatic cancers, lung large cell carcinoma, and cholangiocarcinomas, but not in normal cells (40) or in thyroid cancer. APLP2 overexpression has been demonstrated in neuroendocrine, stomach, and lung tumors, and has been shown to promote cell proliferation in pancreatic cancer (41,42). RRM2 overexpression has been described in esophageal and gastric cancers, as well as in invasive and metastatic colon cancers (38). More recently, it has been reported to be overexpressed in papillary and undifferentiated thyroid cancers in a study that linked invasion and metastasis to RRM1, another member of its family (43).
This study has confirmed that TMA IHC is an excellent tool to assist conventional pathologic examination and differential diagnosis and to validate data obtained from cDNA microarrays. The nuclear expression of APLP2 was not only significantly higher in FA but was also the most important variable used to differentiate FA tumors from follicular growth pattern tumors (FTC and FVPTC), whereas RRM2 contributed to the discrimination of both FA and FTC (Supplementary Fig. S1). However, this study is not without limitations. First, a limited number of FTC cases were included in the design phase. Second, it was not possible to include any FVPTC case in the validation phase. Additionally, the scoring model was tested only in a limited number of cases in the appropriate clinical context (i.e., cytology samples obtained from FNAB). It is therefore necessary to evaluate these markers in a larger series of FNAB cell blocks to assess the clinical applicability of this tool in differential preoperative diagnosis.
In summary, the present results suggest that the combination of APLP2, PRC1, and RRM2 might be useful for the differential diagnosis of benign and malignant follicular lesions. Thus, these findings support the potential of these proteins to improve not only the diagnosis of follicular thyroid lesions and FNAB, but also the management and quality of life of patients by reducing the number of exploratory surgeries. Future prospective clinical studies should be conducted on a larger cohort of follicular thyroid samples to validate the clinical utility of the model.
Footnotes
Acknowledgments
This research was supported by grants from the Spanish Ministry of Health, the Carlos III National Institute of Health (PI08/1022) and European Regional Development Fund (ERDF). CIBER for Diabetes and Associated Metabolic Diseases (CIBERDEM) is an initiative of the Carlos III Health Institute, Spain. EC was supported by a pre-doctoral fellowship from AGAUR (Generalitat de Catalunya) and by the Alícia Cuello de Merigó Foundation. This work was also supported by IRBLleida Biobank (B.0000682) and Plataforma Biobancos PT13/0010/0014. We thank the Cytology Unit of University Hospital Arnau de Vilanova for help with the FNAB study.
Author Disclosure Statement
No competing financial interests exist.
