For decades, clinicians have been well aware that breast cancer (BC) is a clinically heterogeneous disease. Tumor size, lymph-node involvement, histological type, grade, and estrogen receptor (ER) and HER2 status all influence prognosis and response to systemic therapies, but they do not fully capture the varied clinical course of BC [1].
These clinical variables have been combined into multivariate prediction models, such as the Nottingham Prognostic Index [2] and Adjuvant! Online [3] for prognosis, or the nomogram published by Rouzier et al. for the prediction of response to preoperative chemotherapy [4]. Regardless of the clinicopathologic model used, there remains substantial variability in disease outcome within each prediction category. This is probably due to the poor reproducibility of key clinical parameters as defined by immunohistochemistry (IHC) assays (e.g., ER or histological grade) [5], and to the transcriptional heterogeneity found among breast tumors. The hope in the community has been that genomic approaches will allow us to overcome these limitations so that ‘quantitative molecular analysis of breast cancer could yield diagnostic tests that might be more accurate than existing clinical prediction models, or complement them’ [6].
High-throughput technologies, such as gene expression profiling, provide us with a unique opportunity to explore the molecular basis for BC by simultaneously analyzing thousands of genes. Microarray-based gene expression studies have revealed that, in addition to being clinically heterogeneous, BC is also a molecularly heterogeneous disease. These studies highlight the presence of distinct molecular subtypes that exhibit different gene expression patterns and clinical outcomes [7–14].
The relevance of these molecular subtypes in terms of basic and translational research has led to the progressive incorporation of such molecular profiles into prognostic assessments [14,15], the prediction of therapeutic efficacy [16] and the design of clinical trials [17–19].
During the past decade, several classification models have been published that enable BC molecular subtypes to be identified using gene expression data. In their seminal work, Perou et al. highlighted four breast tumor subtypes: the ‘basal-like’ (characterized by cytokeratins 5 and 17); the HER2-enriched (mostly, but not all, HER2 amplified); the ‘luminal’ (expressing luminal cytokeratins 8 and 18 and often differentiated into two or three subgroups); and the ‘normal-like’ tumors [7].
These molecular subtypes were first identified through hierarchical clustering of a small dataset of breast tumor gene expression profiles, using a large set of genes with highly variable expression, referred to as ‘intrinsic’ genes [7]. The authors then designed a classification model, called the Single Sample Predictor (SSP), which identifies the subtype of a single tumor using a nearest centroid classifier based on the initial hierarchical clustering [9]. This first SSP has since been refined using different versions of the intrinsic gene list [11,14].
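As an illustration of the nearest centroid idea behind an SSP, the following minimal Python sketch assigns a single tumor to the subtype whose centroid its expression profile correlates with best. The four-gene toy profiles, the subtype labels, the function names and the choice of Pearson correlation as the similarity measure are all assumptions made for this example; they are not the published intrinsic gene lists, centroids or similarity measure.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # a flat profile carries no correlation information
    return cov / (norm_x * norm_y)


def compute_centroids(profiles, labels):
    """Average the training profiles of each subtype, gene by gene."""
    groups = {}
    for profile, label in zip(profiles, labels):
        groups.setdefault(label, []).append(profile)
    return {
        label: [sum(values) / len(values) for values in zip(*members)]
        for label, members in groups.items()
    }


def classify_single_sample(profile, centroids):
    """Return the subtype whose centroid correlates best with the profile."""
    return max(centroids, key=lambda label: pearson(profile, centroids[label]))


# Hypothetical training set: four genes per tumor, with subtype labels that
# would come from an initial hierarchical clustering (values are invented).
training_profiles = [
    [2.0, 0.1, -1.0, 1.5],
    [1.8, -0.2, -1.2, 1.7],
    [-1.5, -0.3, 2.0, -1.0],
    [-1.7, 0.0, 1.8, -1.2],
    [-0.5, 2.5, -0.2, 0.1],
    [-0.3, 2.2, 0.0, 0.3],
]
training_labels = ["luminal", "luminal", "basal", "basal", "her2", "her2"]

centroids = compute_centroids(training_profiles, training_labels)
new_tumor = [1.9, 0.0, -0.9, 1.4]
print(classify_single_sample(new_tumor, centroids))
```

Because the centroids are derived from the initial clustering, any change in the training tumors or in the gene list changes the centroids, and with them the subtype calls.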
Despite their value, SSPs have severe limitations. Pusztai et al. demonstrated that small changes in the initial set of breast tumors may have a dramatic impact on the hierarchical clustering used in the SSPs, raising some doubt about the stability of the method [20]. Kapp et al. challenged the use of hundreds of intrinsic genes, and their results suggested that only genes related to ER and HER2 phenotypes led to a stable identification of three main subtypes: ER−/HER2− (basal-like tumors), HER2+ and ER+/HER2− (luminal tumors) [21]. Weigelt et al. reported that the subtype classifications depended on the list of intrinsic genes since SSPs were only moderately concordant [22].
In an attempt to address these issues, Sotiriou et al. developed a novel classification model called the Subtype Classification Model (SCM). The SCM is based on a parametric clustering technique (a mixture of Gaussians) in a low-dimensional space defined by three gene modules (lists of genes specifically correlated with ER, HER2 and AURKA), which robustly quantify the main discriminators of BC: the ER, HER2 and proliferation phenotypes, respectively. Two versions of these gene modules have been published thus far [12,13].
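The classification step of such a model can be sketched as follows: each gene module is summarized as a single score, and the tumor is assigned to the Gaussian component with the highest posterior probability. This is a minimal sketch under strong assumptions. The two-gene modules, the restriction to the ER and HER2 dimensions, the diagonal covariances and every mixture parameter below are invented for illustration; in practice the parameters would be fitted to a training dataset, and the published gene modules are much larger.

```python
from math import exp, log, pi


def module_score(expression, module):
    """Signed average of the module genes present in the expression profile.

    `module` maps gene symbol -> weight (+1 or -1); both are hypothetical here.
    """
    genes = [gene for gene in module if gene in expression]
    if not genes:
        raise ValueError("no module gene found in the expression profile")
    return sum(module[gene] * expression[gene] for gene in genes) / len(genes)


def log_gaussian_diag(point, mean, var):
    """Log density of a Gaussian with diagonal covariance."""
    return sum(
        -0.5 * (log(2 * pi * v) + (x - m) ** 2 / v)
        for x, m, v in zip(point, mean, var)
    )


def subtype_posteriors(point, components):
    """Posterior probability of each mixture component for one tumor."""
    log_joint = {
        name: log(c["weight"]) + log_gaussian_diag(point, c["mean"], c["var"])
        for name, c in components.items()
    }
    top = max(log_joint.values())  # subtract the maximum for numerical stability
    unnormalized = {name: exp(v - top) for name, v in log_joint.items()}
    total = sum(unnormalized.values())
    return {name: v / total for name, v in unnormalized.items()}


# Hypothetical modules and mixture parameters in (ER score, HER2 score) space.
er_module = {"ESR1": 1, "GATA3": 1}
her2_module = {"ERBB2": 1, "GRB7": 1}
components = {
    "ER-/HER2-": {"weight": 0.20, "mean": (-1.5, -0.5), "var": (0.3, 0.3)},
    "HER2+": {"weight": 0.15, "mean": (-0.5, 2.0), "var": (0.5, 0.4)},
    "ER+/HER2-": {"weight": 0.65, "mean": (1.0, -0.3), "var": (0.4, 0.3)},
}

tumor = {"ESR1": 1.2, "GATA3": 0.9, "ERBB2": -0.2, "GRB7": -0.4}
point = (module_score(tumor, er_module), module_score(tumor, her2_module))
posteriors = subtype_posteriors(point, components)
print(max(posteriors, key=posteriors.get))
```

Unlike a hard nearest centroid call, the posterior probabilities also indicate how confidently a tumor is assigned to its subtype.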
The complex nature of molecular classification using transcriptional profiling has led to numerous efforts to develop IHC markers that can reproduce this molecular subtyping. Combinations of various IHC markers, including cytokeratins, ER and HER2 status and proliferation-related proteins, have been proposed to define the subtypes of BC [23–25]. However, in this context, the use of IHC is questionable owing to its poor reproducibility when compared with gene expression profiling [26], its semiquantitative nature and its weak concordance with the molecular subtypes defined by gene expression.
Although the molecular taxonomy of BCs, as defined by these approaches, has had a significant impact on the way clinicians perceive the disease, we still know surprisingly little about the concordance between these classification models, their prognostic or predictive value and the robustness of the classification algorithms. In addition, the availability of multiple models could lead to confusing results, since investigators might not select the same model and might consequently assign different subtypes to the same tumor sample. Because subtype classification is increasingly being incorporated into clinical trials [17–19], and efforts are being made to adapt molecular subtyping to routine clinical use [14], it is critically important to adopt standardized methodologies in BC classification.
In a recent meta-analysis of BC studies that included gene expression data obtained from 4607 patients, Haibe-Kains et al. highlighted the advantages and disadvantages of the existing classification models for molecular subtype identification [27]. Concurring with the results of Weigelt et al. [22], the authors showed that the published SSPs were only moderately consistent in their subtype classifications, which depended strongly on the intrinsic genes used in each classification model.
By contrast, SCMs were highly consistent and yielded the best concordance with the traditional clinical parameters (such as ER and HER2 status and histological grade). Interestingly, none of the classification models were concordant with progesterone receptor status, thereby challenging its relevance for molecular subtyping.
Haibe-Kains and colleagues also assessed the robustness of the various classification models; that is, the ability to assign the same tumors to the same subtypes whatever the gene expression data used to build these models [27]. In other words, if the molecular subtypes are real, a classification model should not depend on the data used to fit it; otherwise the model is considered to be unreliable. The authors showed that SCMs were statistically more robust than SSPs for identifying the three main BC subtypes (basal-like, HER2-enriched and luminal), as well as providing better discrimination between the low- and high-proliferative luminal tumors (referred to as luminal A and B, respectively). The authors also confirmed the clinical relevance of the subtype classifications for prognostic purposes in a large series of 1315 untreated node-negative patients with BC.
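One way to make this notion of robustness concrete is to fit the same type of model on two different datasets, classify a common set of tumors with both fitted models, and measure how well the two sets of subtype calls agree. The sketch below uses Cohen's kappa, a chance-corrected agreement statistic, for that purpose. The choice of kappa, the function name and the toy subtype calls are assumptions made for illustration and are not taken from the meta-analysis.

```python
from collections import Counter


def cohen_kappa(calls_a, calls_b):
    """Chance-corrected agreement between two lists of subtype calls."""
    if len(calls_a) != len(calls_b) or not calls_a:
        raise ValueError("both call lists must be non-empty and of equal length")
    n = len(calls_a)
    # Fraction of tumors assigned to the same subtype by both models.
    observed = sum(a == b for a, b in zip(calls_a, calls_b)) / n
    # Agreement expected by chance, from each model's subtype frequencies.
    counts_a, counts_b = Counter(calls_a), Counter(calls_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / n**2
    if expected == 1.0:
        return 1.0  # both models always emit the same single subtype
    return (observed - expected) / (1 - expected)


# Hypothetical subtype calls for the same six tumors, produced by one model
# type fitted on two different training datasets.
calls_fit_on_dataset_1 = ["basal", "basal", "her2", "luminal", "luminal", "luminal"]
calls_fit_on_dataset_2 = ["basal", "her2", "her2", "luminal", "luminal", "basal"]

kappa = cohen_kappa(calls_fit_on_dataset_1, calls_fit_on_dataset_2)
print(round(kappa, 3))
```

A value near 1 means the subtype calls barely depend on the training data, whereas a value near 0 means the agreement is no better than chance.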
Although the consistency and robustness of the SCMs make these models promising candidates for translation into the clinic, they still use a large number of genes, making their application in routine clinical practice both costly and technically challenging. To address this, Haibe-Kains et al. developed a three-gene SCM that uses only the ER, HER2 and AURKA genes, yet proved to be as robust as the original SCMs [27].
These results suggest that we may soon have a robust and reliable approach to BC molecular subtype classification, in a form that can be readily implemented in a clinical laboratory. Such a test, if widely used in a standardized fashion, could dramatically change the way in which patients are managed in a clinical setting and, hopefully, could lead to substantial improvements in outcome and survival.
Acknowledgements
The author would like to thank Professors Christos Sotiriou, Gianluca Bontempi and John Quackenbush for making this research possible, as well as Mary Kalamaras for her editorial assistance.
The author has no relevant affiliations or financial involvement with any organization or entity with a financial interest in or financial conflict with the subject matter or materials discussed in the manuscript. This includes employment, consultancies, honoraria, stock ownership or options, expert testimony, grants or patents received or pending, or royalties.
No writing assistance was utilized in the production of this manuscript.
