An Integrative Approach for Identifying Network Biomarkers of Breast Cancer Subtypes Using Genomic, Interactomic, and Transcriptomic Data

Abstract

Breast cancer is a complex disease that can be classified into at least 10 different molecular subtypes. Appropriate diagnosis of specific subtypes is critical for ensuring the best possible patient treatment and response to therapy. Current computational methods for determining the subtypes are based on identifying differentially expressed genes (i.e., biomarkers) that can best discriminate the subtypes. Such approaches, however, are known to be unreliable since they yield different biomarker sets when applied to data sets from different studies. Incorporating knowledge about the functional relationships among genes makes it possible to identify “network biomarkers” that enrich the criteria for biomarker selection. Cancer network biomarkers are subnetworks of functionally related genes that “work in concert” to perform functions associated with tumorigenesis. We propose a machine learning framework that can be used to identify network biomarkers and driver genes for each specific breast cancer subtype. Our results show that the resulting network biomarkers can separate one subtype from the others with very high accuracy.

1. Introduction

Breast cancer (BC) is one of the leading causes of cancer-related deaths among women in North America (DeSantis et al., 2014). Although BC is a heterogeneous disease, it can be categorized into different subtypes that can be distinguished based on gene expression (GE) profiles and copy number variations (CNVs) (Curtis et al., 2012). Appropriate diagnosis of the specific subtypes of BC is vital to provide the best treatment to the patient. Different methods such as MRI (Loo et al., 2011), mammography (Razzaghi et al., 2013), and CT scan (Mavi et al., 2006) have been used to phenotypically examine the changes in the tissue, but they provide little information to direct therapy. Most bioinformatic methods have focused on identifying BC biomarkers as small subsets of differentially expressed genes. However, differentially expressed genes have limited predictive performance due to (1) the heterogeneity within tissues and across patients and (2) the dependence among genes, gene products, or pathways. To accurately identify effective BC biomarkers, new bioinformatic methods integrating additional biological information with GE data have become necessary. Within the last 5 years, new classes of biomarkers called cancer network biomarkers (NBs) have been defined and studied (Liu et al., 2012a,b; Zhang and Chen, 2013). A cancer NB is a disease-related subnetwork of interacting genes identified by an appropriate integration of secondary network data (e.g., a protein interaction network or cellular pathway network) with the primary GE data, thus taking into account the dependencies among genes.

In this study, we propose a machine learning framework that can be used to identify differential NBs specific to each BC subtype. First, we select and combine relevant features using CNV, copy number aberration (CNA), and GE data, to obtain a set of candidate genes for each BC subtype consisting of (1) genes that are differentially expressed in the subtype and (2) genes that have significant copy number aberrations in the subtype. Then, each gene in the candidate set is used to seed the search for discriminative NBs in an input protein–protein interaction (PPI) network. We have devised different methods for identifying NBs that best separate each subtype.

2. Methods

We have used the METABRIC data set (Curtis et al., 2012), which contains the copy number values and GE levels of 2000 primary breast tumors with long-term clinical follow-up. It can be accessed from the European Genome-Phenome Archive using the accession number EGAS00000000083. In Curtis et al. (2012), the CNAs and CNVs were generated using Affymetrix SNP 6.0 arrays, and the GE data were obtained using Illumina HT-12 technology. The data set consists of two parts, a discovery set and a validation set. Because the validation set lacks class labels, in this paper we use only the discovery set, which contains 997 samples from 10 subtypes of BC. Each sample contains expression data for 48,803 probe IDs. The expression values of all probes corresponding to the same gene were merged by taking their median, which maps the probes to 24,351 unigenes. The number of samples corresponding to each subtype is listed in Table 1.

Table 1.

Number of Samples Corresponding to Each of 10 Subtypes

Subtypes	1	2	3	4	5	6	7	8	9	10
No. of samples	76	45	156	167	94	44	109	143	67	96

To obtain an NB corresponding to each subtype, we consider each subtype as the positive class and the remaining subtypes as the negative class. Thus, by performing a one-against-all classification scheme, separately for each subtype, we can obtain the specific NB that best discriminates that subtype from the other subtypes. Figure 1 illustrates the proposed framework for finding NBs corresponding to each subtype.
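The one-against-all labeling can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the sample counts are copied from Table 1. It also shows how unbalanced each binary problem becomes once a single subtype is set against the other nine.

```python
# Sample counts per subtype, copied from Table 1.
SAMPLES_PER_SUBTYPE = {1: 76, 2: 45, 3: 156, 4: 167, 5: 94,
                       6: 44, 7: 109, 8: 143, 9: 67, 10: 96}


def one_vs_all_labels(sample_subtypes, positive):
    """Return 1 for samples of the positive subtype and 0 for all others."""
    return [1 if s == positive else 0 for s in sample_subtypes]


def class_balance(counts, positive):
    """Return (positives, negatives) for a one-against-all split."""
    total = sum(counts.values())
    return counts[positive], total - counts[positive]


for subtype in sorted(SAMPLES_PER_SUBTYPE):
    pos, neg = class_balance(SAMPLES_PER_SUBTYPE, subtype)
    print(f"Subtype {subtype}: {pos} positive vs {neg} negative samples")
```

For the smallest classes (Subtypes 2 and 6) fewer than 5% of the 997 samples are positive, which is why measures other than plain accuracy matter when the NBs are evaluated.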

FIG. 1.

The proposed framework for finding NBs corresponding to each subtype. CNA, copy number aberration; CNV, copy number variation; GE, gene expression; NBs, network biomarkers; PPI, protein–protein interaction.

2.1. Obtaining candidate genes

In the first step, we use CNA, CNV, and GE data to find the most informative genes, separately for each subtype; these genes are used later as seeds to find the best separating NBs of a given subtype. To do so, we first use CNA/CNV information to find those genes that have very high genotypic aberration in each subtype based on their GISTIC score (Beroukhim et al., 2007). GISTIC identifies significant aberrations in two steps. In the first step, it calculates the G-score statistic, which involves both the frequency of occurrence and the amplitude of the aberration. In the second step, it assesses the significance of each aberration using Fisher's exact test (Raymond and Rousset, 1995). To make sure that we target only aberrations in the copy number and not common variations across different populations, we use the HapMap database (Consortium et al., 2010), a catalog of common genetic variants that occur in humans. We test for significance only those genes that have a CNA but no CNV. We also use GE data to identify the top differentially expressed genes for each subtype. For this, we used Chi2 (Liu and Setiono, 1995) to rank genes based on their ability to separate each subtype from the remaining subtypes. Finally, the two gene lists are combined as follows: if the CNA/CNV analysis determines N genes to be significant in terms of their genomic aberrations, we select the top N genes from the GE ranking and take the intersection of these two gene sets as the candidate genes, which are used as seeds in the PPI network.
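The final combination rule (N significant CNA genes, top N genes of the GE ranking, then the intersection) can be illustrated with a minimal Python sketch. The function names and the gene identifiers G1 to G5 are hypothetical, and the GISTIC and Chi2 computations themselves are assumed to have been run elsewhere.

```python
def drop_common_variants(cna_genes, common_cnv_genes):
    """Keep only genes with a CNA that are not known common CNVs."""
    return set(cna_genes) - set(common_cnv_genes)


def select_candidates(cna_significant, ge_ranking):
    """Intersect the N CNA-significant genes with the top-N GE-ranked genes.

    cna_significant: genes judged significant by the CNA analysis (N genes).
    ge_ranking: genes sorted from most to least discriminative.
    """
    n = len(cna_significant)
    return set(cna_significant) & set(ge_ranking[:n])


# Hypothetical example: G4 is a common variant, so it is not tested at all.
tested = drop_common_variants({"G1", "G2", "G3", "G4"}, {"G4"})
# Assume, for illustration, that all three remaining genes are significant.
ge_ranking = ["G2", "G5", "G3", "G1", "G4"]
candidates = select_candidates(tested, ge_ranking)
print(sorted(candidates))
```

Because the intersection can only shrink the lists, the candidate set never exceeds N genes, which is consistent with the small candidate sets reported per subtype in Table 4.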

2.2. Obtaining NB for each subtype

In this step, we use the candidate genes obtained from the previous step as seeds in the PPI network data. First, we combined the human PPI network data obtained from BioGrid (Stark et al., 2011), HPRD (Prasad et al., 2009), IntAct (Kerrien et al., 2012), DIP (Xenarios et al., 2002), and MINT (Ceol et al., 2010) into a single unified PPI network consisting of 230,000 PPIs and 15,823 proteins, taken as the union of all of these databases.

Second, we mapped all candidate genes onto our PPI network to be used as seeds for finding the NBs. Starting from a given seed node v, the search for the best separating NB proceeds as follows. We iteratively aggregate its neighboring nodes u in a greedy manner, using a breadth-first search algorithm. A neighbor u is inserted into the current aggregate N if and only if its inclusion (i.e., the new aggregate $N + u$) increases the correlation between the expression of the genes in the aggregate and the given subtype; that is, when $\vert correlation(N + u, subtype) - correlation(N, subtype) \vert > \Delta$, where Δ is 0.001. Then, the same process is repeated on the new aggregate N + u. This process continues until all reachable neighbors (at any distance) of the new aggregate have been evaluated, resulting in a subnetwork, S_v, obtained from seed v. The same process is applied to all the seeds obtained for a given subtype, and the union of all the resulting subnetworks is taken as the final NB of that subtype.
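A minimal Python sketch of the greedy expansion from a single seed follows. Several details are assumptions made for illustration and are not specified in the text: the aggregate's expression is taken as the per-sample mean over its genes, the correlation is Pearson's coefficient against a 0/1 subtype label, each gene is evaluated at most once, and the PPI network is a symmetric adjacency dictionary. The acceptance test follows the inequality exactly as printed above. All names are hypothetical.

```python
from collections import deque
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if undefined)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def aggregate_score(genes, expr, labels):
    """Correlate the per-sample mean expression of `genes` with the labels."""
    profile = [sum(expr[g][i] for g in genes) / len(genes)
               for i in range(len(labels))]
    return pearson(profile, labels)


def grow_subnetwork(seed, ppi, expr, labels, delta=0.001):
    """Greedy breadth-first expansion from one seed; returns (genes, edges)."""
    aggregate = [seed]
    edges = []
    visited = {seed}
    queue = deque([seed])
    score = aggregate_score(aggregate, expr, labels)
    while queue:
        v = queue.popleft()
        for u in sorted(ppi.get(v, ())):
            if u in visited or u not in expr:
                continue
            visited.add(u)  # assumption: each gene is evaluated at most once
            new_score = aggregate_score(aggregate + [u], expr, labels)
            if abs(new_score - score) > delta:
                aggregate.append(u)
                edges.append((v, u))
                queue.append(u)
                score = new_score
    return aggregate, edges


# Toy data: hypothetical genes A, B, C and four samples, two per class.
labels = [1, 1, 0, 0]
expr = {"A": [1.0, 0.0, 0.0, 0.0],
        "B": [0.0, 1.0, 0.0, 0.0],
        "C": [5.0, 5.0, 0.0, 0.0]}
ppi = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}
genes, edges = grow_subnetwork("A", ppi, expr, labels)
print(genes, edges)
```

In the toy data, adding B changes the correlation of the aggregate substantially and is accepted, whereas adding C leaves it unchanged and is rejected. The NB of a subtype would then be the union of the edge lists returned for all of its seeds.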

Since the order of candidate genes may alter the expansion of subnetworks, depending on which candidate gene reaches a certain gene first, we shuffle the candidate genes 100 times and obtain the network for each ordering individually. At the end, we merge all 100 networks. Each individual interaction thus has a confidence score from 1 to 100, which represents the number of networks, out of 100, in which that interaction appeared. We categorize interactions into three groups, low, medium, and high confidence, which contain the interactions that are present in less than 30%, between 30% and 70%, and >70% of the networks, respectively. Table 2 shows the distribution of the interactions in each subtype NB.
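The merging and binning step can be sketched as follows. This is an illustrative Python fragment under stated assumptions: interactions are treated as undirected, each run contributes at most one count per interaction, and the boundary values of exactly 30% and 70% fall into the medium group. The function names are hypothetical.

```python
from collections import Counter


def merge_runs(run_edge_lists):
    """Count in how many runs each undirected interaction appears."""
    counts = Counter()
    for edges in run_edge_lists:
        # A set ensures an interaction is counted once per run.
        counts.update({frozenset(edge) for edge in edges})
    return counts


def confidence_category(count, runs=100):
    """Map an appearance count to low (<30%), medium (30-70%), or high (>70%)."""
    if 100 * count < 30 * runs:
        return "low"
    if 100 * count <= 70 * runs:
        return "medium"
    return "high"


# Three hypothetical runs over genes A, B, C.
runs = [[("A", "B"), ("B", "C")], [("B", "A")], [("A", "B")]]
counts = merge_runs(runs)
for pair, count in counts.items():
    print(sorted(pair), count, confidence_category(count, runs=len(runs)))
```

Integer arithmetic is used in the thresholds so that counts of exactly 30 and 70 out of 100 are classified without floating-point rounding effects.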

Table 2.

Number of Interactions in Network Biomarkers Corresponding to Each Subtype

Subtype	Total No. of interactions	Low confidence	Medium confidence	High confidence
1	2389	2230	126	33
2	3013	2890	91	32
3	2524	2260	177	87
4	1444	1170	184	90
5	1999	1866	104	29
6	2900	2608	211	81
7	2294	2102	118	74
8	2750	2585	106	59
9	3000	2787	161	52
10	936	814	94	28

Interactions have been categorized into three groups: low, medium, and high confidence, which contain interactions that are present in less than 30%, between 30% and 70%, and >70% of the networks, respectively.

2.3. Evaluating the predictive performance of each NB

The following measures are used for evaluating the predictive performance of each NB.

$$Accuracy = \frac{TP + TN}{TP + FN + FP + TN}, \tag{1}$$

F-measure uses both precision and recall measures to compute the score as follows:

$$F\text{-}measure = 2 \times \frac{Precision \times Recall}{Precision + Recall}, \tag{2}$$

where

$$Precision = \frac{TP}{TP + FP}, \tag{3}$$

$$Recall = \frac{TP}{TP + FN}. \tag{4}$$

Another measure, the area under the receiver operating characteristic (ROC) curve (AUC), summarizes the trade-off between Specificity and Sensitivity (Recall), where

$$Sensitivity\,(Recall) = \frac{TP}{TP + FN}, \tag{5}$$

$$Specificity = \frac{TN}{TN + FP}. \tag{6}$$

Here, TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives, respectively.
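As a worked illustration of Equations (1) to (6), the following Python sketch computes the measures from the four confusion-matrix counts. The function name and the example counts are hypothetical; the counts are chosen only to resemble a 45-versus-952 one-against-all split and are not results from this study. AUC is not included because it requires ranked classifier scores rather than a single confusion matrix.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, recall, specificity, and F-measure."""
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "specificity": specificity, "f_measure": f_measure}


# Hypothetical counts: 45 positive and 952 negative samples.
metrics = classification_metrics(tp=40, tn=942, fp=10, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these counts the accuracy is about 0.985 while the F-measure is about 0.842, which shows how accuracy alone can look optimistic when the negative class dominates.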

3. Results

Table 3 shows the number of selected genes and interactions in the obtained NB corresponding to each of the 10 BC subtypes. Since the classes are highly imbalanced, a more robust performance measure such as AUC provides less biased insight into the performance of the NBs for each subtype. As shown in the table, the AUC of the NBs is >0.95 for all subtypes except Subtype-3 (0.939), which indicates the excellent predictive performance of each NB.

Table 3.

Comparison Between the Number of Genes, Interactions, and the Performance of Network Biomarkers for 10 Breast Cancer Subtypes

Subtype	No. of genes	No. of interactions	Phenotype correlation	Accuracy (%)	F-measure	Area under the curve
1	2385	2120	−0.947	94.1	0.928	0.970
2	2948	2432	−0.913	96.6	0.959	0.966
3	3309	3089	0.916	93.7	0.941	0.939
4	4557	3846	0.929	95.6	0.922	0.952
5	1541	1382	−0.96	97.1	0.964	0.993
6	5382	3987	−0.902	95.7	0.939	0.961
7	3111	2879	−0.947	94.8	0.934	0.952
8	4343	3622	−0.923	93.84	0.943	0.971
9	2266	2151	−0.949	95.6	0.935	0.982
10	2921	2662	0.951	96.1	0.963	0.975

We trained a random forest classifier containing 50 trees along with a 10-fold cross-validation scheme to evaluate the effectiveness of candidate and high-confidence genes involved in each subtype's NB in discriminating each subtype individually. Tables 4 and 5 show the performance of candidate genes and high-confidence genes in each subtype, respectively. As shown in the tables, although candidate genes themselves can provide an accurate gene signature for each subtype of BC, adding high-confidence genes to candidate gene sets increases the classification performance.

Table 4.

Using Candidate Genes Corresponding to Each Subtype for Classification

Subtype	Candidate genes	Accuracy (%)	F-measure	Area under the curve	MCC
1	42	93.78	0.935	0.950	0.521
2	16	95.08	0.945	0.832	0.314
3	32	85.55	0.853	0.854	0.436
4	96	87.96	0.873	0.891	0.531
5	18	91.07	0.908	0.897	0.449
6	69	95.78	0.949	0.868	0.338
7	16	88.66	0.883	0.840	0.382
8	27	86.86	0.866	0.881	0.448
9	59	94.48	0.932	0.904	0.423
10	75	95.68	0.957	0.965	0.758

MCC, Matthews correlation coefficient.
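The MCC column in Tables 4 and 5 is not defined in the Methods; assuming it is the standard Matthews correlation coefficient, it can be computed from the confusion-matrix counts as in the Python sketch below. The function name and the example counts are hypothetical and are not results from this study.

```python
from math import sqrt


def matthews_corrcoef(tp, tn, fp, fn):
    """Standard Matthews correlation coefficient (0.0 when undefined)."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# A classifier that labels every sample negative on a 45-vs-952 split:
# accuracy is 952/997 (about 95.5%), yet MCC is 0.
print(matthews_corrcoef(tp=0, tn=952, fp=0, fn=45))
# A perfect classifier on the same split has MCC 1.
print(matthews_corrcoef(tp=45, tn=952, fp=0, fn=0))
```

This contrast helps explain how a row in Table 4 can pair an accuracy above 95% with an MCC near 0.3: on a heavily unbalanced split, accuracy is dominated by the negative class, whereas MCC is not.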

Table 5.

Using High-Confidence Genes (Present in >70% of the Networks) Corresponding to Each Subtype (Including Candidate Genes) for Classification

Subtype	High-confidence genes	Accuracy (%)	F-measure	Area under the curve	MCC
1	103	94.98	0.947	0.964	0.607
2	59	97.59	0.975	0.990	0.709
3	148	88.56	0.877	0.893	0.516
4	225	87.96	0.869	0.904	0.516
5	68	97.09	0.972	0.992	0.840
6	187	98.99	0.990	0.997	0.875
7	104	92.17	0.908	0.906	0.515
8	100	91.37	0.907	0.930	0.610
9	144	95.38	0.948	0.962	0.562
10	115	95.88	0.959	0.968	0.765

Figure 2 shows the genes with medium and high confidence in Subtype-1 NB. As shown in the figure, some of the hub genes in the subnetwork such as cyclin-dependent kinase 1 (CDK1) are known indicators in BC prognosis (Kim et al., 2008) and further investigations for determining their possible roles in Subtype-1 of BC are in progress.

FIG. 2.

The NB of Subtype-1, including medium- and high-confidence interactions.

We compared IntOGen's mutational breast cancer driver genes with the genes that we identified in the NBs of each BC subtype. Table 6 lists all the mutational driver genes and indicates whether each is present in the NB of at least one subtype. As shown in the table, out of 184 mutational driver genes, 125 are part of the NBs of different BC subtypes. This is notable because our model recovered these genes without access to any mutational data for the METABRIC data set.

Table 6.

Mutational Driver Genes Identified in the Network Biomarkers of Breast Cancer Subtypes

Gene	Present	Gene	Present	Gene	Present	Gene	Present	Gene	Present	Gene	Present
PIK3CA		LPHN2		KDM5C	✓	KALRN	✓	ERCC2	✓	MAX
TP53	✓	CDKN1B	✓	APC	✓	EIF4A2	✓	HSPA8	✓	EIF2C3
PTEN	✓	TBL1XR1	✓	ARID2	✓	MGA	✓	NUP107		ARNTL	✓
AKT1	✓	BRCA2	✓	CIC	✓	MECOM	✓	ERBB2IP		KLF4	✓
SF3B1	✓	BRCA1	✓	SMAD4	✓	ARHGAP35	✓	BMPR2	✓	G3BP2	✓
KRAS		ANK3		BAP1	✓	NUP98	✓	MLH1		TCF12	✓
GATA3	✓	ERBB2	✓	PBRM1		STAG1	✓	CLTC	✓	CARM1	✓
MAP3K1	✓	MYH9	✓	DDX5	✓	SMARCA4		NOTCH1	✓	TCF7L2	✓
MLL3		MED23	✓	KEAP1	✓	BCOR	✓	SUZ12		SEC24D
CDH1	✓	MLLT4	✓	STK11	✓	PTPRU		HLA-A	✓	ZFP36L2	✓
NCOR1	✓	ARID4B	✓	RPL5	✓	FLT3	✓	CNOT3		CAST	✓
MAP2K4	✓	RPGR		PHF6		ARFGEF2	✓	SOS2	✓	CLASP2	✓
RUNX1	✓	HCFC1		FUBP1	✓	BPTF	✓	HLF	✓	ACSL6
NF1		MYH14		EIF1AX		FOXP1	✓	DHX15	✓	MUC20
RB1	✓	NOTCH2	✓	MACF1	✓	CEP290		EIF4G1	✓	NF2	✓
ATM	✓	SPTAN1	✓	AHNAK	✓	MED24	✓	ACO1	✓	ITSN1	✓
ARID1A		PRKAR1A	✓	MED12	✓	CSDE1	✓	LCP1	✓	RBM5	✓
TBX3		CCAR1	✓	AKAP9	✓	EP300	✓	PIP5K1A		AQR	✓
MLL2		RFC4		TAF1		FN1	✓	NR4A2	✓	MSR1
CBFB		CAD	✓	SVEP1	✓	BNC2	✓	CHEK2	✓	THRAP3
CTCF		SRGAP1	✓	ASPM		CHD9	✓	MKL1	✓	GOLGA5	✓
CHD4		ACVR1B	✓	ATR	✓	POLR2B	✓	CUL1	✓	ACTB	✓
PIK3R1	✓	GPS2		MLL		PIK3CB	✓	TNPO1		RHEB
STAG2		PRKCZ	✓	MTOR		LRP6		DIS3		ATF1	✓
CASP8	✓	FBXW7	✓	ASH1L	✓	FMR1		FUS	✓	ATIC	✓
FOXA1	✓	BRAF		NSD1		SOS1	✓	CLSPN	✓	PCSK6
MYB	✓	NRAS	✓	CDK12	✓	PCDH18		STK4	✓	CCT5	✓
ZFP36L1		IDH1	✓	MYH11	✓	DDX3X	✓	RBBP7	✓	HNRPDL
PAX5		SETD2		TRIO	✓	AFF4	✓	SFPQ	✓	TGFBR2	✓
TFDP1		EGFR	✓	SETDB1	✓	TOM1	✓	ELF1		STIP1	✓
PTGS1		NDRG1	✓	PIK3R3		CSNK1G3

We also computed the odds ratio (Scotia, 2010) of having a deletion or amplification in each candidate gene and examined its relationship to the expression of that gene across different subtypes. Figure 3 shows the odds ratio and GE of candidate genes for Subtype-1 of BC. The odds ratio indicates how strongly a deletion/amplification in a specific gene separates a subtype from the others; the higher the ratio, the more effective that aberration is in separating one subtype from the rest. In most cases, we found CNA and GE to be two independent factors: a high odds ratio for a gene does not necessarily mean that the GE pattern of that gene differs markedly between one subtype and the other subtypes (see Supplementary Data for odds ratio and GE of candidate genes for Subtypes-2–10).
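For a single gene, the odds ratio can be computed from a 2×2 table of aberration status against subtype membership, as in the Python sketch below. The function name and the example counts are hypothetical, and the 0.5 continuity correction applied when a cell is zero (the Haldane-Anscombe correction) is a common convention assumed here, not a step described in the text.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio of an aberration for one subtype versus all others.

    a: samples in the subtype with the aberration
    b: samples in the subtype without it
    c: samples in other subtypes with the aberration
    d: samples in other subtypes without it
    """
    if 0 in (a, b, c, d):
        # Assumed convention: add 0.5 to every cell to avoid division by zero.
        a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
    return (a * d) / (b * c)


# Hypothetical gene: amplified in 30 of 76 Subtype-1 samples
# and in 50 of the remaining 921 samples.
print(round(odds_ratio(30, 46, 50, 871), 2))
```

A ratio of 1 means the aberration is equally likely inside and outside the subtype; values well above 1 indicate that the aberration is enriched in the subtype.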

FIG. 3.

The odds ratio and GE of candidate genes for Subtype-1.

Some genes appeared in more than one subtype as high-confidence genes. Figure 4 depicts these genes along with the subtypes to which they belong. As shown in the figure, some of the genes, such as COPS5, GRB2, and MAP1LC3B, appeared in the networks of four subtypes, despite not being among the candidate genes in any of the subtypes. This suggests that these genes, despite not having significant CNA and GE simultaneously, actively participate in the differentiation of several BC subtypes.

FIG. 4.

List of high-confidence genes participating in the NB of more than one subtype. Markers with a black circle denote the candidate genes.

4. Discussion

We used the IPAD pathway analysis database and tool (Zhang and Drabier, 2012) to determine the diseases associated with the hub nodes in the NBs obtained for each subtype. For example, of the nine genes that had >10 connections in the NB corresponding to Subtype-1, at least three have been related to BC in the literature. Table 7 shows the involvement of these hub genes in BC.
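Hub nodes of the kind described here (genes with more than 10 connections) can be extracted from an NB's interaction list with a simple degree count. The Python sketch below is illustrative only: the function name and node labels are hypothetical, interactions are treated as undirected, and self-interactions are ignored.

```python
from collections import defaultdict


def hub_genes(edges, min_connections=10):
    """Return genes with strictly more than `min_connections` distinct partners."""
    neighbors = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-interactions
            neighbors[a].add(b)
            neighbors[b].add(a)
    return sorted(g for g, partners in neighbors.items()
                  if len(partners) > min_connections)


# Hypothetical star network: HUB interacts with 11 partners, MID with 10.
edges = [("HUB", f"P{i}") for i in range(11)]
edges += [("MID", f"Q{i}") for i in range(10)]
print(hub_genes(edges))
```

The resulting gene list is what would then be submitted to an enrichment tool such as IPAD.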

Table 7.

Breast-Related Diseases Corresponding to Candidate Genes of Each Subtype Network Biomarker with p-Value of <0.05

Disease ID	Disease name	Involved genes in the disease	Subtype	p
MESH:D058922	Inflammatory breast neoplasms	DDIT4; ECH1; COX17; UQCRB; TMPO; CCNE2; MAPKAPK5; CDK1; FANCI; GHRL; ETV6; CX3CR1; ZWINT; H2AFJ	1	2.8e-2
MESH:D018270	Carcinoma, ductal, breast	CX3CR1; ETV6; LTF; CDK1; FANCI; GHRL; SLC5A1; ZWINT; IFNGR2; ECH1; DDIT4; CLTC; UQCRB; COX17; VGLL4; ARID5B; MAPKAPK5; CCNE2; TMPO; H2AFJ; CCT2	1	4.49e-2

5. Conclusion

We have introduced a novel framework for identifying NBs related to each of 10 BC subtypes. In the proposed framework, we first use the CNA/CNV information along with GE data to determine a set of candidate genes for each BC subtype. We then use these candidate genes as seeds to find the differential NBs of the given subtype. The NB of a subtype is the collection of subnetworks, each seeded with a candidate gene. We also used different performance measures to evaluate the obtained NBs. Our results show that NBs can separate one subtype from the others with a very high degree of accuracy. This may provide great utility in properly stratifying patients for treatment. Moreover, the obtained NBs may also allow BC researchers to gain insight into the mechanisms driving different BC subtypes.

Acknowledgments

This work has been partially supported by the Natural Sciences and Engineering Research Council of Canada (NSERC) grants and by the Windsor Essex County Cancer Centre Foundation (WECCCF) Seeds4Hope grant.

Authors' Contributions

A.N., L.R., F.F., and I.R. conceived the model. I.R. and F.F. implemented the algorithms and conducted the experiments. L.P. and M.D. conducted the biological validation. All authors have read and approved the final manuscript.

Author Disclosure Statement

No competing financial interests exist.

References

Beroukhim, Getz, Nghiemphu, et al. 2007. Assessing the significance of chromosomal aberrations in cancer: Methodology and application to glioma. Proc. Natl. Acad. Sci. U S A, 104, 20007–20012.

Ceol, Aryamontri A.C., Licata, et al. 2010. MINT, the molecular interaction database: 2009 update. Nucleic Acids Res. 38, D532–D539.

Consortium I.H., et al. 2010. Integrating common and rare genetic variation in diverse human populations. Nature, 467, 52–58.

Curtis, Shah S.P., Chin S.-F., et al. 2012. The genomic and transcriptomic architecture of 2,000 breast tumours reveals novel subgroups. Nature, 486, 346–352.

DeSantis, Ma, Bryan, et al. 2014. Breast cancer statistics, 2013. CA Cancer J. Clin., 64, 52–62.

IntOGen's mutational breast cancer driver genes. Available at: www.intogen.org/search?cancer=brca Accessed December 7, 2016.

Kerrien, Aranda, Breuza, et al. 2012. The IntAct molecular interaction database in 2012. Nucleic Acids Res. 40, D841–D846.

Kim, Nakayama, Miyoshi, et al. 2008. Determination of the specific activity of CDK1 and CDK2 as a novel prognostic indicator for early breast cancer. Ann. Oncol., 19, 68–72.

Liu and Setiono. 1995. Chi2: Feature selection and discretization of numeric attributes, 388–388. In 2012 IEEE 24th International Conference on Tools with Artificial Intelligence. IEEE Computer Society.

Liu K.-Q., Liu Z.-P., Hao J.-K., et al. 2012a. Identifying dysregulated pathways in cancers from pathway interaction networks. BMC Bioinformatics, 13, 126.

Liu, Liu Z.-P., Zhao X.-M., et al. 2012b. Identifying disease genes and module biomarkers by differential interactions. J. Am. Med. Inform. Assoc., 19, 241–248.

Loo C.E., Straver M.E., Rodenhuis, et al. 2011. Magnetic resonance imaging response monitoring of breast cancer during neoadjuvant chemotherapy: Relevance of breast cancer subtype. J. Clin. Oncol., 29, 660–666.

Mavi, Urhan, Jian Q.Y., et al. 2006. Dual time point 18F-FDG PET imaging detects breast cancer with high sensitivity and correlates well with histologic subtypes. J. Nucl. Med., 47, 1440–1446.

Prasad T.K., Goel, Kandasamy, et al. 2009. Human protein reference database - 2009 update. Nucleic Acids Res. 37(Suppl 1), D767–D772.

Raymond and Rousset. 1995. An exact test for population differentiation. Evolution, 49, 1280–1283.

Razzaghi, Troester M.A., Gierach G.L., et al. 2013. Association between mammographic density and basal-like and luminal A breast cancer subtypes. Breast Cancer Res. 15, R76.

Scotia. 2010. Explaining odds ratios. J. Can. Acad. Child Adolesc. Psychiatry, 19, 227.

Stark, Breitkreutz B.-J., Chatr-Aryamontri, et al. 2011. The BioGRID interaction database: 2011 update. Nucleic Acids Res. 39(Suppl 1), D698–D704.

Xenarios, Salwinski, Duan X.J., et al. 2002. DIP, the database of interacting proteins: A research tool for studying cellular networks of protein interactions. Nucleic Acids Res. 30, 303–305.

Zhang and Chen J.Y. 2013. Breast cancer subtyping from plasma proteins. BMC Med. Genomics, 6(Suppl 1), S6.

Zhang and Drabier. 2012. IPAD: The integrated pathway analysis database for systematic enrichment analysis. BMC Bioinformatics, 13(Suppl 15), S7.
