Detecting breast cancer using artificial intelligence: Convolutional neural network

Abstract

BACKGROUND:

One of the most widely used approaches to planning cancer treatment relies on a pathologist’s ability to visually inspect the expression of bio-markers on invasive tumor tissue sections. Recently, deep learning techniques have greatly improved the ability of computers to identify objects in images, raising the prospect of fully automated computer-aided diagnosis. Given the prominent role of nuclear structure in cancer detection, AI’s pattern recognition ability can expedite the diagnostic process.

OBJECTIVE:

In this study, we propose and implement an image classification technique to identify breast cancer.

METHODS:

We implement a convolutional neural network (CNN) on a breast cancer image data set to identify invasive ductal carcinoma (IDC).

RESULT:

The proposed CNN model after data augmentation yielded 78.4% classification accuracy. Of the IDC ( $-$ ) predictions, 16% were incorrect (false negatives), whereas 25% of the IDC ( $+$ ) predictions were incorrect (false positives).

CONCLUSION:

The results achieved by the proposed approach have shown that it is feasible to employ a convolutional neural network particularly for breast cancer classification tasks. However, a common problem in any artificial intelligence algorithm is its dependence on the data set. Therefore, the performance of the proposed model might not be generalized.

Keywords

Convolutional neural network, deep learning, artificial intelligence, machine learning, breast cancer, ductal carcinoma

1. Introduction

Cancer is a group of diseases with enormous molecular diversity between the tumors of affected patients. Invasive ductal carcinoma (IDC), also known as infiltrating ductal carcinoma or carcinoma of ‘No Special Type’ (NST), is characterized by hard lumps with irregular borders. An IDC lump typically feels firmer than a benign breast lump. Invasive breast cancers spread from their origin (either the milk ducts or the lobules) into the adjoining breast tissue. They comprise approximately 70% [1, 2] of all breast cancer cases and have a poorer prognosis than the in-situ sub-types [3, 4].

One of the most widely used approaches to selecting targeted treatment is based on a pathologist’s visual examination of bio-marker expression on tissue sections from a tumor. An example in breast cancer is the semi-quantitative evaluation of the expression of the human epidermal growth factor receptor 2 (HER2) as identified by immunohistochemistry (IHC), which determines patient eligibility for anti-HER2 therapies. For patients whose tumors strongly overexpress HER2, the addition of treatment targeted against HER2 is particularly effective at improving clinical outcomes compared to chemotherapy alone [5].

On a mammogram, IDC typically appears as a mass with spikes radiating from the edges; it may also appear as a smooth-edged lump or as calcification in the tumor area. However, because micro-calcifications are small and have low contrast against the image background, it is challenging and time-consuming for radiologists to assess them independently and accurately. The problem is especially difficult for inexperienced radiologists facing the large number of mammograms generated in widespread screening [6].

1.1 Motivation

Substantial diagnostic variability has been reported between pathologists, and it is estimated that 4% of negative cases and 18% of positive cases are misdiagnosed [7]. In particular, scoring variability has been shown to be important for cases that demonstrate heterogeneous HER2 expression within the tumor cell population [8]. To ensure diagnostic precision, pathologists and oncologists customarily request second opinions. However, second opinions are not readily available and can take several weeks. This situation is likely to become more challenging in the next decade, given the growing number of biomarkers to be assessed by pathologists for clinical decision-making and the shortage of newly trained pathologists [9]. Consequently, there is a need to develop automated tools that overcome these difficulties and improve the diagnostic performance for breast cancer.

1.2 Potential of artificial intelligence

Interest in AI has gone through fluctuating phases of expectation and disappointment since the late 1950s, primarily due to limited computational ability. However, developments in computing infrastructure and machine learning algorithms have shifted the AI paradigm, and the adoption of AI technology has now accelerated in various fields.

In healthcare, some research on the role of AI in cancer detection has argued that, by efficiently examining large numbers of images, artificial intelligence (AI)-based methods can reduce intra- and inter-observer inconsistency. AI-aided diagnosis holds great potential to facilitate clinical decision-making in personalized oncology. Prospective benefits of computer-aided diagnosis include reduced diagnostic turn-around time and improved bio-marker scoring reproducibility. In the last decade, algorithms have been approved by the Food and Drug Administration (FDA) for computer-aided HER2 scoring. Moreover, many studies have shown that image analysis improves IHC bio-marker scoring accuracy and reproducibility in tumors [10, 11].

Recently, deep learning techniques have greatly improved the ability of computers to identify objects in images [3, 12], raising the prospect of fully automated computer-aided diagnosis. Given the prominent role of nuclear structure in cancer cells, various non-parametric methods such as deep learning have been employed for classifying histopathology images and diagnosing breast cancers [13]. Among deep learning models [14], the convolutional neural network (CNN) is arguably the most studied and validated method in a range of image understanding tasks such as face recognition and character recognition.

1.3 Problem statement

As the efficiency and effectiveness of artificial intelligence (AI) in enhancing diverse aspects of healthcare grow rapidly, it becomes likely that AI will soon be integrated into routine clinical processes.

This study is driven by the potential role of AI in the near future. In this study, we propose and implement an image classification technique using convolutional neural networks to identify breast cancer.

2. Methodology

2.1 Data description

The original data was collected by Andrew Janowczyk [15]. It consists of 162 whole mount slide images of breast cancer specimens scanned at 40x. From that, we used 5547 (50.26% IDC and 49.74% non-IDC) patches of size 50 $\times$ 50. We coded the images as 0 (IDC $-$ ) representing non-IDC images and 1 (IDC $+$ ) otherwise (Figs 1 and 2).

Figure 1.

Data sample used for training the model. The positive (green) category refers to IDC $+$ (invasive tumor tissue) and the negative (red) category refers to IDC $-$ (healthy or non-invasive tumor tissue) [3].

Figure 2.

The area enclosed by the green line indicates the invasive tumor tissue zone [3].

All analyses were performed using Python 2.7.

2.2 Data pre-processing

Data pre-processing involves scaling the image data based on pixel intensities and augmenting the data.

Data scaling is an essential step to ensure that every input parameter has a similar data distribution, which makes the model easier to train. The image data set has three channels corresponding to the colors red, green, and blue (RGB), and the pixel values vary in the range [0, 255]. These values would be too high for the model to process, so we scaled the images by a factor of 1/255. The scaled pixel intensities vary in the range [0, 1]. The data set was then augmented by randomly rotating the images by 90 degrees and translating them vertically or horizontally (Fig. 3).

Figure 3.

An image and the different transformations applied to it during augmentation.
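The scaling and augmentation steps described in this section can be sketched in a few lines of plain Python. This is a minimal illustration rather than the code used in the study: the helper names `scale_pixels`, `rotate90`, and `flip_horizontal`, the use of a horizontal flip as one example transformation, and the tiny 2 $\times$ 2 single-channel patch are all assumptions made for the example.

```python
def scale_pixels(patch, max_value=255.0):
    """Scale pixel intensities from [0, 255] to [0, 1] (a 1/255 factor)."""
    return [[value / max_value for value in row] for row in patch]


def rotate90(patch):
    """Rotate a 2-D patch by 90 degrees clockwise."""
    return [list(row) for row in zip(*patch[::-1])]


def flip_horizontal(patch):
    """Mirror a 2-D patch left to right."""
    return [row[::-1] for row in patch]


# Hypothetical 2 x 2 single-channel patch; the study uses 50 x 50 x 3 patches.
patch = [[0, 51],
         [102, 255]]

scaled = scale_pixels(patch)      # every value now lies in [0, 1]
rotated = rotate90(patch)         # one augmented variant
flipped = flip_horizontal(patch)  # another augmented variant
```

Each transformed patch keeps the label of the original, which is what allows augmentation to enlarge the training set without new annotations.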

2.3 Model implementation

2.3.1 Convolutional neural network

The convolutional network is one of the most successful biologically inspired artificial intelligence networks. Although convolutional networks are used in various fields, some fundamental design principles of the neural network come from neuroscience. The processing performed by the mammalian visual system is comparable to a convolution operation. For instance, if a two-dimensional image $I$ is taken as the input and the convolution kernel is denoted by $K$, the convolution of the input image is as follows:

$\displaystyle S(i,j)=(I\ast K)(i,j)=\sum_{m}\sum_{n}I(m,n)K(i-m,j-n)$ (1)

Generally, Eq. (1) can be easily applied in a machine learning library. In Eq. (1), the convolution kernel is flipped relative to the input before the weighted sum is taken [17]. The flip is what makes convolution commutative, but this property is rarely needed in a neural network; instead, cross-correlation, which omits the kernel flip, is preferred in a neural network, as shown in Eq. (2).

$\displaystyle S(i,j)=(I\ast K)(i,j)=\sum_{m}\sum_{n}I(i+m,j+n)K(m,n)$ (2)
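The difference between Eq. (1) and Eq. (2) can be checked numerically. The sketch below is a plain-Python illustration under stated assumptions: the image is treated as zero outside its borders in Eq. (1), Eq. (2) is evaluated only where the kernel fits entirely inside the image, and the function names and the 3 $\times$ 3 example image are hypothetical.

```python
def convolve2d_full(image, kernel):
    """Eq. (1): S(i, j) = sum_m sum_n I(m, n) K(i - m, j - n)."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = [[0] * (iw + kw - 1) for _ in range(ih + kh - 1)]
    for m in range(ih):
        for n in range(iw):
            for p in range(kh):
                for q in range(kw):
                    # p = i - m and q = j - n, hence i = m + p and j = n + q
                    out[m + p][n + q] += image[m][n] * kernel[p][q]
    return out


def cross_correlate2d_valid(image, kernel):
    """Eq. (2): S(i, j) = sum_m sum_n I(i + m, j + n) K(m, n), no padding."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    return [[sum(image[i + m][j + n] * kernel[m][n]
                 for m in range(kh) for n in range(kw))
             for j in range(iw - kw + 1)]
            for i in range(ih - kh + 1)]


def flip(kernel):
    """Reverse the kernel along both axes."""
    return [row[::-1] for row in kernel[::-1]]


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 2],
          [3, 4]]

full = convolve2d_full(image, kernel)
# Keep only the positions where the kernel overlaps the image completely.
valid_part = [row[len(kernel[0]) - 1:len(image[0])]
              for row in full[len(kernel) - 1:len(image)]]
correlated_with_flipped = cross_correlate2d_valid(image, flip(kernel))
```

Here `valid_part` and `correlated_with_flipped` hold the same values, which illustrates that the two operations differ only by the kernel flip. Because a network learns its kernel weights, either form can represent the same set of filters.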

A Convolutional Neural Network (CNN) applies filters, or feature detectors, across an entire image to measure the correspondence between individual image patches and signature patterns within the training set [3]. A pooling function is then applied to reduce the dimensionality of the feature space. The pooling operation can be performed by calculating the maximum or the average of the inputs connected from the previous layer to the kernel at a given position. This study uses max pooling, which is the more popular choice in applications and helps eliminate noise without affecting the activation value of the layer. The collected image patches are then used as inputs to the CNN architecture (Fig. 4), in which two dedicated layers are used for convolution and pooling while the other layers are fully connected.

Figure 4.

Convolutional neural network architecture.
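The max pooling operation described above can be illustrated with a short plain-Python sketch. It assumes non-overlapping windows (stride equal to the window size), which the text does not state explicitly; the function name `max_pool2d` and the 4 $\times$ 4 feature map are hypothetical.

```python
def max_pool2d(feature_map, size=2):
    """Non-overlapping max pooling over size x size windows.

    Assumption: the stride equals the window size, and trailing rows or
    columns that do not fill a complete window are dropped.
    """
    rows = len(feature_map) // size
    cols = len(feature_map[0]) // size
    return [[max(feature_map[r * size + dr][c * size + dc]
                 for dr in range(size) for dc in range(size))
             for c in range(cols)]
            for r in range(rows)]


# Hypothetical 4 x 4 feature map produced by a convolution layer.
feature_map = [[1, 3, 2, 0],
               [4, 6, 1, 1],
               [0, 2, 9, 7],
               [1, 1, 3, 5]]

pooled = max_pool2d(feature_map)  # 2 x 2 result: [[6, 2], [2, 9]]
```

With a 2 $\times$ 2 window, each spatial dimension is halved, so the number of values passed to the next layer falls by a factor of four while the strongest activation in each window is retained.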

The fully connected layer was used to learn non-linear decision boundaries and subsequently perform the classification task. This step depends on the preceding densely connected layers in a typical feed-forward manner.

The outputs of the hidden layers are passed through the ReLU activation function, which allows only positive activations to pass to the next layer. The output node in this study used a sigmoid activation function, whose value varies between 0 and 1 for inputs ranging from negative to positive.
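The two activation functions mentioned above are simple enough to write out directly. The following is a minimal sketch using only the Python standard library; the function names are illustrative, and the branch inside `sigmoid` is one common way to avoid overflow for large negative inputs.

```python
import math


def relu(x):
    """Pass positive values through unchanged and replace the rest with 0."""
    return max(0.0, x)


def sigmoid(x):
    """Map any real input to the interval (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # For negative inputs, use an equivalent form that cannot overflow.
    z = math.exp(x)
    return z / (1.0 + z)


hidden_activations = [relu(v) for v in (-2.0, 0.0, 3.5)]
probabilities = [sigmoid(v) for v in (-4.0, 0.0, 4.0)]
```

An input of zero maps to a sigmoid output of 0.5, so an output above 0.5 would typically be read as IDC ( $+$ ) and an output below 0.5 as IDC ( $-$ ); that threshold is an assumption here, since the text does not state one.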

Tables 1 and 2 show the configuration of the network, with and without data augmentation, used in this study.

Table 1

Specification of CNN configuration without data augmentation

Layer	Operation	Parameters
Input		50 $\times$ 50 $\times$ 3
Hidden layer 1	Conv2d	kernel size = 3 $\times$ 3; no. of filters = 32; padding = same; strides = 1
	Activation function	ReLU
	Max pooling	kernel size = 2 $\times$ 2
Hidden layer 2	Conv2d	kernel size = 3 $\times$ 3; no. of filters = 32
	Activation function	ReLU
	Max pooling	kernel size = 2 $\times$ 2
Hidden layer 3	Fully connected	576 neurons
	Activation function	ReLU
Hidden layer 4	Output layer	no. of classes = 2
	Activation function	Softmax

Table 2

Specification of CNN with data augmentation

Layer	Operation	Parameters
Input		50 $\times$ 50 $\times$ 3
Hidden layer 1	Conv2d	kernel size = 3 $\times$ 3; no. of filters = 32
	Activation function	ReLU
Hidden layer 2	Conv2d	kernel size = 3 $\times$ 3; no. of filters = 32
	Activation function	ReLU
	Max pooling	kernel size = 2 $\times$ 2
Hidden layer 3	Conv2d	kernel size = 3 $\times$ 3; no. of filters = 32
	Activation function	ReLU
	Max pooling	kernel size = 2 $\times$ 2
Hidden layer 4	Fully connected	256 neurons
	Activation function	ReLU
Hidden layer 5	Output layer	no. of classes = 2
	Activation function	Softmax
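To see how the spatial size of a 50 $\times$ 50 patch shrinks through the layers listed in Table 2, the feature-map sizes can be traced with simple arithmetic. The sketch below rests on assumptions that Table 2 does not state: no padding ("valid") and a stride of 1 for every convolution, and a pooling stride equal to the 2 $\times$ 2 window. Under other settings, such as "same" padding, the sizes would differ. All function names are hypothetical.

```python
def conv_output(size, kernel=3, stride=1, padding="valid"):
    """Spatial output size of a convolution along one dimension."""
    if padding == "same":
        return -(-size // stride)  # ceiling of size / stride
    return (size - kernel) // stride + 1


def pool_output(size, window=2):
    """Spatial output size of non-overlapping pooling (assumed stride = window)."""
    return size // window


def trace_table2(input_size=50, filters=32):
    """Follow the layer order of Table 2 under the stated assumptions."""
    size = conv_output(input_size)  # hidden layer 1: Conv2d
    size = conv_output(size)        # hidden layer 2: Conv2d
    size = pool_output(size)        # hidden layer 2: max pooling
    size = conv_output(size)        # hidden layer 3: Conv2d
    size = pool_output(size)        # hidden layer 3: max pooling
    flattened = size * size * filters
    return size, flattened


final_size, flattened_features = trace_table2()
```

Under these assumptions the sizes run 50, 48, 46, 23, 21, 10, so the fully connected layer of 256 neurons would receive 10 $\times$ 10 $\times$ 32 = 3200 flattened features.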

3. Results

3.1 Model performance before data augmentation

Figure 5 shows the confusion matrix representing the classification performance of the model before data augmentation. The CNN achieved 77% accuracy. Of the IDC ( $-$ ) predictions, 23% were incorrect, and of the IDC ( $+$ ) predictions, 22% were incorrect, which implies that there is very little bias.

Here the total number of classifications performed was 1,110 (391 $+$ 135 $+$ 119 $+$ 465). Out of 1,110, 391 were correctly classified as IDC ( $-$ ), whereas 119 were originally IDC ( $+$ ) but were misclassified as IDC ( $-$ ). Similarly, 465 data points were correctly classified as IDC ( $+$ ), whereas 135 were originally IDC ( $-$ ) but were misclassified as IDC ( $+$ ).

Thus, the total number of correct classifications was 856 (0.7711) and the number of misclassifications was 254 (0.2288).
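The percentages reported for the model before data augmentation can be recovered from the four counts if each error share is taken relative to the number of predictions of that class. This reading is inferred from the reported numbers rather than stated as a definition in the text:

```latex
\begin{align*}
\text{accuracy} &= \frac{391 + 465}{1110} = \frac{856}{1110} \approx 0.771,\\
\text{incorrect share of IDC}(-)\ \text{predictions} &= \frac{119}{391 + 119} = \frac{119}{510} \approx 0.233,\\
\text{incorrect share of IDC}(+)\ \text{predictions} &= \frac{135}{465 + 135} = \frac{135}{600} = 0.225.
\end{align*}
```

The two error shares differ by less than one percentage point, which is consistent with the observation that the model shows very little bias toward either class.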

Figure 5.

Classification matrix of CNN without data augmentation.

Figure 6.

Classification matrix of CNN with data augmentation.

Figure 7.

A. Learning curve of CNN model. B. Change in performance with epoch.

Figure 8.

Actual versus predicted IDC.

3.2 Model performance after data augmentation

Figure 6 illustrates the confusion matrix representing the classification performance of the model after data augmentation. The CNN after data augmentation yielded 78.4% accuracy. Of the IDC ( $-$ ) predictions, 16% were incorrect, whereas 25% of the IDC ( $+$ ) predictions were incorrect.

Here the total number of classifications performed was 1,110 (371 $+$ 167 $+$ 72 $+$ 500). Out of 1,110, in this scenario, 371 were correctly classified as IDC ( $-$ ), whereas 72 were originally IDC ( $+$ ) but were misclassified as IDC ( $-$ ). Similarly, 500 data points were correctly classified as IDC ( $+$ ), whereas 167 were originally IDC ( $-$ ) but were misclassified as IDC ( $+$ ).

Thus, the total number of correct classifications was 871 (0.7846) and the number of misclassifications was 239 (0.2153).

Therefore, the model’s misclassification rate falls by approximately 0.0135 after data augmentation. This indicates that the model might have introduced some bias or overfitting in its predictions. Figure 7A illustrates the learning curve of the model and Fig. 7B shows the model’s performance on the validation and training sets. Since the classification accuracy of the model on the validation and training sets is not significantly different, we can infer that the learning rate and classification accuracy of the CNN model after data augmentation are better and that the network architecture does not overfit. Thus, we recommend data augmentation for this particular data set.
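The comparison between the two confusion matrices can be reproduced with a short calculation. The sketch below is illustrative only: the function name `summarize` and the dictionary keys are hypothetical, the counts are those reported in Sections 3.1 and 3.2, and each error share is taken relative to the number of predictions of that class, which is the reading that matches the reported percentages.

```python
def summarize(tn, fp, fn, tp):
    """Summary statistics for a 2 x 2 confusion matrix.

    tn: IDC(-) correctly classified     fp: IDC(-) classified as IDC(+)
    fn: IDC(+) classified as IDC(-)     tp: IDC(+) correctly classified
    """
    total = float(tn + fp + fn + tp)
    return {
        "total": int(total),
        "accuracy": (tn + tp) / total,
        "error_rate": (fp + fn) / total,
        "wrong_share_of_negative_predictions": fn / float(tn + fn),
        "wrong_share_of_positive_predictions": fp / float(tp + fp),
    }


before = summarize(tn=391, fp=135, fn=119, tp=465)  # without augmentation
after = summarize(tn=371, fp=167, fn=72, tp=500)    # with augmentation

# Reduction in the misclassification rate after augmentation.
error_drop = before["error_rate"] - after["error_rate"]
```

The calculation also makes the trade-off visible: after augmentation fewer IDC ( $+$ ) cases are missed (72 instead of 119), while more IDC ( $-$ ) cases are flagged as positive (167 instead of 135).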

3.2.1 Actual versus predicted

Figure 8 illustrates the actual annotations made by a medical expert and the affected area identified by our model. The green perimeter represents the area marked by medical experts as the risk-prone zone (IDC tissue region).

The heat map on the right side of the figure represents the probability density generated by our model. It shows the regions with the highest likelihood of IDC tissue in red, orange, and yellow (red being the zone of highest probability), whereas the regions with the lowest likelihood of IDC are indicated in green and blue.
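The text does not describe how the heat map was assembled. One common way to build such a map, shown here purely as an assumption, is to tile the image into non-overlapping patches, score each patch with the classifier, and arrange the scores on a grid. In the sketch below, `probability_map` is a hypothetical helper and `mean_intensity` is a placeholder that stands in for a trained model.

```python
def probability_map(image, patch_size, predict_patch):
    """Tile a 2-D image into non-overlapping patches and score each one."""
    rows = len(image) // patch_size
    cols = len(image[0]) // patch_size
    heat = []
    for r in range(rows):
        heat_row = []
        for c in range(cols):
            patch = [row[c * patch_size:(c + 1) * patch_size]
                     for row in image[r * patch_size:(r + 1) * patch_size]]
            heat_row.append(predict_patch(patch))
        heat.append(heat_row)
    return heat


def mean_intensity(patch):
    """Placeholder scorer (assumption); a real map would use the trained CNN."""
    values = [v for row in patch for v in row]
    return sum(values) / float(len(values))


# Hypothetical 4 x 4 single-channel image scored in 2 x 2 patches.
image = [[0.0, 0.0, 1.0, 1.0],
         [0.0, 0.0, 1.0, 1.0],
         [0.5, 0.5, 0.0, 1.0],
         [0.5, 0.5, 1.0, 0.0]]

heat = probability_map(image, patch_size=2, predict_patch=mean_intensity)
```

Each cell of `heat` would then be mapped to a color, with high values shown in red and low values in blue, to produce a display like the one described above.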

4. Discussion and conclusion

The pathology community is showing growing interest in deep learning, as demonstrated by studies reporting deep learning-based image analysis that detects tumor regions within tissues. Developments in AI methods allow comprehensive and unbiased analysis of diagnostic features in micro-calcifications and masses. To increase the chances of clinical benefit, newly developed cancer treatments are targeted at specific molecular alterations that can be identified in the tumor before treatment begins [16]. Through the automatic identification and classification of micro-calcifications, computer-based methods can be used to aid early detection and diagnosis. Table 3 shows a comparative analysis of different models for breast cancer prediction. Our model outperforms several existing methods. It outperformed the Gail Model by about 4.5% in relative terms (0.784 versus 0.750) [17]. The Gail Model is a well-known and commonly used breast cancer risk assessment tool. However, it fails to detect breast cancer risk in women with a significant family history [18]. Our study also outperformed other AI models such as LASSO penalized regression [19], CNN [20], CNN $+$ SVM [21], and linear regression [21].

A wide variety of machine learning classifiers have been developed for early diagnosis of breast cancer. The most widely used are based on support vector machines (SVM) [1, 2], the k-nearest neighbor (KNN) method [22], and linear discriminant analysis (LDA). However, the predictive and classification power of these methods is restricted by the computational cost of feature subset selection and optimization. In this study, we propose a CNN model that not only overcomes the limitations of classical machine learning algorithms but also produces comparable predictive accuracy without overfitting.

The results achieved by the proposed approach show that it is feasible to employ a convolutional neural network as a decision support system, particularly for breast cancer classification tasks. However, a common problem in any artificial intelligence algorithm is its dependence on the data set. Therefore, the performance of the proposed model might not generalize. Moreover, given the low number of samples in data sets of this kind, training a CNN is not feasible. In the future, we intend to use the activation layers of a CNN trained on other data sets to extract patches based on class-activated regions.

Table 3

Comparative analysis of predictive models

	Artificial intelligence models	Accuracy
1	Our model ${}^{*}$	0.7840 ${}^{*}$
2	GAIL model [17]	0.7500
3	LASSO penalized regression [19]	0.6300
4	CNN [20]	0.7760
5	CNN $+$ SVM [21]	0.7690
6	Linear regression [21]	0.7750

${}^{*}$ Our model outperforms the other models used on similar data sets for classification purposes.

With deep learning algorithms matching, and on some occasions surpassing, the performance of clinicians, the promise is already apparent. However, despite AI’s good performance, as demonstrated in this study, particularly in cancer diagnosis, substantial challenges, including but not limited to ethical and regulatory concerns [23], act as a hurdle to the adoption and integration of AI in health care. Healthcare devices or systems augmented by AI or machine learning algorithms are able to learn independently from given data and real-world use and can, in turn, improve the performance of care providers over time [24]. This characteristic distinguishes AI from other software used in health care and thus raises novel regulatory concerns.

As proposed and advocated in our previous work [25, 26], it is one of the primary responsibilities of the FDA and other stakeholders, including but not limited to health services and clinicians, to ensure patient safety and quality of care. Black box models such as deep learning algorithms are not explainable in their functioning and have the potential to evolve over time based on new data. Such models may require special policies and guidelines to ensure safety. For instance, in the context of this study, if our proposed model is exposed to incorrectly labeled data for a sufficient period of time, it will generate wrong classifications. Concerns also emerge about patient safety, the interpretation of AI output, and the latent risks associated with it. AI systems and software are not typically aligned with current models of care delivery, and most clinicians are not trained to operate or interpret AI systems. Clinicians must also be made aware of false negative classifications.

Additionally, future research must focus on developing regulatory standards to measure and scrutinize AI algorithmic safety and impact [25] and, subsequently, address the issues of liability. For instance, “there is the question of who is responsible when errors result from the use of AI software or AI-augmented devices in the clinical context” [24]. Existing medico-legal guidelines are also superficial and inefficient in defining the boundary between the responsibilities of clinicians and AI systems when AI agents guide clinical care [25, 27]. Given these limitations of AI, its implications in the domain of healthcare and diagnostics are yet to be seen.

Footnotes

Conflict of interest

None to report.

References

1. DeSantis C.E., Sauer A.G., Newman L.A., and Jemal, Breast cancer statistics, 2017, racial disparity in mortality by state, CA: A Cancer Journal for Clinicians 67(6) (2017), 439–448. doi: 10.3322/caac.21412.

2. DeSantis, Bryan, and Jemal, Breast cancer statistics, 2013, CA: A Cancer Journal for Clinicians 64(1) (2014), 52–62. doi: 10.3322/caac.21203.

3. Cruz-Roa, Basavanhally, González, Gilmore, Feldman, Ganesan, Shih, Tomaszewski, and Madabhushi, Automatic detection of invasive ductal carcinoma in whole slide images with convolutional neural networks, International Society for Optics and Photonics 9041 (2014), 904103–904103.

4. Cruz-Roa, Gilmore, Basavanhally, Feldman, Ganesan, Shih N.N.C., Tomaszewski, González F.A., and Madabhushi, Accurate and reproducible invasive breast cancer detection in whole-slide images: A Deep Learning approach for quantifying tumor extent, Scientific Reports 7(1) (2017), 46450–46450. doi: 10.1038/srep46450.

5. Arteaga C.L., Sliwkowski M.X., Osborne C.K., Perez E.A., Puglisi, and Gianni, Treatment of HER2-positive breast cancer: Current status and future perspectives, Nature Reviews Clinical Oncology 9(1) (2012), 16–32. doi: 10.1038/nrclinonc.2011.177.

6. Muttarak, Peh W.C.G., Euathrongchit, Lin S.E., Tan A.G.S., Lerttumnongtum, and Sivasomboon, Spectrum of imaging findings in melioidosis, The British Journal of Radiology 82(978) (2009), 514–521. doi: 10.1259/bjr/15785231.

7. Wang, and Kaufman R.J., The impact of the endoplasmic reticulum protein-folding environment on cancer development, Nature Reviews Cancer 14(9) (2014), 581–597. doi: 10.1038/nrc3800.

8. O’Connor J.P., Aboagye E.O., Adams J.E., Aerts H.J., Barrington S.F., Beer A.J., Boellaard, Bohndiek S.E., and Brady, Imaging biomarker roadmap for cancer studies, Nat Rev Clin Oncol.

9. Robboy S.J., Weintraub, Horvath A.E., Jensen B.W., Alexander C.B., Fody E.P., Crawford J.M., Clark J.R., Cantor-Weinberg, Joshi M.G., Cohen M.B., Prystowsky M.B., Bean S.M., Gupta, Powell S.Z., Speights V.O., Gross D.J., and Black-Schaffer W.S., Pathologist workforce in the United States: I. Development of a predictive model to examine factors influencing supply, Archives of Pathology & Laboratory Medicine 137(12) (2013), 1723–1732. doi: 10.5858/arpa.2013-0200-oa.

10. Gavrielides M.A., Gallas B.D., Lenz, Badano, and Hewitt S.M., Observer variability in the interpretation of HER2/neu immunohistochemical expression with unaided and computer-aided digital microscopy, Arch Pathol Lab Med 135(2) (2011), 233–275.

11. Stålhammar, Martinez N.F., Lippert, Tobin N.P., Mølholm, Kis, Rosin, Rantalainen, Pedersen, Bergh, Grunkin, and Hartman, Digital image analysis outperforms manual biomarker assessment in breast cancer, Modern Pathology 29(4) (2016), 318–329. doi: 10.1038/modpathol.2016.34.

12. LeCun, Bengio, and Hinton, Deep learning, Nature 521(7553) (2015), 436–444. doi: 10.1038/nature14539.

13. Radhakrishnan, Damodaran, Soylemezoglu A.C., Uhler, and Shivashankar G.V., Machine learning for nuclear mechano-morphometric biomarkers in cancer diagnosis, Scientific Reports 7(1) (2017), 17946–17946. doi: 10.1038/s41598-017-17858-1.

14. Vang Y.S., Chen, and Xie, Springer International Publishing, 914–922.

15. Janowczyk, and Madabhushi, Deep learning for digital pathology image analysis: A comprehensive tutorial with selected use cases, Journal of Pathology Informatics 7(1) (2016), 29–29. doi: 10.4103/2153-3539.186902.

16. Thangue N.B.L., and Kerr D.J., Predictive biomarkers: A paradigm shift towards personalized cancer medicine, Nature Reviews Clinical Oncology 8(10) (2011), 587–596. doi: 10.1038/nrclinonc.2011.121.

17. Wang, Huang, Dai, Song, and Chen, Assessment of performance of the Gail model for predicting breast cancer risk: A systematic review and meta-analysis with trial sequential analysis, Breast Cancer Res 20(1) (2018).

18. U.S. Preventive Services Task Force, BRCA-Related Cancer: Risk Assessment, Genetic Counseling, and Genetic Testing, 2014. https://www.uspreventiveservicestaskforce.org/Page/Document/RecommendationStatementFinal/brca-related-cancer-risk-assessment-genetic-counseling-and-genetic-testing.

19. Mavaddat, Michailidou, Dennis, Lush, Fachal, and Lee, Polygenic risk scores for prediction of breast cancer and breast cancer subtypes, Am J Hum Genet 104(1) (2019), 21–34.

20. Araújo, Aresta, Castro, Rouco, Aguiar, Eloy, Polónia, and Campilho, Classification of breast cancer histology images using convolutional neural networks, PLoS ONE 12(6) (2017), 0177544–0177544. doi: 10.1371/journal.pone.0177544.

21. Vang Y.S., Chen, and Xie, Deep learning framework for multi-class breast cancer histology image classification, arXiv:1802.00931, 2018. https://arxiv.org/pdf/1802.00931v1.pdf.

22. Şahan, Polat, Kodaz, and Güneş, A new hybrid method based on fuzzy-artificial immune system and k-nn algorithm for breast cancer diagnosis, Computers in Biology and Medicine 37(3) (2007), 415–423. doi: 10.1016/j.compbiomed.2006.05.003.

23. Cath, Governing artificial intelligence: Ethical, legal and technical opportunities and challenges introduction, Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences 376 (2018), 1–8. doi: 10.1098/rsta.2018.0080.

24. Reddy, Allan, Coghlan, and Cooper, A governance model for the application of AI in health care, Journal of the American Medical Informatics Association 27(3) (2020), 491–497. doi: 10.1093/jamia/ocz192.

25. Choudhury, A framework for safeguarding artificial intelligence systems within healthcare, British Journal of Healthcare Management 25(8) (2019), 1–6.

26. Choudhury, Asan, and Mansouri, Role of Artificial Intelligence, Clinicians & Policymakers in Clinical Decision Making: A Systems Viewpoint, in: 2019 International Symposium on Systems Engineering (ISSE), Edinburgh (United Kingdom), IEEE, 2020, pp. 1–8, doi: 10.1109/ISSE46696.2019.8984573. http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8984573&isnumber=8984263.

27. Reddy, Fox, and Purohit M.P., Artificial intelligence-enabled healthcare delivery, Journal of the Royal Society of Medicine 112(1) (2019), 22–28. doi: 10.1177/0141076818815510.