A novel three-step deep learning approach for the classification of breast cancer histopathological images

Abstract

Breast cancer is a widespread and significant health concern among women globally. Accurately categorizing breast cancer is essential for effective treatment and, ultimately, for improving survival rates. Deep learning (DL) has emerged in recent years as a widely adopted approach for precise medical image classification and shows promise in this domain. However, although several DL models have been proposed in the literature for automated classification of breast cancer (BC) histopathology images, achieving high accuracy remains challenging. Minor modifications to pre-trained models and simple training strategies can further enhance model accuracy. Based on this idea, this paper proposes an anti-aliased filter in a pre-trained ResNet-34 and a novel three-step training process to improve BC histopathology image classification accuracy. The training involves systematically unfreezing layers and imposing additional constraints on the rate of change of learnable parameters. In addition, four-fold on-the-fly data augmentation enhances model generalization. The Ada-Hessian optimizer adjusts learning rates based on first and second-order gradients to improve convergence speed. The training process utilizes a large batch size to minimize the training loss associated with batch normalization layers. Even with limited GPU memory, the gradient accumulation technique makes this large batch size attainable. Collectively, these strategies minimize training time while maintaining or improving the accuracy of BC histopathology image classification models. In the experimental implementation, the proposed architecture achieves superior results compared to recent existing models, with an accuracy of 98.64%, recall of 98.98%, precision of 99.35%, F1-Score of 99.17%, and MCC of 97.36% for binary classification. Similarly, the model achieves an accuracy of 95.01%, recall of 95.01%, precision of 94.95%, F1-Score of 94.94%, and MCC of 93.42% for the eight-class category of BC images.

Keywords

Deep learning; anti-aliased ResNet; BreakHis; breast cancer; fine-tuning; transfer learning

1 Introduction

As per the Global Cancer Statistics (GCS) 2020 [1], female breast cancer is the most prevalent cancer worldwide, with 2.3 million new cases reported in 2020. Breast cancer accounts for around 11.7% of new cancer cases and 6.9% of cancer deaths. Therefore, early detection and proper treatment significantly impact survival rates. Various conventional methods, such as palpation, mammography, ultrasound, magnetic resonance imaging (MRI), and thermography, have been used for initial breast cancer screening. In cases of suspicion, tissue from the suspected malignant area is extracted using various biopsy techniques. Subsequently, histopathology slides are prepared by staining with two dyes, Hematoxylin and Eosin (H&E). Hematoxylin stains the nuclei dark blue, enabling quick identification of the nuclei’s size and shape, while eosin stains the cytoplasm pink, revealing the volume of the cytoplasm and its morphological changes [2]. Professional pathologists then inspect the samples under a microscope at different magnification factors to assess tumor staging and grading. This information is vital in deciding the best possible treatment. However, analyzing such histopathological samples is tedious, time-consuming, and prone to errors.

In the past few years, significant progress has been made in reducing the workload of pathologists by converting histological samples into high-resolution digital images with different magnification factors. This approach allows for easy identification of the nuclei and cytoplasm containing morphological changes. Furthermore, viewing these images by changing magnification factors improves the detection and identification of the cancer stage. To simplify this process further, the pathologist’s experience is exploited to extract the most relevant handcrafted features from the images using advanced image processing algorithms and training a machine learning model. This machine learning model serves as an automatic, computer-aided diagnostic (CAD) tool for classifying histopathology images, reducing the pathologist’s effort. However, the accuracy of the tool depends on the pathologist’s ability to identify relevant details in histopathology images.

In the modern era, deep learning methods, particularly Deep Convolutional Neural Network (DCNN) models for image classification tasks, have proven to be a superior alternative to conventional image processing. A DCNN’s ability to learn complex features from annotated datasets automatically makes it an obvious choice for the medical domain. However, training models from scratch requires significant computational resources, a large amount of data, and considerable time [3]. To address these issues, a new architecture is proposed using a pre-trained model with suitable training strategies. To summarize, the main objectives of this article are to propose a new architecture that addresses the issues associated with training DCNN models from scratch, to utilize a pre-trained model with suitable training strategies, and to highlight the potential of these strategies in reducing the computational cost and improving accuracy. The main contributions are as follows:

Proposing a modified ResNet-34 with an anti-aliased filter to automate the classification of BC histopathology images into two and eight classes, regardless of image magnification.

Applying global contrast normalization to reduce intra-contrast variation and four-fold on-the-fly data augmentation to improve model generalization.

Proposing a three-step training strategy for effective utilization of Transfer Learning (TL) and Fine-Tuning (FT) in model training.

Improving the proposed model’s convergence by using the Ada-Hessian optimizer.

Exploiting the accumulation of gradient normalization technique to increase the batch size from 32 to 128 without additional GPU resources, thereby reducing model loss during training.

The structure of this paper is as follows: Section 2 analyzes related works in the field, providing a comprehensive review of the existing literature. Section 3 describes the proposed methodology in detail, explaining the approach to address the challenges associated with breast cancer image classification. Section 4 presents the experiments conducted to evaluate the effectiveness of the proposed methodology, including the dataset used, experimental setup, and performance metrics. Section 5 includes the results obtained from the experiments and provides a detailed discussion. Section 6 compares the proposed model with state-of-the-art models. Section 7 presents ablation studies. Finally, Section 8 concludes the paper by summarizing the key contributions of the proposed methodology, discussing its limitations, and outlining possible directions for future research.

2 Related works

This section discusses various classification architectures for BC histopathology images that use the Break-His dataset in magnification-dependent (MD) and magnification-independent (MI) settings. MD refers to a model that takes the magnification factor of the histopathology images into account, whereas MI refers to a model that does not. Most of the initial works are related to MD.

F.A. Spanhol et al. [4] extracted six types of features by using conventional texture descriptors, namely local binary patterns (LBP) [5], completed local binary patterns (CLBP) [6], local phase quantization (LPQ) [7] and so on, and trained four different machine learning classifiers, 1-nearest-neighbour (1-NN), quadratic discriminant analysis (QDA), SVM, and random forest (RF), for binary classification. The accuracies are in the range of 80%–85% for different magnification factors. These models are conventional machine learning approaches. N. Bayramoglu et al. [8] proposed a simple DCNN with three convolution layers with kernel sizes of 7x7, 5x5, and 3x3. The model reported average accuracies ranging from 82.1% to 84.63% across four different magnification factors. One reason for the low accuracy could be that the DCNN, owing to its sparsity, was unable to learn more intrinsic features from the dataset. Similarly, F.A. Spanhol et al. [9], in another work, used AlexNet [10], a comparatively dense deep learning-based architecture. Initially, patches of sizes 32x32 and 64x64 were extracted from the images using sliding-window and random methods, and the model was trained at the patch level for binary classification. Subsequently, an image-level prediction was carried out by fusing the outcomes of the individual patches of the given image using sum, product, and max rules. This improved the classification accuracy by around 6% compared to their previous work. A. Nahid et al. [11] undertook a comparative study with three custom deep neural network (CNN, LSTM, CNN+LSTM) models using a support vector machine (SVM) and softmax as classification layers. Intrinsic features from histopathology images were extracted using k-means and mean-shift clustering algorithms to train the models. The highest accuracy of 91% was reported for the CNN at 200x magnification with mean-shift clustering and SVM for binary classification. However, this experiment could not explain the variation in performance for different magnifications from model to model.

Zhongyi et al. [12] implemented both binary and eight-class classification with the help of a class structure-based deep convolutional neural network (CSDCNN) with data augmentation. Accuracy of around 93.5±2.0 for eight-class and 96.5±2.5 for binary classification was reported. Bardou et al. [13] proposed a custom CNN trained from scratch with five convolutional layers, two dense layers, and softmax. The model was trained with data augmentation techniques for binary and eight-class classification. Afterward, to boost model accuracy, ten predictive outcomes of the model were ensembled. The resulting accuracy varies from 96.15% to 98.33% for the binary case and from 83.31% to 88.23% for the eight-class case. The model attained a decent accuracy in the binary case; in the eight-class case, however, there is room for improvement. Sharma et al. [14] concluded that features learned by a pre-trained CNN were better than handcrafted features for training a model. Moreover, the VGG-16 + SVM model with data augmentation achieved an accuracy varying from 91.23% to 93.97% for eight-class classification.

Wang et al. [15] proposed a hybrid model using a CNN and a Gated Recurrent Unit (GRU) network for binary classification and attained an accuracy of 86.21%. Similarly, Jia Li et al. [16] used a pyramid gray-level co-occurrence matrix (PGLCM) and the incremental broad learning (IBL) concept to train the model for binary classification and reported an average accuracy of 90%. Seo et al. [17] addressed the binary task with a different approach: they proposed a novel Primal-Dual Multi-Instance SVM that identifies the area of abnormality that can predict malignancy in a given image. However, the accuracy is relatively low, ranging from 85.80% to 89.10%. Zhou et al. [18] applied a Resolution Adaptive Network with SVM for two-class and eight-class classification. However, the variation in accuracy among magnification factors is high. Similarly, Joshi et al. [19] used Xception with custom classification layers for binary classification at a magnification factor of 40x and reported 93.33% accuracy. Pandey et al. [20] implemented a two-stage classification approach using a pre-trained Xception. In the first stage, the model assigns the input to one of the binary categories. In the second stage, it classifies the input into the individual subcategories. The accuracies lie between 98.13% and 99% for the first stage and between 91.03% and 94.69% for the second stage.

For the MI task, Shallu et al. [21] carried out a comparative study on three CNN architectures, VGG-16, VGG-19, and ResNet-50, with transfer learning on the pre-trained model and training the model from scratch to generate features. Subsequently, logistic regression is used as a classification layer to classify the features into benign and malignant categories. The pre-trained VGG-16 attained the best accuracy of 92.60% for binary classification. Dabeer et al. [22] trained a custom CNN for the binary case and achieved an accuracy of 93.45%.

In contrast, Boumaraf et al. [23] proposed a block-wise fine-tuning strategy on a pre-trained ResNet-18 for MI and MD. Global contrast normalization with three-fold data augmentation was used for the two and eight-class categories. The average accuracy for the two and eight-class problems was 98.84% and 92.15%, respectively. However, the classification accuracy in the eight-class case was relatively low compared with the binary case. Moreover, Zhong et al. [24] observed that down-sampling of features in CNN models leads to an aliasing effect that impedes model stability and robustness. They proposed an anti-aliased filter called blur-pool to mitigate the aliasing effect.

Based on the above study, the existing models are less accurate for the eight-class task than for the binary task. Hence, improving the model’s accuracy for the eight-class task is necessary. The models trained with transfer learning and fine-tuning are far better than those trained from scratch. On-the-fly data augmentation offers greater flexibility than pre-generated augmented data. Moreover, the model performance improves as its size increases. Further, introducing an anti-aliased filter with fixed weights at appropriate locations improves model robustness and stability, which leads to better performance. In addition, the choice of optimizer can considerably reduce the model’s training time. Hence, this article proposes a modified ResNet-34, slightly denser than ResNet-18 [23], with state-of-the-art training strategies.

3 A Modified ResNet-34 with Anti-aliased filter

This section discusses the proposed model architecture and briefly introduces the Break-His dataset, the pre-processing, and the data augmentation methods used to improve model performance. It also discusses the training strategies followed to reduce training time and enhance model performance.

3.1 Proposed model

A modified ResNet-34 [25] model has been proposed to handle binary and eight-class classification tasks. The model incorporates stacks of residual blocks, each reducing the feature size by half and doubling the number of features compared to the previous block. The residual blocks are of two types: Residual_Block-0 with a skip connection and Residual_Block-1 with down-sampling. The skip connection enables the easy flow of information within the residual block by adding the input of Residual_Block-0 to its output before the final ReLU. The down-sampling in Residual_Block-1, in turn, matches the output of the residual block to the input of the next residual block. Equation (1) shows the mathematical representation of Residual_Block-0 with a skip connection: $y = F(x, W_{i}) + x$ (1) where x is the input to the block, y is the output of the block, $F(\cdot)$ is a sequence of convolutional layers, batch normalization layers, and ReLU activation functions parameterized by $W_{i}$, and the residual connection is represented by the addition operation. As Fig. 1a shows, the feature dimensions remain unchanged within Residual_Block-0, so an anti-aliasing filter is not necessary there.

Fig. 1

Residual Blocks.

On the other hand, Equation (2) depicts the Residual_Block-1 with down-sampling.

$y = F (x, W_{i}) + D (x)$ (2) where $D (\cdot)$ is a down-sampling function. Typically, a 2D convolution (Conv2d) with a 3x3 kernel and a stride of 2 is applied to reduce the feature size in Residual Block-1. Additionally, a Conv2d operation with a 1x1 kernel and a stride of 2 is used for down-sampling. However, this reduction can introduce an aliasing effect [24], which may cause the loss of high-frequency information. Hence, to address the aliasing issue, a modification was made to the existing Conv2d layer by adjusting the stride from 2 to 1. Further, an additional layer called Anti-aliased Conv2D was introduced. This layer incorporates a Gaussian filter-based fixed-weight 3x3 or 1x1 kernel. Equation (3) represents the Gaussian filter.

$G(x, y) = \frac{1}{2 \pi \sigma^{2}} \cdot e^{- \frac{x^{2} + y^{2}}{2 \sigma^{2}}}$ (3) where $G(x, y)$ is the 2D Gaussian function at a point (x, y) in the plane and $\sigma$ is the standard deviation. The anti-aliased Conv2D layer utilizes a 3x3 Gaussian filter with a standard deviation of 1. Equation (4) shows the 3x3 filter values incorporated in the anti-aliased Conv2D layer.

$\text{Anti-aliased filter} = \begin{bmatrix} 0.075 & 0.124 & 0.075 \\ 0.124 & 0.204 & 0.124 \\ 0.075 & 0.124 & 0.075 \end{bmatrix}$ (4)
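The values in Equation (4) can be reproduced by sampling Equation (3) on a 3x3 grid centred at the origin with σ = 1 and normalising the samples so that they sum to one. The following is a minimal sketch under that assumption; the function name `gaussian_kernel` is illustrative and not taken from the original implementation.

```python
import math


def gaussian_kernel(size=3, sigma=1.0):
    """Sample Equation (3) on a size x size grid centred at the origin and
    normalise the weights so that they sum to 1.

    The constant 1 / (2 * pi * sigma^2) cancels during normalisation,
    so it is omitted here.
    """
    half = size // 2
    raw = [
        [math.exp(-(x * x + y * y) / (2.0 * sigma ** 2))
         for x in range(-half, half + 1)]
        for y in range(-half, half + 1)
    ]
    total = sum(sum(row) for row in raw)
    return [[value / total for value in row] for row in raw]


kernel = gaussian_kernel(3, 1.0)
for row in kernel:
    print([round(value, 3) for value in row])
# [0.075, 0.124, 0.075]
# [0.124, 0.204, 0.124]
# [0.075, 0.124, 0.075]
```

Because the weights are fixed and sum to one, the filter acts as a low-pass smoothing step rather than as a learnable layer.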

The stippled regions in Fig. 1b indicate the modified Conv2D layers in Residual_block-1 and the Down-sample sections. Finally, the pre-trained ResNet-34 was modified in the residual blocks as stated earlier, and the existing classification layers were altered to match the task at hand, with either 2 or 8 outputs.

3.2 Description of architecture

The basic architecture can be divided into three main components: Initial layers, a stack of residual blocks, and Classification layers. Figure 2 provides a detailed description of the proposed model, including the dimensions of input and output features at each stage.

Fig. 2

Proposed Model.

3.2.1 Initial layers

The initial layers include the input layer, 2D-convolutional layer, batch normalization, ReLU activation, and max-pooling.

Input layer. It accepts an input image of size 224x224x3 and passes it to the subsequent layer.

2D-convolutional layer. It applies a 2D convolution operation to the input image with a 7x7 kernel, a stride of 2, and padding of 3, resulting in an output feature map of size 112x112x64.

Batch Normalization. This normalization process ensures that the inputs to each layer have zero mean and unit variance, thus reducing the internal covariate shift [26].

ReLU. The Rectified Linear Unit is given by Equation (5). $f(x) = \begin{cases} x, & \text{if } x > 0 \\ 0, & \text{otherwise} \end{cases}$ (5) It introduces non-linearity to the input features by outputting the input directly if it is positive, and zero otherwise.

Max-pooling. Max pooling is a down-sampling operation that reduces the spatial dimensions of feature maps while preserving dominant features. It selects the maximum value within rectangular pooling windows. Max pooling provides translation invariance and reduces computational complexity in CNNs. In this case, a pooling operation with a 3x3 kernel size and a stride of 2 yields an output size of 56x56x64.
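The feature sizes quoted for the initial layers follow from the usual output-size relation for convolution and pooling, out = floor((in + 2·padding − kernel) / stride) + 1. The sketch below checks the numbers; the max-pooling padding of 1 is an assumption taken from the standard ResNet design, since the text does not state it.

```python
def output_size(in_size, kernel, stride, padding):
    """Spatial output size of a convolution or pooling layer."""
    return (in_size + 2 * padding - kernel) // stride + 1


# 7x7 convolution, stride 2, padding 3 (values stated in the text)
after_conv = output_size(224, kernel=7, stride=2, padding=3)

# 3x3 max-pooling, stride 2; padding=1 is an assumption (standard ResNet)
after_pool = output_size(after_conv, kernel=3, stride=2, padding=1)

print(after_conv, after_pool)  # 112 56
```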

3.3 Residual blocks

Block-64. Block-64 consists of three instances of residual block-0. The output of each residual block-0 is added to its input and passed through a ReLU activation function. Since only residual block-0 is utilized in this block, the input and output features both have dimensions of 56x56x64.

Block-128. Block-128 comprises one Residual block-1 and three Residual block-0 instances. Residual block-1 reduces the feature size and doubles the number of features compared to the previous block. The down-sample path aligns the output of the previous block with the features it is added to. Residual block-0 is utilized to learn complex features. In Fig. 2, “3-times” in the Block-128 section indicates that these blocks are repeated three times. The same logic applies to Block-256 and Block-512, which produce features of size 14x14x256 and 7x7x512, respectively. Subsequently, the output from Block-512 is fed into the classification layers.

3.4 Classification layers

The output from Block-512 is passed through the classification layers, each serving a specific function:

Adaptive Average Pooling. This operation is similar to 2D max-pooling, but instead of selecting the maximum value, it calculates the average value within the filter region 7x7. It reduces the output feature of Block-512 from a size of 7x7x512 to 1x1x512.

Fully connected layer (FC-256). The fully connected layer adjusts its learnable weights to perform the classification task effectively. It takes the output from adaptive average pooling and generates a 256-dimensional feature vector.

ReLU. This layer takes the input from the previous layer and replaces negative values with zeros.

Dropout. Dropout regularization is applied to the learning weights to prevent the co-adaptation effect [27]. In the proposed model, a dropout probability of 0.5 is used.

Fully connected layer (FC-2 or FC-8). Depending on the classification problem, this fully connected layer receives the output of the preceding ReLU and dropout layers and produces either 2 or 8 output scores.

Log SoftMax layer. In a classification problem, softmax provides the probability scores of images belonging to specific categories. Applying the logarithm function over softmax helps penalize outliers in the dataset, thus improving model stability. Softmax maintains output values between 0 and 1, while the logarithm of softmax returns negative values. Mathematically, it is given by Equation (6). $f(x_{i}) = \log \left(\frac{e^{x_{i}}}{\sum_{j = 1}^{n} e^{x_{j}}}\right)$ (6) Here, $x_{i}$ represents the i-th element of the n-dimensional vector x. The layer yields two or eight classification scores, depending on the classification type, and the negative log-likelihood loss (NLLLoss) function calculates the model’s error from them. The index corresponding to the highest score indicates the input image’s predicted category.
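To make the relation between Equation (6), the loss, and the predicted category concrete, the following sketch evaluates them for a single sample in plain Python. It is an illustration only: the input scores are hypothetical, the helper names are not from the original implementation, and the max-shift used for numerical stability is a standard trick that the text does not mention.

```python
import math


def log_softmax(scores):
    """Equation (6), evaluated with a max-shift for numerical stability."""
    shift = max(scores)
    log_sum = shift + math.log(sum(math.exp(s - shift) for s in scores))
    return [s - log_sum for s in scores]


def nll_loss(log_probs, target):
    """Negative log-likelihood of the true class for one sample."""
    return -log_probs[target]


def predict(log_probs):
    """Index of the highest score, i.e. the predicted category."""
    return max(range(len(log_probs)), key=lambda i: log_probs[i])


scores = [2.0, -1.0]              # hypothetical FC-2 outputs for one image
log_probs = log_softmax(scores)   # both entries are negative
loss = nll_loss(log_probs, target=0)
label = predict(log_probs)        # 0, i.e. benign under the labels used here
```

Since every log-probability is at most zero, negating the entry of the true class gives a non-negative loss.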

3.5 Dataset considered

The Break-His dataset [4] is considered for training, testing, and evaluation of the proposed model. It consists of 7909 labeled surgical open biopsy (SOB) histopathological images from 24 patients in the benign category and 58 patients in the malignant category. Each category is further divided into four sub-categories. The former has Adenosis (A), Fibroadenoma (FA), Tubular adenoma (TA), and Phyllodes tumor (PT) as sub-categories. Similarly, the latter has Ductal carcinoma (DC), Lobular carcinoma (LC), Papillary carcinoma (PC), and Mucinous carcinoma (MC) as sub-categories. Besides images captured at 40x, the dataset also contains histopathology images that show the tumor region of the 40x images magnified to 100x, 200x, and 400x. This results in a total of 2480 benign and 5429 malignant color images in Portable Network Graphics (PNG) format with 700x460 pixels. Table 1 shows the summary of the dataset. The experiments involve binary and eight-class classification. In the binary case, the model is intended to predict a given histopathology image as either benign (label=0) or malignant (label=1). Similarly, for eight-class classification, the model classifies the image into one of the sub-categories of benign and malignant. Hence, the sub-categories are labeled from 0 to 7.

Table 1
Summary of Break-His dataset. [28]

Type	Sub Type	Patients	Magnification Factor				Total
			40x	100x	200x	400x	
Benign	A	4	114	113	111	106	444
	FA	10	253	260	264	237	1014
	TA	3	109	121	108	115	453
	PT	7	149	150	140	130	569
	Total	24	625	644	623	588	2480
Malignant	DC	38	864	903	896	788	3451
	LC	5	156	170	163	137	626
	PC	6	145	142	135	138	560
	MC	9	205	222	196	169	792
	Total	58	1370	1437	1390	1232	5429
Total		82	1995	2081	2013	1820	7909

3.6 Data pre-processing

The dataset provides H&E-stained histopathological images. A small amount of colour variation from image to image is evident due to the H&E staining process. In addition, image acquisition under different environmental conditions leads to possible intra- and inter-contrast changes among the images. To alleviate the possible performance degradation due to such contrast changes, global contrast normalization (GCN) was applied [29]. It brings uniformity to the contrast of the images. If the given image has a height of H, a width of W, and a channel count of 3 (RGB), then the GCN image is given by Equation (7). $X_{\alpha, \beta, \gamma}^{GCN} = \frac{X_{\alpha, \beta, \gamma} - \bar{X}}{\max \left(\varphi, \sqrt{\frac{1}{3 WH} \sum_{\alpha = 1}^{H} \sum_{\beta = 1}^{W} \sum_{\gamma = 1}^{3} (X_{\alpha, \beta, \gamma} - \bar{X})^{2}}\right)}$ (7) where $\varphi$ is a small constant, $10^{-8}$, that avoids division by zero, $X_{\alpha, \beta, \gamma}$ is the given image, and $\bar{X}$ is the mean intensity of the image given by Equation (8). $\bar{X} = \frac{1}{3 WH} \sum_{\alpha = 1}^{H} \sum_{\beta = 1}^{W} \sum_{\gamma = 1}^{3} X_{\alpha, \beta, \gamma}$ (8)
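Equations (7) and (8) translate directly into a few lines of code. The sketch below is a plain-Python illustration that treats an image as a nested H x W x 3 list; the function name and the tiny example image are hypothetical, and an array library would normally be used for real images.

```python
import math


def global_contrast_normalize(image, phi=1e-8):
    """Apply Equations (7) and (8) to an H x W x 3 image given as nested lists."""
    values = [v for row in image for pixel in row for v in pixel]
    mean = sum(values) / len(values)                         # Equation (8)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    scale = max(phi, std)                                    # denominator of Equation (7)
    return [[[(v - mean) / scale for v in pixel] for pixel in row]
            for row in image]


# Hypothetical 2x2 RGB image with intensities in [0, 255]
image = [
    [[200, 120, 180], [210, 130, 190]],
    [[90, 60, 110], [100, 70, 120]],
]
normalized = global_contrast_normalize(image)
```

After normalization the image has zero mean and unit standard deviation over all pixels and channels; for a constant image the constant φ keeps the division well defined and the output is all zeros.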

Figures 3a and 3c show samples of the dataset at a magnification of 40x across each sub-category of benign and malignant images before applying GCN. Similarly, the corresponding images of the benign and malignant sub-categories after applying GCN are shown in Figs. 3b and 3d. The variation in contrast is reduced considerably after applying GCN.

Fig. 3

Sample histopathology images at magnification factor 40x, before (data source: BreakHis [28]) and after applying global contrast normalization (GCN). (a) Benign sub-categories before GCN (b) Benign sub-categories after GCN (c) Malignant sub-categories before GCN (d) Malignant sub-categories after GCN.

3.7 Data augmentation

An augmentation technique can improve the model’s performance, even with limited training data. This technique can be applied either by enlarging the training data before training or by applying different image transformations during training (on-the-fly). The former increases the data volume and the model training time and requires extra storage. The latter is performed during the training process itself: instead of creating a separate augmented dataset, the original training data is randomly transformed on-the-fly during each epoch. Hence, on-the-fly data augmentation provides flexibility in choosing different augmentation techniques, allowing the model to learn from a broader range of transformations. To take advantage of on-the-fly data augmentation, the Albumentations package [30] is used to generate three variants of image transformations: horizontal flipping, random rotation by 90°, and vertical flipping. Additionally, to increase the randomness of the transformations, the OneOf composition available in the package is used. It selects one of the transformations with a predetermined probability. Setting the probability value to 0.75 yields either one of the three transformations or the original image, eventually producing a four-fold on-the-fly data augmentation pipeline. Figure 4(a) shows the original image of the benign class, which has been resized to 224x224. The corresponding horizontal flip, random rotation, and vertical flip are depicted in Fig. 4(b), (c), and (d), respectively.
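The selection logic described above can be sketched without any imaging library. The code below is a plain-Python stand-in for the Albumentations pipeline, not the pipeline itself: the helper names are hypothetical, images are nested lists, and the rotation is simplified to a single counter-clockwise quarter turn. With a probability of 0.75 and three equally likely transformations, each of the four outcomes (including the unchanged image) has a probability of 0.25.

```python
import random


def horizontal_flip(img):
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def vertical_flip(img):
    """Mirror the image top to bottom."""
    return img[::-1]


def rotate_90(img):
    """Rotate the image by 90 degrees counter-clockwise."""
    return [list(col) for col in zip(*img)][::-1]


def augment(img, rng, p=0.75):
    """With probability p apply exactly one of the three transformations,
    otherwise return the original image."""
    if rng.random() < p:
        transform = rng.choice([horizontal_flip, rotate_90, vertical_flip])
        return transform(img)
    return img


rng = random.Random(0)                 # seeded only to make the sketch repeatable
image = [[1, 2], [3, 4]]               # hypothetical 2x2 single-channel image
augmented = augment(image, rng)
```

Because the transformation is drawn anew each time an image is loaded, the model sees a different variant of the same image across epochs without any extra storage.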

Fig. 4

Data Augmentation. (a) Original Image (Class: Benign, Subclass: Adenosis, Mag: 100x) resized to 224x224 (b) Horizontal Flip (c) Random Rotate 90° (d) Vertical Flip.

3.8 Proposed training strategies

Although a large dataset can enhance the efficiency of deep learning models, obtaining large amounts of data in the medical domain is often challenging. To address this, a four-fold, on-the-fly data augmentation pipeline is employed to strengthen the model’s efficiency by providing adequate training data. The model contains batch normalization layers that require a large batch size to minimize training loss [31]. However, a large batch size requires a substantial amount of computational resources. To mitigate this, the “accumulation of gradient normalization” [32] technique is employed to increase the effective batch size from 32 to 128 without additional computational resources. Additionally, a dropout rate of 0.5 in the final layers and a weight decay of 0.1 help reduce model over-fitting.
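The idea behind reaching an effective batch size of 128 with micro-batches of 32 can be illustrated on a toy problem. The sketch below assumes a one-parameter linear model with a mean-squared-error loss, which is not the proposed network; it only shows that averaging the gradients of four micro-batches and updating once gives the same step as a single batch of 128.

```python
import random


def mean_gradient(w, batch):
    """Gradient of the mean squared error of y = w * x over one batch."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


rng = random.Random(0)
data = [(x, 3.0 * x) for x in (rng.uniform(-1.0, 1.0) for _ in range(128))]

micro_batch, accumulation_steps = 32, 4   # 32 x 4 = effective batch of 128
w, lr = 0.0, 0.1

# Accumulate the scaled gradients of four micro-batches, then update once.
accumulated = 0.0
for step in range(accumulation_steps):
    chunk = data[step * micro_batch:(step + 1) * micro_batch]
    accumulated += mean_gradient(w, chunk) / accumulation_steps
w_accumulated = w - lr * accumulated

# Reference: a single update computed on the full batch of 128 samples.
w_full_batch = w - lr * mean_gradient(w, data)
```

In a framework such as PyTorch, the same pattern is usually written by dividing each micro-batch loss by the number of accumulation steps before calling `backward()` and calling the optimizer's `step()` and `zero_grad()` only after every fourth micro-batch.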

For model training, a second-order optimizer, Ada-Hessian [33], with a Hessian power of 0.5 is considered. Ada-Hessian accounts for both the gradient and the curvature of the loss function when updating the model parameters. As a result, the model converges in fewer epochs. Since the proposed model ends with a Log Softmax layer, whose outputs are negative, the negative log-likelihood loss (NLLLoss) function is used so that the loss value is positive. As shown in Table 1, the dataset is imbalanced in both the binary and eight-class categories, with the number of images in ductal carcinoma being more than three times that of any other sub-category. To prevent model bias toward the majority classes, a weight vector inversely proportional to the number of images in each class is applied to the NLLLoss. If $x_{i}$ represents the number of images in the i-th category of an n-class classification problem, the weight vector is given by Equation (9).

$\left[\frac{\sum_{i = 1}^{n} x_{i}}{x_{1}}, \frac{\sum_{i = 1}^{n} x_{i}}{x_{2}}, \ldots, \frac{\sum_{i = 1}^{n} x_{i}}{x_{n}}\right]$ (9) By including this weight vector in the NLLLoss, more weight is assigned to classes with fewer images, thereby addressing the class imbalance in the dataset.
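As a worked example of Equation (9), the sketch below computes the weight vectors from the image counts in Table 1. Two points are assumptions rather than statements from the text: the counts of the whole dataset are used instead of the training split, and the eight labels are taken in the row order of Table 1.

```python
def class_weights(counts):
    """Equation (9): total number of images divided by each class count."""
    total = sum(counts)
    return [total / count for count in counts]


# Image counts from Table 1
binary_counts = [2480, 5429]                                # benign, malignant
eight_counts = [444, 1014, 453, 569, 3451, 626, 560, 792]   # A, FA, TA, PT, DC, LC, PC, MC

binary_weights = class_weights(binary_counts)
eight_weights = class_weights(eight_counts)

print([round(w, 2) for w in binary_weights])   # [3.19, 1.46]
```

Ductal carcinoma, the largest class, receives the smallest weight, while adenosis, the smallest class, receives the largest.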

In addition, a three-step training process and learning rate constraints on Block-512 can effectively utilize transfer learning and fine-tuning strategies to minimize training time while enhancing accuracy. The systematic approach of the three-step training is explained below.

3.8.1 Transfer learning

The proposed model employs the pre-trained weights of ResNet-34 up to Block-512 and randomly initializes all learnable parameters in the classification layers. Subsequently, all trainable parameters of the model up to Block-256 are frozen, and the learning rate for Block-512 is restricted to 10^-3. Through this process, the model’s classification layers gradually learn relevant features of Breast cancer histopathology images rather than abruptly updating weights during initial training. The model training starts with a learning rate (LR) of 10^-2 and decreases by half every ten epochs (50). The model’s performance is validated with the test dataset at the end of each epoch to assess the best model state and prevent overfitting. Finally, the best state of the proposed model, with the highest validation accuracy during transfer learning, is saved for further processing.

3.8.2 Fine-tuning-1

In fine-tuning-1, all trainable parameters up to Block-256 are frozen, and the learning rate restrictions on Block-512 are removed. The model is then initialized with previously saved model states and trained with an initial learning rate (LR) of 10^-3, which is reduced by half every ten epochs (50). Finally, the model’s best validation accuracy state is saved for the next training step.

3.8.3 Fine-tuning-2

In this step, all trainable parameters in the proposed model are unfrozen, and the model is initialized with previously saved model states. The model is trained with an initial learning rate (LR) of 10^-4, which is reduced by half every ten epochs (50). Finally, the model’s state corresponding to the best validation accuracy in this step is saved for model evaluation.
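The three steps differ only in which layers are frozen, the initial learning rate, and whether the learning rate of Block-512 is restricted. The sketch below collects these settings and the halving schedule in plain Python. Several details are assumptions: "(50)" is read as 50 epochs per step, the restriction on Block-512 is modelled as an upper bound of 10^-3 on its learning rate, and all names are illustrative rather than taken from the original implementation.

```python
EPOCHS_PER_STEP = 50   # assumption: "(50)" is read as 50 epochs per step

TRAINING_STEPS = [
    {"name": "transfer learning", "frozen": "up to Block-256",
     "initial_lr": 1e-2, "block512_lr_cap": 1e-3},
    {"name": "fine-tuning-1", "frozen": "up to Block-256",
     "initial_lr": 1e-3, "block512_lr_cap": None},
    {"name": "fine-tuning-2", "frozen": "none",
     "initial_lr": 1e-4, "block512_lr_cap": None},
]


def step_lr(initial_lr, epoch, step_size=10, gamma=0.5):
    """Learning rate that is halved every `step_size` epochs."""
    return initial_lr * gamma ** (epoch // step_size)


def block512_lr(step, epoch):
    """Learning rate for Block-512, capped during transfer learning
    (the cap is an assumed reading of the restriction in the text)."""
    lr = step_lr(step["initial_lr"], epoch)
    cap = step["block512_lr_cap"]
    return lr if cap is None else min(lr, cap)


for step in TRAINING_STEPS:
    schedule = [step_lr(step["initial_lr"], e) for e in range(0, EPOCHS_PER_STEP, 10)]
    print(step["name"], schedule)
```

Each step starts from the best state saved by the previous one, so the initial learning rate drops by a factor of ten as more of the network becomes trainable.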

4 Experiments

This section briefly explains the experimental procedure of the proposed model’s training, validation, and evaluation. The various performance parameters used to assess the model are also discussed.

4.1 Model training, validation and evaluation

Figure 5 illustrates the training and validation process of the proposed model. Global contrast normalization is applied to each image in the Break-His dataset. The test dataset is created by selecting every fifth image from the Break-His dataset, while the remaining images are used for the training dataset. As a result, the test dataset includes images from both the benign and malignant categories, all sub-categories, and all magnification factors in proportion to the training dataset, and the images are divided between the training and test sets in a ratio of 80% to 20%. The final training and test datasets contain 6327 and 1582 images, respectively. Table 2 shows the distribution of the training and test datasets.
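The every-fifth-image split can be sketched as follows. The code assumes that the selection starts with the first image of an ordered list, which is not stated in the text; under that assumption it reproduces the reported sizes of 6327 and 1582. Integer identifiers stand in for the image files.

```python
def split_every_fifth(items):
    """Send every fifth item (starting with the first) to the test set
    and keep the remaining items for training."""
    test = items[::5]
    train = [item for index, item in enumerate(items) if index % 5 != 0]
    return train, test


image_ids = list(range(7909))        # stand-in for the 7909 Break-His images
train_ids, test_ids = split_every_fifth(image_ids)

print(len(train_ids), len(test_ids))  # 6327 1582
```

For the test set to mirror the category and magnification proportions, the list would need to be ordered by category, sub-category, and magnification before the split.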

Fig. 5

Training and Validation of the Proposed model.

Table 2

Distribution of train and test for both classifications

Classification	Type	40x		100x		200x		400x		Total
		Train	Test	Train	Test	Train	Test	Train	Test	Train	Test
Binary	Benign	625	127	644	128	623	124	588	117	1984	496
	Malignant	1370	276	1437	286	1390	277	1232	247	4343	1086
Eight-Class	A	91	23	91	22	89	22	84	22	355	89
	FA	202	51	209	51	210	54	191	46	812	202
	TA	79	30	90	31	80	28	90	25	339	114
	PT	127	22	125	25	120	20	106	24	478	91
	DC	692	172	724	179	717	179	628	160	2761	690
	LC	123	33	137	33	130	33	111	26	501	125
	MC	163	42	177	45	157	39	136	33	663	159
	PC	116	29	113	29	109	26	110	28	448	112

Additionally, separate pipelines are employed for training and for validating the proposed architecture. During training, the data augmentation pipeline resizes each image to 224x224 and applies a four-fold on-the-fly data augmentation that generates one of the proposed image transformations. The former ensures compatibility with the proposed model’s input size, while the latter addresses data scarcity. For the test dataset, the pipeline resizes each image to 224x224, and the resized images are used for validation and evaluation. After each training epoch, the model accuracy and loss are validated with the test dataset to avoid over-fitting.

In the proposed model, the second fully connected (FC) layer in the classification layers has a variable number of neurons, depending on the classification task. For binary classification (benign or malignant), the number of neurons is set to two. In the case of eight-class classification, the model predicts one of the following classes: A, FA, TA, PT, DC, LC, MC, and PC. Accordingly, the number of neurons in the second FC layer is set to eight.

The experiment was conducted on an Intel® Core™ i9-9900K CPU @ 3.60 GHz, a 16 GB Nvidia Quadro RTX 5000 GPU, 32 GB RAM, a 1 TB HDD, and Ubuntu 20.04. The model was implemented using the PyTorch package [34].
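Returning to the on-the-fly augmentation described above, the idea of drawing one of four transformations each time an image is loaded can be sketched as follows. The four transformations used here (identity, horizontal flip, vertical flip, 180-degree rotation) are placeholders chosen for illustration only and are not claimed to be the ones used in the proposed pipeline; images are represented as nested lists to keep the sketch dependency-free.

```python
import random


def identity(img):
    return [row[:] for row in img]


def hflip(img):
    return [row[::-1] for row in img]


def vflip(img):
    return [row[:] for row in img[::-1]]


def rot180(img):
    return [row[::-1] for row in img[::-1]]


# Hypothetical set of four transformations; the actual set is not given in this section.
TRANSFORMS = (identity, hflip, vflip, rot180)


def augment_on_the_fly(img, rng):
    """Apply one randomly chosen transformation each time an image is requested."""
    return rng.choice(TRANSFORMS)(img)


rng = random.Random(0)
sample = [[1, 2], [3, 4]]
augmented = augment_on_the_fly(sample, rng)
```

Because a new transformation is drawn on every access, the stored dataset is never enlarged; the model simply sees a different variant of the same image across epochs.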

4.2 Model evaluation parameters

The performance of the classification task is measured using several metrics, including the confusion matrix, accuracy, precision, recall, F1-measure, and Matthews correlation coefficient (MCC). Table 3 summarizes the parameters used to evaluate the proposed model. The evaluation parameters are calculated with the scikit-learn machine learning library [35]; for the eight-class problem, macro averaging is used.

Table 3

Evaluation parameters used to assess the model performance

Parameter	Mathematical Representation	Brief description
Accuracy	$\frac{t_{p} + t_{n}}{t_{p} + t_{n} + f_{p} + f_{n}}$	The ratio of correctly classified samples to the total number of samples.
Precision	$\frac{t_{p}}{t_{p} + f_{p}}$	Measures the model’s ability to avoid false positive predictions.
Recall	$\frac{t_{p}}{t_{p} + f_{n}}$	Measures the model’s ability to find all the relevant positive instances.
F1-Measure	$2 \times \frac{Precision \times Recall}{Precision + Recall}$	The harmonic mean of precision and recall. It lies in the range [0, 1], where 0 is the worst and 1 is the best performance.
Matthews Correlation Coefficient (MCC)	$\frac{(t_{p} \times t_{n}) - (f_{p} \times f_{n})}{\sqrt{(t_{p} + f_{p}) (t_{p} + f_{n}) (t_{n} + f_{p}) (t_{n} + f_{n})}}$	A high score represents overall good performance.
Confusion Matrix		A tabulated comparison between the actual and predicted outcomes of the model for each class. It gives an overall snapshot of the model’s predictive ability.

Note. t_p = True Positive, t_n = True Negative, f_p = False Positive, and f_n = False Negative.
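As a worked example of the formulas in Table 3, the sketch below computes the binary metrics from raw counts. The counts are derived, not reported directly: they assume malignant is the positive class and combine the 496 benign and 1086 malignant test images of Table 2 with the seven benign and eleven malignant misclassifications reported in Section 5. The function name `binary_metrics` is hypothetical, and the sketch assumes no denominator is zero.

```python
import math


def binary_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Accuracy, precision, recall, F1 and MCC as defined in Table 3."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_denominator
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "mcc": mcc,
    }


# Assumed mapping: positive = malignant.
# tp = 1086 - 11 malignant images classified correctly, fn = 11,
# tn = 496 - 7 benign images classified correctly, fp = 7.
metrics = binary_metrics(tp=1075, tn=489, fp=7, fn=11)
```

Under these assumptions, the resulting values agree with the binary row of Table 5 to within rounding, which suggests the reported figures follow these definitions with malignant as the positive class.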

5 Results and discussion

This section evaluates the performance of the proposed model on two-class and eight-class magnification-independent breast cancer histopathology images. The model is trained on the training dataset, and at the end of each epoch the training accuracy is compared with the validation accuracy to detect over-fitting. The state of the model that achieves the best validation accuracy in each stage of the three-step training process is saved and used to initialize the model in the next step.
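The best-state bookkeeping described above can be sketched independently of any deep learning framework. In this sketch, `run_epoch` is a hypothetical callback that trains for one epoch and returns the updated state together with its validation accuracy; the toy accuracies are made-up numbers used only to exercise the logic.

```python
import copy


def train_stage(initial_state, epochs, run_epoch):
    """Train for `epochs` epochs and return the state with the best validation accuracy."""
    state = copy.deepcopy(initial_state)
    best_state, best_acc, best_epoch = copy.deepcopy(state), float("-inf"), None
    for epoch in range(epochs):
        state, val_acc = run_epoch(state, epoch)
        if val_acc > best_acc:
            best_state, best_acc, best_epoch = copy.deepcopy(state), val_acc, epoch
    return best_state, best_acc, best_epoch


def make_toy_epoch(accuracies):
    """Build a stand-in for one training epoch that replays fixed validation accuracies."""
    def run_epoch(state, epoch):
        return {"updates": state["updates"] + 1}, accuracies[epoch]
    return run_epoch


# Each stage starts from the best state saved by the previous stage.
state = {"updates": 0}
for toy_accuracies in ([0.71, 0.91, 0.90], [0.93, 0.94, 0.92]):
    state, best_acc, best_epoch = train_stage(
        state, len(toy_accuracies), make_toy_epoch(toy_accuracies)
    )
```

The key design point is that later epochs of a stage never overwrite the saved state unless they improve validation accuracy, so the next stage always starts from the best checkpoint rather than the last one.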

Figures 6 and 7 show the model’s accuracy and loss for both classification problems. The regions separated by three vertical lines represent the model’s status in each epoch for training and validation in the three-step training process, namely transfer learning, fine-tuning-1, and fine-tuning-2. As the number of epochs increases, accuracy improves and loss decreases in each training step. The vertical line intersecting the training curve indicates the state that corresponds to the best validation accuracy in each of the three training steps. For binary classification, the model took 35 epochs to raise the accuracy from 71% to 91% and reduce the loss from 0.6 to 0.25 during transfer learning. In fine-tuning-1, after an additional 26 epochs, the accuracy reached 94% with a loss of 0.18. For the eight-class problem, the accuracy rose from 40% to 86% and the loss fell from 1.6 to 0.4 in merely 21 epochs in the first step. In fine-tuning-1, after another 49 epochs, the accuracy reached 92% with a corresponding loss of 0.26. These results indicate the effectiveness of the proposed additional learning rate constraint in the transfer learning stage. At the end of fine-tuning-2, the models achieved accuracies of 98.86% and 95.01%, respectively. Moreover, the three-step training process took only 2 h 40 min for each classification.

Fig. 6

Model accuracy and loss for Binary classification.

Fig. 7

Model accuracy and loss for Eight-class classification.

Furthermore, the smoothness of the accuracy and loss curves indicates that the proposed anti-aliased filter ensures robustness and stability during training. The exponential decay of the loss curves demonstrates the effectiveness of Ada-Hessian in updating model parameters, leading to improved accuracy and decreased loss. Table 4 summarizes the computational complexity of the proposed model. Approximately 60% of the total parameters are updated during the first two training stages, which account for two-thirds of the epochs. Consequently, the model consumes significantly less time during the transfer learning and fine-tuning-1 phases. Moreover, the model size is approximately 154.5 MB, and it requires 4.18 GFLOPs. Additionally, it can process 784 and 791 frames per second, with latencies of 4.567 ms and 4.432 ms, for the binary and eight-class models, respectively.
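The share of parameters updated in the first two stages follows directly from the counts in Table 4. The short sketch below reproduces that arithmetic; the dictionary keys and the function name are hypothetical labels introduced for illustration.

```python
# Parameter counts taken from Table 4.
PARAMETER_COUNTS = {
    "binary": {"trainable_first_two_stages": 13_246_210, "total": 21_416_514},
    "eight": {"trainable_first_two_stages": 13_247_752, "total": 21_418_056},
}


def trainable_fraction(trainable: int, total: int) -> float:
    """Fraction of the model's parameters that receive updates."""
    return trainable / total


fractions = {
    task: trainable_fraction(counts["trainable_first_two_stages"], counts["total"])
    for task, counts in PARAMETER_COUNTS.items()
}
```

Both fractions come out slightly above 0.6, consistent with the "approximately 60%" stated above; only in fine-tuning-2 are all parameters updated.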

Table 4

Computational complexity Analysis

Model	Size (MB)	Type of training	Number of Parameters		FLOPS (GB)	Frames/Sec (FPS)	Latency (ms)
			Trainable	Total
Binary	154.51	Transfer Learning	13,246,210	21,416,514	4.18	784	4.567
		Fine tuning-1
		Fine tuning-2	21,416,514
Eight	154.52	Transfer Learning	13,247,752	21,418,056	4.18	791	4.432
		Fine tuning-1
		Fine tuning-2	21,418,056

After training, the model’s performance on the test dataset was evaluated using the binary and eight-class confusion matrices. The binary confusion matrix provides insight into the model’s ability to distinguish between benign and malignant tumors, whereas the eight-class matrix assesses its capability to differentiate among the malignant and benign sub-categories. The confusion matrices for the two problems are shown in Fig. 8a and 8b. In the binary case, the model made incorrect predictions for seven benign and eleven malignant images out of 1582 test images. In the eight-class classification, however, 79 images were misclassified, which indicates that the model performed better in the binary task than in the eight-class task. Figure 8c further demonstrates that, despite the superior performance of the binary classifier, the eight-class classifier exhibited good discriminative ability when its predictions are treated as a binary classification, where only two benign and three malignant cases were misclassified.

Fig. 8

Confusion Matrices for Binary and Eight class classification.

Figure 9 shows the category-wise totals versus the correctly predicted classifications, with B representing Benign and M representing Malignant. There are fewer misclassifications in the binary task than in the eight-class task. Among the misclassified images, LC (28 out of 125), DC (20 out of 690), and PT (13 out of 89) have the highest number of incorrect classifications. Moreover, misclassifications were more prevalent among malignant sub-categories than benign sub-categories, with DC and LC being the most frequently misclassified malignant sub-types. Of the 79 misclassifications, 45 belonged to DC and LC. Among the benign sub-categories, nine PT images were misclassified as FA.

Fig. 9

Total and Correctly Predicted for Binary and Eight-class.

The primary reason for the high misclassification rate of DC and LC is the presence of images from patient ID 13412 in both categories within the BreakHis dataset. Specifically, 15 of the 26 misclassified DC images and 16 of the 19 misclassified LC images belong to this patient ID. These particular images account for approximately 37% of the total misclassifications. Additionally, the model erroneously predicted eight images from another patient ID, 15570, as LC instead of DC.

Further analysis revealed that for only four of the 79 misclassified images the correct class did not appear in the top-2 predictions; for the remaining 75 misclassified images, the correct class was among the top-2 predicted classes. A histogram, depicted in Fig. 10, illustrates the distribution of differences between the top-2 predicted classes and the actual types for the 79 misclassified images. Most images exhibited only a marginal difference, with values ranging from 0.02 to 0.86.
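The top-2 analysis above can be sketched as follows. The class scores are made-up numbers for a single hypothetical DC image, and defining the margin as the gap between the two highest scores is an assumption about how the differences in Fig. 10 were computed.

```python
def top_k_classes(scores: dict, k: int = 2) -> list:
    """Return the k class names with the highest scores, best first."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


def in_top_k(scores: dict, true_class: str, k: int = 2) -> bool:
    """True if the actual class is among the k highest-scoring classes."""
    return true_class in top_k_classes(scores, k)


def top2_margin(scores: dict) -> float:
    """Gap between the highest and second-highest class scores."""
    first, second = top_k_classes(scores, 2)
    return scores[first] - scores[second]


# Made-up softmax scores for one image whose actual class is DC.
scores = {"A": 0.01, "FA": 0.02, "TA": 0.01, "PT": 0.01,
          "DC": 0.44, "LC": 0.48, "MC": 0.02, "PC": 0.01}
misclassified = top_k_classes(scores, 1)[0] != "DC"
recovered_in_top2 = in_top_k(scores, "DC", k=2)
```

In this toy case the top-1 prediction is wrong (LC), but the actual class is the runner-up with a small margin, which is the situation the text reports for 75 of the 79 misclassified images.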

Fig. 10

Top-2 Accuracy analysis for eight-class.

The proposed model for magnification-independent (MI) classification was evaluated on the two-class and eight-class problems and achieved accuracies of 98.86% and 95.01%, respectively. Table 5 shows the evaluation metrics of the proposed architecture on MI for both problems. The evaluation metrics, including accuracy, precision, recall, F-measure, and MCC, indicate that the model performance is consistent within each classification task. Similarly, Figs. 11 and 12 show the precision-recall (PR) curves for the binary and eight-class problems, respectively. The comparatively low area under the PR curve (AUC-PR) of 0.9375 for LC further indicates that the replication of images has a more significant impact on LC than on DC. A high AUC-PR score suggests that the proposed model is robust to the choice of decision threshold.

Table 5

The model evaluation on MI for two and eight-class problem

Type	Accuracy (%)	Recall (%)	Precision (%)	F-1 (%)	MCC (%)
Binary	98.86	98.98	99.35	99.17	97.36
Eight Class	95.01	95.01	94.95	94.94	93.42

Fig. 11

Precision-Recall Curve for Binary

Fig. 12

Precision-Recall Curve for Eight-Class

Furthermore, the proposed model was evaluated for magnification-dependent (MD) classification by treating each magnification factor (40x, 100x, 200x, and 400x) in the test dataset as a separate entity. Table 6 shows the evaluation metrics of MD for both classifications. In binary classification, the model accuracy varies from 98.08% to 99.26%, while in the eight-class classification, the highest accuracy is observed at the 40x magnification factor and the lowest at 200x. Overall, the model achieved an accuracy of 98.86% in binary classification and 95.01% in the eight-class classification.
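The magnification-dependent evaluation amounts to grouping the test predictions by magnification factor before computing each metric. A minimal sketch for accuracy is given below; the record layout and the toy labels are assumptions used only to show the grouping.

```python
from collections import defaultdict


def accuracy_by_group(records):
    """records: iterable of (group, true_label, predicted_label) tuples."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, true_label, predicted_label in records:
        total[group] += 1
        correct[group] += int(true_label == predicted_label)
    return {group: correct[group] / total[group] for group in total}


# Toy predictions (B = benign, M = malignant); not taken from the experiments.
records = [
    ("40x", "B", "B"), ("40x", "M", "M"),
    ("100x", "M", "B"), ("100x", "M", "M"),
    ("200x", "B", "B"),
    ("400x", "M", "M"), ("400x", "B", "M"), ("400x", "B", "B"),
]
per_magnification = accuracy_by_group(records)
```

The same grouping can be reused for precision, recall, F1, and MCC by collecting the labels per group and applying the formulas of Table 3 to each subset.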

Table 6

The model evaluation on MD for binary and eight-class

Mag Factor	Accuracy (%)	Recall (%)	Precision (%)	F-1 (%)	MCC (%)
Binary Class
40x	99.25	99.64	99.28	99.46	98.25
100x	98.79	98.59	99.64	99.11	97.21
200x	99.26	99.64	99.28	99.46	98.27
400x	98.08	97.98	99.18	99.17	95.64
Eight Class
40x	96.00	96.01	96.03	96.01	94.76
100x	95.63	95.63	95.68	95.62	94.26
200x	93.82	93.82	93.82	93.63	91.86
400x	94.50	94.51	94.48	94.41	92.72

6 Comparison with existing models

Table 7 compares the model accuracy with recent works on MD. Compared with the six latest counterparts listed in Table 7, including [14], [23], and [20], the proposed model gives better accuracy. In binary classification, the proposed architecture improves accuracy by 3.4% to 12% compared with [17], [18], and [19]. In addition, the proposed model’s accuracy, precision, recall, and F1-score are consistent with the findings of [23] and [20]. In the eight-class case, a notable improvement in accuracy of about 3.1% is observed over [14], [23], and [20]. However, for 100x and 200x, the model accuracy is lower than that of [18], although the results are consistent across magnification factors, indicating the model’s robustness to magnification.

Table 7

Comparing performance with MD on recent methods proposed in the literature

Author	Model Design Details	Class	Mag factor	Accuracy (%)	Recall (%)	Precision (%)	MCC (%)	F-Score (%)
Sharma et al. (2020) [14]	Pre-trained VGG-16 + SVM with data augmentation performed better than the model trained on handcrafted features.	Eight	40x	93.97	93.00	94.00	–	94.00
			100x	92.92	91.00	92.00	–	91.00
			200x	91.23	92.00	92.00	–	92.00
			400x	91.79	91.00	92.00	–	91.00
Boumaraf et al. (2021) [23]	Fine-tune block-wise on ResNet-18 by applying transfer learning. Global contrast normalization and three types of data augmentation allowed the model to achieve better accuracy.	Binary	40x	99.25	99.26	99.63	98.29	99.44
			100x	99.04	99.66	98.99	97.65	99.33
			200x	99.00	99.65	98.94	97.62	99.29
			400x	98.08	99.19	98.00	95.58	98.59
		Eight	40x	94.49	94.78	93.81	92.83	94.15
			100x	93.27	91.59	92.94	91.41	92.23
			200x	91.29	88.28	91.18	88.95	89.47
			400x	89.56	87.97	87.97	86.52	87.77
Seo et al. (2022) [17]	Primal-Dual Multi-Instance SVM	Binary	40x	87.90	91.60	90.20	–	91.60
			100x	89.10	94.20	92.30	–	92.50
			200x	88.90	93.10	89.00	–	91.80
			400x	85.80	92.30	87.50	–	89.80
Zhou et al. (2022) [18]	Resolution Adaptive Network + SVM	Binary	40x	94.43	94.21	93.03	–	93.56
			100x	98.31	97.80	98.12	–	98.02
			200x	99.14	99.05	98.62	–	98.83
			400x	93.35	92.76	92.27	–	92.45
		Eight	40x	91.14	–	–	–	–
			100x	96.83	–	–	–	–
			200x	98.05	–	–	–	–
			400x	90.30	–	–	–	–
Joshi et al. (2023) [19]	Xception + custom classification layers	Binary	40x	93.33	91.63	92.20	–	91.91
Pandey et al. (2023) [20]	Proposed Pretrained-Xception + SVM for Binary and Pretrained-Xception + FCNN for Eight	Binary	40x	99.01	99.16	99.23	–	99.19
			100x	99.43	99.38	99.01	–	99.19
			200x	99.01	99.14	99.25	–	99.19
			400x	98.13	98.25	98.02	–	98.13
		Eight	40x	94.69	96.18	94.59	–	95.37
			100x	93.75	95.21	93.72	–	94.52
			200x	91.71	94.77	91.59	–	93.15
			400x	91.03	93.50	91.35	–	92.40
Proposed Method	A pre-trained anti-aliased ResNet-34 model is trained in three-step training using transfer learning and fine-tuning.	Binary	40x	99.25	99.64	99.28	98.25	99.46
			100x	98.79	98.59	99.64	97.21	99.11
			200x	99.26	99.64	99.28	98.27	99.46
			400x	98.08	97.98	99.18	95.64	99.17
		Eight	40x	96.00	96.01	96.03	94.76	96.01
			100x	95.63	95.63	95.68	94.26	95.62
			200x	93.82	93.82	93.82	91.86	93.63
			400x	94.50	94.51	94.48	92.72	94.41

Similarly, Table 8 compares the model accuracy with recent works on MI. In binary classification, the proposed model improves accuracy by about 6.3 and 5.4 percentage points relative to [21] and [22], respectively. Compared with [23], the improvement is about 3% in the eight-class case and marginal in the binary case. Hence, the proposed model performs consistently on both MI and MD.

Table 8

Comparing performance with MI on recent methods proposed in the literature

Author	Model Design Details	Class	Accuracy (%)	Recall (%)	Precision (%)	MCC (%)	F-Score (%)
Shallu et al. (2018) [21]	Pre-trained VGG-16+SVM	Binary	92.60	93.00	93.00		93.00
Dabeer et al. (2019) [22]	Custom CNN.	Binary	93.45	93.00	93.00		93.00
Boumaraf et al. (2021) [23]	Pre-trained ResNet-18 with transfer learning and augmentation.	Binary	98.42	99.01	98.75	96.19	98.88
		Eight	92.03	90.28	91.39	89.38	90.77
Proposed Model	A pre-trained anti-aliased ResNet-34 model is trained in three-step training using transfer learning and fine-tuning	Binary	98.86	98.98	99.35	97.36	99.17
		Eight	95.01	95.01	94.95	93.42	94.94

7 Ablation studies

To evaluate the effectiveness of different components and techniques in improving the performance of the ResNet-34 model, we conducted an ablation study using binary and eight-class classification for MI.

The baseline model, ResNet-34 (Scratch), achieved an accuracy of 85.43% for the binary and 82.16% for the eight-class problem. Including the proposed four-fold on-the-fly data augmentation improved the model performance, with the accuracy increasing to 86.89% for the binary and 82.76% for the eight-class problem. Transfer learning with a pre-trained ResNet-34 model as the starting point raised the accuracies to 91.27% and 85.86%, respectively. Replacing transfer learning with the 3-step training approach yielded even higher accuracies of 92.46% and 86.12%, respectively. Combining data augmentation and transfer learning with the pre-trained ResNet-34 led to accuracies of 94.04% and 88.94%, and replacing transfer learning with 3-step training in this setting raised them to 96.34% and 90.22%.

Similarly, Tables 9 and 10 show that repeating these steps with the anti-aliased filter included progressively improves the accuracies. The final configuration achieves the best accuracy, demonstrating the effectiveness of the proposed methodology.

Table 9

Analysis of MI for Binary on other Models

Model Variant	Accuracy (%)
ResNet-34(Scratch)	85.43
ResNet-34(Scratch)+Data augmentation	86.89
ResNet-34(pre-trained) +Transfer Learning	91.27
ResNet-34(pre-trained) + proposed 3-step	92.46
ResNet-34(pre-trained) +Data augmentation + Transfer Learning	94.04
ResNet-34(pre-trained) +Data augmentation + proposed 3-step	96.34
ResNet-34(pre-trained) +Anti-aliased Filter + Transfer learning	92.09
ResNet-34(pre-trained) +Anti-aliased Filter + proposed 3-step	93.86
ResNet-34(pre-trained) +Anti-aliased Filter + Data augmentation + Transfer learning	97.91
ResNet-34(pre-trained) +Anti-aliased Filter + Data augmentation + proposed 3-step	98.86

Table 10

Analysis of MI for Eight on other Models

Model Variant	Accuracy (%)
ResNet-34(Scratch)	82.16
ResNet-34(Scratch)+Data augmentation	83.76
ResNet-34(pre-trained) +Transfer Learning	85.86
ResNet-34(pre-trained) + proposed 3-step	86.12
ResNet-34(pre-trained) +Data augmentation + Transfer Learning	88.94
ResNet-34(pre-trained) +Data augmentation + proposed 3-step	90.22
ResNet-34(pre-trained) +Anti-aliased Filter + Transfer learning	89.88
ResNet-34(pre-trained) +Anti-aliased Filter + proposed 3-step	92.64
ResNet-34(pre-trained) +Anti-aliased Filter + Data augmentation + Transfer learning	94.32
ResNet-34(pre-trained) +Anti-aliased Filter + Data augmentation + proposed 3-step	95.01

8 Conclusion

This paper presented a modified deep-learning architecture based on the pre-trained ResNet-34 for magnification-independent binary and eight-class classification of BC histopathology images. The architecture incorporated a fixed-weight Gaussian filter to address the aliasing effect, which improved model stability and robustness. Furthermore, the additional learning rate constraint in the proposed three-step training allowed for smoother updating of learnable parameters, enhancing the model accuracy. Additionally, using a large batch size reduced the frequency of batch normalization updates, resulting in decreased training loss. Moreover, applying global contrast normalization reduced intra-contrast variation, and a four-fold on-the-fly data augmentation enhanced the model’s generalization and accuracy. Notably, utilizing a Hessian power of 0.5 in Ada-Hessian facilitated reaching the optimal solution in fewer epochs. This technique considers first- and second-order derivatives of the loss during back-propagation. The results showed that the proposed model achieved an accuracy of 98.86% for binary and 95.01% for eight-class classification on the magnification-independent task. Moreover, the model showed consistent performance on the magnification-dependent task. Overall, the results obtained with the proposed model were better than those of previous works in the literature using the same dataset, especially in eight-class classification. The proposed model’s strategy can be applied to other medical modalities and is well-suited for medical applications.

However, the proposed model’s evaluation is limited to a specific dataset, which may restrict its generalizability to different datasets or medical modalities. While the model outperforms previous works on the same dataset, further validation on diverse datasets is necessary to enhance its reliability and applicability in various medical applications. Additionally, image repetition within the DC (ductal carcinoma) and LC (lobular carcinoma) sub-categories of the malignant class, specifically for patient ID 13412, adversely affects the accuracy of the eight-class classification. Addressing and mitigating the impact of this repetition is essential to improve the classification accuracy for these sub-categories.

To enhance the proposed model, future research can focus on treating the problem as a multi-label classification task. This approach would enable capturing the diverse characteristics present in BC histopathology images by treating each pathological feature as a separate label. Additionally, evaluating the model on diverse datasets and medical modalities would provide valuable insights into its generalizability and performance across different settings, enhancing its robustness in real-world scenarios.

Competing interests

The authors declare that they have no competing interests.

Ethical approval and consent to participate

Not applicable.

References

1. Hyuna Sung, Jacques Ferlay, Rebecca Siegel, Mathieu Laversanne, Isabelle Soerjomataram, Ahmedin Jemal and Freddie Bray, Global Cancer Statistics: GLOBOCAN Estimates of Incidence and Mortality Worldwide for 36 Cancers in 185 Countries, CA: A Cancer Journal for Clinicians 71(3) (2021), 209–249.

2. Marc Macenko, Marc Niethammer, Marron J.S., David Borland, John Woosley, Xiaojun Guan, Charles Schmitt and Nancy, A method for normalizing histology slides for quantitative analysis, Proceedings – 2009 IEEE International Symposium on Biomedical Imaging: From Nano to Macro, ISBI 2009, pages 1107–1110, 2009.

3. Nima Tajbakhsh, Jae Shin, Suryakanth Gurudu, R Todd Hurst, Christopher Kendall, Michael Gotway and Jianming Liang, Convolutional neural networks for medical image analysis: Full training or fine tuning? IEEE Transactions on Medical Imaging 35(5) (2016), 1299–1312.

4. Spanhol Fabio, Oliveira Luiz, Caroline Petitjean and Laurent Heutte, A Dataset for Breast Cancer Histopathological Image Classification, IEEE Transactions on Biomedical Engineering 63(7) (2016), 1455–1462.

5. Timo Ojala, Matti Pietikäinen and Topi Mäenpää, Multiresolution gray-scale and rotation invariant texture classification with local binary patterns, IEEE Transactions on Pattern Analysis and Machine Intelligence 24(7) (2002), 971–987.

6. Zhenhua Guo, Lei Zhang and David Zhang, A completed modeling of local binary pattern operator for texture classification, IEEE Transactions on Image Processing 19(6) (2010), 1657–1663.

7. Janne Heikkila and Ville Ojansivu, Methods for local phase quantization in blur-insensitive image analysis. In 2009 International Workshop on Local and Non-Local Approximation in Image Processing, pages 104–111. IEEE, 2009.

8. Neslihan Bayramoglu, Juho Kannala and Janne Heikkila, Deep learning for magnification independent breast cancer histopathology image classification, Proceedings – International Conference on Pattern Recognition, 0:2440–2445, 2016.

9. Fabio Alexandre Spanhol, Luiz S. Oliveira, Caroline Petitjean and Laurent Heutte, Breast cancer histopathological image classification using Convolutional Neural Networks, Proceedings of the International Joint Conference on Neural Networks, 2016-October:2560–2567, Oct 2016.

10. Alex Krizhevsky, Ilya Sutskever and Geoffrey Hinton, Imagenet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, volume 25. Curran Associates, Inc., 2012.

11. Abdullah Al Nahid, Mohamad Ali Mehrabi and Yinan Kong, Histopathological breast cancer image classification by deep neural network techniques guided by local clustering, BioMed Research International, 2018.

12. Zhongyi Han, Benzheng Wei, Yuanjie Zheng, Yilong Yin, Kejian Li and Shuo Li, Breast Cancer Multi-classification from Histopathological Images with Structured Deep Learning Model, Scientific Reports 7(1) (2017).

13. Dalal Bardou, Kun Zhang and Sayed Mohammad Ahmad, Classification of Breast Cancer Based on Histology Images Using Convolutional Neural Networks, IEEE Access 6 (2018), 24680–24693.

14. Shallu Sharma and Rajesh Mehra, Conventional machine learning and deep learning approach for multi-classification of breast cancer histopathology images - a comparative insight, Journal of Digital Imaging 33 (2020), 632–654.

15. Xiaomei Wang, Ijaz Ahmad, Danish Javeed, Syeda Armana Zaidi, Fahad Alotaibi, Mohamed Ghoneim, Yousef Ibrahim Daradkeh, Junaid Asghar and Elsayed Tag Eldin, Intelligent hybrid deep learning model for breast cancer detection, Electronics 11(17) (2022).

16. Jia Li, Jingwen Shi, Hexing Su and Le Gao, Breast Cancer Histopathological Image Recognition Based on Pyramid Gray Level Co-Occurrence Matrix and Incremental Broad Learning, Electronics (Switzerland) 11(15) (2022).

17. Hoon Seo, Lodewijk Brand, Lucia Saldana Barco and Hua Wang, Scaling multi-instance support vector machine to breast cancer detection on the breakhis dataset, Bioinformatics 38(Supplement_1) (2022), i92–i100.

18. Yiping Zhou, Can Zhang and Shaoshuai Gao, Breast cancer classification from histopathological images using resolution adaptive network, IEEE Access 10 (2022), 35977–35991.

19. Joshi Shubhangi, Bongale Anupkumar, Olof Olsson, Siddhaling Urolagin, Deepak Dharrao and Arunkumar Bongale, Enhanced pre-trained xception model transfer learned for breast cancer detection, Computation 11(3) (2023), 59.

20. Ankita Pandey and Arun Kumar, An integrated approach for breast cancer classification, Multimedia Tools and Applications (2023), 1–21.

21. Shallu and Rajesh Mehra, Breast cancer histology images classification: Training from scratch or transfer learning? ICT Express 4(4) (2018), 247–254.

22. Sumaiya Dabeer, Maha Mohammed Khan and Saiful Islam, Cancer diagnosis in histopathological image: Cnn based approach, Informatics in Medicine Unlocked 16 (2019), 100231.

23. Said Boumaraf, Xiabi Liu, Zhongshu Zheng, Xiaohong Ma and Chokri Ferkous, A new transfer learning based approach to magnification dependent and independent classification of breast cancer in histopathological images, Biomedical Signal Processing and Control 63(2020) (2021), 102192.

24. Richard Zhang, Making convolutional networks shift-invariant again, 36th International Conference on Machine Learning, ICML 2019 (2019), 12712–12722.

25. Kaiming He, Xiangyu Zhang, Shaoqing Ren and Jian Sun, Deep residual learning for image recognition, Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (2016), 770–778.

26. Sergey Ioffe and Christian Szegedy, Batch normalization: Accelerating deep network training by reducing internal covariate shift, 32nd International Conference on Machine Learning, ICML 2015 1 (2015), 448–456.

27. Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever and Ruslan Salakhutdinov, Dropout: A Simple Way to Prevent Neural Networks from Overfitting, Journal of Machine Learning Research 15(56) (2014), 1929–1958.

28. Spanhol Fabio, Oliveira Luiz, Caroline Petitjean and Laurent Heutte, Breakhis dataset. http://web.inf.ufpr.br/vri/databases/breast-cancer-histopathological-database-breakhis/, 2016. Accessed on 10/03/2023.

29. Ian Goodfellow, David Warde-Farley, Mehdi Mirza, Aaron Courville and Yoshua Bengio, Maxout networks. In International conference on machine learning, pages 1319–1327. PMLR, 2013.

30. Alexander Buslaev, Vladimir Iglovikov, Eugene Khvedchenya, Alex Parinov, Mikhail Druzhinin and Alexandr Kalinin, Albumentations: Fast and flexible image augmentations, Information (Switzerland) 11(2) (2020), 1–20.

31. Soham De and Samuel Smith, Batch normalization biases residual blocks towards the identity function in deep networks, Advances in Neural Information Processing Systems, 2020-Decem(NeurIPS), 2020.

32. Hermans Joeri, Gerasimos Spanakis and Rico Möckel, Accumulated gradient normalization, Journal of Machine Learning Research 77 (2017), 439–454.

33. Zhewei Yao, Amir Gholami, Sheng Shen, Mustafa Mustafa, Kurt Keutzer and Michael Mahoney, ADAHESSIAN: An Adaptive Second Order Optimizer for Machine Learning, 35th AAAI Conference on Artificial Intelligence, AAAI 2021 12A (2021), 10665–10673.

34. Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas Kopf, Edward Yang, Zachary DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai and Soumith Chintala, PyTorch: An Imperative Style, High-Performance Deep Learning Library. In H Wallach, H Larochelle, A Beygelzimer, F d’Alché-Buc, E Fox and R Garnett, editors, Advances in Neural Information Processing Systems 32, pages 8024–8035. Curran Associates, Inc., 2019.

35. Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake Vander Plas, Arnaud Joly, Brian Holt and Gaël Varoquaux, API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108–122, 2013.

Type	Sub Type	Patients	Magnification Factor				Total
			40x	100x	200x	400x
Benign	A	4	114	113	111	106	444
	FA	10	253	260	264	237	1014
	TA	3	109	121	108	115	453
	PT	7	149	150	140	130	569
	Total	24	625	644	623	588	2480
Malignant	DC	38	864	903	896	788	3451
	LC	5	156	170	163	137	626
	PC	6	145	142	135	138	560
	MC	9	205	222	196	169	792
	Total	58	1370	1437	1390	1232	5429
	Total	82	1995	2081	2013	1820	7909