Abstract
Breast cancer is one of the cancers with the highest morbidity and mortality worldwide and poses a serious threat to women's health. With the development of deep learning, computer-aided diagnosis technology has gained increasing acceptance, and traditional feature extraction has gradually been replaced by feature extraction based on convolutional neural networks, which enables automatic recognition and classification of pathological images. In this paper, a novel method based on deep learning and the wavelet transform is proposed to classify pathological images of breast cancer. First, image flipping is used to expand the data set, and two-level wavelet decomposition and reconstruction is used to sharpen and enhance the pathological images. Second, the processed data set is divided into training and test sets at ratios of 8:2 and 7:3, and the YOLOv8 network model is used to perform the eight-class classification of breast cancer pathological images. Finally, the classification accuracy of the proposed method is compared with that obtained by YOLOv8 on the original BreaKHis dataset. The comparison shows that the method improves the classification accuracy for images at different magnifications, which demonstrates the effectiveness of combining two-level wavelet decomposition and reconstruction with the YOLOv8 network model.
Introduction
Breast cancer is a common malignant tumor that occurs mostly in the breast tissue of women, although it also has a certain incidence in men. According to GLOBOCAN 2020, published by the International Agency for Research on Cancer (IARC), female breast cancer has surpassed lung cancer to become the most commonly diagnosed cancer [1]. Breast cancer is a tumor formed by the abnormal proliferation of malignant cells in breast tissue, and it usually originates from the breast ducts and lobules [2]. Diagnostic methods for breast cancer include breast self-examination, breast ultrasound, mammography, breast magnetic resonance imaging (MRI), and breast biopsy. Breast biopsy [3, 4] is widely regarded as the most reliable of these methods. In this approach, samples of breast tissue are extracted and examined pathologically to detect cancer cells. Accurately judging the presence and type of cancer from pathological images of breast tissue is therefore of utmost importance if patients are to receive effective and appropriate treatment [5, 6]. Traditional manual diagnosis requires doctors to have a solid theoretical basis and rich clinical experience. To reduce misdiagnosis, several experts often have to discuss the patient's pathological images together and reach a consensus on the diagnosis. This traditional approach is therefore very time-consuming and labor-intensive.
In recent years, with the rapid development of digital image processing and computer vision, computer-aided diagnosis and treatment have become a research hotspot in modern medical imaging [7–9]. Deep learning algorithms continually improve the state of the art in medical image classification [10, 11]. Initially, deep learning in medical image classification focused on unsupervised networks such as stacked autoencoders (SAE), deep belief networks (DBN), and deep Boltzmann machines (DBM). At present, the convolutional neural network (CNN) has become the leading technology in medical image diagnosis and classification. It can effectively extract image features, capture spatial relationships, process multi-scale information, and achieve high accuracy and good generalization on large-scale data sets. Because of these advantages, the CNN has become an important tool in medical image classification, and numerous CNN architectures have emerged in the field. To achieve better image classification results [12–14], it is crucial to select an appropriate convolutional neural network and combine it with image processing technology.
In response to this challenge, this study proposes a network model that combines the wavelet transform with YOLOv8 [15–17]. In this method, the image is sharpened by the wavelet transform, which highlights the detailed features of the image and further improves the accuracy of image classification. The model is trained and tested on the breast cancer pathological image dataset BreaKHis to verify the accuracy and robustness of the method.
Related work
In recent years, convolutional neural networks have been widely used in the classification of breast cancer pathological images. Based on the deep residual network, Ziba Gandomkar et al. proposed the MuDeRN framework for multi-class classification of breast histopathological images [18], which consists of two stages. In the first stage, a residual network (ResNet) with a depth of 152 layers is trained to distinguish between benign and malignant images. In the second stage, the benign and malignant images are subdivided into eight subcategories, and the ResNet outputs for the images are combined by a meta-decision tree to diagnose the patient. Because labeling data is costly, traditional image classification systems assume that all images of a patient share the patient's label, an assumption that is rarely verified in practice. P.J. Sudharshan et al. therefore proposed a weakly supervised learning framework to study the relevance of multi-instance learning (MIL) to computer-aided diagnosis of breast cancer patients [19]. By comparing different MIL methods (including APR, Diverse Density, MI-SVM, Citation-kNN, the nonparametric method, and MIL-CNN), they concluded that the nonparametric method gives the best classification performance. Rui Yan et al. proposed a hybrid convolutional and recurrent deep neural network for the classification of breast cancer pathological images [20]. This method builds on a richer multi-layer feature representation of histopathological image patches, combines the advantages of convolutional and recurrent neural networks, and achieves an average accuracy of 91.3% on the four-class classification task.
Inspired by the fact that deep high-order statistical models clearly outperform the corresponding first-order models in visual tasks, Cunqiao Hou et al. explored global deep high-order statistics to distinguish pathological images of breast cancer. By integrating asymmetric convolution into a second-order network, they proposed a new second-order asymmetric convolution network (SOACNet) [21], which replaces each standard square-kernel convolution layer in the backbone with an asymmetric convolution block and uses global covariance pooling to compute second-order statistics of the deep features, making the representation of pathological images more robust. Jia Li et al. introduced the IBL model to breast cancer pathological image recognition for the first time and proposed a combined model consisting of a pyramid gray-level co-occurrence matrix (PGLCM) feature extraction model and an incremental broad learning (IBL) classification model [22]. Unlike a deep neural network, the IBL model has a single hidden layer, which greatly reduces the time cost of training and testing. To make extensive use of existing clinical data, Sushma Nagdeote et al. proposed a novel mathematical model for breast cancer (BRCA) prediction [23]. Considering the characteristics of breast cancer cells, the model was combined with the maximum likelihood method to improve prediction accuracy, and a publicly accessible BRCA pathological image data set was used to test the method. The results show that the proposed mathematical model, combined with different machine learning techniques, can adapt to different cancer types and imaging methods and achieve better performance.
Cheng Zhang et al. used a ten-layer convolutional neural network (CNN) model called “ColorDeep” to extract the color features of different tissue parts of the cell, using pure-color image slices obtained by three-channel separation and recombination as the model input. The model was tested on the BreaKHis data set, and image-level recognition accuracies of 96.89% and 99.67% were reported across the four magnifications, which is better than many advanced recognition methods [24].
Materials and methods
Motivation
The YOLOv8 network model has been trained on the COCO dataset on a large scale and has achieved good classification results. It can accurately classify people, animals, plants, household items, and so on. To confirm the generalization ability of the YOLOv8 network model and to address the automatic classification of breast cancer pathological images, this study combines the YOLOv8 network model with image processing technology and uses the BreaKHis data set to verify the effectiveness of the method. The following subsections introduce the architecture of the YOLOv8 network model and the principle of the wavelet transform used for image processing in this study.
YOLOv8 network model architecture
YOLO is currently the most popular real-time object detector and is widely accepted for the following reasons: (a) a lightweight network architecture; (b) effective feature fusion methods; (c) more accurate detection results. Among the existing versions, YOLOv5 and YOLOv7 are widely applied. YOLOv5 uses deep learning to perform real-time, efficient object detection and mainly consists of a backbone network and a detection head [25]. YOLOv5 adopts a CSP (cross-stage partial) network structure, which effectively reduces repeated computation and improves computational efficiency. However, YOLOv5 has shortcomings in small-object detection, and its results on large numbers of overlapping, dense objects need to be improved.
YOLOv7 [26] proposed a new training strategy called the trainable bag-of-freebies (TBoF) to improve the performance of real-time object detectors. The TBoF method includes a series of trainable techniques, such as data augmentation and mixing. By applying TBoF to three different types of object detectors (SSD, RetinaNet, and YOLOv3), the accuracy and generalization ability of the detectors can be significantly improved. However, YOLOv7 often needs more computing resources and training time to perform better [27].
YOLOv8 was published in 2023, and its backbone is basically the same as that of YOLOv5. Figure 1 shows the architecture of the YOLOv8 network model, which replaces the C3 module with the C2f module based on the CSP idea. The C2f module draws on the ELAN idea of YOLOv7 and combines C3 with ELAN, which enables YOLOv8 to obtain richer gradient flow information while remaining lightweight [15]. At the end of the backbone, the popular SPPF module is still used to ensure the accuracy of object recognition at different scales. Figure 2 shows the structure of the SPPF module, which is composed of a CBS convolution layer and three max-pooling layers. It realizes feature fusion by concatenating the feature map that has not been max-pooled with the feature map obtained after each successive max-pooling layer. In the neck, YOLOv8 still applies PAN-FPN for feature fusion, which strengthens the fusion and use of feature information at different scales. The authors of YOLOv8 built the neck from two up-sampling operations and multiple C2f modules, followed by the final decoupled head structure, which combines confidence with regression to reach a new level of accuracy. YOLOv8 supports all versions of YOLO and allows easy switching between them. In addition, it is compatible with various hardware platforms (CPU and GPU), which offers great flexibility [28].

YOLOv8 network framework diagram.

SPPF module structure diagram.
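The pooling part of the SPPF module described above can be sketched in plain Python. This is a minimal illustration rather than the YOLOv8 implementation: the CBS convolution layers are omitted, the feature map is a single-channel nested list, and the 5×5 kernel with stride 1 and the function names `max_pool` and `sppf_pool` are assumptions made for this example.

```python
NEG_INF = float("-inf")


def max_pool(fmap, k=5):
    """Stride-1 max pooling with 'same' padding on a 2-D list of floats."""
    h, w, r = len(fmap), len(fmap[0]), k // 2
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            best = NEG_INF
            # Scan the k x k window, ignoring positions outside the map.
            for di in range(-r, r + 1):
                for dj in range(-r, r + 1):
                    y, x = i + di, j + dj
                    if 0 <= y < h and 0 <= x < w:
                        best = max(best, fmap[y][x])
            row.append(best)
        out.append(row)
    return out


def sppf_pool(fmap, k=5):
    """Return the un-pooled map plus the map after 1, 2 and 3 poolings.

    A full SPPF block would concatenate these four branches along the
    channel axis and pass them through a convolution (omitted here).
    """
    branches = [fmap]
    for _ in range(3):
        branches.append(max_pool(branches[-1], k))
    return branches


# Hypothetical 8 x 8 single-channel feature map.
feature_map = [[float((3 * i + 5 * j) % 11) for j in range(8)] for i in range(8)]
branches = sppf_pool(feature_map)
print(len(branches), "branches of size", len(branches[0]), "x", len(branches[0][0]))
```

Chaining the same small pooling layer lets the later branches cover progressively larger neighbourhoods while reusing the earlier results, which is one reason this serial layout is considered economical.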
The wavelet transform is a mathematical transform method [29], which decomposes the signal into wavelet functions of different scales so that the local characteristics of the signal can be analyzed. Compared with the Fourier transform, the wavelet transform can better capture the instantaneous characteristics and non-stationarity of signals [30].
In this paper, the db4 wavelet function is employed for image sharpening and enhancement. The db4 wavelet is a discrete wavelet function; it is the most commonly used member of the Daubechies wavelet family and was also the first to be widely employed [31]. Because of its compressibility, multi-resolution analysis capability, and orthogonality, it is extensively applied in signal and image processing tasks such as signal compression, denoising, and feature extraction.
Daubechies wavelets were constructed by Ingrid Daubechies and are abbreviated as dbN, where N is the order of the wavelet. The support of the wavelet function Ψ(t) and the scaling function Φ(t) has length 2N-1, and N is the number of vanishing moments of Ψ(t). Except for N = 1 (the Haar wavelet), dbN is not symmetric and has no explicit expression, but the squared modulus of the transfer function of the filter h is known explicitly.
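Letting
$$P(y)=\sum_{k=0}^{N-1}\binom{N-1+k}{k}\,y^{k},$$
among which $\binom{N-1+k}{k}$ is the binomial coefficient, then
$$\left|m_{0}(\omega)\right|^{2}=\left(\cos^{2}\frac{\omega}{2}\right)^{N}P\!\left(\sin^{2}\frac{\omega}{2}\right),\qquad m_{0}(\omega)=\frac{1}{\sqrt{2}}\sum_{k=0}^{2N-1}h_{k}\,e^{-ik\omega}.$$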
The purpose of image enhancement in this paper is to improve the contrast of pathological images of breast cancer and to highlight their edge and detail information, which appears as high-frequency signals in the frequency domain. Therefore, the two-level wavelet transform is used to sharpen and enhance the pathological images. Figure 3 shows the decomposition of an image by the two-level wavelet transform, where LLj is the low-frequency subband image, HLj is the high-frequency subband image in the horizontal direction, LHj is the high-frequency subband image in the vertical direction, and HHj is the high-frequency subband image in the diagonal direction. Following the arrows in the figure from left to right gives the decomposition process, and following them from right to left gives the reconstruction process.

Diagram of image wavelet decomposition process.
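The decomposition, enhancement, and reconstruction steps can be sketched in pure Python as follows. This is an illustrative sketch under stated assumptions, not the authors' Matlab code: the paper does not say how the high-frequency subbands are enhanced, so a uniform `gain` factor is assumed; periodic boundary extension is assumed; the image is a single-channel nested list with side lengths divisible by 4; and all function names are hypothetical.

```python
# db4 low-pass (scaling) filter with 8 taps, normalised so the taps sum to sqrt(2).
DB4 = [
    0.23037781330885523, 0.7148465705525415, 0.6308807679295904,
    -0.02798376941698385, -0.18703481171888114, 0.030841381835986965,
    0.032883011666982945, -0.010597401784997278,
]
# Quadrature-mirror high-pass filter: g[k] = (-1)^k * h[L-1-k].
DB4_HI = [(-1) ** k * DB4[len(DB4) - 1 - k] for k in range(len(DB4))]


def dwt1d(x):
    """One analysis step with periodic extension; len(x) must be even."""
    n = len(x)
    approx = [sum(h * x[(2 * i + k) % n] for k, h in enumerate(DB4)) for i in range(n // 2)]
    detail = [sum(g * x[(2 * i + k) % n] for k, g in enumerate(DB4_HI)) for i in range(n // 2)]
    return approx, detail


def idwt1d(approx, detail):
    """Inverse of dwt1d (transpose of the orthogonal analysis operator)."""
    n = 2 * len(approx)
    x = [0.0] * n
    for i, (a, d) in enumerate(zip(approx, detail)):
        for k in range(len(DB4)):
            x[(2 * i + k) % n] += DB4[k] * a + DB4_HI[k] * d
    return x


def transpose(m):
    return [list(col) for col in zip(*m)]


def dwt2d(img):
    """One 2-D level: filter the rows, then the columns. Returns (LL, details)."""
    rows = [dwt1d(r) for r in img]
    lo_cols = [dwt1d(c) for c in transpose([a for a, _ in rows])]
    hi_cols = [dwt1d(c) for c in transpose([d for _, d in rows])]
    ll = transpose([a for a, _ in lo_cols])   # low-pass rows, low-pass columns
    lh = transpose([d for _, d in lo_cols])   # low-pass rows, high-pass columns
    hl = transpose([a for a, _ in hi_cols])   # high-pass rows, low-pass columns
    hh = transpose([d for _, d in hi_cols])   # high-pass in both directions
    return ll, (hl, lh, hh)


def idwt2d(ll, details):
    """Inverse of dwt2d: rebuild the columns first, then the rows."""
    hl, lh, hh = details
    lo = [idwt1d(a, d) for a, d in zip(transpose(ll), transpose(lh))]
    hi = [idwt1d(a, d) for a, d in zip(transpose(hl), transpose(hh))]
    return [idwt1d(a, d) for a, d in zip(transpose(lo), transpose(hi))]


def enhance(img, gain=1.5, levels=2):
    """Decompose `levels` times, scale every detail subband by `gain`, reconstruct."""
    ll, stack = img, []
    for _ in range(levels):
        ll, details = dwt2d(ll)
        stack.append(details)
    for details in reversed(stack):
        boosted = tuple([[gain * v for v in row] for row in band] for band in details)
        ll = idwt2d(ll, boosted)
    return ll


# Hypothetical 16 x 16 grayscale test pattern.
image = [[float((7 * r + 13 * c) % 32) for c in range(16)] for r in range(16)]
sharpened = enhance(image, gain=1.5)
```

With `gain=1.0` the function returns the input up to rounding error, so the gain alone controls how strongly edges and fine detail are emphasised. For an RGB image the same routine could be applied to each channel separately, with the result clipped back to the valid pixel range.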
Datasets
The data set used in this paper is BreaKHis, a pathological image data set of breast cancer published by Fabio A. Spanhol et al. in 2016 [32]. The database consists of microscopic biopsy images from 82 patients with breast tumors, of whom 24 had benign tumors and 58 had malignant tumors. BreaKHis contains 7909 labeled pathological images of breast tissue. Of these, 2480 are images of benign tumors, covering adenosis (abbreviated as A), fibroadenoma (F), phyllodes tumor (PT), and tubular adenoma (TA), and 5429 are images of malignant tumors, covering ductal carcinoma (DC), lobular carcinoma (LC), mucinous carcinoma (MC), and papillary carcinoma (PC). The images were acquired at four magnifications: 40×, 100×, 200×, and 400×. Each image is a three-channel RGB image with a resolution of 700×460 pixels. The distribution of the BreaKHis dataset is presented in Table 1.
Specific distribution of BreaKHis data sets
In this paper, the images of different magnifications in the BreaKHis dataset were utilized to perform the multi-classification task on pathological images of breast cancer. The images consisted of 8 categories, including 4 types of benign tumors and 4 types of malignant tumors. The image samples of four benign tumors are shown in Fig. 4, and the image samples of four malignant tumors are displayed in Fig. 5.

Samples of benign tumors in the BreaKHis dataset, which show adenosis (a), fibroadenoma (b), phyllodes tumor (c), and tubular adenoma (d).

Samples of malignant tumors in the BreaKHis dataset, which show ductal carcinoma (a), lobular carcinoma (b), mucinous carcinoma (c), and papillary carcinoma (d).
First, the images in the BreaKHis data set are organized by grouping images of the same tumor type into a folder named after the tumor class, giving eight folders in total. To improve the model's performance and generalization, image flipping is employed to expand the dataset; the effect of flipping can be seen in Fig. 6.
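Flipping as a way of expanding the data set can be illustrated with a short sketch. The paper does not state whether the flips are horizontal, vertical, or both, so the choice of a horizontal flip in `augment_with_flips` is an assumption, and images are represented here as nested lists of pixel rows purely for illustration.

```python
def flip_horizontal(img):
    """Mirror an image (a list of pixel rows) left to right."""
    return [row[::-1] for row in img]


def flip_vertical(img):
    """Mirror an image (a list of pixel rows) top to bottom."""
    return [row[:] for row in img[::-1]]


def augment_with_flips(images):
    """Return the originals followed by one horizontally flipped copy of each."""
    return list(images) + [flip_horizontal(im) for im in images]


# Tiny 2 x 3 example image with hypothetical pixel values.
tiny = [[1, 2, 3],
        [4, 5, 6]]
expanded = augment_with_flips([tiny])
print(len(expanded))  # 2
print(expanded[1])    # [[3, 2, 1], [6, 5, 4]]
```

Because a flipped tissue section is still a plausible tissue section, the class label of each flipped copy is unchanged.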
The images are then decomposed by two-level wavelet decomposition into a low-frequency part and high-frequency detail parts, and images with enhanced high-frequency information are obtained by wavelet reconstruction. After the two-level wavelet transform, the details of the image become clearer, which achieves the desired image enhancement. The effect before and after the wavelet transform is shown in Fig. 7.

Effect before and after flipping the image, showing original image (a), flipped image (b), original image (c), flipped image (d).

Effect before and after image enhancement, showing original image (a), enhanced image (b), original image (c), enhanced image (d).
The original images, with a size of 700×460, are scaled down to 224×224. The processed data set is then divided into a training set and a test set in two separate experiments, using ratios of 8:2 and 7:3, respectively.
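The two partitions can be sketched as a seeded random split. The paper does not say whether the split is stratified by class or by magnification, so the plain shuffle below is an assumption; the helper `split_dataset` and the file names are hypothetical.

```python
import random


def split_dataset(items, train_fraction, seed=0):
    """Shuffle a copy of `items` and cut it into (train, test) lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical file names standing in for the preprocessed images.
files = [f"img_{i:03d}.png" for i in range(100)]
train_82, test_82 = split_dataset(files, 0.8)
train_73, test_73 = split_dataset(files, 0.7)
print(len(train_82), len(test_82))  # 80 20
print(len(train_73), len(test_73))  # 70 30
```

Applying the same function to each class folder separately would keep the class proportions identical in the training and test sets, which may matter for the smaller classes.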
The experiments were run on an Intel(R) Core(TM) i7-9700 CPU @ 3.00 GHz under Windows 10 Professional Edition. The software environment comprised Matlab, Python 3.9, PyTorch 2.0.1+cu118, and CUDA 11.8.
Experimental results and discussion
The method proposed in this paper, which combines two-level wavelet decomposition and reconstruction with YOLOv8, achieves higher classification accuracy than YOLOv8 applied to the original images.
Table 2 and Table 3 record the experimental results of the proposed algorithm. In both experiments, which divide the data set in different proportions, the algorithm effectively improves the classification accuracy of breast cancer pathological images in the multi-class task for images at 40× and 400× magnification. In the experiment with the 8:2 split, the classification accuracy of all eight categories improved for the 400× images. For the 40× images, the accuracy of fibroadenoma decreased by 1%, while adenosis and ductal carcinoma maintained their original high classification accuracy. The classification accuracy of the other five tumor types improved greatly: by 12% for phyllodes tumor, 9% for tubular adenoma, 13% for lobular carcinoma, 11% for mucinous carcinoma, and 10% for papillary carcinoma. In the experiment with the 7:3 split, for images at 40× and 400× magnification, the accuracy of ductal carcinoma in the 400× images decreased by 1%, some classes maintained their original classification accuracy, and the accuracy of the remaining classes improved.
Experimental classification accuracy of dividing data sets according to 8:2
Experimental classification accuracy of dividing data sets according to 7:3
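Per-class accuracy of the kind discussed above can be computed from the true and predicted labels as sketched below. The paper does not define the metric explicitly, so this sketch assumes that the accuracy of a class is the fraction of its test images that are assigned to that class; the labels are toy values and not experimental results.

```python
from collections import Counter


def per_class_accuracy(y_true, y_pred):
    """Fraction of the samples of each true class that were predicted correctly."""
    total, correct = Counter(y_true), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            correct[t] += 1
    return {c: correct[c] / total[c] for c in total}


# Toy labels using the class abbreviations of the dataset section.
y_true = ["A", "A", "F", "F", "DC", "DC", "DC", "LC"]
y_pred = ["A", "F", "F", "F", "DC", "DC", "LC", "LC"]
for label, acc in per_class_accuracy(y_true, y_pred).items():
    print(f"{label}: {acc:.2f}")
```

Comparing two such dictionaries, one for the original images and one for the wavelet-enhanced images, gives the per-class differences reported in the discussion.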
In summary, to enhance the model's generalization ability and prevent over-fitting, this study expands the dataset by image flipping. It then applies two-level wavelet decomposition and reconstruction to enhance the edge and detail information of the images. On the processed images, the YOLOv8 network model is applied to perform the eight-class classification of breast cancer pathological images. The results show that the proposed method achieves higher classification accuracy than YOLOv8 on the original BreaKHis dataset, which indicates the effectiveness of combining the network model with two-level wavelet decomposition and reconstruction.
In future work, we plan to improve the classification accuracy of the YOLOv8 network model by adjusting its parameters. We also aim to address the low classification accuracy for lobular carcinoma by adjusting the coefficients of the wavelet transform to modify the sharpening effect, which is expected to improve the accuracy for this class in particular.
Acknowledgments
This work was supported in part by National Natural Science Foundation of China (No. 42274173).
