Abstract
BACKGROUND:
The incidence of breast cancer among women is rising steadily, and early diagnosis is necessary to detect and cure the disease.
OBJECTIVE:
To develop a novel automated disease detection framework to examine Breast-Ultrasound-Images (BUI).
METHODS:
This scheme includes the following stages: (i) image acquisition and resizing, (ii) Gaussian filter-based pre-processing, (iii) handcrafted feature extraction, (iv) optimal feature selection with the Mayfly Algorithm (MA), and (v) binary classification and validation. The dataset includes BUI extracted from 133 normal, 445 benign and 210 malignant cases. Each BUI is resized to 256×256×1 pixels and the resized BUIs are used to develop and test the new scheme. Handcrafted feature-based cancer detection is employed, and features such as entropies, Local-Binary-Pattern (LBP) and Hu moments are considered. To avoid over-fitting, a feature reduction procedure is also implemented with the MA, and the reduced feature subset is used to train and validate the classifiers developed in this research.
RESULTS:
Experiments were performed to classify BUIs between (i) normal and benign, (ii) normal and malignant, and (iii) benign and malignant cases. The results show that a classification accuracy of > 94%, precision of > 92%, sensitivity of > 92% and specificity of > 90% are achieved with the developed framework.
CONCLUSION:
In this work, a machine-learning scheme is employed to detect/classify the disease using BUI, and it achieves promising results. In future, we will test the feasibility of applying deep-learning methods within this framework to further improve detection accuracy.
Introduction
In the current era, the incidence rate of various infectious and acute diseases in humankind is rising, and appropriate diagnosis and treatment is the only option to reduce the impact of disease [13, 35]. Medical-imaging-supported disease diagnosis is a recommended practice in hospitals, and hence a number of medical imaging procedures have been developed and implemented to detect disease with better accuracy [18, 29]. The Ultrasound Imaging Technique (UIT) is one such approach, in which the necessary images are captured using sound waves. Compared to other imaging schemes, the UIT is considered safe and can capture a variety of information, as discussed in [6].
Cancer is recognized as one of the most severe diseases in humans, and early detection is necessary to control its growth and spread [3, 33]. The World Health Organization (WHO) states that cancer is a large group of diseases that can arise in any organ or tissue of the body. The same report confirms that 9.6 million deaths occurred globally in 2018 [34]. This is mainly due to abnormal cell growth, and the abnormal cells spread to other parts of the body through the blood stream. Breast Cancer (BC) is one of the leading causes of death in women; the WHO report confirms that in 2020 about 2.3 million women were diagnosed with BC and 685,000 deaths were reported globally. The report also confirms that, at the end of 2020, 7.8 million living women had been diagnosed with BC within the previous five years.
Cancer signs and symptoms depend on various factors, such as the cancer type, its location and the neighboring areas to which it has spread. For example, breast cancer can appear as a lump within the breast or as discharge from the nipple, while metastatic breast cancer may cause pain (if the bones are affected), excessive fatigue (linked with the lungs), or seizures (in the brain).
Because of its incidence rate and severity, BC causes a large medical burden globally, and a considerable number of diagnostic and treatment procedures have been proposed and employed to detect it at an early stage. Medical-imaging-supported BC detection is commonly adopted in hospitals, and schemes such as Magnetic-Resonance-Imaging (MRI), mammography, thermal imaging, histopathology, elastography and UIT are widely employed. Compared to other methods, the UIT is considered safe and helps to locate the cancer during the needle biopsy procedure. The UIT, alone or combined with elastography, is widely adopted in hospitals to diagnose BC with better accuracy.
The challenge most often reported for breast cancer classification and segmentation methods is the accurate segmentation and classification of the ultrasound region, which is addressed in this research by exploiting an effective breast cancer segmentation method. Today, breast cancer paints a frightening picture in every individual's mind, as it is considered to be intolerable, agonizing and fatal. In fact, this outlook has been overstated and amplified without basis.
Breast Cancer (BC) generally begins in the cells of the milk-producing ducts. It may also start in the glandular tissue called lobules, or in other cells or tissues within the breast. Early detection and treatment will help to cure the disease.
In the literature, a number of BC diagnostic procedures have been implemented with medical images of various modalities; in this work, the Breast Ultrasound Image (BUI) is considered for the investigation. Table 1 summarizes the earlier works on BUI-supported cancer detection.
Summary of ultrasound supported breast cancer detection
Table 1 presents the information available in the literature on BC detection with BUI, and these works confirm its merit over other modalities. Compared to a non-invasive imaging approach such as thermal imaging, the recording and evaluation of BC in BUI is quite simple; hence, it is commonly adopted in hospitals to detect abnormalities. The main limitation of the earlier works is the choice of features, which offered a lower detection accuracy with the Machine-Learning Schemes (MLS). In this work, computerized classification of BUI is implemented using the benchmark images contributed by Al-Dhabyani et al. [1]. In this database, every image is associated with its Ground-Truth, and the collection, pre-processing and post-processing of the images are clearly discussed. In this study, image resizing is implemented initially to reduce the test images to 256×256×1 pixels, and image augmentation (rotation by an angle) is implemented to increase the number of test images (500 images) as depicted in Table 2. During the classification task, 70% of these images are used to train the MLS and 30% are used for validation. Sample pictures from this database are presented in Fig. 1. During this investigation, a separate examination is performed for (i) Normal Vs Benign, (ii) Normal Vs Malignant and (iii) Benign Vs Malignant, and the results are compared and verified using various binary classifiers available in the literature. The experimental outcome confirms that the developed MLS classifies the considered BUI with better accuracy.
Sample test images considered in this research

Sample test pictures from chosen BUI database.
The remainder of this manuscript is arranged as follows: Section 2 presents the context, Section 3 illustrates the methodology, and Sections 4 and 5 present the results and conclusions, respectively.
This research work proposes a Machine Learning System (MLS) to detect Breast Cancer (BC); the proposed scheme is depicted in Fig. 2.

Developed scheme to examine the breast abnormality.
This technique consists of the following phases: collecting the images of normal/abnormal breast sections from the volunteers using the ultrasound imaging scheme, resizing the collected images to a recommended dimension (256×256×1), enhancing the images with a chosen procedure, extracting the vital features from the enhanced images, feature optimization, and classification with the dominant features. The most common classification methods are voxel-based or pixel-based breast cancer classification techniques. However, these approaches fail to consider the boundary information and the global shape of the breast cancer [5].
The performance of this scheme relies mainly on the feature mining and selection procedure. In this work, the test images are enhanced using the Gaussian filter, LBP and a saliency detection technique, and the necessary features are then extracted from these images. The extracted features are then optimized with the Mayfly Algorithm (MA), and the optimized features are used to validate the performance of the proposed scheme.
The gaps and issues identified in optimization-based methods are as follows. The detection rate was very low in [9], the performance degraded as the number of training samples increased, and the required computational resources were too high. In Vijayakumar et al. [21], the efficiency of the segmentation method was poor. The method of Shi et al. [26] failed to attain good performance over the entire breast cancer region, and the complexity and multiplicity of breast cancer made the process more tedious [8]. The memory load and computational overhead were increased in Dey et al. [2], and the method failed to model a 3D network for enhancing the segmentation performance. The detection performance for breast cancer was low in Rajakumar et al. [21]. In the case of the white matter region, this model recognizes only the intensities of the texture space, and it faced a complex issue in computing the weight distance based on the breast cancer voxels [3]. Detecting and preventing breast cancer at the earliest stage can help to minimize the death rate and increase life span. Additionally, with early detection the prognosis can be improved, resulting in the most appropriate and effective cure.
Breast cancer can originate in any region of the breast. Ductal cancers originate in the ducts that carry milk to the nipple. Lobular cancers originate in the glands that produce breast milk. Many other, less common types of breast cancer also exist [23]. A small number of cancers begin in other breast tissues; these are referred to as sarcomas and lymphomas and are not actually considered breast cancers. Although many types of breast cancer can form a lump in the breast, not all of them do [1].
Many breast cancers are identified on screening mammograms, which can detect cancers at quite an early stage, even before they can be felt or cause any symptoms. Breast cancer has various other warning signs, which must not be overlooked and need to be reported to the doctor. It should also be understood that most breast lumps are benign rather than malignant. Non-cancerous breast tumors are abnormal growths that do not spread outside the breast region and are not hazardous [11].
However, some benign breast lumps can increase the probability of breast cancer in women. Any lump in the breast or any unusual change must be reported immediately to the doctor, to check for benign or malignant cells and to assess any risk of developing cancer in the near future [24].
It is essential that clinicians and oncologists identify the particular risk parameters related to the disease, to assist in adjuvant therapy and subsequent surgical therapy, including complete removal of the affected region when required. Cancer diagnosis and pathological analysis of the affected breast regions have been shown in experiments to provide a valuable survival advantage [22]. Hence, patients undergoing cancer diagnosis must be given a precise and complete diagnostic evaluation, which acts as a foundation for novel therapeutic strategies. Multiple scoring-based diagnostic systems have been examined. To complement the diagnosis made by medical experts, various studies have been carried out on the development and advancement of diagnostic systems, which are presented in the subsequent section.
The performance of an automated disease detection scheme relies mainly on the features obtained from the images to be categorized. A number of feature extraction methods for machine learning are discussed in prior works; in this research, the necessary features are extracted from the images enhanced with the Gaussian filter (GF), LBP and saliency. The entropy features Kapur, Tsallis, Shannon, Yager, Rényi and Max are extracted from the GF and saliency images. Along with these entropies, three Hu moments are extracted from every image. The LBP provides 59 features from each pattern, and the combined features are then optimized using the MA. This section presents an overview of the image enhancement techniques implemented in this work.
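As an illustration of the histogram-based entropy features named above, the following minimal sketch computes the Shannon, Rényi and Tsallis entropies of a grey-level image. It is a sketch under stated assumptions and not the implementation used in this work: the function names, the 256-level histogram and the default orders (alpha = 2, q = 2) are assumptions, and the Kapur, Yager and Max entropies are omitted because their definitions vary between sources.

```python
import math

def histogram_probabilities(image, levels=256):
    """Normalised grey-level histogram of a 2D image given as rows of integers."""
    counts = [0] * levels
    n = 0
    for row in image:
        for value in row:
            counts[value] += 1
            n += 1
    return [c / n for c in counts if c > 0]  # empty bins contribute nothing

def shannon_entropy(p):
    """Shannon entropy in bits."""
    return -sum(pi * math.log2(pi) for pi in p)

def renyi_entropy(p, alpha=2.0):
    """Rényi entropy of order alpha (alpha != 1), in bits."""
    return math.log2(sum(pi ** alpha for pi in p)) / (1.0 - alpha)

def tsallis_entropy(p, q=2.0):
    """Tsallis entropy with entropic index q (q != 1)."""
    return (1.0 - sum(pi ** q for pi in p)) / (q - 1.0)
```

For an image whose four pixels all take different grey levels, the histogram is uniform over four bins, so the Shannon and second-order Rényi entropies are both 2 bits and the Tsallis entropy (q = 2) is 0.75.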
Gaussian filter
Surface and edge enhancement provides essential image information, and GF-based picture enhancement is a proven technique. The work of Marr and Hildreth [15] confirms that the GF implemented with a varied scale (θ) improves the texture pattern in vertical/horizontal orientations. The expression for the 2D Gaussian operator is defined in Eqn. (1);

GF patterns generated with various θ.
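For readers unfamiliar with Gaussian filtering, the following minimal sketch smooths a grey-scale image with a standard isotropic Gaussian kernel. It is an illustration only: the kernel size, the standard deviation `sigma` and the edge-replicating border handling are assumptions, and the parameter θ used in this work to generate the different GF patterns is not modelled. Plain Python lists are used to keep the sketch self-contained.

```python
import math

def gaussian_kernel(size=5, sigma=1.0):
    """Return a normalised size x size Gaussian kernel as nested lists (odd size)."""
    half = size // 2
    kernel = [[math.exp(-(x * x + y * y) / (2.0 * sigma ** 2))
               for x in range(-half, half + 1)]
              for y in range(-half, half + 1)]
    total = sum(sum(row) for row in kernel)
    return [[v / total for v in row] for row in kernel]

def gaussian_filter(image, size=5, sigma=1.0):
    """Smooth a 2D grey-scale image (list of rows); borders replicate edge pixels."""
    kernel = gaussian_kernel(size, sigma)
    half = size // 2
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            acc = 0.0
            for dy in range(-half, half + 1):
                for dx in range(-half, half + 1):
                    rr = min(max(r + dy, 0), rows - 1)  # clamp to the image border
                    cc = min(max(c + dx, 0), cols - 1)
                    acc += kernel[dy + half][dx + half] * image[rr][cc]
            out[r][c] = acc
    return out
```

Because the kernel is normalised to sum to one, a constant image is left unchanged, while an isolated bright pixel is spread over its neighbourhood.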
The GF provides six entropy values and three Hu moments from each image. The total number of features extracted from the four patterns is presented in Equation (2) and the final value is shown in Equation (3);
LBP-based information extraction has recently been adopted in medical image classification tasks, and earlier work confirms that this technique provides 1 × 1 × 59 features from each image.
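The figure of 59 features matches the histogram of the widely used 8-neighbour "uniform" LBP, in which the 58 uniform codes each receive a bin and all remaining codes share one bin. The sketch below shows that standard construction, assuming radius 1 and a `>=` comparison with the centre pixel. It does not model the weighted LBP variant (W = 1 to 4) adopted in this work, and the function name is hypothetical.

```python
def uniform_lbp_histogram(image):
    """59-bin histogram of 8-neighbour uniform LBP codes (radius 1).

    `image` is a list of rows of grey values; border pixels are skipped.
    """
    # A code is "uniform" if its circular bit string has at most two
    # 0/1 transitions; 58 of the 256 codes qualify.
    def transitions(code):
        bits = [(code >> i) & 1 for i in range(8)]
        return sum(bits[i] != bits[(i + 1) % 8] for i in range(8))

    uniform_codes = [c for c in range(256) if transitions(c) <= 2]
    bin_of = {c: i for i, c in enumerate(uniform_codes)}  # bins 0..57
    other_bin = len(uniform_codes)                         # bin 58

    # Neighbour offsets in circular order around the centre pixel.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    hist = [0] * (other_bin + 1)
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            code = 0
            for bit, (dr, dc) in enumerate(offsets):
                if image[r + dr][c + dc] >= image[r][c]:
                    code |= 1 << bit
            hist[bin_of.get(code, other_bin)] += 1
    return hist
```

Applying such a histogram to each of four LBP patterns would give 4 × 59 = 236 values, which agrees with the LBP feature count reported later in this work.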
In this work, the LBP with various chosen weights proposed in Gudigar et al. [5] is considered, and the patterns generated for the adopted weights of W = 1 to 4 are shown in Fig. 4. Other information can be found in [2].

LBP patterns generated using different weights.
The number of mined LBP features is presented in Equation (4);
Saliency (SAL) is associated with the prominent section of an image, and saliency detection helps to identify the vital information present in the test image. Earlier work on saliency is discussed in [11]. In this work, the saliency of the BUI is computed to identify its abnormal section, and the hot colour map obtained during this task is depicted in Fig. 5. This image provides the features presented in Equation (5);

Saliency value of a sample test image.
After collecting the necessary features from the pictures, every feature is integrated to form the necessary feature vector and this feature vector is then considered for the feature reduction with MA. The overall features collected from the image enhancement task are depicted in Equation (6);
After the segmentation process, the next step is feature extraction. The relevant and essential features are extracted from the segmented ultrasound image using information-theoretic measures. To extract the features, both the original breast ultrasound image and the generated segments are employed, so that the original image features together with the segment features assure better classification accuracy. The information-theoretic measures enhance the classification accuracy by offering texture features to the classifier.
Feature optimization is a key task in every MLS to avoid the problem of over-fitting [33]. The traditional Student's t-test assisted feature reduction procedure requires various mathematical operations and is governed by the p-value. To minimize the complexity of the feature reduction/optimization task, heuristic algorithm-based techniques are widely employed in the literature, and this work adopts the MA-based feature reduction discussed by Rajakumar et al. [21].
The MA is a recently invented heuristic technique [7, 36], which finds the optimal solution based on the Cartesian-Distance (CD) among the search agents. The operation of the MA is similar to that of the traditional Firefly Algorithm, and it provides the optimal solution in fewer iterations. The chief merit of the MA is that it combines the merits of Particle Swarm Optimization (PSO), the Firefly Algorithm (FA) and the Genetic Algorithm (GA), and hence it provides a better result.
During the examination stage, all agents are permitted to move close to the best location (G_best). Later, M is permitted to join at G_best by altering its location and speed. This process is shown in Equations (7), (8) and (9);
The velocity is updated by;
The velocity and position update for the F is shown in Equations (9) and (10);
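The update rules can be sketched as follows, using the commonly cited formulation of the Mayfly Algorithm, in which M denotes a male agent and F a female agent. This is a minimal sketch under stated assumptions rather than a reproduction of the numbered equations of this work: the coefficient names and default values (`a1`, `a2`, `beta`, the gravity coefficient `g` and the random-walk coefficient `fl`) are assumptions, and the nuptial dance, mating and mutation steps are omitted.

```python
import math
import random

def euclid(a, b):
    """Cartesian (Euclidean) distance between two position vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def update_male(x, v, p_best, g_best, a1=1.0, a2=1.5, beta=2.0, g=0.8):
    """One velocity and position step for a male agent M."""
    rp = euclid(x, p_best)  # distance to the agent's own best position
    rg = euclid(x, g_best)  # distance to the global best position
    new_v = [g * vj
             + a1 * math.exp(-beta * rp ** 2) * (pj - xj)
             + a2 * math.exp(-beta * rg ** 2) * (gj - xj)
             for xj, vj, pj, gj in zip(x, v, p_best, g_best)]
    new_x = [xj + vj for xj, vj in zip(x, new_v)]
    return new_x, new_v

def update_female(y, v, x_male, attracted, a2=1.5, beta=2.0, g=0.8,
                  fl=0.1, rng=random):
    """One step for a female agent F: move toward the paired male when
    attracted (the male has the better fitness), otherwise random walk."""
    if attracted:
        rmf = euclid(y, x_male)
        new_v = [g * vj + a2 * math.exp(-beta * rmf ** 2) * (xj - yj)
                 for yj, vj, xj in zip(y, v, x_male)]
    else:
        new_v = [g * vj + fl * rng.uniform(-1.0, 1.0) for vj in v]
    new_y = [yj + vj for yj, vj in zip(y, new_v)]
    return new_y, new_v
```

The exponential terms make the attraction decay with the squared Cartesian distance, so an agent that is far from the best position is pulled only weakly and keeps exploring, while a nearby agent converges quickly.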
Figure 6 depicts the stages of the MA and its working. Here, the optimization capability and the feature extraction ability were joined together to perform effective breast cancer detection. The large and small convolutional kernel structures were combined to reduce over-fitting and to increase the nonlinear mapping, such that the multiformity of the features was also increased. The features learned from the ultrasound image were fed to the classifier for implementing the optimization framework.

Flow chart of the MA based feature selection.
The feature selection operation is presented graphically in Fig. 7; this scheme reduces the feature vector to a lower dimension. In this scheme, every feature of one class (Normal/Cancer) is individually compared with the corresponding feature of the other class; the features with the better CD are selected and those with the worst CD are discarded. The selected features are then combined to form a 1D feature vector, known as the optimal features.

Feature reduction with MA by maximizing the CD.
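The selection criterion described above can be illustrated with a deliberately simplified filter that ranks each feature by the distance between its class means and keeps the best-separated ones. This is not the MA search itself: it replaces the heuristic optimization with a direct ranking, and the function name, the use of class means and the `keep` parameter are assumptions made for illustration.

```python
def select_by_class_distance(class_a, class_b, keep):
    """Rank features by |mean_a - mean_b| and return the indices of the
    `keep` best-separated features, in ascending index order.

    class_a, class_b: lists of feature vectors (one list of floats per image).
    """
    n_features = len(class_a[0])

    def mean(samples, j):
        return sum(s[j] for s in samples) / len(samples)

    # One-dimensional Cartesian distance between the two class means.
    scores = [abs(mean(class_a, j) - mean(class_b, j)) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:keep])
```

Because the distance depends on the scale of each feature, the feature values would normally be normalised to a common range before such a ranking is applied.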
By employing machine learning, a program can examine statistics, identify correlations and use the insight to resolve issues and improve predictions. Data mining techniques and machine learning algorithms hold a very important place in the medical domain. Health informatics has evolved and is growing at a fast pace, dealing with computer and information technology in the field of health and clinical information. Clinical data mining, however, suffers from some degree of misdiagnosis and uncertainty.
The initial parameters of these classifiers are tuned as discussed in earlier works, and in a few algorithms the default tuning parameters are left unchanged. The necessary information about these classifiers can be found in [9, 11].
The performance of developed BUI assessment scheme is confirmed by computing the necessary image quality parameters. Based on these values, other measures depicted in Equations (12) to (17) are calculated and these values confirm the performance of proposed scheme.
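The measures derived from the confusion-matrix counts (TP, FN, TN and FP) can be sketched as below. Accuracy, precision, sensitivity and specificity are the measures reported in this work; the negative predictive value and F1-score are included here as an assumption about the remaining measures, and the function name is hypothetical. The sketch assumes that no denominator is zero.

```python
def classification_metrics(tp, fn, tn, fp):
    """Common binary-classification measures, in percent, from the counts of
    true positives, false negatives, true negatives and false positives."""
    measures = {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": tp / (tp + fp),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "npv": tn / (tn + fn),
        "f1_score": 2 * tp / (2 * tp + fp + fn),
    }
    return {name: 100.0 * value for name, value in measures.items()}
```

With the hypothetical counts TP = 40, FN = 10, TN = 45 and FP = 5, the function gives an accuracy of 85%, a sensitivity of 80% and a specificity of 90%.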
This section of the research presents the experimental outcome of the present study. This work is tested using a workstation equipped with MATLAB software and the images present in Table 2 are considered for the assessment.
Initially, every image is treated using the GF with a chosen angle (θ = 30°, 120°, 240° and 300°) and the obtained images are recorded. After generating the GF patterns, the features are extracted using the chosen entropies and Hu moments and are stored as the GF features. In the next phase, the test images are enhanced using the LBP with weights W = 1 to 4, and the necessary LBP features (1 × 1 × 236) are extracted. Finally, the saliency of the images is generated, and the entropies as well as the Hu moments are extracted.
Figure 8 presents the outcome achieved for various test images (Normal, Benign and Malignant class BUI). This image confirms that the texture pattern of the image varies with the enhancement method, and the features extracted from these images provide the necessary insight about the information to be analysed. All the extracted features are serially combined into a single vector of size 1 × 1 × 281; feature reduction is then implemented with the MA to extract the vital features, which provides a reduced 1D feature vector of size 1 × 1 × 94.
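The stated vector sizes are consistent with the following tally, assuming that each of the four GF patterns and the saliency map contributes six entropies and three Hu moments, and that each of the four LBP patterns contributes 59 values. The symbols F_GF, F_LBP and F_SAL are illustrative names and are not notation taken from the equations of this work.

```latex
\begin{align*}
F_{GF}    &= 4 \times (6 + 3) = 36, \\
F_{LBP}   &= 4 \times 59 = 236, \\
F_{SAL}   &= 6 + 3 = 9, \\
F_{total} &= 36 + 236 + 9 = 281 .
\end{align*}
```

The MA then retains 94 of these 281 features, roughly one third of the original vector.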

Enhanced test image with various methods.
The features optimized by the MA are then used to train and validate the binary classifiers with 5-fold cross-validation. The resulting performance values are then compared and validated. In this work, binary classification is implemented to categorize the BUI into Normal Vs Benign, Normal Vs Malignant and Benign Vs Malignant, and the achieved results are presented in Table 3. This table first reports the TP, FN, TN and FP values; from these values, the other measures are derived as in [8].
Performance values achieved during experimental analysis
In this research, the features of 350 images are used to train the classifier system and the features of 150 images are used for validation. For the Normal Vs Benign case, the classification accuracy achieved with SVM-L is better than that of the other methods. Individual comparison of the classifier performance is time consuming; hence, the overall performance of this technique is compared with the help of a Glyph-Plot, as depicted in Fig. 9(a). The pattern which covers the largest area is considered to be the best, and this figure confirms that the overall performance of SVM-L is good (> 95%) compared to the other binary classifiers considered in this research work.
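The 70%/30% division of the 500 images and the 5-fold cross-validation can be sketched as index bookkeeping, as below. How the two steps are combined in this work is not specified, so applying the folds to the training portion is an assumption, as are the function names and the fixed random seed.

```python
import random

def train_validation_split(n_samples, train_fraction=0.7, seed=0):
    """Shuffle sample indices and split them into training and validation lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    cut = round(n_samples * train_fraction)
    return indices[:cut], indices[cut:]

def k_fold_indices(indices, k=5):
    """Yield (train, held_out) index lists for k roughly equal folds."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        held_out = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, held_out

# Usage: 500 images give 350 training and 150 validation samples.
train_idx, val_idx = train_validation_split(500)
for fold_train, fold_held_out in k_fold_indices(train_idx, k=5):
    pass  # fit a classifier on fold_train and score it on fold_held_out
```

For class-imbalanced data, a stratified split that preserves the class proportions in each part would usually be preferred over the plain shuffle shown here.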

Glyph-Plot constructed using the outcome of the proposed scheme.
A similar process is then executed for Normal Vs Malignant and Benign Vs Malignant, and the related results are depicted in Table 3 and Fig. 9(b) and (c), respectively. The outcome of the KNN is better (> 96%) than the alternatives in the first case and the result of the RF is better (> 94%) in the second case. These results confirm the merit of the proposed scheme and show that the present approach can be considered to examine breast abnormality using BUI.
Figure 10 presents the comparison of the accuracy achieved with the proposed scheme for the various classes of images, and this confirms that the accuracy achieved for the Normal Vs Malignant case is better than for the other comparisons. In most of the test images, the features extracted from the normal and benign class pictures are alike, and hence the classification accuracy is lower than for the malignant class. Further, when comparing the malignant class with the benign class, the accuracy is reduced because the GF, LBP and saliency features look similar in most of the images. From Fig. 10, it can be confirmed that the proposed scheme achieves enhanced disease detection for the normal and malignant class images (with the KNN classifier) and slightly lower detection when the benign class images are compared with the normal/malignant class images.

Comparison of achieved accuracy with proposed scheme with various classifiers.
In order to validate the merit of the proposed scheme, its classification accuracy is compared with other BUI examination procedures existing in the literature, as shown in Fig. 11. In the literature, a similar dataset is considered to detect cancer using methods such as the Convolutional Neural Network (CNN), SVM [19] and Artificial Neural Network (Singh et al., 2015), and their accuracy is compared with that of the proposed technique.

Comparison of accuracy in proposed and existing methods.
The CNN-based work implemented ensemble learning and achieved 94.62% accuracy. The work of Nugroho et al. [19] executed texture and geometry-based disease detection, and the SVM classifier provided an accuracy of 91.30%. The research by Singh et al. [15] considered texture and shape information, and an accuracy of 84.60% was obtained with the ANN classifier. Compared to these earlier works, the proposed scheme achieves a better accuracy, as shown in Fig. 11.
The outcome of this research confirms that the proposed scheme works well on the considered BUI database. This scheme considered well-known features to achieve better detection accuracy. In future, this work can be extended by (i) considering other texture and shape features to enhance the outcome, and (ii) combining the machine-features with deep-features to improve the disease detection accuracy. The performance of the proposed scheme can also be tested and confirmed using clinically collected BUI from hospitals.
This research applies a machine-learning technique to detect cancer in BUI with better accuracy. The work considered test images from the benchmark BUI database; every test image of this dataset is resized to 256 × 256 × 1 and then enhanced with texture enhancement techniques. Every image is treated using the GF, LBP and saliency techniques, and from each image the necessary features, such as entropies, Hu moments and LBP, are extracted. To avoid over-fitting, an MA-based feature reduction is employed, which reduces the feature vector from 1 × 1 × 281 to 1 × 1 × 94. The scheme employed binary classification to detect the disease in the Normal Vs Benign, Normal Vs Malignant and Benign Vs Malignant cases and achieved a classification accuracy of > 94%. This research focuses on Mayfly optimization within a machine-learning framework; it can be further extended with other optimization algorithms to achieve effective performance in severity-level breast cancer classification. This work can also be extended to minimize the computational overhead.
Conflict of interest
The authors declare that they have no conflict of interest.
