Abstract
Early diagnosis of breast cancer plays an important role in improving survival rates. Physiological changes in breast tissue can be observed and measured through medical electrical impedance, and the results can serve as a preliminary diagnosis for doctors before treatment. In this paper, a quantum genetic algorithm (QGA) and a support vector machine (SVM) are combined to classify breast tissue and assist clinicians in diagnosis. The algorithm uses QGA to optimize the parameters of the SVM and thereby improve its classification performance. The experiment uses the electrical impedance data measured from breast tissue provided by UCI [58] as the data set. The data set is small and not strongly representative. Nevertheless, the experimental results show that QGA-SVM achieves better classification performance than SVM alone.
Introduction
According to the statistics of the past decade, the incidence of breast cancer (BC) ranks first in the world. For breast cancer patients, early detection, early diagnosis and treatment, and comprehensive adjuvant therapy after surgery are the keys to prolonging survival [1, 2]. Mammography is the most common way to screen for and detect breast cancer, but its sensitivity decreases with increasing breast density, mainly because dense tissue tends to mask lesions. In addition to mammography, magnetic resonance imaging (MRI) can also be performed. Fine needle aspiration biopsy (FNAB) is another common technique for the investigation and diagnosis of breast cancer. Although these methods have advanced the screening of lesions, they can be painful and uncomfortable for patients, and mammography can even damage breast tissue [3–5]. In contrast, interdisciplinary computer-aided diagnosis makes diagnosis more efficient and more acceptable.
In 2000, Silva et al. proposed using electrical impedance spectroscopy (EIS) to classify breast tissue with linear discriminant analysis and designed a three-stage hierarchical approach. The maximum overall classification accuracy was 92%, and the cancer tissue identification rate was greater than 86%. Narumol et al. [6] proposed a breast tissue classification algorithm based on Bootstrap Aggregation. Bootstrap Aggregation samples the data at random with replacement and builds a classifier on each resampled set, so the amount of data stays the same after randomization rather than being reduced. They performed repeated cross-validation on the samples and obtained an accuracy of 74.47%. In 2019, Yoke et al. [7] proposed a new perspective, using EIS to classify wounds in breast tissue. Breast wounds can be observed and distinguished by EIS, and under different conditions wounds have different patterns and levels. The experiment used learning vector quantization (LVQ) to classify breast tissue wounds, with a genetic algorithm (GA) optimizing the LVQ weights to obtain better results. The experiment compared LVQ and GA-LVQ; the maximum classification accuracy of GA-LVQ was 73%, which is better than that of LVQ. Toukir et al. [8] used five ensemble-based machine learning (ML) algorithms, namely Random Forest (RF), Extreme Random Tree (ERT), Decision Tree (DT), Gradient Boosting Tree (GBT), and Adaptive Boosting (ADB), to classify breast tissue. The results show that the three bagging-based ensemble algorithms, namely RF, ERT and DT, have better classification accuracy than the two boosting algorithms, GBT and ADB, with a maximum classification accuracy of 86%. Pranav et al. [9] analyzed the EIS dataset using four ML algorithms: SVM, DT, RF, and modified random forests (MRF). MRF achieved a mean accuracy of 99%, and a test size of 15% worked best.
Most computer-aided diagnosis systems traditionally use manual feature extraction. Because this approach is inefficient and time-consuming, DM Vo et al. [10] proposed extracting the most useful visual features from multi-scale training images for breast cancer classification with an ensemble of trained DCNNs. To maximize classification performance, they combined the DCNNs with a boosting tree classifier. The Bioimaging 2015 breast histology classification challenge dataset and the BreakHis dataset were used to test the effectiveness of the method. The results show that these deep learning models extract better features than handcrafted feature extraction methods, with classification accuracy up to 96%. Deniz et al. [11] used transfer learning and deep feature extraction to let pre-trained CNN models help classify breast cancer. They performed feature extraction on the BreakHis dataset followed by transfer learning, with the features extracted by pretrained network structures used for classification. The pretrained network was a modified AlexNet in which the last three layers were removed and new layers were added. Finally, classification was done using SVM. The results show that transfer learning produces better results than deep feature extraction with SVM, with a classification accuracy of 93.57%.
Most of the above experiments used the Breast Tissue sample set of the UCI database in the United States, which includes 106 breast tissue samples. Because the data set is small and does not meet the data requirements of some algorithms [12–14], some algorithms are not highly sensitive to the characteristics of breast tissue, resulting in low classification accuracy [15–18]. With a data set as small as the one used in this experiment, a simple linear model can outperform a deep network model. In order to analyze the pathological state from the measured data better, faster and more accurately [19–22], this paper proposes QGA-SVM. QGA combines the advantages of quantum computing with GA: it introduces quantum coding and the quantum rotation gate, increases the possibility of genome change, and gives QGA more diverse offspring than traditional GA. Moreover, QGA has a higher convergence speed and stronger search ability, and its optimization effect has been demonstrated in several fields. The radial basis function is used as the kernel function of the SVM, which reduces computation and saves storage space. In this paper, the algorithm is applied to the classification of breast tissue, and its convergence and accuracy are tested on EIS data.
Related work
When breast tissue forms a tumor or develops cancer, it releases a vascular factor that stimulates the tumor to produce numerous nutrient blood vessels, often spreading at the tumor margins or inserting into surrounding tumors [23–25]. As a result, the tumor and its surrounding tissue are rich in new blood vessels, which speeds up blood flow and increases blood supply. Because blood has a specific impedance in the human body, the impedance of the tumor and the surrounding tissue changes significantly [26–28].
Medical electrical impedance technology is a non-invasive detection technology that extracts biomedical information related to human pathological conditions from the electrical characteristics of biological tissues and organs and the way they change. It usually applies a small AC measurement signal through an electrode system placed on the body surface, and calculates the electrical impedance and its changes from the measured signal to obtain relevant physiological and pathological information [29–31].
The basic structural unit of the human body is the cell. A cell is surrounded by a membrane, a semi-permeable membrane with a special structure and function, called a cell membrane [32–34]. It allows the selective passage of certain substances, but strictly maintains the stability of the cellular material composition. It separates the cell contents from the surrounding environment of the cell, and enables the cell to selectively exchange substances with the surrounding environment through the cell membrane to maintain life activities [35–38]. The cell membrane is not only a barrier between the cell and its environment, but also a gateway for the cell to receive influence from the outside world and other cells. The cell membrane is also closely related to the physiological and pathological processes of immune function and cell division, differentiation and canceration [39–41].
Due to the presence of the cell membrane, the liquids inside and outside the cell can be regarded as electrolytes, and the membrane separating them can be regarded as a capacitance. A single cell can be represented by the equivalent circuit model shown in Fig. 1, where R_e is the resistance of the extracellular fluid and C_e is its parallel capacitance; R_m is the resistance of the cell membrane and C_m is its parallel capacitance; R_i is the resistance of the intracellular fluid and C_i is its parallel capacitance. In the range below 1 MHz (low frequency), the cell membrane resistance R_m is very large and can be regarded as an open circuit, while the parallel capacitances C_i and C_e of the inner and outer liquids are small and can also be regarded as open circuits. This yields the simplified equivalent circuit model shown in Fig. 2, which is also called the parallel equivalent circuit model. Biological tissue as a whole can be assumed to consist of many cells, so its circuit model can also be represented by the circuit shown in Fig. 2, with R_i, R_e and C_m now representing the components of the entire tissue. This is the so-called three-element bio-impedance model with internal and external liquid resistance and membrane capacitance [42, 43].

Circuit model.

Simplified model.
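As a rough numerical illustration of the three-element model, the sketch below evaluates the impedance of R_e in parallel with the series combination of R_i and C_m. This topology is the common reading of the three-element bio-impedance model and is an assumption here, since Fig. 2 is not reproduced; the function name and the component values are hypothetical and chosen only for illustration.

```python
import math

def tissue_impedance(freq_hz, r_e, r_i, c_m):
    """Complex impedance of R_e in parallel with (R_i in series with C_m).

    Assumed topology of the three-element model; freq_hz must be > 0.
    """
    omega = 2 * math.pi * freq_hz
    z_intracellular = r_i + 1 / (1j * omega * c_m)  # path through the membrane
    return r_e * z_intracellular / (r_e + z_intracellular)

# Hypothetical component values (ohms, ohms, farads), not taken from the paper.
R_E, R_I, C_M = 1000.0, 500.0, 1e-9

for f_khz in (15.625, 125.0, 1000.0):
    z = tissue_impedance(f_khz * 1e3, R_E, R_I, C_M)
    print(f"{f_khz:8.3f} kHz  |Z| = {abs(z):7.1f} ohm")
```

Under these assumptions, the impedance approaches R_e at low frequency, where the membrane capacitance blocks the intracellular path, and approaches R_e in parallel with R_i at high frequency.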
Medical electrical impedance technology can extract the electrical impedance characteristic information related to the functional changes of tissues and organs at the cell level, so as to identify the physical and pathological events at the cell level, and provide early disease reports or prediction reports before the structural changes of biological tissues [44–46].
QGA is an evolutionary algorithm that uses quantum logic gates for chromosome evolution. It aims to overcome the slow convergence and the tendency to fall into local extrema that improper selection, crossover and mutation cause in the classical genetic algorithm. Based on the quantum state vector representation, the algorithm applies the probability amplitudes of quantum bits to the encoding of chromosomes, so that a single chromosome can represent a superposition of multiple states [47, 48]. QGA thus combines the operating principles of quantum computing with the traditional genetic algorithm. Compared with the traditional genetic algorithm, QGA has good population diversity, strong global search ability and fast convergence [49–51].
Quantum coding and the quantum rotation gate are the most important components of QGA [52, 53]. Quantum coding represents chromosomes as quantum state vectors. It allows one chromosome to express a superposition of multiple states, which increases the diversity of the population and enables the algorithm to find the optimum with a smaller population. The quantum gates drive the updating of the population and enable fast convergence of the algorithm.
QGA uses quantum coding, which rests on two concepts: the qubit and the quantum superposition state. In quantum computing, the qubit is the smallest unit of information storage [54, 55]. Genetic information is represented by the superposition of the |0〉 and |1〉 states of a single qubit. A qubit can be in one of three kinds of state: the |0〉 state, which represents spin up; the |1〉 state, which represents spin down; or any superposition of |0〉 and |1〉. The state of a qubit can be described as follows:
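In the standard QGA formulation, which the symbols α and β used later in this section appear to follow, the state of a single qubit is written as a normalized superposition:

```latex
|\psi\rangle = \alpha\,|0\rangle + \beta\,|1\rangle,
\qquad |\alpha|^{2} + |\beta|^{2} = 1
```

Here |α|² and |β|² are the probabilities of observing the qubit in the |0〉 and |1〉 states, respectively.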
A population of size n, in which each chromosome is described by m qubits, can be expressed as:
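A common way to write this, assumed here from the standard QGA literature rather than taken from the source, represents the j-th chromosome at generation t by the amplitude pairs of its m qubits and collects the n chromosomes into the population Q(t):

```latex
q_j^{t} =
\begin{bmatrix}
\alpha_{1}^{t} & \alpha_{2}^{t} & \cdots & \alpha_{m}^{t} \\
\beta_{1}^{t}  & \beta_{2}^{t}  & \cdots & \beta_{m}^{t}
\end{bmatrix},
\qquad |\alpha_{i}^{t}|^{2} + |\beta_{i}^{t}|^{2} = 1, \quad i = 1, \dots, m

Q(t) = \{\, q_1^{t},\ q_2^{t},\ \dots,\ q_n^{t} \,\}
```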
In the quantum genetic algorithm, qubits are changed through quantum gates. Quantum gates give the algorithm both exploitation and exploration capability and make it converge. A quantum gate is the operation mechanism that carries out evolution. Commonly used quantum gates include the NOT gate, the controlled-NOT gate, the rotation gate and the Hadamard gate [56, 57]. Different quantum gates can be selected for different problems.
The quantum gate is a 2 × 2 order invertible matrix
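For the rotation gate, the form usually given in the QGA literature (assumed here, since the source expression is not reproduced) is a plane rotation applied to the amplitude pair of the i-th qubit:

```latex
U(\theta_i) =
\begin{bmatrix}
\cos\theta_i & -\sin\theta_i \\
\sin\theta_i & \cos\theta_i
\end{bmatrix},
\qquad
\begin{bmatrix} \alpha'_i \\ \beta'_i \end{bmatrix}
= U(\theta_i)
\begin{bmatrix} \alpha_i \\ \beta_i \end{bmatrix}
```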
Here, (α_i, β_i) and (α′_i, β′_i) are the probability amplitudes of the i-th qubit of the chromosome before and after the rotation gate update, and θ_i is the rotation angle, whose magnitude determines the convergence speed.
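The effect of a rotation gate on one qubit can be checked numerically. The sketch below assumes the standard plane-rotation form of the gate and real-valued amplitudes; the function name and the angle are illustrative only.

```python
import math

def rotate_qubit(alpha, beta, theta):
    """Apply the rotation gate [[cos, -sin], [sin, cos]] to (alpha, beta)."""
    c, s = math.cos(theta), math.sin(theta)
    return c * alpha - s * beta, s * alpha + c * beta

# Start from an equal superposition and rotate by a small positive angle.
alpha, beta = 1 / math.sqrt(2), 1 / math.sqrt(2)
new_alpha, new_beta = rotate_qubit(alpha, beta, 0.1)

print(new_alpha ** 2 + new_beta ** 2)  # stays 1 up to rounding
print(new_beta ** 2 > beta ** 2)       # probability of |1> has increased
```

Because the gate is a rotation, the normalization of the amplitudes is preserved, and the sign of the angle decides whether the qubit moves toward |0〉 or |1〉.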
The support vector machine is a supervised learning method for the binary classification of data in the field of machine learning [59, 60]. Its core idea is to transform a linearly inseparable problem into a linearly separable one through a nonlinear mapping and then separate the classes linearly. In practical applications, its recognition accuracy depends to a large extent on the choice of parameters. The parameters (C, σ) determine the performance of the SVM. The parameter C is the tolerance to errors, and the kernel parameter σ affects the complexity of the distribution of the sample data in the feature subspace. If σ is chosen poorly, over-fitting or under-fitting will occur [61]. These two parameters affect the speed of training and prediction and, in turn, the classification performance of the classifier. Therefore, this algorithm uses the quantum genetic algorithm for parameter optimization, which improves the classification performance of the algorithm. The algorithm uses the kernel function to carry out the nonlinear mapping from a low-dimensional space to a high-dimensional space without needing the explicit expression of the mapping. Therefore, this classification method does not increase the amount of calculation and avoids the complex calculations that would otherwise grow with the dimension.
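To make the role of σ concrete, the sketch below evaluates a radial basis function kernel in the common form K(x, y) = exp(−‖x − y‖² / (2σ²)). This parameterization is an assumption, since the source does not state the kernel formula; some libraries express the same kernel through γ = 1/(2σ²). The sample vectors are hypothetical.

```python
import math

def rbf_kernel(x, y, sigma):
    """RBF kernel K(x, y) = exp(-||x - y||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

# Two hypothetical feature vectors.
x, y = (0.0, 0.0), (1.0, 1.0)

for sigma in (0.5, 1.0, 2.0):
    print(sigma, rbf_kernel(x, y, sigma))
```

A small σ makes the kernel value fall off quickly with distance, so the decision boundary can follow individual samples (a risk of over-fitting), while a large σ makes distant samples look similar (a risk of under-fitting).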
The multi-parameter machine learning model selected in this paper is nonlinear SVM, and the optimization problem of the model is:
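In its standard soft-margin form, assumed here because the source expression is not reproduced, the problem for ℓ training samples (x_i, y_i) with labels y_i ∈ {−1, +1}, feature map φ and slack variables ξ_i reads:

```latex
\min_{w,\,b,\,\xi}\ \frac{1}{2}\lVert w \rVert^{2} + C \sum_{i=1}^{\ell} \xi_i
\quad \text{s.t.} \quad
y_i \left( w^{\top} \phi(x_i) + b \right) \ge 1 - \xi_i,
\quad \xi_i \ge 0, \quad i = 1, \dots, \ell
```

The parameter C weighs the total slack against the margin term, which is why it acts as the tolerance to errors.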
Through the Lagrange multiplier method and transformed into a dual problem, the optimization problem is transformed into:
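The standard dual of the soft-margin SVM, given here as an assumption about the omitted expression, is written with Lagrange multipliers λ_i (the symbol λ is used to avoid a clash with the qubit amplitudes α) and an RBF kernel K with parameter σ:

```latex
\max_{\lambda}\ \sum_{i=1}^{\ell} \lambda_i
- \frac{1}{2} \sum_{i=1}^{\ell} \sum_{j=1}^{\ell}
  \lambda_i \lambda_j\, y_i y_j\, K(x_i, x_j)
\quad \text{s.t.} \quad
\sum_{i=1}^{\ell} \lambda_i y_i = 0,
\quad 0 \le \lambda_i \le C

K(x_i, x_j) = \phi(x_i)^{\top} \phi(x_j)
= \exp\!\left( -\frac{\lVert x_i - x_j \rVert^{2}}{2\sigma^{2}} \right)
```

Only the kernel values appear in the dual, so the mapping φ never has to be evaluated explicitly.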
QGA-SVM starts with optimizing parameters to improve the classification performance of SVM, selects QGA to find the optimal parameters, and then uses SVM to classify the data. This combination can effectively prevent falling into local optimum and improve the classification accuracy.
The specific process of the algorithm is as follows:
Input: electrical impedance spectral characteristic data.

Algorithm flow chart of QGA-SVM.
Set the basic parameters of the algorithm, including the number of groups N, the maximum number of iterations T, the parameters C and σ to be optimized, as well as their value intervals [Cmin, Cmax] and [σmin, σmax], and the quantum rotation angle.
Initialize the quantum form of the population, and set the quantum population as
Each individual in the initial population is measured. The binary representation of chromosomes was converted to decimal and set between [Cmin, Cmax] and [σmin, σmax]. The fitness function is
Determine whether the termination conditions are met. If so, the operation is terminated. Otherwise, proceed to the next step.
The individuals in population Q (t) were measured to generate binary solution set, and the fitness of each determined solution was evaluated.
Individuals are adjusted using total interference crossover and quantum rotation gates. First, the list of fitness values after crossover is obtained; then the rotation angle of each qubit is initialized and calculated. According to the rotation angle of each qubit, a new list of quantum angles is generated for the population to obtain the new population Q (t + 1). The corresponding binary solution set is obtained by measuring each individual in Q (t + 1). Then, based on the fitness evaluation of each determined solution, the optimal individual and the corresponding fitness value are recorded.
Increase the number of iterations by 1 and return to step 5.
The algorithm flow chart of QGA-SVM is shown in Fig. 3.
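The loop described above can be sketched in a few lines. The version below is a simplified illustration, not the authors' implementation: it stores each qubit as an angle θ with α = cos θ and β = sin θ, rotates every qubit by a fixed angle toward the best solution found so far, omits the total interference crossover, and stops only at the iteration limit. The fitness function is a hypothetical stand-in; in QGA-SVM it would be the classification accuracy of an SVM trained with the decoded (C, σ). All names and default values are assumptions.

```python
import math
import random

def decode(bits, lo, hi):
    """Map a binary string (list of 0/1) to a real value in [lo, hi]."""
    value = int("".join(str(b) for b in bits), 2)
    return lo + (hi - lo) * value / (2 ** len(bits) - 1)

def measure(angles, rng):
    """Collapse each qubit to a bit; P(bit = 1) = sin(theta)^2 = beta^2."""
    return [1 if rng.random() < math.sin(t) ** 2 else 0 for t in angles]

def qga_search(fitness, c_range, sigma_range, n_bits=10, pop_size=20,
               max_iter=50, delta=0.05 * math.pi, seed=0):
    rng = random.Random(seed)
    eps = 0.02  # keep angles off 0 and pi/2 so no qubit becomes fully fixed
    # theta = pi/4 means alpha = beta = 1/sqrt(2): all states equally likely.
    population = [[math.pi / 4] * (2 * n_bits) for _ in range(pop_size)]
    best_bits, best_fit, history = None, -math.inf, []
    for _ in range(max_iter):
        # Measure every individual and evaluate the decoded (C, sigma).
        for angles in population:
            bits = measure(angles, rng)
            c = decode(bits[:n_bits], *c_range)
            sigma = decode(bits[n_bits:], *sigma_range)
            fit = fitness(c, sigma)
            if fit > best_fit:
                best_bits, best_fit = bits, fit
        # Rotate each qubit by a fixed angle toward the best solution so far.
        for angles in population:
            for k, bit in enumerate(best_bits):
                step = delta if bit == 1 else -delta
                angles[k] = min(max(angles[k] + step, eps), math.pi / 2 - eps)
        history.append(best_fit)
    best_c = decode(best_bits[:n_bits], *c_range)
    best_sigma = decode(best_bits[n_bits:], *sigma_range)
    return best_c, best_sigma, best_fit, history

def toy_fitness(c, sigma):
    # Hypothetical stand-in for SVM accuracy, peaked at C = 10, sigma = 1.
    return -((c - 10.0) ** 2 + (sigma - 1.0) ** 2)

best_c, best_sigma, best_fit, history = qga_search(
    toy_fitness, c_range=(0.1, 100.0), sigma_range=(0.01, 10.0))
print(best_c, best_sigma, best_fit)
```

Because the best solution is tracked across generations, the recorded best fitness never decreases; the clamp on the angles is a design choice that keeps some randomness in every qubit so the search does not freeze early.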
In this experiment, regardless of the number of samples used for modeling and prediction, the test samples are selected randomly in equal proportion from the six categories, so the experimental results are broadly representative. In each run the maximum number of iterations is set to 50, and the average of the 10 best accuracy rates in each case is taken as the final classification result.
Dataset
The experiment used the Breast Tissue data set from the UCI database [58] in the United States. The data set recorded 120 spectra from breast tissue samples of 64 patients undergoing breast surgery, and each spectrum included 12 impedance measurements at different frequencies from 488 Hz to 1 MHz. 14 spectra were discarded due to abnormalities. Impedance measurements of freshly excised breast tissue were made at the following frequencies: 15.625, 31.25, 62.5, 125, 250, 500 and 1000 kHz. Plotted in the (real, -imaginary) plane, these measurements constitute the impedance spectrum from which the breast tissue features are computed. Among the remaining 106 cases, the normal tissue category included 14 connective tissue, 22 adipose tissue and 16 glandular tissue samples, and the pathological tissue category included 21 carcinoma, 15 fibroadenoma and 18 breast disease samples. The data set has a total of 9 attribute features, as shown in Table 1.
Attribute features
In the experiment, the number of training samples was set to 20, 40, 60, 80 and 100, and the remaining samples were taken as the test sample set. The three algorithms were respectively run to compare the effects.
QGA-SVM, PCA-SVM and SVM were used to classify breast cancer tissues, breast diseases and fibroadenomas. The accuracy results obtained from the experiment are shown in Fig. 4, Fig. 5 and Fig. 6 respectively. The performance comparison of the three algorithms in terms of running time, time complexity, bias trade-off, parameterization, etc. is shown in Table 2.

Comparison of breast cancer classification results.

Comparison of classification results of breast diseases.

Comparison of classification results of fibroadenoma.
Performance comparison table of three classification algorithms
The line graphs in Fig. 4, Fig. 5 and Fig. 6 show how the classification accuracy of QGA-SVM, PCA-SVM and SVM changes with the number of training samples. The following conclusions can be drawn from the figures. The classification accuracy of all three algorithms increases significantly as the number of training samples increases, which indicates that more training samples yield a more accurate model. If the number of samples is large enough, the classification algorithm can predict the class with very high accuracy. Overall, QGA-SVM converges faster and classifies more accurately than the other two algorithms, even when the number of training samples is small. Although the figures show that QGA-SVM improves classification accuracy, it does so at the cost of a longer running time. When the number of samples is small, the effect is minor. However, when the number of samples is large enough, QGA-SVM takes noticeably longer than the algorithms with lower classification accuracy. Therefore, the algorithm should be chosen according to the requirements of the application.
Table 2 compares the performance of the three classification algorithms in four aspects. Although QGA-SVM has excellent classification accuracy, its running time and time complexity are not the best among the three algorithms; SVM is outstanding in these two aspects. Bias error arises when a model is biased towards a specific solution or hypothesis, and QGA-SVM does not show significant bias in the classification process. All three algorithms are nonparametric models: a parametric model has a fixed number of parameters, whereas the number of parameters of a nonparametric model grows with the amount of data.
Comparison of classification accuracy of LDA, QGA-SVM and SVM for breast tissue
From Table 3, it can be concluded that the accuracy of QGA-SVM for breast cancer classification can reach 100%, and its accuracy for adipose tissue classification can exceed 99%. Table 3 also shows that, compared with the LDA and SVM algorithms, QGA-SVM improved the overall classification accuracy, but its accuracy for tissues other than breast cancer and adipose tissue was still below 90%. When the number of training samples was 80, the recognition accuracy of QGA-SVM for connective tissue was lower than that of LDA and SVM. Because connective tissue is a normal tissue without any disease, this lower accuracy does not affect the overall accuracy of QGA-SVM in classifying diseased breast tissue. The experimental results show that, for a fixed set of training samples, LDA and SVM achieve good accuracy, but the accuracy of QGA-SVM is always the best.
All 106 groups of breast tissue samples were used as both the training set and the test set.
The effect of distinguishing three kinds of pathological tissues separately
The three normal tissues were distinguished separately. For example, when distinguishing connective tissue, connective tissue is regarded as one class, and adipose tissue and glandular tissue are regarded as the other class. The same division is performed when distinguishing adipose tissue and glandular tissue. The classification effect is shown in Table 5.
The effect of distinguishing three kinds of normal tissues separately
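The one-against-the-rest grouping described above amounts to relabeling the samples before training a binary classifier. A minimal sketch follows; the function name, the +1/−1 label encoding and the example labels are illustrative assumptions.

```python
def one_vs_rest_labels(labels, positive_class):
    """Return +1 for the class of interest and -1 for every other class."""
    return [1 if label == positive_class else -1 for label in labels]

# Hypothetical sample labels for the three normal tissue types.
normal = ["connective", "adipose", "glandular", "adipose", "connective"]

print(one_vs_rest_labels(normal, "connective"))  # [1, -1, -1, -1, 1]
print(one_vs_rest_labels(normal, "adipose"))     # [-1, 1, -1, 1, -1]
```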
As can be seen from the tables, since the training set and test set are the same, the classification accuracy of both pathological tissues and normal tissues has improved to a certain extent and is higher than 90%. Essentially, this kind of tissue identification only serves as a basic check of the discriminative method; what really demonstrates the classification effect is a test set that differs from the training set. To a certain extent, however, it also reflects the strong classification performance of the algorithm.
53 groups of samples were selected as the training set, including 10 cancer tissue groups, 8 fibroadenoma groups, 9 breast disease groups, 7 connective tissue groups, 11 adipose tissue groups, and 8 glandular tissue groups, and the remaining 53 groups of samples were used as the test set. When the training set and test set are different, the tissue can still be classified, and some observable regularities still appear.
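The split described above can be reproduced schematically by drawing the stated number of training samples from each class at random. In the sketch below, the class sizes and training counts come from the text; the sample indexing (samples grouped by class), the function name and the seed are assumptions.

```python
import random

# Class sizes from the Dataset section and training counts from the text above.
CLASS_SIZES = {"carcinoma": 21, "fibroadenoma": 15, "breast disease": 18,
               "connective": 14, "adipose": 22, "glandular": 16}
TRAIN_COUNTS = {"carcinoma": 10, "fibroadenoma": 8, "breast disease": 9,
                "connective": 7, "adipose": 11, "glandular": 8}

def split_by_class(class_sizes, train_counts, seed=0):
    """Randomly pick train_counts[name] samples per class; the rest is test."""
    rng = random.Random(seed)
    train, test, start = [], [], 0
    for name, size in class_sizes.items():
        indices = list(range(start, start + size))  # hypothetical sample ids
        rng.shuffle(indices)
        k = train_counts[name]
        train += [(i, name) for i in indices[:k]]
        test += [(i, name) for i in indices[k:]]
        start += size
    return train, test

train, test = split_by_class(CLASS_SIZES, TRAIN_COUNTS)
print(len(train), len(test))  # 53 53
```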
The three pathological tissues were separately distinguished, and the classification effects were shown in Table 6.
The effect of distinguishing three kinds of pathological tissues separately
Three normal tissues were separately distinguished, and the classification effects were shown in Table 7.
The effect of distinguishing three kinds of normal tissues separately
In the process of differentiation, for example when distinguishing cancer tissue, cancer tissue is regarded as one class, and glandular tissue and adipose tissue are regarded as the other class. The same is done when distinguishing glandular tissue and adipose tissue.
The classification results with all 106 samples as the training set are shown in Table 8.
Classification effects of 106 data sets as training sets
The classification results with 53 samples as the training set are shown in Table 9.
Classification effects of 53 data sets as training sets
Across the classification schemes and training sets described above, whether the pathological and normal tissues are classified separately or the pathological tissues are identified against normal tissues, the classification accuracy drops to varying degrees when only part of the samples is used as the training set. Overall, the classification function used here has high classification performance: it can accurately identify cancer tissue not only from normal tissue but also from other pathological tissue, which makes it useful for analyzing such data. In future work, we will focus on expanding the data set and training the algorithm better to make it more robust.
From the multiple experiments, one run with typical performance was selected, and its confusion matrix is shown in Fig. 7. Although this run is relatively modest, it still classifies cancerous tissue very well compared with other algorithms.

Confusion matrix.
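Per-class and overall accuracy can be read off a confusion matrix as follows. The sketch assumes that rows are true classes and columns are predicted classes; the counts are hypothetical and are not the values of Fig. 7.

```python
def per_class_recall(matrix):
    """matrix[i][j] = number of class-i samples predicted as class j."""
    return [row[i] / sum(row) if sum(row) else 0.0
            for i, row in enumerate(matrix)]

def overall_accuracy(matrix):
    """Fraction of all samples that lie on the diagonal."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / sum(sum(row) for row in matrix)

# Hypothetical 3-class counts, NOT taken from Fig. 7.
example = [[10, 1, 0],
           [2, 5, 1],
           [0, 1, 8]]

print(per_class_recall(example))
print(overall_accuracy(example))
```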
We compared recent studies on the use of EIS in the diagnosis of breast cancer; their advantages and disadvantages are summarized in Table 10. These studies are all devoted to the classification of breast tissue. As research has progressed, the performance reported in related studies has continuously improved, and EIS has been described and organized more comprehensively.
Comparison of related literatures
QGA-SVM combines the advantages of QGA, an efficient and fast global optimization algorithm, with those of SVM. The algorithm was applied to the characteristic data of glandular tissue, connective tissue, adipose tissue, breast disease, fibroadenoma and carcinoma obtained by electrical impedance measurement. The experimental results show that the improved classification algorithm can classify breast tissue with high accuracy, which helps to identify normal breast tissue and malignant tumor tissue effectively. Compared with the traditional classification algorithm, QGA-SVM greatly improves the classification performance, especially for breast cancer and adipose tissue, where the accuracy is more than 99%. At the same time, the increase in calculation time is within an acceptable range. Across the different training sets, the algorithm performs best when the training set and the test set are the same, but the case in which they differ is more representative and realistic. The algorithm still shows excellent classification accuracy when the training set and the test set differ.
