Abstract
The mammographic image is a tool for observing breast cancer. Mammograms are difficult to analyze because abnormalities vary in shape and size, are surrounded by nearby tissue, and are affected by noise. In this paper, we propose a method to classify mammogram abnormalities based on the learning vector quantization inference classifier (LVQIC) with fuzzy co-occurrence matrix (FCOM) textural features. The system is implemented on the Mini-MIAS data set as a 5-class problem, i.e., the classification of architectural distortion (AD), spiculated mass (SPIC), calcification (CALC), well-defined/circumscribed masses (CIRC), and normal (NORM). It is also implemented as 2-class problems consisting of AD-vs-All, SPIC-vs-All, CALC-vs-All, CIRC-vs-All, and normal-vs-abnormal. The best blind test result is from the 5-class problem with FCOM features with 4 clusters, co-occurrence distance d = 2, and 16 prototypes per class. The best classification result is 100% correct classification with false positive rates of 0.03, 0.04, 0.06, and 0.02 for AD, SPIC, CALC, and CIRC, respectively.
Introduction
Breast cancer is one of the most common diseases in women. It is also one of the fatal diseases. However, detecting the cancer at an early stage can ease the treatment process [1]. The Breast Imaging-Reporting and Data System (BI-RADS) [1], a quality assurance tool in mammography, is a standard form for reporting results. There are several factors in BI-RADS including architectural distortion (AD), asymmetry (ASYM), calcification (CALC), well-defined/circumscribed masses (CIRC), other ill-defined masses (MISC), and spiculated mass (SPIC). Several research works address abnormality detection without categorizing the abnormality types [2–13]. Other works address the detection of abnormality types, either detecting one type, i.e., the CALC [14–20], or detecting two to five types [21–26]. Most of the existing works use a traditional approach consisting of feature generation followed by classification. However, there are many feature extraction methods, and it might not be easy to find a good feature set for this problem. Also, it might be easier for humans to understand the system if the system can provide rules for the detection.
Hence, in this paper we develop a mammogram abnormality detection model using the learning vector quantization inference classifier (LVQIC) [27] and the fuzzy co-occurrence matrix (FCOM) [26, 28]. The FCOM is utilized to generate a feature set for the problem. The LVQIC is utilized to select good features and generate a rule set for the detection.
Methods and materials
Fuzzy co-occurrence matrix
In this paper, we use the fuzzy co-occurrence matrix (FCOM) [26, 28] as our feature generation method. The FCOM is a novel set of texture features based on the incorporation of fuzzy set theory into the gray level co-occurrence matrix (GLCM) [29]. First, the fuzzy C-means (FCM) [30] with m = 2 is applied to an original gray scale image I. Then each pixel is assigned to the cluster with the highest membership value in order to create the FCOM. The number of clusters (C) and the number of directions determine the number of FCOM planes. In our experiment, we set the direction (θ) to 0, 45, 90, and 135 degrees. Hence, there are C × 4 FCOM planes. We set the co-occurrence distance d to 1 or 2 in the experiment. The fuzzy co-occurrence matrix algorithm is summarized as follows:
-Set all FCOMs to zero
-For each direction (θ)
  -For each pixel (p)
    -Find pixel (q) that is d apart from p in direction θ
    -Find the assigned clusters of pixels p and q
     (Suppose assigned cluster of p is k and assigned cluster of q is l)
    -For each cluster (i)
      -FCOM(i, k, l) = FCOM(i, k, l) + u_pi + u_qi
       (where u_pi and u_qi are the membership values of pixels p and q in cluster i)
    -End For i
  -End For p
-End For θ
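The accumulation loop above can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the function name, the nested-list data layout, and the (row, column) offsets chosen for the four directions are assumptions, and the third index of the FCOM is taken to be the cluster assigned to pixel q. All planes are zeroed once before accumulation starts.

```python
# Illustrative FCOM accumulation; names and the offset convention are assumptions.

# (row, col) unit steps for 0, 45, 90, and 135 degrees (assumed GLCM-style convention).
DIRECTIONS = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1)}


def fuzzy_cooccurrence(memberships, d=1):
    """memberships[r][c] is a list of C FCM membership values for pixel (r, c).

    Returns fcom[theta][i][k][l]: one C x C matrix per direction and cluster.
    """
    rows, cols = len(memberships), len(memberships[0])
    n_clusters = len(memberships[0][0])
    # Hard assignment: each pixel goes to its highest-membership cluster.
    labels = [[max(range(n_clusters), key=lambda i: u[i]) for u in row]
              for row in memberships]
    # All FCOM planes start at zero before any accumulation.
    fcom = {theta: [[[0.0] * n_clusters for _ in range(n_clusters)]
                    for _ in range(n_clusters)]
            for theta in DIRECTIONS}
    for theta, (dr, dc) in DIRECTIONS.items():
        for r in range(rows):
            for c in range(cols):
                qr, qc = r + d * dr, c + d * dc
                if not (0 <= qr < rows and 0 <= qc < cols):
                    continue  # neighbour q falls outside the image
                k, l = labels[r][c], labels[qr][qc]
                u_p, u_q = memberships[r][c], memberships[qr][qc]
                for i in range(n_clusters):
                    fcom[theta][i][k][l] += u_p[i] + u_q[i]
    return fcom
```

With C clusters this yields C planes for each of the 4 directions, matching the C × 4 planes stated above.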
In the experiment, we implement the FCM with the number of clusters (C) set to 4 and 8. For 4 clusters there are 4×4 = 16 FCOM planes (4 clusters × 4 directions), whereas for 8 clusters there are 8×4 = 32 FCOM planes. We compute 14 features according to Equations (1 to 15). Hence there are 14×16 = 224 and 14×32 = 448 dimensions for 4 clusters and 8 clusters, respectively. The mean and standard deviation over all directions for each of the 14 features are also utilized, which adds 2 values per cluster and feature (16 + 4 + 4 = 24 for 4 clusters and 32 + 8 + 8 = 48 for 8 clusters). Hence there are 14×24 = 336 and 14×48 = 672 dimensions for 4 clusters and 8 clusters, respectively.
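The dimension counts above can be checked with a few lines of Python. The function name is hypothetical, and the sketch assumes that the mean and standard deviation over the four directions each contribute one extra value per cluster and feature, which is the reading that reproduces the stated totals.

```python
N_FEATURES = 14    # texture features computed per FCOM plane
N_DIRECTIONS = 4   # 0, 45, 90, and 135 degrees


def feature_dimensions(n_clusters):
    """Return (dimensions without mean/std, dimensions with mean/std)."""
    planes = n_clusters * N_DIRECTIONS
    # Assumed: one mean and one standard deviation per cluster, taken over directions.
    planes_with_stats = planes + 2 * n_clusters
    return N_FEATURES * planes, N_FEATURES * planes_with_stats


print(feature_dimensions(4))  # (224, 336)
print(feature_dimensions(8))  # (448, 672)
```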
Energy:
Contrast:
Correlation:
Variance:
Homogeneity:
Sum Average:
Sum Variance:
Sum Entropy:
Entropy:
Difference Variance:
Difference Entropy:
Information Measure of Correlation 1:
Information Measure of Correlation 2:
Maximal Correlation Coefficient:
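As an illustration of how such features are computed from one matrix, the sketch below evaluates four of them (energy, contrast, homogeneity, and entropy) using the standard GLCM textural definitions of [29]. It is not the paper's own equation set: the normalization of each plane to unit sum, the natural logarithm in the entropy, and the function names are assumptions made for this example.

```python
import math


def normalize(plane):
    """Scale a square matrix so that its entries sum to 1 (assumed normalization)."""
    total = sum(sum(row) for row in plane)
    return [[v / total for v in row] for row in plane]


def texture_features(p):
    """Energy, contrast, homogeneity, and entropy of a normalized matrix p."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    return {
        "energy": sum(v * v for _, _, v in cells),
        "contrast": sum((i - j) ** 2 * v for i, j, v in cells),
        "homogeneity": sum(v / (1 + (i - j) ** 2) for i, j, v in cells),
        # Zero entries are skipped because 0 * log(0) is taken as 0.
        "entropy": -sum(v * math.log(v) for _, _, v in cells if v > 0),
    }


features = texture_features(normalize([[1.0, 1.0], [1.0, 1.0]]))
```

For the uniform 2×2 example, every entry equals 0.25, so the energy is 0.25, the contrast 0.5, the homogeneity 0.75, and the entropy ln 4.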
Learning vector quantization inference classifier
After generating the feature set described in the previous section, the learning vector quantization inference classifier (LVQIC) [27, 31] is used to select features and classify the abnormalities. The LVQIC is a neuro-fuzzy model based on learning vector quantization (LVQ) [32]. It is capable of classification, feature selection, and rule extraction. Figure 1 shows the structure of the LVQIC.

The structure of LVQIC.
Layer 1: This layer simply contains input features. Each feature is normalized to 0 to 1.
Layer 2: The closest prototype and the closest feature are determined. The closest prototype is calculated from:
for each prototype j, compute
Then we need to find a winning prototype of each feature by,
for each feature i, compute
Layer 3: There are 2 types of prototype closeness and feature closeness, i.e.,
1. Hard method (winner-takes-all)
2. Soft method (winner-takes-most)
Both closeness values are accounted for in the output of this layer as follows:
Then the output vector of this layer will be
Layer 4: The output of this layer is calculated as
The training process is based on the concepts of learning vector quantization and gradient descent. The details of the training process of the LVQIC are given in [27, 31].
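For readers unfamiliar with learning vector quantization, the sketch below shows one update step of the classic LVQ1 rule [32]: the nearest prototype moves toward a sample of its own class and away from a sample of another class. This is only the textbook building block, not the LVQIC training procedure itself; the function name, the squared Euclidean distance, and the learning rate are assumptions made for illustration.

```python
def lvq1_step(prototypes, proto_labels, x, y, lr=0.1):
    """One LVQ1 update in place; returns the index of the winning prototype."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    # The winner is the prototype closest to the sample x.
    winner = min(range(len(prototypes)), key=lambda j: sq_dist(prototypes[j], x))
    # Attract the winner if its label matches y, repel it otherwise.
    sign = 1.0 if proto_labels[winner] == y else -1.0
    prototypes[winner] = [g + sign * lr * (xi - g)
                          for g, xi in zip(prototypes[winner], x)]
    return winner
```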
The output weight values are utilized in the feature selection process as follows:
-Find FR_i for each feature i from the output weights.
-Initialize topc = 0, slf = 0, count = 0, FS = () ## slf = no. of selected features.
-Repeat
  -Pick the ith feature from the remaining features for which FR_i is maximum.
  -Keep the ith feature in FS and set count = count + 1.
  -Set c to the classification rate of the training data set with the features in FS.
  -If c > topc then topc = c, slf = count.
-Until no feature remains.
-Set FS' = {k|k is a feature in the first selected slf features in FS}.
-Set FS = FS'. ## FS contains only the first selected slf features.
-Initialize topc = 0, slw = 0, count = 0, WS = () ## slw = no. of selected weights.
## Consider only weights w_kji connecting to the first selected slf
## features in FS.
-Repeat
  -Pick a remaining weight w_kji connecting to the features in FS.
  -Keep weight w_kji in WS and set count = count + 1.
  -Set c to the classification rate of the training data set with the weights in WS.
  -If c > topc then topc = c, slw = count.
-Until no output weight remains.
-Set WS' = {l|l is a weight in the first selected slw weights in WS}.
-Set WS = WS'. ## WS contains only the first selected slw weights.
-Return slf, slw, FS, WS.
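The first stage of the procedure above (feature selection) can be sketched in Python as a greedy forward pass. The function name, the dictionary of feature ranks, and the `evaluate` callback are hypothetical stand-ins: in the real system the ranks come from the output weights and the callback would run the trained LVQIC on the training set.

```python
def select_features(ranks, evaluate):
    """Greedy selection by descending rank.

    ranks: {feature: FR value}
    evaluate: callable taking a list of features and returning the
              training classification rate obtained with them.
    Returns (selected features, best classification rate).
    """
    order = sorted(ranks, key=ranks.get, reverse=True)  # highest FR first
    best_rate, best_count = 0.0, 0
    chosen = []
    for count, feature in enumerate(order, start=1):
        chosen.append(feature)
        rate = evaluate(chosen)
        if rate > best_rate:  # strict ">" keeps the smallest subset on ties
            best_rate, best_count = rate, count
    return order[:best_count], best_rate
```

The second stage, which selects the output weights connected to the kept features, follows the same pattern with weights in place of features.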
The rules are extracted according to the prototype features and the output weights. There are 2 rule categories, i.e., crisp rules and fuzzy rules. We only consider weights w_kji that are in WS. For input feature i (x_i) and prototype g_ji connected to w_kji, the generated rule can be
Crisp rules: IF (x_i is CLOSE TO g_ji) THEN class k
Fuzzy rules: IF w_kji × (x_i is CLOSE TO g_ji) THEN class k
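To make the two rule templates concrete, the following sketch turns a set of selected weights into rule strings. The function name, the dictionary keyed by (k, j, i), and the two-decimal formatting are assumptions made for illustration only.

```python
def extract_rules(selected_weights, prototypes, class_names, fuzzy=False):
    """Build rule strings from the weights kept in WS.

    selected_weights: {(k, j, i): w_kji} for class k, prototype j, feature i
    prototypes[j][i]: prototype value g_ji
    class_names[k]: readable name of class k
    """
    rules = []
    for (k, j, i), w in sorted(selected_weights.items()):
        condition = f"(x_{i} is CLOSE TO {prototypes[j][i]:.2f})"
        if fuzzy:
            # Fuzzy rules carry the output weight as a multiplier.
            condition = f"{w:.2f} × {condition}"
        rules.append(f"IF {condition} THEN class {class_names[k]}")
    return rules
```

For example, a single kept weight w = 0.75 linking feature 2 of prototype 1 (g = 0.30) to class AD gives the crisp rule "IF (x_2 is CLOSE TO 0.30) THEN class AD" and the fuzzy rule "IF 0.75 × (x_2 is CLOSE TO 0.30) THEN class AD".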
The Mammographic Image Analysis Society data set, called mini-MIAS [33], is utilized in the experiment. There are 322 images of architectural distortion (AD), asymmetry (ASYM), calcification (CALC), well-defined/circumscribed masses (CIRC), other ill-defined masses (MISC), spiculated mass (SPIC), and normal (NORM), each with the size of 1024×1024. Examples are shown in Fig. 2. We, however, only detect AD, SPIC, CALC, and CIRC in the experiment. The signature library is created from images of size 64×64 cropped from the target regions of 8 AD, 9 SPIC, 6 CALC, 8 CIRC, and 42 NORM mammograms. In the end, the signature library contains 42 AD, 42 SPIC, 21 CALC, 21 CIRC, and 42 NORM images. Examples are shown in Fig. 3. The system is trained on the images in the signature library with 7-fold cross validation. The best model is selected and tested on the blind test data set, which consists of the remaining AD, SPIC, CALC, CIRC, and NORM mammograms. There are 220 mammograms in total in the blind test data set.
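The 7-fold split of the signature library can be sketched as below. The class counts are taken from the text; the function name, the fixed seed, and the plain (non-stratified) shuffle are assumptions, since the paper does not state how the folds were drawn.

```python
import random

# Signature library sizes stated in the text.
SIGNATURE_COUNTS = {"AD": 42, "SPIC": 42, "CALC": 21, "CIRC": 21, "NORM": 42}


def k_fold_indices(n_samples, k=7, seed=0):
    """Shuffle the sample indices and deal them into k folds of near-equal size."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[f::k] for f in range(k)]


n_total = sum(SIGNATURE_COUNTS.values())  # 168 signature images
folds = k_fold_indices(n_total)
for held_out in range(len(folds)):
    validation = folds[held_out]
    training = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
    # A model would be trained on `training` and scored on `validation` here.
```

With 168 images and 7 folds, each validation fold holds 24 images, so validation rates move in steps of 1/24 (about 4.17%).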

Original mammograms.

Example of sub-image in the signature library.
As mentioned in the previous section, there are 2 types of feature sets used in the experiment, i.e., features from FCOM without means and standard deviations (called Feat1) and features from FCOM with means and standard deviations (called Feat2). To train the system, the number of prototypes per class is set to 4, 7, 11, and 16. The LVQIC models are trained with 7-fold cross validation using only the data in the signature library. Each model is trained 5 times. Tables 1 and 2 show examples of the validation set results for 5-class classification using Feat1 and Feat2, respectively. The feature set Feat1 with 8 clusters, d = 1, and 11 prototypes per class (Feat1C8D1P11) and the feature set Feat2 with 4 clusters, d = 2, and 16 prototypes per class (Feat2C4D2P16) both yield 79.17% correct classification. The 5-trial average classification rate of Feat1C8D1P11 is 69.17% ±6.97%, whereas that of Feat2C4D2P16 is 64.17% ±9.13%.
Tables 3–10 show examples of the validation set results for 2-class classification using Feat1 and Feat2. For the AD-vs-All case, Feat1 with 8 clusters, d = 2, and 16 prototypes per class (Feat1C8D2P16) and Feat2 with 4 clusters, d = 2, and 11 prototypes per class (Feat2C4D2P11) both provide 91.67% correct classification. Feat1C8D2P16 provides a 5-trial average classification rate of 83.33% ±5.1%, whereas Feat2C4D2P11 provides 85.00% ±4.75%. For the SPIC-vs-All case, the best models, Feat1 with 4 clusters, d = 2, and 4 prototypes per class (Feat1C4D2P4) and Feat2 with 4 clusters, d = 1, and 11 prototypes per class (Feat2C4D1P11), both give 83.33% correct classification, while their 5-trial average classification rates are 76.67% ±3.73% and 78.33% ±4.56%, respectively. For the CALC-vs-All case, Feat1 with 8 clusters, d = 1, and 7 prototypes per class (Feat1C8D1P7) and Feat2 with 8 clusters, d = 2, and 11 prototypes per class (Feat2C8D2P11) give a 95.83% correct classification rate. The 5-trial average classification rates from Feat1C8D1P7 and Feat2C8D2P11 in this case are 90.00% ±3.73%. For the CIRC-vs-All case, the classification rates using Feat1 with 4 clusters, d = 2, and 11 prototypes per class (Feat1C4D2P11) and Feat2 with 4 clusters, d = 2, and 7 prototypes per class (Feat2C4D2P7) are both 100%. The 5-trial average classification rates from Feat1C4D2P11 and Feat2C4D2P7 are 95.00% ±3.48% and 96.66% ±1.86%, respectively.
Example of validation set result of 5-class classification using Feat1
Example of validation set result of 5-class classification using Feat2
Example of validation set result of 2-class (AD-vs-all) using Feat1
Example of validation set result of 2-class (AD-vs-all) using Feat2
Example of validation set result of 2-class (SPIC-vs-all) using Feat1
Example of validation set result of 2-class (SPIC-vs-all) using Feat2
Example of validation set result of 2-class (CALC-vs-all) using Feat1
Example of validation set result of 2-class (CALC-vs-all) using Feat2
Example of validation set result of 2-class (CIRC-vs-all) using Feat1
Example of validation set result of 2-class (CIRC-vs-all) using Feat2
Another 2-class problem arises when we consider AD, SPIC, CALC, and CIRC as one abnormal class. The best validation results of Feat1 and Feat2 are shown in Tables 11 and 12. Feat1 with 4 clusters, d = 1, and 11 prototypes per class (Feat1C4D1P11) and Feat2 with 8 clusters, d = 1, and 16 prototypes per class (Feat2C8D1P16) provide 100% correct classification, whereas the 5-trial average classification rates are 94.17% ±3.72% for Feat1C4D1P11 and 93.34% ±3.73% for Feat2C8D1P16.
Example of validation set result of 2-class (normal-vs-abnormal) using Feat1
Example of validation set result of 2-class (normal-vs-abnormal) using Feat2
After the best model is selected for each case, we test each model on the blind test data set. A 64×64 window is scanned over each mammogram with a step size of 16 pixels, from top to bottom and left to right, to generate features and classify the window content. The result from the system is placed at the center of that window. To show the performance of the system, we create the receiver operating characteristics (ROC) curve. The curve shows the probability of detection against the false positive rate. The false positive rate is the ratio of the number of false regions to the number of regions of the other classes. A group of false positive pixels is counted as one false positive if the pixels form an 8-connected component. The same method is utilized to count the detected regions. Figures 4 and 5 show the ROC curves for the 5-class problem with Feat1C8D1P11 and Feat2C4D2P16, respectively.
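The scanning and region-counting steps described above can be sketched as follows. The function names and the list-of-lists mask are assumptions made for illustration; the window size, the step size, and the 8-connectivity come from the text.

```python
from collections import deque


def window_origins(height, width, size=64, step=16):
    """Top-left corners of all size x size windows that fit inside the image."""
    return [(r, c)
            for r in range(0, height - size + 1, step)
            for c in range(0, width - size + 1, step)]


def count_regions(mask):
    """Count 8-connected regions of truthy cells in a 2-D list."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    regions = 0
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            regions += 1  # a new, not yet visited region starts here
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                cr, cc = queue.popleft()
                # Visit all 8 neighbours (the centre cell is already seen).
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        nr, nc = cr + dr, cc + dc
                        if (0 <= nr < rows and 0 <= nc < cols
                                and mask[nr][nc] and not seen[nr][nc]):
                            seen[nr][nc] = True
                            queue.append((nr, nc))
    return regions
```

On a 1024×1024 mammogram, a 64×64 window with a step of 16 pixels gives 61 positions per axis, i.e., 3721 windows per image. Diagonally touching pixels belong to the same region under 8-connectivity, so they are counted once.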

Blind test result of the 5-class problem with Feat1C8D1P11.

Blind test result of the 5-class problem with Feat2C4D2P16.
Figure 6 shows the blind test ROC curves of AD-vs-All with Feat1C8D2P16, SPIC-vs-All with Feat1C4D2P4, CALC-vs-All with Feat1C8D1P7, and CIRC-vs-All with Feat1C4D2P11, while Fig. 7 shows those of AD-vs-All with Feat2C4D2P11, SPIC-vs-All with Feat2C4D1P11, CALC-vs-All with Feat2C8D2P11, and CIRC-vs-All with Feat2C4D2P7. Figures 8 and 9 show the blind test ROC curves of normal-vs-abnormal with Feat1C4D1P11 and Feat2C8D1P16, respectively.

Blind test result of the 2-class problem with Feat1.

Blind test result of the 2-class problem with Feat2.

Blind test result of the 2-class problem (normal-vs-abnormal) with Feat1C4D1P11.

Blind test result of the 2-class problem (normal-vs-abnormal) with Feat2C8D1P16.
Table 13 shows the best blind test result from each model. We can see that the best result is from 5-class problem with the Feat2C4D2P16 model. The detection result is as high as 100% correct classification. However, there are some false positives shown in the result.
The best blind test result from each model
Examples of successful detection with some false positives from each model are shown in Figs. 10–15. We can also see that there are many false positive regions. One reason might be that the 64×64 window is smaller than many of the abnormality areas. Some areas of AD, SPIC, CIRC, and CALC are bigger than the window; hence, the LVQIC cannot classify those regions correctly.

Example of the mammogram result of the 5-class problem with Feat1C8D1P11.

Example of the mammogram result of the 5-class problem with Feat2C4D2P16.

Example of the mammogram result of the 2-class problem with Feat1.

Example of the mammogram result of the 2-class problem with Feat2.

Example of the mammogram result of the 2-class problem (normal-vs-abnormal) with Feat1C4D1P11.

Example of the mammogram result of the 2-class problem (normal-vs-abnormal) with Feat2C8D1P16.
Table 14 compares our system with the existing state-of-the-art methods. We can see that our method is comparable with the existing ones. Note, however, that there is no pre-processing or ROI selection in the proposed system.
Detection results of the existing methods and our proposed method
Note: FPI stands for false positives per image.
This paper proposes a scheme for detecting breast abnormalities in mammograms using the fuzzy co-occurrence matrix (FCOM) to extract features and the learning vector quantization inference classifier (LVQIC) to classify the abnormalities. We implement the system on a 5-class problem, i.e., architectural distortion (AD), spiculated mass (SPIC), calcification (CALC), well-defined/circumscribed masses (CIRC), and normal (NORM). The system is also implemented on 2-class problems, i.e., AD-vs-All, SPIC-vs-All, CALC-vs-All, CIRC-vs-All, and normal-vs-abnormal. The best blind test result is from the 5-class problem with features from FCOM with means and standard deviations, with 4 clusters, d = 2, and 16 prototypes per class (Feat2C4D2P16). The best classification result is 100% correct classification with false positive rates of 0.03, 0.04, 0.06, and 0.02 for AD, SPIC, CALC, and CIRC, respectively.
