Mammogram content-based image retrieval based on malignancy classification

Abstract

Content-based image retrieval (CBIR) is gaining increasing research attention as a Computer Aided Diagnosis (CAD) approach for breast cancer diagnosis. This work discusses a novel feature modeling technique for CBIR systems based on classifier scores and standard statistics computed on those scores. Established textural and geometric features are first used to represent medical characteristics, and are then used to generate secondary features through classifier scoring with the Support Vector Machine and Quadratic Discriminant Analysis classifiers. The model is validated on a range of benchmarks, and is shown to perform competitively in comparison to similar works.

Keywords

Mammography, microcalcifications, image processing, CBIR, machine learning, computer aided diagnosis

1. Introduction

Breast cancer is the uncontrolled growth and spread of abnormal cells originating in the breast; it is the most commonly diagnosed cancer among women globally with 1.7 million cases reported in 2012 alone, accounting for 25% of new cancer cases [5]. It can be fatal if not attended to, but yields a good prognosis if detected early. Mammography currently provides the best approach for early detection of the disease, with trials estimating its reduction of fatality rates by up to 40%; this is besides enabling a wider array of treatment options, including less-intensive surgery procedures [6].

Content-based image retrieval (CBIR) systems can support radiologists during breast cancer diagnosis by retrieving pathologically similar past mammogram cases with known diagnoses, which improves confidence in the current diagnosis [12, 16]. An essential distinction between CBIR systems and traditional text-based query systems is that the former compare visual features derived from the images, using a corresponding method of similarity measurement, whereas the latter compare textual annotations. Indeed, the efficiency and efficacy of medical CBIR systems, also called Content-based Medical Image Retrieval (CBMIR) systems, rely strongly on the features selected for representing the salient high-level medical characteristics of the image [24]. Additionally, the choice of a corresponding similarity measurement method for the resulting visual features is crucial to reducing the semantic gap, defined as the difference between the high-level interpretation of images by humans and the low-level understanding of the same images by algorithmic models. The efficiency and efficacy of pathology-based CBIR systems in providing accurate results to radiologists is a critical factor in their acceptance in regular medical routines [11]. In this paper, a CBIR model is presented that demonstrates improved retrieval performance for mammograms with microcalcifications as the pathology. Microcalcifications, alongside breast masses, are the most important lesions in the diagnosis of breast cancer. The main contributions of this study are:

• The parallel combination of scores from both the SVM and Quadratic Discriminant Analysis (QDA) classifiers for feature characterization;
• Further extraction of the ultimate feature set from the scores, rather than the direct application of the scores to the retrieval engine.

2. Literature review

Within the last two decades, research activity has focused on various CBIR models for breast cancer diagnosis support based on diverse high-level features such as calcifications, the breast parenchyma, masses, etc.; these are extensively discussed in the survey study by Zheng [37]. For instance, a CBIR model is presented for the retrieval of mammograms based on breast density [12]. It extracts Singular Value Decomposition (SVD) and histogram features, which are used to train a Support Vector Machine (SVM) model. The model is measured on the sole benchmark of average precision, attaining therewith the best score of 82.14% using the polynomial kernel. The authors note that the model can be improved by considering other crucial information such as features related to lesions as well as appropriate weighting of features. The importance of extracting features related to lesions (e.g. masses and calcifications) for CBIR systems is also acknowledged by Kinoshita et al. [20]. For similarity measurement, a significant number of studies [34, 17, 19] use the Euclidean distance metric.

The extraction of lesion information as features for CBIR algorithms has received significant attention in the literature [37]. Notably, the pivotal concept of user similarity perception modeling with regards to lesions and more specifically, microcalcifications, is presented by El-Naqa et al. [13] as further work on a model presented in their earlier seminal work. In their study, the authors encode perceptual similarity of mammograms by radiologists using the neural network (NN) and SVM classifiers, based on nine microcalcification cluster (MCC) shape features extracted from regions of interest (ROIs). The authors posit that classifiers capture similarity as perceived by human observers more accurately than simple distance metrics. The ROIs forming the image dataset were sourced from a public database and scored by radiological experts specifically for that study. Experimental results reported a significant improvement in the matching fraction (76.7%) of their learned model against the Euclidean distance metric, even surpassing that of the human observers (66.7%). In a largely similar experimental setup, they expand on the results of their study by incorporating individual microcalcification features, with the objective of comparing supervised learning (modeled using the SVM classifier with a Gaussian kernel) against unsupervised learning (using Discriminant Adaptive Nearest Neighbor (DANN)) [35]. The results reported a superior matching fraction score for the supervised technique at 72.5% against approximately 64.5% for DANN.

Having demonstrated the viability of classifiers in encoding domain-specific information, as briefly discussed in the preceding paragraphs, researchers have also looked at extending or modifying the structure of classifiers in order to customize them to specific problems. This can be seen in the study by Nishikawa et al. [34, 17], where a so-called case-adaptive approach is employed to improve the retrieval performance of their computer-aided diagnosis (CADx) system. Their approach involved retrieving similar mammogram cases for a particular query as a preliminary step using a regular classifier, and then using the retrieved cases to further modify the decision boundary of the classifier. Effectively, the classifier is trained with the new set of retrieved cases in conjunction with the original training set. The computational cost associated with this approach was deemed an issue, which Jing et al. [19] addressed by replacing the decision function of the first classifier (called the baseline classifier) with a regularization prior. Apart from achieving a high score according to the Area under the Curve (AUC), the regularized classifier approach [19] resulted in a tenfold reduction in computational complexity.

More recent work on adaptation of the SVM decision function was presented by Tsochatzidis et al. [33], where three SVMs are trained using 90 image ROIs from the DDSM database with the task to distinguish breast masses based on three BI-RADS categories. For any given image sample, the authors use the value of the SVMs’ decision function rather than its sign as input to a function that calculates what they call the participation value. The three participation values constitute the members of a three-dimensional feature vector that is used for similarity calculations by the Euclidean metric. Their model outscored a state-of-the-art conventional Euclidean-based similarity measurement model by 5.7% based on the mean average precision metric. In a subsequent study, their scheme was adapted to microcalcifications covering four BI-RADS categories. Seven shape features and three textural features were extracted to characterize the lesions, with the latter calculated over Contourlet subbands. Eighty-seven ROIs extracted from the DDSM database were used for model training and performance benchmarking with the model scoring 60% compared to 52% by the unsupervised CBIR (Euclidean-based model) based on mean average precision.

Efficient feature characterization is critical to CBIR CAD-based systems [11, 9] and is still an active research area. Much work is still needed on the characterization of features in order to improve the accuracy of CBIR systems [11]. Similarity modeling using classifiers has demonstrated its viability over simple distance measures as has been discussed in the preceding paragraphs. However, to the best of our knowledge, none of the previous work has considered using statistical descriptors based on the classifiers’ decision functions. This work aims to further explore this idea by deriving statistical features from classifier scores as a means of improving the accuracy of CBIR-based CAD systems in the domain of breast cancer diagnosis.

The remainder of this paper is structured as follows: the proposed methodology is presented in the next section followed by a presentation and discussion of the proposed model’s performance and thereafter the conclusion.

3. Materials and methods

The proposed methodology is illustrated in Fig. 1. The model takes as input a binary image containing probable calcification objects as the foreground and its original gray level version. The term “calcification” or “calcification object”, especially in the early sections of this paper, is loosely used to refer to the foreground pixel regions in the binary region of interest and does not necessarily mean that the object has been established as a true calcification. The subsequent processes are discussed in the following subsections.

Figure 1.

Functional diagram of the proposed method.

3.1 Region of Interest (ROI) determination

The ROI encompasses the region over which features will be extracted further down the pipeline. This region is drawn around clusters and individual calcification objects. The inputs to the model comprise a segmented binary image depicting calcification objects and its original grey level version, as shown in Fig. 2. White pixels in the binary image are assumed to represent calcification objects, with the black pixels forming the background. A cluster is established where there are three or more calcification objects within an area of 1 cm$^{2}$ [22]. If no cluster is found, the image is marked as not having calcification clusters and the calcification objects are considered individually. The detection and segmentation of calcifications was done as a prior activity and is beyond the scope of this text.

For feature extraction, a bounding box is drawn around individual calcification objects and their containing cluster, where present, on the binary image; the bounding box is extended with a padding of 5 pixels from the bordering calcification’s boundary pixels. The ROI position and dimensions established from the binary image are superimposed on the gray level image, such that both ROIs have the same coordinates on both versions of the image – these two ROIs will represent the original image in the subsequent steps (see Fig. 2). The binary image is simply the segmented version of the grey level image (i.e. containing the detected calcifications).
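The ROI construction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation used in this work: the function names `padded_bounding_box` and `crop` are hypothetical, the images are assumed to be lists of pixel rows, and clipping the padded box at the image border is an assumption that the text does not specify.

```python
def padded_bounding_box(mask, pad=5):
    """Return (row_min, col_min, row_max, col_max), inclusive, around all
    foreground (non-zero) pixels of a binary mask, padded by `pad` pixels."""
    rows = [r for r, line in enumerate(mask) if any(line)]
    cols = [c for line in mask for c, v in enumerate(line) if v]
    if not rows:
        return None  # no foreground pixels, hence no ROI
    height, width = len(mask), len(mask[0])
    # Assumption: the padded box is clipped to the image borders.
    return (max(min(rows) - pad, 0), max(min(cols) - pad, 0),
            min(max(rows) + pad, height - 1), min(max(cols) + pad, width - 1))


def crop(image, box):
    """Cut the same ROI out of any image that shares the mask's coordinates."""
    r0, c0, r1, c1 = box
    return [row[c0:c1 + 1] for row in image[r0:r1 + 1]]
```

Calling `crop` with the same box on the binary image and on the grey level image yields the two co-registered ROIs that represent the original image in the subsequent steps.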

Table 1
Individual feature vector $\vec{v}_{i}$ , with dimension $|\vec{v}_{i}|=125$

#	Name	Image type
1	Area	Binary
2	Compactness	Binary
3	Orientation	Binary
4	Eccentricity	Binary
5	Solidity	Binary
6–25	Contrast	Grey level
26–45	Correlation	Grey level
46–65	Energy	Grey level
66–85	Homogeneity	Grey level
86–105	Entropy	Grey level
106–125	Maximum probability	Grey level

Figure 2.

The first image is the original, second is the segmented and the third is the superposition of the first image on the second to highlight the microcalcifications. The rectangle depicts the established region of interest for feature extraction.

3.2 Feature extraction and preprocessing

At this stage, features are extracted to represent individual calcification objects and cluster objects. Information on these two object types is critical to malignancy determination as explained in the literature review section. The individual calcification feature vector is denoted as $\vec{v}_{i}$ , while the cluster feature set is denoted as $\vec{v}_{c}$ .

3.2.1 Feature extraction

The choice of features is guided by the need to bridge the semantic gap in the pathological description of microcalcifications between CBIR algorithms and radiologists. In this regard, two feature types that correlate highly with radiologists’ descriptions are extracted as the first set of features. These are Haralick features, which are extracted from the gray level ROI, and geometric features, which are extracted from the binary image ROI [13, 34]. Table 1 shows all the feature vector components used to characterize individual microcalcifications, while Table 2 lists the features used for microcalcification clusters. Both feature types have been widely used in calcification characterization, and their performance in shape and textural encoding applications is well documented [22].

Table 2
Cluster feature vector $\vec{v}_{c}$ , with dimension $|\vec{v}_{c}|=141$ . The meanings of the abbreviations are as follows: CC – Cluster calcifications, CVH – Cluster convex hull, CR – Cluster region, B – Binary image, GL – Grey level image, $\mu$ – mean, $\sigma$ – Standard deviation

#	Primary feature	Secondary feature (statistic)	ROI object	Image type
1–2	Area	$\mu$ , $\sigma$	CC	B
3–4	Compactness	$\mu$ , $\sigma$	CC	B
5–6	Orientation	$\mu$ , $\sigma$	CC	B
7–8	Eccentricity	$\mu$ , $\sigma$	CC	B
9–10	Solidity	$\mu$ , $\sigma$	CC	B
11	Area		CVH	B
12	Compactness		CVH	B
13	Orientation		CVH	B
14	Eccentricity		CVH	B
15	Solidity		CVH	B
16	Density		CR	B
17–18	Inter-calcification distance	$\mu$ , $\sigma$	CR	B
19–20	calcification $\rightarrow$ cluster centroid distance	$\mu$ , $\sigma$	CR	B
21	Number of calcifications		CR	B
22–41	Contrast		CR	GL
42–61	Correlation		CR	GL
62–81	Energy		CR	GL
82–101	Homogeneity		CR	GL
102–121	Entropy		CR	GL
122–141	Maximum probability		CR	GL

Haralick features are extracted from the grey level co-occurrence matrix (GLCM); they are used for modeling textural characteristics, which are distinctly defined in calcification-present areas of mammograms. The GLCM, $P(i,j|d,\theta)$ , encodes the spatial dependencies of tonal intensities for a given distance $d$ and orientation $\theta$ , providing a basis for the extraction of second-order statistical features. The element $P(i,j|\delta x,\delta y)$ is the relative frequency of the co-occurrence of pixels having intensities $i$ and $j$ , separated by pixel distances $\delta x=d\theta_{0}$ and $\delta y=d\theta_{1}$ along the $x$ and $y$ dimensions respectively. Assuming an image $I$ with spatial dimensions $M\times N$ and $L$ grey levels, the GLCM is defined as follows [26],

$\displaystyle P_{i,j,\theta}=\sum_{x,y}P\{I(x,y)=i\text{ and }I(x+d\theta_{0},y+d\theta_{1})=j\}$ (1)

where $0\leqslant x\leqslant M-1$ , $0\leqslant y\leqslant N-1$ and $0\leqslant i,j\leqslant L-1$ . The orientation $\theta$ is quantized to four values, which are represented as shown in Eq. (2).

$\displaystyle\theta=\begin{cases}0^{\circ},&\text{if }\theta_{0}=0\text{ and }\theta_{1}=1;\\ 45^{\circ},&\text{if }\theta_{0}=-1\text{ and }\theta_{1}=-1;\\ 90^{\circ},&\text{if }\theta_{0}=1\text{ and }\theta_{1}=0;\\ 135^{\circ},&\text{if }\theta_{0}=1\text{ and }\theta_{1}=-1;\\ \end{cases}$ (2)

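This work uses all four orientations shown in Eq. (2) and five distances, $d\in\{1,3,5,7,9\}$ . Each Haralick feature is therefore computed for $4\times 5=20$ (orientation, distance) pairs, which accounts for the 20 entries per feature in Tables 1 and 2. Haralick features are calculated from the GLCM as shown in Eqs (3)–(8).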

$\displaystyle\text{Maximum probability}=\max(p_{i,j})$ (3)

$\displaystyle\text{Energy}=\sum_{i,j}^{N}p_{i,j}^{2}$ (4)

$\displaystyle\text{Homogeneity}=\sum_{i,j}^{N}\frac{p_{i,j}}{1+|i-j|}$ (5)

$\displaystyle\text{Contrast}=\sum_{i,j}^{N}p_{i,j}|i-j|^{2}$ (6)

$\displaystyle\text{Correlation}=\sum_{i,j}^{N}\frac{p_{i,j}(i-\bar{u}_{i})(j-\bar{u}_{j})}{\sigma_{i}\sigma_{j}}$ (7)

$\displaystyle\text{Entropy}=\sum_{i,j}^{N}p_{i,j}(-\ln p_{i,j})$ (8)
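The construction in Eqs (1)–(8) can be sketched in plain Python as follows. This is a minimal illustration rather than the implementation used in this work. The function names (`glcm`, `haralick`, `texture_vector`) are hypothetical, and the sketch assumes that the image is a list of rows of integer grey levels already quantized to `levels` values, that pixel pairs falling outside the image are skipped, that $\bar{u}$ and $\sigma$ in Eq. (7) are the means and standard deviations of the GLCM marginals, and that the correlation is set to zero when either standard deviation vanishes.

```python
import math

# Offsets (theta_0, theta_1) copied from Eq. (2); the paired pixel is
# I(x + d*theta_0, y + d*theta_1) as in Eq. (1).
OFFSETS = {0: (0, 1), 45: (-1, -1), 90: (1, 0), 135: (1, -1)}


def glcm(image, levels, d, angle):
    """Normalised co-occurrence matrix p[i][j] for one distance and orientation."""
    t0, t1 = OFFSETS[angle]
    counts = [[0] * levels for _ in range(levels)]
    total = 0
    for x, row in enumerate(image):
        for y, i in enumerate(row):
            x2, y2 = x + d * t0, y + d * t1
            if 0 <= x2 < len(image) and 0 <= y2 < len(row):
                counts[i][image[x2][y2]] += 1
                total += 1
    # Assumption: an offset with no valid pixel pair yields an all-zero matrix.
    return [[c / total for c in r] for r in counts] if total else counts


def haralick(p):
    """The six features of Eqs (3)-(8) computed from a normalised GLCM."""
    cells = [(i, j, v) for i, r in enumerate(p) for j, v in enumerate(r)]
    mu_i = sum(i * v for i, _, v in cells)
    mu_j = sum(j * v for _, j, v in cells)
    sd_i = math.sqrt(sum((i - mu_i) ** 2 * v for i, _, v in cells))
    sd_j = math.sqrt(sum((j - mu_j) ** 2 * v for _, j, v in cells))
    corr = (sum(v * (i - mu_i) * (j - mu_j) for i, j, v in cells) / (sd_i * sd_j)
            if sd_i > 0 and sd_j > 0 else 0.0)
    return {
        "max_probability": max(v for _, _, v in cells),
        "energy": sum(v * v for _, _, v in cells),
        "homogeneity": sum(v / (1 + abs(i - j)) for i, j, v in cells),
        "contrast": sum(v * (i - j) ** 2 for i, j, v in cells),
        "correlation": corr,
        "entropy": -sum(v * math.log(v) for _, _, v in cells if v > 0),
    }


def texture_vector(image, levels, distances=(1, 3, 5, 7, 9)):
    """One feature dictionary per (distance, orientation) pair: 5 x 4 = 20."""
    return [haralick(glcm(image, levels, d, angle))
            for d in distances for angle in OFFSETS]
```

Flattening the 20 dictionaries feature by feature gives the six blocks of 20 textural components per ROI.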

Geometric features, on the other hand, describe the shape characteristics of clusters or individual calcification objects, which is useful in distinguishing the various pathologies of calcifications. The five shape features extracted in this work directly relate to the descriptions used by radiologists to characterize the various calcification properties [22, 27, 9]:

• Area – the total number of foreground pixels;
• Compactness – a ratio involving a factor of the object’s perimeter and its area. It gives a measure of the roundness of the object;
• Orientation – the angle between the x-axis and the major axis of the ellipse fitted to the object;
• Eccentricity – the ratio of the distance between the foci of the ellipse and its major axis length. Larger values indicate a more elongated, line-like object;
• Solidity – the ruggedness of the object, measured as the ratio between its actual area and that of its convex hull.
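Three of the descriptors listed above (area, orientation and eccentricity) can be sketched from image moments alone, as shown below. The sketch assumes that "the ellipse of the object" means the ellipse with the same second central moments as the foreground region, which is a common convention but is not stated in the text. The function name `shape_descriptors` is hypothetical, and compactness and solidity are omitted because they additionally require a perimeter estimate and a convex hull.

```python
import math


def shape_descriptors(mask):
    """Area, orientation (radians) and eccentricity of the foreground pixels,
    using the ellipse that has the same second central moments."""
    pts = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    area = len(pts)
    if area == 0:
        return None
    r_mean = sum(r for r, _ in pts) / area
    c_mean = sum(c for _, c in pts) / area
    # Second central moments; x runs along columns and y along rows.
    mxx = sum((c - c_mean) ** 2 for _, c in pts) / area
    myy = sum((r - r_mean) ** 2 for r, _ in pts) / area
    mxy = sum((c - c_mean) * (r - r_mean) for r, c in pts) / area
    # The eigenvalues of the moment matrix are proportional to the squared
    # semi-axis lengths of the equivalent ellipse.
    common = math.sqrt(((mxx - myy) / 2) ** 2 + mxy ** 2)
    lam_major = (mxx + myy) / 2 + common
    lam_minor = (mxx + myy) / 2 - common
    ecc = math.sqrt(max(0.0, 1 - lam_minor / lam_major)) if lam_major > 0 else 0.0
    # Angle of the major axis relative to the x-axis, in image coordinates
    # (the row axis points downwards).
    orientation = 0.5 * math.atan2(2 * mxy, mxx - myy)
    return {"area": area, "orientation": orientation, "eccentricity": ecc}
```

A one-pixel-wide horizontal segment gives an eccentricity of 1, while a square block gives 0, in line with the reading that larger values indicate more line-like objects.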

Cluster region (CR) in Table 2 refers to the grey level ROI identified in the previous section, while the convex hull is drawn around the border calcification objects of the cluster. In the non-clustered ROI, Haralick and geometric features are extracted for each individual calcification object.

3.2.2 Feature preprocessing

At this stage, normalization is applied on both feature sets ( $\vec{v}_{i}$ and $\vec{v}_{c}$ ) followed by Principal Component Analysis on the cluster feature vector ( $\vec{v}_{c}$ ). The $\mathcal{Z}$ -score normalization is applied to reduce the undue influence of features having large ranges; it is done using the following equation [2],

$\displaystyle\tilde{x}=\frac{x-\mu}{\sigma}$ (9)

where $\tilde{x}$ is the standardized vector, $x$ is the original vector, $\mu$ and $\sigma$ are the sample mean and standard deviation respectively. This process effectively transforms both feature vectors to have zero mean and unit standard deviation.
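Eq. (9) can be illustrated with the short Python sketch below. The function names are hypothetical, the sample (n-1) standard deviation is assumed, and mapping constant features to zero is an assumption made only to avoid division by zero. The statistics are fitted on one set of rows and then applied to any other, so that held-out samples can be standardized with the training values of $\mu$ and $\sigma$ .

```python
import statistics


def fit_standardizer(train_rows):
    """Per-feature sample mean and standard deviation from the training rows."""
    columns = list(zip(*train_rows))
    means = [statistics.fmean(col) for col in columns]
    stds = [statistics.stdev(col) for col in columns]
    return means, stds


def standardize(rows, means, stds):
    """Apply Eq. (9) feature by feature; constant features are mapped to 0."""
    return [[(x - m) / s if s > 0 else 0.0 for x, m, s in zip(row, means, stds)]
            for row in rows]
```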

Given the few clusters that were obtained after feature extraction, the cluster feature vector $\vec{v}_{c}$ is projected into a lower-dimensional space using Principal Component Analysis (PCA). PCA seeks linear combinations $Y=\sum_{i=1}^{n}\lambda_{i}x^{(i)}$ of the predictor columns $x^{(i)}$ of a matrix $X$ such that the dimensions of $Y$ are mutually uncorrelated [10]. The resultant matrix $Y$ is usually ordered with the most significant dimensions first, with this significance defined in terms of variance. By taking the first $m$ significant dimensions of $Y$ , most of the important information in $X$ can be retained with the benefit of reduced data.

The reduction of the dimension of $\vec{v}_{c}$ was necessary because the Quadratic Discriminant Analysis (QDA) classifier used later on requires a minimum number of training samples for a given feature dimension size for effective parametrization during covariance estimation [30]. PCA was not applied to the individual vector $\vec{v}_{i}$ given that there were enough samples for QDA training. In this work, $\vec{v}_{i}$ and $\vec{v}_{c}$ are used as intermediate features, and fed as inputs to the classifiers in Section 3.4.

3.3 Feature selection

The “curse of dimensionality” is a classic issue in CBIR systems, where the performance of such systems expectedly degrades with an increase in the number of features. In fact, some authors contend that extraneous features act as noise, worsening the query results of CBIR systems [23, 11, 29]. Ladha and Deepa categorize features into three groups [23]:

• Relevant features have an influence on the output. Their role cannot be overlooked;
• Irrelevant features have no influence on the output and can be left out without incurring any performance penalty;
• Redundant features can be substituted by other features.

Feature selection plays a three-fold role: (1) reducing the cost of feature extraction, (2) improving classification accuracy and (3) improving the reliability of the performance estimate [21, 4]. Feature selection methods are characterized by a search strategy used to explore the hypothesis space, a mechanism for proposing feature candidates for the current hypothesis and a measure for evaluating the selected candidate features at any given point. They can be categorized under three classes: the filter model, the wrapper model and the hybrid model. Filter approaches rely on the general characteristics of the data to evaluate features based on some discriminating criteria, while wrappers use classifiers and a subset selection approach to measure a feature subset’s prediction performance [23]. The hybrid model is a combination of the two. These three categories have formed the focus of research in a wide range of applications and datasets [23, 21, 4, 1, 32, 18, 29]; a comprehensive discussion of feature selection methods for various biomedical applications can be found in [23, 18], with [4, 32] focusing on microcalcification detection applications.

Figure 3.
Individual feature set Independence significance test results.

The optimal feature set was selected from the vectors $\vec{v}_{i}$ and $\vec{v}_{c}$ using both filter and wrapper approaches; this combined approach has been successfully applied to the selection of optimal feature subsets for microcalcification characterization [4, 29]. For notational convenience, let us take $U_{x}$ as the universal set containing all features from class $x$ , and $F(U_{x},m)$ as a function that returns $m$ features from the set $U_{x}$ . The following steps outline the proposed approach for the feature selection process.

1. Convert cluster features to PCA eigen data $U_{cPCA}$ ;
2. Normalize both the individual feature set $U_{i}$ and the cluster feature set $U_{cPCA}$ by $\mathcal{Z}$ -score standardization;
3. Preselect, and rank in decreasing order of significance, $k$ features $F_{i}=F(U_{i},k)$ from the individual feature set and $l$ features $F_{c}=F(U_{cPCA},l)$ from the cluster set, keeping those whose independent features significance test result (see Eq. (10)) satisfies $\textit{Sig}(\vec{v}(i))\geqslant 2.0$ ;
4. Using the individual and cluster subsets ( $F_{i}$ and $F_{c}$ ) preselected in the previous step, select the optimal feature subsets for the individual and cluster ( $S_{i}$ and $S_{c}$ respectively) based on the prediction performance of the Quadratic Discriminant Analysis classifier, using the Forward Selection Feature Search strategy.

Figure 4.
Cluster feature set Independence significance test results.

Figure 5.
Cluster feature set Independence significance test results.

The preselection step employs a filter approach according to Weiss and Indurkhya [36], which they named the independent features significance test. It involves conducting a hypothesis test on each feature to measure its information value with regards to the separability of the classes. The essence of this step is to remove obviously uninteresting (irrelevant) features with little informative value; this significantly reduces the computational burden for the next step, which is computationally intensive. While this step overlooks dependencies (redundancies) among selected features, it is fast and useful as a preprocessing technique in feature selection applications [21]. Using $i$ to index a particular feature in the feature set $\vec{v}$ , the significance of the resultant feature $\vec{v}(i)$ is calculated as follows [36].

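$\displaystyle\textit{Sig}(\vec{v}(i))=\frac{\mu_{A}-\mu_{B}}{\sqrt{\frac{\sigma_{A}}{\sum_{A}}+\frac{\sigma_{B}}{\sum_{B}}}}$ (10)

where $\mu_{A}$ and $\mu_{B}$ are the means of the feature in classes $A$ and $B$ ; following [36], $\sigma_{A}$ and $\sigma_{B}$ are read here as the corresponding class variances, and $\sum_{A}$ and $\sum_{B}$ as the number of samples in each class.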

In accordance with the authors’ recommendation, all features having significance values less than 2.0 were removed from the final feature subset. Based on the results of the test, 86 individual features were selected from the original set $\vec{v}_{i}$ for scoring above 2.0 (see Fig. 3). As mentioned in the previous subsection, $\vec{v}_{c}$ is transformed using PCA before significance testing is done (Eq. (10)). The significance test is thus performed on the PCA coefficients. As seen in Fig. 5, 18 PCA coefficients score above the test cutoff mark of $2.0$ .
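The preselection filter can be sketched as follows. This is an illustration under explicit assumptions: in Eq. (10), $\sigma$ is read as the class variance and $\sum$ as the class sample count, the absolute value of the mean difference is taken so that the 2.0 cutoff applies regardless of which class has the larger mean, and the class labels are coded as 1 and 0. The function names `significance` and `preselect` are hypothetical.

```python
import math
import statistics


def significance(values_a, values_b):
    """Eq. (10) for one feature, given its values in class A and in class B."""
    mean_gap = abs(statistics.fmean(values_a) - statistics.fmean(values_b))
    var_term = (statistics.variance(values_a) / len(values_a)
                + statistics.variance(values_b) / len(values_b))
    if var_term == 0:
        # Assumption: zero spread means perfectly separable or uninformative.
        return math.inf if mean_gap > 0 else 0.0
    return mean_gap / math.sqrt(var_term)


def preselect(rows, labels, threshold=2.0):
    """Indices of features scoring at least `threshold`, most significant first."""
    a = [r for r, y in zip(rows, labels) if y == 1]
    b = [r for r, y in zip(rows, labels) if y == 0]
    scored = [(significance([r[k] for r in a], [r[k] for r in b]), k)
              for k in range(len(rows[0]))]
    return [k for s, k in sorted(scored, reverse=True) if s >= threshold]
```

Because each feature is scored on its own, two strongly correlated features can both pass this filter; removing such redundancy is left to the wrapper step.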

The second selection step applies a wrapper approach to remove redundant features using the Quadratic Discriminant Analysis (QDA) classifier and the Forward Selection Feature Search (FSFS) method. The QDA classifier is employed at this stage because of its relatively low time cost compared with the Support Vector Machine (SVM) classifier. The FSFS search strategy incrementally adds features to an initially empty set until further additions no longer reduce the error rate [23]. The selection results following this process are shown in Figs 5 and 6.
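The greedy search can be sketched independently of the classifier, as shown below. The callback `error_of` is a hypothetical stand-in for the error rate of a QDA classifier trained on a given feature subset, and the stopping rule (stop as soon as no candidate lowers the best error so far) is one plausible reading of the description above.

```python
def forward_selection(candidates, error_of):
    """Greedy forward search: repeatedly add the candidate that lowers the
    error most, and stop when no addition improves on the current best."""
    selected, best_error = [], float("inf")
    remaining = list(candidates)
    while remaining:
        # Score every one-feature extension of the current subset.
        trial_errors = [(error_of(selected + [f]), f) for f in remaining]
        error, feature = min(trial_errors)
        if error >= best_error:
            break
        selected.append(feature)
        remaining.remove(feature)
        best_error = error
    return selected, best_error
```

With $n$ candidate features the search evaluates at most $n(n+1)/2$ subsets, which is why a classifier that is cheap to train is attractive at this stage.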

Figure 6.
Feature selection results for individual calcification features ( $\vec{v}_{i}$ ).

3.4 Classifier training

The SVM and QDA models were trained using the selected feature sets. Three parameters needed to be established for the SVM classifier: the kernel type, its associated parameters and the constraint value $C$ . The linear, polynomial and radial basis function (RBF) kernels were selected because they are highly recommended in similar works [6]. The final kernel chosen from the three was the one that gave the minimal classification error. This study used an unconstrained linear optimization method to establish the optimal parameter values. Initially, a search was conducted through a set of equally spaced linear values, followed by fine-tuning of the selected parameters by searching random values around them. Algorithm 3.4 shows the steps followed in fine-tuning the parameters of the SVM classifier.

Algorithm 3.4. Train the SVM classifier

procedure TrainSVM(Data)
    [TrainData, TestData] = partitionData(Data, 10)    ▷ randomly partition Data into 10 sets with equal class representation
    Cost $\leftarrow RandomGenerator(20)$ , Scale $\leftarrow RandomGenerator(20)$    ▷ initialize Cost and Scale parameters to 20 random real values
    Kernel $\in(^{\prime}linear^{\prime},^{\prime}polynomial^{\prime},^{\prime}rbf^{\prime})$
    for $i=1\to 20$ do
        SVMModel = trainModel(Cost[i], Scale[i], Kernel, TrainData)
        Error = getClassificationError(SVMModel, TestData)
        CE[i] $\leftarrow$ Average(Error)
    end for
    CE$_{min}$ = getIndex(Argmin(CE))    ▷ get index of minimum classification error
    Cost$_{opt}$ = Cost[CE$_{min}$], Scale$_{opt}$ = Scale[CE$_{min}$]
    SVMModel = trainModel(Cost$_{opt}$, Scale$_{opt}$, Kernel, Data)    ▷ train using optimum parameters on all data
    return SVMModel    ▷ return optimal model
end procedure

For the initial coarse parameter search, the bounds of the ranges considered for $\sigma$ and $C$ were taken from related work [6]. Specifically, the search considered the following values: $C\in\{2^{-i}|i=-3,-2,\ldots,15\}$ and $\sigma\in\{2^{-j}|j=-12,-2,\ldots,4\}$ . The optimum model was established to be the polynomial kernel with the following parameters: $\sigma=$ 2.7803 and $C=$ 1000.
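The two-stage search (a coarse grid followed by random fine-tuning) can be sketched generically as below. The sketch is not tied to any SVM library. The callback `evaluate(cost, scale)` is hypothetical and stands for the average classification error of a model trained with those parameters for a fixed kernel, and the refinement window of a factor of two around the coarse optimum is an assumption, since the text does not state how far the random values range.

```python
import random


def tune_parameters(evaluate, coarse_costs, coarse_scales, n_random=20, seed=0):
    """Coarse grid search followed by random refinement around the best point.

    Returns the best (error, cost, scale) triple found.
    """
    rng = random.Random(seed)
    # Stage 1: evaluate every point of the coarse grid.
    best = min((evaluate(c, s), c, s) for c in coarse_costs for s in coarse_scales)
    # Stage 2: sample random candidates around the coarse optimum.
    _, c0, s0 = best
    for _ in range(n_random):
        c = c0 * 2 ** rng.uniform(-1, 1)  # assumption: within a factor of two
        s = s0 * 2 ** rng.uniform(-1, 1)
        best = min(best, (evaluate(c, s), c, s))
    return best
```

Running the function once per candidate kernel and keeping the kernel with the smallest returned error mirrors the kernel choice described in this section.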

3.5 Classifier scoring

The final feature vector is derived from classifier scores, unlike most related research works, which directly feed primary features (Haralick, Wavelet, Geometric, etc.) to the $k$ -NN algorithm [13, 34, 14]. The trained SVM and QDA classifiers are used to generate scores, which are used for creating the final feature vector in this stage. While classifier scores have been used before in the literature, this study extends the notion to include statistics on the scores as well. Furthermore, the contribution of a given feature is weighted based on its significance test described in Eq. (10). Scores for both the cluster region and the individual calcifications are considered. Table 3 presents the features used as the final vector set, as well as their significance and relative discrimination strength.

Table 3
The final feature vector is a combination of classifier scores as well as basic first order statistics on them. This table also shows the individual performance of the features based on the independent features significance test described in Section 3.3

Feature	Feature relevance (share of total)	Description
1. SVM score	3.75 (0.05)	Cluster’s SVM score
2. $\mu_{\text{SVM}}$	17.82 (0.25)	Average of SVM scores for all ROI calcifications
3. $\mu_{\text{QDA}}$	16.16 (0.23)	Average of QDA scores for all ROI calcifications
4. $\mu_{\text{SVM+}}$	12.81 (0.18)	Average of SVM scores for positive ROI calcifications
5. $\mu_{\text{QDA+}}$	8.90 (0.13)	Average of QDA scores for positive ROI calcifications
6. $\sigma_{\text{QDA}}$	3.46 (0.05)	Standard deviation of QDA scores for all ROI calcifications
7. $\sigma_{\text{SVM+}}$	3.85 (0.05)	Standard deviation of SVM scores for positive ROI calcifications
8. $\sigma_{\text{QDA+}}$	4.04 (0.06)	Standard deviation of QDA scores for positive ROI calcifications

“Positive ROI calcifications” as mentioned in Table 3 refers to those calcifications that are classified as positive by the classifiers. A calcification is considered positive only if its score for both the QDA and SVM classifiers is greater than 50%.
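The assembly of the eight Table 3 features from raw classifier scores can be sketched as follows. The sketch assumes that both classifiers output scores in the range [0, 1], so that the 50% rule becomes a threshold of 0.5, that the sample standard deviation is used, and that statistics over an empty or single-element set default to zero. The text specifies none of these details, and the function name `score_features` is hypothetical.

```python
import statistics


def score_features(cluster_svm, svm_scores, qda_scores, threshold=0.5):
    """Assemble the eight Table 3 features for one ROI.

    svm_scores and qda_scores hold one score per calcification in the ROI.
    """
    # A calcification is positive only if BOTH classifiers exceed the threshold.
    positive = [(s, q) for s, q in zip(svm_scores, qda_scores)
                if s > threshold and q > threshold]
    svm_pos = [s for s, _ in positive]
    qda_pos = [q for _, q in positive]

    def mean(v):
        return statistics.fmean(v) if v else 0.0

    def std(v):
        return statistics.stdev(v) if len(v) > 1 else 0.0

    return [cluster_svm, mean(svm_scores), mean(qda_scores),
            mean(svm_pos), mean(qda_pos),
            std(qda_scores), std(svm_pos), std(qda_pos)]
```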

3.6 Similarity measurement and ranking

The voting $k$ -Nearest Neighbor ( $k$ -NN) classifier is finally used to assign a class to the query image based on the ranked results. The $k$ -NN classifier is a non-parametric classification technique that assigns to a sample the class represented by a majority of its $k$ neighbors [4]. This method treats all instances in the database as points in an $n$ -dimensional space and calculates the distance $d$ between the query vector $q$ and all the other samples, returning the set of $k$ vectors in increasing order of distance [7]. The distance $d$ is commonly referred to as the measure of dissimilarity and is defined as a mapping $d:X\times X\rightarrow\mathbb{R}$ . The Euclidean distance is used in calculating the dissimilarity between database samples. Assuming that $\mathbf{x}=(x_{1},x_{2},\ldots,x_{m})$ is the query vector and $\mathbf{y}=(y_{1},y_{2},\ldots,y_{m})$ is a vector representing a database image, both of dimension $m$ , the Euclidean distance $L_{2}$ is calculated as follows [30].

$\displaystyle L_{2}=d(\mathbf{x},\mathbf{y})=\sqrt{\sum_{i=1}^{m}(x_{i}-y_{i})^{2}}$ (11)

Where more than one database sample has the same distance $d$ to the query, the algorithm returns the first $k$ images of the result in the order in which they appear in the database. The $k$ value of the $k$ -NN classifier was taken from the set [1, 3, 5, 7, 9, 11].
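The retrieval and voting steps can be sketched in plain Python as follows. The function names are hypothetical. The tie-breaking rule described above is obtained here by relying on the stability of Python's `sorted`, which keeps equally distant samples in database order.

```python
import math
from collections import Counter


def euclidean(x, y):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def retrieve(query, database, k):
    """Indices of the k nearest database vectors, closest first.

    sorted() is stable, so equally distant samples keep their database order.
    """
    order = sorted(range(len(database)),
                   key=lambda i: euclidean(query, database[i]))
    return order[:k]


def vote(query, database, labels, k):
    """Majority class among the k retrieved neighbours."""
    neighbours = retrieve(query, database, k)
    return Counter(labels[i] for i in neighbours).most_common(1)[0][0]
```

Using only odd values of $k$ avoids tied votes in a two-class problem.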

4. Evaluation study

4.1 Image dataset and experimental setup

The images used in this work were sourced from the Mammographic Image Analysis Society (MIAS) database [31]. The MIAS database has 29 ROIs containing microcalcifications, 15 of which are malignant. These ROIs are spread over 25 different cases. This work extracted all ROIs classified positive for the presence of microcalcifications, and 99 ROIs (randomly chosen but evenly distributed across all cases) classified as normal, from the MIAS database, each having a resolution of 256 $\times$ 256 pixels with the cluster containing the abnormality centered on the image.

This dataset was divided into the training and testing dataset at the ratio of 0.8:0.2 respectively. The training dataset was used for the modeling of the SVM and QDA classifiers and the calculation of the standardization variables ( $\mu$ and $\sigma$ ) as discussed in Sections 3.2.2–3.4. The test dataset was used for validation of the trained models (Algorithm 3.4). The trained classifiers were then used to score all images to form the new feature database shown in Table 3. The new database was used for the validation of the model using a $k$ -NN classifier. All the images in the new feature dataset were used in the training and model validation process using the Leave-one-out cross validation approach (LOOCV) described in Algorithm 4.1. For each iteration of the LOOCV algorithm, one database sample was excluded, with the remaining others forming the training set $T$ . The excluded samples were substituted in one at a time as query images in successive iterations. All samples were thus used in turns as query images to assess the model according to the metrics discussed in the next section. Algorithm 4.1 presents the naive LOOCV strategy [14] used in assessing and benchmarking various model parameters.

Algorithm 4.1 Leave-one-out cross-validation for the evaluation of a given model

Input: $X$ , the feature vector set with associated ground truth

1: for $k\in[1,3,5,7,9,11]$ do
2:     $Scores[k]\leftarrow 0$ (initialize scores for all metrics to null)
3:     for $x_{i}\in X$ do
4:         $T=X-\{x_{i}\}$ (exclude the sample from the training dataset)
5:         $M=train\_knn(T,\theta)$ (train model based on the training samples and parameters)
6:         $\hat{f}(x_{i})=test\_knn(M,x_{i})$
7:         $Scores[k]\leftarrow Scores[k]+\hat{f}(x_{i})$
8:     end for
9: end for
10: return $Scores$ (return scores for the model)
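A minimal runnable sketch of the leave-one-out loop is given below. It assumes a Euclidean distance and majority voting, and it accumulates accuracy only, whereas the study tracks several metrics; the function names and toy data are hypothetical.

```python
import math
from collections import Counter

def knn_predict(train, query, k):
    """Majority vote among the k nearest (vector, label) pairs (Euclidean distance assumed)."""
    ranked = sorted(train, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

def loocv_accuracy(samples, ks=(1, 3, 5, 7, 9, 11)):
    """For each k, hold out every sample once and report the fraction classified correctly."""
    scores = {}
    for k in ks:
        correct = 0
        for i, (vec, label) in enumerate(samples):
            train = samples[:i] + samples[i + 1:]  # T = X - {x_i}
            correct += knn_predict(train, vec, k) == label
        scores[k] = correct / len(samples)
    return scores

# Toy data: two well-separated one-dimensional clusters of four samples each.
samples = [((0.0,), 0), ((0.1,), 0), ((0.2,), 0), ((0.3,), 0),
           ((5.0,), 1), ((5.1,), 1), ((5.2,), 1), ((5.3,), 1)]
scores = loocv_accuracy(samples, ks=(1, 3, 7))
print(scores)
```

On this toy set the small values of $k$ classify every held-out sample correctly, while $k=7$ uses all remaining samples as voters and is therefore outvoted by the other class, which illustrates why the choice of $k$ matters on small datasets.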

4.2 Performance metric

Multiple metrics were used in a complementary manner to give a wider assessment of the proposed model; this circumvents the incompleteness of singular metrics as test validity descriptors [3] and facilitates wide comparison with the literature. Specifically, model performance was benchmarked using Sensitivity (or True Positive Rate), Specificity (equal to 1 $-$ FPR, where FPR is the False Positive Rate), Accuracy and Positive Predictive Value (PPV). These metrics were chosen because of their wide use in related applications [4, 6] and the valuable information on system performance that they capture [3, 25].

The metrics used for the evaluation of the model are based on the $2\times 2$ contingency tabulation of the concepts of true/false positives/negatives, as shown in Table 4 [38].

Table 4
2 $\times$ 2 contingency table depicting True Positives (TP), False Positives(FP), True Negatives (TN) and False Negatives(FN)

		Disease
		Positive	Negative
Test	Positive	TP	FP
	Negative	FN	TN

Having established the values of the contingency table, sensitivity, specificity, accuracy and the positive predictive value (PPV) are calculated as follows.

$\displaystyle\text{Sensitivity}=\frac{\text{TP}}{\text{TP}+\text{FN}}$ (12)

$\displaystyle\text{Specificity}=\frac{\text{TN}}{\text{TN}+\text{FP}}$ (13)

$\displaystyle\text{Accuracy}=\frac{\text{TN}+\text{TP}}{\text{TN}+\text{TP}+\text{FN}+\text{FP}}$ (14)

$\displaystyle\text{PPV}=\frac{\text{TP}}{\text{TP}+\text{FP}}$ (15)
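Equations (12)–(15) can be checked numerically with a short sketch. The counts used below are not stated in the text: they are inferred here from the rates reported for Model 1 at $k=1$ in Table 5 and should be treated as an assumption. Returning a PPV of 0 when there are no positive predictions is likewise an assumed convention, chosen because it agrees with the zero PPV entries reported in the tables.

```python
def contingency_metrics(tp, fp, fn, tn):
    """Compute the metrics of Eqs. (12)-(15) from contingency counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        # Assumed convention: PPV is reported as 0 when nothing is predicted positive.
        "ppv": tp / (tp + fp) if (tp + fp) else 0.0,
    }

# Inferred counts (an assumption): 23 positive and 171 negative queries.
metrics = contingency_metrics(tp=4, fp=24, fn=19, tn=147)
average = sum(metrics.values()) / len(metrics)
print({name: round(value, 4) for name, value in metrics.items()}, round(average, 4))
```

Under these inferred counts the rounded values agree with the first row of Table 5.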

Sensitivity (also called the True Positive Rate or True Positive Fraction, i.e. TPR/TPF) gives a measure of the probability that the algorithm will correctly classify an unseen positive query, while specificity is the probability that it will correctly classify an unseen negative query [3]. A highly sensitive model is unlikely to miss a positive case, and is usually preferred in screening for a disease [3]. A high specificity value is equally desirable as it implies a lower probability of false positives. The Positive Predictive Value gives the probability that a sample classified as positive is truly positive; it is useful since a positive classification does not automatically imply the presence of disease, and its reliability varies with the prevalence of the disease within the population sampled. For instance, a highly sensitive test will have many FPs if the disease prevalence is low [28]. A good model should score high values in all the aforementioned metrics.
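The dependence of the PPV on prevalence follows from Bayes' rule and can be made concrete with a small sketch. The sensitivity, specificity and prevalence values below are purely illustrative and are not results of this study; the function name is hypothetical.

```python
def ppv_from_prevalence(sensitivity, specificity, prevalence):
    """PPV via Bayes' rule: P(disease | positive test)."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)

# Illustrative values only: the same test applied to two populations.
ppv_balanced = ppv_from_prevalence(0.9, 0.9, prevalence=0.5)
ppv_rare = ppv_from_prevalence(0.9, 0.9, prevalence=0.01)
print(round(ppv_balanced, 3), round(ppv_rare, 3))
```

The same test yields a far lower PPV in the low-prevalence population, because false positives from the large disease-free group outnumber the true positives.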

4.3 Results and discussion

Experiments were conducted to comparatively analyze the proposed approach and two others employed in the literature. The results are presented in Tables 5–7, benchmarked by the metrics discussed in the previous section. To provide a clearer perspective of the overall performance of the models for comparative assessment, the scores for all the metrics are averaged in the last column of the tables. The parameter values for $k$ are taken from the set [1, 3, 5, 7, 9, 11]; odd numbers were picked to avoid tie scenarios during voting. For referential convenience, the first approach, containing Haralick and geometric features, is referred to as Model 1 (Table 5); the two-dimensional set comprising the SVM and QDA scores is referred to as Model 2 (Table 6); and the proposed model, containing features derived from the classifier scores, is referred to as Model 3 (Table 7) or simply “the proposed model”.

Table 5
Performance benchmark using selected Haralick and geometric features as detailed in Section 3.3. PPV is the Positive Predictive Value benchmark, also known as Precision. The highlighted row marks the best performing $k$ value based on average score of all metrics

$k$	Accuracy	PPV	Sensitivity	Specificity	Average
1	0.7784	0.1429	0.17390	0.8596	0.4887
3	0.8505	0.2000	0.08696	0.9532	0.5227
5	0.8608	0.1667	0.04348	0.9708	0.5105
7	0.8711	0.2500	0.04348	0.9825	0.5368
9	0.8763	0.3333	0.04348	0.9883	0.5604
11	0.8814	0	0	1	0.4704
Average	0.8531	0.1822	0.0652	0.9591	0.5149

Table 6

Performance benchmark using SVM and QDA scores only. PPV is the Positive Predictive Value benchmark, also known as Precision. The highlighted row marks the best performing $k$ value based on average score of all metrics

$k$	Accuracy	PPV	Sensitivity	Specificity	Average
1	0.7474	0.1389	0.2174	0.8187	0.4806
3	0.7990	0.1364	0.1304	0.8889	0.4887
5	0.8918	1	0.0870	1	0.7447
7	0.8918	1	0.0870	1	0.7447
9	0.8814	0	0	1	0.4704
11	0.8814	0	0	1	0.4704
Average	0.8488	0.3792	0.0870	0.9513	0.5666

Table 7

Performance benchmark for the derived feature set comprising Statistics on SVM and QDA scores. PPV is the Positive Predictive Value benchmark, also known as Precision. The highlighted row marks the best performing $k$ value based on average score of all metrics

$k$	Accuracy	PPV	Sensitivity	Specificity	Average
1	0.9278	0.68	0.7391	0.9532	0.8250
3	0.9433	0.7727	0.7391	0.9708	0.8565
5	0.9485	0.8095	0.7391	0.9766	0.8684
7	0.9536	0.8182	0.7826	0.9766	0.8828
9	0.9485	0.76	0.8261	0.9649	0.8749
11	0.9485	0.76	0.8261	0.9649	0.8749
Average	0.9450	0.7667	0.7754	0.9678	0.8637

According to the results in Table 5, Model 1 performs the worst in sensitivity and PPV, but strongly in specificity (95.91% on average), with its best overall performance attained at $k=9$ . Notably, high specificity is a pattern shown by all models, with average scores of 95.13% and 96.78% for Model 2 and Model 3 respectively. Model 1 performs poorly at all parameter values of $k$ in the PPV criterion, with an average precision of 18.22%.

Model 2 (Table 6) marginally improves the average sensitivity score, to 8.7%, in comparison to Model 1. Its PPV score is more than double that of Model 1, but is still low at 37.92% when averaged across all values of $k$ . However, it should be noted that it gives perfect scores for the PPV metric at $k=5$ and $k=7$ ; these two parameter values offer the best performance for this model. Considering the optimal parameter settings for both models, it outscores Model 1 in all benchmarks, with an improvement of 18.43 percentage points in the average score. Its sensitivity of 8.7% at its optimal $k$ value (its best sensitivity across all values of $k$ is 21.74%, at $k=1$ ) is however far below what practical application requires.

The proposed model (Table 7) outperforms both Models 1 and 2 in the average scores of all metrics. It has the best performance at $k=7$ with an average score of 88.28% across all metrics, an improvement of 13.81 and 32.24 percentage points over the best settings of Model 2 and Model 1 respectively. Its lowest score is 68%, in the PPV metric at $k=1$ . Of note, especially in comparison with the other models, is its consistent performance across all benchmarks, with scores significantly above 50%.
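The selection of the best $k$ and the reported gaps can be reproduced directly from the tabulated values. The sketch below copies the rows of Table 7 and the best average scores of Tables 5 and 6; the variable and function names are hypothetical.

```python
# Rows of Table 7 as (accuracy, PPV, sensitivity, specificity), keyed by k.
table7 = {
    1: (0.9278, 0.68, 0.7391, 0.9532),
    3: (0.9433, 0.7727, 0.7391, 0.9708),
    5: (0.9485, 0.8095, 0.7391, 0.9766),
    7: (0.9536, 0.8182, 0.7826, 0.9766),
    9: (0.9485, 0.76, 0.8261, 0.9649),
    11: (0.9485, 0.76, 0.8261, 0.9649),
}

def best_k(table):
    """Return the k with the highest mean over the four metrics, and that mean."""
    averages = {k: sum(row) / len(row) for k, row in table.items()}
    k = max(averages, key=averages.get)
    return k, averages[k]

k_best, avg_best = best_k(table7)
best_model1, best_model2 = 0.5604, 0.7447  # best "Average" entries of Tables 5 and 6
gap_model2 = 100 * (avg_best - best_model2)  # in percentage points
gap_model1 = 100 * (avg_best - best_model1)
print(k_best, round(gap_model2, 1), round(gap_model1, 1))
```

The gaps are differences between average scores, so they are percentage points rather than relative improvements; small deviations in the second decimal place arise from rounding in the tabulated averages.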

The high specificity scores, which are perfect in some settings for Models 1 and 2, might initially suggest good discrimination capability regarding negative cases for those models, until they are balanced with the other metrics. The experimental setup involved using all database images as query images at some point in the iterations. Given that the dataset samples are skewed towards non-malignant cases, the high specificity values of Models 1 and 2 might have undesirably been buoyed by that fact. This reasoning is supported by their low sensitivity and precision values. The high sensitivity values of the proposed model imply that it is relatively robust and effective in its ability to discriminate malignant cases even when the dataset’s class ratio is significantly imbalanced.
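The effect of class imbalance on specificity and accuracy can be seen with a trivial baseline that labels every query as negative. The class counts below (23 positive and 171 negative queries) are an assumption inferred from the reported rates, not figures stated in the text.

```python
def always_negative_metrics(n_pos, n_neg):
    """Metrics of a baseline that predicts 'negative' for every query."""
    tp, fp, fn, tn = 0, 0, n_pos, n_neg
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (n_pos + n_neg),
    }

# Inferred class counts (an assumption, see the lead-in).
baseline = always_negative_metrics(n_pos=23, n_neg=171)
print({name: round(value, 4) for name, value in baseline.items()})
```

Under this assumption the baseline attains perfect specificity and an accuracy of about 88% while detecting no malignant case, which is the pattern shown by Models 1 and 2 at $k=11$ .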

Figure 7.

Accurate query results using an mdb227 ROI. The query image appears on the top-left. The incorrect result has been highlighted by a dark rectangle (bottom left).

Table 8

Comparison of proposed model against recent similar works. The values are expressed as percentage scores

Author	PPV(%)	Specificity (%)	Sensitivity (%)	Accuracy (%)
Tsochatzidis et al. [22]	N/A	N/A	N/A	60
Our model	82	98	78	95

Figure 8.

Inaccurate query results using an mdb238 ROI. The query image appears on the top-left. The incorrect results have been highlighted by a white rectangle.

Studies have found a high rate of false positives as well as missed detections by radiologists during breast cancer screening, with estimates placing the radiologists’ sensitivity at about 75% [8]. Accurate query results based on visual content have been reported to be a significant diagnostic aid to radiologists [16, 9]. The strong scores of the proposed model across all the metrics employed imply a high and consistent ability to extract database cases closely matching a given query sample; the incorporation of more suitable features, as demonstrated by the proposed model, can enhance the accuracy of CBIR-based CAD systems in breast cancer diagnosis. Such systems, when used as a second opinion, can in turn improve the quality of diagnostic decisions [16, 6]. The experimental results show that the proposed model retrieves the correct results for about 78 out of 100 queries involving positive samples, and for about 98 out of 100 queries involving negative samples. The high predictive value of 82% assigns a commensurate credibility to the model’s positive results. Therefore, there is a high chance that the positive cases returned by the model are indeed positive. This is a desired attribute of a model in a clinical setting, given that a low PPV leads to additional costs and negative psychological effects on patients, as follow-up examination is needed to establish the actual diagnosis.

Most classification errors of the proposed model can be attributed to mammogram cases with dense fibroglandular tissues (Fig. 8); the intensity profile of such cases is very similar to that of microcalcifications, making differentiation more challenging – this has been noted as a problem in similar studies as well and remains an open area of research [6, 15]. Our model nonetheless contributes to the field of computer aided diagnosis of breast cancer by introducing an improved feature characterization approach. Vijayalakshmi et al. [8] present a similar system combining the Local Binary Pattern (LBP) and the Artificial Neural Network classifier. They report high scores (between 92.5% and 100%) in the four metrics they use (including specificity, sensitivity and accuracy), but their tests are conducted on a smaller dataset of 80 images. Furthermore, the classes represented in their dataset are equally balanced, in contrast to the imbalanced class representation in this study.

Table 8 demonstrates the competitive performance of the proposed model in comparison to other related works. In particular, Tsochatzidis et al. [22] present a supervised retrieval model based on the SVM for malignancy assessment. Their feature vector is derived from participation values of support vectors. Their model considers all BI-RADS categories, with the database composed of a total of 87 ROIs. In Table 8, we consider their average accuracy score across all categories. Figures 7 and 8 give image results to sample queries for accurate and inaccurate scenarios respectively. In Fig. 7, the inaccurate result appears in fifth position among the returned results. In Fig. 8, the inaccurate results appear in positions one, two, five and six.

As further work, the proposed model can be enhanced by including a physician-in-the-loop approach, where the results are assigned relevance scores that are then used to modify the weights of the attributes. Additionally, other pathological features important to the detection of breast cancer, such as breast masses, can be incorporated to provide a holistic diagnostic approach, given that this study focused only on microcalcifications. Modeling of the dense tissues and their differentiation from microcalcifications can be investigated as a means of reducing the false positive cases encountered in this study.

5. Conclusion

Content-based image retrieval is a potentially useful technique in supporting diagnostic decisions by availing similar cases with known pathology to radiologists. Accuracy and efficacy are required for CBIR systems to be adopted in regular medical practice. This paper presented an improved model for the retrieval of mammograms based on their pathology. The main contribution of this study is the combination of classifier scores with statistics on the same to construct an effective feature vector for improving the accuracy of the retrieval of mammogram cases based on pathology and, more specifically, the malignancy of clusters, where present. The feature characterization model was further improved by appropriate weighting of the features based on results of their individual discrimination ability. Experiments benchmarked the model’s performance using a wide range of metrics, with results showing increased relative effectiveness of the proposed model over the common application of texture/geometric features or their scores alone as widely applied in the literature. Future work on this model can consider extending the classification problem to include other pathologies according to BI-RADS classes and addressing the negative effect of dense fibroglandular tissues on microcalcification characterization.

References

1. Dale Addison J.F., Wermter and Arevian G.Z., A comparison of feature extraction and selection techniques, In Proceedings of the International Conference on Artificial Neural Networks, 2003, pp. 212–215.

2. Aksoy and Haralick R.M., Feature normalization and likelihood-based similarity measures for image retrieval, Pattern Recogn. Lett. 22(5) (April 2001), 563–582.

3. Alberg A.J., Park J.W., Hager B.W., Brock M.V. and Diener-West, The use of “overall accuracy” to evaluate the validity of screening or diagnostic tests, Journal of General Internal Medicine 19(5) (2004), 460–465.

4. Alolfe M.A., Mohamed W.A., Youssef A.M., Kadah Y.M. and Mohamed A.S., Feature Selection in Computer Aided Diagnostic System for Microcalcification Detection in Digital Mammograms, In 26th National Radio Science Conference, NRSC, 2009, pp. 1–9.

5. American Cancer Society, Global Cancer Facts & Figures, Annual report 3, Atlanta, February 2015.

6. Andreadis I.I., Spyrou G.M. and Nikita, A comparative study of image features for classification of breast microcalcifications, Measurement Science and Technology 22(11) (October 2011), 1–9.

7. Beniwal and Arora, Classification and feature selection techniques in data mining, International Journal of Engineering Research & Technology 1(6) (August 2012), 1–6.

8. Vijayalakshmi, Bhanumathi and Suresh G.R., Study of mammogram micro calcification to aid tumour detection using artificial neural network based classifier, International Journal of Advanced Research in Electrical, Electronics and Instrumentation Engineering 3(2) (April 2014), 644–650.

9. Choraś R.S., Image feature extraction techniques and their applications for CBIR and biometrics systems, International Journal of Biology and Biomedical Engineering 1(1) (2007), 6–16.

10. Collins and Okada, A comparative study of similarity measures for content-based medical image retrieval, In Pamela Forner, Jussi Karlgren, and Christa Womser-Hacker, editors, CLEF (Online Working Notes/Labs/Workshop), 2012.

11. da Silva S.F., Ribeiro M.X., do E.S. Batista Neto, Traina-Jr. and Traina A.J.M., Improving the ranking quality of medical image retrieval using a genetic feature selection method, Decision Support Systems 51(4) (2011), 810–820.

12. de Oliveira J.E.E., de Albuquerque Araújo and Deserno T.M., Content-based image retrieval applied to BI-RADS tissue classification in screening mammography, World J Radiol. 3(1) (January 2011), 24–31.

13. El-Naqa, Yang, Galatsanos N.P., Nishikawa R.M. and Wernick M.N., A similarity learning approach to content-based image retrieval: application to digital mammography, IEEE Transactions On Medical Imaging 23(10) (2004).

14. El-Naqa, Yang, Galatsanos N.P. and Wernick M.N., Content-based image retrieval for digital mammography, ICIP 3 (2002), 141–144.

15. Engelken, Bremme, Bick, Hammann-Kloss and Fallenberg E.M., Factors affecting the rate of false positive marks in CAD in full-field digital mammography, European Journal of Radiology 81(8) (August 2012), 844–848.

16. Gilbert J.F., Astley S.M., Gillan M.G.C., Agbaje O.F., Wallis M.G., James, Boggis C.R.M. and Duffy S.W., Single reading with computer-aided detection for screening mammography, N Engl J Med 359 (2008), 1675–1684.

17. Yang, Jing and Nishikawa R.M., Retrieval boosted computer-aided diagnosis of clustered microcalcifications for breast cancer, Medical Physics 39 (2012), 676–685.

18. Jeffery I.B., Higgins D.G. and Culhane A.C., Comparison and evaluation of methods for generating differentially expressed gene lists from microarray data, BMC Bioinformatics 7(1) (2006), 1–16.

19. Jing, Yang and Nishikawa R.M., Regularization in retrieval-driven classification of clustered microcalcifications for breast cancer, Journal of Biomedical Imaging 3 (January 2012), 1–3:8.

20. Kinoshita S.K., de Azevedo-Marques P.M., Pereira R.R., Rodrigues J.A.H. and Rangayyan R.M., Content-based retrieval of mammograms using visual features related to breast density patterns, Journal of Digital Imaging 20(2) (2007), 172–190.

21. Kudo and Sklansky, Comparison of algorithms that select features for pattern classifiers, Pattern Recognition 33(1) (2000), 25–41.

22. Tsochatzidis, Zagoris, Savelonas, Papamarkos, Pratikakis, Arikidis and Costaridou, Microcalcification oriented content-based mammogram retrieval for breast cancer diagnosis, In IEEE International Conference Imaging Systems and Techniques (IST), 2014, pp. 257–262.

23. Ladha and Deepa, Feature selection methods and algorithms, International Journal on Computer Science and Engineering (IJCSE) 3(5) (2011), 1787–1797.

24. Lederman, Leichter, Ratner, Abramov, Manevich and Stoeckel, Should CAD be used as a second reader? Exploring two alternative reading modes for CAD in screening mammography, In 10th International Conference on Digital Mammography, 2010, pp. 161–167.

25. Metz C.E., Basic principles of ROC analysis, Seminars in Nuclear Medicine 8 (1978), 283–298.

26. Mokji M.M. and Abu-Bakar S.A.R., Gray level co-occurrence matrix computation based on Haar wavelet, In 4th International Conference on Computer Graphics, Imaging and Visualization (CGIV 2007), August 14–16, 2007, Bangkok, Thailand, 2007, pp. 273–279.

27. Papadopoulos, Fotiadis D.I. and Likas, An automatic microcalcification detection system based on a hybrid neural network classifier, Artificial Intelligence in Medicine 25(2) (2002), 149–167.

28. Parikh, Mathai, Parikh, Chandra Sekhar and Thomas, Understanding and using sensitivity, specificity and predictive values, Indian Journal of Ophthalmology 56(1) (2008), 45–50.

29. Peng and Jiang, A novel feature selection approach for biomedical data classification, J. of Biomedical Informatics 43(1) (February 2010), 15–23.

30. Srivastava, Gupta M.R. and Frigyik B.A., Bayesian Quadratic Discriminant Analysis, Journal of Machine Learning Research 8 (2007), 1277–1305.

31. Suckling, Parker, Dance D.R., Astley, Hutt, Boggis C.R.M., Ricketts, Stamatakis, Cerneaz, Kok S.-L., Taylor, Betal and Savage, The mammographic image analysis society digital mammogram database, In Proceedings of the 2nd International Workshop on Digital Mammography, 1994, pp. 375–378.

32. Swiniarski and Swiniarska, Comparison of Feature Extraction and Selection Methods in Mammogram Recognition, Annals of the New York Academy of Sciences 980(1) (2002), 116–124.

33. Tsochatzidis L.T., Zagoris, Savelonas M.A. and Pratikakis, SVM-based CBIR of breast masses on mammograms, In AI-AM/NetMed@ECAI’14, 2014, pp. 26–30.

34. Wei, Yang and Nishikawa R.M., Microcalcification classification assisted by content-based image retrieval for breast cancer diagnosis, Pattern Recognit 42(6) (2009), 1126–1132.

35. Wei, Yang, Nishikawa R.M. and Wernick M.N., Learning of perceptual similarity from expert readers for mammogram retrieval, IEEE (2006), 1356–1359.

36. Weiss S.M. and Indurkhya, Predictive data mining: a practical guide, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1998.

37. Zheng, Computer-aided diagnosis in mammography using content-based image retrieval approaches: current status and future perspectives, Algorithms 2(2) (2009), 828–849.

38. Zhu, Zeng and Wang, Sensitivity, specificity, accuracy, associated confidence interval and ROC analysis with practical SAS® implementations, In NESUG proceedings: Health care and life sciences, Baltimore, Maryland, 2010, pp. 1–9.
