A novel nearest neighbors algorithm based on power muirhead mean

Abstract

This paper introduces the innovative Power Muirhead Mean K-Nearest Neighbors (PMM-KNN) algorithm, a novel data classification approach that combines the K-Nearest Neighbors method with the adaptive Power Muirhead Mean operator. The proposed methodology aims to address the limitations of traditional KNN by leveraging the Power Muirhead Mean for calculating the local means of K-nearest neighbors in each class to the query sample. Extensive experimentation on diverse benchmark datasets demonstrates the superiority of PMM-KNN over other classification methods. Results indicate statistically significant improvements in accuracy on various datasets, particularly those with complex and high-dimensional distributions. The adaptability of the Power Muirhead Mean empowers PMM-KNN to effectively capture underlying data structures, leading to enhanced accuracy and robustness. The findings highlight the potential of PMM-KNN as a powerful and versatile tool for data classification tasks, encouraging further research to explore its application in real-world scenarios and the automation of Power Muirhead Mean parameters to unleash its full potential.

Keywords

Machine learning K-Nearest neighbors power muirhead mean

1. Introduction

Numerous domains, including pattern recognition and artificial intelligence, statistics, cognitive psychology, vision analysis, and medicine, rely on object classification for their research and practical applications.

The K-Nearest Neighbors (KNN) classifier tackles the problem of classification by first determining the distance between a new sample to be classified and the training samples, then observing the k-nearest neighbors for the new sample, and finally deciding whether or not the new sample belongs to the class that shares the most neighbors with the new sample [1].

Typically, KNN is a useful classifier, but it is worth mentioning that, because the parametric form of the density function cannot be assumed, non-parametric classifiers need a large number of samples. When working with a tiny sample size, this might be a serious problem [2, 3, 4].

Moreover, a well-known problem with KNN classifiers is that they are susceptible to outliers that distort the class distribution [4].

The majority voting principle is the foundation of KNN, which determines the class of a new sample by looking at its closest neighbors and the class that constitutes the majority of those neighbors. In the case that a data set is obviously unbalanced, one of the drawbacks of the majority voting principle is that the classified samples of the class or classes that have a large number of samples have a tendency to dominate the prediction of the new sample. This is simply due to the fact that they are frequently more numerous among the k nearest neighbors [5, 6].

The Muirhead Mean (MM), first introduced by Muirhead in 1902, is a generic aggregation function. It can be defined as a function of means and has been used in various applications owing to its ability to acknowledge interrelationships [7].

The Power Muirhead Mean (PMM) is an extension of the MM, designed to mitigate the adverse impact of outliers and emphasize central data points more effectively. This is achieved by combining the MM with a Power Average (PA), thus allowing the algorithm to adapt to various data distributions more robustly.

In this research, we offer a method for overcoming the limitations of the KNN classifier by using the Power Muirhead Mean of K nearest neighbors for each class separately.

2. Literature review

This section summarizes the theoretical foundations of the KNN classifier, Muirhead Mean (MM) Operator, Power Average (PA), and Power Muirhead Mean (PMM).

2.1 K-nearest neighbors classifier

The classic KNN algorithm is one of the earliest and most basic techniques for classifying patterns. In spite of this, it often produces competitive outcomes, and in particular fields, when supplemented with previous knowledge, it has greatly improved the state of the art. The KNN rule classifies each unlabeled instance according to the majority label of its k-nearest neighbors in the training set. Therefore, its success is highly dependent on the distance measure used to locate the closest neighbors. In the absence of previous information, the majority of KNN classifiers utilize a basic Euclidean metric to quantify dissimilarities across vector input instances. The following equation describes Euclidean distance [8].

$\displaystyle d({V},{M})=\sqrt{{{\sum\limits_{i=1}^{n}{({{V_{i}}-{M_{i}}})}^{2% }}}}$ (1)

where we define vectors $V=(V_{1},V_{2},V_{3},\ldots,V_{n}),M=(M_{1},M_{2},M_{3},\ldots,M_{n})$ , $n$ is the dimensionality of the vector input, namely, the number of examples’ attributes. The smaller $d(x_{i},x_{j})$ are the two examples that are more similar. A test sample’s class label is chosen by the majority vote of its $k$ closest neighbors.

$\displaystyle y({d_{i}})=\arg\max\limits_{k}{\sum\limits_{x_{j}\in\textit{KNN}% }{y(x_{j},{c_{k}})}}$ (2)

where $d_{i}$ is a test example, $x_{j}$ is one of its k nearest neighbors in the training set, $y(x_{j},c_{k})$ indicates that whether $x_{j}$ belongs to class $c_{k}$ . Equation (2) means that the prediction will be the class having most members in the k nearest neighbors [8].

2.2 Muirhead Mean Operator

The Muirhead Mean Operator provides a general aggregation function, and it is defined as follows:

Let $\alpha_{i}(i=1,2,...,n)$ be a collection of nonnegative real numbers, and $P=({p_{1}},{p_{2}},\ldots,{p_{n}})\in{R_{n}}$ be a vector of parameters [9, 10].

$\displaystyle{\textit{MM}^{P}}({\alpha_{1}},{\alpha_{1}},\ldots,{\alpha_{n}})=% {\left({{1\over{n!}}\sum\limits_{\vartheta\in{S_{n}}}{\prod\limits_{j=1}^{n}{% \alpha_{\vartheta(j)}^{{p_{j}}}}}}\right)^{{1\over{\sum\limits_{j=1}^{n}{{p_{j% }}}}}}}$ (3)

We call $\textit{MM}^{P}$ the Muirhead mean (MM), where $\vartheta(j)(j=1,2,\ldots,n)$ is any a permutation of $(1,2,\ldots,n)$ , and $S_{n}$ is the collection of all permutations of $(1,2,\ldots,n)$ [10].

Moreover, from Eq. (3), we can know:

If $P=(1,0,\ldots,0)$ , the MM reduces to $\textit{MM}^{(1,0,\ldots,0)}={1\over n}\sum\limits_{j=1}^{n}{{\alpha_{j}}}$ which is the arithmetic averaging operator.

If $P=({\raise 3.01pt\hbox{$1$}\!\mathord{\left/{\vphantom{1n}}\right.\kern-1.2pt}% \!\lower 3.01pt\hbox{$n$}},{\raise 3.01pt\hbox{$1$}\!\mathord{\left/{\vphantom% {1n}}\right.\kern-1.2pt}\!\lower 3.01pt\hbox{$n$}},\ldots,{\raise 3.01pt\hbox{% $1$}\!\mathord{\left/{\vphantom{1n}}\right.\kern-1.2pt}\!\lower 3.01pt\hbox{$n% $}})$ , the MM reduces to ${\textit{MM}^{(1/n,1/n,\ldots,1/n)}}=\prod\limits_{j=1}^{n}{\alpha_{j}^{1/n}}$ which is the geometric averaging operator.

If $P=(1,1,0,0,\ldots,0)$ , the MM reduces to $\textit{MM}^{(1,1,0,0,\ldots,0)}={\left({{1\over{n(n-1)}}\sum\limits_{{}_{i% \neq j}^{i,j=1}}^{n}{{\alpha_{i}}{\alpha_{j}}}}\right)^{{1\over 2}}}$ which is the Bonferroni mean operator [11].

If $P\!=\!{(\overbrace{1,1,\ldots,1}^{k},\overbrace{0,0,\ldots,0}^{n-k})}$ , the MM reduces to $\textit{MM}^{(\overbrace{1,1,\ldots,1}^{k},\overbrace{0,0,\ldots,0}^{n-k})}\!=% \!\!{\left(\!\!{{{\mathop{\oplus}\limits_{{}_{1\!\leqslant\!{i_{1}}\prec\ldots% \prec{i_{k}}\!\leqslant\!n}}\mathop{\otimes}\limits_{{}_{j\!=\!1}}^{k}{\alpha_% {{i_{j}}}}}\over{C_{n}^{k}}}}\!\!\right)^{\!\!{\frac{1}{k}}}}$ which is the Maclaurin symmetric mean operator [12].

From the Eq. (3) and the special cases of MM operator mentioned above, we can know that the advantage of the MM operator is that it can capture the overall interrelationships among the multiple aggregated arguments and it is a generalization of most existing aggregation operators [10].

2.3 Power average

PA operator was introduced for the first time by Yager [13] for crisp numbers. The most significant benefit of the PA operator is its ability to mitigate the adverse impact of excessively high and low arguments on the final findings.

Let $\alpha_{i}(i=1,2,\ldots,n)$ be a collection of numbers. The PA operator is expressed as follows:

$\displaystyle\textit{PA}({\alpha_{1}},{\alpha_{2}},\ldots,{\alpha_{n}})=\sum% \limits_{i=1}^{n}{\left({\frac{{(1+T({\alpha_{i}})){\alpha_{i}}}}{{\sum\limits% _{j=1}^{n}{(1+T({\alpha_{j}}))}}}}\right)}$ (4)

where $T({\alpha_{i}})=\sum\nolimits_{j=1,i\neq j}^{n}{\textit{Sup}({\alpha_{i}},{% \alpha_{j}})}$ and ${\textit{Sup}({\alpha_{i}},{\alpha_{j}})}$ is the support for ${\alpha_{i}}$ and ${\alpha_{j}}$ , satisfying the following properties:

$\textit{Sup}({\alpha_{i}},{\alpha_{j}})\in[0,1],$

$\textit{Sup}({\alpha_{i}},{\alpha_{j}})=\textit{Sup}({\alpha_{j}},{\alpha_{i}}),$

If $d({\alpha_{i}},{\alpha_{j}})<d({\alpha_{l}},{\alpha_{k}}),$ then $\textit{Sup}({\alpha_{i}},{\alpha_{j}})>\textit{Sup}({\alpha_{l}},{\alpha_{k}}),$ where $d({\alpha_{i}},{\alpha_{j}})$ is the distance between $\alpha_{i}$ and $\alpha_{j}$ .

For simplicity, we’ll refer to the above properties as Property 1 [14].

2.4 Power muirhead mean

Li et al. [14] proposed the Power Muirhead Mean by combining PA with MM.

Let $\alpha_{i}(i=1,2,\ldots,n)$ be a collection of numbers, and $P=({p_{1}},{p_{2}},\ldots,{p_{n}})\in{R^{n}}$ be a vector of parameters. Then, PMM is defined as follows:

$\displaystyle\textit{PMM}^{P}({\alpha_{1}},{\alpha_{2}},\ldots{\alpha_{n}})={% \left({\frac{1}{{n!}}{{\sum\limits_{\vartheta\in{S_{n}}}{\prod\limits_{i=1}^{n% }{\left({\frac{{n(1+T({\alpha_{\vartheta(i)}})){\alpha_{\vartheta(i)}}}}{{\sum% \limits_{j=1}^{n}{(1+T({\alpha_{j}}))}}}}\right)}}}^{{p_{i}}}}}\right)^{\frac{% 1}{{\sum\limits_{i=1}^{n}{{p_{i}}}}}}}$ (5)

where $T({\alpha_{i}})=\sum\nolimits_{j=1,j\neq i}^{n}{\textit{Sup}}({\alpha_{i}},{% \alpha_{j}})$ and $\textit{Sup}({\alpha_{i}},{\alpha_{j}})$ is the support for ${\alpha_{i}}$ and ${\alpha_{j}}$ , satisfying the Property 1 [14].

3. Methodology

In this section, we have introduced a new version of KNN which uses PMM to overcome the alluded issues of naïve KNN.

3.1 K-nearest neighbors based on power muirhead mean

Our proposed methodology represents a fusion of the K Nearest Neighbors (KNN) algorithm with the innovative Power Muirhead Mean, changing the landscape of data classification. This novel approach aims to enhance the accuracy and robustness of traditional KNN by efficiently handling data distributions and emphasizing central data points while mitigating the impact of outliers.

The PMM-KNN algorithm diverges from traditional KNN and its modifications in several key ways:

•
Outlier Robustness: By incorporating the Power Average with the Muirhead Mean, PMM-KNN reduces the influence of outliers on the classification process.
•
Adaptive Aggregation: The use of PMM allows for a more flexible aggregation of nearest neighbors, taking into account the interrelationships between data points more effectively than traditional means.
•
Improved Centroid Calculation: PMM provides a sophisticated method for centroid calculation, leading to more accurate class representation and, consequently, better classification performance.

The theoretical implications of this combination lie in the enhanced ability to capture complex data structures and relationships, which is crucial for high-dimensional and imbalanced datasets. By leveraging the adaptive nature of PMM, our approach not only improves classification accuracy but also provides a more nuanced understanding of the underlying data distribution.
3.1.1 Distance computation

The first step of our methodology involves computing the distances between each test data point and all training data points in the dataset. This initial distance calculation allows us to determine the proximity of the test data point to each individual training data point. By effectively capturing the local structure of the data, this step lays the foundation for our subsequent data aggregation process.

3.1.2 Selecting K nearest neighbors

With the distances calculated, we proceed to select K data points from each class that exhibit the shortest distances to the respective test data point. The K parameter serves as a crucial hyperparameter, enabling us to fine-tune the granularity of the data neighborhood considered during the classification process. This selection mechanism ensures that we capture a sufficient representation of the local data distribution for each class, facilitating robust classification outcomes.

3.1.3 Calculate centroids using power muirhead mean

To enrich our methodology further, we use the concept of Power Muirhead Mean, a powerful mathematical tool capable of capturing the essence of data distributions. The PMM uses a vector of parameters $P=(p_{1},p_{2},\ldots,p_{n})$ to adjust the weights assigned to different data points, allowing us to emphasize central data points over outliers.

These parameters were selected based on preliminary experiments and domain knowledge, aiming to balance the trade-off between robustness to outliers and sensitivity to the data’s central tendency.

3.1.4 Aggregating K nearest neighbors

With the K Nearest Neighbors selected for each class, we proceed to aggregate these data points using the Power Muirhead Mean. The aggregation process incorporates the $P$ vector, enabling the mean to adapt to the distribution characteristics of each class. By virtue of this adaptability, our approach offers a significant advantage over traditional KNN, as it effectively captures the data’s underlying distribution and provides a more informed and robust representation of each class’s centroid.

3.1.5 Predicting the test data class

Having obtained the Power Muirhead Mean centroids for each class, we assess the distance between the test data point and each of these centroids. The class corresponding to the centroid with the minimum distance to the test data point is considered the predicted class for the input data. This distance-based classification strategy ensures that we leverage the collective information of the K Nearest Neighbors while effectively incorporating the adaptability of the Power Muirhead Mean to produce accurate and reliable predictions.

PMM-KNN[1] $\{{x_{i}},{c_{i}}\}_{i=1}^{N}$ (Labeled Data)

$q$ (Query Sample)

$k$ $(1\leqslant k\leqslant N)$ (Parameter of KNN)

$P=({p_{1}},{p_{2}},\ldots,{p_{k}})$ (PMM Parameters) $y$ (The Label for Query Sample $q$ ) $i=1$ to $N$ $\textit{dis}{t_{{c_{i}}}}[x_{i}]={d_{\textit{EUC}}}(q,{x_{i}})$ c in classes Sort $\textit{dis}{t_{{c}}}$ by value $\textit{dis}{t_{{c}}=}$ First k elements of ${\textit{dis}{t_{{c}}}}$ Calculate $\textit{cent}{{}_{{c}}}$ by applying Eq. (5) on $\textit{dis}{t_{{c}}}$ keys $\textit{cDist}[c]={d_{\textit{EUC}}}(q,\textit{cent}{{}_{{c}}})$ $\mathop{\arg\min}\limits_{c}$ cDist

Figure 1.

Flowchart of the PMM-KNN algorithm.

3.2 Potential optimization techniques

While the PMM-KNN algorithm shows improved classification accuracy, it also incurs a marginal increase in computational time. To mitigate this issue in practical applications, several optimization techniques can be considered.

3.2.1 Dimensionality reduction

Reducing the dimensionality of the data before applying the PMM-KNN algorithm can decrease computational complexity. Techniques such as Principal Component Analysis (PCA) or t-Distributed Stochastic Neighbor Embedding (t-SNE) can be employed to project high-dimensional data into a lower-dimensional space while preserving essential data structure. This reduction in dimensionality can significantly decrease the computational burden during the distance computation and aggregation phases.

3.2.2 Efficient data structures

Implementing efficient data structures such as k-d trees or ball trees can expedite the nearest neighbor search process. These data structures enable faster querying of k-nearest neighbors, especially in high-dimensional spaces, thereby reducing overall computational time.

3.2.3 Parallel processing

Parallel processing techniques can distribute the computational load across multiple processors or cores. By parallelizing distance computations and centroid calculations, the PMM-KNN algorithm can achieve a substantial reduction in execution time.

3.2.4 Approximate nearest neighbors

Approximate nearest neighbor search algorithms, such as Locality-Sensitive Hashing (LSH), offer a trade-off between accuracy and computational efficiency by approximating the k-nearest neighbors instead of computing them exactly. This approach can lead to significant speedups with minimal impact on classification accuracy.

3.2.5 Incremental updates

For dynamic datasets where new data points are continuously added, incremental update techniques can be employed. These techniques allow for efficient updates by only processing new data points and their impact on existing centroids, rather than recomputing the entire PMM-KNN model from scratch.

In summary, our method amalgamates the strengths of K Nearest Neighbors and Power Muirhead Mean, paving the way for a more sophisticated and powerful approach to data classification. By seamlessly integrating data neighborhood selection with adaptable data aggregation, our methodology outperforms traditional KNN algorithms and other state-of-the-art classification techniques, setting a new standard for accuracy and reliability in data analysis and pattern recognition tasks.

4. Data and testing

In this section, we will discuss the datasets that we have chosen to use as our benchmarks, and then we will explain the assessment approach that we have chosen for our model.

4.1 Datasets

In this research, we have used five real-world datasets, which have shown in Table 1.

Table 1
All datasets used for evaluation of the PMM-KNN model

Dataset	Description
Iris dataset	The dataset contain 150 sample data in it. The dataset has three classes of data that are Setosa, Versicolor and Virginica each having 50 sample data.
Breast cancer wisconsin	Wisconsin-Breast Cancer (Diagnostics) dataset (WBC) from UCI machine learning repository is a classification dataset, which records the measurements for breast cancer cases. There are two classes, benign and malignant. The malignant class of this dataset is downsampled to 21 points, which are considered as outliers, while points in the benign class are considered inliers.
Digits	The original optical recognition of handwritten digits dataset from UCI machine learning repository is a multi-class classification dataset. The instances of digits 1 9 are inliers and instances of digit 0 are down-sampled to 150 outliers.
Landsat satellite (statlog)	The original Statlog (Landsat Satellite) dataset from UCI machine learning repository is a multi-class classification dataset. Here, the training and test data are combined. The smallest three classes, i.e. 2, 4, 5 are combined to form the outliers class, while all the other classes are combined to form an inlier class.
EEG eye state	The data set consists of 14 EEG values and a value indicating the eye state. All data is from one continuous EEG measurement with the Emotiv EEG Neuroheadset. The duration of the measurement was 117 seconds. The eye state was detected via a camera during the EEG measurement and added later manually to the file after analyzing the video frames. ‘1’ indicates the eye-closed and ‘0’ the eye-open state. All values are in chronological order with the first measured value at the top of the data.

4.2 Performance assessment

In this study, we acknowledge certain limitations and assumptions in our experimental design. One limitation is the selection of datasets, which may not encompass the full diversity of real-world data distributions. Consequently, the generalizability of our results to all possible datasets remains uncertain. Additionally, the choice of PMM parameters, while optimized for our selected datasets, may require further tuning for different data scenarios.

To address these limitations, future work should include testing the PMM-KNN algorithm on a broader range of datasets, including those from real-world applications, to validate its effectiveness and versatility. Moreover, exploring automatic parameter selection methods for PMM can enhance the algorithm’s adaptability across diverse datasets.

Next, we are going to briefly introduce the criteria that have been used to measure our model performance. As we know, the proportion of true predictions to all predictions named Accuracy can not describe our model solus. Therefore, we have introduced two other assessments consisting of Specificity and Sensitivity.

4.2.1 Specificity and sensitivity

The ability of a model to correctly identify true positives in each of the given categories is measured by the parameter known as Sensitivity. A model’s capacity to assess and forecast the true negatives of each accessible category is referred to as its specificity. All category models may benefit from using these measures. The equation that needs to be used to calculate these metrics is provided down below. As we need to define positive and negative classes, we should select all possible pairs from classes and assign “positive” and “negative” labels to them. Then, a True Positive (TP) is a result for which the model accurately predicted the positive class. A True Negative (TN) is a result in which the model accurately predicts the negative class. False Positives (FP) are outcomes in which the model forecasts the positive class inaccurately, and finally, a False Negative (FN) is an outcome where the model incorrectly predicts the negative class.

$\displaystyle\textit{Sensitivity}=\frac{\textit{TP}}{{\textit{TP}+\textit{FN}}}$ (6) $\displaystyle\textit{Specificity}=\frac{\textit{TN}}{\textit{FP}+\textit{TN}}$ (7)

4.3 Parameter selection

Selecting optimal parameters for the Power Muirhead Mean (PMM) is crucial for the performance of the PMM-KNN algorithm. The parameters $P=(p_{1},p_{2},\ldots,p_{n})$ in the PMM determine the weights assigned to different data points, impacting the robustness and adaptability of the aggregation process. In this study, parameters were manually tuned based on preliminary experiments and domain knowledge. However, for practical applications, automated parameter selection methods can be explored to enhance the algorithm’s adaptability across diverse datasets.

4.3.1 Grid search

Grid search involves exhaustively searching a predefined set of parameter values to identify the combination that yields the best performance. This method, although computationally intensive, provides a comprehensive evaluation of the parameter space.

4.3.2 Random search

Random search offers a more efficient alternative to grid search by randomly sampling parameter values from predefined distributions. This approach can often find near-optimal parameter settings with fewer evaluations.

4.3.3 Bayesian optimization

Bayesian optimization is a probabilistic model-based approach that iteratively selects the most promising parameter values based on past evaluations. This method balances exploration and exploitation, leading to efficient convergence to optimal or near-optimal parameters.

4.3.4 Cross-validation

Integrating cross-validation within the parameter selection process provides robust estimates of model performance for different parameter settings. Evaluating the algorithm on multiple folds of data ensures that the selected parameters generalize well to unseen data.

5. Results

In this section, we will discuss the experiment results and analysis of the proposed method.

We have thoroughly compared the performance of the PMM-KNN method with several state-of-the-art classification methods, including K-Nearest Neighbors (KNN), Support Vector Machine (SVM), and Naïve Bayes (NB). To ensure fair and meaningful comparisons, we diligently optimized the implementation details and settings of all comparative methods. For each method, we carefully tuned the hyperparameters using cross-validation techniques, aiming to achieve the best possible performance.

Initially, we split our datasets into training and testing sets and utilized the same data partitions for evaluating all classifiers. Subsequently, we calculated the mean accuracy of each method using a robust 10-fold cross-validation approach. To optimize PMM-KNN, we considered the $P$ vector as a ones chain with the length of $p$ , maintaining simplicity while fine-tuning the algorithm for optimal results. Additionally, we selected Eq. (8) as the Support function, further enhancing the effectiveness of PMM-KNN.

The support equation Eq. (8) is essential for determining the degree of influence or support one data point provides to another within the Power Muirhead Mean framework. This support measure helps in understanding the relationships between different data points and influences the aggregation process.

The Eq. (8), ensures that the support value is normalized between 0 and 1. When the distance is zero (i.e., the data points are identical), the support is at its maximum value of 1. As the distance increases, the support value approaches zero.

The optimization process for the comparative methods, including KNN, SVM, and NB, was carried out meticulously. We explored a wide range of hyperparameter values, fine-tuning them through cross-validation, to ensure that each method’s settings were optimal. This attention to detail guarantees that the comparison between PMM-KNN and other classifiers is fair and unbiased.

By carefully optimizing the parameters and settings of all methods involved in the comparison, we can confidently assert that the reported performance improvements of PMM-KNN are not due to an unfair advantage in parameter selection. Instead, our findings accurately reflect the genuine superiority of PMM-KNN in comparison to the state-of-the-art classification methods. We trust that these optimization efforts strengthen the validity and reliability of our results, ensuring a comprehensive evaluation of the proposed PMM-KNN algorithm.

$\displaystyle\textit{Sup}({\alpha_{i}},{\alpha j})=\frac{1}{{1+d({\alpha_{i}},% {\alpha_{j}})}}$ (8)

The evaluation results for various datasets and classification methods are presented in Table 2. The table shows the mean accuracy along with sensitivity and specificity for each method on different datasets.

Table 2

Datasets evaluation results

Dataset	Measure	PMM-KNN	GNB	SVM	KNN
Landsat satellite	Accuracy	0.924	0.796	0.897	0.895
	Sensitivity	0.901	0.785	0.870	0.869
	Specificity	0.985	0.960	0.979	0.979
Iris dataset	Accuracy	0.980	0.953	0.960	0.960
	Sensitivity	0.983	0.950	0.963	0.962
	Specificity	0.990	0.977	0.980	0.980
Breast cancer	Accuracy	0.940	0.939	0.921	0.930
	Sensitivity	0.935	0.931	0.900	0.917
	Specificity	0.935	0.931	0.900	0.917
EEG eye state	Accuracy	0.979	0.452	0.551	0.943
	Sensitivity	0.978	0.498	0.500	0.941
	Specificity	0.978	0.498	0.500	0.941
Digits	Accuracy	0.993	0.839	0.988	0.986
	Sensitivity	0.993	0.835	0.989	0.985
	Specificity	0.999	0.982	0.999	0.998

Table 3

Comparison of PMM-KNN with other methods

Dataset	Method	Mean accuracy	STD accuracy	$P$ -value	Significant difference
Landsat satellite	PMM-KNN	0.9248	0.0078	–	–
	GNB	0.7958	0.0112	$<0.001$	Yes
	SVM	0.8968	0.0119	$<0.001$	Yes
	KNN	0.8954	0.0136	$<0.001$	Yes
Iris dataset	PMM-KNN	0.9800	0.0427	–	–
	GNB	0.9533	0.0427	0.117	No
	SVM	0.9600	0.0611	0.054	No
	KNN	0.9600	0.0611	0.054	No
Breast cancer	PMM-KNN	0.9403	0.0325	–	–
	GNB	0.9385	0.0306	0.676	No
	SVM	0.9209	0.0265	0.014	Yes
	KNN	0.9297	0.0282	0.138	No
EEG eye state	PMM-KNN	0.9786	0.0021	–	–
	GNB	0.4523	0.0177	$<0.001$	Yes
	SVM	0.5512	0.0134	$<0.001$	Yes
	KNN	0.9426	0.0031	$<0.001$	Yes
Digits	PMM-KNN	0.9928	0.0036	–	–
	GNB	0.8386	0.0303	$<0.001$	Yes
	SVM	0.9878	0.0065	$<0.001$	Yes
	KNN	0.9855	0.0083	$<0.001$	Yes

5.1 Statistical analysis

The PMM-KNN algorithm showed significant improvements in datasets with complex and high-dimensional distributions, such as the Landsat Satellite and EEG Eye State datasets. This can be attributed to the PMM’s ability to capture intricate data relationships and mitigate the impact of outliers.

In contrast, the improvements were less pronounced in simpler datasets like Iris, where the data structure is less complex, and traditional KNN already performs well. These variations highlight the strength of PMM-KNN in handling challenging classification scenarios.

To statistically validate the accuracy differences, we conducted paired t-tests for each dataset and method comparison. The results are shown in Table 3. The p-values indicate whether the accuracy differences are statistically significant or not. If the p-value is less than 0.05, the difference is considered statistically significant.

In the Landsat Satellite dataset, PMM-KNN exhibited statistically significant improvements over Gaussian Naive Bayes (GNB), Support Vector Machine (SVM), and traditional K-Nearest Neighbors (KNN). Although PMM-KNN did not show significant differences on the Iris dataset, it maintained competitive performance. On the Breast Cancer dataset, PMM-KNN outperformed SVM with a statistically significant difference. For the EEG Eye State dataset, PMM-KNN achieved significant improvements over GNB, SVM, and KNN. Lastly, on the Digits dataset, PMM-KNN demonstrated outstanding performance, significantly outperforming GNB, SVM, and KNN. These consistent significant differences indicate the superior capability of PMM-KNN in handling complex data distributions and its potential as a powerful approach for various data classification tasks.

5.2 Computational efficiency

In terms of computational efficiency, PMM-KNN was compared with traditional KNN. The additional computations involved in PMM did result in a marginal increase in processing time. However, this increase is justified by the significant gains in classification accuracy and robustness. Future work should explore optimization techniques to reduce computational overhead while maintaining performance improvements.

5.3 Theoretical impact of potential optimization techniques

While the current implementation of the PMM-KNN algorithm demonstrates significant improvements in classification accuracy, it also incurs a marginal increase in computational time. The aforementioned optimization techniques, although not yet implemented, suggest promising avenues for future work to enhance computational efficiency.

By incorporating dimensionality reduction, efficient data structures, and parallel processing, it is anticipated that the computational time can be significantly reduced without compromising classification accuracy. Approximate nearest neighbor search and incremental updates provide additional strategies to balance computational efficiency with accuracy.

Overall, our experimental results demonstrate the effectiveness of PMM-KNN across various datasets. However, the degree of improvement varies. It consistently outperformed traditional KNN and showed significant improvements over other state-of-the-art methods, especially on complex and high-dimensional datasets. The adaptability of the Power Muirhead Mean enabled PMM-KNN to capture the underlying data distributions more effectively, leading to enhanced accuracy and robustness in data classification.

These results underscore the potential of PMM-KNN as a powerful and versatile approach for a wide range of data classification tasks. Future research could focus on further refining the parameter selection process for the Power Muirhead Mean and exploring its application in other machine learning algorithms. The automated tuning of the Power Muirhead Mean parameters could offer even more significant improvements and further simplify the application of PMM-KNN in real-world scenarios.

6. Conclusion and future work

In this paper, we introduced the innovative Power Muirhead Mean K-Nearest Neighbors (PMM-KNN) algorithm, which has demonstrated remarkable performance in data classification tasks. By combining the power of the traditional K-Nearest Neighbors with the adaptability of the Power Muirhead Mean, our proposed methodology effectively addresses some of the limitations of conventional KNN approaches and enhances the accuracy and robustness of the classification process.

Through extensive experimentation on various benchmark datasets, we have shown that the PMM-KNN algorithm outperforms traditional KNN and other state-of-the-art classification methods in most cases. Utilizing Power Muirhead Mean empowers the algorithm to capture the underlying data distribution more effectively, thereby improving the centroid estimation process. Consequently, our approach offers superior performance in handling data with varying distributions, making it highly suitable for real-world applications.

6.1 Potential impact

The PMM-KNN algorithm holds significant potential for various real-world applications, particularly in domains requiring robust and accurate classification of high-dimensional data. Examples include medical diagnostics, where accurate classification of complex data can lead to better patient outcomes, and remote sensing, where precise classification of satellite imagery is crucial for environmental monitoring.

6.2 Future work

Future research should focus on automating the selection of PMM parameters. Potential methodologies include machine learning techniques such as hyperparameter optimization frameworks (e.g., Bayesian optimization, grid search) and adaptive algorithms that can dynamically adjust parameters based on the dataset characteristics. Such advancements will further enhance the usability and adaptability of PMM-KNN in diverse applications.

In conclusion, the Power Muirhead Mean K-Nearest Neighbors algorithm represents a significant advancement in the field of data classification. By fusing the strengths of KNN with the adaptability of the Power Muirhead Mean, our proposed methodology exhibits promising potential for addressing complex classification challenges. We encourage researchers to explore and extend the application of the PMM-KNN algorithm to various domains and datasets, as it holds the promise of delivering enhanced accuracy and robustness in data analysis and pattern recognition tasks.

7. Declarations

7.1 Code availability

To promote research transparency and facilitate the reproducibility of our findings, we confirm that the implementation of the PMM-KNN algorithm is available upon request. Researchers interested in accessing the code may contact us at kourosh@aut.ac.ir. We are committed to supporting the scientific community in their efforts to validate and extend our work. By providing access to the code, we aim to foster collaboration and enhance the rigor of scientific inquiry in the field of machine learning and data classification.

7.2 Data availability

The datasets analyzed during the current study are available in the following links:

•
Iris Dataset https://archive.ics.uci.edu/ml/datasets/iris;
•
Breast Cancer Wisconsin https://archive.ics.uci.edu/ml/datasets/breast+cancer+wisconsin+(diagnostic);
•
Digits https://archive.ics.uci.edu/ml/datasets/Optical+Recognition+of+Handwritten+Digits;
•
Landsat Satellite https://archive.ics.uci.edu/ml/datasets/Statlog+(Landsat+Satellite);
•
EEG Eye State https://archive.ics.uci.edu/ml/datasets/EEG+Eye+State.

7.3 Competing interests

All authors certify that they have no affiliations with or involvement in any organization or entity with any financial interest or non-financial interest in the subject matter or materials discussed in this manuscript.

References

Keller

Gray

Givens

. Afuzzy K-nearest neighbor algorithm. IEEE Transactions on Systems, Man, and Cybernetics. 1985; SMC-15(4): 580-585. doi: 10.1109/TSMC.1985.6313426.

Raudys

Jain

, et al. Small sample size effects in statistical pattern recognition: Recommendations for practitioners. IEEE Transactions on pattern analysis and machine intelligence. 1991; 13(3): 252-264.

Cover

Hart

. Nearest neighbor pattern classification. IEEE transactions on information theory. 1967; 13(1): 21-27.

Fukunaga

. Introduction to statistical pattern recognition. Elsevier; 2013.

Coomans

Massart

. Alternative k-nearest neighbour rules insupervised pattern recognition: Part1. k-Nearest neighbour classification by using alternative voting rules. Analytica Chimica Acta. 1982; 136: 15-27.

Hajizadeh

Aghagolzadeh

Ezoji

. Mutual neighborhood and modified majority voting based KNN classifier for multi-categories classification. Pattern Analysis and Applications. 2022; 25(4): 773-793.

Liu

Teng

. Some Muirhead mean operators for probabilistic linguistic term sets and their applications to multiple attribute decision-making. Applied Soft Computing. 2018; 68: 396-431.

Sun

Huang

. An adaptive k-nearest neighbor algorithm. In: 2010 seventh international conference on fuzzy systems and knowledge discovery. IEEE. 2010; 1: 91-94.

Muirhead

. Some methods applicable to identities and inequalities of symmetric algebraic functions of n letters. Proceedingsof the Edinburgh Mathematical Society. 1902; 21: 144-162.

10.

Liu

. Some Muirhead mean operators for intuitionistic fuzzy numbers and their applications to group decision making. PloS One. 2017; 12(1): e0168767.

11.

Bonferroni

. Sulle medie multiple di potenze. Bollettino dell’Unione Matematica Italiana. 1950; 5(3–4): 267-270.

12.

Maclaurin

. A second letter to Martin Folkes, Esq.; concerning the roots of equations, with demonstration of other rules of algebra. Philos Trans R Soc Lond Ser A. 1729; 1729(36): 59-96.

13.

Yager

. The power average operator. IEEE Transactions on Systems, Man, and Cybernetics-Part A: Systems and Humans. 2001; 31(6): 724-731.

14.

Zhang

Wang

Zhu

Xing

. Pythagorean fuzzy power Muirhead mean operators with their application to multi-attribute decision making. Journal of Intelligent and Fuzzy Systems. 2018; 35(2): 2035-2050.