Abstract
This paper introduces the innovative Power Muirhead Mean K-Nearest Neighbors (PMM-KNN) algorithm, a novel data classification approach that combines the K-Nearest Neighbors method with the adaptive Power Muirhead Mean operator. The proposed methodology aims to address the limitations of traditional KNN by leveraging the Power Muirhead Mean for calculating the local means of K-nearest neighbors in each class to the query sample. Extensive experimentation on diverse benchmark datasets demonstrates the superiority of PMM-KNN over other classification methods. Results indicate statistically significant improvements in accuracy on various datasets, particularly those with complex and high-dimensional distributions. The adaptability of the Power Muirhead Mean empowers PMM-KNN to effectively capture underlying data structures, leading to enhanced accuracy and robustness. The findings highlight the potential of PMM-KNN as a powerful and versatile tool for data classification tasks, encouraging further research to explore its application in real-world scenarios and the automation of Power Muirhead Mean parameters to unleash its full potential.
Introduction
Numerous domains, including pattern recognition and artificial intelligence, statistics, cognitive psychology, vision analysis, and medicine, rely on object classification for their research and practical applications.
The K-Nearest Neighbors (KNN) classifier tackles the problem of classification by first determining the distance between a new sample to be classified and the training samples, then observing the k-nearest neighbors for the new sample, and finally deciding whether or not the new sample belongs to the class that shares the most neighbors with the new sample [1].
Typically, KNN is a useful classifier, but it is worth mentioning that, because the parametric form of the density function cannot be assumed, non-parametric classifiers need a large number of samples. When working with a tiny sample size, this might be a serious problem [2, 3, 4].
Moreover, a well-known problem with KNN classifiers is that they are susceptible to outliers that distort the class distribution [4].
The majority voting principle is the foundation of KNN, which determines the class of a new sample by looking at its closest neighbors and the class that constitutes the majority of those neighbors. In the case that a data set is obviously unbalanced, one of the drawbacks of the majority voting principle is that the classified samples of the class or classes that have a large number of samples have a tendency to dominate the prediction of the new sample. This is simply due to the fact that they are frequently more numerous among the k nearest neighbors [5, 6].
The Muirhead Mean (MM), first introduced by Muirhead in 1902, is a generic aggregation function. It can be defined as a function of means and has been used in various applications owing to its ability to acknowledge interrelationships [7].
The Power Muirhead Mean (PMM) is an extension of the MM, designed to mitigate the adverse impact of outliers and emphasize central data points more effectively. This is achieved by combining the MM with a Power Average (PA), thus allowing the algorithm to adapt to various data distributions more robustly.
In this research, we offer a method for overcoming the limitations of the KNN classifier by using the Power Muirhead Mean of K nearest neighbors for each class separately.
Literature review
This section summarizes the theoretical foundations of the KNN classifier, Muirhead Mean (MM) Operator, Power Average (PA), and Power Muirhead Mean (PMM).
K-nearest neighbors classifier
The classic KNN algorithm is one of the earliest and most basic techniques for classifying patterns. In spite of this, it often produces competitive outcomes, and in particular fields, when supplemented with previous knowledge, it has greatly improved the state of the art. The KNN rule classifies each unlabeled instance according to the majority label of its k-nearest neighbors in the training set. Therefore, its success is highly dependent on the distance measure used to locate the closest neighbors. In the absence of previous information, the majority of KNN classifiers utilize a basic Euclidean metric to quantify dissimilarities across vector input instances. The following equation describes Euclidean distance [8].
where we define vectors
where
The Muirhead Mean Operator provides a general aggregation function, and it is defined as follows:
Let
We call
Moreover, from Eq. (3), we can know:
From the Eq. (3) and the special cases of MM operator mentioned above, we can know that the advantage of the MM operator is that it can capture the overall interrelationships among the multiple aggregated arguments and it is a generalization of most existing aggregation operators [10].
PA operator was introduced for the first time by Yager [13] for crisp numbers. The most significant benefit of the PA operator is its ability to mitigate the adverse impact of excessively high and low arguments on the final findings.
Let
where
If
For simplicity, we’ll refer to the above properties as Property 1 [14].
Li et al. [14] proposed the Power Muirhead Mean by combining PA with MM.
Let
where
In this section, we have introduced a new version of KNN which uses PMM to overcome the alluded issues of naïve KNN.
K-nearest neighbors based on power muirhead mean
Our proposed methodology represents a fusion of the K Nearest Neighbors (KNN) algorithm with the innovative Power Muirhead Mean, changing the landscape of data classification. This novel approach aims to enhance the accuracy and robustness of traditional KNN by efficiently handling data distributions and emphasizing central data points while mitigating the impact of outliers.
The PMM-KNN algorithm diverges from traditional KNN and its modifications in several key ways:
Outlier Robustness: By incorporating the Power Average with the Muirhead Mean, PMM-KNN reduces the influence of outliers on the classification process. Adaptive Aggregation: The use of PMM allows for a more flexible aggregation of nearest neighbors, taking into account the interrelationships between data points more effectively than traditional means. Improved Centroid Calculation: PMM provides a sophisticated method for centroid calculation, leading to more accurate class representation and, consequently, better classification performance.
The theoretical implications of this combination lie in the enhanced ability to capture complex data structures and relationships, which is crucial for high-dimensional and imbalanced datasets. By leveraging the adaptive nature of PMM, our approach not only improves classification accuracy but also provides a more nuanced understanding of the underlying data distribution.
The first step of our methodology involves computing the distances between each test data point and all training data points in the dataset. This initial distance calculation allows us to determine the proximity of the test data point to each individual training data point. By effectively capturing the local structure of the data, this step lays the foundation for our subsequent data aggregation process.
Selecting K nearest neighbors
With the distances calculated, we proceed to select K data points from each class that exhibit the shortest distances to the respective test data point. The K parameter serves as a crucial hyperparameter, enabling us to fine-tune the granularity of the data neighborhood considered during the classification process. This selection mechanism ensures that we capture a sufficient representation of the local data distribution for each class, facilitating robust classification outcomes.
Calculate centroids using power muirhead mean
To enrich our methodology further, we use the concept of Power Muirhead Mean, a powerful mathematical tool capable of capturing the essence of data distributions. The PMM uses a vector of parameters
These parameters were selected based on preliminary experiments and domain knowledge, aiming to balance the trade-off between robustness to outliers and sensitivity to the data’s central tendency.
Aggregating K nearest neighbors
With the K Nearest Neighbors selected for each class, we proceed to aggregate these data points using the Power Muirhead Mean. The aggregation process incorporates the
Predicting the test data class
Having obtained the Power Muirhead Mean centroids for each class, we assess the distance between the test data point and each of these centroids. The class corresponding to the centroid with the minimum distance to the test data point is considered the predicted class for the input data. This distance-based classification strategy ensures that we leverage the collective information of the K Nearest Neighbors while effectively incorporating the adaptability of the Power Muirhead Mean to produce accurate and reliable predictions.
PMM-KNN[1]
Flowchart of the PMM-KNN algorithm.
While the PMM-KNN algorithm shows improved classification accuracy, it also incurs a marginal increase in computational time. To mitigate this issue in practical applications, several optimization techniques can be considered.
Dimensionality reduction
Reducing the dimensionality of the data before applying the PMM-KNN algorithm can decrease computational complexity. Techniques such as Principal Component Analysis (PCA) or t-Distributed Stochastic Neighbor Embedding (t-SNE) can be employed to project high-dimensional data into a lower-dimensional space while preserving essential data structure. This reduction in dimensionality can significantly decrease the computational burden during the distance computation and aggregation phases.
Efficient data structures
Implementing efficient data structures such as k-d trees or ball trees can expedite the nearest neighbor search process. These data structures enable faster querying of k-nearest neighbors, especially in high-dimensional spaces, thereby reducing overall computational time.
Parallel processing
Parallel processing techniques can distribute the computational load across multiple processors or cores. By parallelizing distance computations and centroid calculations, the PMM-KNN algorithm can achieve a substantial reduction in execution time.
Approximate nearest neighbors
Approximate nearest neighbor search algorithms, such as Locality-Sensitive Hashing (LSH), offer a trade-off between accuracy and computational efficiency by approximating the k-nearest neighbors instead of computing them exactly. This approach can lead to significant speedups with minimal impact on classification accuracy.
Incremental updates
For dynamic datasets where new data points are continuously added, incremental update techniques can be employed. These techniques allow for efficient updates by only processing new data points and their impact on existing centroids, rather than recomputing the entire PMM-KNN model from scratch.
In summary, our method amalgamates the strengths of K Nearest Neighbors and Power Muirhead Mean, paving the way for a more sophisticated and powerful approach to data classification. By seamlessly integrating data neighborhood selection with adaptable data aggregation, our methodology outperforms traditional KNN algorithms and other state-of-the-art classification techniques, setting a new standard for accuracy and reliability in data analysis and pattern recognition tasks.
Data and testing
In this section, we will discuss the datasets that we have chosen to use as our benchmarks, and then we will explain the assessment approach that we have chosen for our model.
Datasets
In this research, we have used five real-world datasets, which have shown in Table 1.
All datasets used for evaluation of the PMM-KNN model
All datasets used for evaluation of the PMM-KNN model
In this study, we acknowledge certain limitations and assumptions in our experimental design. One limitation is the selection of datasets, which may not encompass the full diversity of real-world data distributions. Consequently, the generalizability of our results to all possible datasets remains uncertain. Additionally, the choice of PMM parameters, while optimized for our selected datasets, may require further tuning for different data scenarios.
To address these limitations, future work should include testing the PMM-KNN algorithm on a broader range of datasets, including those from real-world applications, to validate its effectiveness and versatility. Moreover, exploring automatic parameter selection methods for PMM can enhance the algorithm’s adaptability across diverse datasets.
Next, we are going to briefly introduce the criteria that have been used to measure our model performance. As we know, the proportion of true predictions to all predictions named Accuracy can not describe our model solus. Therefore, we have introduced two other assessments consisting of Specificity and Sensitivity.
Specificity and sensitivity
The ability of a model to correctly identify true positives in each of the given categories is measured by the parameter known as Sensitivity. A model’s capacity to assess and forecast the true negatives of each accessible category is referred to as its specificity. All category models may benefit from using these measures. The equation that needs to be used to calculate these metrics is provided down below. As we need to define positive and negative classes, we should select all possible pairs from classes and assign “positive” and “negative” labels to them. Then, a True Positive (TP) is a result for which the model accurately predicted the positive class. A True Negative (TN) is a result in which the model accurately predicts the negative class. False Positives (FP) are outcomes in which the model forecasts the positive class inaccurately, and finally, a False Negative (FN) is an outcome where the model incorrectly predicts the negative class.
Selecting optimal parameters for the Power Muirhead Mean (PMM) is crucial for the performance of the PMM-KNN algorithm. The parameters
Grid search
Grid search involves exhaustively searching a predefined set of parameter values to identify the combination that yields the best performance. This method, although computationally intensive, provides a comprehensive evaluation of the parameter space.
Random search
Random search offers a more efficient alternative to grid search by randomly sampling parameter values from predefined distributions. This approach can often find near-optimal parameter settings with fewer evaluations.
Bayesian optimization
Bayesian optimization is a probabilistic model-based approach that iteratively selects the most promising parameter values based on past evaluations. This method balances exploration and exploitation, leading to efficient convergence to optimal or near-optimal parameters.
Cross-validation
Integrating cross-validation within the parameter selection process provides robust estimates of model performance for different parameter settings. Evaluating the algorithm on multiple folds of data ensures that the selected parameters generalize well to unseen data.
Results
In this section, we will discuss the experiment results and analysis of the proposed method.
We have thoroughly compared the performance of the PMM-KNN method with several state-of-the-art classification methods, including K-Nearest Neighbors (KNN), Support Vector Machine (SVM), and Naïve Bayes (NB). To ensure fair and meaningful comparisons, we diligently optimized the implementation details and settings of all comparative methods. For each method, we carefully tuned the hyperparameters using cross-validation techniques, aiming to achieve the best possible performance.
Initially, we split our datasets into training and testing sets and utilized the same data partitions for evaluating all classifiers. Subsequently, we calculated the mean accuracy of each method using a robust 10-fold cross-validation approach. To optimize PMM-KNN, we considered the
The support equation Eq. (8) is essential for determining the degree of influence or support one data point provides to another within the Power Muirhead Mean framework. This support measure helps in understanding the relationships between different data points and influences the aggregation process.
The Eq. (8), ensures that the support value is normalized between 0 and 1. When the distance is zero (i.e., the data points are identical), the support is at its maximum value of 1. As the distance increases, the support value approaches zero.
The optimization process for the comparative methods, including KNN, SVM, and NB, was carried out meticulously. We explored a wide range of hyperparameter values, fine-tuning them through cross-validation, to ensure that each method’s settings were optimal. This attention to detail guarantees that the comparison between PMM-KNN and other classifiers is fair and unbiased.
By carefully optimizing the parameters and settings of all methods involved in the comparison, we can confidently assert that the reported performance improvements of PMM-KNN are not due to an unfair advantage in parameter selection. Instead, our findings accurately reflect the genuine superiority of PMM-KNN in comparison to the state-of-the-art classification methods. We trust that these optimization efforts strengthen the validity and reliability of our results, ensuring a comprehensive evaluation of the proposed PMM-KNN algorithm.
The evaluation results for various datasets and classification methods are presented in Table 2. The table shows the mean accuracy along with sensitivity and specificity for each method on different datasets.
Datasets evaluation results
Comparison of PMM-KNN with other methods
The PMM-KNN algorithm showed significant improvements in datasets with complex and high-dimensional distributions, such as the Landsat Satellite and EEG Eye State datasets. This can be attributed to the PMM’s ability to capture intricate data relationships and mitigate the impact of outliers.
In contrast, the improvements were less pronounced in simpler datasets like Iris, where the data structure is less complex, and traditional KNN already performs well. These variations highlight the strength of PMM-KNN in handling challenging classification scenarios.
To statistically validate the accuracy differences, we conducted paired t-tests for each dataset and method comparison. The results are shown in Table 3. The p-values indicate whether the accuracy differences are statistically significant or not. If the p-value is less than 0.05, the difference is considered statistically significant.
In the Landsat Satellite dataset, PMM-KNN exhibited statistically significant improvements over Gaussian Naive Bayes (GNB), Support Vector Machine (SVM), and traditional K-Nearest Neighbors (KNN). Although PMM-KNN did not show significant differences on the Iris dataset, it maintained competitive performance. On the Breast Cancer dataset, PMM-KNN outperformed SVM with a statistically significant difference. For the EEG Eye State dataset, PMM-KNN achieved significant improvements over GNB, SVM, and KNN. Lastly, on the Digits dataset, PMM-KNN demonstrated outstanding performance, significantly outperforming GNB, SVM, and KNN. These consistent significant differences indicate the superior capability of PMM-KNN in handling complex data distributions and its potential as a powerful approach for various data classification tasks.
Computational efficiency
In terms of computational efficiency, PMM-KNN was compared with traditional KNN. The additional computations involved in PMM did result in a marginal increase in processing time. However, this increase is justified by the significant gains in classification accuracy and robustness. Future work should explore optimization techniques to reduce computational overhead while maintaining performance improvements.
Theoretical impact of potential optimization techniques
While the current implementation of the PMM-KNN algorithm demonstrates significant improvements in classification accuracy, it also incurs a marginal increase in computational time. The aforementioned optimization techniques, although not yet implemented, suggest promising avenues for future work to enhance computational efficiency.
By incorporating dimensionality reduction, efficient data structures, and parallel processing, it is anticipated that the computational time can be significantly reduced without compromising classification accuracy. Approximate nearest neighbor search and incremental updates provide additional strategies to balance computational efficiency with accuracy.
Overall, our experimental results demonstrate the effectiveness of PMM-KNN across various datasets. However, the degree of improvement varies. It consistently outperformed traditional KNN and showed significant improvements over other state-of-the-art methods, especially on complex and high-dimensional datasets. The adaptability of the Power Muirhead Mean enabled PMM-KNN to capture the underlying data distributions more effectively, leading to enhanced accuracy and robustness in data classification.
These results underscore the potential of PMM-KNN as a powerful and versatile approach for a wide range of data classification tasks. Future research could focus on further refining the parameter selection process for the Power Muirhead Mean and exploring its application in other machine learning algorithms. The automated tuning of the Power Muirhead Mean parameters could offer even more significant improvements and further simplify the application of PMM-KNN in real-world scenarios.
Conclusion and future work
In this paper, we introduced the innovative Power Muirhead Mean K-Nearest Neighbors (PMM-KNN) algorithm, which has demonstrated remarkable performance in data classification tasks. By combining the power of the traditional K-Nearest Neighbors with the adaptability of the Power Muirhead Mean, our proposed methodology effectively addresses some of the limitations of conventional KNN approaches and enhances the accuracy and robustness of the classification process.
Through extensive experimentation on various benchmark datasets, we have shown that the PMM-KNN algorithm outperforms traditional KNN and other state-of-the-art classification methods in most cases. Utilizing Power Muirhead Mean empowers the algorithm to capture the underlying data distribution more effectively, thereby improving the centroid estimation process. Consequently, our approach offers superior performance in handling data with varying distributions, making it highly suitable for real-world applications.
Potential impact
The PMM-KNN algorithm holds significant potential for various real-world applications, particularly in domains requiring robust and accurate classification of high-dimensional data. Examples include medical diagnostics, where accurate classification of complex data can lead to better patient outcomes, and remote sensing, where precise classification of satellite imagery is crucial for environmental monitoring.
Future work
Future research should focus on automating the selection of PMM parameters. Potential methodologies include machine learning techniques such as hyperparameter optimization frameworks (e.g., Bayesian optimization, grid search) and adaptive algorithms that can dynamically adjust parameters based on the dataset characteristics. Such advancements will further enhance the usability and adaptability of PMM-KNN in diverse applications.
In conclusion, the Power Muirhead Mean K-Nearest Neighbors algorithm represents a significant advancement in the field of data classification. By fusing the strengths of KNN with the adaptability of the Power Muirhead Mean, our proposed methodology exhibits promising potential for addressing complex classification challenges. We encourage researchers to explore and extend the application of the PMM-KNN algorithm to various domains and datasets, as it holds the promise of delivering enhanced accuracy and robustness in data analysis and pattern recognition tasks.
Declarations
Code availability
To promote research transparency and facilitate the reproducibility of our findings, we confirm that the implementation of the PMM-KNN algorithm is available upon request. Researchers interested in accessing the code may contact us at kourosh@aut.ac.ir. We are committed to supporting the scientific community in their efforts to validate and extend our work. By providing access to the code, we aim to foster collaboration and enhance the rigor of scientific inquiry in the field of machine learning and data classification.
Data availability
The datasets analyzed during the current study are available in the following links:
Iris Dataset Breast Cancer Wisconsin Digits Landsat Satellite EEG Eye State
All authors certify that they have no affiliations with or involvement in any organization or entity with any financial interest or non-financial interest in the subject matter or materials discussed in this manuscript.
