Abstract
Polycystic Ovary Syndrome (PCOS) is a hormonal condition that typically affects women during their reproductive years. It is characterized by disruptions in hormonal balance, particularly elevated levels of androgens (male hormones) in the female body. PCOS can lead to various symptoms and health complications, including irregular menstrual cycles, ovarian cysts, fertility issues, insulin resistance, weight gain, acne, and excess hair growth. Detecting PCOS in practice is challenging because its specific cause is unknown and its symptoms are often unclear. Accurate and timely diagnosis is nevertheless crucial for effective management and prevention of long-term complications. Machine learning based PCOS prediction models can support the diagnostic process by reducing potential errors and easing time constraints. Machine learning algorithms can analyze large sets of patient data, including medical history, hormonal profiles, and imaging results, to assist in the diagnosis of PCOS. In particular, ensemble feature selection strategies, which select a subset of pertinent features from a broader pool, improve the performance of data analysis tasks and prediction models. A frequent issue in practical applications is that the output of a feature selection algorithm is unstable when it is applied multiple times to similar datasets or to data with slight modifications. Evaluating the robustness of the feature selection algorithm is therefore essential. To quantify this robustness, this study uses the Jensen-Shannon divergence, an information-theoretic measure, together with ensemble feature selection methods, which allows it to handle different types of output: complete rankings, half rankings, and top-k lists (without ranking).
Furthermore, this article proposes a hybrid machine learning classifier with SMOTE-SVM oversampling for the prompt detection of PCOS, and its performance is compared with a number of individual classifiers: K-Nearest Neighbour (KNN), Support Vector Machine (SVM), AdaBoost, Logistic Regression (LR), Naïve Bayes (NB), Random Forest (RF), and Decision Tree (DT). The proposed SWISS-AdaBoost classifier surpassed the other models with an accuracy of 97.81% and an AUC of 99.08%.
Keywords
Introduction
PCOS is a prevalent endocrine condition affecting women in their childbearing years, distinguished by persistent irregularities in hormonal secretion. Numerous cysts develop inside the ovaries as a result of these abnormalities. The condition is widespread, affecting women of diverse ethnicities and geographic regions. It not only affects fertility but is also associated with sleep apnea, high cholesterol levels, type II diabetes, anxiety, and high blood pressure. Between 5% and 20% of women of reproductive age around the world have PCOS [6, 16]. The global prevalence of PCOS is influenced by various factors, such as genetic predisposition, lifestyle and environmental factors, as well as shifts in diagnostic standards and public awareness. It is also important to note that PCOS may be underdiagnosed or incompletely diagnosed. Machine learning has emerged as a transformative force in healthcare, offering significant advances in disease diagnosis. Because it can analyze complicated data and spot patterns that may be difficult for human clinicians to recognize, it is valuable in the detection and diagnosis of PCOS. The increased accuracy of machine learning models can result in earlier and more precise diagnosis, which benefits PCOS patients overall [11]. Healthcare providers can identify PCOS more accurately and quickly by utilizing the analytical capabilities of machine learning algorithms. Early detection, timely intervention, and specialized management in turn improve the prognosis and quality of life of PCOS patients.
Ensemble feature selection methods have improved the accuracy, stability, and dependability of feature selection procedures in various areas, including healthcare, finance, and image analysis. These techniques offer a strong and complete framework for choosing the most informative features and enhancing overall predictive modeling by utilizing the collective wisdom of several methodologies [19, 20]. A prevalent problem, however, is the instability of the results when a feature selection algorithm is applied repeatedly to similar datasets or to data with minor adjustments. Stability evaluation must therefore be included in feature selection procedures. Such tests help determine how an ensemble feature selection technique behaves when it is applied repeatedly to comparable datasets or when the data undergoes slight modifications.
A hybrid machine learning approach combines the predictions of multiple machine learning algorithms to solve a specific problem. The ability to leverage the strengths of different algorithms and techniques makes this approach a powerful tool for solving complex problems and achieving superior performance compared to using a single algorithm alone.
The key contributions of this article are as follows: (i) an ensemble feature selection technique is proposed to find the most important features and increase the model's accuracy; (ii) the Jensen-Shannon divergence, an information-theoretic measure, is employed to evaluate the stability of the feature selection strategies for three alternative types of output (complete ranked list, half ranked list, and top-k list); (iii) a hybrid machine learning model is implemented to quickly identify PCOS. The performance of the proposed model is compared to that of conventional machine learning classifiers, namely Decision Tree, Naïve Bayes (NB), Support Vector Machine, AdaBoost, K-Nearest Neighbour (KNN), Logistic Regression, and Random Forest.
This article is organized as follows. Section 2 reviews existing approaches for the stability assessment of feature selection methods and the machine learning techniques used for PCOS detection. The methods and methodology used in this article are discussed in Section 3. The proposed model is explained in Section 4. The dataset description and preprocessing are discussed in Section 5. The performance of the proposed model is presented in Section 6, and Section 7 concludes the research work.
Feature selection methods and stability analysis
Deepak et al. [3] proposed RSt (Rank Stability), a frequency-based stability measure, to evaluate ensemble feature selection (FS) algorithms according to feature rankings and feature subsets. The proposed RSt satisfies the essential properties of a stability measure: it is fully defined, strictly monotonic, has minimum and maximum bounds, and is corrected for chance. They performed experiments on 11 real-world datasets with ensemble-based feature selection methods and employed AdaBoost and SVM classifiers to test the accuracy against traditional feature selection techniques. Utkarsh et al. [8] presented an analysis of the various feature selection methods (filter, wrapper, and embedded) and of several stability metrics, such as stability by index, stability by rank, and stability by weight. The authors also discussed solutions to the instability of feature selection methods. Rocío et al. [14] introduced the Jensen-Shannon divergence, a measure based on information theory, to assess the stability of feature selection techniques for different types of output (full ranked, partial ranked, and top-k lists). To assess the reliability of feature selection outcomes, the authors used a food quality dataset and compared the results with two other stability measures, Spearman's rank correlation and Kuncheva's index (KI), and showed that the Jensen-Shannon approach is suitable for assessing the different types of feature ranking output. Barbara [15] proposed an ensemble ranking approach with bootstrap aggregation for feature extraction. The stability of each feature selection technique was evaluated at different permutation levels of the data. The authors used two models, Support Vector Machine (SVM) and Random Forest (RF), over 18 datasets and evaluated the performance.
Hybrid based machine learning techniques
Ritika et al. [1] proposed SMOTE with a stacked hybrid model for the early diagnosis of PCOS. To build the hybrid model, six classifiers were used as base learners (AdaBoost, Support Vector Machine (SVM), Naïve Bayes, Logistic Regression (LR), Decision Tree (DT), and Random Forest), followed by a meta-level implementation. A backward feature elimination technique was employed for feature extraction. The authors compared the outcomes of the suggested stacking techniques with those of single classifiers. Among all the stacking classifiers, the Stack-AdaBoost method performed best, with 90.24% accuracy. Hela et al. [2] employed SMOTE and Edited Nearest Neighbor (ENN) to address the class imbalance problem. Recursive Feature Elimination (RFE) and Random Forest were used to extract the necessary features and reduce the dimensionality. The experiments were performed with two training and testing ratios, 70:30 and 80:20, and the proposed stacking classifier with Random Forest produced the highest result, 100%, among the compared models. Sayma et al. [4] proposed a stacking ensemble model with a boosting model as the meta model and five traditional models as base learners. Principal Component Analysis (PCA), the Chi-Square method, and Recursive Feature Elimination (RFE) were used for feature selection. The output of the proposed system was compared with the results of conventional machine learning models; the Stacking-Gradient Boosting model produced the highest accuracy, 95.7%. Laith Alzubaidi et al. [5] addressed the issue of limited training data (data scarcity) in various applications and surveyed different solutions, including Generative Adversarial Networks (GAN) and the Deep Synthetic Minority Oversampling Technique (DeepSMOTE). Homay et al. [10] used sequential backward selection (SBS), Pearson correlation, and Random Forest to explore the important features. The effectiveness of the classifier with fewer attributes was assessed, and the outcomes were compared with Ensemble Random Forest (RF), Extra Tree (ET), AdaBoost, and Multilayer Perceptron (MLP). Amsy Denny et al. [17] used Principal Component Analysis (PCA) to identify the most relevant features (8 features) and employed several traditional machine learning approaches, including Classification and Regression Tree (CART), Support Vector Machine, K-Nearest Neighbor (KNN), and Random Forest (RF). The authors achieved the best accuracy, 89.02%, with Random Forest.
Previous studies have shown a tendency to find the best subset of features from a given dataset using feature reduction techniques. These reduced feature subsets are then utilized for machine learning classification. However, only a small amount of analysis has been done on various reduced feature subsets that include various combinations and amounts of features from the total dataset. Additionally, there hasn’t been considerable study on how different machine learning classification strategies perform when employing different feature subsets obtained from various feature reduction methods. Furthermore, the robustness of feature selection methodologies in managing a variety of outcomes has not been properly evaluated.
This research aims to bridge these gaps by focusing on effective and efficient detection of Polycystic Ovary Syndrome (PCOS) through optimal feature selection. To accomplish this, the proposed work employs the Jensen-Shannon divergence to handle the different outputs produced by ensemble feature selection methods and to evaluate their stability, and a hybrid stacking classifier is implemented for classification. The developed stacking classifier is trained and evaluated to ensure its effectiveness.
Methods & Methodology
This section presents the ensemble feature selection methods and their stability assessment. It also discusses the hybrid machine learning technique, in which diverse traditional models are combined in a novel way to enhance predictive performance.
Ensemble feature selection techniques
Feature selection is an extremely effective technique for increasing the level of accuracy of predictive algorithms. It ensures increased model performance with reduced computational times by diligently choosing the most important elements. Therefore, an exhaustive selection method is used to extract the best set of features from the PCOS dataset using ensemble filter and ensemble wrapper feature selection techniques.
Ensemble filter methods
Ensemble filter feature selection combines multiple individual filter methods to collectively determine the importance of features in a dataset. It exploits the diverse perspectives of different filter methods to identify the most relevant and informative features for a given machine learning task, overcoming the limitations of individual filter methods and providing a more robust feature selection process [7, 20]. In this article, the individual filter methods that performed best on the given dataset, namely Chi-Square, Relief, Fisher Score, and mRMR (Minimum Redundancy Maximum Relevance), are combined to determine the important features.
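As an illustration of how several filter outputs can be merged into one ranking, the following minimal sketch averages the rank each filter assigns to a feature. The aggregation rule (mean rank), the feature names, and the scores are assumptions made for this example; the article does not state which rule its ensemble filter uses.

```python
from statistics import mean


def scores_to_ranks(scores):
    """Convert {feature: score} into {feature: rank}, where rank 1 is the highest score."""
    ordered = sorted(scores, key=lambda f: (-scores[f], f))  # ties broken by name
    return {feature: position for position, feature in enumerate(ordered, start=1)}


def aggregate_rankings(score_tables):
    """Average the per-filter ranks and return the features ordered best first."""
    rank_tables = [scores_to_ranks(scores) for scores in score_tables.values()]
    features = list(rank_tables[0])
    mean_rank = {f: mean(table[f] for table in rank_tables) for f in features}
    return sorted(features, key=lambda f: (mean_rank[f], f))


# Hypothetical relevance scores from four filters (made-up numbers and names).
filter_scores = {
    "chi_square": {"follicle_count": 9.1, "amh": 4.0, "bmi": 2.5, "age": 0.3},
    "relief": {"follicle_count": 0.40, "amh": 0.35, "bmi": 0.10, "age": 0.12},
    "fisher": {"follicle_count": 3.2, "amh": 1.1, "bmi": 1.4, "age": 0.2},
    "mrmr": {"follicle_count": 0.9, "amh": 0.7, "bmi": 0.5, "age": 0.1},
}

ensemble_order = aggregate_rankings(filter_scores)
top_k = ensemble_order[:2]  # keep the two best-ranked features
```

Averaging ranks rather than raw scores avoids having to put the four filters on a common scale, which is one reason rank aggregation is a common choice for this kind of ensemble.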
Ensemble wrapper method
Ensemble wrapper methods for feature selection combine the outputs of multiple individual feature selection techniques with the predictive power of a machine learning model. They integrate the feature selection process into the model's training and evaluation, using the model's performance as the criterion for selecting the best features. To determine the most important features, the proposed model uses Recursive Feature Elimination (RFE) with a variety of classifier models. RFE-SVM (Support Vector Machine), RFE-RF (Random Forest), RFE-ET (Extra Tree), and RFE-LR (Linear Regression) are the best performing models for the given dataset.
Stability assessment of the feature selection method
The stability of a feature selection technique reflects its capacity to consistently recognize pertinent features and maintain their order of importance across diverse samples. A stable feature selection method produces consistent outcomes when applied to various data subsets, which enhances its reliability and trustworthiness. To ensure the robustness of the feature selection algorithm, this article employs the Jensen-Shannon divergence, an information-theoretic measure, with ensemble feature selection methods to handle different types of output: complete ranked, half ranked, and top-k lists. This metric possesses the desired properties, including upper and lower bounds, correction for chance, and maximum. The following subsection explains the stability assessment of the different results produced by ensemble feature selection techniques.
Complete ranked list: Jensen-Shannon divergence
For the stability assessment, the outcome of the feature selection process is converted into a probability distribution [14]. Feature ranking approaches assign the highest weight to the attributes at the top of the list, and the probability decreases gradually with the rank established by the feature ranking algorithm. This ensures that the most important features are given greater emphasis while those lower in rank receive comparatively lower weights. Thus, following [14], for the complete ranked list of n features, the ranking vector $r_v = (r_1, r_2, r_3, \ldots, r_n)$ is mapped into the probability vector $p_v = (p_1, p_2, p_3, \ldots, p_n)$ with

$$p_i = \frac{1/r_i}{\sum_{j=1}^{n} 1/j}, \qquad (1)$$

where $r_i$ is the rank of feature i and $\sum_{i=1}^{n} p_i = 1$.

The degree of similarity between the distributions produced by several runs of a feature ranking algorithm can be assessed using the Jensen-Shannon divergence. Given a set of m distributions $(p^{(1)}, p^{(2)}, \ldots, p^{(m)})$, each of which represents a run of a particular feature ranking algorithm, the divergence is

$$D_S\left(p^{(1)}, \ldots, p^{(m)}\right) = \frac{1}{m} \sum_{j=1}^{m} \sum_{i=1}^{n} p_i^{(j)} \log \frac{p_i^{(j)}}{\bar{p}_i}. \qquad (2)$$

The average probability of feature i is represented by

$$\bar{p}_i = \frac{1}{m} \sum_{j=1}^{m} p_i^{(j)}.$$

The stability metric $SJ_s$ for stability analysis is expressed as

$$SJ_s = 1 - \frac{D_S}{D_S^{*}}, \qquad (3)$$

where $D_S$ represents the Jensen-Shannon divergence among the m ranking outputs and $D_S^{*}$, the divergence value for a completely random ranking generation, is

$$D_S^{*} = \sum_{i=1}^{n} p_i \log (n \, p_i),$$

where $p_i$ indicates the probability of the feature with rank i.

For the half ranked list (a partial ranking of the top k features), the output of the feature ranking is converted into a probability distribution by

$$p_i = \begin{cases} \dfrac{1/r_i}{\sum_{j=1}^{k} 1/j}, & r_i \le k \\ 0, & \text{otherwise.} \end{cases}$$

$SJ_s$ is calculated using Equation (3), with $D_S$ from Equation (2) and $D_S^{*}$ from the expression above evaluated on these probabilities.

For the feature subset given as a top-k list (without ranking), the output is converted into a probability distribution by

$$p_i = \begin{cases} 1/k, & \text{if feature } i \text{ is selected} \\ 0, & \text{otherwise,} \end{cases}$$

and

$$D_S^{*} = \log \frac{n}{k}.$$

$SJ_s$ is calculated using Equation (3) and $D_S$ is calculated using Equation (2).
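The stability computation described in this subsection can be sketched in a few lines. The sketch assumes reciprocal-rank weights for ranked lists and a uniform weight 1/k for unranked top-k lists, which is our reading of [14]; the function names are illustrative and do not come from the article.

```python
import math


def rank_to_prob(ranks, k=None):
    """Map ranks (1 = best) to probabilities with reciprocal-rank weights.

    If k is given, only features with rank <= k receive weight (partial ranking).
    """
    limit = len(ranks) if k is None else k
    norm = sum(1.0 / j for j in range(1, limit + 1))
    return [(1.0 / r) / norm if r <= limit else 0.0 for r in ranks]


def topk_to_prob(selected, n):
    """Unranked top-k list: uniform mass 1/k on the selected feature indices."""
    k = len(selected)
    return [1.0 / k if i in selected else 0.0 for i in range(n)]


def js_divergence(dists):
    """Generalised Jensen-Shannon divergence: mean KL divergence to the average."""
    m, n = len(dists), len(dists[0])
    avg = [sum(d[i] for d in dists) / m for i in range(n)]
    total = 0.0
    for d in dists:
        total += sum(p * math.log(p / avg[i]) for i, p in enumerate(d) if p > 0)
    return total / m


def random_divergence(dist):
    """Divergence expected for completely random outputs (average is uniform 1/n)."""
    n = len(dist)
    return sum(p * math.log(n * p) for p in dist if p > 0)


def js_stability(dists):
    """Stability = 1 - D / D*, so identical runs give 1."""
    return 1.0 - js_divergence(dists) / random_divergence(dists[0])


# Two runs over four features: identical rankings, then the top two swapped.
identical = js_stability([rank_to_prob([1, 2, 3, 4]), rank_to_prob([1, 2, 3, 4])])
swapped = js_stability([rank_to_prob([1, 2, 3, 4]), rank_to_prob([2, 1, 3, 4])])
# Two unranked top-2 lists with no feature in common.
disjoint = js_stability([topk_to_prob({0, 1}, 4), topk_to_prob({2, 3}, 4)])
```

In this toy case identical rankings give a stability of 1, a swap of the two best features lowers it, and two disjoint top-2 lists over four features give 0.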
The hybrid stacking model is an approach that combines the predictions of base-level classifiers and provides them as new input to a meta-level classifier. In this article, K-Nearest Neighbor, Support Vector Machine, Naïve Bayes, Random Forest, Logistic Regression, AdaBoost, and Decision Tree are considered as the base-level classifiers, and one of these models is selected as the meta-level classifier.
Proposed Model
PCOS is a disorder marked by excess androgen and ovarian dysfunction. Its symptoms include multiple ovarian cysts, irregular menstrual periods, elevated testosterone levels, and insulin resistance. It is caused by a confluence of genetic and environmental factors, although the specific origin is still unknown. PCOS is associated with a number of health issues, including infertility, irregular menstruation, obesity, diabetes, and cardiovascular disease [6]. Early PCOS diagnosis is essential for better outcomes and the prevention of complications. Machine learning models can be very helpful in medical diagnostics, assisting medical practitioners in detecting patients with PCOS [9]. To establish an effective treatment approach, the proposed work comprises two main phases (Fig. 1): (i) an ensemble feature selection technique with stability assessment, and (ii) a hybrid stacking machine learning model.

Proposed framework for PCOS detection.
The effectiveness of data analysis tasks and prediction models is greatly enhanced by ensemble feature selection methods, which extract a smaller number of pertinent features from a larger pool of available features. In this article, both the ensemble filter and the ensemble wrapper method are implemented, and the method with the highest stability is used for model training. Algorithm 1 describes the ensemble filter method, a feature selection procedure that enhances the robustness and efficacy of feature selection by combining the results of several distinct filter methods; it is also used to reduce feature dimensionality for better model performance. To produce a more accurate feature ranking, it makes use of the different perspectives offered by the individual filtering techniques. The significant features are chosen according to the relevance scores of the ranked lists using the proposed ensemble filter approach, which combines the results of the best performing individual filters: Chi-Square, Relief, Fisher Score, and Minimum Redundancy Maximum Relevance (mRMR).
Algorithm 2 describes the ensemble wrapper method, which generates various feature subsets, trains a distinct machine learning model for each subset, and evaluates the performance of each model. The feature subset of the top-performing model is then chosen as the final set of features for training the machine learning model. In this work, Recursive Feature Elimination (RFE) is used with a variety of classifier models to discover significant features. RFE-SVM (Support Vector Machine), RFE-RF (Random Forest), RFE-ET (Extra Tree), and RFE-LR (Linear Regression) were the best performing models. RFE is applied iteratively with each classifier to choose the most pertinent features, improving the comprehension and interpretability of the dataset for the individual models.
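The elimination loop at the core of RFE can be sketched as follows. This is a minimal illustration, not the article's Algorithm 2: the importance function is a toy stand-in (the absolute gap between class means) for the weights that a fitted SVM, Random Forest, or Extra Tree model would supply, and all names and data are hypothetical.

```python
def mean_gap_importance(X, y):
    """Toy stand-in for model weights: |mean(class 1) - mean(class 0)| per column."""
    weights = []
    for column in zip(*X):
        pos = [v for v, label in zip(column, y) if label == 1]
        neg = [v for v, label in zip(column, y) if label == 0]
        weights.append(abs(sum(pos) / len(pos) - sum(neg) / len(neg)))
    return weights


def recursive_feature_elimination(X, y, importance_fn, n_keep=1):
    """Drop the least important feature one at a time.

    Returns {feature_index: rank}; surviving features get rank 1 and the
    feature eliminated first gets the largest rank.
    """
    remaining = list(range(len(X[0])))
    ranks = {}
    while len(remaining) > n_keep:
        subset = [[row[j] for j in remaining] for row in X]
        weights = importance_fn(subset, y)  # re-estimate on surviving features
        worst = min(range(len(remaining)), key=lambda i: weights[i])
        rank = len(remaining) - n_keep + 1
        ranks[remaining.pop(worst)] = rank
    for j in remaining:
        ranks[j] = 1
    return ranks


# Hypothetical data: column 0 separates the classes well, column 2 hardly at all.
X = [[5.0, 1.0, 0.3], [6.0, 1.2, 0.1], [1.0, 0.8, 0.2], [0.5, 0.9, 0.25]]
y = [1, 1, 0, 0]
ranking = recursive_feature_elimination(X, y, mean_gap_importance)
```

Running the same loop with several different importance functions yields one ranking per model, and those rankings are what an ensemble wrapper would then combine and assess for stability.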
Influence of Ensemble Feature selection on stability
Ensemble feature selection techniques can significantly influence the stability of the chosen features. Stability refers to how consistent the results of feature selection are when an algorithm is run repeatedly on the same dataset or on similar datasets. To ensure that the chosen features are not significantly influenced by a specific random initialization or by data perturbations, a stable feature selection approach should yield similar or identical feature subsets across different runs. The proposed method uses three primary approaches to ensure the robustness of the selected features.
Approach 1: Dataset D is split into a training dataset X and a testing dataset T. Multiple training samples X1, X2, ..., Xm are generated by bootstrapping from X.
Approach 2: k-fold cross-validation is employed to create k distinct training data samples. To evaluate the stability of the proposed ensemble feature selection technique, 5-fold cross-validation is used in this work to produce five different sets of data samples with their rankings. In each iteration, one of the five folds is used as the testing dataset T and the other four folds are pooled to form the training dataset X. After all iterations, five distinct training samples, X1, X2, X3, X4, and X5, and five corresponding testing samples, T1, T2, T3, T4, and T5, are obtained.
Approach 3: The stability of the feature selection method is assessed using the Jensen-Shannon divergence when it is applied to several data subsets and to different types of output: complete ranked, half ranked, and top-k (without ranking). The proof for this can be found in [14]. Accordingly, the Jensen-Shannon approach is used to assess the stability of the proposed ensemble feature selection methods on all three types of output.
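The resampling used in Approaches 1 and 2 can be sketched with the standard library alone. The helper names, the seed, and the 20-row toy index set are assumptions made for this illustration; feature selection would then be run once per generated training sample and the resulting rankings compared.

```python
import random


def bootstrap_samples(indices, m, seed=0):
    """Approach 1: draw m bootstrap samples (with replacement) of the same size."""
    rng = random.Random(seed)
    return [[rng.choice(indices) for _ in indices] for _ in range(m)]


def k_fold_splits(indices, k=5, seed=0):
    """Approach 2: shuffle once, cut into k folds, and pair each fold (test)
    with the union of the other k - 1 folds (train)."""
    rng = random.Random(seed)
    shuffled = list(indices)
    rng.shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits


rows = list(range(20))               # row indices of a hypothetical training set
boot = bootstrap_samples(rows, m=3)  # X1, X2, X3
splits = k_fold_splits(rows, k=5)    # (X1, T1), ..., (X5, T5)
```

Each row appears in exactly one test fold, so the five training samples overlap heavily but are never identical, which is the kind of slight data modification that a stability measure is meant to probe.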
Hybrid stacking model
To improve the overall model performance, a hybrid stacking model is used to combine the predictions of the individual classifiers at the base level and pass them as new input to the meta level. The steps involved in the hybrid stacking model are explained in Algorithm 3. In the proposed model, K-Nearest Neighbor, Support Vector Machine, Naïve Bayes, Random Forest, Logistic Regression, AdaBoost, and Decision Tree are implemented at the base level, and one of these models is selected as the meta-level classifier. The predictions of all base-level classifiers are combined and supplied as a new input to the meta-level classifier.
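The data flow of stacking, in which base-level predictions become the input features of the meta-level classifier, can be sketched as follows. This is not the article's Algorithm 3: the two base learners are deliberately tiny stand-ins for the seven classifiers named above, the use of out-of-fold predictions to train the meta learner is an assumption, and the data are made up.

```python
def sq_dist(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))


class NearestCentroid:
    def fit(self, X, y):
        self.centroids = {}
        for label in set(y):
            rows = [x for x, t in zip(X, y) if t == label]
            self.centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict(self, X):
        return [min(self.centroids, key=lambda c: sq_dist(x, self.centroids[c]))
                for x in X]


class OneNearestNeighbour:
    def fit(self, X, y):
        self.X, self.y = list(X), list(y)
        return self

    def predict(self, X):
        return [self.y[min(range(len(self.X)), key=lambda i: sq_dist(x, self.X[i]))]
                for x in X]


def fit_stacking(X, y, base_factories, meta_factory, k=5):
    """Train the meta learner on out-of-fold base predictions, then refit bases."""
    n = len(X)
    meta_features = [[None] * len(base_factories) for _ in range(n)]
    for fold in range(k):
        test_idx = [i for i in range(n) if i % k == fold]
        train_idx = [i for i in range(n) if i % k != fold]
        X_train = [X[i] for i in train_idx]
        y_train = [y[i] for i in train_idx]
        for b, make in enumerate(base_factories):
            preds = make().fit(X_train, y_train).predict([X[i] for i in test_idx])
            for i, p in zip(test_idx, preds):
                meta_features[i][b] = p
    bases = [make().fit(X, y) for make in base_factories]
    meta = meta_factory().fit(meta_features, y)
    return bases, meta


def predict_stacking(bases, meta, X):
    columns = [base.predict(X) for base in bases]
    return meta.predict([list(row) for row in zip(*columns)])


# Hypothetical two-feature data: class 0 near (0, 0), class 1 near (1, 1).
X = [[0.0, 0.1], [1.0, 0.9], [0.2, 0.0], [0.9, 1.1], [0.1, 0.2],
     [1.1, 1.0], [0.0, 0.3], [0.8, 0.9], [0.3, 0.1], [1.0, 1.2]]
y = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
bases, meta = fit_stacking(X, y, [NearestCentroid, OneNearestNeighbour], NearestCentroid)
predictions = predict_stacking(bases, meta, [[0.1, 0.1], [0.95, 1.0]])
```

Training the meta learner on out-of-fold predictions keeps it from seeing base predictions made on rows the base models were fitted on, which is the usual guard against leakage in stacking.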
Dataset
The PCOS dataset from the Kaggle repository is used in this article; it comprises the clinical and physical characteristics of PCOS patients. The original dataset contains 541 instances and 43 attributes. The Sl. No and Patient File No. attributes were removed because they have no impact on the output feature. The DataFrame was then converted to the float data type, and the remaining missing values were filled with the mean of the relevant column. The target feature is PCOS (Y/N), and the dataset is imbalanced: of the 541 rows, 177 are labelled PCOS/Yes and 364 are labelled PCOS/No. To balance the dataset, the SMOTE-SVM oversampling approach is used.
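To make the oversampling step concrete, the sketch below shows the interpolation idea behind SMOTE: a synthetic minority sample is placed on the segment between a minority point and one of its nearest minority neighbours. Note that this is plain SMOTE, not the SVM-guided variant used in the article, and that the function name, parameters, and toy points are assumptions for illustration.

```python
import random


def smote(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic points by interpolating between a randomly
    chosen minority point and one of its k nearest minority neighbours."""
    rng = random.Random(seed)

    def sq_dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        others = [p for p in minority if p is not x]
        neighbours = sorted(others, key=lambda p: sq_dist(x, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([u + gap * (v - u) for u, v in zip(x, neighbour)])
    return synthetic


# Class counts reported for the dataset: 177 PCOS/Yes and 364 PCOS/No.
n_yes, n_no = 177, 364
n_needed = n_no - n_yes  # synthetic samples for a 1:1 ratio on the full dataset

# Hypothetical minority points in two dimensions.
minority = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [1.1, 1.3]]
new_points = smote(minority, n_new=5, k=2)
```

The count of 187 applies only if the whole dataset is balanced at once; when oversampling is restricted to the training split, the number of synthetic samples depends on that split's class counts.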
Experiments and Result Discussion
Stability assessment of the feature selection method
In this article, the proposed method employs two feature selection techniques, the ensemble filter and the ensemble wrapper approach. The Jensen-Shannon divergence is used to evaluate the stability of both approaches on different types of output: complete ranked, half ranked, and top-k lists. The method with the highest feature stability score is used for model training. Table 1 reports the Jensen-Shannon stability scores of the complete ranked list for the individual filter methods at threshold values of 16, 23, 30, and 35, and for the corresponding ensemble feature lists of 9, 17, 26, and 34 features. The results show that the proposed ensemble filter method achieves the highest score, 91.75%, at a threshold value of 16, exceeding the individual methods.
Jensen-Shannon stability assessment: individual vs proposed ensemble filter method (complete ranked list)
Table 2 reports the Jensen-Shannon stability scores of the individual wrapper methods at threshold values of 16, 23, 30, and 35, and of the corresponding ensemble feature lists of 9, 17, 24, and 34 features. The table shows that the proposed ensemble wrapper method achieves the highest score, 92.39%, at a threshold of 16, exceeding the individual methods. Comparing the stability scores of both approaches, the ensemble wrapper method is more stable than the ensemble filter method at all threshold levels (Table 2). Therefore, the feature list produced by the proposed SWISS (ensemble wrapper with stability assessment) method is used for model training, and the experiments are conducted at different threshold values. The results are shown in Fig. 2.
Jensen-Shannon stability assessment: individual vs proposed ensemble wrapper method (complete ranked list)

Stacking model performance analysis with proposed SWISS method at different threshold values.
Among all thresholds, the features selected by the proposed SWISS method at a threshold of 23 produce the highest accuracy, 97.81%, and the hybrid model is trained using those stable features.
Figure 3 shows the stability evaluation of the top-k (without ranking) list and the half ranked list. The stability scores of the top-k (without ranking) outputs of the feature selection methods are lower than those of the half ranked lists, which shows the importance of ranking in the process of feature selection. Additionally, the stability of the feature selection methods decreases as the number of features increases. Selecting robust features that improve model performance is therefore crucial.

Stability Assessment of Half ranked list vs Top-k list at different threshold values.
The performance of the individual classifiers with and without feature selection is presented in Tables 3 and 4. The individual classifiers used in this article are K-Nearest Neighbor (KNN), Support Vector Machine (SVM), Naïve Bayes (NB), Random Forest (RF), Logistic Regression (LR), AdaBoost, and Decision Tree (DT).
Performance analysis of individual classifier without FS & oversampling
Performance analysis of individual classifier with FS & oversampling
Table 3 shows that without feature selection and oversampling, Logistic Regression performed better than the other models, with 91.27% accuracy, 87.93% precision, 85.28% recall, an F1 score of 86.38%, and an AUC of 95.39%. Table 4 shows that with feature selection and oversampling, Random Forest performed best, with 93.69% accuracy, 93.65% precision, 99.35% recall, an F1 score of 96.42%, and an AUC of 95.72%. This indicates that the performance of the Random Forest classifier improved with feature selection. Figure 4 shows the ROC curves of the individual classifiers with feature selection; the true positive rate of Random Forest is higher than that of the other models.

ROC Plot of individual classifier with feature selection.
Table 5 presents the performance of the stacking hybrid model without feature selection and oversampling. The results show that stacking with AdaBoost achieves the highest accuracy, 88.89%, with 78.57% precision and recall and an AUC of 93.19%.
Performance analysis of stacking classifier without feature selection and oversampling
Table 6 presents the performance of the stacking classifier with oversampling and the stable features produced by the proposed SWISS model. The results show that SWISS with the stacking AdaBoost classifier achieves the highest accuracy, 97.81%, with 97.80% precision, 99.68% recall, an F1 score of 98.73%, and an AUC of 99.08%. The proposed stacking model with stable features produced by SWISS and SMOTE-SVM thus outperformed the individual models. Figure 5 shows the ROC curve of the proposed model; the true positive rate of SWISS-AdaBoost is higher than that of the other meta models. The comparison of the proposed work with existing work is shown in Table 7. Ritika et al. [1] used Backward Feature Elimination (BFE) as the feature selection technique, SMOTE for oversampling, and six stacking learners (LR, SVM, DT, RF, NB, and AdaBoost) with 30 features, and obtained 90.24% accuracy. Sayma et al. [4] implemented five stacking classifiers (GradBoost, XGBoost, AdaBoost, CatBoost, and Random Forest) with 25 features produced by Principal Component Analysis (PCA) and attained 95.70% accuracy. The proposed model uses 17 features produced by the SWISS method, SMOTE-SVM to oversample the dataset, and seven base and meta learners (KNN, LR, SVM, DT, RF, NB, and AdaBoost); it obtains 97.81% accuracy and an AUC of 99.08%, which is higher than the existing work.
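For readers who wish to relate the reported metrics to one another, the short sketch below computes accuracy, precision, recall, and F1 under the standard binary definitions and recomputes two of the F1 scores quoted above from their precision and recall. The confusion-matrix counts in the last line are hypothetical and are not taken from the article.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def metrics_from_counts(tp, fp, fn, tn):
    """Standard binary metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1": f1_score(precision, recall),
    }


# Precision and recall as reported for SWISS-AdaBoost and for Random Forest
# (with feature selection and oversampling).
swiss_adaboost_f1 = f1_score(0.9780, 0.9968)
random_forest_f1 = f1_score(0.9365, 0.9935)

# Hypothetical counts, only to show the function in use.
example = metrics_from_counts(tp=8, fp=2, fn=1, tn=9)
```

Rounded to two decimals, the recomputed values are 98.73% and 96.42%, which agree with the F1 scores reported for these two models.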

ROC plot for proposed model with stable features.
Performance analysis of stacking classifier with feature selection and oversampling
Result comparison of proposed vs existing work
Conclusion
This article describes a data-driven approach to the early diagnosis of PCOS, in support of its proper treatment and management. The investigation identified a small set of key characteristics that can serve as the main indicators for diagnosing PCOS, which helps medical professionals identify the condition quickly and accurately and reduces the cost for patients who would otherwise undergo a number of diagnostic tests. The PCOS dataset used is imbalanced, so SMOTE-SVM is applied to the training set to balance the data. The Jensen-Shannon divergence was used to assess the features obtained by the individual and ensemble feature selection approaches. As a result, the highly stable features produced by the proposed ensemble wrapper method, with a stability score of 92.39%, are used for model training. The proposed hybrid stacking model is trained with seven base learners (K-Nearest Neighbor, Support Vector Machine, Naïve Bayes, Random Forest, Logistic Regression, AdaBoost, and Decision Tree), with one of them implemented as the meta learner. The results of the proposed model are compared with those of individual classifiers with and without feature selection. Table 6 shows that the reduced and more stable feature list with the balanced dataset performed best with SWISS-AdaBoost, producing an accuracy of 97.81% and an AUC of 99.08%. The stability measure presented in this research is applicable only to feature selection algorithms that select features based on ranking. In future studies, we therefore plan to examine stability measures with optimization-based feature selection algorithms, such as Particle Swarm Optimization (PSO) and Genetic Algorithm (GA) based feature selection, to extract information on individual features and examine the results with different classifiers.
Financial support
This project was funded by the Research Promotion Scheme (RPS) of AICTE (All India Council for Technical Education), File No. 8- 98/FDC/RPS/POLICY-1/2021-22 (AQIS ID: 1-9331521061).
Declaration of competing interest
The authors have no conflicts of interest to declare.
