Abstract
Purpose:
Vaginal infections are prevalent causes of gynecological consultations. This study introduces and evaluates the efficacy of four Machine Learning algorithms in detecting vaginitis cases in southern Mexico.
Methods:
Utilizing Simple Perceptron, Naïve Bayes, CART, and AdaBoost, we conducted classification experiments to identify four vaginitis subtypes (gardnerella, candidiasis, trichomoniasis, and chlamydia) in 600 patient cases.
Results:
The outcomes are promising, with several algorithm–dataset combinations achieving 100% accuracy in vaginitis identification.
Conclusion:
The successful implementation and high accuracy of these algorithms demonstrate their potential as valuable diagnostic tools for vaginal infections, particularly in southern Mexico. This is crucial in a region where health technology adoption lags behind and intelligent software support for gynecological diagnosis is limited.
Introduction
In Mexico, where the prevalence of vaginal infections among women aged 20 to 45 is high, diagnosing gynecological disorders is essential [45]. Bacterial infections, unprotected sexual activity, incorrect condom use, exposure to chemicals in hygiene products or creams, imbalances in the body’s microorganisms, a weakened immune system, and lifestyle factors can all contribute to the development of gynecological pathologies [45–48, 54]. Misdiagnosis of vaginal infections can cause the infection to worsen and last longer, which can increase the risk of HIV and other sexually transmitted infections and can lead to warts and other infections, particularly in women who do not receive early or appropriate treatment [3, 5, 55]. In addition, in some underdeveloped areas, new technological tools based on artificial intelligence, which have been shown to reduce diagnostic time and increase efficiency [60], are undervalued and rarely used.
As shown in Section 3 (Related work), various algorithms have been employed to support the correct and timely identification of pathologies in medical diagnosis. In the present study, four classification algorithms were implemented to support a more accurate and reliable diagnosis of four types of vaginitis {Candidiasis, Chlamydia, Gardnerella, and Trichomoniasis}. The dataset was built from the medical diagnoses stored in the Comprehensive Emergency System (CES) of the Department of Gynecology and Obstetrics of the Women’s Hospital in Villahermosa, Tabasco, Mexico.
The rest of the article is organized as follows: Section 2 (Vaginal infections) defines the gynecological medical terminology addressed in this research; Section 3 (Related work) reviews previous work; Section 4 (Material and methods) describes the One-Versus-One and One-Versus-All approaches used in machine learning to decompose multiclass problems into binary problems, as well as the dataset, the algorithms, and the validation metrics used; Section 5 (Experiments and results) specifies the design of the experiments; and Section 6 (Conclusion and future work) details the conclusions of the study and future challenges.
Vaginal infections
According to Mexico’s Secretariat of Health, the number of vaginal infection cases increases every year: in 2018, 30% of gynecological consultations were due to infections, in women aged 20 to 45. Women more likely to develop vaginal infections include pregnant women, because of lowered defenses and hormonal changes, and women with diabetes mellitus [3]. There are at least three types of infection: i) vulvitis, when the condition affects the external genitals; ii) cervicovaginitis, when it is present in the cervix (neck of the uterus); and iii) vaginitis, when the infection is in the vagina [4]. This case study focuses on the third type.
Vaginitis
This condition typically results from an alteration in the usual balance of vaginal bacteria, an infection, or a decline in estrogen levels after menopause. It produces inflammation of the vagina, discharge, itching, and pain [5]. Vaginitis is divided into four subtypes [4, 5]: Gardnerella. It refers to an imbalance in the normal saprophytic microbiota of the vagina, with a decrease in Lactobacillus spp. and an overgrowth of Gardnerella vaginalis and other aerobic and anaerobic bacteria. This infection can be transmitted through sexual contact [48, 49]. In addition, it is associated with urological abnormalities and complications of urological procedures [51]. Candidiasis. It is a fungal infection caused by the fungus Candida that usually occurs on the skin or mucous membranes [52]. Some predisposing conditions are altered physiological processes, pregnancy, endocrine diseases, excessive use of acidic soaps, poor hygiene, and the use of synthetic clothing [53, 54]. Trichomoniasis. It is a prevalent sexually transmitted infection caused by a parasite. Risk factors include sexual activity with multiple partners and unprotected sex. Symptoms include malodorous vaginal discharge, genital itching, pain during urination, and swelling in the genital area [50]. Chlamydia. It is a common bacterial sexually transmitted infection that often produces no noticeable symptoms but can still be transmitted to others through sexual contact.
Table 1 describes established medical symptoms, including abnormal vaginal secretions, itching or irritation, painful urination, and pain during sex.
Symptoms related to vaginitis
Other symptoms mentioned by the expert (gynecologist) are also taken into account to obtain a more accurate diagnosis: polyuria (the abnormal production of large amounts of urine), pollakiuria (the need to urinate more frequently than usual), and the intensity of abdominal and pelvic pain.
Identifying a vaginal pathology involves an inferential process based on a “medical condition”, a concept specific to medicine that consists of the physician’s assessment of the patient’s symptoms and clinical signs. This assessment can lead to an accurate or a presumptive diagnosis which, in some cases, must be corroborated with other studies such as laboratory analyses, radiographs, magnetic resonance imaging, or ultrasound [6]. The procedure may include: Anamnesis. Information collected by the physician through specific questions in order to obtain useful data and formulate a diagnosis for the patient’s treatment. Physical examination. The set of assessments that a physician performs to obtain information about the person’s physical health. pH test. The pH of the vulvovaginal skin changes throughout a woman’s life, and even temporarily during each menstruation. Because it is easy to perform, pH measurement is often used in health facilities without fast access to a good-quality laboratory. The measurement consists of passing an indicator strip through the vagina, which turns a specific color. Microscopic examination. Microscopic examination of the vaginal discharge helps determine whether there are problems associated with burning, itching, excessive discharge, foul odor, or all of the above.
In some cases, additional tests are needed before prescribing the pharmacological treatment for each of the four pathologies described.
In our local case study, carried out at the Women’s Hospital in Villahermosa, Tabasco, Mexico, 600 vaginitis cases were analyzed; at this hospital, diagnosis is made in the traditional way, using pre-established tests. However, the hospital does not yet have software based on artificial intelligence, which could reduce diagnostic time and increase effectiveness in identifying vaginal pathology. This study therefore aims to promote the use of new technologies in the region, with direct benefits for patients.
Related work
Although medical diagnosis is an essential procedure in health care, it can occasionally be ambiguous because of subjective or external factors, which can prolong and worsen a patient’s illness. Many computational techniques have been used to support this medical activity in order to ensure accurate identification and, consequently, appropriate therapy.
Al-Milli predicted heart disease risk using a backpropagation neural network with a dataset of 166 records, 116 for training and 50 for testing, and reported promising results [7]. Aljumah et al. studied diabetes treatment through predictive analysis using Oracle Data Miner. A dataset covering various age groups was analyzed and classified into two groups, adults and youth, and the analysis indicated which treatment adults should receive, as distinct from that for young people [8]. Eom et al. combined Support Vector Machines (SVM), neural networks, decision trees (DT), and Bayesian networks for cardiovascular disease diagnosis, with accuracy surpassing 94% [9].
Al-Aidaroos et al. selected 15 datasets from the UCI repository to evaluate the performance of six algorithms: Naïve Bayes, logistic regression, KStar, DT, neural network, and a simple rule-based algorithm (ZeroR). Using 10-fold cross-validation repeated 10 times, Naïve Bayes outperformed the other algorithms in 8 of the 15 datasets, with accuracies of up to 97%. Logistic regression emerged as the second-best algorithm in 5 of 6 datasets, with accuracies reaching 84% [10].
Iliou et al. investigated osteoporosis using a dataset of 3426 subjects, 1083 pathological and 2343 healthy. They tested twenty machine learning techniques with 10-fold cross-validation and found that no technique was superior to the others [12]. Yu et al. also studied osteoporosis, using 119 inpatient cases (55 with osteoporosis and 64 without) with an average age of 65 years. Their neural network obtained a sensitivity of 94.5% and a specificity of 96.9%, showing that artificial neural networks are effective for osteoporosis diagnosis [11].
Pergialiotis et al. investigated endometrial carcinoma using clinical history and gynecological examination in a sample of 178 women, 106 diagnosed with carcinoma and 72 with normal histology. Of the three classification algorithms employed, the best outcome was achieved with an artificial neural network, with a sensitivity of 86.8%, a specificity of 83.3%, and an overall accuracy of 85.4% [13]. Tseng et al. assessed the significance of risk factors and predicted ovarian cancer recurrence with several algorithms. Using medical records from Chung Shan Medical University Hospital, they employed Extreme Learning Machine (ELM), C5.0, Multivariate Adaptive Regression Splines, Random Forest (RF), and ensemble learning. The procedure was repeated ten times, providing ten sets of training and test data. The integrated C5.0 algorithm achieved the highest average correct classification rate, 75.67%, in forecasting ovarian cancer recurrence [14]. Klöppel et al. investigated the diagnosis of neurological and psychiatric diseases, including Alzheimer’s disease and schizophrenia, with the aim of predicting disease course and individualizing treatment. Using classifiers such as SVM, neural networks, and Gaussian processes on magnetic resonance imaging data, they concluded that classification algorithms hold promise for enhancing medical diagnosis [20]. Apreutesei et al. emphasized the importance of early detection of eye changes in diabetic patients for maintaining overall health. Their study focused on 101 eye cases associated with open-angle glaucoma and diabetes mellitus. Using neural networks and recurrent Elman networks to identify the relationship between diabetes and glaucoma, it achieved a probability of correct responses of 95% [21].
Chattopadhyay studied early depression detection through a hybrid system using 302 adult depression cases and 50 healthy controls. Using fuzzy Mamdani logic in a multilayer neural network, the algorithm achieved an average accuracy of 95.50% in diagnosing and rating depression [22]. Velikova et al. advocate Bayesian networks for medical diagnosis because of the uncertainty inherent in it. Their study, bridging theory and practical application, used data from 297 women; the model demonstrated robust diagnostic and prognostic capabilities, achieving a notable minimum area under the curve [23]. Wang et al. applied data mining techniques to ovarian cancer diagnosis, achieving the highest accuracy compared with the other models in their experiments [24]. Huang et al. assessed bleeding risk in cardiopulmonary bypass patients using an artificial neural network and intraoperative laboratory data. With 39 cases and 15 input parameters, the results showed a 69.2% exact match, 23.1% with a one-grade difference, and 7.7% with a two-grade difference between actual and intended blood use [25]. Li et al. forecast the severity of menopausal symptoms with a proposed artificial neural network model with 10 neurons in the hidden layers and 9 risk factors; its 85% accuracy was deemed competitive with similar studies [26]. Lakhani & Sundaram assessed the efficacy of deep convolutional neural networks in detecting tuberculosis on chest X-rays. Using 4 datasets comprising 1007 chest X-rays (68.0% training, 17.1% validation, and 14.9% test), they found disagreement in 13 of the 150 test cases; a cardiothoracic radiologist, in a blinded review, correctly identified all 13. This collaborative approach achieved a sensitivity of 97.3% and a specificity of 100% [27]. Yeh et al. constructed a DT model to improve the diagnosis of cerebrovascular disease, achieving sensitivity and accuracy of 99.48% and 99.59%, respectively [28]. Das et al. investigated valvular heart disease, creating an expert system with neural networks that achieved 97.4% accuracy on a 215-sample dataset, with sensitivity and specificity of 100% and 96%, respectively [29]. Hariharan et al. addressed Parkinson’s disease in the elderly, proposing a Gaussian mixture model for improved dysphonia characterization; on an online UCI dataset, their approach achieved a maximum classification accuracy of 100% [30]. Muthukaruppan & Er developed a particle swarm optimization-based expert system for coronary artery disease detection. Using a training dataset of 478 cases (200 with cardiac disease, 278 healthy), the system achieved a classification accuracy of 93.27% [31].
Abdar et al. examined human liver disease with 583 instances (416 diseased, 167 healthy). Using the boosted C5.0 and CHAID algorithms, they achieved accuracies of 93.75% and 65%, respectively, highlighting the potential of rule generation for liver disease [32]. Amaral et al. addressed chronic obstructive pulmonary disease (COPD), developing a machine learning-based expert system for diagnosis from forced oscillation measurements in 50 individuals. KNN, SVM, and artificial neural network (ANN) classifiers demonstrated diagnostic potential, achieving a minimum sensitivity of 87% and a minimum specificity of 94% [33]. Tenório et al. created a celiac disease diagnostic expert system using 178 training and 38 testing clinical cases. Among the artificial intelligence algorithms evaluated, the averaged one-dependence estimator, a Bayesian classifier, was the most accurate, with 80% accuracy, 78% sensitivity, 80% specificity, and an 84% area under the curve [34]. Sanz et al. proposed combining fuzzy rule-based classification systems with interval-valued fuzzy sets to improve the diagnosis of cardiovascular disorders. Three methods were employed in the inference process: 1) modeling the linguistic labels of the classifier with interval-valued fuzzy sets; 2) applying an operator; and 3) using genetic tuning to optimize the parameters and the degree of ignorance of each interval-valued fuzzy set. This increased correct diagnoses by approximately 3% compared with classic fuzzy classifiers and by 1% compared with the previous interval-valued fuzzy classifier [35]. Arabasadi et al. highlighted the high mortality associated with coronary artery disease. Using the Z-Alizadeh Sani dataset, with records of 303 patients and 54 features each, they proposed a hybrid diagnostic method whose neural network achieved accuracy, sensitivity, and specificity of 93.85%, 97%, and 92%, respectively [36]. López et al. investigated Alzheimer’s disease using the Alzheimer’s Disease Neuroimaging Initiative and SPECT databases. Their neural network achieved precision of up to 96.7% and 89.52%, a notable improvement over the classic voxels-as-features reference approach [37]. Iftikhar et al. assessed the risks of incorrect diagnostic decisions and the associated costs of clinical medication errors. They classified heart disease using SVM, GA-SVM, PSO-SVM, and a hybrid proposal, GA-SVM-SMO-RBF, achieving 88.10% accuracy [38]. Abdi & Giveki developed a diagnostic model for erythemato-squamous diseases using particle swarm optimization, SVM, and association rules. Using 24 features from the UCI database, they achieved a classification accuracy of 98.91% [39]. Ozkan et al. investigated urinary tract infections using data from routine examinations and definitive diagnoses of 59 patients. They implemented DT, SVM, RF, and ANN models, which achieved accuracies of 93.22%, 96.61%, 96.61%, and 98.30%; sensitivities of 95.55%, 97.77%, 95.55%, and 97.77%; and specificities of 85.71%, 92.85%, and 100%, respectively [40].
Analysis
The studies surveyed consistently report high accuracy rates, surpassing 90% in several instances. This suggests that machine learning algorithms have substantial potential to deliver precise diagnostic outcomes across diverse medical conditions and to improve the reliability of medical diagnoses.
The most commonly used algorithms include SVM, ANN, decision trees, Random Forest, and Naïve Bayes, among others. These algorithms are widely recognized in machine learning and have proven effective in various medical applications; this helped us choose two of the most representative algorithms and two of the least used ones for comparison in this study.
The implementation of machine learning algorithms in the medical field shows the potential to improve the accuracy and speed of diagnoses (see Table 2), which can lead to earlier and more effective medical care. This in turn can result in better disease management and reduced costs, ultimately benefiting patients and the healthcare system as a whole. In contrast to previous work, our contribution focuses on vaginitis diagnosis, using a comprehensive approach that evaluates four machine learning algorithms. Specifically, we address the identification of four vaginitis subtypes (gardnerella, candidiasis, trichomoniasis, and chlamydia) within a dataset of 600 patient cases.
State-of-the-art of algorithms implemented in the medical area
Material and methods
This section describes the One-Versus-One (OVO) and One-Versus-All (OVA) approaches, the dataset used, the algorithms implemented, the validation metrics used, and the experimental configuration.
OVO and OVA approach
The concept behind the OVO approach is straightforward: a classifier is trained for every pair of classes, transforming a problem with c classes into c(c − 1)/2 binary problems of the form <i, j>, where i, j ∈ {1, …, c} and i < j. For each pair of classes i, j, the binary classifier is trained using only the samples from classes i and j, while the samples from classes k ≠ i, j are disregarded [19]. The OVA approach is also widely used: one class at a time is selected and a classifier is trained to distinguish that class from all the others, converting a c-class problem into c binary problems (that is, l = c binary training subsets). Each binary problem is formed by taking the examples of class i as positive and those of all other classes as negative [19].
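A minimal plain-Python sketch of the two decompositions may help make the index bookkeeping concrete. The function names and the convention of coding the first class of each pair (or the selected class in OVA) as 1 and the rest as 0 are illustrative assumptions, not the code used in the study.

```python
from itertools import combinations


def one_versus_one(labels, classes):
    """OVO: one binary problem per pair (i, j) with i < j, i.e. c(c - 1)/2 problems.

    Only samples of classes i and j are kept; class i is coded 1 and class j is coded 0.
    Returns {(i, j): [(sample_index, binary_label), ...]}.
    """
    problems = {}
    for ci, cj in combinations(classes, 2):
        problems[(ci, cj)] = [
            (k, 1 if lab == ci else 0)
            for k, lab in enumerate(labels)
            if lab in (ci, cj)  # samples of any other class k != i, j are disregarded
        ]
    return problems


def one_versus_all(labels, classes):
    """OVA: one binary problem per class (c problems); that class is 1, all others are 0."""
    return {c: [(k, 1 if lab == c else 0) for k, lab in enumerate(labels)] for c in classes}


if __name__ == "__main__":
    labels = ["A", "B", "C", "A", "D"]
    classes = ["A", "B", "C", "D"]
    print(len(one_versus_one(labels, classes)), "OVO problems")  # 4 * 3 / 2
    print(len(one_versus_all(labels, classes)), "OVA problems")  # 4
```

With c = 4 classes, as in this study, the sketch produces six OVO problems and four OVA problems.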
A medical expert assisted in obtaining and proposing the characteristics (features). The book Williams Gynecology [56] was also used; it provides a comprehensive review of the entire spectrum of gynecological health and disease treatment: general gynecology of benign diseases, reproductive endocrinology and infertility, menopause, reconstructive surgery, pelvic medicine, and gynecologic oncology.
Case histories from the Women’s Hospital in Villahermosa, Tabasco, Mexico, collected during 2018–2019 were used. Specifically, the data were taken from the CES software: a total of 600 diagnosed vaginitis cases, of which 75 were Candidiasis, 146 Chlamydia, 318 Gardnerella, and 61 Trichomoniasis.
From the 600 clinical cases, 10 datasets were built using the OVO and OVA approaches (with four classes, OVO yields 4 × 3/2 = 6 binary datasets and OVA yields 4). Table 3 shows the distribution of cases for each dataset.
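The arithmetic behind the ten datasets can be checked with the case counts reported above. This sketch rests on a hypothetical assumption: that each binary dataset keeps every case of the classes it involves, with no subsampling. The actual distribution in Table 3 may differ.

```python
from itertools import combinations

# Case counts reported for the 600 vaginitis cases (2018-2019)
counts = {"Candidiasis": 75, "Chlamydia": 146, "Gardnerella": 318, "Trichomoniasis": 61}
total = sum(counts.values())

# OVO (assumption: all cases of both classes kept): one dataset per pair of classes
ovo_sizes = {(a, b): counts[a] + counts[b] for a, b in combinations(counts, 2)}

# OVA (assumption: all cases kept): (positives, negatives) for each selected class
ova_sizes = {c: (counts[c], total - counts[c]) for c in counts}

if __name__ == "__main__":
    print(f"{len(ovo_sizes)} OVO + {len(ova_sizes)} OVA = {len(ovo_sizes) + len(ova_sizes)} datasets")
    for pair, size in ovo_sizes.items():
        print("OVO", pair, size)
    for cls, (pos, neg) in ova_sizes.items():
        print("OVA", cls, f"positives={pos}", f"negatives={neg}")
```

Under this assumption, the OVA datasets all contain 600 records and differ only in which class is labeled positive, while the OVO datasets range from 136 (Candidiasis vs. Trichomoniasis) to 464 (Chlamydia vs. Gardnerella) records.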
Total records per dataset
Each dataset has 10 attributes plus a “class” column that identifies the pathology of each clinical case and takes the value 0 or 1, for a total of 11 columns, as shown in Table 4. The attributes were chosen based on the recommendations and suggestions of medical professionals, which helps ensure that the collected characteristics are relevant and appropriate.
Dataset (Fragment)
This section describes the algorithms used, which were selected based on the review of related work.
Simple perceptron
It is a binary classification algorithm derived from a neural model [2], composed of two layers of neurons: an input layer and an output layer. Input neurons do not perform any computation; they only pass the information (discrete values in {0, 1}) to the output neurons. Because the simple perceptron is built from threshold units, it is useful for representing Boolean functions [16, 17]. To train a perceptron, the weight vector is initialized randomly and then updated at each iteration of the algorithm to improve the results. A small learning rate (between 0.1 and 0.2) is recommended to obtain correct results [17]. The perceptron is composed of: Inputs: the information received by the perceptron; given an input vector, the value of each component. Synaptic weights: numerical values that establish the influence of each input on the desired output; they are usually assigned randomly. Activation function: a mathematical function that determines the output value once all inputs have been processed.
Let x be the stimulus (input) vector and w the weight vector, both of size m, and let z be the net input passed to the activation function φ: $z = w_1x_1 + w_2x_2 + \dots + w_mx_m = \mathbf{w}^{T}\mathbf{x}$.
The perceptron output φ(z) is considered active when z is greater than or equal to the threshold θ, and inactive otherwise.
If θ is incorporated into the expression by defining $w_0 = -\theta$ and $x_0 = 1$, z can be written as $z = w_0x_0 + w_1x_1 + \dots + w_mx_m$, and the perceptron is active when $z \ge 0$.
The perceptron has a simple learning rule for adjusting the weights w: assign the weights an initial value of 0 or small random values and, for each training sample $x^{(i)}$, compute the output value $\hat{y}^{(i)}$ and then update the weights.
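Each weight $w_j$ is updated by increasing or decreasing it by $\Delta w_j$, that is, $w_j := w_j + \Delta w_j$, where $\Delta w_j = \eta\,(y^{(i)} - \hat{y}^{(i)})\,x_j^{(i)}$, η is the learning rate, $y^{(i)}$ is the true class label of sample $x^{(i)}$, and $\hat{y}^{(i)}$ is the predicted output.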
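The learning rule above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the implementation used in the experiments: class labels are 0/1, the activation is a step function, the threshold is folded into w0 (with x0 = 1), and the learning rate is 0.1. The Boolean AND function serves as a linearly separable toy example.

```python
import random


def predict(w, x):
    """Step activation on the net input z = w0*x0 + w1*x1 + ... + wm*xm, with x0 = 1."""
    z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))
    return 1 if z >= 0.0 else 0  # active when z >= 0, i.e. when the weighted inputs reach theta


def train_perceptron(X, y, eta=0.1, epochs=100, seed=0):
    rng = random.Random(seed)
    # Small random initial weights; w[0] plays the role of -theta (bias)
    w = [rng.uniform(-0.01, 0.01) for _ in range(len(X[0]) + 1)]
    for _ in range(epochs):
        errors = 0
        for xi, target in zip(X, y):
            update = eta * (target - predict(w, xi))  # Delta w_j = eta * (y - y_hat) * x_j
            w[0] += update                            # x0 = 1
            for j, xj in enumerate(xi, start=1):
                w[j] += update * xj
            errors += int(update != 0.0)
        if errors == 0:  # every training sample is classified correctly
            break
    return w


if __name__ == "__main__":
    X = [[0, 0], [0, 1], [1, 0], [1, 1]]
    y = [0, 0, 0, 1]  # Boolean AND
    w = train_perceptron(X, y)
    print([predict(w, x) for x in X])
```

Because AND is linearly separable, the perceptron convergence theorem guarantees that this loop stops after a finite number of updates; on a non-separable dataset it would simply run for the maximum number of epochs.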
Naïve Bayes is an algorithm used for both prediction and description [18]. It uses frequencies to calculate conditional probabilities, which enables predictions for new cases [57]. Formally, it considers an event E occurring in combination with one or more mutually exclusive events $F_j$.
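Let E and F denote two events. Event E can be expressed as the union of two mutually exclusive events: the intersection of E with F and the intersection of E with the complement of F. This can be written as:
$$E = EF \cup EF^{c}, \qquad P(E) = P(E \mid F)\,P(F) + P(E \mid F^{c})\,P(F^{c})$$
and, more generally, for mutually exclusive events $F_1, \dots, F_n$ whose union is the whole sample space, $P(E) = \sum_{i} P(E \mid F_i)\,P(F_i)$.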
Equation 7,
$$P(F_j \mid E) = \frac{P(E \mid F_j)\,P(F_j)}{\sum_{i} P(E \mid F_i)\,P(F_i)},$$
is known as the Bayes formula. E is treated as evidence for $F_j$, and the formula gives the probability that $F_j$ occurs given that evidence, $P(F_j \mid E)$.
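The paper does not state which Naïve Bayes variant was used. As an assumption, the sketch below implements a Bernoulli Naïve Bayes for binary (0/1) symptom attributes. Class priors P(c) and per-attribute probabilities P(x_j = 1 | c) are estimated from frequencies with Laplace smoothing (the `alpha` parameter, an illustrative choice). A new case is assigned to the class with the largest posterior under the naïve independence assumption. The shared denominator of the Bayes formula is dropped because it does not change which class wins.

```python
import math
from collections import Counter


def fit_bernoulli_nb(X, y, alpha=1.0):
    """Estimate P(c) and P(x_j = 1 | c) from frequencies, with Laplace smoothing."""
    class_counts = Counter(y)
    n_samples, n_features = len(y), len(X[0])
    priors = {c: k / n_samples for c, k in class_counts.items()}
    likelihoods = {}
    for c, k in class_counts.items():
        rows = [x for x, label in zip(X, y) if label == c]
        likelihoods[c] = [(sum(r[j] for r in rows) + alpha) / (k + 2 * alpha)
                          for j in range(n_features)]
    return priors, likelihoods


def predict_nb(model, x):
    """Return the class maximizing log P(c) + sum_j log P(x_j | c)."""
    priors, likelihoods = model
    scores = {}
    for c, prior in priors.items():
        score = math.log(prior)
        for xj, p in zip(x, likelihoods[c]):
            score += math.log(p if xj == 1 else 1.0 - p)
        scores[c] = score
    return max(scores, key=scores.get)


if __name__ == "__main__":
    # Toy data: three binary "symptoms" per case, class 1 = pathology present (hypothetical)
    X = [[1, 1, 0], [1, 0, 0], [1, 1, 1], [0, 0, 1], [0, 1, 1], [0, 0, 0]]
    y = [1, 1, 1, 0, 0, 0]
    model = fit_bernoulli_nb(X, y)
    print(predict_nb(model, [1, 1, 0]), predict_nb(model, [0, 0, 1]))
```

Working in log space avoids numerical underflow when many small probabilities are multiplied.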
The CART algorithm generates decision trees with binary branches at each decision node. It recursively splits the training dataset into subsets of records with similar values for the target attribute. The algorithm grows the tree by exhaustively searching all available variables and possible splitting values for each decision node, and selecting the optimal split based on the Gini index [43].
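In the context of classification, the Gini index measures the impurity of a dataset D as $Gini(D) = 1 - \sum_{i=1}^{m} P_i^{2}$, where $P_i$ represents the probability that a tuple in D belongs to class $C_i$ and m is the number of classes [58]. To calculate the Gini index, a binary split is considered for each attribute [44], and the weighted sum of the impurity of each resulting partition is computed. For a binary division of D based on attribute A into subsets $D_1$ and $D_2$, the Gini index of D is given by:
$$Gini_A(D) = \frac{|D_1|}{|D|}\,Gini(D_1) + \frac{|D_2|}{|D|}\,Gini(D_2)$$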
For a discrete-valued attribute, the subset that yields the minimum Gini index for that attribute is selected as its splitting subset. For continuous-valued attributes, the midpoint of each pair of adjacent sorted values is considered as a possible split point, and the point with the smallest Gini index is chosen.
In CART, the splitting attribute is the one with the minimum Gini index.
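A minimal sketch of the split search described above, written in plain Python as an illustration (not the CART implementation used in the study). It computes the Gini index of a set of labels and searches the midpoints between adjacent sorted values of one attribute for the binary split with the minimum weighted Gini index. It then picks the attribute whose best split has the minimum Gini index. A full CART tree would apply this step recursively and add pruning.

```python
def gini(labels):
    """Gini(D) = 1 - sum_i p_i^2, where p_i is the proportion of class i in D."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_split(values, labels):
    """Return (threshold, weighted Gini) of the best binary split of one attribute."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, float("inf"))
    for k in range(1, n):
        lo, hi = pairs[k - 1][0], pairs[k][0]
        if lo == hi:
            continue  # no split point between equal values
        threshold = (lo + hi) / 2  # midpoint of adjacent values
        left = [lab for v, lab in pairs if v <= threshold]
        right = [lab for v, lab in pairs if v > threshold]
        weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
        if weighted < best[1]:
            best = (threshold, weighted)
    return best


def best_attribute(X, y):
    """Pick the attribute (column index) whose best split has the minimum Gini index."""
    results = [(best_split([row[j] for row in X], y), j) for j in range(len(X[0]))]
    (threshold, g), j = min(results, key=lambda r: r[0][1])
    return j, threshold, g


if __name__ == "__main__":
    X = [[1, 0], [1, 1], [0, 0], [0, 1]]  # toy binary attributes
    y = [1, 1, 0, 0]
    print(best_attribute(X, y))  # attribute 0 separates the classes perfectly
```

For binary 0/1 attributes, such as those in Table 4, the only candidate threshold is 0.5, so the search reduces to comparing one split per attribute.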
CART is versatile: it can be used with categorical values and can handle missing attribute values. The algorithm uses cost-complexity pruning and can also generate regression trees. It applies splitting rules to divide the data at a node based on the value of a variable, applies stopping rules to determine when a branch is terminal and cannot be split further, and provides a prediction for the target variable at each terminal node.
The AdaBoost algorithm was proposed in [1] to build a strong, high-precision classifier by combining multiple weak (low-performance) classifiers [41]. AdaBoost trains multiple weak classifiers on the same dataset, assigns a weight to each classifier based on its accuracy, and combines their predictions into a final prediction. The algorithm increases the weights of misclassified samples so that the next weak classifier focuses on them, improving the model’s accuracy at each iteration. AdaBoost keeps training until a specified number of weak classifiers has been created or until all samples are classified correctly [42]. AdaBoost works as follows: a random subset of the training data is initially selected; each new learner is trained taking into account the predictions of the previous ones; incorrectly classified observations receive higher weights so that they are more likely to be classified correctly in the next iteration; each learner is assigned a weight based on its accuracy, with the most accurate learner receiving the highest weight; and the process iterates until the entire training dataset is fitted without error or the maximum number of estimators is reached. To classify new data, a weighted “vote” is taken among all the learners created.
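The boosting loop can be sketched as discrete AdaBoost with one-attribute decision stumps as weak learners. This is a minimal illustration under stated assumptions, not the configuration used in the experiments. It assumes binary 0/1 attributes and labels in {−1, +1}. It reweights the samples at each round rather than drawing the random subsets mentioned above. The toy target, the majority of three binary attributes, cannot be represented by any single stump but can be represented by a weighted vote of three stumps.

```python
import math


def stump_predict(x, j, s):
    # Decision stump on binary attribute j: predict s if x_j == 1, otherwise -s
    return s if x[j] == 1 else -s


def train_stump(X, y, w):
    """Return (weighted error, attribute j, polarity s) of the best stump under weights w."""
    best = None
    for j in range(len(X[0])):
        for s in (1, -1):
            err = sum(wi for wi, x, t in zip(w, X, y) if stump_predict(x, j, s) != t)
            if best is None or err < best[0]:
                best = (err, j, s)
    return best


def adaboost(X, y, n_rounds=10):
    """Discrete AdaBoost with sample reweighting; labels y must be -1 or +1."""
    n = len(y)
    w = [1.0 / n] * n  # start with uniform sample weights
    ensemble = []
    for _ in range(n_rounds):
        raw_err, j, s = train_stump(X, y, w)
        err = min(max(raw_err, 1e-10), 1 - 1e-10)  # guard against log(0) and division by zero
        alpha = 0.5 * math.log((1 - err) / err)     # more accurate stumps get larger votes
        ensemble.append((alpha, j, s))
        # Misclassified samples (t * h = -1) gain weight; correct ones (t * h = +1) lose weight
        w = [wi * math.exp(-alpha * t * stump_predict(x, j, s)) for wi, x, t in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
        if raw_err == 0:  # training data already fitted without error
            break
    return ensemble


def predict_adaboost(ensemble, x):
    # Weighted vote of all stumps
    score = sum(alpha * stump_predict(x, j, s) for alpha, j, s in ensemble)
    return 1 if score >= 0 else -1


if __name__ == "__main__":
    from itertools import product

    X = [list(p) for p in product([0, 1], repeat=3)]
    y = [1 if sum(x) >= 2 else -1 for x in X]  # majority of three binary attributes
    model = adaboost(X, y, n_rounds=3)
    print([predict_adaboost(model, x) for x in X] == y)
```

On this toy problem, the best single stump misclassifies 2 of the 8 cases. After three rounds, the weighted vote of three stumps classifies all of them correctly.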
Confusion matrix. The confusion matrix (see Table 5) summarizes the performance of a supervised machine learning model on test data.
Confusion matrix
Here, TP is the number of positives correctly classified as positive by the model, TN is the number of negatives correctly classified as negative, FP is the number of negatives incorrectly classified as positive, and FN is the number of positives incorrectly classified as negative. From these counts, several measures can be defined to evaluate the diagnosis; the most common are:
Accuracy. The proportion of all cases that the model classifies correctly, obtained with the formula $Accuracy = \frac{TP + TN}{TP + TN + FP + FN}$.
Sensitivity. The ability of the test to detect the pathology when it is present, obtained with the formula $Sensitivity = \frac{TP}{TP + FN}$.
Specificity. The ability of the test to correctly rule out the pathology when it is not present, obtained with the formula $Specificity = \frac{TN}{TN + FP}$.
Expert assessment. The level of satisfaction with the results, based on the opinion of the medical expert.
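The three formula-based measures above can be computed directly from the confusion-matrix counts. The following minimal sketch assumes binary labels with 1 as the positive class (pathology present), matching the 0/1 “class” column of the datasets. The function names are illustrative.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, TN, FP, FN for a binary classification."""
    tp = tn = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == positive and p == positive:
            tp += 1
        elif t != positive and p != positive:
            tn += 1
        elif t != positive and p == positive:
            fp += 1
        else:  # positive case predicted as negative
            fn += 1
    return tp, tn, fp, fn


def diagnostic_metrics(y_true, y_pred, positive=1):
    tp, tn, fp, fn = confusion_counts(y_true, y_pred, positive)
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,
        # Undefined (NaN) when the test subset has no positive / no negative cases
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
    }


if __name__ == "__main__":
    y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
    print(confusion_counts(y_true, y_pred))
    print(diagnostic_metrics(y_true, y_pred))
```

In this toy example, 3 of the 4 positive cases are detected (sensitivity 0.75), 5 of the 6 negative cases are correctly ruled out (specificity about 0.83), and 8 of the 10 cases are classified correctly (accuracy 0.8).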
Experiments were conducted on a Lenovo IdeaPad 330 with an Intel(R) Core(TM) i3-7020U CPU @ 2.30GHz processor, 8.00 GB of RAM, and a 64-bit operating system, using Python 3.7, with Excel used to create and display the datasets in CSV format.
Experiments and results
Four algorithms were evaluated: i) Simple Perceptron, ii) Naïve Bayes, iii) CART, and iv) AdaBoost, each on the 10 datasets listed in Table 3.
As shown in Table 6, the percentage of correct classification was obtained for each dataset: the Simple Perceptron achieved 100% on datasets 2, 4, 5, and 8, while CART and AdaBoost achieved 100% on datasets 1, 2, 4, 5, 6, 8, and 10. Naïve Bayes had the worst performance.
Experimental results
*Response time was not acceptable compared with expert diagnosis at a medical appointment.
Confusion matrices
Tables 7, 8, 9, and 10 show the confusion matrices for each algorithm, obtained by evaluating each of the 10 datasets on its test subset. True positives are shown on a blue background, true negatives on a gray background, false negatives on a green background, and false positives on a white background.
Confusion matrix for the Simple Perceptron
On the test subsets, the Simple Perceptron (see Table 7) achieved a sensitivity and specificity of 1.0 with datasets 1, 2, 4, 5, and 8; a sensitivity of 0.94 and a specificity of 1.0 with dataset 6; and a sensitivity of 0.98 and a specificity of 1.0 with dataset 10. With datasets 3, 7, and 9, the response time was not acceptable.
For Naïve Bayes (see Table 8), the test subsets yielded a sensitivity and specificity of 0.64 and 0.42 with dataset 1, 0.67 and 0.25 with dataset 2, 0.46 and 0.48 with dataset 3, 0.48 and 0.47 with dataset 4, 0.45 and 0.46 with dataset 5, 0.41 and 0.62 with dataset 6, 0.44 and 0.65 with dataset 7, 1.0 and 1.0 with dataset 8, 0.98 and 0.98 with dataset 9, and 1.0 and 0.97 with dataset 10.
Confusion matrix for Naïve Bayes
For CART (see Table 9), the test subsets yielded a sensitivity and specificity of 1.0 with datasets 1, 2, 4, 5, 6, 8, and 10; a sensitivity of 1.0 and a specificity of 0.96 with datasets 3 and 7; and a sensitivity of 0.96 and a specificity of 1.0 with dataset 9.
Confusion matrix for CART
For AdaBoost (see Table 10), sensitivity and specificity were both 1.0 on datasets 1, 2, 4, 6, 8, and 10. The remaining results were 1.0 sensitivity and 0.79 specificity on dataset 3, 0.96 sensitivity and 1.0 specificity on dataset 5, 0.98 sensitivity and 0.93 specificity on dataset 7, and 0.96 sensitivity and 0.97 specificity on dataset 9.
Confusion matrix for AdaBoost
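For a compact comparison, the per-dataset sensitivity and specificity values reported above can be averaged per algorithm. The short Python sketch below performs that arithmetic. The unweighted mean across datasets is our own illustrative summary, not a metric reported in Tables 7 to 10, and for Simple Perceptron it covers only the seven datasets on which the algorithm finished.

```python
from statistics import mean
from typing import Dict, List, Optional, Tuple

# Per-dataset values transcribed from the paragraphs above (datasets 1 to 10).
# None marks the Simple Perceptron runs (datasets 3, 7, 9) that did not finish in acceptable time.
REPORTED: Dict[str, Dict[str, List[Optional[float]]]] = {
    "Simple Perceptron": {
        "sensitivity": [1.0, 1.0, None, 1.0, 1.0, 0.94, None, 1.0, None, 0.98],
        "specificity": [1.0, 1.0, None, 1.0, 1.0, 1.0, None, 1.0, None, 1.0],
    },
    "Naïve Bayes": {
        "sensitivity": [0.64, 0.67, 0.46, 0.48, 0.45, 0.41, 0.44, 1.0, 0.98, 1.0],
        "specificity": [0.42, 0.25, 0.48, 0.47, 0.46, 0.62, 0.65, 1.0, 0.98, 0.97],
    },
    "CART": {
        "sensitivity": [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.96, 1.0],
        "specificity": [1.0, 1.0, 0.96, 1.0, 1.0, 1.0, 0.96, 1.0, 1.0, 1.0],
    },
    "AdaBoost": {
        "sensitivity": [1.0, 1.0, 1.0, 1.0, 0.96, 1.0, 0.98, 1.0, 0.96, 1.0],
        "specificity": [1.0, 1.0, 0.79, 1.0, 1.0, 1.0, 0.93, 1.0, 0.97, 1.0],
    },
}


def summarize(reported: Dict[str, Dict[str, List[Optional[float]]]]) -> Dict[str, Dict[str, Tuple[float, int]]]:
    """Unweighted mean of each metric over the datasets on which an algorithm finished."""
    out: Dict[str, Dict[str, Tuple[float, int]]] = {}
    for algo, metrics in reported.items():
        out[algo] = {}
        for metric, values in metrics.items():
            finished = [v for v in values if v is not None]
            out[algo][metric] = (mean(finished), len(finished))
    return out


for algo, metrics in summarize(REPORTED).items():
    sens, n = metrics["sensitivity"]
    spec, _ = metrics["specificity"]
    print(f"{algo}: mean sensitivity {sens:.3f}, mean specificity {spec:.3f} over {n} datasets")
```

On these figures, mean sensitivity is about 0.99 for Simple Perceptron (seven datasets), CART, and AdaBoost, versus about 0.65 for Naïve Bayes.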
According to physician Rubí López Álvarez (Professional Certificate No. 7574655, licensed to practice in Mexico), the results obtained are favorable. The physician also suggested increasing the number of cases to observe how precision, which was somewhat low in some experiments, changes, and addressing the clinical picture in greater detail by considering other possible characteristics.
Analysis and discussion
In our study, we used a carefully constructed dataset of 600 vaginitis cases collected at the Women’s Hospital in Villahermosa, Tabasco, Mexico, during 2018–2019. The dataset was built from records in the CES software, which provided a detailed pathological categorization across the vaginitis spectrum. This approach ensures both a thorough representation of the vaginitis subtypes studied and close alignment with actual clinical conditions in Villahermosa, Tabasco, Mexico.
The quality of the dataset is crucial to our study because it directly determines the reliability of the results; the results reported here should therefore be read as dependent on the quality of this particular dataset.
Analyzing the behavior of four representative machine learning algorithms, we observed that CART and AdaBoost produced outstanding results on all 10 datasets, as did Simple Perceptron on the datasets where it finished; on datasets 3, 7, and 9, however, Simple Perceptron did not produce results within a time acceptable for in situ medical diagnosis. These difficulties underscore the need for further optimization of the Simple Perceptron. The Naïve Bayes algorithm, by contrast, gave mixed results: acceptable (≥85%) on 5 datasets and low (≤72%) on the remaining 5. This variance underscores the algorithm’s sensitivity to the choice of dataset and the importance of a nuanced evaluation.
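The paper does not say why Simple Perceptron failed to finish on datasets 3, 7, and 9. One plausible explanation, which is our assumption, is that the classical perceptron update rule stops only after a full pass with no misclassified training cases, which never happens when the classes are not linearly separable. A common safeguard is to cap the number of epochs and keep the best weights seen so far (the "pocket" variant). The minimal binary sketch below illustrates this; `train_perceptron`, `max_epochs`, and the toy data are hypothetical and are not the study's implementation. A four-subtype problem would need, for example, one such classifier per class in a one-vs-rest scheme.

```python
from typing import List, Tuple

Vector = List[float]


def predict(w: Vector, b: float, x: Vector) -> int:
    """Sign of the linear score; labels are -1 and +1."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1


def errors(w: Vector, b: float, X: List[Vector], y: List[int]) -> int:
    """Number of training cases misclassified by (w, b)."""
    return sum(predict(w, b, x) != t for x, t in zip(X, y))


def train_perceptron(X: List[Vector], y: List[int], max_epochs: int = 100) -> Tuple[Vector, float, int]:
    """Binary perceptron with an epoch cap and pocket weights.

    Returns (best weights, best bias, epochs actually run).
    """
    w, b = [0.0] * len(X[0]), 0.0
    best_w, best_b, best_err = list(w), b, errors(w, b, X, y)
    for epoch in range(1, max_epochs + 1):
        mistakes = 0
        for x, t in zip(X, y):
            if predict(w, b, x) != t:
                # Classical update: shift the hyperplane toward the misclassified case.
                w = [wi + t * xi for wi, xi in zip(w, x)]
                b += t
                mistakes += 1
        err = errors(w, b, X, y)
        if err < best_err:  # keep the best weights seen so far (pocket)
            best_w, best_b, best_err = list(w), b, err
        if mistakes == 0:   # a full pass without updates: the data are separated
            return best_w, best_b, epoch
    return best_w, best_b, max_epochs  # cap reached: stop instead of looping indefinitely


# Toy data (hypothetical): AND is linearly separable, XOR is not.
X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
y_and = [-1, -1, -1, 1]
y_xor = [-1, 1, 1, -1]
print(train_perceptron(X, y_and))                 # stops early with zero training errors
print(train_perceptron(X, y_xor, max_epochs=10))  # reaches the epoch cap
```

The cap bounds the response time regardless of the data, at the cost of returning an imperfect separator when the classes overlap.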
With the 10 attributes established with the expert’s help (Table 4) for diagnosing vaginitis and its possible pathological categories, the results are reliable for predicting the medical diagnosis. This benefits the patient, since it can reduce costs and support timely treatment, which can lead to a shorter recovery.
The experimental results show that it is possible to develop front-end software, backed by machine learning algorithms on the back end, that can help increase the rate of timely detection of gynecological pathologies. Moreover, such software can be built with free programming languages, at no licensing cost.
Conclusion and future work
In this paper, we have investigated and evaluated four machine learning algorithms (Simple Perceptron, Naïve Bayes, CART, and AdaBoost) for vaginitis diagnosis on a clinical dataset. Our experiments revealed promising diagnostic accuracy, with CART and AdaBoost achieving classification rates above 90% in most cases, which we attribute to the properties of these algorithms described in Sections 4.3.3 and 4.3.4. The ability of these algorithms to diagnose vaginitis with high accuracy has the potential to improve medical care by enabling early and accurate diagnosis.
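Sections 4.3.3 and 4.3.4 are not part of this excerpt, so the sketch below shows only the standard CART criterion for classification as it is generally described, not the authors' configuration. At each node, CART chooses the split that minimizes the weighted Gini impurity of the child nodes, where a node with class proportions p_k has impurity 1 − Σ_k p_k². The function names `gini` and `best_threshold` and the toy data are hypothetical.

```python
from collections import Counter
from typing import List, Optional, Sequence, Tuple


def gini(labels: Sequence[str]) -> float:
    """Gini impurity 1 - sum_k p_k**2 of a collection of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_threshold(values: List[float], labels: List[str]) -> Tuple[Optional[float], float]:
    """Best binary split 'value <= t' on one numeric attribute.

    Returns (t, weighted child impurity); t is None if no split improves on the parent.
    """
    n = len(values)
    best_t, best_imp = None, gini(labels)
    for t in sorted(set(values))[:-1]:  # a split at the maximum would leave the right side empty
        left = [lab for v, lab in zip(values, labels) if v <= t]
        right = [lab for v, lab in zip(values, labels) if v > t]
        imp = (len(left) * gini(left) + len(right) * gini(right)) / n
        if imp < best_imp:
            best_t, best_imp = t, imp
    return best_t, best_imp


# Hypothetical attribute values and class labels, for illustration only.
print(gini(["A", "A", "B", "B"]))                          # 0.5
print(best_threshold([1, 2, 3, 4], ["A", "A", "B", "B"]))  # (2, 0.0)
```

A full CART implementation applies this search to every attribute, recurses on the child nodes, and typically prunes the resulting tree; none of that is shown here.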
In regions where specialist physicians are not available, systems based on these algorithms could be very useful for providing an initial medical evaluation and referring patients to specialized treatment as needed. In addition, this research is a first step in exploring machine learning applications in medicine in our region (southern Mexico), and our results suggest that this line of work could benefit the local medical community.
This research is not without limitations, and there are opportunities for future work. In particular, Naïve Bayes performed poorly, with accuracies below 50% on some datasets. Nevertheless, we believe the results are a significant step in applying machine learning techniques to medical diagnosis and could make an important contribution to regional health care.
The following future work should be considered: including other clinical features (attributes); exploring whether the same results can be obtained with fewer attributes by means of dimensionality reduction; increasing the number of cases, if possible with data from another hospital; developing an expert system for use in health centers where no specialist physician is available; performing a granular analysis of patient subgroups to unravel the impact of age, comorbidities, and clinical presentation on algorithmic accuracy; and performing a clustering analysis supported by techniques such as t-SNE, ISOMAP, or PCA to explore whether subtypes exist within vaginal pathologies, which would provide additional insight into the variability in the data. We will also consider an Explainable Artificial Intelligence (XAI) study to clarify how model outputs are generated. This transparent analysis would explain the internal rules and functions of the featured classifiers, meeting end users’ need for detailed explanations.
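Of these items, dimensionality reduction lends itself to a short illustration. PCA is one of the techniques named above. The minimal NumPy sketch below assumes the attributes have already been encoded as a numeric matrix (rows are cases, columns are attributes). It computes the share of variance captured by each principal component and the smallest number of components that retains a chosen fraction of it. The function names and the 0.95 threshold are illustrative assumptions. Retaining variance does not by itself guarantee unchanged diagnostic accuracy, which would still have to be checked by retraining the classifiers on the reduced data.

```python
import numpy as np


def explained_variance_ratio(X: np.ndarray) -> np.ndarray:
    """Share of total variance captured by each principal component (rows = cases)."""
    Xc = X - X.mean(axis=0)                  # center each attribute
    s = np.linalg.svd(Xc, compute_uv=False)  # singular values in descending order
    var = s ** 2                             # proportional to the component variances
    return var / var.sum()


def components_needed(X: np.ndarray, threshold: float = 0.95) -> int:
    """Smallest number of components whose cumulative explained variance reaches threshold."""
    cumulative = np.cumsum(explained_variance_ratio(X))
    k = int(np.searchsorted(cumulative, threshold) + 1)
    return min(k, len(cumulative))  # guard against rounding when threshold is close to 1


# Synthetic data (not the study's): three perfectly correlated attributes vs. three independent ones.
rng = np.random.default_rng(0)
base = rng.normal(size=(50, 1))
X_redundant = np.hstack([base, 2 * base, -base])
X_independent = rng.normal(size=(200, 3))
print(components_needed(X_redundant))    # 1
print(components_needed(X_independent))  # more than 1
```

When attributes are measured on different scales, standardizing each column before the SVD keeps a single large-valued attribute from dominating the components.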
Data availability
The data used in this study is available only upon request.
Acknowledgments
The authors thank the Women’s Hospital in Villahermosa, Tabasco, Mexico and the study subjects for providing the information to carry out the study.
The authors thank physicians Rubí López Álvarez and Claudia Ivonne Ramírez Santiago for their continued support to conduct the study and the feedback on the results.
The authors thank Adriana G. López-Ramírez for help in machine learning experiments and José Carlos de Jesús Montero Rodríguez for help in dataset construction.
