Abstract
Objective
Some patients with benign paroxysmal positional vertigo (BPPV) do not improve after a single canalith repositioning maneuver (CRM) and may require multiple maneuvers. This study aims to use machine learning (ML) to identify parameters predisposing patients to multiple CRMs, thereby improving the predictability of treatment requirements in patients with BPPV.
Study design
Retrospective study.
Setting
Hospital.
Patients
This study included 520 participants diagnosed with BPPV between 2018 and 2023, with a mean age of 56.2 ± 14.0 years.
Interventions
Age, BPPV type, comorbid diseases, gender, and the number of maneuvers required for recovery were recorded. The target outcome, “number of maneuvers”, was dichotomized as either one (0) or more than one (1). Model performance was evaluated using precision, recall, F1-score, accuracy, balanced accuracy, the Receiver Operating Characteristic (ROC) curve, and the area under the ROC curve (AUC).
Results
One maneuver was sufficient to treat BPPV in 188 patients (36%), whereas 332 patients (64%) required more than one maneuver. The Gradient Boosting Machine (GBM) achieved the best AUC for predicting the number of maneuvers. Logistic regression achieved the best precision, XGBoost the best F1-score and recall, and the support vector classifier the best accuracy and balanced accuracy.
Conclusions
Machine learning models with high predictive capabilities can help identify patients likely to need multiple maneuvers, allowing for more efficient treatment planning and enhanced patient outcomes.
Introduction
Benign paroxysmal positional vertigo (BPPV), the most common cause of vertigo, occurs when otoconia from the utricle are displaced into the semicircular canals (SSC). Displaced otoconia can move freely within the canals or adhere to the cupula. 1 As a result, head movements in the plane of the affected canal produce abnormal excitation or inhibition of the vestibulo-ocular reflex, causing brief but severe positional vertigo. Many canalith repositioning maneuvers (CRM) are used to move the dislodged otoconia back into the vestibule along the ampullofugal direction. The Semont 2 and Epley 3 maneuvers are generally used for anterior and posterior SSC BPPV, whereas the barbecue roll 4 or Gufoni maneuver 5 is usually applied for lateral SSC BPPV. More than 75% of patients with BPPV recover after a single CRM. 6 However, some patients require multiple maneuvers for complete recovery. Cervical spine problems, hypertension, the number of affected canals, and BPPV etiology have been reported as predisposing factors for multiple CRMs.6–8 Nevertheless, the number of maneuvers a patient will need cannot currently be determined in advance.
Recent studies have begun exploring how machine learning can aid in clinical decision-making and disease prediction. A multimodal deep learning approach integrating eye movement videos and posture data has shown around 81.7% diagnostic accuracy for BPPV, emphasizing the importance of combining ocular and positional cues. 9 In addition, a random forest–based study demonstrated that spontaneous nystagmus, head-shaking nystagmus, and video head impulse testing (vHIT) effectively screened acute, episodic, and chronic vestibular syndromes with accuracies of up to 90%–91%. 10 Although previous work has identified various risk factors associated with repeated maneuvers in BPPV, the use of machine learning to predict whether a patient will require multiple CRMs remains novel.
In healthcare, supervised learning models are widely used for disease prediction and to support clinical decision-making. Several studies have explored factors predisposing patients to multiple maneuvers in BPPV, but to our knowledge none has applied ML to predict this outcome. This study therefore aims to use ML to identify parameters predisposing patients to multiple CRMs, thereby improving the predictability of treatment requirements in patients with BPPV.
Methods
This retrospective study included patients who presented to the Audiology, Balance and Speech Disorders Diagnosis and Rehabilitation Unit between 2018 and 2023 and were diagnosed with BPPV. The registration forms of these patients were reviewed, and age, BPPV type, comorbid diseases, gender, and the number of maneuvers required for recovery were recorded. The following criteria were applied to determine the final dataset:
Inclusion criteria:
• Patients aged 18 years and older who presented to the Audiology, Balance, and Speech Disorders Diagnosis and Rehabilitation Unit between 2018 and 2023.
• Patients with a clinical diagnosis of benign paroxysmal positional vertigo (BPPV).
• Patients for whom complete records were available regarding age, affected canal and ear, comorbid diseases, gender, and the number of maneuvers required for recovery.
Exclusion criteria:
• Patients whose BPPV remained unresolved despite repeated canalith repositioning maneuvers.
• Patients younger than 18 years of age.
• Patients with BPPV secondary to other vestibular pathologies (e.g., Meniere’s disease and vestibular neuritis).
• Patients with missing or incomplete data on age, affected canal and ear, comorbidities, or treatment responses.
All eligible patient forms were reviewed to confirm data completeness and consistency with the BPPV diagnostic criteria before final enrollment. As a result of the screening, data from 620 patients were included in the study.
Machine learning models
A variety of machine learning techniques were applied to the classification task. K-nearest neighbors (KNN) predicts a class from the majority label among the K nearest data points, 11 while Decision Trees (DT) 12 use a tree-like structure in which each node splits the data on the most informative feature. Random Forest (RF) 13 builds multiple decision trees and aggregates their outputs to improve accuracy and reduce overfitting, and Support Vector Machine (SVM) 14 finds the hyperplane that separates the classes with the maximum margin. Logistic Regression (LR), 15 despite its name, is a classification method that models the probability of class membership with the logistic function.
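The KNN voting rule described above can be sketched in a few lines of plain Python. This is an illustrative sketch, not the scikit-learn implementation used in the study; the function name `knn_predict`, the two toy features (standardized age and a hypertension flag), and the labels are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=5):
    """Predict a label by majority vote among the k training points nearest to `query`."""
    # Euclidean distance from the query to every training point, paired with its label
    distances = [(math.dist(x, query), label) for x, label in zip(train_X, train_y)]
    # Keep the k closest neighbours (sorted is stable, so equal distances keep input order)
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Majority label; on a tied vote, Counter returns the label encountered first
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: [standardized age, hypertension flag]; label 1 = multiple maneuvers
X = [[-1.2, 0], [-0.8, 0], [-0.5, 0], [0.9, 1], [1.1, 1], [1.4, 0]]
y = [0, 0, 0, 1, 1, 1]
print(knn_predict(X, y, [1.0, 1], k=3))   # 1
print(knn_predict(X, y, [-1.0, 0], k=3))  # 0
```

Because KNN relies on distances, features on very different scales can dominate the vote, which is one reason the study standardized age before modeling.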
We also used gradient boosting ensembles from dedicated libraries. XGBoost 16 builds trees sequentially, each correcting the errors of the previous ones, and uses parallel processing for speed. Gradient Boosting Machine (GBM) 17 builds an ensemble of weak learners in a stage-wise manner, optimizing a chosen loss function. CatBoost 18 handles categorical features well and includes mechanisms to reduce overfitting, while LightGBM 19 is designed for efficient training on large datasets with low memory usage.
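To make the stage-wise idea behind GBM concrete, the following is a minimal from-scratch sketch, assuming binary log-loss, depth-1 regression trees (stumps) as weak learners, and leaf values equal to the mean residual. Production libraries such as scikit-learn, XGBoost, LightGBM, and CatBoost add second-order leaf estimates, regularization, and subsampling; this sketch omits all of that, and the function names and toy data are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def log_loss(y, logits):
    """Mean binary cross-entropy, written in a numerically stable softplus form."""
    total = 0.0
    for yi, f in zip(y, logits):
        softplus = max(f, 0.0) + math.log1p(math.exp(-abs(f)))  # log(1 + e^f)
        total += softplus - yi * f
    return total / len(y)


def fit_stump(X, residuals):
    """Least-squares depth-1 regression tree fitted to the residuals."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            if not left or not right:
                continue
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lv, rv)
    if best is None:  # no valid split: predict the mean residual everywhere
        mean = sum(residuals) / len(residuals)
        return lambda row: mean
    _, j, t, lv, rv = best
    return lambda row: lv if row[j] <= t else rv


def gradient_boost(X, y, n_estimators=20, learning_rate=0.3):
    """Stage-wise boosting on log-loss; returns (predict_proba, training loss per stage)."""
    p0 = sum(y) / len(y)              # assumes both classes are present
    base = math.log(p0 / (1 - p0))    # initial prediction: log-odds of the positive class
    logits = [base] * len(y)
    stumps, losses = [], []
    for _ in range(n_estimators):
        # The negative gradient of log-loss with respect to the logit is y - p
        residuals = [yi - sigmoid(f) for yi, f in zip(y, logits)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        # Shrink each weak learner's contribution by the learning rate
        logits = [f + learning_rate * stump(row) for f, row in zip(logits, X)]
        losses.append(log_loss(y, logits))

    def predict_proba(row):
        return sigmoid(base + learning_rate * sum(s(row) for s in stumps))

    return predict_proba, losses


# Hypothetical toy data: [standardized age, hypertension flag]
X = [[-1.2, 0], [-0.8, 0], [-0.5, 1], [0.2, 1], [0.3, 0], [0.9, 1], [1.1, 1], [1.4, 0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
predict_proba, losses = gradient_boost(X, y)
print(losses[0] > losses[-1])                              # True
print(predict_proba([1.1, 1]) > predict_proba([-1.2, 0]))  # True
```

Each stage fits a weak learner to the current errors and adds a shrunken copy of it, so the training loss falls step by step; the learning rate trades the speed of that descent against overfitting.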
Hyperparameter optimization is a critical process in machine learning that focuses on finding the optimal set of hyperparameters—such as n_estimators, max_depth, learning_rate, and colsample_bytree—to maximize a model’s performance. These parameters govern the complexity, regularization, and sampling strategies in models like XGBoost, LightGBM, and CatBoost, and directly influence the model’s predictive power and generalization ability (Table 2). Traditional methods like grid search and random search can be inefficient, especially when dealing with complex models and large datasets. Optuna is a modern, flexible framework designed for hyperparameter optimization in machine learning. It leverages advanced algorithms like the Tree-structured Parzen Estimator (TPE) and includes features such as dynamic search space definition and trial pruning. These capabilities allow Optuna to efficiently explore the hyperparameter space and enhance model performance. Bayesian optimization, a core method used in Optuna, constructs a probabilistic model of the objective function to intelligently select the most promising hyperparameters, effectively balancing exploration and exploitation. This approach improves optimization efficiency and effectiveness, making Optuna a powerful tool for achieving optimal configurations in machine learning models.
No standard consensus exists on the search space for each hyperparameter, so we defined the ranges using common best practices (e.g., a learning rate from 0.001 to 0.3) and initial exploratory tests; the range for each hyperparameter is given in Table 2. These ranges were informed by each library's documentation and the relevant literature.12,15,18–22 Because machine learning training can involve stochastic processes, such as random train/test splits, random seeds for parameter initialization, and random subsets for bagging, we took several measures to ensure reliable hyperparameter evaluation. First, we fixed the random seeds (random state) to aid reproducibility. Second, we used multiple cross-validation folds to smooth out anomalies in individual splits. Third, Optuna's adaptive search helps mitigate random fluctuations by concentrating trials around promising regions of the parameter space. Consequently, the selected hyperparameters are intended to reflect robust, high-performing configurations rather than chance results or overfitting to a single data partition.
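The role of a fixed seed in fold assignment can be illustrated with a small plain-Python sketch of seeded 5-fold splitting. It assumes simple (non-stratified) shuffling; the function name `kfold_indices` and the seed value are hypothetical, and the study itself relied on library implementations.

```python
import random


def kfold_indices(n_samples, n_splits=5, seed=42):
    """Yield (train_idx, test_idx) pairs; every sample appears in exactly one test fold."""
    rng = random.Random(seed)  # a fixed seed reproduces the same folds on every run
    idx = list(range(n_samples))
    rng.shuffle(idx)
    # Spread any remainder over the first folds so fold sizes differ by at most one
    sizes = [n_samples // n_splits + (1 if i < n_samples % n_splits else 0)
             for i in range(n_splits)]
    start = 0
    for size in sizes:
        test = idx[start:start + size]
        train = idx[:start] + idx[start + size:]
        yield train, test
        start += size


folds = list(kfold_indices(520))
print([len(test) for _, test in folds])  # [104, 104, 104, 104, 104]
print(len(folds[0][0]))                  # 416, i.e. 80% training / 20% testing
```

Averaging a metric over the five test folds, rather than trusting one split, is what smooths out the anomalies of individual partitions.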
This diverse set of models was chosen to cover a wide range of predictive capabilities and levels of interpretability. Classical approaches such as KNN, DT, and LR are simple to implement and interpret, which makes them useful for rapid prototyping and for understanding the underlying structure of the data. Ensemble methods such as RF, XGBoost, GBM, CatBoost, and LightGBM are known for their accuracy and robustness, particularly when features interact in complex ways. Finally, SVM was included for its strong theoretical foundations and proven performance in high-dimensional spaces. By comparing multiple algorithms, we aimed to identify the model that best balances predictive power, interpretability, and computational efficiency for this classification task.
Results
Data enrollment
Initially, the database screening identified 603 clients, and a multi-step data cleaning process produced the final analytic sample. First, we applied Tukey’s method to the age variable and excluded any age values identified as outliers by the interquartile range (IQR) criterion. Next, rows lacking information on the categorical variables were removed, as were entries with missing values in the maneuver count or affected canal fields. Clients diagnosed with vestibular disorders outside the study’s inclusion criteria were also excluded. Because the dataset contained a relatively small proportion of missing data, we used complete-case removal rather than imputation. Finally, we applied standard scaling (z-score normalization; z = (x − μ)/σ) to the Age variable to ensure a consistent scale across analyses. Consequently, 520 valid records remained for the final analysis.
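The two age-processing steps above, Tukey's IQR fences and z-score scaling, can be sketched with the Python standard library. This is an illustration under stated assumptions: the conventional fence multiplier k = 1.5, the population standard deviation (as scikit-learn's `StandardScaler` uses), and hypothetical ages.

```python
import statistics


def tukey_filter(values, k=1.5):
    """Keep values within Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    # statistics.quantiles uses the 'exclusive' method by default; other tools
    # may interpolate quartiles slightly differently
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


def zscore(values):
    """Standard scaling z = (x - mu) / sigma, using the population standard deviation."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return [(v - mu) / sigma for v in values]


ages = [45, 48, 50, 52, 55, 56, 58, 60, 62, 65, 99]  # hypothetical ages
kept = tukey_filter(ages)
print(kept)  # [45, 48, 50, 52, 55, 56, 58, 60, 62, 65]
scaled = zscore(kept)
```

Here Q1 = 50 and Q3 = 62, so the upper fence is 62 + 1.5 × 12 = 80 and the age of 99 is removed; the remaining ages are then rescaled to mean 0 and standard deviation 1.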
The dataset contained each participant’s age, gender, affected semicircular canal (SSC), diabetes mellitus status, presence of neck problems, hearing problems, number of comorbidities, and the number of maneuvers required to treat BPPV. Gender, diabetes mellitus, neck problems, and hearing problems were coded as binary variables, the number of comorbidities was recorded as an integer, and the affected SSC was categorized into four groups. The target variable, the number of maneuvers required to treat BPPV, was dichotomized: 0 indicates one maneuver and 1 indicates more than one maneuver. This transformation was necessary because higher maneuver counts were sparsely represented, which would otherwise have produced severe class imbalance.
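The coding scheme described above can be sketched as follows. The study does not name the four SSC groups or state how they were encoded, so this sketch assumes one-hot encoding with hypothetical group labels (`CANAL_LEVELS`) and hypothetical record field names.

```python
# Hypothetical labels for the four SSC groups; the study does not name them here
CANAL_LEVELS = ("posterior", "lateral", "anterior", "multiple")


def encode_target(n_maneuvers):
    """0 = recovered after one maneuver, 1 = required more than one."""
    if n_maneuvers < 1:
        raise ValueError("number of maneuvers must be at least 1")
    return 0 if n_maneuvers == 1 else 1


def one_hot(value, levels=CANAL_LEVELS):
    """One indicator column per canal group."""
    if value not in levels:
        raise ValueError(f"unknown canal group: {value!r}")
    return [1 if value == level else 0 for level in levels]


def encode_features(rec):
    """Numeric feature vector for one record (field names are hypothetical)."""
    return [
        rec["age_z"],                      # z-scored age
        1 if rec["gender"] == "F" else 0,  # binary
        int(rec["diabetes"]),              # binary
        int(rec["neck_problems"]),         # binary
        int(rec["hearing_problems"]),      # binary
        rec["n_comorbidities"],            # integer count
        *one_hot(rec["canal"]),            # four-level categorical
    ]


record = {"age_z": 0.42, "gender": "F", "diabetes": False, "neck_problems": True,
          "hearing_problems": False, "n_comorbidities": 2, "canal": "lateral",
          "n_maneuvers": 3}
print(encode_features(record), encode_target(record["n_maneuvers"]))
# [0.42, 1, 0, 1, 0, 2, 0, 1, 0, 0] 1
```

One-hot columns avoid implying an order among canal groups, which an integer label (0 to 3) would suggest to distance- and coefficient-based models such as KNN and LR.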
Data analysis and machine learning model development were performed using Python version 3.11. The models were trained and hyperparameter optimization was conducted using libraries such as Scikit-learn, XGBoost, LightGBM, CatBoost, and Optuna.
Model performance was assessed with 5-fold cross-validation: the dataset was randomly partitioned into five folds, and in each iteration one fold (20%) served as the test set while the remaining four folds (80%) were used for training. Predictions in each fold were compared against the ground truth and quantified using the area under the Receiver Operating Characteristic (ROC) curve (AUC). The models were further evaluated using precision, recall, F1-score, accuracy, and balanced accuracy.
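For reference, the evaluation metrics listed above can be computed from a confusion matrix and from predicted scores as in this plain-Python sketch. It treats class 1 (multiple maneuvers) as the positive class and uses the rank-based definition of AUC; the labels, predictions, and scores are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) with class 1 (multiple maneuvers) as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def threshold_metrics(y_true, y_pred):
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0          # sensitivity
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "accuracy": (tp + tn) / len(y_true),
        "balanced_accuracy": (recall + specificity) / 2,
    }


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random negative (ties = 0.5)."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1, 0, 1]
print(confusion_counts(y_true, y_pred))              # (4, 2, 1, 1)
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

Balanced accuracy averages recall over both classes, so unlike plain accuracy it is not inflated by simply favoring the majority class.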
Class imbalance
In this study, the target classes had a ratio of 332/188 (multiple vs. single maneuver), a moderate class imbalance. Class imbalance occurs when one class substantially outnumbers another in a dataset. It can produce biased models that perform well on the majority class but poorly on the minority class, reducing overall model effectiveness. To address this, we applied class weighting during training through the models’ “class_weight” or “scale_pos_weight” parameters.
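As a worked example, the weights implied by the class counts reported in this study can be computed directly. The sketch assumes scikit-learn's `class_weight="balanced"` rule, n_samples / (n_classes × n_c), and the negatives/positives ratio that XGBoost's documentation suggests as a starting value for `scale_pos_weight`; the values actually used in the study are not stated here.

```python
from collections import Counter


def balanced_class_weights(y):
    """n_samples / (n_classes * n_c): the rule behind scikit-learn's class_weight="balanced"."""
    counts = Counter(y)
    n, k = len(y), len(counts)
    return {c: n / (k * cnt) for c, cnt in sorted(counts.items())}


def scale_pos_weight(y):
    """Negatives / positives: the starting value XGBoost's documentation suggests."""
    negatives = sum(1 for v in y if v == 0)
    positives = sum(1 for v in y if v == 1)
    return negatives / positives


y = [0] * 188 + [1] * 332  # class counts reported in this study
weights = balanced_class_weights(y)
print({c: round(w, 3) for c, w in weights.items()})  # {0: 1.383, 1: 0.783}
print(round(scale_pos_weight(y), 3))                 # 0.566
```

Because the positive class (1, multiple maneuvers) is the majority here, the suggested `scale_pos_weight` is below 1, which down-weights positives rather than up-weighting them; with balanced weights, each class contributes the same total weight (260) to the loss.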
Participant demographics
The demographic information of patients.
Classification analysis
Model hyperparameters.
Machine learning model results.
KNN: K-nearest neighbors, RF: random forest, GBM: Gradient Boosting Machine, DT: decision tree, XGB: XGBoost (extreme gradient boosting), LR: logistic regression, CAT: CatBoost, LGBM: Light Gradient Boosting Machine, SVC: support vector classifier.

Area under the curve.

Confusion matrices of ML models. Confusion matrices illustrating each model’s classification performance (rows: actual class; columns: predicted class). The “0” class corresponds to patients requiring only one maneuver, whereas the “1” class indicates multiple maneuvers were needed. Diagonal entries represent correct classifications (true negatives and true positives), while off-diagonal cells reflect misclassifications (false positives and false negatives). Higher counts along the diagonal suggest more accurate predictions.
Feature importance of models
Feature importance plots were generated for the nine machine learning models to evaluate how strongly each feature contributes to predicting the target variable (Figure 3). Age appears among the most important features in several models, including KNN, RF, GBM, XGB, LR, CAT, and LGBM, and its positive values in most models indicate that older age increases the predicted likelihood of requiring multiple maneuvers. Hearing loss also contributes prominently in KNN, DT, GBM, XGB, and LGBM, consistently in the positive direction, indicating that its presence increases the predicted likelihood. Neck problems are highly important in RF, GBM, XGB, CAT, and LGBM, particularly in the tree-based ensembles, and are positively associated with the target variable. Diabetes mellitus is important across KNN, RF, GBM, DT, XGB, LR, CAT, and LGBM; its impact varies slightly between models but is generally positive. The affected canal (SSC) is important in KNN, RF, GBM, DT, XGB, and CAT, generally with a positive contribution. Gender is less consistently important but appears in KNN, RF, DT, GBM, XGB, and CAT, with mixed positive and negative contributions across models. The number of comorbidities shows variable importance, contributing noticeably in KNN, RF, GBM, DT, and CAT, with a mixed but mostly positive effect.
Hypertension is an important feature in KNN, RF, GBM, DT, XGB, LR, CAT, and LGBM, and its positive contribution indicates that hypertension is positively associated with the target variable.
Feature importance of models. Feature importance plots derived from each model, ranking predictors by their relative contribution to classification decisions. Larger bars indicate stronger influence on the model’s output. In panels where bars extend to the left or right, negative and positive contributions are distinguished accordingly (i.e., features on the left exert a suppressive effect, while those on the right enhance the predicted outcome). Clinical variables such as age, comorbidities, and BPPV subtype can thus be compared to determine which factors most substantially drive the prediction of maneuver requirements.
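The importance values above are model-specific. As a model-agnostic point of comparison, permutation importance measures how much a performance metric drops when one feature's values are shuffled. The sketch below is not necessarily how Figure 3 was produced, and it yields unsigned rather than signed contributions; the predictor and data are hypothetical.

```python
import random


def permutation_importance(predict, X, y, metric, n_repeats=10, seed=0):
    """Mean drop in `metric` after shuffling each feature column (bigger drop = more important)."""
    rng = random.Random(seed)
    baseline = metric(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the outcome
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - metric(y, [predict(row) for row in X_perm]))
        importances.append(sum(drops) / n_repeats)
    return importances


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def predict(row):
    """Hypothetical fitted model that uses only feature 0 (e.g., standardized age)."""
    return 1 if row[0] > 0 else 0


X = [[-1.5, 1], [-1.0, 0], [-0.6, 1], [-0.3, 0], [-0.1, 1],
     [0.2, 0], [0.5, 1], [0.8, 0], [1.1, 1], [1.6, 0]]
y = [predict(row) for row in X]
importances = permutation_importance(predict, X, y, accuracy)
print(importances[1])  # 0.0, because the model ignores feature 1
```

A feature the model never uses scores exactly zero, while shuffling a feature the model relies on lowers accuracy and therefore yields a positive importance.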
Discussion
In clinics that evaluate dizziness and related disorders, BPPV is the most frequently encountered condition. 23 Characterizing BPPV is important in terms of time, workforce, patient satisfaction, and financial burden. 24 Although the factors influencing the number of maneuvers in BPPV have been studied extensively, we found no studies that used machine learning for this purpose. This study therefore modeled the demographic factors and comorbidities affecting the number of maneuvers required for BPPV treatment. Nine models were developed to distinguish BPPV cases that resolved with one maneuver from those requiring multiple maneuvers. Their AUC values ranged from 0.71 to 0.79, indicating moderate-to-good discriminative ability.
BPPV is an age-related condition that is most commonly observed in individuals between 40 and 60 years of age.7,25 Studies have shown that it is approximately 2 to 3.2 times more prevalent in women than in men.7,25 The response rate to a single-session canalith repositioning maneuver varies between 37% and 87%. 7 Success rates improve further with repeated maneuvers. In our study, the mean age of the participants was 56.2 ± 14.0 years. Females comprised 75% of the participants, and the success rate of the single-session treatment was 36%. These findings are consistent with previously reported data in the literature.
The Gradient Boosting Machine (GBM) achieved the highest AUC (0.788), suggesting that it was the best of the tested models at distinguishing patients who recover with a single maneuver from those who require multiple maneuvers. AUC is a robust measure for imbalanced datasets because it evaluates how well a model separates the classes across all decision thresholds, 26 which makes GBM the top performer when the goal is overall discrimination. This matters in clinical scenarios where both false positives (predicting multiple maneuvers when only one is required) and false negatives (predicting a single maneuver when multiple are needed) should be minimized. Given its AUC, GBM can be considered the most reliable of the tested models for predicting maneuver requirements in patients with BPPV.
However, for precision, the proportion of patients predicted to need multiple maneuvers who actually needed them (i.e., the avoidance of false positives), Logistic Regression (LR) achieved the best score (0.821). High precision is important in clinical decision-making because overestimating the need for multiple maneuvers could lead to unnecessary treatment, increasing patient burden and healthcare costs. 27 LR may therefore be preferable when the primary goal is to avoid subjecting patients to unnecessary additional treatment.
XGBoost (XGB) achieved the best recall (0.879) and F1-score (0.809). Its high recall means that it identified the largest share of patients who actually required multiple maneuvers, minimizing false negatives. This is important in clinical settings, where missing a patient who needs additional treatment could prolong symptoms and delay recovery. 27 Its high F1-score, the harmonic mean of precision and recall, shows that it balances the two well, making XGB a strong choice when both over-treatment and missed treatment must be minimized.
Several previous studies have investigated the factors influencing the need for multiple maneuvers.6–8 Korkmaz et al. 7 reported that age, gender, canal type, and symptom duration did not affect the number of maneuvers required for treatment. However, the authors stated that hypertension increased the number of maneuvers needed for successful treatment. Hypertension is a significant vascular condition that can reduce the perfusion of the vestibular organ. Ischemia may lead to more extensive otolithic debris formation than usual. Therefore, hypertension accompanying BPPV may contribute to the need for multiple maneuvers to reposition the otolithic particles effectively. In their study, Moreno et al. 6 reported that age and gender did not influence the number of maneuvers required for successful treatment, but the etiology of BPPV did. The authors stated that BPPV cases resulting from head trauma required more maneuvers compared to idiopathic BPPV cases. Another study 8 found that gender and age did not affect the number of maneuvers needed; however, lateral canal BPPV and bilateral BPPV were associated with an increased requirement for multiple maneuvers. The authors attributed this to the lower efficacy of repositioning maneuvers for lateral canal BPPV and the greater difficulty in resolving bilateral BPPV cases.
Although the factors determining the number of maneuvers varied across models, the most notable finding was that hypertension was associated with more maneuvers in eight models (KNN, RF, GBM, DT, XGB, LR, CAT, and LGBM). Age was the next most consistent factor: in seven models (KNN, RF, GBM, XGB, LR, CAT, and LGBM), older age was associated with more maneuvers, that is, younger patients required fewer maneuvers. These predictions are consistent with the literature.28–30 Conversely, gender had the least impact on the number of maneuvers across all models. Despite a 3:1 female-to-male ratio in this study, gender did not substantially affect the number of maneuvers required, which is an intriguing finding; similar female-to-male ratios have been reported in other studies. Osteoporosis and age-related hormonal fluctuations in women are also known to affect the course of BPPV treatment. 31 We believe this discrepancy with the literature stems from the non-homogeneous age distribution of our sample, and in future studies we plan to model outcomes by age group.
Across the models, the absence of hearing loss, neck problems, and hypertension was associated with fewer maneuvers. Hearing loss was a notable factor: although many previous studies found no association between hearing loss and BPPV, a 2023 meta-analysis 32 discussed a relationship between hearing loss and BPPV in patients with early-onset osteoporosis before the age of 65.
Cervical problems are common in elderly patients. 7 Patients with cervical disorders have a limited range of neck flexion and extension, which makes repositioning maneuvers challenging to perform. In most cases, the maneuvers are either modified or performed with a soft cervical collar for safety. Given these challenges, Tan et al. 29 recommended TRV chairs as the first choice for patients with BPPV, noting that they improve treatment efficacy and shorten treatment sessions. Consistent with this, our study showed that cervical problems are a significant risk factor for multiple maneuvers. We also attribute the influence of age on the number of maneuvers required to the ease of performing the procedure: in younger patients, head and neck positioning can be adjusted more easily, increasing the success rate of repositioning maneuvers.
Although our findings show promise, several factors limit the broader applicability of this study. First, data collection from a single clinical center restricts external validity, as populations in different regions or healthcare settings may present with different demographic and clinical profiles. Second, the limited sample size may reduce the model’s capacity to detect less common BPPV subtypes and subtle differences in patient responses. Third, the class imbalance between patients requiring one maneuver and those requiring multiple maneuvers can skew performance metrics, even when techniques such as class weighting are applied. Fourth, our definition of the outcome (i.e., “multiple maneuvers”) may not fully reflect real-world complexity, where factors such as patient adherence, comorbid conditions, and clinical judgment often influence the course of treatment. Lastly, algorithmic stochasticity introduces variability in machine learning models, as random initialization and sampling can produce slightly different results without rigorous seeding or repeated runs. Future research should address these limitations by recruiting larger, more diverse populations, refining endpoint definitions, and conducting thorough validation analyses to ensure robustness and generalizability.
The feature importance analysis across multiple models offers several insights into the predictors of the target variable. Age, hearing loss, neck problems, and diabetes mellitus emerge as consistently important features, suggesting a central role in determining the number of maneuvers. This aligns with the literature, in which demographic factors and specific health conditions are key inputs to medical prediction and health assessment.29,33,34 Tree-based models (Random Forest, Gradient Boosting Machine, XGBoost, CatBoost, and LightGBM) particularly highlight neck problems, the affected canal, and diabetes mellitus, and the Logistic Regression model also emphasizes these features, indicating that their importance is robust across modeling techniques. Gender shows variable importance and direction of contribution, possibly reflecting interactions with other features or differential effects across the population. The number of comorbidities and hypertension also vary in importance between models, which could reflect different effects in different subgroups of the data. Overall, this multi-model analysis provides a framework for identifying key predictors and understanding their contributions, and it supports personalized approaches to predicting treatment needs. Future research could examine the interactions between these features and their combined effect on predictive accuracy.
This study highlights the importance of considering various demographic and comorbid factors in the treatment of BPPV. Machine learning models with high predictive accuracy can assist in identifying patients who may require multiple maneuvers, thereby optimizing treatment plans and improving patient outcomes. Our findings indicate that, overall, the GBM model is the most effective algorithm for determining the number of maneuvers required for BPPV treatment. Using this algorithm, machine learning-based software can predict the necessary number of maneuvers for successful BPPV treatment (Supplemental 1: https://bppvclassifier.streamlit.app/). Future research should focus on age-specific modeling and include additional factors such as osteoporosis to enhance the robustness of predictions.
Supplemental
As part of our findings, machine learning-based software has been developed to predict the number of maneuvers required for successful BPPV treatment: https://bppvclassifier.streamlit.app/
Footnotes
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article. The authors certify that they have no affiliations with or involvement in any organization or entity with any financial interest (such as honoraria; educational grants; participation in speakers’ bureaus; membership, employment, consultancies, stock ownership, or equity interest; and expert testimony or patent-licensing arrangements), or non-financial interest (such as personal or professional relationships, affiliations, knowledge, or beliefs) in the subject matter or materials discussed in this manuscript.
