Abstract
BACKGROUND:
Periodontitis (PD), a form of gum disease, is a major public health concern as it is globally prevalent and harms both individual quality of life and economic productivity. Global cost in lost productivity is estimated at US$54 billion annually. Moreover, current PD assessment applies only after the damage has already occurred.
OBJECTIVE:
This study proposes and tests a new PD risk assessment model applicable at point-of-care, using supervised machine learning methods.
METHODS:
We compare the performance of five algorithms using retrospective clinical data: Naïve Bayes (NB), Logistic Regression (LR), Support Vector Machine (SVM), Artificial Neural Network (ANN), and Decision Tree (DT).
RESULTS:
DT and ANN demonstrated higher accuracy in classifying the patients with high or low PD risk as compared to NB, LR and SVM. The resultant model with DT showed a sensitivity of 87.08% (95% CI 84.12% to 89.76%) and specificity of 93.5% (95% CI 91% to 95.49%).
CONCLUSIONS:
A predictive model with high sensitivity and specificity to stratify individuals into low and high PD risk tiers was developed. Validation in other populations will inform the translational value of this approach and its potential applicability as a clinical decision support tool.
Introduction
Similar to other chronic diseases, periodontitis (PD) progresses insidiously, with subtle symptomatology that becomes apparent only after much damage has been done to the underlying bone [1, 2, 3]. As the severity of PD increases, the underlying alveolar bone and associated periodontal ligaments are damaged rapidly [1]. The 2018 report from the Centers for Disease Control and Prevention (CDC) estimated that 42% of the U.S. population more than 30 years of age and 66% of the population more than 65 years of age have some form of PD [4]. Notably, PD also represents a potentially modifiable risk factor: if it is treated in its early stages, the onset and progression of some systemic diseases, including diabetes, may be reduced [5, 6, 7, 8, 9, 10, 11, 12]. A study reported that a periodontal intervention in individuals who were recently diagnosed with Type 2 diabetes (T2DM) reduced overall healthcare costs by $1,799 over two years [14]. Hence, prevention, early recognition, and intervention are projected not only to decrease bone damage and tooth loss but also to reduce exacerbation of some systemic diseases driven by inflammatory mechanisms that are shared across many common chronic diseases.
Due to the complex relationship between oral and systemic diseases, multiple factors need to be considered simultaneously to effectively evaluate oral or systemic health risk. The current evidence base suggests that modeling medical variables, such as blood glucose levels, duration of T2DM, and severity of T2DM, along with dental parameters, such as tooth loss, periodontal pocket depth, gingival bleeding, and radiographic bone loss, has utility in assessing PD risk in dental settings [15, 16, 17, 18, 19]. However, interactions among multiple candidate risk factors for PD make risk assessment at point of care (POC) difficult for medical providers. Moreover, primary care providers (PCPs) currently lack the training and the analytical tools needed in the medical setting to inform medical decision-making and care delivery at POC [5]. Hence, effective approaches for mining, processing, and evaluating PD risk based on historical clinical and demographic data captured during the patient visit in medical and dental records are needed to develop models that process these data and create algorithms that project relative risk for disease. Demonstrating that predictive accuracy generalizes would support the creation of clinical decision support tools for translation into clinical practice.
Machine learning (ML) approaches have previously been applied to predict and diagnose PD and PD risk in dental settings with electronic capture of dental variables [17, 19]. However, to date, there are no models that assess PD risk in an interdisciplinary setting. Few studies have applied ML to model PD risk. Two earlier studies attempted PD prediction using ML algorithms, and both were undertaken in dental settings. A study by Shankarapillai et al. assessed PD risk by comparing the Levenberg-Marquardt algorithm (LMA) and the Scaled Conjugate Gradient (SCG) algorithm [19]. The study utilized data from 230 patients and 16 variables, including history of diabetes and hypertension as medical variables, and scored PD risk on a scale of 1 to 5. The study showed that LMA (a variant of backpropagation with an ANN) outperformed the SCG algorithm. A second study by Ozden et al. attempted to diagnose periodontal diseases using SVM, DT, and ANN algorithms, predicting classification across six periodontal conditions [17]. The study used data from 150 patients, split into 100 training and 50 test records, with 11 dental features. The overall accuracy achieved by SVM and DT was 98%; however, the study did not model medical parameters.
Nagarajan et al. further demonstrated the capacity to apply bioinformatics classification algorithms, including linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), Naïve Bayes (NB), and Support Vector Machine (SVM) analysis, to explore risk profiling of patients with gingivitis and PD based on salivary analysis of molecular marker expression across the two conditions [20]. In a separate study, these investigators further applied LDA, SVM, and NB classification algorithms to predict individuals most at risk for disease progression among patients exhibiting a range of PD severity and treatment modalities based on levels of 27 biomarkers monitored in serum, saliva, and gingival plaque. Thus, application of classifier algorithms demonstrates high utility in analysis of PD, a condition characterized by heterogeneous etiological origin and complex host response where prediction of outcome is singularly challenging [21]. The objective of this study was to create a PD risk assessment prototype through assessment of relative risk contributed by historical clinical, demographic, behavioral exposures, and other relevant variables captured in an integrated electronic medical and dental record (iEHR). Availability of such a tool would allow early detection, increase opportunities for improving oral health status, and reduce the incidence of underlying infectious and inflammatory processes contributing to other systemic diseases. A further goal of the study was to identify and optimize the most informative subset of clinically available attributes for PD risk prediction through the application of feature selection methods. This study is among the first to create a PD risk assessment model using a machine learning approach that combines assessment of both relevant clinical data and dental data captured in an iEHR, with applicability in a medical, as well as dental, setting.
Methods
Data retrieval
The Institutional Review Board of the healthcare organization reviewed and approved the study. To achieve the objective of the study, retrospective data were extracted from the large healthcare organization’s enterprise data warehouse (EDW) for patients who visited dental and medical clinics between 2010 and 2016. Candidate factors previously associated with PD are catalogued in Table 1 and were specified for collection and analysis.
List of all data features included in the prediction model development
A cohort of 11,048 patients (4,766 positive cases and 6,282 controls) with no missing data was randomly selected, with enrollment limited to Non-Hispanic/Latino ethnicity and White/Caucasian race since other races and ethnicities were underrepresented. The goal of predicting PD was treated as a classification problem, with patients who have moderate or severe PD being classified as ‘high risk PD candidates’ (positive cases) and those with mild or no PD as ‘low risk PD candidates’ (controls). We used the Centers for Disease Control and Prevention (CDC)-American Academy of Periodontology (AAP) definition to define the severity of PD [6]. Frequency distribution analysis was first performed to summarize the data in histogram formats illustrating relative distributions. To optimize the accuracy of the model, outliers were removed from the dataset. Categorical discretization was done for some of the variables, including high-density lipoprotein (HDL), low-density lipoprotein (LDL), total cholesterol, and triglycerides, according to the Adult Treatment Panel III classification [22]. Performance estimation was conducted using a stratified 10-fold cross validation approach. Table 1 summarizes the list of all data features included in the prediction model.
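The discretization step can be illustrated with a minimal sketch. It is not the preprocessing used in the study: the cut points are the published Adult Treatment Panel III thresholds in mg/dL, while the function name, the category labels, and the "intermediate" label for HDL values between the low and high thresholds are assumptions introduced here for illustration.

```python
from bisect import bisect_right

# ATP III cut points in mg/dL. A value below the first cut point falls in the
# first category; a value at or above the last cut point falls in the last.
# The label strings are hypothetical names chosen for this sketch.
ATP3_BINS = {
    "total_cholesterol": ([200, 240],
                          ["desirable", "borderline_high", "high"]),
    "ldl": ([100, 130, 160, 190],
            ["optimal", "near_optimal", "borderline_high", "high", "very_high"]),
    "hdl": ([40, 60],
            ["low", "intermediate", "high"]),
    "triglycerides": ([150, 200, 500],
                      ["normal", "borderline_high", "high", "very_high"]),
}


def discretize(measure, value_mg_dl):
    """Map a continuous lipid value to its ATP III category label."""
    cuts, labels = ATP3_BINS[measure]
    # bisect_right counts how many cut points are <= the value, which is
    # exactly the index of the category the value belongs to.
    return labels[bisect_right(cuts, value_mg_dl)]


if __name__ == "__main__":
    print(discretize("ldl", 145))            # borderline_high
    print(discretize("triglycerides", 120))  # normal
```

Keeping the thresholds in a single table makes it easy to audit them against the guideline and to swap in a different binning scheme if a site uses other reference ranges.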
Machine learning algorithms
To test the generalizability of supervised ML in classifying patients for PD risk, the study focused on five well-known methods: Naïve Bayes (NB), Logistic Regression (LR), Artificial Neural Network (ANN), Support Vector Machine (SVM) and Decision Tree (DT). The study utilized the implementation of these ML algorithms available in the Waikato Environment for Knowledge Analysis (WEKA) open-source tool [23].
To identify a representative subset of attributes, a univariate filter, i.e., information gain with the ranker method was employed [24]. To validate the predictive performance of the resultant models, a subset of 10% (1,104/11,048) of the total data set was kept aside.
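As an illustration of univariate filtering by information gain, the sketch below scores each discrete attribute by how much it reduces the entropy of the class label and then ranks the attributes. It is a plain-Python approximation of the idea, not the WEKA implementation used in the study; the function names, feature names, and toy values are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus their entropy conditioned on the feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_features(columns, labels):
    """Return (feature name, gain) pairs sorted from most to least informative."""
    gains = {name: information_gain(vals, labels) for name, vals in columns.items()}
    return sorted(gains.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy data for illustration only; these are not study values.
    labels = ["high", "high", "low", "low"]
    columns = {
        "calculus_present": ["yes", "yes", "no", "no"],  # separates the classes
        "record_parity": ["even", "odd", "even", "odd"],  # carries no signal
    }
    for name, gain in rank_features(columns, labels):
        print(f"{name}: {gain:.3f}")
```

Attributes whose gain is zero carry no information about the class under this criterion, so a filter of this kind can drop them before model training.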
Performance measures
To assess the prediction model performance of different algorithms, the study compared ML algorithms using the following performance measures:
The area under the ROC curve (AUC) as defined by Hand and Till for binary classification [25].
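In the following definitions, TP and TN denote the numbers of correctly classified ‘positive cases’ and ‘controls’ instances, FP denotes the number of ‘controls’ instances classified as ‘positive cases’, and FN denotes the number of ‘positive cases’ instances classified as ‘controls’.
\[ \text{Sensitivity} = \frac{TP}{TP + FN} \]
where Sensitivity, also termed recall, is the ratio of the number of correctly classified ‘positive cases’ instances to the total number of ‘positive cases’ instances.
\[ \text{Precision} = \frac{TP}{TP + FP} \]
where Precision is the ratio of the number of correctly classified ‘positive cases’ instances to the total number of instances that are classified as ‘positive cases’.
\[ \text{Specificity} = \frac{TN}{TN + FP} \]
where Specificity is the ratio of the number of correctly classified ‘controls’ instances to the total number of ‘controls’ instances.
\[ \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \]
where Accuracy is the ratio of the number of correctly classified instances to the total number of instances.
\[ \text{F-measure} = \frac{2 \cdot \text{Precision} \cdot \text{Sensitivity}}{\text{Precision} + \text{Sensitivity}} \]
where F-measure is the harmonic mean of precision and recall.
Matthew’s Correlation Coefficient (MCC) considers the accuracy and error rates of high PD risk and low PD risk and is calculated by the following equation:
\[ \text{MCC} = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}} \]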
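The measures listed above can all be derived from the four cells of a binary confusion matrix. The sketch below assumes the standard definitions, with TP and TN as correctly classified positive cases and controls, and FP and FN as the corresponding misclassifications. The function name and the example counts are hypothetical and are not study results.

```python
from math import sqrt


def classification_metrics(tp, fp, tn, fn):
    """Compute the performance measures from confusion-matrix counts.

    Assumes both classes are present and at least one instance is predicted
    positive, so that none of the ratio denominators is zero.
    """
    sensitivity = tp / (tp + fn)   # recall: share of positive cases found
    specificity = tn / (tn + fp)   # share of controls correctly identified
    precision = tp / (tp + fp)     # share of predicted positives that are right
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "f_measure": f_measure,
        "mcc": mcc,
    }


if __name__ == "__main__":
    # Illustrative counts only.
    for name, value in classification_metrics(tp=80, fp=10, tn=90, fn=20).items():
        print(f"{name}: {value:.4f}")
```

Unlike accuracy, MCC uses all four cells symmetrically, which makes it informative when the two classes differ in size, as they do in a cohort with unequal numbers of positive cases and controls.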
A paired t-test was performed to determine whether there were any statistically significant differences in performance among the algorithms.
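A paired t-test of this kind can be sketched as follows, assuming each algorithm yields one score per cross-validation fold and the two algorithms are evaluated on the same folds. The function name and the fold scores are hypothetical placeholders, not values from the study.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(scores_a, scores_b):
    """Return the paired t statistic and degrees of freedom for fold scores.

    Both lists must hold scores from the same folds in the same order, and
    the per-fold differences must not all be identical.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("both algorithms need one score per fold")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    # t = mean difference divided by its standard error (sample stdev / sqrt(n)).
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1


if __name__ == "__main__":
    # Placeholder F-measure values for five folds of two classifiers.
    classifier_a = [0.90, 0.91, 0.89, 0.92, 0.90]
    classifier_b = [0.85, 0.88, 0.86, 0.87, 0.86]
    t, df = paired_t_statistic(classifier_a, classifier_b)
    print(f"t = {t:.3f} with {df} degrees of freedom")
```

With 10 folds the statistic has 9 degrees of freedom, so an absolute value above roughly 2.262 corresponds to a two-sided p-value below 0.05. Because the training sets of different folds overlap, the fold scores are not independent, and the nominal significance level should be read with that caveat in mind.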
Patient characteristics for dataset excluding the evaluation dataset
The overall mean age of the patients was 47.36 years (
Feature selection
The application of information gain with the ranker method to the dataset of 9,944 patients generated a representative subset of 71 variables from 190 variables as shown in Fig. 1.
The results of feature selection.
Machine learning: The results of the performance estimated through 10-fold cross validation are shown in Table 2. The ROC curves are shown in Fig. 2.
The results show that DT significantly outperformed the other algorithms in terms of sensitivity, specificity, AUC, F-measure, MCC, and precision (
DT and ANN demonstrated higher accuracy in classifying the patients with high or low PD risk as compared to NB, LR, and SVM. The resultant model with the DT algorithm classified
In this study, the specificity and precision for DT achieved 90.20% [95% CI (89.20–91.10)] and 90.20% [95% CI (90.10–90.20)], respectively. The rates of precision in this study were slightly lower than those reported by Shankarapillai et al. [19], where the precision was 98% for DT in diagnosing PD. However, the rates of specificity in the current study were higher than those reported by Shankarapillai et al. (specificity
Results of classifiers after feature selection
ROC curves for the algorithms.
This study demonstrated successful creation of algorithms with high capacity to classify PD risk. Using univariate filtering, the study selected the subset of features with non-zero information gain and recognized that random blood glucose, dental calculus, number of missing teeth, lipid panels (including triglyceride levels and HDL), diastolic blood pressure, BMI, oral hygiene status determined by the dental provider, frequency of tooth brushing, diabetes status, tobacco use status, age, gender, and PPD displayed the highest performance.
Overall, the results of performance assessment of algorithms created to classify relative risk for PD showed that DT demonstrated higher performance for disease risk classification than the other ML algorithms. Notably, the F-measure, which represents a harmonic mean of precision and recall, has a greater importance in evaluation as it evaluates the relationships between high PD risk instances within the data and those given by the classifiers. Despite the association of high type I error with cross-validated t-tests, evidence suggests that cross-validated t-tests are powerful in determining whether a learning algorithm outperforms another on a particular learning task [28]. To determine the real difference between algorithms (seen in type II errors), this study statistically analyzed the results of the algorithms by conducting a 10-fold cross validated t-test on F-measure. The results suggest that for this type of data, ANN or DT can be used for modelling. DT, however, offers the additional advantage of being interpretable: the learned decision rules can be reviewed, which may be important in gaining the trust of clinical providers.
Selection of data variables led to the novel observation that measurement of probing depths at interproximal tooth surfaces significantly outperformed measures taken at the buccal and lingual surfaces. The study posits that the apparent superiority of interproximal measurement may be attributable to the interproximal bone, which is more coronal in position than the labial or lingual/palatal bone. A slight deepening of the pocket in the interproximal areas could more easily impact the bone. This observation adds to the current standard of care with respect to commonly applied indices used in clinical dentistry (e.g., Russell’s periodontal index, Loe and Silness gingival index) and is also consistent with the definition proposed by the AAP for periodontal diseases, where interproximal surfaces are assessed to determine the severity of PD [29, 30]. The progression of PD involves furcation areas of the multi-rooted teeth in the maxillary (upper jaw) and mandibular (lower jaw) arches. Tooth surfaces, such as the mesiolingual, mesiobuccal, distobuccal, and distolingual surfaces of maxillary and mandibular molars, were identified as significant determinants during feature selection. Calculating clinical attachment loss requires locating the cemento-enamel junction (CEJ). Interestingly, the sites identified by feature selection are also used as reference lines for determining relative clinical attachment loss (RCAL) when it is difficult to locate the CEJ [31]. Moreover, these sites were also consistent with outcomes of a study that investigated the deepest crevice points in the mouth to provide practitioners with a minimum number of sites to probe [32].
Clinical implications of the model
In the present study, random blood glucose (RBG) level was identified as a significant factor contributing to PD risk. This observation further reinforces the association between PD and T2DM and would have considerable relevance in a clinical setting.
Notably, Medicare and Medicaid status were incorporated into the model to explore the relationship between insurance status and PD risk and were retained as significant variables. These findings suggest that there may be a correlation between Medicare/Medicaid status and PD risk, especially in aging populations, and attest to the value of machine learning in discovering such relational patterns.
The study has some limitations. The study data was collected at a single healthcare system. Notably, the study population of this large healthcare organization’s service area extends across largely rural communities in northern, western, and central Wisconsin counties [33]. Residents of this area disproportionately exceed the State’s average population statistics for persons considered to be in lower socio-economic strata and also for individuals
Clinical relevance and future directions: potential translational value of the risk classification algorithms and their applicability to the future development of clinical decision support tools.
Practice patterns for dental and medical healthcare delivery, their respective individual reimbursement systems, and the current state of dental and medical academic practice continue to reflect the siloing of medical and dental healthcare delivery models [13, 33, 34]. The increasing scientific evidence supporting oral and systemic disease associations has cast primary care providers (PCPs) as proactive participants in establishing oral and systemic care for patients with chronic diseases, including PD [5]. Establishing interdisciplinary care for improving healthcare practice and expanding access to preventive oral health care through PCPs has been proposed by the National Academy of Medicine and others [34]. Notably, evidence supports that patients visit their PCPs more frequently than their dental providers [35]. Based on the 2017 report of the National Center for Health Statistics, nearly 85% of the adult U.S. population contacted a medical health care professional in the past year [36]. By comparison, about 64% of the adult U.S. population had a dental visit in 2016 [37]. This paradigm shift in delivering integrated care has necessitated the incorporation of oral-health screenings at point-of-care in a medical setting [5]. Using a risk-based approach, PCPs could assess the risk of developing a future disease based on evidence gathered from current and documented retrospective clinical data of patients [9, 16]. According to the Agency for Healthcare Research and Quality report, clinical decision support tools (CDST) can potentially lower costs, improve efficiency, and reduce patient inconvenience [38]. The growing recognition of machine learning (ML) approaches using data in clinical settings is considered a significant opportunity, not only to improve patient care but also to support quality of care initiatives [39, 40].
A better appreciation of the systemic effects and well-known periodontal risk factors, along with behavioral factors, has shifted the focus toward the view that the combined risk contributed by multiple individual factors provides better predictive power than any single risk factor.
Findings from this study suggest that ML methods would be effective when applied to improving patient care through early detection of PD or to new preventive approaches to PD by assisting healthcare professionals to evaluate patients’ PD risk based on a combination of historical data and current status. As many factors might potentially affect individual variability in developing PD risk of a patient, this study considered a wide variety of predictive factors, including oral factors, such as dental calculus and number of teeth present, which are also evaluable in a medical setting. The study analyzed multiple widely available ML approaches to identify those with highest potential for translation into clinical care to assist healthcare providers in making effective and knowledge-driven decisions. Future steps include incorporation of such models into the iEHR and validating model performance in a clinical setting.
Conclusions
To the best of our knowledge, this study is the first to apply ML algorithms in the context of PD risk assessment in an interdisciplinary setting, comparing five predictive ML algorithms (NB, LR, SVM, ANN, and DT) and evaluating their sensitivity and specificity for stratifying PD risk. Evaluation of the performance of these algorithms in other populations is essential to demonstrate their generalizability and relevance and to determine their potential translational value and utility as clinical decision support tools in the medical setting.
Acknowledgments
The authors would like to acknowledge Marshfield Clinic Research Institute and University of Wisconsin-Milwaukee. The authors would furthermore like to thank Dr. Ingrid Glurich for reviewing the initial version of the manuscript. Funding was provided by Marshfield Clinic Research Institute.
Conflict of interest
The authors declare no real or potential conflicts of interest.
