Abstract
Background
Poorly regulated and insufficiently maintained medical devices (MDs) carry a high risk of safety and performance failures, which affects the clinical effectiveness and efficiency of patient diagnosis and treatment. In 2017, after the MD Directive (MDD) had been in force for 25 years, the new MD Regulation (MDR) was introduced. One of its more stringent requirements is better control of MD safety and performance through post-market surveillance mechanisms.
Objective
To address this, we have developed an automated system for the management of MDs, based on their safety and performance measurement parameters, that uses machine learning algorithms at its core.
Methods
In total, 1997 samples were collected during defibrillator inspections performed by an ISO 17020 accredited laboratory at various healthcare institutions in Bosnia and Herzegovina. This paper presents a solution developed for defibrillators, but the proposed system is scalable to any other type of MD, both diagnostic and therapeutic.
Results
Various machine learning algorithms were considered, including Decision Tree (DT), Random Forest (RF), Naïve Bayes (NB) and Logistic Regression (LR). In addition, the Random Forest Regressor and XGBoost Regressor were tested for their ability to predict defibrillator output error. These algorithms were selected because of their ability to handle large datasets and their potential for achieving high prediction accuracy. The highest classification accuracy achieved on this dataset was 94.8%, using the Naïve Bayes algorithm. The XGBoost Regressor, with an R² of 0.99, emerged as a powerful tool, showcasing exceptional predictive accuracy and the ability to capture a substantial portion of the dataset's variability.
Conclusion
The results of this study demonstrate that clinical engineering (CE) and health technology management (HTM) departments in healthcare institutions can benefit from the proposed automation of defibrillator maintenance scheduling, through increased patient safety and quality of treatment on one side, and cost optimization in MD management departments on the other.
Keywords
Introduction
Medical staff are nowadays more confident when performing diagnosis and treatment owing to the sophistication of medical devices (MDs), which enables better data analysis and control over diagnosis and treatment. Despite stringent international regulations governing the life cycle of these devices, malfunctions persist, leading to reported incidents of patient injuries and fatalities.1–8 Incidents involving patient injury or death caused by medical devices are reported every year by users, healthcare professionals and manufacturers. Among the world's most prominent databases containing these data are the FDA Manufacturer and User Facility Device Experience (MAUDE) database 9 and the European Database on Medical Devices (EUDAMED). 10
The number of these incidents is alarming and suggests that medical device post-market surveillance, supervision mechanisms and maintenance strategies are not implemented efficiently enough to ensure patient safety and quality of healthcare. 10
Despite the self-test protocols usually built into MD software, medical professionals often cannot recognize a performance malfunction that directly affects patient diagnosis or treatment.1,2,4,5,11–14 Such malfunctions appear as large deviations in the patient-related output parameters of MDs. For a defibrillator, this means that a critically ill patient may be treated with a higher or lower energy level than needed, which can result in failed resuscitation or burns.15,16 Such events are efficiently prevented in healthcare institutions that have implemented adequate supervision mechanisms, such as periodic measurement of safety and performance parameters.15,16
This has been addressed in the new MDR 2017/745, which emphasizes the need for better post-market surveillance mechanisms for MD safety and performance. Post-market surveillance strategies are implemented and enforced differently within the legal framework of each country, 11 and the persistently high rates of MD malfunctions resulting in patient injuries suggest that existing approaches to MD management should be redefined and improved. In Bosnia and Herzegovina, for instance, the post-market surveillance mechanism and MD quality assessment are defined through independent safety and performance inspections of 11 different MD types with measuring functions.2,5,14,17–23 These inspections are regulated under the rules of legal metrology. Within this framework, defibrillators are periodically tested by a legally appointed, ISO 17020 accredited inspection laboratory. 24 All safety and performance measurements, together with defibrillator information such as serial number, type, manufacturer and location, are stored in a database developed for this purpose. 24
Acknowledging the power of data analysis, this study explores the potential of collected data to predict MD performance and anticipate future failures. The research focuses specifically on defibrillators, critical medical devices used in healthcare institutions. The hypothesis driving this investigation is that machine learning algorithms can provide accurate predictions about defibrillator performance and potential failures, thereby optimizing current medical device management strategies.
According to Taghipour, 25 annual medical device maintenance and management costs in a healthcare institution amount to approximately 1% of the total budget. Healthcare institutions unfortunately cut these costs, and as a consequence the use of MDs results in a higher rate of incidents with serious injuries or deaths of patients. They also state that numerous optimization models for medical device maintenance have been developed, but healthcare institutions still do not benefit from these methods as other industries do. Owing to the increasing complexity of the healthcare environment and the growing technological complexity of MDs, performing maintenance in the traditional manner causes considerable difficulties. Traditional MD management relies on software programs 26 that provide continuous updates, increase inventory accuracy, document maintenance history and support data analysis and reporting. Introducing machine learning into MD management strategies could resolve these challenges, since raw data can be transformed into useful information, and analysis of large data collections would improve outcomes and reduce the burden on the healthcare system. Machine learning algorithms learn from experience with respect to a given task and performance measure. 27
Materials and methods
Data mining in healthcare is an emerging field of high importance for providing prognosis and a deeper understanding of medical data. Data mining applications in healthcare include analysis of healthcare centers for better health policymaking and prevention of hospital errors, early detection and prevention of diseases and preventable hospital deaths, more value for money and cost savings, and detection of fraudulent insurance claims. Researchers are using data mining techniques in the medical diagnosis of several diseases, such as diabetes, stroke, cancer and heart disease.
This study incorporates a diverse dataset detailing defibrillator models, external inspections, energy levels, measured values and associated error metrics. Data preprocessing ensures uniformity by handling missing values, standardizing energy levels and categorizing error measurements. The dataset is split into training and validation sets for model training and evaluation. Standard classification metrics gauge each algorithm's effectiveness in predicting defibrillator performance and detecting potential inaccuracies. Descriptive statistics and statistical tests inform our understanding of feature distributions and reveal significant patterns. This methodology provides the foundation for exploring machine learning algorithms, with the aim of identifying effective approaches for predicting defibrillator performance and addressing potential inaccuracies.
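The preprocessing steps described above (handling missing values, standardizing energy levels and categorizing the measurement error) can be illustrated with a minimal Python sketch. The record fields, the unit-stripping rule and the 15% pass/fail tolerance are hypothetical assumptions chosen for illustration; they are not the study's actual schema or acceptance criteria. Incomplete rows are simply dropped here rather than imputed.

```python
# Minimal preprocessing sketch; field names and tolerance are hypothetical.
ASSUMED_TOLERANCE_PCT = 15.0  # illustrative pass/fail threshold, not from the study

RAW = [
    {"model": "A", "set_energy_j": "200", "measured_j": "196.5"},
    {"model": "B", "set_energy_j": "150 J", "measured_j": None},  # missing measurement
    {"model": "A", "set_energy_j": "50", "measured_j": "61.0"},
]


def parse_energy(value):
    """Standardize an energy entry such as '150 J' or '150' to a float in joules."""
    if value is None:
        return None
    text = str(value).strip().upper().removesuffix("J").strip()
    return float(text) if text else None


def preprocess(records, tolerance_pct=ASSUMED_TOLERANCE_PCT):
    """Drop incomplete rows, compute the relative energy error and label each row."""
    cleaned = []
    for rec in records:
        set_j = parse_energy(rec.get("set_energy_j"))
        meas_j = parse_energy(rec.get("measured_j"))
        if set_j is None or meas_j is None or set_j == 0:
            continue  # drop rows with missing values instead of imputing a measurement
        error_pct = 100.0 * (meas_j - set_j) / set_j
        cleaned.append({
            "model": rec["model"],
            "set_energy_j": set_j,
            "measured_j": meas_j,
            "error_pct": round(error_pct, 2),
            "label": "pass" if abs(error_pct) <= tolerance_pct else "fail",
        })
    return cleaned


for row in preprocess(RAW):
    print(row["model"], row["error_pct"], row["label"])
```

Dropping rows keeps the sketch transparent; a real pipeline might instead impute or flag missing entries, depending on how the inspection records are structured.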
This study was conducted on the basis of defibrillator inspections performed by an ISO 17020 accredited laboratory at various healthcare institutions in Bosnia and Herzegovina. In total, 1997 samples were collected during the inspection process. Of these, 1311 samples were collected between 2017 and 2020 and were used to train the performance prediction model. The remaining 686 samples, collected between 2020 and 2022, were fed to the trained model to validate the method. Key features in the dataset include ‘Verification is correct’, ‘External inspection’, and specific error measurements such as ‘Measurement error’. These features capture essential aspects of defibrillator functionality, allowing a detailed examination of device performance. The dataset underwent a preprocessing phase to handle missing values, ensuring its compatibility with machine learning algorithms and preserving its integrity for reliable analysis. The primary purpose of this curated dataset is to facilitate evidence-based clinical engineering practice. Figure 1 illustrates the distribution of data between the training and testing sets following a train-test split: the blue bar represents the training set, comprising approximately 79.97% of the dataset, and the orange bar represents the testing set, accounting for the remaining 20.03%.

Train and test percentages.
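The reported percentages correspond to holding out 20% of the 1997 samples: rounding the test share of 399.4 up to a whole sample gives 400 test and 1597 training samples, i.e. 79.97% and 20.03%. The sketch below reproduces this arithmetic together with a seeded random split in plain Python; the function names are hypothetical, and the sketch does not imply which tool the authors used.

```python
import math
import random


def split_sizes(n_samples, test_fraction):
    """Return (n_train, n_test), rounding the test share up to a whole sample."""
    n_test = math.ceil(test_fraction * n_samples)
    return n_samples - n_test, n_test


def random_split(items, test_fraction, seed=0):
    """Shuffle a copy of items with a fixed seed and split it into train/test lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train, _ = split_sizes(len(shuffled), test_fraction)
    return shuffled[:n_train], shuffled[n_train:]


n_train, n_test = split_sizes(1997, 0.2)
print(n_train, n_test)  # 1597 400
print(f"{100 * n_train / 1997:.2f}% / {100 * n_test / 1997:.2f}%")  # 79.97% / 20.03%
```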
Various machine learning algorithms were considered, including Decision Tree (DT), Random Forest (RF), Naïve Bayes (NB), Random Forest Regressor (RFG), XGBoost Regressor (XGBoost) and Logistic Regression (LR). These algorithms were selected because of their ability to handle large datasets and their potential for achieving high prediction accuracy.
During training, the algorithms were provided with the input features (i.e., defibrillator data) and the corresponding output labels (i.e., pass or fail status from the dataset). The algorithms used these data to learn to predict the classification status of defibrillators from their performance data. The performance of the trained models was then evaluated on the validation set using accuracy, precision and sensitivity. These metrics showed how well the models could predict the classification status of previously unseen defibrillators from their performance data. Comparing the machine learning algorithms reveals differences in their predictive accuracy and their effectiveness in addressing defibrillator performance issues. Each algorithm has distinct strengths and limitations, influenced by factors such as model complexity, interpretability and predictive power.
In evaluating the effectiveness of machine learning models, various metrics are employed to provide a comprehensive assessment. These include:
Accuracy: Measures the proportion of correct predictions among the total predictions made, offering a general sense of the model's overall effectiveness.
Precision: Indicates the proportion of positive identifications that were actually correct, crucial in contexts where false positives have significant consequences.
Sensitivity: Reflects the proportion of actual positives correctly identified, vital in medical scenarios like defibrillator performance prediction where missing a true positive can have serious implications.
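The three metrics can be computed directly from the counts of a binary confusion matrix. The following sketch assumes that a failing defibrillator is treated as the positive class; this choice, the function name and the example counts are illustrative assumptions and do not reproduce the study's confusion matrices.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return (accuracy, precision, sensitivity) from binary confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total                      # correct / all predictions
    precision = tp / (tp + fp) if tp + fp else 0.0    # correct among predicted positives
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # detected among actual positives
    return accuracy, precision, sensitivity


# Hypothetical counts ("fail" = positive class); not taken from the study.
acc, prec, sens = classification_metrics(tp=45, fp=5, fn=10, tn=140)
print(f"accuracy={acc:.3f} precision={prec:.3f} sensitivity={sens:.3f}")
```

With these counts, 10 failing devices are missed, which lowers sensitivity more than precision; in a safety setting that asymmetry is why sensitivity is reported separately.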
The parameter settings for each algorithm were chosen from randomized values that could potentially achieve the desired results. These parameters are crucial in determining the accuracy and performance of the model, and their selection requires careful consideration. The confusion matrices below (Tables 1, 2, 3 and 4) show the results achieved with the selected machine learning algorithms. The accuracy, precision and sensitivity achieved by all four algorithms are compared in Table 5.
Performance assessment of decision tree algorithm.
Performance assessment of Random Forest algorithm.
Performance assessment of Naïve Bayes algorithm.
Performance assessment of Logistic Regression algorithm.
Performance comparison across all predictive models.
These results suggest that the Naïve Bayes algorithm performed best among the four algorithms in classifying the dataset, achieving the highest accuracy, precision and sensitivity. A possible explanation is that Naïve Bayes is a probabilistic algorithm that assumes independence between the features, which makes it simple yet effective in classification tasks. The Decision Tree algorithm also performed well, with the second-highest accuracy. The Random Forest algorithm, which uses an ensemble of decision trees, did not perform as well as expected and had the lowest accuracy of the four algorithms. The Logistic Regression algorithm also achieved good results, but not as high as those of the Naïve Bayes algorithm.
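To make the independence assumption concrete, the following is a minimal Naïve Bayes classifier for categorical features with Laplace (add-one) smoothing. The study does not state which Naïve Bayes variant was used, so this categorical form, the class name and the toy features (external inspection result and energy band) are hypothetical; the sketch only shows how class priors and per-feature likelihoods are combined.

```python
import math
from collections import Counter, defaultdict


class CategoricalNaiveBayes:
    """Naive Bayes for categorical features with Laplace (add-one) smoothing."""

    def fit(self, rows, labels):
        self.classes = sorted(set(labels))
        self.class_counts = Counter(labels)
        self.n = len(labels)
        # counts[c][j][v]: how often feature j takes value v among rows of class c
        self.counts = {c: defaultdict(Counter) for c in self.classes}
        self.values = defaultdict(set)  # distinct values observed for feature j
        for row, c in zip(rows, labels):
            for j, v in enumerate(row):
                self.counts[c][j][v] += 1
                self.values[j].add(v)
        return self

    def log_posterior(self, row, c):
        """Unnormalized log P(c | row) = log P(c) + sum_j log P(x_j | c)."""
        # The independence assumption lets the joint likelihood factor into
        # per-feature terms, which become a sum in log space.
        score = math.log(self.class_counts[c] / self.n)
        for j, v in enumerate(row):
            numerator = self.counts[c][j][v] + 1
            denominator = self.class_counts[c] + len(self.values[j])
            score += math.log(numerator / denominator)
        return score

    def predict(self, row):
        """Return the class with the highest posterior score."""
        return max(self.classes, key=lambda c: self.log_posterior(row, c))


# Hypothetical toy data: (external inspection result, energy band) -> status
X = [("ok", "low"), ("ok", "high"), ("damaged", "high"),
     ("ok", "low"), ("damaged", "low"), ("damaged", "high")]
y = ["pass", "pass", "fail", "pass", "fail", "fail"]
model = CategoricalNaiveBayes().fit(X, y)
print(model.predict(("ok", "high")), model.predict(("damaged", "high")))
```

Because each feature contributes its own log-likelihood term, training reduces to counting, which keeps the model simple even when features are numerous.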
The Naïve Bayes algorithm was found to be the most effective in classifying the dataset in this study. However, the performance of each algorithm may vary depending on the nature of the dataset and the features used. These findings can inform future studies on classification tasks and may aid in the development of more accurate and reliable machine learning algorithms for real-world applications. In terms of accuracy, the Naïve Bayes algorithm achieved the highest value of 94.8%, followed closely by the decision tree algorithm with 93.3%. The logistic regression algorithm had an accuracy of 91.6%, while the random forest algorithm had the lowest accuracy of 90.2%. Regarding precision, the Naïve Bayes algorithm again achieved the highest value of 92.7%. The decision tree and random forest algorithms had very similar precision values of 91.1% and 91.5%, respectively, while the logistic regression algorithm had the lowest precision of 90.4%. In terms of sensitivity, the Naïve Bayes algorithm achieved the highest value of 91.0%, followed by the logistic regression algorithm with 90.3% and the random forest algorithm with 89.6%, while the decision tree algorithm had the lowest sensitivity of 88.7%.
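The per-metric rankings stated above can be checked mechanically from the reported percentages. The snippet below only tabulates the values given in the text; it does not re-run any model.

```python
# Reported percentages from the text; no model is re-run here.
REPORTED = {
    "Naive Bayes": {"accuracy": 94.8, "precision": 92.7, "sensitivity": 91.0},
    "Decision Tree": {"accuracy": 93.3, "precision": 91.1, "sensitivity": 88.7},
    "Logistic Regression": {"accuracy": 91.6, "precision": 90.4, "sensitivity": 90.3},
    "Random Forest": {"accuracy": 90.2, "precision": 91.5, "sensitivity": 89.6},
}


def rank(metric):
    """Model names ordered from highest to lowest value of the given metric."""
    return sorted(REPORTED, key=lambda name: REPORTED[name][metric], reverse=True)


for metric in ("accuracy", "precision", "sensitivity"):
    print(f"{metric:>11}: {' > '.join(rank(metric))}")
```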
Compared with previous research on the topic, the developed DT classifier achieves performance comparable to that reported by Badnjević et al. in 2019, 27 and higher overall. The higher overall performance is attributed to the significantly larger dataset used here, which captures more variability between defibrillators. The work presented here therefore further confirms the potential of ML algorithms for predicting defibrillator performance and corroborates the results presented in that study. 27 Building on this, regressor algorithms were modeled to predict the individual measurement errors of defibrillators. Table 6 shows the performance comparison across regressors.
Performance comparison across regressors.
The Random Forest Regressor was trained on various features relevant to defibrillator functionality, and its application to the defibrillator performance dataset yielded noteworthy results. The mean squared error (MSE) of 36.54 quantifies the precision of the algorithm in predicting errors, and the R-squared score (R²) of 0.30 indicates that the model explains about 30% of the variance in the target variable. While these results demonstrate a moderate level of predictive accuracy, further exploration and fine-tuning of the model are warranted.
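Both regression metrics have simple closed forms: MSE is the mean of the squared residuals, and R² = 1 − SS_res/SS_tot compares the residual sum of squares with the total sum of squares around the mean. A minimal sketch with made-up values (not study data) follows; the function names are illustrative.

```python
def mse(y_true, y_pred):
    """Mean squared error: the average of the squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up measurement errors and predictions, for illustration only.
y_true = [2.0, -1.0, 4.0, 3.0]
y_pred = [2.5, -1.0, 3.0, 3.5]
print(mse(y_true, y_pred), r2_score(y_true, y_pred))
```

Because MSE is expressed in squared units of the target, an MSE of 36.54 corresponds to a root-mean-squared error of roughly 6.0, and an MSE of 0.45 to roughly 0.67, in the units of the predicted measurement error, which the text does not specify.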
The XGBoost Regressor constructs decision trees iteratively, with each new tree correcting the errors of the ensemble built so far, gradually improving the model's predictive accuracy. The algorithm uses gradient boosting: each tree is fitted to the gradients of the loss (for squared error, the residuals), so instances with larger errors have more influence on subsequent trees, prioritizing the correction of challenging predictions. Applied to key columns related to measured values, errors and allowed deviations, the XGBoost Regressor showed exceptional predictive accuracy, with an MSE of 0.45 and an R² of 0.99, proving highly effective in capturing the variability within the dataset. The model's performance has significant implications for clinical engineering, enabling early identification of potential issues and contributing to enhanced patient safety.
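The boosting principle can be illustrated with plain gradient boosting for squared loss, using one-feature decision stumps as weak learners. This is a simplified sketch, not XGBoost itself: XGBoost additionally uses second-order gradient information, regularization and more elaborate tree construction. The function names, toy data and parameter values are arbitrary assumptions.

```python
def fit_stump(x, residuals):
    """Fit a one-split regression stump to residuals by minimizing squared error."""
    best = None
    for threshold in sorted(set(x))[:-1]:
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        l_mean, r_mean = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - l_mean) ** 2 for r in left) + sum((r - r_mean) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, l_mean, r_mean)
    if best is None:  # all x identical: fall back to a constant prediction
        mean = sum(residuals) / len(residuals)
        return lambda xi: mean
    _, threshold, l_mean, r_mean = best
    return lambda xi: l_mean if xi <= threshold else r_mean


def gradient_boost(x, y, n_rounds=50, learning_rate=0.3):
    """Plain gradient boosting for squared loss with stumps as weak learners."""
    base = sum(y) / len(y)  # start from the mean of the targets
    stumps = []

    def predict(xi):
        return base + learning_rate * sum(s(xi) for s in stumps)

    for _ in range(n_rounds):
        # For squared loss the negative gradient is the residual y - F(x),
        # so each new stump focuses on the points the ensemble still gets wrong.
        residuals = [yi - predict(xi) for xi, yi in zip(x, y)]
        stumps.append(fit_stump(x, residuals))
    return predict


x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1.0, 1.0, 1.0, 1.0, 5.0, 5.0, 5.0, 5.0]
model = gradient_boost(x, y)
print(round(model(2), 3), round(model(7), 3))
```

With a learning rate of 0.3, each round removes 30% of the remaining residual on this toy step function, so the predictions converge towards the two plateau values; a smaller learning rate needs more rounds but generally overfits less.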
This project serves as an excellent example of how AI, particularly machine learning, can elevate healthcare practices. By using data-driven insights, healthcare professionals have access to an innovative tool that aids in optimizing resources, reducing risks, and improving patient safety. As the integration of AI in medicine continues to evolve, initiatives like this pave the way for inventive approaches to medical device management, opening up a future where predictive analytics substantially contribute to the efficiency and effectiveness of healthcare delivery.
The highest accuracy achieved on this dataset was 94.8%, using the Naïve Bayes algorithm, and this model was used to predict the performance of defibrillators in post-market surveillance. The Random Forest Regressor achieved moderate error prediction accuracy (R² = 0.30); further refinement is required to unlock its full potential for proactive quality control and maintenance. The XGBoost Regressor emerged as a powerful tool, showcasing exceptional predictive accuracy and the ability to capture a substantial portion of the dataset's variability. Its implications for clinical engineering include early identification of potential issues, contributing to enhanced patient safety and device reliability.
These findings may be significant for future practice because they offer a solution for predicting defibrillator performance. Using machine learning algorithms, it is possible to identify potential issues with defibrillators and prevent failures that can lead to severe consequences for patients. This study provides evidence that machine learning algorithms can be effective tools for post-market surveillance of medical devices and can be used to improve patient safety.
Footnotes
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
