Abstract
Background
Healthcare institutions throughout the world rely on medical devices to provide their services reliably and effectively. However, medical devices can and sometimes do fail. These failures pose a significant risk to patients.
Objective
One way to address these issues is to use artificial intelligence to detect medical device failure. The goal of this study was to develop automated systems utilising machine learning algorithms to predict patient monitor performance and potential failures based on data collected during regular safety and performance inspections.
Methods
The system developed in this study has machine learning techniques at its core. Four algorithms were utilised throughout the study: Decision Tree, Random Forest, Logistic Regression and Support Vector Machines.
Results
Final results showed that the Logistic Regression model had the best performance across the evaluated metrics among the four developed models. It achieved accuracy of 94%, with precision and recall of 70% and 93%, respectively.
Conclusion
This study shows that systems like the one developed here have the potential to improve the management and maintenance of medical devices.
Introduction
In today's hospitals, as in any other medical setting, it is practically impossible to provide any diagnostic or treatment service without relying on medical devices at some point in the procedure. The healthcare sector's reliance on medical devices has eased the diagnostic process and improved treatment, thus improving patient outcomes overall. 1 This reliance has also created the challenge of maintaining and managing medical devices so that they function properly, to the satisfaction of all those involved, and fulfil their role and potential.2,3
On the other hand, technological development in recent decades led to the advent of Artificial Intelligence (AI), which can be used to perform various tasks that are often challenging or time-consuming for humans. 4 Systems that employ AI have been developed and used to solve a vast array of problems in various fields of human endeavour. 5 One area in which AI can improve current practice is healthcare. 6 In healthcare, AI has been used in the diagnosis and monitoring of disease.7,8
This work explores and demonstrates that intelligent systems employing AI can be used in medical settings to manage and monitor medical devices, in particular patient monitors. The goal of this article is to show that such systems can and should be integrated into patient monitors to increase their reliability and improve their utilisation in hospitals.
Patient monitors are essential tools in healthcare, providing continuous evaluation of a patient's condition and enabling quick identification of any deterioration. 9 These monitors can measure and record vital signs such as body temperature, heart rate, and blood pressure. 10 In emergency care, they play a crucial role in screening, triage, and ongoing evaluation of a patient's response to treatment. 11 The information they provide about a patient's status allows healthcare professionals to detect and prevent deterioration.
Although international standards and regulations govern every aspect of the life cycle of medical devices, including patient monitors,12–15 device-related problems and setbacks are not uncommon. Reported incidents involving medical devices can be found in registries such as the Manufacturer and User Facility Device Experience (MAUDE) database maintained by the US Food and Drug Administration (FDA). 16 This database is filled in by medical staff and manufacturers of medical equipment, who describe the problems that occurred with a device.
A similar solution for reporting device problems has been implemented in Europe with the creation of the European Database on Medical Devices (EUDAMED), which serves the same purpose. 17 These two databases are among the most prominent in the world.
A search of the MAUDE database shows four registered incidents involving patient monitors, two of which resulted in injury. 16 This suggests a need to adjust the surveillance strategy for patient monitors to improve the safety of both patients and medical professionals.
The new Medical Device Regulation (MDR) 2017/745 addresses this by emphasising the need for improved post-market surveillance of safety and performance for all medical devices, including patient monitors. 12 Post-market surveillance strategies are implemented and enforced differently in each country. What is still lacking in this approach is impartial, traceable and reproducible evidence of device performance; this is where the science of metrology needs to be employed. 18
As patient monitors involve measurement of physiological parameters they can be referred to as medical devices with measuring function (MDMF). 19 Every MDMF is designed and produced in a way that ensures that the device provides accurate, precise and stable measurements within the limits indicated by the manufacturer and having regard to the intended purpose of the device. 20
This approach has been used as part of post-market surveillance in some countries. For instance, in 2014 Bosnia and Herzegovina adopted a legal metrology framework for 11 medical devices with measuring functions, including patient monitors.21–23 According to this framework, patient monitors are periodically inspected by a legally appointed ISO 17020 accredited inspection body. 24 During the inspection procedure, all safety and performance data about patient monitors are stored in a purpose-built database,25,26 which holds key general information such as the manufacturer, model, serial number and location of the device, as well as the performance and safety data measured for each device using calibrated etalons.
Since periodic inspections are conducted at least once a year, a large database of general, performance and safety information has been created for every single device. Analysing device performance over time gives valuable insight into device behaviour and also provides useful input for failure risk assessment. This study therefore investigated how this big data can be used to predict patient monitor performance and possible future failures. Such an approach would transform the current reactive approach into a predictive one.
Traditional maintenance strategies cause considerable difficulties because of the increased complexity of the healthcare institution environment and the increased technological complexity of medical devices.27,28 This indicates a need to improve medical device maintenance and management strategies, and the introduction of AI could help in achieving this goal.
Methods and materials
The data used in this study was acquired between 2017 and 2022 through the inspection of patient monitors in Bosnia and Herzegovina. These inspections were carried out in accordance with the Legal Metrology Framework by a national laboratory appointed for this purpose. The laboratory is accredited according to the ISO 17020 standard, which ensures the competence of inspection bodies.
During the inspection process, patient monitors are first visually examined and then tested using the appropriate etalon. For all relevant parameters, both the expected and the measured values are recorded and used to determine the error of the particular device. Based on these values, the measurement uncertainty and the absolute and relative errors are calculated. All data points generated through the inspection process are collected and stored in a curated database established for the purpose. These data were used for this study.
The dataset was composed of 2889 samples, of which 2765 passed and 124 failed the inspection. The dataset contained expected values, measured values and relative error for five parameters: heart rate, oxygen saturation, heart signal amplitude, respiration rate, and blood pressure (invasive and non-invasive, both systolic and diastolic). Alongside the measurement values, the dataset contained the results of the visual inspection and the electrical safety inspection. These results were categorical, either pass or fail.
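The error calculation described above can be illustrated with a minimal sketch. The sign convention (measured minus expected), the expression of relative error as a percentage, and the example reading are assumptions made for illustration; the inspection protocol itself may define these quantities differently.

```python
def absolute_error(measured: float, expected: float) -> float:
    """Signed difference between the monitor reading and the etalon value."""
    return measured - expected


def relative_error(measured: float, expected: float) -> float:
    """Absolute error as a percentage of the etalon (reference) value."""
    if expected == 0:
        raise ValueError("relative error is undefined for a zero reference value")
    return 100.0 * (measured - expected) / expected


# Hypothetical reading: the etalon simulates 60 bpm, the monitor displays 62 bpm.
print(absolute_error(62.0, 60.0))            # 2.0
print(round(relative_error(62.0, 60.0), 2))  # 3.33
```

Because the relative error is scaled by the reference value, readings taken at different set points and for different parameters become comparable, which is presumably why it serves as a normalised input feature.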
Before the development of the AI models, data preparation was performed. This included exploratory data analysis, which was used to gain overall insight into the structure of the dataset. The models were trained using the relative error for all recorded measurements, as it represents normalised measurement values.
After the preparation of the data, the dataset was split into a training set and a testing set using stratification. The training set consisted of 80% of the overall dataset, while the testing set consisted of the remaining 20%. The split was conducted at random while making sure that both subsets maintained the same proportion of positive and negative inspection results. Table 1 presents the division of the dataset and the composition of both the training and the testing set.
Division of the dataset.
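A stratified 80/20 split of this kind could be sketched as follows. This is a standard-library illustration only, not the authors' implementation; the function name, the seed and the rounding rule are assumptions, while the class counts come from the dataset description.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) preserving the class proportions of `labels`."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)  # random assignment within each class
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Class counts from the dataset description: 2765 passed, 124 failed.
labels = ["pass"] * 2765 + ["fail"] * 124
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
print(sum(labels[i] == "fail" for i in test_idx), "failed inspections in the test set")
```

Splitting each class separately is what guarantees that the rare failed inspections appear in both subsets in the same proportion as in the full dataset.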
For the development of the models, a set of machine learning algorithms was used: Decision Tree (DT), Random Forest (RF), Logistic Regression (LR) and Support Vector Machines (SVM). Machine learning techniques were selected because of their well-established ability to generalise from data structured like the data used in this study, and these algorithms in particular were selected for their strengths, which aligned with the needs and limitations of the problem at hand.
Each algorithm was exposed to the data from the training set, which produced an ML model. After training was completed, the models were validated using the testing set. The testing step produced predictions for the input features of previously unseen data. These predictions were used to measure the performance of each model and to compare the four developed models with one another.
The four algorithms were compared using three metrics: accuracy, precision and recall. The Decision Tree algorithm achieved accuracy of 87%, precision of 62% and recall of 91%. Figure 1 shows the confusion matrix and Receiver Operating Characteristic (ROC) curve for the DT model.

Confusion matrix and ROC curve for DT algorithm.
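For reference, the three metrics used in the comparison can be derived from the four confusion-matrix counts, as in the minimal sketch below. The counts are hypothetical and are not taken from the study's figures; treating "fail" as the positive class is also an assumption.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Accuracy, precision and recall from confusion-matrix counts."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        # Share of predicted failures that were real failures.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Share of real failures that the model caught.
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=20, fp=10, fn=5, tn=165)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

On an unbalanced dataset, accuracy is dominated by the majority class, so precision and recall on the rare class carry most of the information about failure detection.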
The Random Forest algorithm performed marginally better, with accuracy of 88%, precision of 63% and recall of 94%. RF also had a larger area under the ROC curve (AUC) than DT, which can be seen alongside the confusion matrix in Figure 2.

Confusion matrix and ROC curve for RF algorithm.
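The area under the ROC curve used in this comparison can be computed directly from model scores. The sketch below uses the rank interpretation of AUC (the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one); the labels and scores are made up for illustration.

```python
def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required to compute AUC")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = fail) and predicted failure scores.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(y_true, scores))  # 0.75
```

The pairwise loop is quadratic in the number of samples, which is acceptable for a test set of a few hundred inspections; a sort-based variant would be preferable for larger data.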
The Logistic Regression model achieved accuracy of 94%, with precision and recall of 70% and 93%, respectively. Figure 3 shows the confusion matrix and ROC curve for the LR model.

Confusion matrix and ROC curve for LR algorithm.
The SVM algorithm performed worse than LR, with accuracy of 93%. The SVM model had the lowest precision among all algorithms used in this study, 54%. The recall for SVM was 93%. Figure 4 shows the confusion matrix and ROC curve for the SVM model.

Confusion matrix and ROC curve for SVM algorithm.
These results show that all four models performed well and achieved sufficiently high accuracy (over 80%). However, Logistic Regression performed better than all other models and achieved the highest accuracy while maintaining a low false-positive rate, which was difficult to achieve given the unbalanced dataset. The performance of all four algorithms is summarised in Table 2. The LR model was fitted using Newton's method with Cholesky factorisation. The model also used different weights for the two classes, pass and fail: 1 and 2, respectively.
Achieved results using ML algorithms (%).
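To make the solver description concrete, the sketch below fits a class-weighted logistic regression with Newton's method, solving each Newton system through a Cholesky factorisation. It is a pure-Python illustration under stated assumptions, not the authors' implementation: the L2 penalty (applied to the intercept as well, for simplicity), the iteration count, the function names and the single-feature toy data are all hypothetical. Only the class weights of 1 (pass) and 2 (fail) come from the text.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def cholesky_solve(a, b):
    """Solve a x = b for a symmetric positive-definite matrix a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):  # factorise a = low * low^T
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    y = [0.0] * n
    for i in range(n):  # forward substitution: low y = b
        y[i] = (b[i] - sum(low[i][k] * y[k] for k in range(i))) / low[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution: low^T x = y
        x[i] = (y[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x


def fit_logistic_newton(rows, labels, class_weight, l2=1.0, n_iter=10):
    """Weighted, L2-regularised logistic regression fitted by Newton's method."""
    feats = [[1.0] + list(r) for r in rows]  # prepend an intercept column
    d = len(feats[0])
    w = [0.0] * d
    for _ in range(n_iter):
        grad = [l2 * wj for wj in w]
        hess = [[l2 if i == j else 0.0 for j in range(d)] for i in range(d)]
        for x, y in zip(feats, labels):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, x)))
            c = class_weight[y]  # errors on the "fail" class cost more
            for i in range(d):
                grad[i] += c * (p - y) * x[i]
                for j in range(d):
                    hess[i][j] += c * p * (1.0 - p) * x[i] * x[j]
        step = cholesky_solve(hess, grad)  # Newton step: hess * step = grad
        w = [wj - sj for wj, sj in zip(w, step)]
    return w


def predict_proba(w, row):
    """Probability of the positive ("fail") class for one feature row."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], row)))


# Hypothetical relative errors (%) for one parameter; label 1 means "fail".
rows = [[0.2], [0.5], [0.8], [1.0], [3.0], [3.5], [4.0], [5.0]]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
weights = fit_logistic_newton(rows, labels, class_weight={0: 1.0, 1: 2.0})
print(predict_proba(weights, [0.2]), predict_proba(weights, [5.0]))
```

The penalty term keeps the Hessian positive definite, which is what makes the Cholesky factorisation applicable at every iteration. Weighting the "fail" class more heavily shifts the decision boundary towards catching failures, at the cost of more false alarms.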
One possible limitation of these models is that they were trained on an unbalanced dataset that is mostly composed of results from devices that passed the inspection. Passed and failed inspections appear in the dataset in a ratio of 19 : 1. The number of failed inspections is relatively low, and it is therefore possible that the models generalise better to inspections labelled pass than to those labelled fail.
Another limitation is the possibility of overfitting to the given dataset. The likelihood of this was minimised through parameter optimisation for the implemented algorithms and through the use of cross-validation, which showed that the models perform just as well on previously unseen data.
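The cross-validation step mentioned above could, for example, use stratified folds so that every fold contains some failed inspections. The source does not state the number of folds or whether stratification was applied, so both are assumptions in the sketch below, which only generates the index sets.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, k=5, seed=0):
    """Yield (train_idx, val_idx) pairs with class proportions preserved per fold."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)  # deal each class round-robin across folds

    for i in range(k):
        val_idx = sorted(folds[i])
        train_idx = sorted(
            idx for j, fold in enumerate(folds) if j != i for idx in fold
        )
        yield train_idx, val_idx


# Class counts from the dataset description: 2765 passed (0), 124 failed (1).
labels = [0] * 2765 + [1] * 124
for fold_no, (train_idx, val_idx) in enumerate(stratified_kfold(labels), start=1):
    n_fail = sum(labels[i] for i in val_idx)
    print(f"fold {fold_no}: {len(val_idx)} validation samples, {n_fail} failed")
```

A model would be trained on each training index set and scored on the matching validation set; similar scores across folds suggest that the reported performance does not depend on one particular split.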
AI is increasingly being applied to medical devices for performance prediction and failure analysis. Machine learning and deep learning algorithms are being used to develop predictive models for diabetes management and medical device failure. 29 These AI-driven approaches can analyse multi-modal data, including structured maintenance records, to improve prediction accuracy and reduce intervention time. 30 AI applications in medical devices range from automatic retinal screening to clinical diagnosis support and patient self-management tools (Nomura et al., 2021). However, the implementation of AI in medical devices faces challenges related to ethical, legal, and social concerns, as well as the need for harmonised regulations to ensure more equitable access, privacy, and accountability. 31 Current EU and US regulations on medical devices provide an initial framework for AI safety and performance but require further development to address transparency and accountability issues. 32
Medical devices are an important feature of the modern healthcare environment and are likely to become even more important in the coming years. The increasing number of medical devices poses a challenge for their successful maintenance and management, which is necessary to ensure that the devices fulfil their intended purpose. One way to achieve this is to implement artificial-intelligence-based systems alongside medical devices to monitor their performance.
During the course of this study, four AI models based on machine learning were developed: Decision Tree, Random Forest, Logistic Regression and Support Vector Machines. These models were trained and tested on data collected during regular mandatory safety inspections of patient monitors. The observed results show that Logistic Regression achieved the best performance on the given task.
This study shows that AI models can be successfully used to predict the performance of patient monitors, and that their use for this purpose has the potential to improve device reliability and patient outcomes.
Footnotes
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
