Abstract
Background
Analysis of data from incident registries such as MAUDE has identified the need to improve surveillance and maintenance strategies for infusion pumps to enhance patient and healthcare staff safety.
Objective
The ultimate goal is to enhance infusion pump management strategies in healthcare facilities, thus transforming the current reactive approach to infusion pump management into a proactive and predictive one.
Results
Detailed analysis of the results showed that all applied machine learning methods performed satisfactorily, with accuracy ranging from 98% to 100%, precision from 99% to 100%, sensitivity from 98% to 100%, and specificity from 87% to 100%. However, the Decision Tree and Random Forest methods proved to be the best, both because they achieved the maximum values of accuracy, precision, sensitivity, and specificity, and because their results are interpretable.
Conclusion
It has been established that machine learning methods are capable of identifying potential issues before they become critical, thus playing a crucial role in predicting the performance of infusion pumps, potentially enhancing the safety, reliability, and efficiency of healthcare delivery. Further research is needed to explore the potential application of machine learning algorithms in various healthcare domains and to address practical issues related to the implementation of these algorithms in real clinical settings.
Introduction
Infusion pumps are used for various medical purposes, but they all share one common feature: they deliver a constant flow of medication over a specific period of time. In most cases, a trained user controls the infusion rate and timing using software embedded in the device's user interface. Infusion pumps offer several advantages over manual administration of fluids, including the ability to deliver fluids in extremely small quantities and at precisely determined rates or automatic intervals. 1
The use of high-risk medications and other essential fluids often requires infusion pumps; therefore, pump malfunctions can seriously affect patient safety. 2 The study 3 found that out of 325 inspected infusion pumps in the period 2015–2016, 3.38% of devices did not satisfy electrical safety inspections, while out of 314 infusion pumps that passed electrical safety testing, 5.73% had performance issues that did not comply with performance requirements (flow rate). According to the Manufacturer and User Facility Device Experience (MAUDE) database from 2016 to 2024, there have been incidents related to infusion pumps, including 17 injuries and 329 malfunctions. 4 No deaths were recorded. 4 Unfortunately, the number of such incidents is increasing, both for infusion pumps and other medical devices, indicating that post-market surveillance and maintenance procedures for medical devices after delivery to healthcare facilities may not be adequate for the needs of the healthcare industry, ultimately jeopardizing patient safety. The new Medical Device Regulation (MDR) 2017/745 addresses this by emphasizing the need for improved safety and performance post-market surveillance mechanisms for all medical devices, including infusion pumps. 5
Although post-market surveillance is mandatory for medical products, its implementation varies from country to country. 6 For example, in Bosnia and Herzegovina, laboratories accredited according to the ISO 17020 standard regularly test all infusion pumps, and information on flow, visual inspections, and other important details is stored in a dedicated database as part of the established framework for medical products.7,8 During the inspection process, all safety and effectiveness data of infusion pumps are stored in a unified database.9,10 These data consist of key general information such as manufacturer, model, serial number, and device location, as well as effectiveness and safety data measured for each device using calibrated standards.
Several studies have shown the potential of machine learning algorithms in predicting the performance of infusion pumps. For example, in one study, 11 the authors used machine learning algorithms to predict pump alarm occurrences based on real-time sensor data. They found that machine learning algorithms could accurately predict pump alarms, suggesting that these algorithms can be used to increase the safety and reliability of infusion pumps. While that study focused only on infusion pump alarms, the present study goes further by analyzing data to predict the future performance and potential failures of infusion pumps. The ultimate goal is to improve medical device management strategies in healthcare facilities, so that the current reactive approach to managing medical devices can be transformed into a proactive and predictive one. To achieve this, several machine learning methods, including Logistic Regression, Decision Tree, Random Forest, Naive Bayes, and Support Vector Machine, were applied to real infusion pump data collected in Bosnia and Herzegovina over a 7-year period.
Materials and methods
This study utilized real data collected from 2015 to 2021 through the inspection of infusion pumps in Bosnia and Herzegovina. These inspections were conducted by the national laboratory designated for this purpose in accordance with the Legal Metrology Framework. The laboratory is accredited according to the ISO 17020 standard, ensuring the competence of inspection bodies.
Before classification, the dataset was examined to understand its structure, characteristics, and data types. This involved selecting an appropriate subset of attributes, identifying missing or incomplete data, and cleaning the dataset to ensure that the data are consistent and accurate. The data cleaning process involved imputing missing values, either with zeros or with the median value of the attribute, and transforming the data so that they are suitable for use in machine learning algorithms. This step was crucial for obtaining reliable and accurate predictions, as the quality of the data directly affects the performance of the model. Analysis of the data also yielded statistics for the annual pass and fail rates of infusion pumps. From Table 1, it can be seen that 90.2% of the infusion pumps passed the verification process.
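The imputation step described above can be sketched as follows. This is a minimal illustration, not the study's preprocessing code: the function name, the use of `None` to mark missing values, and the sample flow-rate errors are all assumptions made for the example.

```python
from statistics import median


def impute_column(values, strategy="median"):
    """Return a copy of `values` with None entries replaced.

    strategy="median" fills gaps with the median of the observed values;
    strategy="zero" fills gaps with 0.0.
    """
    observed = [v for v in values if v is not None]
    if strategy == "median":
        # Fall back to 0.0 if the whole column is missing.
        fill = median(observed) if observed else 0.0
    elif strategy == "zero":
        fill = 0.0
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [fill if v is None else v for v in values]


# Hypothetical flow-rate errors (%) with two missing measurements.
flow_error = [1.2, None, 0.8, 2.5, None]
flow_error_median = impute_column(flow_error, "median")
flow_error_zero = impute_column(flow_error, "zero")
```

Median imputation keeps the filled values inside the observed range, whereas zero imputation is only sensible for attributes where zero is a meaningful default.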
Yearly pass and fail rates during the data collection period.
The total number of samples in the dataset used is 988, of which 790 samples are used for training the model, while the remaining 198 samples are used for validating the model (20% of the dataset). Various machine learning algorithms were considered for performing binary classification on these samples (pass/fail state), including Logistic Regression, Decision Tree, Random Forest, Naive Bayes, and Support Vector Machine. These algorithms were chosen for their ability to handle large datasets and their potential to achieve high prediction accuracy.
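The 80/20 partition described above can be reproduced with a short sketch. The text does not state whether the split was random, stratified, or chronological, so the shuffled split, the seed, and the function name below are assumptions for illustration only.

```python
import random


def train_validation_split(samples, validation_fraction=0.2, seed=0):
    """Shuffle a copy of `samples` and split it into (train, validation)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_validation = round(len(shuffled) * validation_fraction)
    return shuffled[n_validation:], shuffled[:n_validation]


# Stand-in sample identifiers for the 988 inspection records.
samples = list(range(988))
train, validation = train_validation_split(samples)
```

With 988 samples and a validation fraction of 0.2, this yields 790 training samples and 198 validation samples, matching the counts reported above.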
Figure 1 illustrates the architecture of the predictive model, which incorporates three main layers of the expert system. In the first phase, system development, these are the input data layer, the classifier layer (machine learning algorithms), and the predictive model as the output layer. In the second phase, validation, the layers consist of new data, the predictive model serving as the classifier layer, and the prediction as the output layer.

Structure of prediction model.
Finally, the effectiveness of the trained models was assessed using a validation set. Performance metrics such as accuracy, precision, sensitivity, and specificity were used to evaluate each model's performance. These metrics provided insight into how well the models were able to predict the pass/fail status of new infusion pumps based on their performance data. Additionally, the performance of each classification model was analyzed through a confusion matrix, which provides detailed insight into true and false predictions.
The selection of appropriate parameters for machine learning models is a crucial step in achieving good results. This is important because suitable parameters enable the model to adapt to the specificities of the data and optimize its performance for a particular problem. If the parameters are not optimally tuned, the model may suffer from overfitting or underfitting to the data, resulting in poor performance on new, unseen data.
In this study, the grid search method was employed to select optimal parameters for the machine learning models. This method systematically explores different combinations of parameters to find the best configurations, as presented in Table 2.
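The idea behind grid search can be sketched in a few lines. This is a generic illustration, not the procedure used in the study: the grid values and the scoring function are hypothetical stand-ins, and in practice the score would come from training a model and evaluating it, typically with cross-validation.

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Score every parameter combination and return (best_params, best_score)."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for combo in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, combo))
        score = score_fn(params)
        if score > best_score:  # keep the first combination with the top score
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical grid; these values are not taken from Table 2.
toy_grid = {"max_depth": [2, 4, 8], "min_samples_split": [2, 5]}


def toy_score(params):
    # Stand-in for a validation score, highest at max_depth=4, min_samples_split=2.
    return (1.0
            - abs(params["max_depth"] - 4) * 0.05
            - (params["min_samples_split"] - 2) * 0.01)


best_params, best_score = grid_search(toy_grid, toy_score)
```

The cost grows with the product of the grid sizes (here 3 × 2 = 6 evaluations), which is why grids are usually kept small.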
Parameters used in the machine learning models.
For the Logistic Regression algorithm, a model was created using L2 regularization to prevent overfitting, with a stopping criterion tolerance of 0.0001, which may result in longer training times but leads to a more precise model. The regularization parameter is kept at C = 1.0, with no further adjustment. An intercept is added to the model, and the 'liblinear' solver is used for optimizing the loss function. The maximum number of iterations is set to 100, a common choice that balances convergence and computational efficiency.
The Decision Tree model uses the Gini criterion to select, at each node, the split that minimizes the weighted impurity of the resulting subtrees, with the best-split strategy. The tree has no depth limitation, and the minimum number of samples required to split an internal node is 2. Additionally, each leaf must contain at least one sample. Finally, all available features are considered when searching for the best split.
The parameters for the Random Forest model are adjusted to achieve optimal performance and avoid overfitting. Employing multiple trees with different subsets of data, while limiting the depth and the number of samples required for splitting nodes, helps create a robust model that generalizes well to new data.
The Naive Bayes model is used with prior class probabilities determined automatically from the data, and it adds a stabilizing term to the feature variance calculations to avoid numerical instabilities. These settings enable the model to adapt to the data and provide stable and reliable results.
The parameters used for the Support Vector Machine (SVM) model include C, the regularization parameter that controls the balance between minimizing training error and maximizing the margin, and kernel, which determines the type of kernel function used in the model and is set to 'linear' in this case. The parameter gamma, specifying the kernel coefficient, is set to 'scale', meaning that gamma is calculated automatically. Degree refers to the degree of the polynomial kernel function and is set to 3, while the independent coefficient of the polynomial function is set to 0.0. Because a linear kernel has no kernel coefficient, degree, or independent term, these last three settings do not affect the fitted model; the behavior of the SVM during training and prediction is determined by C and the choice of kernel.
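For reference, the settings stated in the paragraphs above can be collected in one place. The keyword names below follow scikit-learn conventions, which is an assumption: the text does not name the library used. Mapping "optimal splitter" to `"best"` is likewise an interpretation. Random Forest and Naive Bayes are omitted because no specific values are given for them in the text, and no value is listed for the SVM parameter C for the same reason.

```python
# Settings described in the text, written with scikit-learn-style keyword
# names (an assumption; the source does not name its library).
MODEL_PARAMS = {
    "logistic_regression": {
        "penalty": "l2",
        "tol": 1e-4,
        "C": 1.0,
        "fit_intercept": True,
        "solver": "liblinear",
        "max_iter": 100,
    },
    "decision_tree": {
        "criterion": "gini",
        "splitter": "best",
        "max_depth": None,          # no depth limit
        "min_samples_split": 2,
        "min_samples_leaf": 1,
        "max_features": None,       # consider all features at each split
    },
    "svm": {
        "kernel": "linear",
        "gamma": "scale",           # has no effect with a linear kernel
        "degree": 3,                # has no effect with a linear kernel
        "coef0": 0.0,               # has no effect with a linear kernel
    },
}


def describe(model_name):
    """Return the settings of one model as a readable 'key=value' string."""
    params = MODEL_PARAMS[model_name]
    return ", ".join(f"{key}={value!r}" for key, value in params.items())
```

In a scikit-learn version that accepts these keywords, each dictionary could be unpacked into the corresponding estimator, for example `DecisionTreeClassifier(**MODEL_PARAMS["decision_tree"])`.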
To evaluate the performance of the algorithms, model accuracy was used as a key metric describing how well the model classifies or predicts data. However, accuracy alone is not sufficient, so it is analyzed in combination with other performance metrics: precision, sensitivity, and specificity. Precision is the proportion of correctly predicted positive outputs relative to the total number of predicted positive outputs. Sensitivity is the proportion of actual positive instances that the model correctly identified as positive. Specificity is the proportion of actual negative instances that the model correctly identified as negative. The combination of these metrics allows for a more comprehensive assessment of the performance of machine learning models and gives a clearer picture of their effectiveness in solving specific problems (Figure 2).

Validation performance metrics.
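The four metrics can be computed directly from the counts of true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN). The sketch below treats "pass" as the positive class. The counts are hypothetical, chosen only to sum to a 198-sample validation set; they are not the study's results.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute accuracy, precision, sensitivity, and specificity from counts."""

    def ratio(numerator, denominator):
        # Return 0.0 when a metric is undefined (empty denominator).
        return numerator / denominator if denominator else 0.0

    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "precision": ratio(tp, tp + fp),
        "sensitivity": ratio(tp, tp + fn),   # true positive rate
        "specificity": ratio(tn, tn + fp),   # true negative rate
    }


# Hypothetical counts: 178 passing and 20 failing devices.
metrics = classification_metrics(tp=176, fp=2, tn=18, fn=2)
```

In this made-up example accuracy is about 0.98 while specificity is only 0.90, which shows how a small number of misclassified failing devices can be hidden by a high overall accuracy.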
Based on Figure 2, it can be concluded that all algorithms exhibit high performance, but Decision Tree and Random Forest stand out as they demonstrate maximum accuracy, precision, sensitivity, and specificity. Logistic Regression shows the lowest performance, primarily due to its lower specificity value compared to others (0.87).
A confusion matrix is a tool for summarizing the performance of a classification algorithm. It shows how the algorithm behaves in terms of false positives and false negatives, which allows for a better understanding of its actual performance. Table 3 shows the information from the confusion matrix of each algorithm used, from which it is evident that the majority of samples are correctly classified. This indicates that the algorithms are proficient in identifying real instances of the positive and negative classes, i.e., devices that have passed and devices that have failed the performance inspection. However, the algorithms differ in their classification errors. For instance, Naive Bayes shows a slightly higher number of false negative instances than the other algorithms, implying that it misses some true positive cases. In contrast, Logistic Regression, Decision Tree, Random Forest, and Support Vector Machine exhibit minimal false positive and false negative cases.
Information from the confusion matrix of each algorithm.
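The four cells of a binary confusion matrix can be tallied from paired true and predicted labels, as in the following sketch. The label strings and the six example devices are hypothetical, and "pass" is taken as the positive class.

```python
def confusion_counts(y_true, y_pred, positive="pass"):
    """Count TP, FP, TN, and FN for a binary classification problem."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            # Predicted pass: correct if the device actually passed.
            counts["tp" if actual == positive else "fp"] += 1
        else:
            # Predicted fail: a miss if the device actually passed.
            counts["fn" if actual == positive else "tn"] += 1
    return counts


# Hypothetical labels for six devices.
y_true = ["pass", "pass", "fail", "pass", "fail", "pass"]
y_pred = ["pass", "fail", "fail", "pass", "pass", "pass"]
counts = confusion_counts(y_true, y_pred)
```

Here a false negative is a passing device predicted to fail, and a false positive is a failing device predicted to pass; the latter is the error that matters most when the goal is to catch faulty pumps.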
This research has several limitations that need to be considered. One limitation is that the machine learning models were trained only on data from infusion pumps used in healthcare facilities in Bosnia and Herzegovina. Expanding the database to include data on infusion pumps used in other countries would make the achieved results more informative. Additionally, the sample may be biased towards one class, as the dataset contains significantly more infusion pumps that passed inspection than pumps that failed (see Table 1), which could influence the evaluation results of the algorithms.
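The effect of class imbalance on evaluation can be made concrete with a majority-class baseline, that is, a trivial model that always predicts the most frequent class. The sketch below uses hypothetical labels with a 90% pass rate, close to the overall rate reported in Table 1; it is an illustration of the concern, not an analysis of the study's data.

```python
def majority_baseline(labels, positive="pass"):
    """Metrics of a trivial model that always predicts the majority class.

    Assumes both classes occur at least once in `labels`.
    """
    n_positive = sum(1 for label in labels if label == positive)
    n_negative = len(labels) - n_positive
    predicts_positive = n_positive >= n_negative
    return {
        "accuracy": max(n_positive, n_negative) / len(labels),
        # Always predicting "pass" finds every pass but no failure.
        "sensitivity": 1.0 if predicts_positive else 0.0,
        "specificity": 0.0 if predicts_positive else 1.0,
    }


# Hypothetical labels: 90 passing devices and 10 failing devices.
labels = ["pass"] * 90 + ["fail"] * 10
baseline = majority_baseline(labels)
```

Such a baseline reaches 90% accuracy while detecting no failing device at all, which is why specificity is the more telling metric on a dataset dominated by passing pumps.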
For the problem of binary classification presented in this study, both feedforward and feedback (recurrent) artificial neural networks can also be utilized, as shown in a previous study. 12 Those results show that a feedforward neural network architecture with 10 neurons in a single hidden layer exhibits the best performance, with an overall accuracy of 98.83% for predicting the performance of infusion pumps. The recurrent neural network achieved an accuracy of 98.41%. This approach therefore yields promising results, although it is more complex than the machine learning models presented here. The final choice of algorithm should be based on a comprehensive evaluation of various factors, considering the specific needs of the classification problem and practical aspects such as interpretability, computational efficiency, and the ability to handle different types of data.
This study highlights the importance of analyzing the performance of infusion pumps to predict future device effectiveness and potential failures, which would ensure the provision of high-quality healthcare services. To achieve this, several machine learning methods were applied to real data from infusion pumps in Bosnia and Herzegovina collected over a period of 7 years. Through a detailed analysis of the results obtained, it was found that all applied machine learning methods produced acceptable results, with accuracy ranging from 98% to 100%, precision from 99% to 100%, sensitivity from 98% to 100%, and specificity from 87% to 100%. However, Decision Tree and Random Forest methods proved to be the best, both due to achieving maximum values of accuracy, precision, sensitivity, and specificity, and due to the interpretability of the results.
This establishes that machine learning methods are capable of identifying potential issues before they become critical, thus playing a crucial role in predicting the performance of infusion pumps, potentially improving the safety, reliability, and efficiency of healthcare delivery. In further research, these methods could be tested on a larger and more diverse dataset to provide additional insights and confirmation of the results presented here. Additionally, it would be beneficial to explore the results achieved by applying machine learning methods to the data involving a larger number of infusion pump manufacturers. Continued research is necessary to explore the potential application of machine learning algorithms in various healthcare domains and to address practical questions related to the implementation of these algorithms in real clinical settings.
