Abstract
Background
Dialysis is a complex treatment received by around 3 million people annually. In around 10% of the reported deaths that occurred in the presence of a dialysis machine, the cause was a technical error of the dialysis device. Machine learning and predictive maintenance offer one way to maintain dialysis devices so as to reduce the risk of patient death, lower the cost of repairs and provide higher-quality treatment.
Objective
Prediction of dialysis machine performance status and errors using regression models.
Method
The methodology includes seven steps: data collection, processing, model selection, training, evaluation, fine-tuning, and prediction. After preprocessing 1034 measurements, twelve machine learning models were trained to predict dialysis machine performance, and temperature and conductivity error values.
Results
Each model was trained 100 times on different splits of the dataset (80% training, 10% testing, 10% evaluation). Logistic regression achieved the highest accuracy in predicting dialysis machine performance. For temperature predictions, Lasso regression had the lowest MSE on training data (0.0058), while Linear regression showed the highest R² (0.59). For conductivity predictions, Lasso regression provided the lowest MSE (0.134), with Decision tree achieving the highest R² (0.2036). SVM attained the lowest MSE on testing dataset, with 0.0055 for temperature and 0.1369 for conductivity.
Conclusion
The results of this study demonstrate that clinical engineering (CE) and health technology management (HTM) departments in healthcare institutions can benefit from proposed automated systems for advanced management of dialysis machines.
Keywords
Introduction
Medical devices often have a very complex design. Because of this complexity, they can contribute to medical errors. Device-related errors may cause several problems, especially in a field as critical as medicine. One in ten patients is harmed every year as a result of such medical errors. Medical errors can also result in high financial costs. 1 According to an FDA report, early dialysis devices were prone to technical problems, which led to the development and application of international safety standards. Those safety standards are nowadays implemented by all medical device manufacturers. 2 Medical devices can be classified based on the potential risk to which a patient is exposed when such devices are used. The European Union classifies dialysis devices as Class IIa, 3 which indicates that dialysis devices carry a moderate to high risk and require special controls. Dialysis machines are also classified as measuring devices, since they contain sensors that measure and process different quantities and have a crucial role in the overall device quality.4–6
Dialysis is the most widely used and recognized way of treating patients with renal disease. During dialysis, the patient's blood is passed through an artificial kidney, a hemodialyzer. While the blood flows on one side of the hemodialyzer, an electrolyte solution is continuously produced on the other side of the device, which is used to normalize the abnormalities in the patient's blood. This process includes the removal of impurities and excess water. Nowadays, this process is performed by commercially produced equipment made by several medical equipment producers, including Fresenius, Gambro, and B. Braun. 2 Globally, around 3 million people receive dialysis treatment. 7 Around 370,000 people in the US receive dialysis treatment annually, which costs about $11.1 billion. 8 Renal disease is considered to be end-stage when the kidneys work at less than 10%–15% of normal capacity, 9 and the use of dialysis devices is then necessary for keeping the patient alive. 10 Around 50,000 people die each year due to kidney failure. 8 The MDR database reports 47 deaths in the presence of a dialysis machine from 1992 to 1996. In 10% of these cases, the death was reported to have been caused by a technical error of the dialysis device. 11 Even though current dialysis devices are life sustaining, they cannot match the functionality of a healthy kidney. Because of this inadequacy, conventional dialysis treatments can result in poor clinical outcomes, high mortality, and a low quality of life for patients. Owing to advancements in technology, new ways of treating kidney diseases are under development. New devices that challenge the current dialysis treatment paradigms are smaller, lighter, and usable outside of clinics. 7
It is crucial to properly maintain dialysis devices in order to provide quality care, reduce costs and prevent problems. There are various maintenance techniques, which can be classified as corrective, preventive or predictive maintenance.12–17 Predictive maintenance focuses on predicting how the device will function based on data collected in real time.18–20 In recent years, data-driven approaches have advanced rapidly. These approaches use machine learning to support decisions based on the provided data. This can provide benefits such as reducing maintenance costs and repair stops, improving user safety, and increasing the production and life span of the device. Despite these potential benefits, this technology is still in an early stage of adoption in healthcare.
This paper discusses the use of machine learning models to predict the level of error of various dialysis devices, in order to test whether a device adheres to the international safety standards. Since our goal is to determine the error value, which is a real number, we use regression models. To achieve this goal, the paper proposes the following three regression models: Linear regression, Random Forest regression and Lasso regression. We compare how accurately each model predicted the actual errors using the given dataset.
Materials and methods
The complete process of developing the machine learning models in order to predict the measurement errors of dialysis devices is represented in the seven steps as follows:
Data collection - The data is being collected from one or more sources. The quality, as well as the quantity of the data is very important for the final result and performance of the model.
Data preparation - It is important that data is in the correct format, well balanced, standardized and additionally processed in order to ensure the highest performances of the model.
Model selection - Model is selected based on the goal that we are trying to accomplish.
Model training - The processed data is being loaded into the model, which needs to find a relationship between the input data and the output of the model.
Model evaluation - After the model is done training, we need to test its performance on the data that was not previously loaded into the model. This ensures that the model will be able to work with real data.
Fine-tuning the model - In a lot of cases we can further improve the performance of the model by changing some of its hyperparameters.
Prediction - After the model is done training and we are satisfied with the results it achieved, we can use the model to predict values using real data.
The researchers were provided with an Excel spreadsheet file containing measurements of different dialysis devices. The dataset contains 1034 dialysis device measurements obtained from healthcare institutions in Bosnia and Herzegovina over a 7-year period, using the etalon (reference measurement standard) MessaLabs 90 XL. Each measurement contains the following data:
Manufacturer of the device. There are a total of four manufacturers in the dataset - Fresenius, B. Braun, Nipro and Baxter.
Device model name. There are a total of five different device models in the dataset - 5008, 4008, Dialog, Surdial and Artis Physio Plus.
A field which indicates whether the device passed the verification.
External inspection 1–4.
Measured value, expected value, their difference (error) and the allowed size of error. If the error value is greater than the allowed size, the measurement is marked as not aligned; otherwise it is marked as aligned. These values are provided for both temperature and conductivity measurements.
A field that contains additional notes, if provided.
Out of 1034 measurements, there were:
23 (2.22%) measurements that did not pass the verification.
12 (1.16%) measurements that did not pass the first external examination, 17 (1.64%) measurements that did not pass the second external examination, 12 (1.16%) measurements that did not pass the third external examination and 17 (1.64%) measurements that did not pass the fourth external examination.
2 (0.19%) measurements that were not aligned in terms of temperature, and 4 measurements that were not aligned in terms of conductivity.
2 (0.19%) measurements where temperature was aligned and conductivity was not, 0 measurements where conductivity was aligned and temperature was not, 1030 cases where both were aligned and 2 cases where neither temperature nor conductivity was aligned.
1011 (97.77%) measurements that passed verification; these had also passed all external examinations, and both their temperature and conductivity values were aligned.
4 (0.38%) measurements that did not pass verification because either temperature or conductivity was not aligned.
19 (1.83%) measurements that did not pass verification because they did not pass at least one external check.
22 (2.12%) rows that are labeled as ‘not measured’.
Only the measurements that passed all examination checks and had both temperature and conductivity aligned passed the verification. Conversely, the measurements where at least one examination check failed, or where the temperature or conductivity value was not aligned, did not pass the verification.
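The alignment and verification rules described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class, field and function names are hypothetical and do not come from the original spreadsheet, and the error is assumed to be the absolute difference between the measured and the expected value.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Reading:
    """One temperature or conductivity reading (hypothetical field names)."""

    measured: float
    expected: float
    allowed_error: float  # maximum permissible absolute error

    @property
    def error(self) -> float:
        # Assumption: the error is the absolute measured-expected difference.
        return abs(self.measured - self.expected)

    @property
    def aligned(self) -> bool:
        # A reading is aligned when its error does not exceed the allowed size.
        return self.error <= self.allowed_error


def passes_verification(
    inspections: Sequence[bool], temperature: Reading, conductivity: Reading
) -> bool:
    """Pass only if every external inspection passed and both readings are aligned."""
    return all(inspections) and temperature.aligned and conductivity.aligned


# Illustrative values only, not rows from the dataset.
ok = passes_verification(
    [True, True, True, True],
    Reading(measured=37.2, expected=37.0, allowed_error=0.3),
    Reading(measured=14.5, expected=14.0, allowed_error=1.5),
)
```

A single failed inspection or a single non-aligned reading is enough to make `passes_verification` return `False`, which mirrors the rule stated in the text.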
For all of the measurements, the maximum permissible error is ±0.3°C, and the allowed conductivity error is ±1.5 mS/mm. For temperature measurements the mean error is 0.165°C, standard deviation is 0.155°C, minimal error recorded is 0.00°C and maximal error recorded is 4.22°C. For conductivity measurements the mean error is 0.658 mS/mm, standard deviation is 1.451 mS/mm, minimal error recorded is 0.00 mS/mm and maximal error recorded is 44.54 mS/mm.
The two temperature measurement errors that exceeded the allowed error value (±0.3°C) were 0.8°C and 4.22°C. The three conductivity measurement errors that exceeded the allowed error value (±1.5 mS/mm) were 5.7 mS/mm, 2.44 mS/mm and 44.54 mS/mm. The following figure shows a histogram of all temperature and conductivity measurement errors that are smaller than the allowed value.

Histograms of measurement errors.
2.1. Data preprocessing
Before the data is loaded into the machine learning model, it should be properly preprocessed. For convenient data manipulation, the provided Excel spreadsheet file was converted to comma-separated values (CSV) format. The resulting CSV file contains 1034 rows and 21 columns. Columns that contained the same value for each record were dropped, since they do not contribute to the final prediction. For example, the column that indicates the type of measurement (e.g., Temperature) and the column with its allowed error value are the same for all of the rows. The columns that contain Yes or No values were converted to 1 and 0. The rows that had missing measurements were dropped, and the remaining values were converted to numeric format. We renamed the devices’ manufacturer and model names to ensure consistency. For example, devices labeled with “5008S” and “5008 S” would be considered different devices if not properly renamed, even though both labels refer to the same model. Manufacturer name and device model name were then converted to numeric values using Label encoder, which labels each unique value in a column with a different number.
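The preprocessing steps described above could look roughly like the following pandas and scikit-learn sketch. The column names, the name-normalization rule (removing whitespace and upper-casing) and the function signature are assumptions made for illustration; the original column layout is not given in the text.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder


def preprocess(df, measurement_cols, category_cols):
    """Clean a raw measurement table (hypothetical column names)."""
    df = df.copy()
    # Treat spelling variants such as "5008S" and "5008 S" as one label.
    for col in category_cols:
        df[col] = (
            df[col].astype(str).str.replace(r"\s+", "", regex=True).str.upper()
        )
    # Non-numeric entries (for example "not measured") become NaN ...
    df[measurement_cols] = df[measurement_cols].apply(pd.to_numeric, errors="coerce")
    # ... and rows with missing measurements are dropped.
    df = df.dropna(subset=measurement_cols)
    # Convert Yes/No columns to 1/0.
    for col in df.columns:
        if df[col].isin(["Yes", "No"]).all():
            df[col] = df[col].map({"Yes": 1, "No": 0})
    # Drop columns that hold the same value in every remaining row.
    df = df.loc[:, df.nunique() > 1]
    # Replace each remaining category with an integer label.
    for col in category_cols:
        if col in df.columns:
            df[col] = LabelEncoder().fit_transform(df[col])
    return df.reset_index(drop=True)


# Tiny made-up table, not taken from the dataset.
raw = pd.DataFrame(
    {
        "model": ["5008S", "5008 S", "Dialog", "Dialog"],
        "passed": ["Yes", "Yes", "No", "Yes"],
        "kind": ["Temperature"] * 4,
        "temp_error": ["0.1", "0.2", "0.8", "not measured"],
    }
)
clean = preprocess(raw, measurement_cols=["temp_error"], category_cols=["model"])
```

Label encoding assigns arbitrary integers to categories, so models that treat inputs as ordered numbers may read an ordering into them that does not exist; one-hot encoding is a common alternative when that matters.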
The goal is to predict whether the device will pass the verification or not. Provided that the device has passed all of the external examinations, the verification status depends on the measurement errors of temperature and conductivity. The model should therefore be able to predict the error value of each measurement. Since this value is numeric and continuous, we need to use a regression model.
For the purpose of our research, 10 machine learning models in total were trained. The models are as follows:
Decision tree model to predict dialysis machine performance status
Random forest model to predict dialysis machine performance status
Support vector machine model to predict dialysis machine performance status
Logistic regression model to predict dialysis machine performance status
Linear regression model to predict temperature error
Linear regression model to predict conductivity error
Random forest model to predict temperature error
Random forest model to predict conductivity error
Lasso regression model to predict temperature error
Lasso regression model to predict conductivity error
In order to predict the performance status of the dialysis device, all measurements of temperature and conductivity as well as visual inspection status were taken into account. In order to predict the temperature error we have used the device model, expected value of temperature, expected value of conductivity, measured value of conductivity and error of conductivity measurement. Similarly, in order to predict the conductivity error we have used the device model, expected value of conductivity, expected value of temperature, measured value of temperature and error of temperature measurement.
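As a rough sketch of how the three regression models could be fitted on the five predictors listed for the temperature error, the following example uses scikit-learn. The data is a synthetic stand-in generated at random, and the hyperparameters (`n_estimators`, `alpha`) are assumptions chosen for illustration, not the settings used in the study.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso, LinearRegression


def make_regressors(seed=0):
    """Return the three regressor types named in the text (assumed settings)."""
    return {
        "linear": LinearRegression(),
        "random_forest": RandomForestRegressor(n_estimators=100, random_state=seed),
        "lasso": Lasso(alpha=0.01),
    }


# Synthetic stand-in for the five predictors of the temperature error:
# device model, expected temperature, expected conductivity,
# measured conductivity and conductivity error.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = np.abs(0.1 * X[:, 4] + rng.normal(scale=0.05, size=200))

# Fit each model and predict the error for the first three rows.
fitted = {name: model.fit(X, y) for name, model in make_regressors().items()}
predictions = {name: model.predict(X[:3]) for name, model in fitted.items()}
```

The same construction applies to the conductivity error by swapping the roles of the temperature and conductivity columns.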
The parameter settings for each algorithm were chosen from randomized values that could potentially achieve the desired results. These parameters are crucial in determining the accuracy and performance of the model, and their selection requires careful consideration. The confusion matrices in Tables 1–4 show the results achieved with the selected machine learning algorithms. The results achieved across all four implemented algorithms are presented in Table 5, which reports accuracy, precision and sensitivity.
Performance assessment of decision tree algorithm
Performance assessment of random forest algorithm
Performance assessment of support vector machine algorithm
Performance assessment of logistic regression algorithm
Performance comparison across all predictive models
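Accuracy, precision and sensitivity follow directly from the four cells of a binary confusion matrix. The helper below is a minimal sketch of those standard definitions; the function name and the example counts are illustrative and are not values from the tables in this paper.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision and sensitivity from confusion-matrix counts.

    tp/fp/fn/tn are the numbers of true positives, false positives,
    false negatives and true negatives.
    """
    total = tp + fp + fn + tn
    # Share of all predictions that were correct.
    accuracy = (tp + tn) / total if total else 0.0
    # Share of predicted positives that were actually positive.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Share of actual positives that were found (also called recall).
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    return {"accuracy": accuracy, "precision": precision, "sensitivity": sensitivity}


# Made-up counts for illustration only.
example = classification_metrics(tp=8, fp=2, fn=0, tn=10)
```

On a dataset where almost all devices pass verification, accuracy alone can look high even for a model that rarely detects failing devices, which is why precision and sensitivity are worth reading alongside it.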
As can be inferred from Tables 1–5, the logistic regression model achieved the highest performance in predicting dialysis machine performance status. As a further step, the remaining six machine learning models were used to predict the temperature and conductivity errors of dialysis machines on the basis of their previous performance.
Each model was trained 100 times. For each run, both datasets were shuffled and split into training, test and evaluation (validation) data. We allocated 80% of the data to the training dataset and 10% each to the test and validation datasets. The results achieved are presented in Table 6 below.
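One way to implement a repeated, shuffled 80/10/10 split is to call scikit-learn's `train_test_split` twice per run, as in the sketch below. The use of linear regression, the synthetic data and the choice of the run index as the random seed are assumptions for illustration; the study's actual splitting code is not given in the text.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split


def repeated_evaluation(X, y, n_repeats=100):
    """Average validation and test MSE over repeated shuffled 80/10/10 splits."""
    val_mse, test_mse = [], []
    for seed in range(n_repeats):
        # First split: 80% training, 20% held out.
        X_train, X_rest, y_train, y_rest = train_test_split(
            X, y, test_size=0.2, shuffle=True, random_state=seed
        )
        # Second split: halve the held-out part into validation and test.
        X_val, X_test, y_val, y_test = train_test_split(
            X_rest, y_rest, test_size=0.5, shuffle=True, random_state=seed
        )
        model = LinearRegression().fit(X_train, y_train)
        val_mse.append(mean_squared_error(y_val, model.predict(X_val)))
        test_mse.append(mean_squared_error(y_test, model.predict(X_test)))
    return float(np.mean(val_mse)), float(np.mean(test_mse))


# Synthetic stand-in data, not the dialysis dataset.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = X @ np.array([0.1, 0.0, 0.05, 0.0, 0.2]) + rng.normal(scale=0.05, size=300)
val_mse, test_mse = repeated_evaluation(X, y)
```

Averaging over many random splits reduces the influence of any single lucky or unlucky partition, which matters when only a handful of rows carry large errors.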
Model performance during training
Both MSE and RMSE summarize the size of the error that the model makes when making predictions: MSE is the mean of the squared errors, and RMSE is its square root. Hence, the aim of the development phase is to make both as low as possible. As mentioned previously, 100 iterations were run for each model in order to accommodate possible variations in hyperparameters as well as the computational infrastructure that may influence the model's performance. Lasso regression achieved the lowest MSE and RMSE on training data in both conductivity and temperature error prediction, while SVM regression achieved the lowest MSE and RMSE on validation data in both conductivity and temperature error prediction.
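For reference, MSE and RMSE, together with the R² score that is also reported for the models, can be computed from the true and predicted values as in the following NumPy sketch. The function name is hypothetical and the numbers are made up; the formulas are the standard definitions.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Return (MSE, RMSE, R²) for arrays of true and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    # Mean squared error and its square root.
    mse = float(np.mean(residuals**2))
    rmse = float(np.sqrt(mse))
    # R² compares the model's squared error with that of always
    # predicting the mean of the true values.
    ss_res = float(np.sum(residuals**2))
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return mse, rmse, r2


# Made-up values for illustration.
perfect = regression_metrics([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
mean_only = regression_metrics([1.0, 2.0, 3.0], [2.0, 2.0, 2.0])
reversed_order = regression_metrics([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
```

Predicting the mean for every sample gives an R² of 0, and predictions that are worse than the mean give a negative R², so low or negative scores on held-out data indicate that a model explains little of the variation in the error.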
R² is the coefficient of determination: it indicates how much of the variation in the target outcome is explained by the result predicted by the algorithm. It is usually between 0 and 1, where closeness to 1 indicates a better fit, although on held-out data it can also be negative when the model predicts worse than the mean of the target. Linear regression performed much better in terms of R² score than the other two models at temperature error prediction, while Random Forest achieved a higher R² score at predicting conductivity measurement errors. The following figures represent the RMSE value and R² score of both training and validation datasets during all 100 times the model was trained using Linear regression model.

RMSE value of lasso regression model.

R² value of lasso regression model.
As can be seen from the figures, both RMSE and R2 values varied significantly over the 100 iterations of the development phase, and no convergence can be observed between the training and validation datasets. Because of the nature of the dataset, especially the relatively small number of devices that were not aligned, none of the models showed a satisfactory performance in predicting the value of an error, and the proposed performance metrics could be considered misleading. To expand on this research, future researchers could use different performance metrics as well as a more balanced dataset.
The paper aimed to show how well the error in the temperature and conductivity measurements of dialysis devices can be predicted. This study shows that the errors that occur in the measurements of dialysis devices cannot easily be estimated with high accuracy from the provided data. This suggests that the error is largely unrelated to parameters such as the device model and the measurements of other values.
One of the main problems that this paper highlights is the trustworthiness of AI, especially in fields such as healthcare, where accurate and trustworthy data is very important. The proposed models did not manage to explain why large measurement errors occur in dialysis devices. This problem could be explored further by applying the techniques of Explainable AI (XAI), which aims to provide understandable reasoning for why models come to their conclusions.
Footnotes
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
