Abstract
Acute Kidney Injury (AKI) is a disease that affects the kidneys and is characterized by the rapid deterioration of these organs, usually associated with a pre-existing critical illness. Because it is an acute disease, time is a key element in its prevention. By anticipating a patient’s state transition, we can prevent future complications in the patient’s health, such as the development of a chronic disease or the loss of an organ, while also reducing the amount of money spent on the patient’s care.
The main goal of this paper is to address the problem of correctly predicting the illness path of different patients, by studying existing methodologies for predicting this disease and proposing new approaches aimed at improving the performance of the classification.
Through the comparison of five different approaches (Markov Chain Model ICU Specialists, Markov Chain Model Features, Markov Chain Model Conditional Features, Markov Chain Model and Random Forest), we came to the conclusion that the application of conditional probabilities to this problem produces a more accurate prediction, based on common inputs.
Keywords
Introduction
Acute Kidney Injury (AKI) is a disease that affects the kidneys, characterized by the rapid deterioration of these organs and usually associated with a pre-existing critical illness [1]. According to studies, around two million people worldwide die of this disease every year [2], and about 30% [3] of these deaths could be avoided if the disease were discovered in time.
Because it is an acute disease, time is a key element in its prevention. By anticipating a patient’s state transition, we can prevent future complications in the patient’s health, such as the development of a chronic disease or the loss of an organ, while also reducing the amount of money spent on the patient’s care.
Similarly to other diseases, we can look at AKI as a multiclass classification problem, where the goal is not to predict the current class but the class the patient will transition to next.
The purpose of this paper is to address the problem of correctly predicting the illness path of different patients, by studying existing methodologies for predicting this disease and proposing new and distinct approaches aimed at improving the classification.
One possible approach is to apply data mining techniques, in which patients’ records are analyzed and used to predict the disease progression. The patient’s evolution is related to the comorbidities and consequently to the type of Intensive Care Unit (ICU) where the patient is treated [4]. For instance, a patient with a surgical heart problem will be treated at a Cardiac Surgery Recovery ICU. It is expected that the disease’s evolution will differ according to the patient’s current state and the hospitalization unit. Given this, we can apply these data mining techniques to each ICU to maximize the accuracy of the prediction.
To diagnose a disease, all the information about a patient is relevant to health professionals but, depending on the disease, some attributes may be more relevant than others. For instance, in acute kidney injury, analyses related to the kidneys are far more important than others such as temperature, although the latter are still important as part of the diagnosis. Besides this, depending on the disease, knowing the previous state of the disease, or whether the patient already had the disease in the previous analysis, is very important to the current diagnosis and treatment.
Another possible approach therefore involves the engineering of new features that can not only store more information about the next state but also condition the algorithm towards the correct diagnosis of the disease. These features are created using a probabilistic algorithm (Markov Chain Model), which is used to classify the data. The resulting probabilities are then added to the previous dataset as probabilistic conditional features, where the information about the possible state of the disease is stored. Besides attempting to improve the prediction of this disease, this approach also attempts to mitigate the known problem of short-sightedness or myopia in algorithms [5] by creating intermediate-level attributes.
This paper is organized in four sections: Section 2 describes the related work in both the AKI disease and feature engineering. Section 3 describes the AKI disease, the criteria used to classify it, the dataset used and the distribution of the data. Section 4 describes the approaches implemented and Section 5 the results obtained on the tests.
Related work
Acute kidney injury is a very difficult disease to diagnose, being characterized by drastic changes in the patient’s state that cannot always be detected immediately through clinical analysis. This has aroused the interest of researchers in recent years, such as Cruz et al. [6], who in 2013 applied Bayesian Networks to the AKI problem. The authors used Weka and GeNIe as development tools to implement this algorithm and the MIMIC II database, from PhysioNet, as data. With these tools, they created two distinct models and compared them. The first model (first iteration model) was created with features considered relevant in the literature. The second model (second iteration model) was created based on information considered relevant by nephrologists.
In 2016, Kate et al. [7] studied different methods (Logistic Regression, Support Vector Machines, Decision Trees and Naive Bayes) as well as ensemble methods of these algorithms in a population of hospitalized older adults (over 60 years old). The objective of this study was to find a way to detect if a patient is going to develop AKI and also detect if a patient already has it. In this paper, the authors discuss the advantages and disadvantages of the chosen algorithms as well as the ensemble methods (emphasizing that the combination of methods increases the performance in relation to the use of a single algorithm). Finally, they discuss the differences between the algorithms and the fact that they can be used in different situations, namely in clinical decision support tools to prevent the disease’s progression or in systems for disease management in a hospital.
Moving away from the AKI problem and concentrating on existing methodologies, there are already several methods that combine well-known techniques with the aim of obtaining more accurate and readable models. Part of our work is based on Gama and Brazdil’s Cascade Generalization framework [8]. The method extends the primitive language representation by coupling classifiers iteratively. In the first iteration, the predictions of the classifier used extend the original input space (dataset), and in each of the following iterations the new classifiers extend the space representation from the previous iteration. In the last iteration, a classifier model is built using the representational language that was extended iteratively, incorporating the bias of each classifier.
One critical issue when searching for new descriptors is the way we build them. Zheng [9] provides an overview on building new descriptors by combining primitive attributes using boolean operators. The most used operators are conjunction, disjunction, negation, M-of-N and X-of-N.
Bagallo and Haussler [10] propose three algorithms that build new descriptors in Boolean domains using logical conjunction and negation. Other techniques exist to build new attributes, but some issues remain open. Are all new attributes interesting and relevant? How can we select them?
Yu and Liu [11] separate feature selection methods into two main categories: wrapper and filter. The wrapper method uses the predictions of a learning algorithm to select features, in a process that can be iterative, like the Fringe algorithm [10]. The filter method selects features independently of any classifier [12]. The authors propose FCBF, an algorithm that uses the filter methodology. They also create Symmetrical Uncertainty, a symmetrical non-linear heuristic, to rank features by measuring the correlation between each feature and the class.
Joining these two themes (feature engineering and medicine), several authors have already developed ways to create new features to help solve problems in this field. For example, Garla and Brandt study, in their paper “Ontology-guided feature engineering for clinical text classification” [13], the creation of new features for clinical text classification. In this article, the authors use semantic similarity to group similar concepts, which are later united in a single feature.
In addition to these authors, others have also worked in this area, such as Xu et al. [14], who combined feature engineering, machine learning and rule-based methods to retrieve relevant information from clinical discharge summaries. In this paper they discuss the relevance of extracting information from these summaries, making the information more structured and easier to access. Similarly to the previous paper, these authors also worked with text mining in the clinical field.
Acute kidney injury
Acute kidney injury, or AKI, is a disease that affects the kidneys and is described as a rapid decrease of kidney function, specifically in the elimination of the toxins produced by the organism [1]. Depending on the progression and severity of the disease, a patient can make a full recovery, develop a chronic disease (with or without the need for a kidney transplant), or even die [4]. The best way to prevent any permanent damage to the kidneys is to prevent the progression to more severe stages. Given its severity, this disease is mostly seen in ICUs as a complication of a pre-existing disease.
To classify patients by severity of their state and act accordingly, several criteria can be applied. Currently, the most commonly used is the RIFLE criteria [4], since it has been shown to have the best results in several types of patients [15].
These criteria are divided into levels and outcomes (Fig. 1). The levels represent the stages that patients with this disease can be at (risk, injury and failure). The outcomes represent the patient’s future if the disease persists: kidney loss in four weeks or end stage kidney disease (ESKD) in three months.
Fig. 1. RIFLE criteria [16].
The diagnosis with the RIFLE criteria can be made through two different methods: by combining serum creatinine and the Glomerular Filtration Rate (GFR), through measuring the variance between the patient’s normal values and the current values, or by measuring the urine output through a tabulated period of time.
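As an illustration of the creatinine-based method, the sketch below assigns a RIFLE level from the ratio between the current and the normal (baseline) serum creatinine. Since Fig. 1 is not reproduced in this text, the thresholds (1.5, 2 and 3 times the baseline) are taken from the published RIFLE definition; the function name `rifle_stage` is hypothetical, and the GFR, urine output and absolute creatinine rules are deliberately left out.

```python
def rifle_stage(creatinine: float, baseline_creatinine: float) -> str:
    """Return a RIFLE level using only the serum creatinine ratio.

    Simplification: the GFR-decrease and urine-output criteria are ignored.
    """
    if baseline_creatinine <= 0:
        raise ValueError("baseline creatinine must be positive")
    ratio = creatinine / baseline_creatinine
    if ratio >= 3.0:   # creatinine at least tripled
        return "Failure"
    if ratio >= 2.0:   # creatinine at least doubled
        return "Injury"
    if ratio >= 1.5:   # creatinine increased by at least 50%
        return "Risk"
    return "Normal"
```

A full implementation would evaluate each available criterion and keep the most severe level.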
In recent studies, some researchers tried to link this disease with genetics [17, 18]. In these studies, it has been discovered that there may be a predisposition to the disease in people who present gene polymorphism (“the occurrence together in the same population of two or more genetically determined phenotypes in such proportions that the rarest of them cannot be maintained merely by recurrent mutation”).
To solve this problem, we used data from the challenge “Predicting Mortality of ICU Patients: the PhysioNet/Computing in Cardiology Challenge 2012”. The data include the following attributes:
Patient identifier; Weight; Measurement time; Height; Age; Albumin; Alkaline phosphatase (ALT); Aspartate transaminase (AST); Bilirubin; Blood urea nitrogen (BUN); Cholesterol; Creatinine; Fractional inspired oxygen (FiO2); Glasgow coma scale (GCS); Glucose; Bicarbonate (HCO3); Hematocrit (HCT); Heart rate (HR); Potassium (K); Lactic acid; Magnesium (Mg); Mean blood pressure (invasive and non-invasive) (MAP, NIMAP); Mechanical ventilation; Sodium (Na); Arterial blood gas (PaCO2, PaO2, SaO2); Troponin (Tropl, Tropt); Urine; White blood cell count (WBC); Temperature; Blood pressure (invasive and non-invasive diastolic blood pressure, and invasive and non-invasive systolic blood pressure) (DiasABP, NIDiasABP, NISysABP, SysABP); Hemoglobin saturation; Platelets; Arterial pH; Breath rate.
Using this information as the base, a dataset was created. This dataset has only data from patients with at least one creatinine measurement. Besides the information from the original dataset, other attributes specific to the AKI problem were computed from it (glomerular filtration rate and initial creatinine).
One important measure to correctly diagnose AKI and to predict the disease stage is the GFR, which is computed with Eq. (1). Combined with the creatinine, this is the basis of the diagnosis, according to the RIFLE criteria.
In this equation, if the patient is male, the 0.742 is replaced by 1. Another important calculated value is the initial creatinine measure. This value is required because, according to the RIFLE criteria, the AKI stage is obtained through the difference between the normal creatinine (or GFR) and the measured creatinine (or GFR). Despite being the standard to diagnose AKI, the RIFLE criteria have the limitation of requiring the normal creatinine value, which is often not available upon patient admission. If the real value is not available, it can be calculated. In this case, a normal GFR (75 mL/min/1.73 m²) [4] is attributed to the patient and, by using a derivation of the GFR equation (see Eq. (2)), the normal creatinine value is extracted:
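Eqs. (1) and (2) are not reproduced in this text. The sketch below assumes that Eq. (1) is the four-variable MDRD formula without the race term, GFR = 186 × creatinine^(-1.154) × age^(-0.203) × 0.742 (if female), which is consistent with the 0.742 factor mentioned above. The coefficient 186, the two exponents and the function names are assumptions of this sketch, not values confirmed by the text. The second function inverts the same formula to recover a normal creatinine from an assumed normal GFR of 75.

```python
def mdrd_gfr(creatinine_mg_dl: float, age_years: float, female: bool,
             coefficient: float = 186.0) -> float:
    """Estimated GFR under the assumed four-variable MDRD formula."""
    sex_factor = 0.742 if female else 1.0  # 0.742 is replaced by 1 for male patients
    return coefficient * creatinine_mg_dl ** -1.154 * age_years ** -0.203 * sex_factor


def baseline_creatinine(age_years: float, female: bool,
                        normal_gfr: float = 75.0,
                        coefficient: float = 186.0) -> float:
    """Solve the formula above for creatinine, given an assumed normal GFR."""
    sex_factor = 0.742 if female else 1.0
    return (normal_gfr / (coefficient * age_years ** -0.203 * sex_factor)) ** (-1.0 / 1.154)


# Round trip: the back-calculated creatinine reproduces the assumed normal GFR.
estimated = baseline_creatinine(age_years=60, female=True)
round_trip_gfr = mdrd_gfr(estimated, age_years=60, female=True)
```

Keeping the coefficient as a parameter makes it easy to switch to a different calibration of the formula if the original equation used one.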
As stated previously, the dataset used is a modified version of the PhysioNet dataset, more focused on AKI. The new dataset is composed of data from 6558 different patients, with an average of 1.19 different stages per patient. Each patient has a set of measurements similar to Table 1. Since the patients are in ICUs and are therefore frequently monitored, the time interval between measurements is regular.
Table 1. Patient’s data example
By analyzing Fig. 2, which represents the distribution of the RIFLE criteria stages in the dataset, we can verify that 66% of the patients do not show any symptoms of the disease, 14% are at risk, 9% may develop an injury and 11% may have kidney failure.
Fig. 2. Distribution of AKI stages.
The distribution of the progression of the disease is mostly stationary, that is, 65% of the patients in a given stage (Normal, Risk, Injury or Failure) tend to maintain it during their stay in the ICU, as shown in Fig. 3.
Fig. 3. Distribution of the disease progression.
In this dataset there are several dependencies between the variables, as we can see in Tables 10–15 (in the Appendix). These dependencies were calculated by applying the Spearman Correlation Coefficient [19].
These tables show several dependencies, some of them strong, like the strong negative relationship between GFR and Creatinine, and others weak, such as the one between Blood Urea Nitrogen (BUN) and Urine. These dependencies can be traced back to their medical origins, such as the fact that an increase in Weight normally increases Cholesterol, or the relation between different blood analyses.
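For reference, the Spearman coefficient is the Pearson correlation computed on the ranks of the two variables. The following is a minimal pure-Python sketch of that definition, with tied values receiving their average rank; the sample values are invented for illustration and are not taken from the dataset.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2.0 + 1.0
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman(x, y):
    """Pearson correlation of the ranks of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return cov / (var_x * var_y) ** 0.5


# Invented example: creatinine rises while GFR falls (perfectly monotonic).
creatinine = [0.8, 1.0, 1.5, 2.2, 3.0]
gfr = [95.0, 80.0, 50.0, 30.0, 20.0]
rho = spearman(creatinine, gfr)
```

Because only ranks are used, the coefficient captures any monotonic relationship, not just a linear one, which suits the inverse power-law relation between creatinine and GFR.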
Although through this analysis we can find correlations between different variables, it is necessary to understand if a correlation is between two variables or if there is a third variable involved.
Since this dataset is a medical dataset, we do not have all the information available at all times, and some of this information may never be available if it is not relevant to the patient’s health status. This dataset has the disadvantage that the values are only available once in each measurement (that is, there is only one entry). To mitigate this problem, the data has been sorted by patient and time so that existing data can be replicated to fill in the missing entries. Despite that, some data is still missing, corresponding to patients for whom certain variables are never measured or are measured long after their admission.
Considering that the objective of our work is to use as much information about a patient as possible, no outliers were removed, mainly because these outliers are not in fact noisy values, since they are within the acceptable parameters for the measurements in question, but rather values that are outside the norm. As for missing data, no further preprocessing was applied, since our first proposed approach (Section 4.2) uses only the ICU and Stage/Next Stage variables, which have no missing entries. For the other approaches (Sections 4.3 and 4.4), which are feature engineering methods, the algorithm chosen to use these features is one that is robust to noise and missing data.
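The filling step described above can be sketched as a per-patient forward fill: records are sorted by patient and time, and each missing value is replaced with the most recent value seen for the same patient. The field names (`patient_id`, `time`) and the use of `None` for missing entries are assumptions of this sketch.

```python
def forward_fill(records, patient_key="patient_id", time_key="time"):
    """Return new records where None values are replaced by the patient's last known value."""
    ordered = sorted(records, key=lambda r: (r[patient_key], r[time_key]))
    filled = []
    last_seen = {}
    current_patient = None
    for record in ordered:
        if record[patient_key] != current_patient:
            # New patient: never carry values over from the previous one.
            current_patient = record[patient_key]
            last_seen = {}
        row = dict(record)
        for name in list(row):
            if row[name] is None:
                row[name] = last_seen.get(name)  # stays None if never measured before
            else:
                last_seen[name] = row[name]
        filled.append(row)
    return filled


records = [
    {"patient_id": 1, "time": 2, "creatinine": None, "urine": 50},
    {"patient_id": 1, "time": 1, "creatinine": 1.1, "urine": None},
    {"patient_id": 2, "time": 1, "creatinine": None, "urine": 30},
]
filled = forward_fill(records)
```

Values that were never measured before a given time remain missing, which matches the observation that some entries cannot be filled.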
To predict this disease, there are already several techniques that could be applied. The general idea of the proposed approaches is, knowing the patient’s current stage, as well as the patient’s clinical records, try to predict the AKI progression. Within the extensive list of algorithms that can be used for solving this problem, the following were explored: Random Forest [20] and Markov Chain Model [21]. In addition to these two algorithms, three adaptations of these algorithms were implemented.
The first proposed approach (Section 4.2) is based on the idea that AKI manifests mostly in ICUs, so it is expected that the comorbidities associated with these ICUs will influence the progression of this disease differently. This approach uses the Markov Chain Model as the base algorithm and divides the dataset into four more restricted Markov Chain Models, according to the type of ICU where the patient is hospitalized.
The second and third approaches take this idea of conditional probabilities in a different direction: instead of creating multiple models with partial data, divided using a common attribute, the conditioning is accomplished through the creation of new features, called probabilistic features, that are the result of a pre-classification using a probabilistic algorithm, in this case the Markov Chain Model. This means that the second and third proposed solutions are not new techniques for creating classification models, but rather new feature engineering techniques, based on pre-existing knowledge of AKI, that aim to optimize the creation of these models and that can be applied to any classification algorithm.
In the second approach (Section 4.3), the goal is to create new features based on the entire dataset. In this case, we create N new features (where N is the number of possible classes), which represent the probability of transiting from one state A to each of the other states, thus reinforcing the role of the previous state in the diagnosis of the next state.
The third and last approach presented (Section 4.4) is a combination of the two previous ones: new features are also created, but they are conditioned not only by the previous state but also by an attribute, called the most discriminant attribute.
In addition to achieving better results, our aim with these two approaches is to help mitigate the known problem of short-sightedness or myopia in algorithms [5] by joining data from multiple attributes into a single one. This problem causes the algorithm to set aside, in a split, a feature that does not present a high information gain at that split, even though the feature is actually important to the problem, resulting in the loss of important information. These approaches are explained in more detail in the next subsections.
Markov Chain Model
The Markov Chain Model is a stochastic process that uses a sequence of random variables based on the Markov property [22]. This property states that, to obtain the next state, only the current state is needed, regardless of the states that preceded it.
The system changes from one state to the next one, by computing the transition probabilities [22]. These are the probabilities of transitioning, for example, from state i to j and are given by Eq. (4).
This probability can be represented in a matrix, where each pair (i, j) corresponds to the probability of transitioning from state i to state j.
In this algorithm implementation, the transition probabilities are computed with maximum likelihood estimation [23]. In this estimation, for each transition from i to j, the probability is calculated by dividing the number of occurrences of this transition by the number of occurrences of i:
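The equations referred to in this subsection are not reproduced in this text. In standard notation, the three definitions described above can be written as follows. The symbols $X_n$ (state at step $n$), $p_{ij}$ (transition probability) and $n_{ij}$ (number of observed transitions from $i$ to $j$) are notational assumptions, and the correspondence with Eqs. (3) to (5) is inferred from the surrounding description.

```latex
\begin{align}
P(X_{n+1} = j \mid X_n = i, X_{n-1} = i_{n-1}, \dots, X_0 = i_0)
  &= P(X_{n+1} = j \mid X_n = i) \\
p_{ij} &= P(X_{n+1} = j \mid X_n = i), \qquad \sum_{j} p_{ij} = 1 \\
\hat{p}_{ij} &= \frac{n_{ij}}{\sum_{k} n_{ik}}
\end{align}
```

Here the denominator counts the occurrences of state $i$ that are followed by another observation, so that each row of the estimated matrix sums to one.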
Applying this to the AKI problem, we have four different states: Normal, Risk, Injury and Failure. These are the RIFLE criteria stages plus the normal stage, where the patient, despite being in an ICU, does not show any symptoms of having AKI (Table 2).
Table 2. AKI Markov’s transition probabilities
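The estimation described above can be sketched in a few lines: transitions between consecutive stages are counted for each patient and each row of counts is normalized. The stage sequences below are invented for illustration and do not reproduce the values of Table 2; the function names are hypothetical.

```python
STAGES = ["Normal", "Risk", "Injury", "Failure"]


def transition_matrix(sequences, states=STAGES):
    """Maximum likelihood estimate: count(i -> j) / count(transitions leaving i)."""
    index = {state: k for k, state in enumerate(states)}
    counts = [[0] * len(states) for _ in states]
    for sequence in sequences:
        for current, following in zip(sequence, sequence[1:]):
            counts[index[current]][index[following]] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # A state that is never left gets a row of zeros instead of a division by zero.
        matrix.append([count / total if total else 0.0 for count in row])
    return matrix


def predict_next(matrix, current, states=STAGES):
    """Predict the most probable next stage (first one wins on ties)."""
    row = matrix[states.index(current)]
    return states[max(range(len(row)), key=row.__getitem__)]


# Invented stage sequences, one per patient stay.
sequences = [
    ["Normal", "Normal", "Risk", "Normal"],
    ["Risk", "Injury", "Injury", "Failure"],
    ["Normal", "Normal", "Normal"],
]
matrix = transition_matrix(sequences)
```

Sequences are kept separate per patient so that the last stage of one stay is never counted as transitioning into the first stage of another.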
The Markov Chain Model Specialists [24] are an adaptation of the Markov Chain Model algorithm into the medical environment. This approach resulted from the study of the disease (AKI) and its behavior in ICUs, namely the fact that this disease manifests itself mainly in ICUs and behaves differently in each one of them.
These specialists are Markov Chain Models, trained to predict the next AKI stage for a specific type of ICU. In the dataset used, there are four different ICU types: coronary care unit, cardiac surgery recovery unit, medical ICU and surgical unit.
Each ICU has different patients and specific diseases that influence the AKI progression. Having one Markov Chain Model specialist specific to each ICU type may improve the overall performance.
We can verify that ICU-specific models behave differently by analyzing Tables 3–6. For example, in Table 3 (transition probabilities for patients admitted to the “coronary care unit”) the Risk stage has a higher probability of maintaining the same stage than of transiting to another stage, which differs from what is shown in Table 2, where the whole population is represented.
Table 3. AKI Markov’s specialists “coronary care unit” transition probabilities
Table 4. AKI Markov’s specialists “cardiac surgery recovery unit” transition probabilities
Table 5. AKI Markov’s specialists “medical ICU” transition probabilities
Table 6. AKI Markov’s specialists “surgical ICU” transition probabilities
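A minimal sketch of the specialists idea is to estimate one set of transition probabilities per ICU type and to predict with the model that matches the patient’s ICU. The data below are invented and the names `fit_specialists` and `predict` are hypothetical; they only illustrate how the same current stage can lead to different predictions in different units.

```python
from collections import defaultdict


def fit_specialists(stays):
    """stays: iterable of (icu_type, [stage, stage, ...]) pairs, one per patient stay."""
    counts = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
    for icu, stages in stays:
        for current, following in zip(stages, stages[1:]):
            counts[icu][current][following] += 1
    models = {}
    for icu, by_state in counts.items():
        models[icu] = {}
        for current, followers in by_state.items():
            total = sum(followers.values())
            models[icu][current] = {s: c / total for s, c in followers.items()}
    return models


def predict(models, icu, current):
    """Most probable next stage for this ICU; raises KeyError for unseen ICU or stage."""
    row = models[icu][current]
    return max(row, key=row.get)


# Invented stays: the Risk stage behaves differently in the two units.
stays = [
    ("Coronary care unit", ["Risk", "Risk", "Risk", "Normal"]),
    ("Medical ICU", ["Risk", "Normal", "Risk", "Normal"]),
]
models = fit_specialists(stays)
```

With these toy stays, a patient at Risk is predicted to stay at Risk in the coronary care unit but to return to Normal in the medical ICU.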
The Markov Chain Model Features approach, as previously mentioned, is a feature engineering technique that applies the Markov Chain Model algorithm to the creation of new features. These new features represent the probability of transiting to each state, given a specific current state.
This technique is based on a thorough study of the behavior and diagnosis of acute kidney injury, especially the fact that the previous state is a key attribute in the diagnosis (as explained in Section 4).
These conditional features are created from the transition probabilities in Section 4.1 (Table 2) and are added as new attributes to the dataset. The number of features created corresponds to the number of classes that represent the data. In each row, the added attributes are the transition probabilities associated with the current class. For example, if a patient’s current state is Normal, then the row of the probability matrix added is the first one of Table 2. Therefore, after adding these new attributes, the dataset should look similar to Table 7.
Table 7. AKI dataset with the Markov Chain Model Features
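The feature construction described above can be sketched as a lookup: for each record, the row of the transition matrix that matches the current stage is appended as N new columns. The matrix values below are illustrative placeholders, not the probabilities of Table 2, and the column names (`p_next_...`) are hypothetical.

```python
STAGES = ["Normal", "Risk", "Injury", "Failure"]

# Placeholder probabilities for illustration only (each row sums to 1).
ILLUSTRATIVE_MATRIX = [
    [0.90, 0.06, 0.03, 0.01],  # from Normal
    [0.45, 0.40, 0.10, 0.05],  # from Risk
    [0.10, 0.15, 0.60, 0.15],  # from Injury
    [0.05, 0.05, 0.10, 0.80],  # from Failure
]


def add_markov_features(rows, matrix, states=STAGES, stage_key="stage"):
    """Return copies of rows extended with one transition probability per class."""
    augmented = []
    for row in rows:
        probabilities = matrix[states.index(row[stage_key])]
        new_row = dict(row)
        for state, probability in zip(states, probabilities):
            new_row[f"p_next_{state.lower()}"] = probability
        augmented.append(new_row)
    return augmented


rows = [
    {"patient_id": 1, "stage": "Normal", "creatinine": 0.9},
    {"patient_id": 1, "stage": "Risk", "creatinine": 1.4},
]
augmented = add_markov_features(rows, ILLUSTRATIVE_MATRIX)
```

In an evaluation setting, the matrix would have to be estimated on the training folds only, so that the new features do not leak information from the test patients.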
The final proposed approach is also a feature engineering technique and is a mixture of the previous approaches, since it engineers new features using multiple Markov models, created by dividing the dataset using a common attribute. However, in the case of the Markov Chain Model Conditional Features, it is the algorithm that chooses which attribute is used to divide the dataset.
We can see this in Eq. (6) which represents the probability of the next state, knowing the previous state
This chosen attribute, named the most discriminant attribute, is selected using the information gain estimation [25]. This evaluator ranks the attributes according to the information gained from the attribute’s value, and is obtained by the difference of the expected information or entropy
In our dataset, the most discriminant attribute is the GFR. This is consistent with what was mentioned in Section 3: the disease state is obtained in part through this rate.
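As a reminder of how this ranking works, the information gain of an attribute is the entropy of the class minus the expected entropy of the class after splitting on that attribute. The sketch below implements this standard definition for a categorical attribute; the tiny example is invented and only contrasts an informative attribute with an uninformative one.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    total = len(labels)
    return -sum((count / total) * log2(count / total)
                for count in Counter(labels).values())


def information_gain(values, labels):
    """Entropy of the labels minus the weighted entropy within each attribute value."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(group) / total * entropy(group) for group in groups.values())
    return entropy(labels) - conditional


# Invented example: next stage against two candidate attributes.
next_stage = ["Normal", "Normal", "Failure", "Failure"]
gfr_category = ["C", "C", "A", "A"]   # separates the classes perfectly
other_attribute = ["x", "y", "x", "y"]  # carries no information about the class
gain_gfr = information_gain(gfr_category, next_stage)
gain_other = information_gain(other_attribute, next_stage)
```

For a continuous attribute such as the GFR, evaluators typically discretize the values or search for split points before applying this definition.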
Since the most discriminant attribute is continuous, it is necessary to transform it into a categorical attribute in order to divide the dataset. To perform this transformation, several discretizations can be applied, such as equal-frequency interval discretization and clustering-based discretization, among others [27]. In this case, we applied the equal-frequency interval discretization, since it is a simpler way to divide the attribute.
Applying this discretization, the continuous attribute GFR was divided into three categorical values, as we can see in Table 8.
Table 8. GFR categories
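Equal-frequency discretization chooses the cut points so that each interval receives roughly the same number of observations. The sketch below shows one simple way to do this for three bins; the GFR values are invented, the boundaries of Table 8 are not reproduced, and labelling the lowest interval as category A is an assumption.

```python
def equal_frequency_cuts(values, bins=3):
    """Return bins-1 cut points that split the sorted values into equal-sized groups."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(k * n) // bins] for k in range(1, bins)]


def categorize(value, cuts, labels="ABC"):
    """Map a value to the label of the first interval whose upper cut it falls below."""
    for cut, label in zip(cuts, labels):
        if value < cut:
            return label
    return labels[len(cuts)]


# Invented GFR values (nine observations, so three per category).
gfr_values = [70, 10, 40, 90, 20, 50, 80, 30, 60]
cuts = equal_frequency_cuts(gfr_values)
categories = [categorize(v, cuts) for v in sorted(gfr_values)]
```

With many repeated values the groups can become unbalanced, since equal values always fall into the same interval.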
After this, the dataset is divided according to these values into three subsets. The Markov Chain Model is then applied to each of the subsets. For example, if a patient’s current stage is stage 0 or Normal and the patient’s GFR is 51.0709 (category A), the corresponding transition probabilities are given by the equation:
These probabilities are then added to the global dataset. The resulting dataset is similar to that of the previous approach, the only difference being that for each current state there are three different probability rows, one for each categorical value.
By applying this feature engineering technique, which aggregates information from several attributes into a single one, we concentrate more relevant information in a single node than if these attributes were used separately. This combination of attributes helps mitigate the myopia in algorithms, since it stops unnecessary splits from occurring, meaning less information loss in each split and consequently more accurate predictions.
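Under the description above, the conditional features can be sketched as transition probabilities estimated separately for each pair (GFR category, current stage). The transitions below are invented for illustration, and the function names are hypothetical.

```python
from collections import defaultdict

STAGES = ["Normal", "Risk", "Injury", "Failure"]


def fit_conditioned(transitions):
    """transitions: iterable of (category, current_stage, next_stage) triples."""
    counts = defaultdict(lambda: [0] * len(STAGES))
    for category, current, following in transitions:
        counts[(category, current)][STAGES.index(following)] += 1
    # Every stored row has at least one count, so the division is safe.
    return {key: [c / sum(row) for c in row] for key, row in counts.items()}


def conditioned_features(model, category, current):
    """Probability row for this category and stage (zeros if the pair was never seen)."""
    return model.get((category, current), [0.0] * len(STAGES))


# Invented transitions: the same current stage behaves differently per GFR category.
transitions = [
    ("A", "Normal", "Risk"), ("A", "Normal", "Risk"),
    ("A", "Normal", "Normal"), ("A", "Normal", "Failure"),
    ("C", "Normal", "Normal"), ("C", "Normal", "Normal"),
]
model = fit_conditioned(transitions)
features_a = conditioned_features(model, "A", "Normal")
features_c = conditioned_features(model, "C", "Normal")
```

Each record then receives the row selected by both its current stage and its GFR category, instead of the single global row used in the previous approach.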
To evaluate the proposed approaches and make a comparative study between them, we designed the following configuration of experiments: we compared the performance of the Markov Chain Model, Markov Chain Model Specialists, Markov Chain Model Features, Markov Chain Model Conditional Features and Random Forest algorithms using stratified 10-fold cross validation to divide the data of all 6558 patients. The 10 folds were organized so that all the records belonging to each patient appeared in only one fold.
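The constraint that all records of a patient fall in a single fold can be sketched as follows. This simplified version assigns patients to folds in round-robin order and does not implement the stratification mentioned above; it is an illustration of the grouping idea, not the procedure actually used.

```python
def patient_folds(patient_ids, n_folds=10):
    """Return n_folds lists of record indices; each patient appears in exactly one fold."""
    unique_patients = sorted(set(patient_ids))
    fold_of = {pid: k % n_folds for k, pid in enumerate(unique_patients)}
    folds = [[] for _ in range(n_folds)]
    for index, pid in enumerate(patient_ids):
        folds[fold_of[pid]].append(index)
    return folds


# Invented record-level patient identifiers (one entry per measurement).
patient_ids = [1, 1, 2, 3, 3, 3, 4]
folds = patient_folds(patient_ids, n_folds=2)
```

Each fold is used once as the test set, with the remaining folds as training data, so no patient contributes records to both sides of a split.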
The Random Forest algorithm used is the standard implementation and consists of five trees. This number was obtained by testing different numbers of trees (2 to 100). Then the results were compared. The set of trees with the highest number of similar results was the chosen one.
Considering that the Markov Chain Model Features and Markov Chain Model Conditional Features are not algorithms but feature engineering techniques, it is necessary to choose an algorithm to use with these features. Random Forest is already included as an algorithm against which to compare our approaches, in part because it is known to have excellent results in the majority of problems [28] and to be robust to noise [29] and missing data [30], so we opted to use it with these features as well.
Since the problem in question is a multiclass problem, the results from each test were arranged into confusion matrices [31]. After creating the confusion matrices, we analyzed the results by computing the performance measures: precision (Eq. (10)), recall (Eq. (11)) and f-measure (Eq. (12)) [32]:
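Eqs. (10) to (12) are not reproduced in this text. Using the standard definitions, precision = TP / (TP + FP), recall = TP / (TP + FN) and f-measure = 2 × precision × recall / (precision + recall), the measures can be computed per class from a confusion matrix as sketched below. Whether the reported values are per-class or averaged is not stated, so the per-class form is an assumption, and the matrix is invented.

```python
def per_class_metrics(confusion, labels):
    """confusion[i][j] = number of records of true class i predicted as class j."""
    metrics = {}
    for k, label in enumerate(labels):
        true_positives = confusion[k][k]
        predicted_as_k = sum(row[k] for row in confusion)  # TP + FP
        actually_k = sum(confusion[k])                      # TP + FN
        precision = true_positives / predicted_as_k if predicted_as_k else 0.0
        recall = true_positives / actually_k if actually_k else 0.0
        denominator = precision + recall
        f_measure = 2 * precision * recall / denominator if denominator else 0.0
        metrics[label] = (precision, recall, f_measure)
    return metrics


# Invented two-class confusion matrix (rows: true class, columns: predicted class).
confusion = [
    [8, 2],
    [1, 9],
]
metrics = per_class_metrics(confusion, ["Normal", "Risk"])
```

A single summary value can then be obtained by averaging the per-class measures, optionally weighted by the number of records in each class.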
If we analyze Table 9, that represents the performance measures obtained in the tests, we can see that all three proposed approaches show improvement in the results, compared with the Markov Chain Model and Random Forest.
Table 9. Results comparison
The gains obtained with the Markov Chain Model Specialists can be explained by analyzing the Markov transition probabilities in Table 2. For a patient in the Risk stage, the probability of transiting to the Normal stage is higher than the other probabilities. The probability of maintaining the Risk stage is lower than that of changing to Normal, but still high when compared with transitioning to the other stages. This is reflected in the results: whenever the current stage is Risk, the Markov Chain Model predicts that the next stage will be Normal, since it is the one with the highest probability.
This is the reason why the specialists approach has better results: by splitting the patients by type of ICU, the computed transition probabilities differ in each of the four ICUs and, consequently, each model groups patients with more similar diagnoses than the basic Markov Chain Model does.
This minor difference of probabilities between Normal and Risk (Risk is the only state in which the probability of regressing is higher than that of maintaining the same state or even progressing) can be explained from the point of view of the disease (Section 3, Fig. 1): when a patient is at risk, the disease has not yet affected the kidneys permanently. For this reason, it is possible to regress more easily than from injury or failure, which, taken literally, mean that the kidneys have suffered an injury or simply stopped working. This type of damage is much more difficult to regress from, as we can see in the transition probabilities (Table 2).
Besides outperforming the Markov Chain Model, the ICU Specialists also had better results than the Random Forest algorithm, which is known to have excellent results in the majority of problems [28]. This was achieved by exploring only one attribute of the dataset (Disease Stage), while in Random Forest all the attributes present in the dataset were explored to create the model. Despite obtaining a worse result than the ICU Specialists, the Random Forest algorithm still achieves a better result than the traditional Markov Chain Model approach. This suggests that the next state depends not only on the previous state, but also on other attributes in the dataset.
Our two feature engineering approaches (Markov Chain Model Features and Markov Chain Model Conditional Features) applied with Random Forest had also better results than the approaches that were using the original dataset, including the first proposed approach (Markov Chain Model Specialists).
As stated in Section 4, this happens because decision trees (like other algorithms) have a short-sightedness problem. Since Random Forest is composed of several decision trees, it inherits the same problem. By aggregating information from several attributes into one, we give the trees more information in a single node, producing more concise trees and avoiding unnecessary splits and loss of information.
We can also see this from the disease’s point of view: as we are giving additional information about the patient’s current state, we are pointing the algorithm in the direction of the correct diagnosis. In addition, and as already mentioned, we are also optimizing the prediction time of this algorithm, since we are preventing unnecessary splits from happening. However, it is important to emphasize that what is true for this disease may not hold when we study other diseases that are not so dependent on the previous state.
In the presented results it is also evident that the application of feature engineering techniques as part of the pre-processing of the data has a greater positive impact on the performance, in this particular problem, than just using the current state to create the new attributes.
Acute kidney injury, or AKI, is a disease of rapid progression that manifests itself mostly in ICUs, as a consequence of a previous disease. Since it is a disease of rapid progression, its timely detection is the key to preventing further damage to the kidneys.
In order to help in the early detection of this disease we proposed several new approaches based on the Markov Chain Model algorithm: Markov Chain Model Specialists, Markov Chain Model Features and Markov Chain Model Conditional Features.
The first approach (Markov Chain Model Specialists) was based on the idea that the comorbidities of each ICU influence the disease progression differently. In the tests, this approach achieved better results than the normal Markov Chain Model and Random Forest, models known to obtain excellent results in the majority of problems. This occurs, as stated before, because grouping the patients by ICU type restricts the outcome of the chain. This results in a more accurate prediction, based on common inputs.
The second approach (Markov Chain Model Features), which is a feature engineering technique, was implemented with Random Forest and was based on the idea that the previous state is a key element in the diagnosis. In this approach, the Markov Chain Model was applied to the current class to create the transition probabilities, that is, conditional global features.
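As a rough illustration of this feature engineering step, the sketch below estimates a single global transition matrix and appends, to each record, the row of transition probabilities that corresponds to the patient's current stage. The helper names, the stage codes and the two example measurements are hypothetical; the extended rows would then be passed to a classifier such as Random Forest, which is not shown here.

```python
from collections import Counter, defaultdict


def transition_probabilities(sequences, states):
    """Estimate P(next stage | current stage) from stage sequences.

    Returns a dict mapping each state to a list of probabilities,
    ordered as in `states`. States never observed as a starting point
    get a row of zeros.
    """
    counts = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    probs = {}
    for s in states:
        total = sum(counts[s].values())
        probs[s] = [counts[s][t] / total if total else 0.0 for t in states]
    return probs


def add_markov_features(record, current_stage, probs):
    """Append the transition probabilities of the current stage to a record."""
    return list(record) + probs[current_stage]


# Toy data: hypothetical stage codes and measurements.
states = [0, 1, 2]
sequences = [[0, 0, 1], [0, 1, 2], [1, 1, 2]]
probs = transition_probabilities(sequences, states)

# A record with two original attributes, extended with three new features.
row = add_markov_features([1.2, 80.0], current_stage=0, probs=probs)
print(row)
```

To avoid leaking information from the test set, the transition probabilities would normally be estimated on the training sequences only.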
The third and final approach (Markov Chain Model Conditional Features) is also a feature engineering technique and is a mixture of the two previous approaches. In this technique, the features are created using multiple Markov Chain Models and a common attribute, as in the Markov Chain Model Specialists, but instead of choosing this attribute based on previous knowledge of the disease, the algorithm chooses it automatically, using an information gain estimate.
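The automatic choice of the conditioning attribute can be sketched as follows, assuming categorical attributes and the standard entropy-based definition of information gain. The attribute values and the target stages are toy data, and `best_attribute` is a hypothetical helper rather than the selection routine used in this work.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels):
    """Reduction in label entropy obtained by splitting on an attribute."""
    groups = defaultdict(list)
    for v, y in zip(values, labels):
        groups[v].append(y)
    n = len(labels)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def best_attribute(table, labels):
    """Return the name of the attribute with the highest information gain."""
    return max(table, key=lambda name: information_gain(table[name], labels))


# Toy data: two categorical attributes and the next stage of four patients.
table = {
    "ICUType": ["a", "a", "b", "b"],
    "Gender": ["f", "m", "f", "m"],
}
next_stage = [0, 0, 1, 1]

chosen = best_attribute(table, next_stage)
print(chosen)
```

Once an attribute is chosen, one Markov chain per attribute value can be fitted and its transition probabilities used as features. Continuous attributes would need to be discretised before this estimate can be applied.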
Like the first approach (Markov Chain Model Specialists), these two approaches obtained better results than the Markov Chain Model and Random Forest, and they also outperformed the first approach. This improvement is due to the fact that these approaches help mitigate the short-sightedness problem of the algorithms.
As future work, we are developing new and different approaches to the AKI early detection problem and exploring other attributes to create more specific specialists. We are also discussing with a doctor at a Portuguese hospital the implementation of an alarm system that uses one of the three proposed approaches, along with a cost analysis of such an implementation.
Since acute kidney injury is a time-dependent disease, we also plan to implement methods such as Hidden Semi-Markov Models and Non-stationary Hidden Markov Models (among others) to take this attribute into account in the transitions between states.
Finally, we also plan to apply these approaches to other types of problems, inside and outside the medical field.
Acknowledgments
This work is supported by the NanoSTIMA Project: Macro-to-Nano Human Sensing: Towards Integrated Multimodal Health Monitoring and Analytics/NORTE-01-0145-FEDER-000016 which is financed by the North Portugal Regional Operational Programme (NORTE 2020), under the PORTUGAL 2020 Partnership Agreement, and through the European Regional Development Fund (ERDF).
Appendix
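Table 10: Dependency analysis 1

| Attribute | DiasABP | HR | MAP | GCS | pH | Lactate | PaCO2 | BUN |
|---|---|---|---|---|---|---|---|---|
| DiasABP | 1 | 0.2 | 0.8 | 0 | 0 | 0 | 0.1 | 0.1 |
| HR | 0.2 | 1 | 0.1 | 0.1 | 0.2 | 0.2 | 0 | 0.1 |
| MAP | 0.8 | 0.1 | 1 | 0 | 0 | 0.1 | 0 | 0 |
| GCS | 0 | 0.1 | 0 | 1 | 0.3 | 0.2 | 0.1 | 0 |
| pH | 0 | 0.2 | 0 | 0.3 | 1 | 0.1 | 0.2 | 0.4 |
| Lactate | 0 | 0.2 | 0.1 | 0.2 | 0.1 | 1 | 0.1 | 0 |
| PaCO2 | 0.1 | 0 | 0 | 0.1 | 0.2 | 0.1 | 1 | 0.1 |
| BUN | 0.1 | 0.1 | 0 | 0 | 0.4 | 0 | 0.1 | 1 |
| Creatinine | 0.1 | 0 | 0.1 | 0 | 0.4 | 0.1 | 0.1 | 0.7 |
| Glucose | 0 | 0 | 0 | 0.1 | 0.3 | 0.2 | 0 | 0.1 |
| Na | 0 | 0 | 0 | 0.3 | 0 | 0.3 | 0.1 | 0.3 |
| FiO2 | 0 | 0.1 | 0.1 | 0.1 | 0.1 | 0.2 | 0.1 | 0 |
| PaO2 | 0 | 0.1 | 0.1 | 0.2 | 0.2 | 0.1 | 0.1 | 0.1 |
| HCT | 0.1 | 0 | 0.1 | 0.1 | 0.1 | 0 | 0 | 0 |
| Age | 0.3 | 0.2 | 0.1 | 0 | 0.1 | 0 | 0 | 0.3 |
| Gender | 0.1 | 0 | 0 | 0 | 0.1 | 0 | 0 | 0.1 |
| NIDiasABP | 0.3 | 0.1 | 0.3 | 0 | 0.2 | 0 | 0 | 0.2 |
| NIMAP | 0.3 | 0 | 0.3 | 0 | 0.3 | 0 | 0 | 0.1 |
| Urine | 0.1 | 0 | 0.1 | 0 | 0.2 | 0.1 | 0 | 0.3 |
| Weight | 0.1 | 0.1 | 0 | 0 | 0.4 | 0 | 0.2 | 0.1 |
| Albumin | 0.1 | 0.2 | 0.1 | 0.2 | 0.2 | 0.1 | 0.1 | 0.1 |
| ALP | 0 | 0 | 0 | 0.1 | 0.4 | 0.1 | 0 | 0.2 |
| ALT | 0.1 | 0.1 | 0 | 0.1 | 0.2 | 0.2 | 0 | 0 |
| AST | 0.1 | 0.1 | 0.1 | 0.2 | 0.4 | 0.3 | 0 | 0 |
| TroponinI | 0.1 | 0.1 | 0 | 0.1 | 1 | 0 | 0.1 | 0.3 |
| Platelets | 0 | 0 | 0 | 0.2 | 0.2 | 0.2 | 0 | 0 |
| WBC | 0 | 0 | 0 | 0 | 0.6 | 0 | 0 | 0.1 |
| SysABP | 0.3 | 0 | 0.4 | 0.1 | 0.1 | 0.1 | 0 | 0.1 |
| HCO3 | 0.1 | 0.1 | 0.1 | 0.2 | 1 | 0.1 | 0.4 | 0.2 |
| Mg | 0.1 | 0.1 | 0.1 | 0 | 0.1 | 0.1 | 0 | 0.2 |
| K | 0.1 | 0.1 | 0.1 | 0 | 0.8 | 0.1 | 0.1 | 0.3 |
| Cholesterol | 0 | 0.1 | 0 | 0 | 0 | 0.2 | 0.1 | 0.1 |
| TroponinT | 0 | 0.1 | 0 | 0.1 | 0 | 0.1 | 0.1 | 0.1 |
| Temp | 0.2 | 0.1 | 0.1 | 0.1 | 0.5 | 0.2 | 0.1 | 0.1 |
| Bilirubin | 0.1 | 0 | 0.1 | 0 | 0 | 0.2 | 0.1 | 0 |
| RespRate | 0.1 | 0.2 | 0 | 0.1 | 0 | 0.1 | 0.2 | 0 |
| SaO2 | 0 | 0.1 | 0 | 0 | 0 | 0.1 | 0 | 0.2 |
| Time.Interval | 0 | 0 | 0 | 0.1 | 0.2 | 0.2 | 0.1 | 0 |
| HypertensionState | 0.3 | 0.1 | 0.3 | 0 | 0 | 0 | 0 | 0.1 |
| DiabetesState | 0 | 0 | 0 | 0.1 | 0.3 | 0.1 | 0.1 | 0.1 |
| IMC | 0.1 | 0 | 0 | 0 | 0.3 | 0 | 0.2 | 0.1 |
| TFG | 0.1 | 0 | 0.1 | 0 | 0.4 | 0.1 | 0.1 | 0.8 |
| InitialScr | 0.2 | 0.1 | 0.1 | 0 | 0.2 | 0 | 0.1 | 0.1 |
| ICUType | 0.1 | 0.1 | 0.2 | 0.1 | 0.1 | 0.1 | 0 | 0.1 |
| Stage | 0.1 | 0 | 0.1 | 0 | 0.3 | 0 | 0.1 | 0.6 |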
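Table 11: Dependency analysis 2

| Attribute | Creatinine | Glucose | Na | FiO2 | PaO2 | HCT | Age | Gender |
|---|---|---|---|---|---|---|---|---|
| DiasABP | 0.1 | 0 | 0 | 0 | 0 | 0.1 | 0.3 | 0.1 |
| HR | 0 | 0 | 0 | 0.1 | 0.1 | 0 | 0.2 | 0 |
| MAP | 0.1 | 0 | 0 | 0.1 | 0.1 | 0.1 | 0.1 | 0 |
| GCS | 0 | 0.1 | 0.3 | 0.1 | 0.2 | 0.1 | 0 | 0 |
| pH | 0.4 | 0.3 | 0 | 0.1 | 0.2 | 0.1 | 0.1 | 0.1 |
| Lactate | 0.1 | 0.2 | 0.3 | 0.2 | 0.1 | 0 | 0 | 0 |
| PaCO2 | 0.1 | 0 | 0.1 | 0.1 | 0.1 | 0 | 0 | 0 |
| BUN | 0.7 | 0.1 | 0.3 | 0 | 0.1 | 0 | 0.3 | 0.1 |
| Creatinine | 1 | 0.1 | 0.2 | 0 | 0.2 | 0 | 0.2 | 0.2 |
| Glucose | 0.1 | 1 | 0.1 | 0 | 0 | 0 | 0 | 0 |
| Na | 0.2 | 0.1 | 1 | 0.1 | 0.1 | 0.1 | 0.1 | 0 |
| FiO2 | 0 | 0 | 0.1 | 1 | 0.1 | 0.1 | 0 | 0.1 |
| PaO2 | 0.2 | 0 | 0.1 | 0.1 | 1 | 0.2 | 0.1 | 0 |
| HCT | 0 | 0 | 0.1 | 0.1 | 0.2 | 1 | 0 | 0 |
| Age | 0.2 | 0 | 0.1 | 0 | 0.1 | 0 | 1 | 0.1 |
| Gender | 0.2 | 0 | 0 | 0.1 | 0 | 0 | 0.1 | 1 |
| NIDiasABP | 0.1 | 0 | 0.2 | 0 | 0 | 0 | 0.2 | 0.1 |
| NIMAP | 0.1 | 0 | 0.2 | 0.1 | 0.1 | 0 | 0.1 | 0.1 |
| Urine | 0.2 | 0 | 0.2 | 0 | 0.1 | 0 | 0.2 | 0 |
| Weight | 0.1 | 0.1 | 0.8 | 0.1 | 0.1 | 0 | 0.3 | 0.3 |
| Albumin | 0.1 | 0.1 | 0.2 | 0.1 | 0.1 | 0.2 | 0 | 0.1 |
| ALP | 0.1 | 0 | 0.5 | 0 | 0.1 | 0 | 0 | 0 |
| ALT | 0.1 | 0 | 0.1 | 0.1 | 0 | 0 | 0.2 | 0.1 |
| AST | 0.2 | 0 | 0.3 | 0 | 0 | 0 | 0.2 | 0 |
| TroponinI | 0.2 | 0 | 1 | 0.2 | 0.1 | 0.1 | 0 | 0.1 |
| Platelets | 0 | 0 | 0.3 | 0.1 | 0 | 0.1 | 0 | 0 |
| WBC | 0.1 | 0.1 | 0 | 0 | 0.1 | 0.1 | 0 | 0 |
| SysABP | 0.1 | 0 | 0.1 | 0.1 | 0 | 0 | 0.1 | 0 |
| HCO3 | 0.2 | 0.1 | 0.3 | 0 | 0.1 | 0 | 0 | 0.1 |
| Mg | 0.2 | 0 | 0 | 0 | 0 | 0 | 0.1 | 0 |
| K | 0.4 | 0 | 0.2 | 0 | 0.1 | 0 | 0 | 0.2 |
| Cholesterol | 0.1 | 0.1 | 0.7 | 0.1 | 0 | 0.2 | 0.1 | 0.2 |
| TroponinT | 0.1 | 0.1 | 0.8 | 0.1 | 0.1 | 0.2 | 0.1 | 0 |
| Temp | 0.1 | 0 | 0.3 | 0.1 | 0.1 | 0.1 | 0.2 | 0 |
| Bilirubin | 0 | 0.2 | 0 | 0.1 | 0.1 | 0.2 | 0.1 | 0 |
| RespRate | 0 | 0 | 0.9 | 0.2 | 0.1 | 0 | 0.1 | 0 |
| SaO2 | 0.3 | 0 | 0.6 | 0.1 | 0.3 | 0.1 | 0.1 | 0.1 |
| Time.Interval | 0 | 0.1 | 0.1 | 0.2 | 0.2 | 0.1 | 0 | 0 |
| HypertensionState | 0 | 0 | 0 | 0.1 | 0 | 0 | 0.1 | 0 |
| DiabetesState | 0.1 | 0.6 | 0.2 | 0 | 0 | 0 | 0 | 0 |
| IMC | 0.1 | 0.1 | 0.7 | 0.1 | 0.1 | 0 | 0.2 | 0 |
| TFG | 1 | 0.1 | 0.2 | 0 | 0.2 | 0 | 0.3 | 0 |
| InitialScr | 0.1 | 0 | 0.1 | 0 | 0 | 0 | 0.6 | 0.8 |
| ICUType | 0.1 | 0 | 0.1 | 0.1 | 0 | 0 | 0.2 | 0 |
| Stage | 0.8 | 0 | 0.2 | 0 | 0.1 | 0 | 0.1 | 0 |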
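Table 12: Dependency analysis 3

| Attribute | NIDiasABP | NIMAP | Urine | Weight | Albumin | ALP | ALT | AST |
|---|---|---|---|---|---|---|---|---|
| DiasABP | 0.3 | 0.3 | 0.1 | 0.1 | 0.1 | 0 | 0.1 | 0.1 |
| HR | 0.1 | 0 | 0 | 0.1 | 0.2 | 0 | 0.1 | 0.1 |
| MAP | 0.3 | 0.3 | 0.1 | 0 | 0.1 | 0 | 0 | 0.1 |
| GCS | 0 | 0 | 0 | 0 | 0.2 | 0.1 | 0.1 | 0.2 |
| pH | 0.2 | 0.3 | 0.2 | 0.4 | 0.2 | 0.4 | 0.2 | 0.4 |
| Lactate | 0 | 0 | 0.1 | 0 | 0.1 | 0.1 | 0.2 | 0.3 |
| PaCO2 | 0 | 0 | 0 | 0.2 | 0.1 | 0 | 0 | 0 |
| BUN | 0.2 | 0.1 | 0.3 | 0.1 | 0.1 | 0.2 | 0 | 0 |
| Creatinine | 0.1 | 0.1 | 0.2 | 0.1 | 0.1 | 0.1 | 0.1 | 0.2 |
| Glucose | 0 | 0 | 0 | 0.1 | 0.1 | 0 | 0 | 0 |
| Na | 0.2 | 0.2 | 0.2 | 0.8 | 0.2 | 0.5 | 0.1 | 0.3 |
| FiO2 | 0 | 0.1 | 0 | 0.1 | 0.1 | 0 | 0.1 | 0 |
| PaO2 | 0 | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 | 0 | 0 |
| HCT | 0 | 0 | 0 | 0 | 0.2 | 0 | 0 | 0 |
| Age | 0.2 | 0.1 | 0.2 | 0.3 | 0 | 0 | 0.2 | 0.2 |
| Gender | 0.1 | 0.1 | 0 | 0.3 | 0.1 | 0 | 0.1 | 0 |
| NIDiasABP | 1 | 0.8 | 0.1 | 0.1 | 0.2 | 0 | 0 | 0 |
| NIMAP | 0.8 | 1 | 0.1 | 0.1 | 0.2 | 0 | 0 | 0.1 |
| Urine | 0.1 | 0.1 | 1 | 0.1 | 0.1 | 0.1 | 0 | 0.1 |
| Weight | 0.1 | 0.1 | 0.1 | 1 | 0 | 0.1 | 0 | 0.1 |
| Albumin | 0.2 | 0.2 | 0.1 | 0 | 1 | 0 | 0 | 0.1 |
| ALP | 0 | 0 | 0.1 | 0.1 | 0 | 1 | 0.1 | 0.1 |
| ALT | 0 | 0 | 0 | 0 | 0 | 0.1 | 1 | 0.8 |
| AST | 0 | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 | 0.8 | 1 |
| TroponinI | 0.1 | 0.1 | 0.1 | 0.3 | 0 | 0.3 | 0.3 | 0 |
| Platelets | 0 | 0 | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 | 0.2 |
| WBC | 0 | 0.1 | 0.1 | 0.2 | 0.1 | 0.2 | 0.1 | 0.1 |
| SysABP | 0.3 | 0.3 | 0.1 | 0.2 | 0 | 0.1 | 0 | 0.1 |
| HCO3 | 0 | 0 | 0.1 | 0 | 0.4 | 0.1 | 0 | 0 |
| Mg | 0 | 0 | 0 | 0.1 | 0 | 0.1 | 0.1 | 0.2 |
| K | 0 | 0.1 | 0.2 | 0.3 | 0.1 | 0.1 | 0.1 | 0.3 |
| Cholesterol | 0 | 0 | 0 | 0.4 | 0.4 | 0.1 | 0.1 | 0.2 |
| TroponinT | 0 | 0.1 | 0 | 0.2 | 0.1 | 0.1 | 0.3 | 0.3 |
| Temp | 0.1 | 0 | 0.1 | 0.1 | 0.1 | 0 | 0.1 | 0 |
| Bilirubin | 0.1 | 0.2 | 0 | 0.5 | 0.1 | 0 | 0.4 | 0.5 |
| RespRate | 0 | 0 | 0 | 0.2 | 0 | 0.1 | 0 | 0.1 |
| SaO2 | 0 | 0.1 | 0 | 0 | 0.2 | 0.2 | 0.1 | 0.3 |
| Time.Interval | 0 | 0 | 0.1 | 0 | 0.1 | 0.1 | 0 | 0 |
| HypertensionState | 0.3 | 0.3 | 0 | 0 | 0 | 0 | 0.1 | 0.1 |
| DiabetesState | 0 | 0 | 0 | 0.1 | 0 | 0 | 0 | 0 |
| IMC | 0.1 | 0.1 | 0.1 | 0.9 | 0.1 | 0.1 | 0 | 0 |
| TFG | 0.1 | 0.1 | 0.2 | 0 | 0.1 | 0.2 | 0 | 0.1 |
| InitialScr | 0.2 | 0.1 | 0.1 | 0.4 | 0.1 | 0 | 0.2 | 0.1 |
| ICUType | 0.1 | 0.1 | 0 | 0.1 | 0.2 | 0 | 0 | 0.1 |
| Stage | 0.1 | 0.1 | 0.2 | 0 | 0.1 | 0.2 | 0 | 0.1 |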
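Table 13: Dependency analysis 4

| Attribute | TroponinI | Platelets | WBC | SysABP | HCO3 | Mg | K | Cholesterol |
|---|---|---|---|---|---|---|---|---|
| DiasABP | 0.1 | 0 | 0 | 0.3 | 0.1 | 0.1 | 0.1 | 0 |
| HR | 0.1 | 0 | 0 | 0 | 0.1 | 0.1 | 0.1 | 0.1 |
| MAP | 0 | 0 | 0 | 0.4 | 0.1 | 0.1 | 0.1 | 0 |
| GCS | 0.1 | 0.2 | 0 | 0.1 | 0.2 | 0 | 0 | 0 |
| pH | 1 | 0.2 | 0.6 | 0.1 | 1 | 0.1 | 0.8 | 0 |
| Lactate | 0 | 0.2 | 0 | 0.1 | 0.1 | 0.1 | 0.1 | 0.2 |
| PaCO2 | 0.1 | 0 | 0 | 0 | 0.4 | 0 | 0.1 | 0.1 |
| BUN | 0.3 | 0 | 0.1 | 0.1 | 0.2 | 0.2 | 0.3 | 0.1 |
| Creatinine | 0.2 | 0 | 0.1 | 0.1 | 0.2 | 0.2 | 0.4 | 0.1 |
| Glucose | 0 | 0 | 0.1 | 0 | 0.1 | 0 | 0 | 0.1 |
| Na | 1 | 0.3 | 0 | 0.1 | 0.3 | 0 | 0.2 | 0.7 |
| FiO2 | 0.2 | 0.1 | 0 | 0.1 | 0 | 0 | 0 | 0.1 |
| PaO2 | 0.1 | 0 | 0.1 | 0 | 0.1 | 0 | 0.1 | 0 |
| HCT | 0.1 | 0.1 | 0.1 | 0 | 0 | 0 | 0 | 0.2 |
| Age | 0 | 0 | 0 | 0.1 | 0 | 0.1 | 0 | 0.1 |
| Gender | 0.1 | 0 | 0 | 0 | 0.1 | 0 | 0.2 | 0.2 |
| NIDiasABP | 0.1 | 0 | 0 | 0.3 | 0 | 0 | 0 | 0 |
| NIMAP | 0.1 | 0 | 0.1 | 0.3 | 0 | 0 | 0.1 | 0 |
| Urine | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 | 0 | 0.2 | 0 |
| Weight | 0.3 | 0.1 | 0.2 | 0.2 | 0 | 0.1 | 0.3 | 0.4 |
| Albumin | 0 | 0.1 | 0.1 | 0 | 0.4 | 0 | 0.1 | 0.4 |
| ALP | 0.3 | 0.1 | 0.2 | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 |
| ALT | 0.3 | 0.1 | 0.1 | 0 | 0 | 0.1 | 0.1 | 0.1 |
| AST | 0 | 0.2 | 0.1 | 0.1 | 0 | 0.2 | 0.3 | 0.2 |
| TroponinI | 1 | 0.3 | 0 | 0.1 | 0.4 | 0.2 | 0.2 | 1 |
| Platelets | 0.3 | 1 | 0.3 | 0 | 0.2 | 0.1 | 0.2 | 0.1 |
| WBC | 0 | 0.3 | 1 | 0.1 | 0 | 0 | 0.3 | 0.3 |
| SysABP | 0.1 | 0 | 0.1 | 1 | 0 | 0.1 | 0.1 | 0.2 |
| HCO3 | 0.4 | 0.2 | 0 | 0 | 1 | 0 | 0 | 0 |
| Mg | 0.2 | 0.1 | 0 | 0.1 | 0 | 1 | 0.3 | 0.3 |
| K | 0.2 | 0.2 | 0.3 | 0.1 | 0 | 0.3 | 1 | 0 |
| Cholesterol | 1 | 0.1 | 0.3 | 0.2 | 0 | 0.3 | 0 | 1 |
| TroponinT | 0 | 0.2 | 0.2 | 0.3 | 0.5 | 0.2 | 0.1 | 0.1 |
| Temp | 0.2 | 0 | 0.1 | 0 | 0.1 | 0.1 | 0.1 | 0.1 |
| Bilirubin | 0 | 0.5 | 0 | 0.3 | 0.1 | 0.1 | 0.2 | 0.2 |
| RespRate | 0.7 | 0 | 0 | 0.4 | 0.1 | 0 | 0.1 | 0.2 |
| SaO2 | 0 | 0.2 | 0.1 | 0.1 | 0.1 | 0 | 0 | 0.5 |
| Time.Interval | 0.1 | 0 | 0 | 0 | 0 | 0 | 0.1 | 0 |
| HypertensionState | 0.1 | 0.1 | 0 | 0.9 | 0 | 0 | 0 | 0.1 |
| DiabetesState | 0.1 | 0 | 0 | 0 | 0.1 | 0 | 0 | 0 |
| IMC | 0.2 | 0.1 | 0.2 | 0.1 | 0 | 0.2 | 0.2 | 0.4 |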
TFG
0
0.1
0.1
0.2
0.2
0.4
0.1
InitialScr
0.1
0
0
0
0.1
0.1
0.2
0.1
ICUType
0.1
0.1
0.2
0.2
0.2
0
0.1
Stage
0.1
0
0.1
0.2
0.2
0.2
0.4
0.1
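Table 14: Dependency analysis 5

| Attribute | TroponinT | Temp | Bilirubin | RespRate | SaO2 | Time.Interval |
|---|---|---|---|---|---|---|
| DiasABP | 0 | 0.2 | 0.1 | 0.1 | 0 | 0 |
| HR | 0.1 | 0.1 | 0 | 0.2 | 0.1 | 0 |
| MAP | 0 | 0.1 | 0.1 | 0 | 0 | 0 |
| GCS | 0.1 | 0.1 | 0 | 0.1 | 0 | 0.1 |
| pH | 0 | 0.5 | 0 | 0 | 0 | 0.2 |
| Lactate | 0.1 | 0.2 | 0.2 | 0.1 | 0.1 | 0.2 |
| PaCO2 | 0.1 | 0.1 | 0.1 | 0.2 | 0 | 0.1 |
| BUN | 0.1 | 0.1 | 0 | 0 | 0.2 | 0 |
| Creatinine | 0.1 | 0.1 | 0 | 0 | 0.3 | 0 |
| Glucose | 0.1 | 0 | 0.2 | 0 | 0 | 0.1 |
| Na | 0.8 | 0.3 | 0 | 0.9 | 0.6 | 0.1 |
| FiO2 | 0.1 | 0.1 | 0.1 | 0.2 | 0.1 | 0.2 |
| PaO2 | 0.1 | 0.1 | 0.1 | 0.1 | 0.3 | 0.2 |
| HCT | 0.2 | 0.1 | 0.2 | 0 | 0.1 | 0.1 |
| Age | 0.1 | 0.2 | 0.1 | 0.1 | 0.1 | 0 |
| Gender | 0 | 0 | 0 | 0 | 0.1 | 0 |
| NIDiasABP | 0 | 0.1 | 0.1 | 0 | 0 | 0 |
| NIMAP | 0.1 | 0 | 0.2 | 0 | 0.1 | 0 |
| Urine | 0 | 0.1 | 0 | 0 | 0 | 0.1 |
| Weight | 0.2 | 0.1 | 0.5 | 0.2 | 0 | 0 |
| Albumin | 0.1 | 0.1 | 0.1 | 0 | 0.2 | 0.1 |
| ALP | 0.1 | 0 | 0 | 0.1 | 0.2 | 0.1 |
| ALT | 0.3 | 0.1 | 0.4 | 0 | 0.1 | 0 |
| AST | 0.3 | 0 | 0.5 | 0.1 | 0.3 | 0 |
| TroponinI | 0 | 0.2 | 0 | 0.7 | 0 | 0.1 |
| Platelets | 0.2 | 0 | 0.5 | 0 | 0.2 | 0 |
| WBC | 0.2 | 0.1 | 0 | 0 | 0.1 | 0 |
| SysABP | 0.3 | 0 | 0.3 | 0.4 | 0.1 | 0 |
| HCO3 | 0.5 | 0.1 | 0.1 | 0.1 | 0.1 | 0 |
| Mg | 0.2 | 0.1 | 0.1 | 0 | 0 | 0 |
| K | 0.1 | 0.1 | 0.2 | 0.1 | 0 | 0.1 |
| Cholesterol | 0.1 | 0.1 | 0.2 | 0.2 | 0.5 | 0 |
| TroponinT | 1 | 0.1 | 0.2 | 0.2 | 0.1 | 0.1 |
| Temp | 0.1 | 1 | 0.5 | 0.1 | 0.6 | 0.1 |
| Bilirubin | 0.2 | 0.5 | 1 | 0.3 | 0 | 0 |
| RespRate | 0.2 | 0.1 | 0.3 | 1 | 0.1 | 0 |
| SaO2 | 0.1 | 0.6 | 0 | 0.1 | 1 | 0.1 |
| Time.Interval | 0.1 | 0.1 | 0 | 0 | 0.1 | 1 |
| HypertensionState | 0.1 | 0.1 | 0.1 | 0 | 0.2 | 0.1 |
| DiabetesState | 0.2 | 0 | 0.2 | 0 | 0.1 | 0.1 |
| IMC | 0 | 0.1 | 0.8 | 0.2 | 0.1 | 0 |
| GFR | 0.1 | 0.1 | 0 | 0 | 0.2 | 0 |
| InitialScr | 0.1 | 0.1 | 0.1 | 0.1 | 0 | 0 |
| ICUType | 0.4 | 0.1 | 0 | 0.1 | 0.2 | 0 |
| Stage | 0.1 | 0.1 | 0.1 | 0 | 0.2 | 0.1 |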
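Dependency analysis 6

| Attribute | HypertensionState | DiabetesState | IMC | TFG | InitialScr | ICUType | Stage |
|---|---|---|---|---|---|---|---|
| DiasABP | 0.3 | 0 | 0.1 | 0.1 | 0.2 | 0.1 | 0.1 |
| HR | 0.1 | 0 | 0 | 0 | 0.1 | 0.1 | 0 |
| MAP | 0.3 | 0 | 0 | 0.1 | 0.1 | 0.2 | 0.1 |
| GCS | 0 | 0.1 | 0 | 0 | 0 | 0.1 | 0 |
| pH | 0 | 0.3 | 0.3 | 0.4 | 0.2 | 0.1 | 0.3 |
| Lactate | 0 | 0.1 | 0 | 0.1 | 0 | 0.1 | 0 |
| PaCO2 | 0 | 0.1 | 0.2 | 0.1 | 0.1 | 0 | 0.1 |
| BUN | 0.1 | 0.1 | 0.1 | 0.8 | 0.1 | 0.1 | 0.6 |
| Creatinine | 0 | 0.1 | 0.1 | 1 | 0.1 | 0.1 | 0.8 |
| Glucose | 0 | 0.6 | 0.1 | 0.1 | 0 | 0 | 0 |
| Na | 0 | 0.2 | 0.7 | 0.2 | 0.1 | 0.1 | 0.2 |
| FiO2 | 0.1 | 0 | 0.1 | 0 | 0 | 0.1 | 0 |
| PaO2 | 0 | 0 | 0.1 | 0.2 | 0 | 0 | 0.1 |
| HCT | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Age | 0.1 | 0 | 0.2 | 0.3 | 0.6 | 0.2 | 0.1 |
| Gender | 0 | 0 | 0 | 0 | 0.8 | 0 | 0 |
| NIDiasABP | 0.3 | 0 | 0.1 | 0.1 | 0.2 | 0.1 | 0.1 |
| NIMAP | 0.3 | 0 | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 |
| Urine | 0 | 0 | 0.1 | 0.2 | 0.1 | 0 | 0.2 |
| Weight | 0 | 0.1 | 0.9 | 0 | 0.4 | 0.1 | 0 |
| Albumin | 0 | 0 | 0.1 | 0.1 | 0.1 | 0.2 | 0.1 |
| ALP | 0 | 0 | 0.1 | 0.2 | 0 | 0 | 0.2 |
| ALT | 0.1 | 0 | 0 | 0 | 0.2 | 0 | 0 |
| AST | 0.1 | 0 | 0 | 0.1 | 0.1 | 0.1 | 0.1 |
| TroponinI | 0.1 | 0.1 | 0.2 | 0.2 | 0.1 | 0.2 | 0.1 |
| Platelets | 0.1 | 0 | 0.1 | 0 | 0 | 0.1 | 0 |
| WBC | 0 | 0 | 0.2 | 0.1 | 0 | 0.1 | 0.1 |
| SysABP | 0.9 | 0 | 0.1 | 0.1 | 0 | 0.2 | 0.2 |
| HCO3 | 0 | 0.1 | 0 | 0.2 | 0.1 | 0.2 | 0.2 |
| Mg | 0 | 0 | 0.2 | 0.2 | 0.1 | 0.2 | 0.2 |
| K | 0.1 | 0 | 0.2 | 0.4 | 0.2 | 0 | 0.4 |
| Cholesterol | 0.1 | 0 | 0.4 | 0.1 | 0.1 | 0.1 | 0.1 |
| TroponinT | 0.1 | 0.2 | 0 | 0.1 | 0.1 | 0.4 | 0.1 |
| Temp | 0.1 | 0 | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 |
| Bilirubin | 0 | 0.2 | 0.8 | 0 | 0.1 | 0 | 0.1 |
| RespRate | 0.1 | 0 | 0.2 | 0 | 0.1 | 0.1 | 0 |
| SaO2 | 0.2 | 0.1 | 0.1 | 0.2 | 0 | 0.2 | 0.2 |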
Time.Interval
0.1
0
0
0
0
0.1
HypertensionState
1
0
0
0
0.1
0.1
0
DiabetesState
0
1
0.1
0
0.1
IMC
0
0.1
1
0.1
0
GFR
0
1
0.2
0.1
InitialScr
0.1
0
0.1
0.2
1
0.1
ICUType
0.1
0.1
0.1
1
Stage
0
0.1
0
1
