Abstract
Introduction
Telehealthcare is an important trend in health management and care. Most telehealthcare applications focus on care either for patients with chronic diseases or for elderly patients. Min-Sheng General Hospital in Taiwan has offered a telehealthcare service platform, “Smart Care,” specifically for patients discharged from the hospital. To use medical resources most effectively, doctors allow patients to be discharged once they have recovered sufficiently that their medical needs can be met by themselves or by a caregiver. However, such patients need to be monitored remotely, given nursing suggestions and instructions, and, if necessary, told to return to the hospital.
Figure 1 illustrates the telehealthcare service platform of Smart Care. Patients discharged from the hospital can join the service on the recommendation of their doctors. Patients regularly measure vital signs at home according to the measurement prescription issued by their doctors. They then upload the measured data and report their current health status and symptoms through the home gateway or an interactive voice response system. The nursing team at the call center, which is composed of professional nurses and doctors, phones patients periodically to assess their health status and to address any concerns. The nursing team gives nursing suggestions and instructions to the patient or caregiver by drawing on their professional training and experience and with assistance from information systems.

Figure 1. The telehealthcare service platform of Smart Care.
A telehealthcare system for patients discharged from the hospital has special needs. The nursing team at the call center must judge whether patients need to return to the hospital for further checkup, based on the symptoms described by the patients themselves and the clinical histories of the patients. Symptom data are important keys to assessing recovery progress for patients who were recently discharged from the hospital. However, symptom data are difficult to quantify because each patient case is unique, and the patients' own observations and perceptions of their symptoms are subjective.
Several studies have examined information systems that aid doctors in making clinical decisions. A clinical decision support system (CDSS) links health observations with health knowledge to influence clinicians' health choices and thereby improve healthcare. A CDSS uses several items of patient data to generate case-specific advice. 1
The first well-known CDSS, AAPHelp, was developed in 1972. It was a computer-aided diagnosis system to support clinical assessment and decision-making for acute abdominal pain and was based on clinical evidence and best practices from the United Kingdom and Europe. It implemented an electronic data collection protocol and prompted clinicians to make a thorough and accurate clinical assessment. It provided definitions of clinical symptoms and signs, access to large databases of information about patients with acute abdominal pain, and a display of real outcomes for patients with clinical presentation similar to that of the patient who was being evaluated. 2
DXplain and Quick Medical Reference were successful and commercialized systems originating in the 1980s. DXplain was developed in 1987 and used a set of clinical findings (signs, symptoms, and laboratory data) to produce a ranked list of diagnoses that might explain (or be associated with) the clinical manifestations. DXplain included 2,200 diseases and 5,000 symptoms in its knowledge base. DXplain provided justification for why each of these diseases should be considered, suggested what further clinical information would be useful to collect for each disease, and listed what clinical manifestations, if any, would be unusual or atypical for each of the specific diseases. 3
Quick Medical Reference, developed in 1989, was a diagnostic decision support system with a knowledge base of diseases, diagnoses, findings, disease associations, and laboratory information. It was designed for three types of purposes: as an electronic textbook, as an intermediate-level spreadsheet for the combination and exploration of simple diagnostic concepts, and as an expert consultant program with information from the primary medical literature on almost 700 diseases and more than 5,000 symptoms, signs, and laboratory values. 4
Previous studies show that CDSSs provide clinical decision support by correlating patient symptom data, diseases, and patient outcomes using information technologies (such as expert systems and machine learning). CDSSs have been used in medical and healthcare practice for several years, but they have rarely been applied to telehealthcare.
This article presents the development of a telehealthcare decision support system (TDSS) for patients recently discharged from the hospital. The TDSS collects symptom data from patients and clinical histories from the hospital information system and uses machine learning algorithms to generate a predictive model that classifies patients by the degree of urgency with which they should return to the hospital. In this telehealthcare scenario, that is probably the most important decision made by the nursing team at the call center, and it has a critical impact on both the medical cost and the recovery progress of the patient.
Materials and Methods
System Overview
Figure 2 depicts the flowchart for constructing the proposed system. In the training phase, the training data are preprocessed data composed of patients' raw data and manually validated results (the actual clinical histories of the patients returning to the hospital). The predictive model is built by a machine learning algorithm with the training data as input and can be updated periodically, improving the precision of prediction as the number of patient cases increases. In the prediction phase, the test data are preprocessed data composed of new patients' raw data. Test data are classified by the predictive model to generate prediction results (a degree of urgency for the patient to return to the hospital). Each item in the flowchart is described below.

Figure 2. The flowchart of constructing the telehealthcare decision support system.
Input Data
“Patients' raw data” are the combination of symptom data and clinical histories of the patients. Symptom data are observed at home by the patients themselves. Clinical histories are input from the hospital information system. “Data preprocessing” converts the text data of clinical records into numeric options. After data preprocessing, data are organized into 49 parameters for each patient case. Values for each parameter are simplified into positive integers. Parameters that involve a date are input as the difference in days from the relevant date. These 49 parameters are categorized as follows:
1. Status of the patient, which includes a survey of medication compliance, sleep conditions and emotions, date of discharge from hospital, etc. (Table 1)
2. Symptoms of the patient, which include a survey of symptoms of pain, redness, swelling, and wound fluid, such as frequency of pain, level of pain, temperature to touch of the wounded area, color of wound fluid, etc. (Table 2)
3. Observations of the wound, which include locations of wounds, largest wound size, etc. (Table 3)
4. Clinical history of surgery, which includes the primary department for diagnosis, surgery method, and surgery date (Table 4)
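The preprocessing described above can be illustrated with a short Python sketch. It is illustrative only: the field names, the answer-to-integer code table, and the use of the reporting day as the reference date are assumptions made for this example and are not the actual 49 parameters of the system.

```python
from datetime import date

# Hypothetical code table: maps a survey answer to a positive integer option.
PAIN_FREQUENCY_CODES = {"never": 1, "sometimes": 2, "often": 3, "always": 4}


def days_between(event_date, reference_date):
    """Difference in whole days between a past event and the reference date."""
    return (reference_date - event_date).days


def encode_case(raw, reference_date):
    """Turn one raw patient record (a dict) into numeric parameters."""
    return {
        "pain_frequency": PAIN_FREQUENCY_CODES[raw["pain_frequency"]],
        "days_since_discharge": days_between(raw["discharge_date"], reference_date),
        "days_since_surgery": days_between(raw["surgery_date"], reference_date),
    }


# Made-up record, not study data.
raw_case = {
    "pain_frequency": "often",
    "discharge_date": date(2013, 3, 1),
    "surgery_date": date(2013, 2, 25),
}
encoded = encode_case(raw_case, reference_date=date(2013, 3, 8))
print(encoded)
```

Keeping the code table in one dictionary per parameter makes it easy to check that every text answer maps to exactly one integer option.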
Table 1. Parameters for Input: Status of the Patient
Table 2. Parameters for Input: Symptoms of the Patient
Table 3. Parameters for Input: Observations of the Wound
Table 4. Parameters for Input: Clinical History of Surgery
Prediction Result
The prediction result of the TDSS is a single output, the urgency degree for the patient to return to the hospital, with five possible options of assigned degrees of urgency (Table 5). In other words, each patient case will be classified into one of these five groups.
Table 5. The Prediction Result of the Telehealthcare Decision Support System
Manual Validation of Training Data
The training data of patient cases are validated manually against the patients' actual clinical histories of returning to the hospital (Table 6). This “retrospective chart review” provides clinical evidence-based results. Table 7 describes how each patient case (instance) is manually validated against his or her actual clinical history of returning to the hospital.
Table 6. Clinical History of Returning to the Hospital
Table 7. Logic of Manual Validation of Clinical History of Returning to the Hospital
In 1 year of clinical practice of the Smart Care telehealthcare service, data for 1,568 patients were collected. This study was performed in accordance with the ethical standards of the 1964 Declaration of Helsinki. Participants gave informed consent prior to their inclusion in the study. Table 8 shows the distribution for the primary departments for diagnosis of the 1,568 patients.
Table 8. Distribution for the Primary Departments for Diagnosis of the 1,568 Patients
Data from only 1,467 of these patient cases were used in this study. The remaining 101 patient cases were excluded because they lacked clinical histories of returning to the hospital. The distribution of the validated output values for these patients is shown in Table 9. For most of the patients, the need to return to the hospital was validated as either “tracking needed in 1 week,” with a degree of urgency of 3 (67.6%), or “no need for advanced tracking,” with a degree of urgency of 1 (19.7%). This distribution also sets the baseline precision of classification at 67.6%: the accuracy obtained by simply assigning every patient to the largest group.
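The baseline can be made concrete with a minimal Python sketch of a majority-class predictor. The label list below is toy data chosen for illustration, not the distribution in Table 9, and the function name is hypothetical.

```python
from collections import Counter


def majority_baseline(labels):
    """Return (most common label, fraction of cases carrying that label)."""
    label, count = Counter(labels).most_common(1)[0]
    return label, count / len(labels)


# Toy urgency degrees for ten hypothetical cases, NOT the study data.
toy_labels = [3, 3, 3, 3, 3, 3, 3, 1, 1, 2]
majority_label, baseline = majority_baseline(toy_labels)
print(majority_label, baseline)
```

A classifier is only useful for this task if it beats this fraction, which is why the results are reported relative to 67.6%.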
Table 9. Distribution of Validated Output Value of the 1,467 Patients
Machine Learning Algorithms
The core of the TDSS is the predictive model. In this application scenario for patients discharged from the hospital, the predictive model is built and updated by machine learning algorithms that learn from the symptom data and clinical histories in the training data to classify patient cases into five groups, one for each degree of urgency for the patient to return to the hospital. This study uses supervised learning, in which the labels of the desired outputs (the five degrees of urgency) are known. Five well-known machine learning algorithms were evaluated: Bayesian network, decision trees, logistic regression, neural networks, and support vector machines. These statistics-based methods have proven effective for classifying complicated data in previous studies. The predictive model with the best performance was selected and deployed in the TDSS.
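To illustrate supervised classification on integer-coded parameters, the following Python sketch implements a categorical naive Bayes classifier, which is the simplest special case of a Bayesian network (every feature depends only on the class). This is a swapped-in illustration: the text does not describe the structure of the Bayesian network actually used, and the class name, smoothing parameter, and toy data are all assumptions.

```python
import math
from collections import Counter, defaultdict


class CategoricalNaiveBayes:
    """Naive Bayes for integer-coded features, with additive smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing strength (hypothetical default)

    def fit(self, rows, labels):
        self.class_counts = Counter(labels)
        # value_counts[(feature index, class)][value] -> number of cases
        self.value_counts = defaultdict(Counter)
        # distinct values seen per feature, used in the smoothing denominator
        self.values = [set() for _ in rows[0]]
        for row, label in zip(rows, labels):
            for j, value in enumerate(row):
                self.value_counts[(j, label)][value] += 1
                self.values[j].add(value)
        return self

    def predict(self, row):
        total = sum(self.class_counts.values())
        best_label, best_logp = None, -math.inf
        for label, n_class in self.class_counts.items():
            # log prior plus the sum of log conditional probabilities
            logp = math.log(n_class / total)
            for j, value in enumerate(row):
                num = self.value_counts[(j, label)][value] + self.alpha
                den = n_class + self.alpha * len(self.values[j])
                logp += math.log(num / den)
            if logp > best_logp:
                best_label, best_logp = label, logp
        return best_label


# Toy cases, NOT study data: (pain level, redness) -> degree of urgency.
rows = [(3, 2), (3, 2), (3, 1), (1, 1), (1, 1), (1, 2)]
labels = [4, 4, 4, 1, 1, 1]
model = CategoricalNaiveBayes().fit(rows, labels)
print(model.predict((3, 2)), model.predict((1, 1)))
```

Working in log probabilities avoids numerical underflow when many parameters, such as the 49 used here, are multiplied together.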
Predictive models were generated by these five classic machine learning algorithms. The performance of each predictive model was evaluated in three ways: a standard experiment, 10-fold cross-validation, and leave-one-out cross-validation (LOOCV), all of which are common techniques for estimating the performance of predictive models. In the standard experiment, the original samples are randomly partitioned into 70% for training, 20% for validation, and 10% for testing. In the 10-fold cross-validation, the original samples are randomly partitioned into 10 subsets. A single subset is retained as the validation data for testing the model, and the remaining nine subsets are used as training data. This step is repeated 10 times, so that each subset is used exactly once as the validation data. Finally, the 10 results are averaged to produce a single performance estimate. The advantage of this method is that all observations are used for both training and validation, and each observation is used for validation exactly once. In the LOOCV, a single observation from the original sample is used as the validation data, and the remaining observations are the training data. This is repeated such that each observation in the sample is used once as the validation data. The LOOCV is particularly suited to a sparse dataset because it trains on as many examples as possible, which increases the precision of the predictive model.
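The three evaluation protocols can be sketched in plain Python as index-splitting helpers. This is a minimal sketch under stated assumptions: the function names, the fixed random seed, the rounding of the 70% and 10% shares, and the `train_and_predict` callback are all hypothetical and do not describe the tooling used in the study.

```python
import random


def holdout_split(n, train_frac=0.7, test_frac=0.1, seed=0):
    """Shuffle case indices and split them into train / validation / test."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = round(train_frac * n)
    n_test = round(test_frac * n)
    train = idx[:n_train]
    test = idx[n_train:n_train + n_test]
    validation = idx[n_train + n_test:]  # whatever remains (about 20%)
    return train, validation, test


def k_fold_splits(n, k=10, seed=0):
    """Yield (train, validation) index lists; each case is validated once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, folds[i]


def loocv_splits(n):
    """Leave-one-out: every case is the validation set exactly once."""
    for i in range(n):
        yield [j for j in range(n) if j != i], [i]


def mean_fold_accuracy(splits, train_and_predict, labels):
    """Average the per-fold accuracy over all (train, validation) splits."""
    scores = []
    for train, validation in splits:
        predicted = train_and_predict(train, validation)
        correct = sum(p == labels[i] for p, i in zip(predicted, validation))
        scores.append(correct / len(validation))
    return sum(scores) / len(scores)


n_cases = 1467
train_idx, validation_idx, test_idx = holdout_split(n_cases)
print(len(train_idx), len(validation_idx), len(test_idx))
print(sorted({len(train) for train, _ in k_fold_splits(n_cases)}))
```

Because 1,467 is not divisible by 10, the folds differ in size by one case, so each 10-fold training set holds 1,320 or 1,321 cases, while every LOOCV training set holds 1,466.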
Results
The performances of the five classic machine learning algorithms in this application scenario were evaluated and compared. For the standard experiment (Table 10), the collected data from the 1,467 patient cases were separated into a training set (70%), a validation set (20%), and a test set (10%). Each machine learning algorithm was trained on the training set of 1,027 instances, and its performance was evaluated on the test set of 147 instances. The Bayesian network had the best performance in this experiment. It correctly classified 112 of the 147 test instances (76.2%), which is 8.6 percentage points higher than the baseline precision (67.6%).
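The arithmetic behind these figures is simple enough to check directly. The short Python sketch below uses only the numbers quoted in the text; the function and variable names are illustrative.

```python
def accuracy(correct, total):
    """Fraction of instances that were classified correctly."""
    return correct / total


baseline = 0.676                 # share of the largest class, from the text
bayes_net = accuracy(112, 147)   # test-set result of the Bayesian network
gain_points = 100 * (bayes_net - baseline)

# prints "76.2% correct, 8.6 points above baseline"
print(f"{100 * bayes_net:.1f}% correct, {gain_points:.1f} points above baseline")
```

The gain is a difference between two percentages, so it is expressed in percentage points and not as a relative improvement.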
Table 10. Performance Comparison Among Machine Learning Algorithms by Using the Training Set (70%), Validation Set (20%), and Test Set (10%)
Each predictive model generated by the different machine learning algorithms was also trained and validated by 10-fold cross-validation and LOOCV (Table 11). Tree J48, the decision tree algorithm, had the best performance in both validations. It correctly classified 1,166 instances (79.5%, 11.9 percentage points higher than the baseline precision) in the 10-fold cross-validation and 1,174 instances (80.1%, 12.4 percentage points higher than the baseline precision) in the LOOCV. The performance of the Bayesian network was very close.
Table 11. Performance Comparison Among Machine Learning Algorithms by Using the 10-Fold Cross-Validation and Leave-One-Out Cross-Validation
Note that in the first experiment (Table 10), the predictive models were trained on 1,027 instances. In the second and third experiments (Table 11), the predictive models were trained on 1,321 instances (10-fold cross-validation) and 1,466 instances (LOOCV). The performance of all machine learning algorithms improved as more patient cases (instances) were used for training. The best performance was 80.1%, with 1,466 instances learned. The performance is expected to continue improving with an increased number of instances and further refinement of the input parameters. In this study, the same 49 input parameters were used for all patients because enough patient cases had to be pooled for machine learning. Better performance may result if different input parameters are designed for patients from different departments of diagnosis.
On the other hand, the input parameters may contain many redundant or irrelevant features that provide no useful information. If the number of input parameters can be reduced without apparent loss of precision, both the cost of collecting patient data and the complexity of classifying patient cases will be reduced, which results in a more practical system. Feature selection is a technique for selecting such a subset of relevant features from the input parameters. It also helps identify the most important input parameters in the model.
After training was completed, this research used the “best first” technique for feature selection to find which features (parameters) are important for prediction. Table 12 lists the six most important features suggested for prediction. Note that three of these features are related to the frequency and level of pain reported by the patients. As shown in Table 13, the performances of the predictive models trained on only these six features were close to those of the models trained on the full 49 features. Again, the Bayesian network had the best all-around performance, and it was therefore selected as the machine learning algorithm of the TDSS.
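The idea of a best-first search over feature subsets can be sketched in Python as follows. This is a generic forward best-first search and not the specific implementation or subset evaluator used in the study, neither of which is described in the text. The `score` callback, the `patience` stopping rule, and the toy relevance values are assumptions made for illustration.

```python
import heapq
from itertools import count


def best_first_select(n_features, score, patience=5):
    """Forward best-first search over feature subsets.

    score: callable taking a frozenset of feature indices and returning a
    number (higher is better). The search always expands the best-scoring
    unexpanded subset and stops after `patience` consecutive expansions
    that fail to improve on the best subset found so far.
    """
    start = frozenset()
    best_subset, best_score = start, score(start)
    tie = count()  # tie-breaker so the heap never compares frozensets
    heap = [(-best_score, next(tie), start)]
    visited = {start}
    stale = 0
    while heap and stale < patience:
        _, _, subset = heapq.heappop(heap)
        improved = False
        for feature in range(n_features):
            child = subset | {feature}
            if feature in subset or child in visited:
                continue
            visited.add(child)
            child_score = score(child)
            heapq.heappush(heap, (-child_score, next(tie), child))
            if child_score > best_score:
                best_subset, best_score = child, child_score
                improved = True
        stale = 0 if improved else stale + 1
    return best_subset, best_score


# Toy merit function, NOT from the study: reward relevant features and
# charge a fixed cost per feature so that small subsets are preferred.
toy_relevance = [0.9, 0.1, 0.8, 0.0, 0.05]


def toy_score(subset):
    return sum(toy_relevance[f] for f in subset) - 0.2 * len(subset)


selected, merit = best_first_select(len(toy_relevance), toy_score)
print(sorted(selected), round(merit, 2))
```

Unlike a purely greedy search, best-first search keeps previously scored subsets in the queue, so it can back up and expand an earlier candidate when the current path stops improving.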
Table 12. The Result of Feature Selection
Table 13. Performance of Predictive Models That Are Trained by Six Features
During the 1-year period, the nursing team at the call center was also asked to make recommendations (predictions) of the degree of urgency for each of these 1,467 patients to return to the hospital, based on symptoms and clinical histories. Table 14 shows the result of this manual classification as a confusion matrix. A confusion matrix, also known as a contingency table or an error matrix, is a table layout that visualizes the performance of a classification. Each column of the matrix represents the instances in a predicted class, and each row represents the instances in an actual class. The name stems from the fact that the matrix makes it easy to see whether the classifier is confusing two classes (i.e., commonly mislabeling one as another). The numbers on the diagonal, marked in boldface type, are the correctly classified patient cases (instances). The nurses correctly classified 1,173 instances, for a precision of 79.96%, which is slightly lower than that of the TDSS (80.08%). A detailed comparison is shown in Table 15.
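The following Python sketch shows how the overall fraction correct and per-class rates can be read off a confusion matrix laid out as described (rows are actual classes, columns are predicted classes). The 3×3 matrix is made-up toy data, not Table 14, and treating each class one-vs-rest to obtain TP, FP, FN, and TN is an assumption, since the text does not state how the counts in Table 15 were derived.

```python
def overall_accuracy(matrix):
    """Sum of the diagonal divided by the total number of instances."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def _ratio(numerator, denominator):
    """Safe division: returns NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def one_vs_rest(matrix, k):
    """Counts and rates for class k against all other classes combined."""
    total = sum(sum(row) for row in matrix)
    tp = matrix[k][k]
    fn = sum(matrix[k]) - tp                 # actual k, predicted otherwise
    fp = sum(row[k] for row in matrix) - tp  # predicted k, actually otherwise
    tn = total - tp - fn - fp
    return {
        "TP": tp, "FN": fn, "FP": fp, "TN": tn,
        "PPV": _ratio(tp, tp + fp),
        "NPV": _ratio(tn, tn + fn),
        "FPR": _ratio(fp, fp + tn),
        "FNR": _ratio(fn, fn + tp),
    }


# Toy matrix, NOT study data: rows = actual class, columns = predicted class.
toy_matrix = [
    [5, 1, 0],
    [2, 6, 1],
    [0, 1, 4],
]
print(overall_accuracy(toy_matrix))
print(one_vs_rest(toy_matrix, 0))
```

Applied to the figures quoted in the text, the same diagonal-over-total calculation gives 1,173 / 1,467 ≈ 79.96% for the nursing team.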
Table 14. The Confusion Matrix of the Nursing Team (Clinical Personnel)
Table 15. Performance Comparison of the Telehealthcare Decision Support System and the Nursing Team
FN, false negative; FNR, false-negative rate; FP, false positive; FPR, false-positive rate; NPV, negative predictive value; PPV, positive predictive value; TDSS, telehealthcare decision support system; TN, true negative; TP, true positive.
Discussion and Conclusions
This article presents the development of the TDSS for patients recently discharged from the hospital. Symptom data from patients and clinical histories from the hospital information system were collected, and a predictive model was built to provide a degree of urgency for the patient to return to the hospital. The performance of predictive models generated by five classic machine learning algorithms was evaluated, and the Bayesian network was finally selected for implementation in the TDSS. Based on the 1,467 patient cases collected over a 1-year period, the rate of correct prediction by the TDSS is comparable to that of the nursing team at the call center.
The predictive model trained on the six most important features has satisfactory performance. Therefore, the input to the TDSS can be reduced to these six features. The performance of the TDSS is expected to improve continuously as more patient cases are collected and input into the TDSS.
This TDSS has been implemented in one of the largest commercialized telehealthcare practices in Taiwan administered by Min-Sheng General Hospital and is currently assisting the nursing team at the call center to make judgments on the need for patients to return to the hospital for further checkup.
Acknowledgments
This research is sponsored by the Department of Industrial Technology, Ministry of Economic Affairs, Taiwan, the National Science Council, Taiwan, and the Ministry of Education, Taiwan. This research is also supported by Smart Care Inc., Taiwan. This support is gratefully acknowledged.
Disclosure Statement
M.-S.H. and C.-M.C. are employees of Smart Care Inc. H.L. and Y.-L.H. declare no competing financial interests exist.
