Abstract
Introduction
The most common method of evaluating the effectiveness of a new drug or clinical intervention is the randomized controlled trial, in which subjects are randomly assigned to a treatment group or a control group, and the effect is compared between the two groups. The most serious problem in randomized controlled trials is avoiding systematic differences between the two groups, which is referred to as sample selection bias. Subjects must be selected randomly, and the two groups must be as similar as possible, to obtain an unbiased evaluation. Such unbiased sampling is not always achieved because users and sampled subjects can differ in many ways. A suitable method of matching the two groups to obviate such bias is therefore required. Traditional ways of coping with this problem include the matched sampling method, or matched-pairs analysis. This method selects subjects for the control group who are similar to subjects in the treatment group on criteria such as age, sex, or health status. In the field of telemedicine, Sohn et al., 1 for example, matched each of their treatment subjects with four control subjects having similar demographics and morbidity status. Ambiguity remains with this method, however: how many control subjects per treatment subject are sufficient to eliminate bias, and by how much is bias actually reduced? Moreover, sampling becomes more difficult as the number of criteria increases, and if the number of criteria must remain small, selection bias will remain.
A more rigorous method of overcoming selection bias is the propensity score matching (PSM) method, which enables the inclusion of as many criteria as necessary. A propensity score related to biased characteristics is first calculated for each individual, and then outcome variables, such as medical expenditures, are compared for individuals whose scores are close. One treatment subject is matched to one control subject who has similar characteristics, reducing sample selection bias. Moreover, the actual decrease in bias after matching can be calculated.
PSM has a long and varied history of use in medical research. Among studies of clinical interventions, for example, Ayanian et al. 2 examined the association between mortality and ambulatory visits to cardiologists, internists, and family practitioners after discharge for myocardial infarction. Using PSM to adjust for the demographic, clinical, and hospital characteristics of patients, they successfully ranked treatments among matched patients in terms of a reduction in mortality. Gum et al. 3 analyzed whether aspirin is associated with a mortality benefit in patients with coronary disease. Although simple univariate analysis found no association between aspirin use and mortality, adjustment by PSM for age, sex, and other characteristics, including risk factors, other medications, and coronary disease, did identify a decrease in mortality with aspirin. In drug evaluation, Wang et al. 4 compared conventional and atypical antipsychotic medication for mortality among elderly patients and used propensity-score adjustments to conclude that the former increased the risk of death.
Here, we used the PSM method to estimate the effect of telecare on medical expenditures. Although treatment-effects studies have been widely used in medicine, only a few studies have examined the effects of telemedicine using PSM. One example is the study by Chumbler et al. 5 of Care Coordination/Home Telehealth conducted by the Veterans Health Administration, although their analytical methods of PSM and estimation were less rigorous than those of the present study, and their outcome variables were restricted to hospital admission and days of hospitalization. To our knowledge, no other study has evaluated the effect of telecare on medical expenditures using PSM.
A second goal of this article was to examine the robustness of results. Given that results with PSM might differ by the matching method used, which is a limitation of this method, 6 we evaluated robustness with four matching methods: caliper matching, single nearest-neighbor matching, Epanechnikov kernel matching, and biweight kernel matching.
Subjects and Methods
Selection of Sample and Data Characteristics
The data used in this article were reported in our previous study. 7 From a total of 523 users and 3,528 nonusers in Nishi-aizu Town, Fukushima Prefecture, Japan, 199 users and 209 nonusers were selected through a questionnaire survey that asked about individual characteristics and use of telecare. Healthcare receipts for 5 years (2002–2006) were obtained from the National Health Insurance system and checked. The small number of users meant that the sampling was necessarily biased, as detailed in Table 1, which expresses biases by the difference between the averages of the two groups and uses t values to indicate the degree of bias for individual variables. Significant biases were identified in “chronic diseases,” “age,” “number of family,” “income,” “heart disease,” “high blood pressure,” “strokes,” “ophthalmic diseases,” and “anal diseases” and in subjective belief in the value of telecare on health status, termed “Effects 1–4.”
Test of Selection Biases
There were a total of 2,040 subjects (995 users, 1,045 nonusers). The t tests were one-tailed.
Significance level of 1%, 5%, and 10%, respectively.
The number of positive replies to the questionnaire item asking whether the subject had chronic diseases or not was substantially higher for the user than the nonuser group. Substantial corresponding bias was also seen with regard to the presence of heart disease, high blood pressure, and strokes and to the number of users treated for these conditions during the sample period. A question on subjective belief in the value of telecare on health status with respect to the four effects showed that users tended to have higher health consciousness than nonusers, which is consistent with anecdotal impressions expressed by the town's public nurses who manage the system.
PSM
PSM was initially proposed by Rosenbaum and Rubin 8 and developed by Heckman et al. 9 The procedure is as follows:
1. First, a propensity score is calculated for each subject from his or her attributes. The score is the predicted probability of being a user, obtained from a probit estimation in which the dependent variable is the user dummy and the independent variables are those that have a sample selection bias, as shown in Table 1.
2. Second, subjects in the user (treatment) and nonuser (control) groups are matched on the propensity score so that the scores within each matched pair are as close as possible. There are several ways of matching; caliper matching is generally considered better than others, such as nearest-neighbor matching, because it can exclude “bad” matches. 6 This article uses caliper matching, in which the maximum permitted distance between propensity scores is fixed in advance at 0.0001, which the PSM literature describes as sufficiently small. The suitability of the matching can be examined by a balancing test, in which the explanatory variables listed above are compared between the treatment and control groups by a t test; when a treatment subject does not meet its best-matched control, re-sampling by the bootstrapping method with 1,000 replications is conducted. If there is no statistically significant difference, the matching is concluded.
3. Finally, the effect of telecare on the outcome variables, which in this article are medical expenditures and number of days required for treatment, is examined on the matched samples by a t test (standard error estimation).
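The three steps above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the data are simulated, the probit uses a single hypothetical covariate `x`, matching is done with replacement, the caliper is widened to 0.01 (the article uses 0.0001 on a much larger sample), and the t value is a simple paired statistic rather than the bootstrap standard error reported in the article. All function and variable names are assumptions made for illustration.

```python
import random
from statistics import NormalDist, mean, stdev

NORMAL = NormalDist()  # standard normal distribution for the probit link


def fit_probit(x, d, iterations=25):
    """Step 1: fit P(d = 1 | x) = Phi(b0 + b1 * x) by Fisher scoring (one covariate)."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, di in zip(x, d):
            z = b0 + b1 * xi
            p = min(max(NORMAL.cdf(z), 1e-10), 1 - 1e-10)
            dens = NORMAL.pdf(z)
            s = dens * (di - p) / (p * (1 - p))  # score contribution
            w = dens * dens / (p * (1 - p))      # expected information
            g0, g1 = g0 + s, g1 + s * xi
            h00, h01, h11 = h00 + w, h01 + w * xi, h11 + w * xi * xi
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det        # solve the 2x2 system by hand
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def caliper_match(scores_t, scores_c, caliper):
    """Step 2: pair each treated unit with its nearest control, if within the caliper."""
    pairs = []
    for i, s in enumerate(scores_t):
        j = min(range(len(scores_c)), key=lambda k: abs(scores_c[k] - s))
        if abs(scores_c[j] - s) <= caliper:      # "bad" matches are discarded
            pairs.append((i, j))
    return pairs


def matched_effect(y_t, y_c, pairs):
    """Step 3: mean outcome difference over matched pairs and its paired t value."""
    diffs = [y_t[i] - y_c[j] for i, j in pairs]
    estimate = mean(diffs)
    return estimate, estimate / (stdev(diffs) / len(diffs) ** 0.5)


# Simulated data: x raises both the chance of being a user and the outcome,
# while use itself lowers the outcome by 10 (the effect to be recovered).
rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(600)]
d = [1 if 0.8 * xi + rng.gauss(0, 1) > 0 else 0 for xi in x]
y = [100 + 20 * xi - 10 * di + rng.gauss(0, 5) for xi, di in zip(x, d)]

b0, b1 = fit_probit(x, d)
score = [NORMAL.cdf(b0 + b1 * xi) for xi in x]
treated = [k for k in range(len(d)) if d[k] == 1]
control = [k for k in range(len(d)) if d[k] == 0]
y_t, y_c = [y[k] for k in treated], [y[k] for k in control]

naive = mean(y_t) - mean(y_c)  # unmatched comparison, distorted by selection
pairs = caliper_match([score[k] for k in treated],
                      [score[k] for k in control], caliper=0.01)
att, t_value = matched_effect(y_t, y_c, pairs)
print(f"naive difference: {naive:.2f}")
print(f"matched pairs: {len(pairs)}, matched effect: {att:.2f}, t: {t_value:.2f}")
```

In this simulation the unmatched difference has the wrong sign because users have systematically higher values of `x`, whereas the matched comparison recovers a negative effect. Treated units without a control inside the caliper are dropped, which is the price of excluding poor matches.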
Results
Summary statistics for the outcome variables, medical expenditures and days of treatment, are given in Table 2.
Summary Statistics for Outcome Variables
Max, maximum; Min, minimum; SD, standard deviation.
Bias Control
PSM thus calculates a propensity score by a probit model in which the dependent variable is the user dummy variable, whereas independent variables are selected based on whether they contain a selection bias. Whether matching based on the propensity score works is examined by a balancing test, the results of which are shown in Table 3. The column labeled “% of bias” indicates the percentages of bias contained before and after matching for each variable. For example, “age” has 28.3% bias before matching, which is reduced to 4.4% after matching. Similarly, the column labeled “% of reduced bias” shows the percentage of bias actually reduced by PSM, or 81.9% for age. The reduction in sample selection bias is thus successful because no statistically significant variable remains after matching in terms of t values. In particular, biases related to subjective belief in the value of telecare on health status shown by “Effects 1–4” are also substantially reduced.
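The article does not state how the “% of bias” and “% of reduced bias” columns were computed. A common definition in the PSM literature is the standardized percentage bias: the difference in group means divided by the square root of the average of the two group variances in the unmatched sample. The Python sketch below assumes that definition, so it need not reproduce the figures in Table 3; the ages and function names are hypothetical.

```python
from statistics import mean, variance


def standardized_bias(treated, control, ref_treated, ref_control):
    """Percent bias: mean difference scaled by the pooled SD of the unmatched groups."""
    pooled_sd = ((variance(ref_treated) + variance(ref_control)) / 2) ** 0.5
    return 100 * (mean(treated) - mean(control)) / pooled_sd


def bias_reduction(before, after):
    """Percent of the absolute bias removed by matching."""
    return 100 * (1 - abs(after) / abs(before))


# Hypothetical ages: all users, all nonusers, and the nonusers kept after matching.
users = [70, 72, 75, 78, 80]
nonusers = [60, 65, 68, 70, 74]
matched_nonusers = [69, 73, 74, 77, 81]

before = standardized_bias(users, nonusers, users, nonusers)
after = standardized_bias(users, matched_nonusers, users, nonusers)
reduction = bias_reduction(before, after)
print(f"bias before: {before:.1f}%, after: {after:.1f}%, reduced by {reduction:.1f}%")
```

Using the unmatched variances as the denominator both before and after matching keeps the two bias figures on the same scale, so the reduction reflects only the change in the mean difference.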
Result of Balancing Test
Effect of Telecare on Medical Expenditures and Days of Treatment
Finally, Table 4 summarizes the estimation results of PSM on the effect of telecare use on outpatient expenditures and number of outpatient days of treatment for chronic diseases, which are the main targets of Nishi-aizu's telecare system. The rows labeled “before” and “after” indicate estimations before and after matching. Four methods of PSM matching are examined: single nearest-neighbor matching, Epanechnikov kernel matching, biweight kernel matching, and caliper (0.0001) matching.
Result of Estimation Based on Propensity Score Matching
Standard errors (SEs) of caliper matching are based on the bootstrapping of 1,000 replications. Medical expenditure was reduced after matching, as indicated in the column “Difference,” and this is measured by “points” of the National Health Insurance system. One point is equivalent to 10 Japanese yen (USD 0.13).
Significance level of 5% and 10%, respectively.
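The notes to Table 4 mention bootstrap standard errors with 1,000 replications, but the article does not describe the resampling scheme. The Python sketch below shows one plausible version, assuming that each replication resamples users and nonusers separately with replacement and then repeats the caliper matching. The data are simulated, the caliper of 0.01 is wider than the article's 0.0001, and all names are hypothetical.

```python
import random
from bisect import bisect_left
from statistics import mean, stdev


def matched_effect(treated, controls, caliper):
    """Mean outcome difference over caliper-matched pairs.

    treated and controls are lists of (propensity score, outcome) tuples.
    Returns None when no treated unit finds a control within the caliper.
    """
    controls = sorted(controls)
    scores = [s for s, _ in controls]
    diffs = []
    for s, y in treated:
        k = bisect_left(scores, s)  # nearest control is at index k - 1 or k
        candidates = [j for j in (k - 1, k) if 0 <= j < len(scores)]
        j = min(candidates, key=lambda c: abs(scores[c] - s))
        if abs(scores[j] - s) <= caliper:
            diffs.append(y - controls[j][1])
    return mean(diffs) if diffs else None


def bootstrap_se(treated, controls, caliper, replications=1000, seed=0):
    """Standard deviation of the matched effect across bootstrap resamples."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(replications):
        t = rng.choices(treated, k=len(treated))    # resample users
        c = rng.choices(controls, k=len(controls))  # resample nonusers
        estimate = matched_effect(t, c, caliper)    # re-match in every replication
        if estimate is not None:
            estimates.append(estimate)
    return stdev(estimates)


# Simulated (score, outcome) data with a true effect of -5 for users.
rng = random.Random(2)
treated, controls = [], []
for _ in range(150):
    s = rng.uniform(0.2, 0.8)
    treated.append((s, 30 * s - 5 + rng.gauss(0, 2)))
for _ in range(150):
    s = rng.uniform(0.2, 0.8)
    controls.append((s, 30 * s + rng.gauss(0, 2)))

point = matched_effect(treated, controls, caliper=0.01)
se = bootstrap_se(treated, controls, caliper=0.01)
print(f"matched effect: {point:.2f}, bootstrap SE: {se:.2f}")
```

Repeating the matching inside each replication lets the standard error reflect the variability of the matching step itself, not only the variability of the outcomes among pairs that happened to be matched once.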
Table 4 shows that neither medical expenditures nor days of treatment for chronic diseases differed significantly between the two groups before matching, whereas after matching all four matching methods showed a significantly negative difference (p<0.05), implying that telecare reduces medical expenditures and days of treatment for chronic diseases. The column labeled “difference” indicates the decrease in the amount of expenditure and number of treatment days. Caliper matching provided the greatest effect, namely, a reduction in cost of Japanese yen (JPY) 39,936 (USD 499.20) and in treatment of 3.97 days per year per user, whereas Epanechnikov kernel matching produced the smallest, at JPY 25,538 (USD 319.23) and 2.60 days, respectively. Differences among the four methods are not large, however, and the results in Table 4 are accordingly robust to the choice of matching method.
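The two kernel methods compared above differ from caliper and nearest-neighbor matching in that each user is compared with a weighted average of all nonusers whose propensity scores fall within a bandwidth, rather than with a single nonuser. The Python sketch below illustrates this with the standard Epanechnikov and biweight kernel formulas. The bandwidth of 0.06 and the simulated data are assumptions, since the article does not report its bandwidth, and the function names are hypothetical.

```python
import random
from statistics import mean


def epanechnikov(u):
    """Epanechnikov kernel: 0.75 * (1 - u^2) on [-1, 1], zero elsewhere."""
    return 0.75 * (1 - u * u) if abs(u) <= 1 else 0.0


def biweight(u):
    """Biweight (quartic) kernel: (15/16) * (1 - u^2)^2 on [-1, 1], zero elsewhere."""
    return (15 / 16) * (1 - u * u) ** 2 if abs(u) <= 1 else 0.0


def kernel_effect(treated, controls, kernel, bandwidth):
    """Mean of (user outcome - kernel-weighted average of nonuser outcomes).

    treated and controls are lists of (propensity score, outcome) tuples.
    """
    diffs = []
    for s_t, y_t in treated:
        weights = [kernel((s_c - s_t) / bandwidth) for s_c, _ in controls]
        total = sum(weights)
        if total > 0:  # skip users with no nonuser inside the bandwidth
            counterfactual = sum(
                w * y_c for w, (_, y_c) in zip(weights, controls)
            ) / total
            diffs.append(y_t - counterfactual)
    return mean(diffs)


# Simulated data: the outcome rises with the score, and use lowers it by 8.
rng = random.Random(1)
treated, controls = [], []
for _ in range(500):
    s = rng.uniform(0.1, 0.9)
    is_user = rng.random() < s
    y = 50 + 40 * s - (8 if is_user else 0) + rng.gauss(0, 3)
    (treated if is_user else controls).append((s, y))

effect_epan = kernel_effect(treated, controls, epanechnikov, bandwidth=0.06)
effect_biw = kernel_effect(treated, controls, biweight, bandwidth=0.06)
print(f"Epanechnikov: {effect_epan:.2f}, biweight: {effect_biw:.2f}")
```

Because both kernels give the most weight to nonusers with the closest scores and none to those beyond the bandwidth, the two estimates are usually close to each other, which is consistent with the small differences among methods reported in Table 4.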
Discussion and Conclusions
Table 4 thus demonstrates that telecare does not contribute to a reduction in medical expenditures for all diseases, but only for chronic diseases, 10 because user expenditures for chronic diseases are larger than those of nonusers before matching but not significantly different after matching.
Table 5 compares the effects obtained by the other estimation methods, such as simple ordinary least squares 10 and System GMM, 7 used in our previous articles. The effects of telecare are underestimated when sample selection biases are not controlled for.
Comparison of Results Using Alternative Estimation Methods
Ordinary least squares (OLS) was from Akematsu and Tsuji, 10 System GMM was from Minetaki et al., 7 and propensity score matching was from the present study.
JPY, Japanese yen.
This article proposes a new estimation method for solving sample selection bias, a problem inherent in survey data. When bias is not eliminated, the estimation obtained deviates from the true value. These findings indicate the value of PSM in evidence-based research and the rigorous scientific methodologies required to conduct it.
Although PSM offers major benefits in the evaluation of telecare projects, it has its own limitations. First, it requires a large number of samples, and several previous studies have in fact used samples of several tens of thousands. Second, its results are not always robust, which is why our present article examines four matching methods. These limitations have been described, 8,9 but one limitation specific to telecare has not. In this article, PSM successfully demonstrated that the user group had lower medical expenditures than the nonuser group under the condition that all subjects were closely similar except in their use of telecare. Our previous study 10 concluded that these results were due to the difference in health consciousness between the groups. By checking health data transmitted by the telecare system, users became more concerned with health and had an incentive to change their behavior to be more health-promoting. These findings are not consistent with those of the present analysis, however, which found different expenditures despite a closely similar degree of health consciousness, which could only be due to telecare use. PSM thus provides little explanation of why and how telecare leads to these results, and identification of these mechanisms requires the use of other empirical methods together with PSM.
Footnotes
Disclosure Statement
No competing financial interests exist.
