Measuring the Effect of Telecare on Medical Expenditures Without Bias Using the Propensity Score Matching Method

Abstract

Objective: This article examines the effect of telecare on medical expenditures for chronic diseases using survey data from Nishi-aizu Town, Fukushima Prefecture, Japan. The study uses the propensity score matching (PSM) method, a rigorous analytical method that overcomes sample selection bias, a common problem when using survey data. Subjects and Methods: One hundred ninety-nine users (treatment) of telecare and 209 nonusers (control) were selected from residents, and their medical expenditures were obtained from the National Health Insurance scheme for comparison. Individual characteristics of the two groups, including age, sex, income, and health conditions, were compared, and variables that contained biases were specified by a t test. After calculation of their propensity scores and elimination of biases, the effect of telecare on medical expenditures was estimated. To obtain robust results, four different matching methods were applied: caliper matching, single nearest-neighbor matching, Epanechnikov kernel matching, and biweight kernel matching. Results: No independent variable showed significant differences between the two groups after matching, indicating that selection biases were successfully eliminated using PSM. Using PSM, we saw a decrease in medical expenditures in Japanese yen of 25,538–39,936 (USD 319.23–499.20) per year per user and a decrease in the number of treatment days of 2.6–4.0 days. In comparison, our previous analyses using the same data underestimated the effects of telecare. PSM provides greater effects by reducing bias. Conclusions: Using PSM to compare subjects in two groups with similar characteristics except for their use or nonuse of telecare, we demonstrated that the treatment group has lower medical expenditures for chronic diseases than the control group. Proper matching is important in evaluating the impact of telecare interventions. 
Limitations of PSM include its requirement for a large number of samples and the limited ability to explain why and how telemedicine produces these effects. Other empirical methods are required to identify the mechanism of how telemedicine works.

Introduction

The most common method of evaluating the effectiveness of a new drug or clinical intervention is the randomized controlled trial, in which subjects are randomly assigned to a treatment group or a control group and the effect is compared between the two groups. The most serious problem of randomized controlled trials is avoiding bias between the two groups, which is referred to as sample selection bias. Samples are required to be selected randomly without bias and must be as similar as possible between the two groups to obtain an unbiased evaluation. Such unbiased sampling is not always achieved, because in practice users and sampled subjects can differ in many ways. A suitable method of matching the two groups to obviate such bias is therefore required. Traditional ways of coping with this problem include the matched sampling method or matched-pairs analysis. This method selects subjects in the control group who are similar to subjects in the treatment group on criteria such as age, sex, or health status. In the field of telemedicine, Sohn et al.,¹ for example, matched each of their treatment subjects with four control subjects having similar demographics and morbidity status. Ambiguity with this method remains, however: how many control subjects per treatment subject are sufficient to eliminate bias, and by what degree is bias actually reduced? Moreover, sampling becomes more difficult as the number of criteria increases, and if the number of criteria must remain small, selection bias will remain.

A more rigorous method of overcoming selection bias is the propensity score matching (PSM) method, which enables the inclusion of as many criteria as necessary. A propensity score related to biased characteristics is first calculated for each individual, and then outcome variables, such as medical expenditures, are compared for individuals whose scores are close. One treatment subject is matched to one control subject who has similar characteristics, reducing sample selection bias. Moreover, the actual decrease in bias after matching can be calculated.

PSM has a long and varied history of use in medical research. Among studies of clinical interventions, for example, Ayanian et al.² examined the association between mortality and ambulatory visits to cardiologists, internists, and family practitioners after discharge for myocardial infarction. Using PSM to adjust for the demographic, clinical, and hospital characteristics of patients, they successfully ranked treatments among matched patients in terms of a reduction in mortality. Gum et al.³ analyzed whether aspirin is associated with a mortality benefit in patients with coronary disease. Although simple univariate analysis found no association between aspirin use and mortality, adjustment by PSM for age, sex, and other characteristics, including risk factors, other medications, and coronary disease, did identify a decrease in mortality with aspirin. In drug evaluation, Wang et al.⁴ compared conventional and atypical antipsychotic medication for mortality among elderly patients and used propensity-score adjustments to conclude that the former increased the risk of death.

Here, we used the PSM method to estimate the effect of telecare on medical expenditures. Although treatment-effects studies have been widely used in medicine, only a few studies have examined the effects of telemedicine using PSM. One example is the study by Chumbler et al.⁵ of the Care Coordination/Home Telehealth program conducted by the Veterans Health Administration, although their PSM and estimation methods were less rigorous than those of the present study, and their outcome variables were restricted to hospital admission and days of hospitalization. To our knowledge, no other study has evaluated the effect of telecare on medical expenditures using PSM.

A second goal of this article was to examine the robustness of results. Given that results with PSM might differ by the matching method used, which is a limitation of this method,⁶ we evaluated robustness with four matching methods: caliper matching, single nearest-neighbor matching, Epanechnikov kernel matching, and biweight kernel matching.

Subjects and Methods

Selection of Sample and Data Characteristics

The data used in this article were reported in our previous study.⁷ From a total of 523 users and 3,528 nonusers in Nishi-aizu Town, Fukushima Prefecture, Japan, 199 users and 209 nonusers were selected through a questionnaire survey that asked about individual characteristics and use of telecare. Healthcare receipts for 5 years (2002–2006) were obtained from the National Health Insurance system and checked. The small number of users meant that the sampling was necessarily biased, as detailed in Table 1, which expresses biases by the difference between the averages of the two groups and uses t values to indicate the degree of bias for individual variables. Significant biases were identified in “chronic diseases,” “age,” “number of family members,” “income,” “heart diseases,” “high blood pressure,” “stroke,” “ophthalmic diseases,” and “anal diseases” and in subjective belief in the value of telecare on health status, termed “Effects 1–4.”

Table 1.

Test of Selection Biases

VARIABLE	NONUSER	USER	T VALUE
Chronic diseases	0.388	0.466	−3.46^c
Sex	0.568	0.546	0.98
Age	68.894	71.629	−6.80^c
Education	1.579	1.571	0.21
Employment	0.532	0.520	0.53
Number of family members	2.401	2.945	−6.29^c
Income	3.274	2.96	2.61^c
Heart diseases	0.064	0.144	−6.03^c
High blood pressure	0.367	0.469	−4.61^c
Diabetes	0.081	0.087	−0.48
Stroke	0.045	0.059	−1.45^a
Respiratory diseases	0.129	0.116	0.92
Cancer	0.068	0.078	−0.86
Gastropathy	0.157	0.164	−0.40
Lumbago, arthritis	0.147	0.159	−0.7
Ophthalmic diseases	0.211	0.297	−4.43^c
Kidney diseases	0.029	0.021	1.16
Anal diseases	0.014	0.005	1.95^b
Effect 1: reduced anxiety in day-to-day life	0.962	1.076	−3.15^c
Effect 2: stabilization of illness	0.824	0.977	−5.00^c
Effect 3: enhancement of health consciousness	0.911	0.980	−2.19^b
Effect 4: decrease in medical expenditures	1.026	1.361	−7.75^c
Year 2002	0.243	0.135	5.97^c
Year 2003	0.206	0.191	0.84
Year 2004	0.206	0.191	0.84
Year 2005	0.175	0.238	−3.47^c
Year 2006	0.170	0.245	−4.15^c

There were a total of 2,040 observations (995 for users, 1,045 for nonusers), that is, 199 users and 209 nonusers each observed over the 5 years 2002–2006. The t testing was one-tailed.

a–c: Significance at the 10%, 5%, and 1% levels, respectively.

The number of positive replies to the questionnaire item asking whether the subject had chronic diseases or not was substantially higher for the user than the nonuser group. Substantial corresponding bias was also seen with regard to the presence of heart disease, high blood pressure, and strokes and to the number of users treated for these conditions during the sample period. A question on subjective belief in the value of telecare on health status with respect to the four effects showed that users tended to have higher health consciousness than nonusers, which is consistent with anecdotal impressions expressed by the town's public nurses who manage the system.
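The bias test behind Table 1 can be illustrated with a minimal sketch. It assumes a pooled-variance two-sample t statistic computed as nonuser mean minus user mean, which matches the sign convention of the table; the article does not state which variant of the t test was used, and the ages below are hypothetical values, not the survey data.

```python
import math
from statistics import mean, variance


def two_sample_t(control, treated):
    """Pooled-variance t statistic for mean(control) - mean(treated).

    Assumption: equal-variance form; the article only says "t test".
    """
    n_c, n_t = len(control), len(treated)
    pooled_var = (
        (n_c - 1) * variance(control) + (n_t - 1) * variance(treated)
    ) / (n_c + n_t - 2)
    standard_error = math.sqrt(pooled_var * (1 / n_c + 1 / n_t))
    return (mean(control) - mean(treated)) / standard_error


# Hypothetical ages for illustration only.
nonuser_age = [62, 65, 68, 70, 71, 66, 69, 64]
user_age = [70, 72, 75, 69, 74, 73, 71, 76]

t_age = two_sample_t(nonuser_age, user_age)
print(f"t value for age: {t_age:.2f}")  # negative: users are older on average
```

A large absolute t value for a covariate signals that users and nonusers differ on it before matching, which is the criterion used to select variables for the propensity score model.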

PSM

PSM was initially proposed by Rosenbaum and Rubin⁸ and developed by Heckman et al.⁹ The procedure is as follows:

1. First, a propensity score is calculated for each subject in the user (treatment) and nonuser (control) groups from his or her attributes, so that subjects in the two groups can be individually matched with one another on scores that are close. The score is the predicted probability of being a user obtained from a probit estimation. The model consists of the user dummy as the dependent variable, whereas the independent variables are those that have a sample selection bias, as shown in Table 1.

2. Second, subjects in the treatment and control groups are matched based on propensity score. There are several ways of matching; caliper matching is generally considered better than others, such as nearest-neighbor matching, because it can exclude “bad” matches.⁶ This article uses caliper matching, in which the maximum permitted distance between the propensity scores of a matched pair is fixed in advance at 0.0001, which the PSM literature describes as sufficiently small. The suitability of the matching can be examined by a balancing test, in which the explanatory variables listed above are compared between the treatment and control groups by a t test. When a treatment does not meet its best-matched control, re-sampling by the bootstrapping method with 1,000 replications is conducted. If there is no statistically significant difference, the matching is concluded.

3. Finally, the effect of telecare on outcome variables, which in this article are medical expenditures and number of days required for treatment, is examined based on matched samples by a t test (standard error estimation).
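Steps 2 and 3 above can be sketched as follows. This is a simplified illustration, not the authors' implementation: it assumes propensity scores have already been estimated, matches with replacement (the article does not say whether controls are reused), and uses a much wider caliper than the 0.0001 of the article so that the small hypothetical data set produces matches. All scores and outcomes are invented.

```python
from statistics import mean


def caliper_match(treated, controls, caliper):
    """Pair each treated unit with its nearest control within the caliper.

    Each unit is a (propensity_score, outcome) tuple. Treated units with no
    control inside the caliper are dropped as "bad" matches.
    Assumption: controls may be reused (matching with replacement).
    """
    pairs = []
    for t_score, t_outcome in treated:
        c_score, c_outcome = min(controls, key=lambda c: abs(c[0] - t_score))
        if abs(c_score - t_score) <= caliper:
            pairs.append((t_outcome, c_outcome))
    return pairs


def att(pairs):
    """Mean treated-minus-control outcome difference over matched pairs."""
    return mean(t_out - c_out for t_out, c_out in pairs)


# Hypothetical (propensity score, expenditure in points) data.
users = [(0.30, 500.0), (0.52, 640.0), (0.71, 700.0), (0.95, 900.0)]
nonusers = [(0.29, 800.0), (0.50, 910.0), (0.73, 980.0), (0.40, 850.0)]

pairs = caliper_match(users, nonusers, caliper=0.05)
print(len(pairs), "matched pairs; estimated effect:", round(att(pairs), 2))
```

In this toy data the user with score 0.95 has no nonuser within the caliper and is excluded, which illustrates why a tight caliper reduces the matched sample: the estimate then rests on fewer but more comparable pairs.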

Results

Summary statistics for the outcome variables (medical expenditures and days of treatment) are shown in Table 2.

Table 2.

Summary Statistics for Outcome Variables

VARIABLE	OBSERVATION	MEAN	SD	MIN	MAX
Medical expenditures of all samples
All diseases	2,040	16,997.22	25,083.83	0	469,632
Chronic diseases	2,040	6,836.33	10,266.33	0	76,573
Days for treatment of all samples (chronic diseases)	2,040	6.09	8.37	0	85
Medical expenditures of users
All diseases	995	19,269.49	29,611.96	0	469,632
Chronic diseases	995	6,439.34	9,315.61	0	59,942
Days for treatment of users (chronic diseases)	995	5.79	7.60	0	54
Medical expenditures of nonusers
All diseases	1,045	14,833.68	19,605.78	0	323,317
Chronic diseases	1,045	7,214.32	11,087.41	0	76,573
Days for treatment of nonusers (chronic diseases)	1,045	6.38	9.04	0	85

Max, maximum; Min, minimum; SD, standard deviation.

Bias Control

PSM thus calculates a propensity score by a probit model in which the dependent variable is the user dummy variable, whereas independent variables are selected based on whether they contain a selection bias. Whether matching based on the propensity score works is examined by a balancing test, the results of which are shown in Table 3. The column labeled “% of bias” indicates the percentages of bias contained before and after matching for each variable. For example, “age” has 28.3% bias before matching, which is reduced to 4.4% after matching. Similarly, the column labeled “% of reduced bias” shows the percentage of bias actually reduced by PSM, which is 81.9% for age. The reduction in sample selection bias is thus successful because no variable shows a statistically significant difference after matching in terms of t values. In particular, biases related to subjective belief in the value of telecare on health status shown by “Effects 1–4” are also substantially reduced.
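The probit step described at the start of this subsection could look like the following minimal sketch. It fits a probit with an intercept and a single covariate by Fisher scoring, whereas the model in the article includes every biased variable from Table 1 and was presumably estimated with a statistical package. The function name `fit_probit`, the centering of age, and the data are all assumptions made for illustration.

```python
import math


def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)


def norm_cdf(z):
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def fit_probit(x, y, iterations=50, tol=1e-10):
    """Fit P(y = 1 | x) = Phi(b0 + b1 * x) by Fisher scoring; return (b0, b1)."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            eta = b0 + b1 * xi
            p = min(max(norm_cdf(eta), 1e-10), 1 - 1e-10)  # guard against 0 and 1
            d = norm_pdf(eta)
            score = (yi - p) * d / (p * (1 - p))
            weight = d * d / (p * (1 - p))
            g0 += score
            g1 += score * xi
            h00 += weight
            h01 += weight * xi
            h11 += weight * xi * xi
        # Solve the 2x2 system (expected information) * step = gradient.
        det = h00 * h11 - h01 * h01
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
        if abs(step0) + abs(step1) < tol:
            break
    return b0, b1


# Hypothetical data: age and a user dummy (1 = telecare user).
ages = [60, 62, 64, 66, 68, 70, 72, 74, 76, 78]
user = [0, 0, 1, 0, 0, 1, 0, 1, 1, 1]
x = [(a - 69) / 10 for a in ages]  # centered and rescaled for numerical stability

b0, b1 = fit_probit(x, user)
scores = [norm_cdf(b0 + b1 * xi) for xi in x]  # propensity scores
print([round(s, 3) for s in scores])
```

Each fitted value is the predicted probability of being a user given the covariate, which is the quantity on which treatment and control subjects are subsequently matched.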

Table 3.

Result of Balancing Test

VARIABLE	TREATMENT	CONTROL	% OF BIAS (BEFORE→AFTER)	% OF REDUCED BIAS	T VALUE
Chronic disease	0.466	0.486	15.6→−4.1	73.8	−0.81
Age	71.629	72.124	28.3→4.4	81.9	−1.13
Number of family members	2.945	2.860	26.3→4.5	84.4	0.87
Income	2.961	2.968	6.5→−2	97.5	−0.06
Heart diseases	0.144	0.131	−9.3→1.8	83.0	0.79
High blood pressure	0.469	0.466	54.2→−2.1	96.6	0.14
Stroke	0.059	0.063	30.9→−1.9	69.5	−0.37
Ophthalmic diseases	0.297	0.275	15.5→−0.1	74.8	0.97
Anal diseases	0.005	0.003	15.6→−4.1	80.7	0.55
Effect 1: reduced anxiety in day-to-day life	2.443	2.448	28.3→4.4	99.3	−0.08
Effect 2: stabilization of illness	2.548	2.573	26.3→4.5	96.2	−0.50
Effect 3: enhancement of health consciousness	2.650	2.635	6.5→−2	98.1	0.35
Effect 4: decrease in medical expenditures	1.842	1.866	−9.3→1.8	93.8	−0.41
Year 2002	0.135	0.136	54.2→−2.1	99.2	−0.05
Year 2005	0.238	0.238	30.9→−1.9	99.3	−0.02
Year 2006	0.245	0.254	15.5→−0.1	88.7	−0.39
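The article does not give the formula behind the “% of bias” and “% of reduced bias” columns. The sketch below assumes the commonly used standardized difference in means, scaled by the square root of the average of the two group variances, and defines the reduction as the relative fall in absolute bias. Both definitions and all data are assumptions for illustration, so the output is not expected to reproduce Table 3.

```python
import math
from statistics import mean, variance


def standardized_bias(treated, control):
    """Difference in means as a percentage of the average standard deviation."""
    pooled_sd = math.sqrt((variance(treated) + variance(control)) / 2)
    return 100 * (mean(treated) - mean(control)) / pooled_sd


def bias_reduction(before, after):
    """Percentage of the absolute bias removed by matching."""
    return 100 * (1 - abs(after) / abs(before))


# Hypothetical ages: all nonusers versus the matched nonusers only.
user_age = [70, 72, 74, 76]
nonuser_age_all = [62, 66, 70, 74]
nonuser_age_matched = [69, 72, 74, 75]

before = standardized_bias(user_age, nonuser_age_all)
after = standardized_bias(user_age, nonuser_age_matched)
print(f"bias {before:.1f}% -> {after:.1f}%, "
      f"reduced by {bias_reduction(before, after):.1f}%")
```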

Effect of Telecare on Medical Expenditures and Days of Treatment

Finally, Table 4 summarizes the estimation results of PSM on the effect of telecare use on outpatient expenditures and number of outpatient days of treatment for chronic diseases, the main targets for Nishi-aizu's telecare system in terms of reducing chronic diseases. The rows labeled “before” and “after” indicate estimations before and after matching. Four methods of PSM matching are examined: single nearest-neighbor matching, Epanechnikov kernel matching, biweight kernel matching, and caliper (0.0001) matching.
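For readers unfamiliar with the two kernel methods, a rough sketch follows. In kernel matching, the counterfactual outcome for each user is a weighted average of nonuser outcomes, with weights that decline as the propensity scores move apart. The kernel functions are the standard Epanechnikov and biweight forms; the bandwidth of 0.1 and the data are hypothetical, because the article does not report the bandwidth it used.

```python
def epanechnikov(u):
    return 0.75 * (1 - u * u) if abs(u) <= 1 else 0.0


def biweight(u):
    return (15 / 16) * (1 - u * u) ** 2 if abs(u) <= 1 else 0.0


def kernel_att(treated, controls, kernel, bandwidth):
    """Mean of treated outcome minus kernel-weighted control outcome.

    Each unit is a (propensity_score, outcome) tuple. Treated units with no
    control inside the bandwidth are skipped (off common support); the
    function raises ZeroDivisionError if every treated unit is skipped.
    """
    diffs = []
    for t_score, t_outcome in treated:
        weights = [kernel((c_score - t_score) / bandwidth) for c_score, _ in controls]
        total = sum(weights)
        if total == 0:
            continue
        counterfactual = sum(
            w * c_outcome for w, (_, c_outcome) in zip(weights, controls)
        ) / total
        diffs.append(t_outcome - counterfactual)
    return sum(diffs) / len(diffs)


# Hypothetical (propensity score, expenditure in points) data.
users = [(0.30, 500.0), (0.50, 600.0)]
nonusers = [(0.25, 800.0), (0.35, 900.0), (0.55, 950.0)]

for name, kernel in (("Epanechnikov", epanechnikov), ("biweight", biweight)):
    print(name, round(kernel_att(users, nonusers, kernel, bandwidth=0.1), 2))
```

Because both kernels weight nearby controls smoothly rather than picking a single match, they use more of the control sample than caliper or nearest-neighbor matching, which is consistent with the smaller standard errors for the kernel rows in Table 4.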

Table 4.

Result of Estimation Based on Propensity Score Matching

OUTCOME VARIABLE, MATCHING	TREATMENT	CONTROL	DIFFERENCE	SE	T VALUE
Medical expenditure for all diseases
Before	19,448.5	15,376.3	4,072.21	1,131.26	3.60^c
After caliper (0.0001)	16,417.1	21,692.9	−5,275.81	4,862.42	−1.09
After single nearest-neighbor	19,448.5	21,486.4	−2,037.90	2,922.18	−0.70
After Epanechnikov kernel	19,448.5	21,898.5	−2,449.94	2,024.69	−1.21
After biweight kernel	19,448.5	21,930.7	−2,482.13	2,113.90	−1.17
Medical expenditure for chronic diseases
Before	6,888.4	6,801.9	86.58	464.47	0.19
After caliper (0.0001)	5,410.5	9,404.1	−3,993.63	1,781.31	−2.24^b
After single nearest-neighbor	6,888.4	10,097.0	−3,208.59	830.94	−3.86^c
After Epanechnikov kernel	6,888.4	9,442.3	−2,553.82	582.83	−4.38^c
After biweight kernel	6,888.4	9,443.6	−2,555.15	585.91	−4.36^c
Days of treatment for chronic diseases
Before	6.03	6.13	−0.10	0.38	−0.26
After caliper (0.0001)	4.80	8.78	−3.97	1.55	−2.56^b
After single nearest-neighbor	6.03	10.12	−4.09	0.72	−5.67^c
After Epanechnikov kernel	6.03	8.63	−2.60	0.48	−5.47^c
After biweight kernel	6.03	8.64	−2.61	0.48	−5.45^c

Standard errors (SEs) of caliper matching are based on the bootstrapping of 1,000 replications. Medical expenditure was reduced after matching, as indicated in the column “Difference,” and this is measured by “points” of the National Health Insurance system. One point is equivalent to 10 Japanese yen (USD 0.13).

b,c: Significance at the 5% and 1% levels, respectively.

Table 4 shows that neither medical expenditures nor days of treatment for chronic diseases differed significantly between the two groups before matching, whereas after matching all four matching methods showed a significantly negative difference (p<0.05), implying that telecare has an effect on medical expenditures and days of treatment for chronic diseases. The column labeled “difference” indicates the decrease in the amount of expenditure and number of treatment days. Caliper matching provided the greatest reduction in expenditures, at Japanese yen (JPY) 39,936 (USD 499.20) per year per user, together with 3.97 fewer treatment days, whereas Epanechnikov kernel matching produced the smallest, at JPY 25,538 (USD 319.23) and 2.60 days, respectively. Differences among the four methods are not large, however, and Table 4 shows that the results obtained are accordingly robust.
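The yen and dollar figures quoted above follow from the “Difference” column of Table 4 and the 10-yen-per-point rule in the table note. The short check below reproduces them; the exchange rate of JPY 80 per USD is not stated in the article and is inferred here from its own paired figures (JPY 39,936 = USD 499.20).

```python
YEN_PER_POINT = 10  # stated in the note to Table 4
YEN_PER_USD = 80    # assumption: rate implied by JPY 39,936 = USD 499.20


def points_to_yen(points):
    return round(points * YEN_PER_POINT)


def yen_to_usd(yen):
    return yen / YEN_PER_USD


caliper_yen = points_to_yen(3993.63)  # caliper matching, chronic diseases
kernel_yen = points_to_yen(2553.82)   # Epanechnikov kernel, chronic diseases

print(caliper_yen, yen_to_usd(caliper_yen))
print(kernel_yen, yen_to_usd(kernel_yen))
```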

Discussion and Conclusions

Table 4 thus demonstrates that telecare does not contribute to a reduction in medical expenditures for all diseases, but only for chronic diseases,¹⁰ because user expenditures for all diseases are larger than those of nonusers before matching but not significantly different after matching.

Table 5 compares the effects obtained by the other estimation methods, such as simple ordinary least squares¹⁰ and System GMM,⁷ used in our previous articles. The effects of telecare are underestimated when sample selection biases are not controlled for.

Table 5.

Comparison of Results Using Alternative Estimation Methods

	OLS	SYSTEM GMM	PSM
Medical expenditures for chronic diseases	JPY 15,302 (USD 191.28)	—	JPY 25,538–39,936 (USD 319.23–499.20)
Days of treatment for chronic diseases	1.6 days	2.0 days	2.6–4.0 days

Ordinary least squares (OLS) was from Akematsu and Tsuji,¹⁰ System GMM was from Minetaki et al.,⁷ and propensity score matching was from the present study.

JPY, Japanese yen.

This article applies PSM to the evaluation of telecare as a way of addressing sample selection bias, a problem inherent in survey data. When bias is not eliminated, the estimation obtained deviates from the true value. These findings indicate the value of PSM in evidence-based research and the rigorous scientific methodologies required to conduct it.

Although PSM offers major benefits in the evaluation of telecare projects, it has its own limitations. First, it requires a large number of samples, and several previous studies have in fact used samples in the several tens of thousands range. Second, its results are not always robust, which is why our present article examines four matching methods. These limitations have been described,⁸,⁹ but one limitation specific to telecare has not. In this article, PSM successfully demonstrated that the user group had lower medical expenditures than the nonuser group under the condition that all subjects were closely similar except in their use of telecare. Our previous study¹⁰ concluded that these results were due to the difference in health consciousness between the groups. By checking health data transmitted by the telecare system, users became more concerned with health and had an incentive to change their behavior to be more health-promoting. These findings are not consistent with those of the present analysis, however, which found different expenditures despite a closely similar degree of health consciousness, which could only be due to telecare use. PSM thus provides little explanation of why and how telecare leads to these results, and identification of these mechanisms requires the use of other empirical methods together with PSM.

Footnotes

Disclosure Statement

No competing financial interests exist.

References

1. Sohn, Helms, Pelleter, et al. Costs and benefits of personalized health care for patients with chronic heart failure in the care and education program. Telemed J E Health, 2012; 18:198–204.

2. Ayanian, Landrum, Guadagnoli, Gaccione. Specialty of ambulatory care physicians and mortality among elderly patients after myocardial infarction. N Engl J Med, 2002; 347:1678–1686.

3. Gum, Thamilarasan, Watanabe, Blackstone, Lauer. Aspirin use and all-cause mortality among patients being evaluated for known or suspected coronary artery disease: A propensity analysis. JAMA, 2001; 286:1187–1194.

4. Wang, Schneeweiss, Avorn, Fischer, Mogun, Solomon, Brookhart. Risk of death in elderly users of conventional vs atypical antipsychotic medications. N Engl J Med, 2005; 353:2335–2341.

5. Chumbler, Garel, Qin, et al. Health services utilization of a care coordination/home-telehealth program for veterans with diabetes: A matched-cohort study. J Ambulatory Care Manage, 2005; 28:230–240.

6. Dehejia, Wahba. Propensity score-matching methods for nonexperimental causal studies. Rev Econ Stat, 2002; 84:151–161.

7. Minetaki, Akematsu, Tsuji. Effect of e-health on the medical expenditures of outpatients with lifestyle-related diseases. Telemed J E Health, 2011; 17:591–595.

8. Rosenbaum, Rubin. The central role of the propensity score in observational studies for causal effects. Biometrika, 1983; 70:41–55.

9. Heckman, Ichimura, Todd. Matching as an econometric evaluation estimator: Evidence from evaluating a job training programme. Rev Econ Stud, 1997; 64:605–654.

10. Akematsu, Tsuji. An empirical analysis of the reduction in medical expenditures by e-health users. J Telemed Telecare, 2009; 15:109–111.