Introduction
Telepsychiatry, the application of video teleconferencing technology to psychiatric evaluation and therapy, has achieved a firm place in mental health delivery systems. This technique makes it possible to provide care to areas and populations that otherwise have limited access to qualified providers. Although some authors have claimed that important information can be missed using telepsychiatry, 1 controlled studies have repeatedly found that long-distance examinations are equivalent to face-to-face interviews. Two early reviews summarized the studies published until 2002. 2,3 Since then, studies have demonstrated the equivalence of psychiatric consultations, 4 structured clinical interviews, 5,6 and outcomes in depression 7 using telepsychiatry compared with face-to-face treatment. Further, Manguno-Mire et al. 8 recently expanded the investigated assessment instruments into the field of forensic psychiatry.
However, most of these assessments rely mainly on auditory input rather than visual observation, raising the question of whether similar results could also be obtained over an ordinary telephone line. Depression in primary care has been assessed successfully over the telephone, 9 and a recent meta-analysis of telephone-administered psychotherapy 10 attested to its effectiveness. The important difference between questionnaire items rating a patient's self-report and items based on visual observation was previously described by Jones et al. 11 They found that observational items of the Brief Psychiatric Rating Scale had lower reliability whether assessed face-to-face or remotely. They expressed the opinion that visual observation of behavior is crucial in the clinical assessment of psychogeriatric conditions and recommended that future studies emphasize the accuracy of telemedicine ratings that require visual observation. Similarly, Darkins 12 pointed out that telepsychiatry may be called upon to detect Parkinson's disease–like conditions in psychiatric patients. A study of nine patients with Parkinson's disease has indeed been conducted. 13 The authors concluded that valid motor assessments can be made via interactive video conferencing. Such examinations obviously cannot be accomplished over the telephone.
We decided to use the Abnormal Involuntary Movement Scale (AIMS) as a test instrument requiring visual observation. It is a standard instrument to assess tardive dyskinesia (TD), an involuntary movement disorder most often associated with long-term administration of antipsychotic medications. More than three million patients in the United States received this class of medications in 2003. 14 However, not all dyskinesias are related to antipsychotic medications, and these medications can also cause other movement disorders that need to be distinguished from TD. Therefore, a positive score on the AIMS is not synonymous with antipsychotic-induced TD. TD is the most serious of these movement disorders because it can persist after the discontinuation of the causative agent and can result in serious long-term disabilities. The Veterans Administration healthcare system has chosen regular AIMS testing as a quality measurement in behavioral health. The Veterans Healthcare Administration in the upstate New York area (VISN2) performed more than 6,000 AIMS tests during the last year. This study was designed to establish whether these tests can be performed reliably using telepsychiatry.
Materials and Methods
We recruited 50 patients who had been exposed to antipsychotic medications for at least 10 years. After a complete description of the nature of the study, written informed consent was obtained in a format approved by the Institutional Review Board of the Veterans Administration Medical Center of Western NY. The study group consisted of 47 male and 3 female veterans. Nine were African American and the remainder were Caucasian. Ages ranged from 40 to 72 years; 36 patients carried a chart diagnosis of schizophrenia, 10 had schizoaffective disorder, and 4 were found to have a mood disorder.
We administered the AIMS 15 following the examination procedure described by Munetz and Benjamin. 16 It consisted of two parallel procedures. The examination procedure, in the form of a standardized script, gave the patient simple instructions such as, “sit in a chair with your hands on your knees.” One step (#9) asks the examiner to check for muscle rigidity in the arms but does not lead to an item on the score sheet; this step was omitted. The scoring procedure told the examiners how to rate what they observed, such as frowning and grimacing of the forehead, puckering of the lips, and movement of the tongue in and out of the mouth. Each researcher scored the AIMS form independently. The scoring sheet rated the movements in seven different body regions on a scale from 0 = “none,” 1 = “minimal, may be extreme normal,” 2 = “mild,” 3 = “moderate” to 4 = “severe.” A Global Severity Score (highest of ratings 1–7) as well as a Total Score (sum of scores 1–7) were determined and are reported here.
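The two summary scores defined above can be illustrated with a minimal sketch. The function name `aims_summary` and the example ratings are hypothetical and are not study data; the sketch only assumes the definitions given in the text (seven regional items rated 0 to 4, Global Severity Score as the highest rating, Total Score as the sum).

```python
from typing import Sequence, Tuple


def aims_summary(item_ratings: Sequence[int]) -> Tuple[int, int]:
    """Return (global_severity, total_score) for AIMS items 1-7.

    Global severity is the highest of the seven regional ratings;
    total score is their sum, as defined in the text.
    """
    if len(item_ratings) != 7:
        raise ValueError("expected ratings for the seven body regions")
    if any(r not in range(5) for r in item_ratings):
        raise ValueError("each rating must be an integer from 0 to 4")
    return max(item_ratings), sum(item_ratings)


# Hypothetical ratings for one patient by one rater (not study data).
global_severity, total_score = aims_summary([0, 1, 2, 0, 0, 1, 0])
print(global_severity, total_score)  # 2 4
```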
Before the beginning of the study we established inter-rater reliability among the five participating researchers through an informal process of discussion and consensus. Pearson r correlations between pairs of raters ranged from 0.67 to 0.82. If a researcher had previous contact with the subject, that researcher obtained the informed consent, and his or her AIMS form was not used for analysis.
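For readers unfamiliar with the statistic, a pairwise Pearson correlation between two raters' scores can be computed as in the sketch below. The function name `pearson_r` and the example scores are illustrative assumptions; the text does not state which scores or software were used for the pre-study correlations.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation between two raters' scores."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equally long sequences of length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a rater's scores are constant")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical total scores from two raters on six patients (not study data).
rater_a = [2, 4, 0, 7, 3, 1]
rater_b = [3, 4, 1, 6, 2, 1]
print(round(pearson_r(rater_a, rater_b), 2))
```

Pearson r measures how closely two raters' scores follow a straight line, not whether they are equal, so it can be high even when one rater scores systematically higher than the other.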
Each study patient was examined by four raters simultaneously. Two were located in the room with the subject, whereas two observed via audio-visual equipment from a nearby room. One of the remote examiners directed the interview and asked the subject to perform the AIMS tasks. The location of each rater was rotated between sessions. The video conferencing equipment consisted of two Tandberg 880 set-top units with 384 kilobits-per-second bandwidth connected over an Internet Protocol network. The image was displayed on a conventional 27" TV screen. Movement and zoom of the remote camera were used to focus on a particular body region as needed.
Results
We used multilevel modeling (MLM) to assess the Intraclass Correlation Coefficient and to determine the influence of the specific raters (the five psychiatrists), the rater condition (in-room versus remote), and the interaction of rater by rater condition on the global and total scores.
In essence, this approach assessed the extent to which the total variability in the ratings was due to differences among the raters rating the same patient, differences between in-room and remote raters rating the same patient, and differences from one patient to another. In addition, it allowed an assessment of whether some of the variability was due to specific raters scoring systematically differently when in the room than when remote.
In the MLM approach, the Intraclass Correlation Coefficient (ICC) indicated the “proportion of observed variance in ratings that is due to systematic between-target differences compared to the total variance in ratings” 17 (p. 822) and reflected both absolute and relative consensus of the raters. The ICC for the global score was 0.72, and the ICC for the total score was 0.75, indicating that 72% of the variability in global scores and 75% of the variability in total scores were attributable to differences among the patients. These ICCs were similar to a literature report, 18 demonstrating that our raters were in general agreement in making AIMS assessments.
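The interpretation of the ICC as a variance proportion can be made concrete with a simplified sketch. It uses a one-way random-effects ANOVA estimator, ICC = (MSB − MSW) / (MSB + (k − 1) MSW), which is an assumption made here for illustration and not the multilevel model fitted in this study. The function name `icc_oneway` and the example ratings are hypothetical.

```python
from typing import Sequence


def icc_oneway(ratings: Sequence[Sequence[float]]) -> float:
    """One-way random-effects ICC for a patients-by-raters table.

    Each inner sequence holds the k ratings given to one patient.
    This ANOVA estimator is a stand-in for the multilevel model
    described in the text, not a reimplementation of it.
    """
    n = len(ratings)
    k = len(ratings[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need >= 2 patients, each with the same k >= 2 ratings")
    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    # Between-patient and within-patient mean squares.
    ms_between = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(ratings, row_means) for x in row
    ) / (n * (k - 1))
    denominator = ms_between + (k - 1) * ms_within
    if denominator == 0:
        raise ValueError("ICC is undefined when all ratings are identical")
    return (ms_between - ms_within) / denominator


# Hypothetical global scores: four patients, two raters each (not study data).
example = [[1, 2], [3, 3], [0, 1], [4, 3]]
print(round(icc_oneway(example), 2))
```

A value near 1 means that most of the variance lies between patients rather than between raters of the same patient, which is the sense in which the reported ICCs of 0.72 and 0.75 are read above.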
In addition to assessing the variability in the scores due to the patient, the multilevel model assessed whether significant variance was accounted for by the raters, that is, whether certain raters provided systematically higher or lower scores.* The rater effect was significant, Δχ² = 16.99, Δdf = 4, p < 0.01. Further examination of the data revealed that this rater effect could be attributed to a significant difference between Rater 1 and Rater 2, z = −2.57, p = 0.010 (indicating that across the patients, Rater 2 had a lower global score than Rater 1). The condition effect, which assessed whether there were systematic differences between the in-room and remote raters, was not significant, Δχ² = 0.80, Δdf = 1, p = non-significant (ns). The mean global rating for raters in the room was 1.35 (SD = 1.03), whereas the mean rating for remote raters was 1.37 (SD = 0.92). Finally, the addition of rater-by-condition interactions, which assessed whether any specific raters differed in their in-room versus remote ratings, was not significant, Δχ² = 2.77, Δdf = 4, p = ns.
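The reported change-in-chi-square statistics can be converted to approximate p-values with the sketch below. It assumes that each Δχ² is referred to a standard chi-square distribution with Δdf degrees of freedom; the text does not state the exact reference distribution used. The helper `chi2_sf` is a hypothetical name, and it covers only df = 1 and even df, which is sufficient for the three tests reported here.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variate (df = 1 or even df)."""
    if x < 0 or df < 1:
        raise ValueError("x must be >= 0 and df must be >= 1")
    if df == 1:
        # For one degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
        return math.erfc(math.sqrt(x / 2))
    if df % 2 == 0:
        # Closed form for even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        half = x / 2
        term = 1.0
        total = 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    raise NotImplementedError("odd df > 1 is not covered by this sketch")


# Statistics as reported in the text for the global score.
for label, stat, df in [
    ("rater", 16.99, 4),
    ("condition", 0.80, 1),
    ("rater x condition", 2.77, 4),
]:
    print(f"{label}: p = {chi2_sf(stat, df):.4f}")
```

Under this assumption the rater effect falls below 0.01, and the condition and interaction effects are well above the conventional 0.05 threshold, consistent with the significance statements above.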
The results with respect to the total score were essentially the same, with a significant effect of rater, but no significant effects for condition or for the condition-by-rater interaction.
Discussion
In contrast to other investigations, the presence of TD was not an inclusion requirement for this study. Our only inclusion criterion was that the patient had to have been exposed to antipsychotic medications for 10 years or more. We made no attempt to quantify the amount of exposure, for example by calculating the cumulative dosage of the medications, or to distinguish between first- and second-generation antipsychotics. However, our inclusion criterion was based on the understanding that each year on an antipsychotic increases the risk of TD by about 5%. 19,20 In this study we observed a lower rate. Most often our subjects had only a few minimally or mildly positive findings. In three patients, all raters agreed that there was not even one minimal finding. However, the criteria of Schooler and Kane 21 required the presence of at least moderate abnormal involuntary movements in one or more body areas or at least mild movements in two or more body areas. We applied these stringent criteria to our sample. Rater #1 found that 14 cases were positive. Seven of these 14 were judged positive by all raters. Rater #1 determined that 36 cases were negative, and 32 of these were negative according to all raters. Second-generation antipsychotics are generally viewed as carrying less liability for TD, a view confirmed by a recent literature review. 22 Despite the fact that our patients had few abnormal movements, inter-rater reliability was high, dispelling doubts about the consistency of the raters.
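The movement threshold quoted above from Schooler and Kane can be expressed as a simple rule over the seven regional AIMS ratings. The sketch below encodes only that threshold as paraphrased in the text (a rating of 3 or more in at least one body area, or 2 or more in at least two body areas); any further requirements of the original criteria are not modeled. The function name and the example ratings are hypothetical.

```python
from typing import Sequence


def meets_movement_threshold(item_ratings: Sequence[int]) -> bool:
    """Apply the movement threshold as paraphrased in the text.

    Positive if at least one body area is rated moderate (>= 3),
    or at least two body areas are rated mild (>= 2).
    """
    moderate_or_worse = sum(1 for r in item_ratings if r >= 3)
    mild_or_worse = sum(1 for r in item_ratings if r >= 2)
    return moderate_or_worse >= 1 or mild_or_worse >= 2


# Hypothetical rating patterns (not study data).
print(meets_movement_threshold([0, 0, 3, 0, 0, 0, 0]))  # True: one moderate area
print(meets_movement_threshold([2, 2, 0, 0, 0, 0, 0]))  # True: two mild areas
print(meets_movement_threshold([2, 1, 1, 0, 0, 0, 0]))  # False: one mild area only
```

Because the rule depends on individual regional ratings rather than on the summed score, two raters with similar totals can still disagree on case status, which is one way to read the partial agreement on positive cases reported above.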
A previous study 23 demonstrated that the bandwidth of the video conferencing equipment is an important factor in the evaluation of visual cues in the course of the psychiatric interview. During an assessment of patients with schizophrenia at 128 kilobits per second, the Scale for the Assessment of Negative Symptoms was less reliably evaluated in the low-bandwidth condition than in the in-person interview. In addition, fewer patients seemed to like the low-bandwidth condition. The authors felt that movement artifact of the equipment may have been misinterpreted as motor slowing of the subject. Therefore, our results may have been influenced by the transmission quality that we employed.
We conclude that most, if not all, aspects of routine psychopharmacological care can be provided via videoconferencing. It is not necessary to intersperse live interviews to complete the AIMS ratings. It may be possible to generalize these results to other assessments that use primarily visual inputs. This clearly distinguishes telepsychiatry from telephone interviews.
With the shortage of psychiatric care in rural areas and the availability of better, faster transmission technology, one can anticipate wider use of telepsychiatry services.
Footnotes
Acknowledgments
The authors thank Kenneth Leonard, Ph.D., Vice Chairman for Research, Department of Psychiatry; Paul F. Visintainer, Ph.D., Director of Epidemiology and Biostatistics, Baystate Medical Center, Springfield, MA; and Jaye Derrick, Ph.D., Research Institute on Addictions, for help with the data analysis and statistics.
Disclosure Statement
No competing financial interests exist.
* In the context of MLM for inter-rater agreement, raters, condition, and the interaction are considered Level 1 variables; patients, because there are multiple raters and conditions for each patient, are considered a Level 2 variable. The analysis examined the fit of an intercept model in which the observed data are predicted from the overall mean of all the ratings, with the difference between the observed data and the predicted data expressed as a chi-square statistic. Raters were included as random dummy-coded variables with Rater 1 as the reference rater.
