A systematic review of studies concerning observer agreement during medical specialist diagnosis using videoconferencing

Abstract

We conducted a systematic review of studies of observer agreement for medical specialist diagnosis via videoconferencing. The review was based on searches of electronic databases and a hand search of relevant journals and reference lists between 1966 and June 2010. There were 20 studies comparing videoconferencing diagnosis with a non-telemedicine alternative by reporting a measure of agreement. Half of the studies were in the field of dermatology; these studies provided solid support for the reliability of videoconferencing. The other 10 studies were in psychiatry, geriatrics, minor injuries, neurology and rheumatology. Reliability of diagnosis via videoconferencing was confirmed in all studies. In the studies where physical examination was an element of the diagnostic process, results were reliable but authors recommended greater care during the diagnostic process (e.g. good equipment, onsite support, additional camera angles). Four studies incorporated a second group to measure agreement in paired face-to-face assessments. Although useful evidence for the reliability of diagnosis via videoconferencing was provided by the studies in the review, the range of medical specialties was small. The variation in research methodology and statistical analysis suggests a lack of clarity about which research design is appropriate for measuring observer agreement in relation to diagnostic reliability.

Introduction

Medical specialists provide expert diagnosis or advice regarding complex or challenging health matters. The most common method of providing advice is by face-to-face (FTF) appointment between specialist and patient. Specialist advice is sought for a range of problems and the depth of information required by the specialist during the decision-making process can differ depending on the matter being addressed. Not all patients require a full FTF consultation.

Specialists are limited in number which means that either the patient or the doctor may have to travel long distances to enable consultations to take place. Videoconferencing may allow a more timely and convenient response. However, videoconferencing is not appropriate in all situations, particularly when expert physical examination is required. The aim of the present paper was to provide a summary of comparative studies of medical specialist diagnostic agreement using videoconferencing.

Methods

An electronic search was carried out of the MEDLINE, CINAHL and PubMed databases using the Medical Subject Headings (MeSH) terms listed in Table 1. The search was completed in June 2010. A hand search of the table of contents of the Journal of Telemedicine and Telecare and Telemedicine and E-health was also carried out to identify relevant papers. Reference lists of telemedicine reviews were hand searched for relevant papers.^1–10 Papers were excluded if the sample size was less than 20, based on the protocol used in the Cochrane review of telemedicine.¹¹ The inclusion/exclusion criteria are listed in Table 2. An attempt was made to contact the authors if additional information was required.

Table 1

Search terms for MEDLINE search strategy

Step in search	Search term
1	MeSH terms: Videoconferencing; Teleconsultation; Remote consultation. General terms: Remote consultation; Teleconsult; Tele-consult; videoconsult; video consult; VC; remote assessment; teleass
2	MeSH terms: Health, Diagnosis; Diagnosis, Differential; Rural Health; Rural Health Services. General term: Assessment
3	Limit to English, abstract available, peer reviewed
4	1 and 2

Table 2

Inclusion and exclusion criteria

Inclusion criteria
Studies with-
a videoconference between a health professional and a patient for the purpose of diagnosis
an assessment interview with the patient which included an unstructured assessment component
usual clinical practice which involved the patient seeing the health professional FTF
a comparison of diagnostic agreement between FTF and VC assessment with relevant reporting of statistical data
sample size equal to or greater than 20 (for each study group)
Exclusion criteria
Studies which-
evaluated the technical specifications of telemedicine technologies (such as bandwidth)
evaluated educational or administration applications
evaluated the economic impact or patient satisfaction
evaluated diagnostic agreement where the patient was not present to interact with the specialist, i.e. transmission of images or pathology results
evaluated monitoring devices or patient management, in which case a diagnostic assessment had already occurred
evaluated agreement of the administration of standardised assessment tools
evaluated a FTF assessment with an added VC element, but the VC diagnosis was not carried out independently, i.e. FTF assessment information by a junior specialist was given to a senior consultant via VC for verification, but no other assessments were carried out
evaluated telemedicine technologies other than VC equipment, i.e. telephone, videophone, fax

Results

A total of 1707 papers was identified from the computerized literature search. Initial screening of these articles reduced the total to 23. An additional nine papers were identified from the hand search. The full-text of 32 papers was read. Nine papers were excluded, six because the sample size was less than 20. In all, there were 22 relevant papers (Figure 1). Papers that discussed the same study were grouped together and counted as one study, which brought the final total to 20 studies. A summary of the levels of agreement for each study is provided in Table 3.

Figure 1

PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) flow diagram of study selection process³⁵

Table 3

Studies examining observer agreement in diagnosis via VC

Author	Year	Objective	No of patients (VC/FTF)	No of patients (FTF/FTF)	No of patients (Var)	Specialty	Variation in assessment	Kappa	Overall percentage agreement (%)
Warren¹²	1995	Evaluate the usefulness of current telecommunications technology in the diagnosis of cutaneous diseases	22			Dermatology	Different Same		96
Oakley¹³	1997	Determine the accuracy of a VC system (telemedicine) in the diagnosis of dermatological disorders	104			Dermatology	Same (n = 82) Different (n = 21)	Same 0.91 Different 0.62	75
Phillips¹⁴	1997	Measure the degree of concordance between a dermatologist seeing a patient in a clinic and another dermatologist seeing the same patient using a commercially available VC unit	60			Dermatology	Different		77
Phillips¹⁵	1998	Determine the reliability of VC technology in evaluating skin tumours, the impact of the technology on the clinician's degree of suspicion that a skin tumour is malignant, and the recommendation to do a biopsy	51			Dermatology	Different	0.32	59
Lesher¹⁶	1998	Determine the percentage of encounters in which two different dermatologists, one using telemedicine and one on-site, could independently arrive at the same primary diagnosis	60	36		Dermatology	Different		VC 78 FTF 94
Lowitt¹⁷	1998	Compare physician and patient impressions and interphysician diagnostic agreement between live teledermatology and in-person examinations	102	29		Dermatology	Different		80
Loane¹⁸	1998	Determine diagnostic accuracy and management recommendations of real-time teledermatology consultations using low-cost telemedicine equipment	351			Dermatology	Same (n = 125) Different (n = 226)		Same 71 Different 60
Gilmour¹⁹	1998	Assess the diagnostic reliability and accuracy of consultations with a dermatologist over a VC link compared with (conventional) FTF interviews	126			Dermatology	Same (n = 65) Different (n = 61)		Same 62 Different 57
Nordal²⁰	2001	Evaluate teledermatology in a comparative study of VC versus FTF consultations	112			Dermatology	Different		72
Baba²¹	2005	Compare the diagnostic accuracy of conventional asynchronous teledermatology (SAF) with a combined technique, in which SAF methodology was used first, followed by a VC using low-cost web cameras			228	Dermatology	Different Same	SAF 0.71 SAF-VC 0.79	82 90
Loh^26,27	2005/2007	Develop a telemedicine protocol for diagnosis of Alzheimer's Disease	20			Geriatrics	Different	VC/FTF 0.8	90²⁷
Martin-Khan²⁸	2007	Investigate the level of agreement between specialists conducting a cognitive assessment via videoconference compared with a FTF assessment.	20	22		Geriatrics	Different	VC/FTF 0.63 FTF/FTF 0.53	VC/FTF 65 FTF/FTF 64
Tachakra²⁹	2000	Compare the accuracy of teleconsultations for minor injuries with FTF consultations	200			Minor Injuries	Different Same		99 99
Benger³⁰	2004	Determine the safety of minor injuries telemedicine compared with on-site specialist care, current practice, and a robust gold standard, and assess the clinical effectiveness of this new technique	579165	1200		Minor Injuries	Different (GP; Onsite specialist; Telemedicine)		VC/FTF 97.8 FTF/FTF 99
Craig³¹	2000	Evaluate the feasibility of interactive VC as a means by which neurologists might assess patients admitted with neurological symptoms to hospitals distant from a neurological centre	22			Neurology	Different		59
Craig³²	2000	Examine whether it is feasible and safe for neurologists using telemedicine to assess neurological outpatients at a distance	25			Neurology	Different		96
Baigent²²	1997	Compare interviews by psychiatrists in FTF settings with those performed by VC using a split interviewer/observer configuration	20	22	21	Psychiatry	Different	VC/FTF 0.69 FTF/FTF 0.85 VC/VC 0.70
Singh²³	2007	Compare the accuracy of findings from telepsychiatry with those from FTF interview	37			Psychiatry	Different	VC/FTF 0.86
Elford²⁵	2000	Establish a telepsychiatry system for children and conduct an assessment of its utility	23			Psychiatry – child	Different		96
Leggett^33,34	2001	Examine the diagnostic accuracy and acceptability of telemedicine in the field of rheumatology			100	Rheumatology	Same	TEL/FTF 0.62 TEL/VC/FTF 0.96	TEL/FTF 71 TEL/VC/FTF 97

The observer agreement studies in the review included dual assessment by two doctors. These doctors were often specialists, and in limited supply, which makes this type of study difficult to conduct. For this reason many studies were carried out in large teaching hospitals, far removed from the actual population that would normally be serviced by telemedicine. Recognition in the summary below is given to those studies which were able to draw on a rural population. A ‘telemedicine population’ is defined as a sample of patients who would, if a telemedicine service were available, be the usual recipients of such a service because of geographical or other limitations in obtaining clinic-based care. There is now greater recognition of the role of urban telemedicine to support people who are isolated in well populated areas, such as frail older residents in aged care facilities. These people might be described as an urban telemedicine population.

Studies in dermatology

Warren et al. conducted a teledermatology study for patients referred by their general practitioner (GP).¹² The study sample was drawn from a remote (telemedicine) population. All patients were seen via videoconferencing (VC) for their first assessment, followed by a FTF consultation (the time between assessments was 1–3 days). Often the GP was present for the assessment. Two dermatologists shared the VC assessments, but only one travelled to the site to see the patients FTF. They decided that in some instances a physical examination would have been helpful, but that the differential diagnosis was similar despite lack of palpation.

The study by Oakley and colleagues in a dermatology clinic required the treatment group to undergo two assessments, first via VC then FTF.¹³ The sample included the usual patient referrals for a dermatology clinic. Usually both assessments were by the same dermatologist (79% of patients). The results indicated that agreement between two different dermatologists was significantly lower than when the same dermatologist saw the patients for both assessments. They found that the clinician's level of confidence in the diagnosis was a good indicator of actual agreement. Diagnosis was easier when historical factors played a key role in understanding the disorder. Specialists were least confident when diagnosing pigmented lesions because of the implications for the patient if a melanoma was misdiagnosed. The authors noted that additional equipment, not available for the study, would have improved the visual quality of physical examination. They concluded that telemedicine was adequate for the majority of consultations, although a proportion of patients would always require FTF assessment for diagnosis.

Phillips and colleagues reported two teledermatology studies. The first involved 60 new referrals to a dermatology clinic.¹⁴ Two dermatologists saw each patient, one via VC and the other FTF. There was non-random allocation to either VC or FTF assessment as the first assessment. There was a similar percentage agreement in this study as in the Oakley study. Phillips et al. saw teledermatology as a feasible option, particularly with improvements in camera quality.

A second study by Phillips et al. specifically addressed the question of screening for skin tumours and the identification of malignancy. The study enrolled 51 patients for a FTF assessment followed by a VC assessment.¹⁵ Two dermatologists were involved and they shared the VC and FTF assessments equally (divided by group session). The population was a telemedicine population. There was less diagnostic agreement than expected, but the level of concern about a lesion was similar between both assessors. The VC specialist was more likely to be unsure about a lesion and order a biopsy. The study sample was not large enough to include sufficient malignant lesions to assess differences in recognition or ordering of biopsies. The authors felt that teledermatology was suitable for identifying suspicious skin lesions.

Two studies introduced a control group (FTF/FTF) as a baseline indication of clinician agreement in usual clinical practice. Lesher and colleagues assessed diagnostic agreement in teledermatology in a telemedicine population.¹⁶ Sixty patients were randomly recruited at the health centre where the teledermatology clinic operated. One dermatologist saw the first 30 patients via VC, while the other did the FTF assessments. For the second group of 30, they reversed the allocation of assessments. The interviews were unstructured and no patient history was provided to the specialist prior to the VC or FTF assessment. The control group patients (n = 36) were assessed individually FTF by the two teledermatology specialists and an independent dermatologist. Agreement via VC was significantly lower than for the control group. There was only one case of complete disagreement and this occurred in the VC group. The majority of non-agreement cases were defined as partial agreement. The authors indicated that improved assistance at the remote site for physical examination might increase the levels of agreement.

In a study conducted by Lowitt and colleagues, 102 patients underwent assessment via VC followed by an in-person assessment by two separate dermatologists.¹⁷ Agreement in dual FTF assessments was also examined (n = 29). As in previous studies, the need to touch the skin as a part of diagnosis in some instances was identified as interfering with agreement of assessment during VC. After diagnosis doctors were asked to rate their level of confidence in the accuracy of the diagnosis. Agreement on diagnosis was higher in the VC/FTF group when specialists had a high level of confidence in the accuracy of their own diagnosis. In most cases where disagreement existed, the VC specialist had identified less confidence in the diagnosis. This outcome supports the findings of the study by Oakley et al.¹³

In three studies the patient participated in a VC assessment in the presence of their GP who assisted with physical examination, provided additional information or addressed their own questions to the specialist. Diagnostic agreement and management plans were the focus of the first of these, by Loane and colleagues.¹⁸ Patients were referred by their GP. Consenting patients then returned to their health centre and, accompanied by their GP, were seen via VC by a dermatologist. On the same day the patient attended the nearby outpatient clinic and was seen FTF without the GP present. A total of 351 patients were enrolled in the study, of whom 125 were seen by two different dermatologists, one via VC and the other, FTF. The level of agreement on diagnosis was similar to previous studies. This was the first study to consider the effect of diagnostic agreement on management of a skin lesion. The agreement on management of a skin lesion increased if there was agreement on the diagnosis, and was significantly affected by whether both assessments were carried out by the same dermatologist.

In the study by Gilmour and colleagues (one of whom, Loane, was an author of two previous studies^13,18), the age range of the patients was three months to 83 years.¹⁹ Patients and their GP were seen by the specialist via VC first, followed by a FTF assessment. The study involved two sites and five dermatologists. At one site, the patient was seen by the same dermatologist for both assessments on the same day (n = 65). At the second site, two different specialists saw each patient (n = 61). The issue of missing a life-threatening skin lesion was raised again, and improved picture quality was proposed as a solution. This study confirmed that specialists using VC were more cautious in making a definitive diagnosis. The authors concluded that effective use of VC includes being able to recognise its limitations.

The study conducted by Nordal et al. evaluated teledermatology in a comparative study of VC and FTF consultations.²⁰ Each patient received two assessments, by two different specialists with equivalent experience. Generally, the first assessment was via VC with the GP present, and the second assessment was FTF. The study was carried out in a telemedicine population with the dermatologist flying to the local site. Twenty percent of the cases were identified as unsuitable for teledermatology. These were unusual cases or those requiring skin palpation or specialist equipment for diagnosis. These findings supported a previous study by Lowitt et al.¹⁷

The final study in dermatology involved the consecutive enrolment of 228 patients at a dermatology outpatient clinic for three phases of assessment by two specialists.²¹ Both specialists reviewed digital photographs and clinical information using a store-and-forward methodology and recorded their diagnosis for each patient. This was followed by each doctor separately interviewing each patient using VC. They recorded their diagnosis for a second time. In the last step, one of the specialists interviewed each patient FTF and recorded a final diagnosis. Agreement between the two specialists improved when a VC component was added to the store-and-forward data.

Studies in mental health

The five studies in mental health were from the fields of psychiatry (n = 3) and geriatrics (n = 2).

Psychiatry

The study by Baigent et al. utilised three interview settings: interviewer and observer in the same room (n = 22); interviewer via VC with observer in the same room (n = 20); and both interviewer and observer via VC (n = 21).²² Two psychiatrists used a semi-structured interview which followed the format of a standard psychiatric history. The authors concluded that there were some measurable differences in VC, but they were not sufficient to cause errors in interpretation.

Consecutive new referrals to a community mental health service in New Zealand were enrolled in a study to examine diagnostic agreement, patient risk, drug and non-drug interventions.²³ Two psychiatrists interviewed each of the patients (n = 37) either via VC or FTF, in random order. Diagnoses were made using the criteria from the Diagnostic and Statistical Manual of Mental Disorders.²⁴ The authors concluded that telepsychiatry was a dependable method of assessment for new routine outpatient psychiatric referrals.

Child psychiatry was the focus in the study by Elford et al.²⁵ Patients with their parents participated in a FTF assessment and a VC assessment. The patients were divided into two groups. The first group had VC assessment then FTF assessment, and order of assessment was reversed for the second group. Each patient was seen by two different psychiatrists, who both concluded that videoconferencing did not interfere with diagnosis.

Geriatrics

Loh et al. assessed patients FTF using one of eight physicians and via VC using one of two physicians.^26,27 The patient group was a telemedicine population. The order of assessment was by alternate allocation. Each assessment involved administering a series of standardised assessments, reviewing laboratory and imaging results and conducting an unstructured interview with the patient. The patients were carefully selected as being sufficiently mobile to travel to the telemedicine site, remain for the duration of the assessment and to have no hearing or vision impairment. There was a high level of agreement in the diagnosis of Alzheimer's Disease via VC.

A study by Martin-Khan and colleagues extended the work by Loh et al. and focused on diagnostic agreement for cognitive assessment in an unselected patient population with complex cognitive impairment issues.²⁸ Each patient received both a VC and a FTF assessment by alternate allocation, using two different specialists. A second set of patients were seen FTF by two different specialists, providing a baseline indication of clinician agreement in standard practice. Each specialist had access to the results of a series of standardised assessments prepared beforehand by the clinic nurse, as well as laboratory and imaging results. Diagnostic agreement was in a similar range to other studies, but significantly lower than the study by Loh et al.^26,27 This may be the consequence of more complex cases or less stringent assessment protocols.

Studies in minor injuries

Tachakra and colleagues combined clinical examination and the use of radiology to evaluate diagnostic agreement using 200 patients in a hospital accident and emergency department.²⁹ Patients were seen via VC with an emergency room nurse relaying the images from the patient's room. The patient was then seen by the same specialist FTF, and a second specialist who only saw the patient FTF. Key aspects of the clinical examination (such as colour change, instability, swelling and decreased movement) could be seen well enough to allow VC diagnosis. Processes to identify the presence of increased tenderness improved levels of agreement.

Current practice for the assessment of minor injuries at a peripheral hospital in the UK, at the time of publication, was a review by a GP. Benger et al. compared the diagnostic agreement of minor injuries in 600 patients using three different scenarios: a telemedicine emergency medicine specialist, an onsite emergency medicine specialist and an onsite GP.³⁰ Radiographs were available for all assessors, and additional requests could be made as required. Discrepancies in diagnosis were reviewed by an independent panel of 10 specialists who were blinded to the format of the assessments. The authors found that the safety of minor injuries telemedicine was similar to conventional practice.

Studies in neurology

Craig et al. described an interactive VC to assess patients admitted to a hospital with neurological symptoms.³¹ A junior physician (1–3 years' experience) made a diagnosis FTF which was followed by a VC with a neurologist. All patients were seen FTF by a consultant neurologist within four weeks of the VC assessment. This study was conducted in a telemedicine population, with the consultant neurologist travelling to the hospital site for a neurology clinic and ward round. The neurologist was not blinded to the VC outcome (it would have been unethical to withhold the diagnosis or treatment plan for the duration required for a blinded study given this methodology). The authors concluded that a specialist neurological assessment via VC could provide reliable support to a junior physician working in a remote hospital.

A second study by Craig and colleagues, reported in the same year, involved consecutive enrolment of 25 unselected patients who were referred to a neurological outpatient clinic by their GP.³² All patients were seen first by one specialist via VC and then by a different specialist, FTF. A junior doctor was present with the patient during the VC and provided support to the neurologist by carrying out a guided neurological examination and summarising the patient's details and referral letter. The methodology of this study, where the two specialists were blinded to each other's diagnosis, provided confirmation of the findings from the previous Craig study.³¹

Studies in rheumatology

A non-randomised prospective study by Leggett et al. assessed diagnostic agreement of telephone conferencing and VC.^33,34 A GP took the history of each patient in the study, followed by a three-way telephone conference between the GP, patient and the rheumatologist. This was followed by a VC between the three participants. Finally the same specialist met the patient for a FTF interview. It was observed that VC was highly reliable for diagnosis.

Discussion

The studies identified in the present review provide reasonable evidence of the reliability of VC for diagnosis. Although there were only two studies of teleneurology, these studies form part of a larger body of work in this field which did not meet the strict inclusion criteria for the review. More robust evidence was confined to a few medical specialities where studies have been carried out with suitably powered samples and in a range of settings. Studies in the field of teledermatology accounted for half of those in the review. In many cases evidence was provided by only a few preliminary studies (e.g. geriatrics, psychiatry, rheumatology).

A ‘telemedicine population’ is a study sample derived from locations where telemedicine would form a potentially useful part of the local health service. These might be remote hospitals, rural health services or high care residential facilities where residents are unable to travel even short distances. Many telemedicine studies, particularly reliability studies where agreement between two doctors is required, are carried out in large metropolitan hospitals where the additional staff are readily available for research projects. Reliability studies in telemedicine populations are challenging because of the extra cost of travel shared between two clinicians at the host and remote sites. Several studies in the present review used patients from a telemedicine population, which is commendable in view of the challenges.^12,15,20,26,27,31 Reliability studies are only one element of a suite of methodologies (economic analysis, cost analysis, satisfaction studies) which provide evidence of the benefit of VC. While not essential for reliability studies, the use of a telemedicine population is important for studies such as feasibility, economic or satisfaction studies because ‘real world settings’ have an impact on implementation.

Assessment of new technologies is often challenging. Preliminary studies which aim to test feasibility or to gather initial data to calculate sample sizes or refine research protocols for larger studies may be restricted as a result of limited funding. These preliminary studies are important because the results provide data for refining the protocols of larger, more expensive and comprehensive studies. Costly mistakes are avoided through this process. While the value of these studies is recognised, their results for the reliability of VC should be interpreted cautiously. This is because pilot studies with careful patient selection and small sample sizes may artificially inflate levels of agreement. Studies of fewer than 20 participants were excluded from the present review.

The predominant study design was comparing agreement between a diagnosis reached following an assessment via VC and the diagnosis from a FTF assessment. This design assumes that the FTF assessment is correct and takes no account of variation between doctors. Bias based on the between-doctor variation in diagnostic technique is reduced by ensuring that the two doctors complete both FTF and VC assessments. Levels of agreement are generally compared with those achieved in other studies. For diseases where diagnosis is complex, such as borderline dementia diagnosis, agreement may be low because of the progressive nature of the disease rather than the use of VC. For this reason, incorporating a second sample of dual FTF assessments provides a better baseline comparison for estimating the extent to which reliability is affected by the VC element of the diagnostic process. In the present review, four studies used a second sample of paired FTF assessments.^16,17,22,30 Two studies used a baseline comparison sample of almost equal size to the VC sample,^22,30 which enabled statistical comparison of outcomes.
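As an illustration of the baseline comparison described above, the following Python sketch contrasts the proportion of agreement in a VC/FTF sample with that in a separate FTF/FTF sample. The function name `agreement_difference`, the counts and the choice of a Wald 95% confidence interval are assumptions made for this example. They are not taken from any study in the review, and the studies that made such comparisons may have used other tests.

```python
import math


def agreement_difference(agree_vc, n_vc, agree_ftf, n_ftf, z=1.96):
    """Difference in proportion agreement (FTF/FTF baseline minus VC/FTF).

    Returns the difference and a Wald confidence interval. The two samples
    are assumed to contain different patients, so they are treated as
    independent groups.
    """
    p_vc = agree_vc / n_vc
    p_ftf = agree_ftf / n_ftf
    diff = p_ftf - p_vc
    # Standard error of a difference between two independent proportions.
    se = math.sqrt(p_vc * (1 - p_vc) / n_vc + p_ftf * (1 - p_ftf) / n_ftf)
    return diff, (diff - z * se, diff + z * se)


# Hypothetical counts for illustration only (not data from any reviewed study):
# 40 of 50 VC/FTF pairs agree, 45 of 50 FTF/FTF pairs agree.
diff, (low, high) = agreement_difference(agree_vc=40, n_vc=50,
                                         agree_ftf=45, n_ftf=50)
print(f"difference = {diff:.2f}, 95% CI = ({low:.2f}, {high:.2f})")
```

With these hypothetical counts the interval spans zero, so samples of this size would not distinguish an effect of VC from ordinary variation between doctors.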

A study of reliability not only needs to report measures of agreement but also to report influences on agreement which may positively or negatively affect the outcome. High levels of variation in the clinic model, study design – including the clinical reference standard – and reporting methods were evident in the present review. The variation in statistical analysis of studies of observer agreement is summarised in Table 4. Such variation makes it more difficult to compare outcomes across studies.

Table 4

Statistical methods reported in the studies reviewed

Statistical method	Number of studies	First author
Overall proportional agreement Overall percentage agreement	19	All studies
Cohen's Kappa Weighted Kappa	8	Oakley,¹³ Phillips,¹⁵ Baba,²¹ Loh,^26,27 Martin-Khan,²⁸ Baigent,²² Singh,²³ Leggett^33,34
Sensitivity/Specificity Positive or Negative Predictive Values	3	Lowitt,¹⁷ Loh,²⁷ Tachakra²⁹
McNemar Test	3	Gilmour,¹⁹ Baba,²¹ Benger³⁰
Chi Square Test	2	Phillips,¹⁴ Loane¹⁸
Cochran's Q Test	1	Benger³⁰
Spearman Rank/Wilcoxon Test	1	Baigent²²
Risk Difference Accuracy Ratio	1	Singh²³
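
For readers unfamiliar with the two measures most often reported in the studies reviewed, overall percentage agreement and Cohen's kappa, a minimal Python sketch is given here. The diagnoses are hypothetical and the function names are illustrative. The unweighted form of kappa is assumed, whereas some studies may have reported a weighted kappa.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Proportion of patients given the same diagnosis by both raters."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equal-length, non-empty lists")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa: (p_o - p_e) / (1 - p_e)."""
    n = len(rater_a)
    p_o = percent_agreement(rater_a, rater_b)
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    # Chance agreement: for each diagnosis, multiply the two raters'
    # marginal proportions, then sum over diagnoses.
    p_e = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical diagnoses for ten patients (illustration only).
vc = ["eczema", "psoriasis", "eczema", "naevus", "eczema",
      "psoriasis", "naevus", "eczema", "psoriasis", "eczema"]
ftf = ["eczema", "psoriasis", "psoriasis", "naevus", "eczema",
       "psoriasis", "eczema", "eczema", "psoriasis", "eczema"]

print(f"percentage agreement = {100 * percent_agreement(vc, ftf):.0f}%")
print(f"kappa = {cohens_kappa(vc, ftf):.2f}")
```

Here the two raters agree on 8 of 10 patients (80%), but kappa is about 0.67 because part of that agreement would be expected by chance.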

Conclusion

Communication via videoconference is becoming more commonplace. The consistent good to excellent diagnostic agreement across the specialities suggests that VC is likely to be a reliable tool to communicate with patients for the purpose of making a diagnosis, in situations where the diagnostic process lends itself to this format.

The majority of studies identified speciality-specific recommendations for improving the reliability of diagnosis when using VC. This reinforces the need, despite general confidence in the use of VC, for continued attention to individual diseases or specialities. Dermatology showed good to high levels of agreement consistently across the numerous studies in the review. This supports the general recommendation of reliability for teledermatology. While under-represented in the review, work in teleneurology is also quite advanced. Observer agreement in the remaining studies, across the specialities, was also good. The limited number of studies, or the small sample sizes, suggests that more work is still required before recommendations can be made for these additional specialities.

Acknowledgements

The study was funded by the National Health and Medical Research Council (NHMRC), grant no 456135. MM-K was funded by an NHMRC PhD scholarship. JW was supported by a US Department of Veterans Affairs grant (HSR&D IIR 05-278). The views expressed in this article are those of the authors and do not necessarily represent those of the US Department of Veterans Affairs.

References

1. Ohinmaa, Hailey, Roine. The Assessment of Telemedicine: General Principles and a Systematic Review. International Network of Agencies for Health Technology Assessment Joint Project. Helsinki and Edmonton: STAKES and Alberta Heritage Foundation for Medical Research, 1999

2. Hersh, Helfand, Wallace, Kraemer, Patterson, Shapiro, Greenlick. A systematic review of the efficacy of telemedicine for making diagnostic and management decisions. J Telemed Telecare 2002;8:197–209

3. Hersh, Hickam, Severance, Dana, Pyle Krages, Helfand. Diagnosis, access and outcomes: update of a systematic review of telemedicine services. J Telemed Telecare 2006;12(Suppl. 2):3–31

4. Hersh, Wallace, Patterson, et al. Telemedicine for the Medicare Population: pediatric, obstetric, and clinician-indirect home interventions in telemedicine. Rockville: Agency for Healthcare Research and Quality, 2001

5. Hersh, Wallace, Patterson, et al. Telemedicine for the Medicare Population. Rockville: Agency for Healthcare Research and Quality, 2001

6. Hailey, Ohinmaa, Roine. Study quality and evidence of benefit in recent assessments of telemedicine. J Telemed Telecare 2004;10:318–24

7. Hailey, Ohinmaa, Roine. Recent Studies on Assessment of Telemedicine: systematic review of study quality and evidence of benefit. Edmonton, Alberta, Canada: Institute of Health Economics, 2003

8. Hailey, Roine, Ohinmaa. Systematic review of evidence for the benefits of telemedicine. J Telemed Telecare 2002;8(Suppl. 1):1–30

9. Hailey, Roine, Ohinmaa. Evidence of Benefit from Telemental Health Applications: a systematic review. Alberta, Canada: Institute of Health Economics, 2007

10. Roine, Ohinmaa, Hailey. Assessing telemedicine: a systematic review of the literature. CMAJ 2001;165:765–71

11. Currell, Urquhart, Wainwright, Lewis. Telemedicine versus face to face patient care: effects on professional practice and health care outcomes. Nurs Times 2001;97:35

12. Warren, Lesher, Hall, Ward, Sanders, Tison. Telemedicine. J Fam Pract 1995;41:17, 20

13. Oakley, Astwood, Loane, Duffill, Rademaker, Wootton. Diagnostic accuracy of teledermatology: results of a preliminary study in New Zealand. N Z Med J 1997;110:51–3

14. Phillips, Burke, Shechter, Stone, Balch, Gustke. Reliability of dermatology teleconsultations with the use of teleconferencing technology. J Am Acad Dermatol 1997;37:398–402

15. Phillips, Burke, Allen, Stone, Wilson. Reliability of telemedicine in evaluating skin tumors. Telemed J 1998;4:5–9

16. Lesher, Davis, Gourdin, English, Thompson. Telemedicine evaluation of cutaneous diseases: a blinded comparative study. J Am Acad Dermatol 1998;38:27–31

17. Lowitt, Kessler, Kauffman, Hooper, Siegel, Burnett. Teledermatology and in-person examinations: a comparison of patient and physician perceptions and diagnostic agreement. Arch Dermatol 1998;134:471–6

18. Loane, Corbett, Bloomer, et al. Diagnostic accuracy and clinical management by realtime teledermatology. Results from the Northern Ireland arms of the UK Multicentre Teledermatology Trial. J Telemed Telecare 1998;4:95–100

19. Gilmour, Campbell, Loane, et al. Comparison of teleconsultations and face-to-face consultations: preliminary results of a United Kingdom multicentre teledermatology study. Br J Dermatol 1998;139:81–7

20. Nordal, Moseng, Kvammen, Løchen. A comparative study of teleconsultations versus face-to-face consultations. J Telemed Telecare 2001;7:257–65

21. Baba, Seçkin, Kapdağli. A comparison of teledermatology using store-and-forward methodology alone, and in combination with Web camera videoconferencing. J Telemed Telecare 2005;11:354–60

22. Baigent, Lloyd, Kavanagh. Telepsychiatry: 'tele' yes, but what about the 'psychiatry'? J Telemed Telecare 1997;3(Suppl. 1):3–5

23. Singh, Arya, Peters. Accuracy of telepsychiatric assessment of new routine outpatient referrals. BMC Psychiatry 2007;7:55

24. American Psychiatric Association, American Psychiatric Association Task Force on DSM-IV (eds.) Diagnostic and Statistical Manual of Mental Disorders (DSM-IV). Washington, DC: American Psychiatric Association, 1994

25. Elford, White, Bowering, et al. A randomized, controlled trial of child psychiatric assessments conducted using videoconferencing. J Telemed Telecare 2000;6:73–82

26. Loh, Maher, Goldswain, Flicker, Ramesh, Saligari. Diagnostic accuracy of telehealth community dementia assessments. J Am Geriatr Soc 2005;53:2043–4

27. Loh, Donaldson, Flicker, Maher, Goldswain. Development of a telemedicine protocol for the diagnosis of Alzheimer's disease. J Telemed Telecare 2007;13:90–4

28. Martin-Khan, Varghese, Wootton, Gray. Successes and failures in assessing cognitive function in older adults using video consultation. J Telemed Telecare 2007;13:60–2

29. Tachakra, Lynch, Newson, et al. A comparison of telemedicine with face-to-face consultations for trauma management. J Telemed Telecare 2000;6(Suppl. 1):178–81

30. Benger, Noble, Coast, Kendall. The safety and effectiveness of minor injuries telemedicine. Emerg Med J 2004;21:438–45

31. Craig, Patterson, Russell, Wootton. Interactive videoconsultation is a feasible method for neurological in-patient assessment. Eur J Neurol 2000;7:699–702

32. Craig, Chua, Wootton, Patterson. A pilot study of telemedicine for new neurological outpatient referrals. J Telemed Telecare 2000;6:225–8

33. Graham, Leggett, Steele. Do all outpatients need a face-to-face consultation in rheumatology? J Telemed Telecare 2000;6:198–9

34. Leggett, Graham, Steele, et al. Telerheumatology – diagnostic accuracy and acceptability to patient, specialist, and general practitioner. Br J Gen Pract 2001;51:746–8

35. Moher, Schulz, Altman. The CONSORT statement: revised recommendations for improving the quality of reports of parallel-group randomized trials. Ann Intern Med 2001;134:657–62