Abstract
Purpose:
Strabismus is a common ocular condition requiring precise quantification of gaze deviation and classification of strabismus type. Telemedicine refers to the use of technology to diagnose and treat medical conditions remotely. This narrative review aimed to assess the efficacy of a variety of telemedicine modalities for the assessment of strabismus. A secondary objective was to quantify the overall accuracy, sensitivity, and specificity of automated methods using meta-analysis of available data.
Methods:
A literature search was conducted using the Ovid MEDLINE, Embase, and Cochrane Library databases. Keywords including “strabismus,” “phoria,” “telemed*,” and “telehealth” were used to locate relevant studies, together with Medical Subject Headings terms, free text, and synonyms. No year restrictions were applied. Studies not in English were excluded. Risk of bias was assessed using the QUADAS-2 tool.
Results:
Thirty-four studies were included. All outcomes relating to accuracy and reliability of telemedicine versus a reference standard were extracted, as well as qualitative observations. High sensitivity, specificity, accuracy, and agreement were consistently shown across studies. Meta-analysis of two subsets featuring automated methods, for which relevant data were available, revealed a pooled accuracy of 0.877 (0.806–0.949), sensitivity of 0.856 (0.805–0.907), and specificity of 0.900 (0.845–0.954). Subcategories “remote standard assessment,” “digital image analysis,” “wearable devices,” “mobile health (mHealth),” and “artificial intelligence” were independently examined.
Conclusions:
The majority of systems achieved parity with standard physician assessment, with the added benefit of eliminating subjectivity. Meta-analysis results suggest that remote automated assessment could be introduced where conventional assessment is unavailable, although the accuracy of current technologies remains limited compared with in-person examination. The telemedicine modalities described offer convenience for patients, shorter examination times, and the potential to go beyond in-person assessment. The evidence gathered in this review supports beginning to integrate telemedicine into strabismus diagnosis.
Introduction
Major advances in telemedicine over recent decades have created a wealth of opportunities to enhance patient care. 1 Strabismus is an ocular condition arising from misalignment of the eyes. Standard assessment of strabismus involves observing the eyes in different positions of gaze, which lends itself to remote assessment and automated analysis of images and videos. 2 In this review, the term “telemedicine” is interpreted in its broadest sense, as any use of technology that removes the need for in-person examination. The purpose of this narrative review was to evaluate the scope for telemedicine to facilitate strabismus assessment, and to quantify the accuracy of automated methods that operate without human input.
Methods
A literature search was conducted using the medical databases Embase Ovid (Fig. 1), MEDLINE (Fig. 2), and the Cochrane Library (Fig. 3). The Cochrane Library was also searched for related reviews, whose reference lists were scanned for any studies that could help answer the research question. Keywords included “strabismus,” “squint,” “phoria*,” “telemed*,” and “telehealth,” along with synonyms. No filters or limits were applied. All abstracts retrieved by the search were screened against the predefined inclusion criteria:
Primary research: randomized controlled trials, nonrandomized controlled trials, and observational studies.
Quantification of outcome measures relating to the validity of telemedicine, including reliability measures, sensitivity, and specificity.

Embase search.

MEDLINE search.

Cochrane search.
Studies not in the English language were excluded. Reviews were excluded after their reference lists had been checked for relevant articles. A single reviewer then extracted data from the included studies, including platform, location, and agreement metrics, into a Microsoft Excel spreadsheet.
Supplementing the primary literature search, a gray literature search was carried out using the Ovid Global Health database and the British Library e-thesis online service. EndNote was used throughout to organize selected literature and remove duplicates. Risk of bias was judged with the QUADAS-2 tool. Meta-analysis of automated diagnosis was then performed in Stata. The investigation was subdivided into five approaches: remote standard assessment, digital image analysis, wearable devices, mobile health (mHealth), and artificial intelligence.
Results
Thirty-four studies met the inclusion criteria and were fully extracted (Table 1). Twenty of the 34 studies reported median age; across these, the mean of the median ages was 20.82 years and the mean number of participants was 79.15 (Table 2). A PRISMA flow diagram was generated (Fig. 4).
Full Results Table
APCT, alternate prism cover test; AUC, area under the curve; CCC, concordance correlation coefficient; ICC, intraclass correlation coefficient; LOA, limit of agreement; PCT, prism cover test; PD, prism diopters; ROC, receiver operating characteristic; VR, virtual reality.
Study Characteristics

PRISMA flow diagram.
REMOTE ASSESSMENT
Cheung et al. first described the potential for remote strabismus examination in 2000. 3 Forty-two patients were examined both in person and remotely, compared with 43 examined in person only. Agreement on ocular muscle action was lower in the telemedicine group for all muscle groups, and the odds of disagreement were consistently two to three times higher for all measurements (category, angle, and ocular muscle movements).
Dawson built on this initial integration of telemedicine into strabismus assessment. 4 Thirty patients, presented by the same orthoptist, were examined by different ophthalmologists both in person and through telemedicine. Qualitatively, the ophthalmologists found large limitations, for example, Duane’s syndrome, straightforward to diagnose, whereas latent squint was more difficult.
Mobile applications have also facilitated convenient remote assessment. In 2017, Phanphruk et al. developed StrabisPIX, a mobile application for processing images of the nine cardinal positions of gaze taken independently by the patient. 5 These were compared with images taken professionally by an orthoptist. In agreement with the findings of Cheung et al., images of horizontal versions were significantly more often acceptable than those of vertical versions and head posture.
In 2021, Ho et al. introduced high-definition video smart glasses to record a strabismus examination while it was being performed in person. 6 The recorded videos were then assessed later in a store-and-forward manner. Agreement was classified as excellent, including for vertical deviations, bucking the trend of inferior vertical deviation agreement noted in earlier studies. Equivalent agreement was found between the gold-standard in-person examinations and the store-and-forward videos.
Li et al. expanded on this work by examining video feeds obtained in real time with video glasses. 7 In-person assessment was compared with store-and-forward review of the recorded videos three years later. Agreement was high for strabismus category, degree manifest, angle measurement, and extraocular motility.
Stewart et al. evaluated synchronous streaming, analyzing agreement between examinations streamed remotely to ophthalmologists and in-person re-examinations of the same patients by the same doctor on the same day. 8 Of the families, 98.5% were comfortable with the quality of the telemedicine examination, and 97.1% agreed that they would be happy to participate in a similar study in the future. Changes in management plan and discrepancies in diagnosis fell below the noninferiority thresholds of 1.5% and 15%, respectively.
DIGITAL IMAGE ANALYSIS
Almeida et al. first formulated a multistage computational methodology for automatic strabismus detection in 2012, digitally automating the Hirschberg test. 9 A cover test was performed in all patients to divide them into strabismic and nonstrabismic control groups, and the alternate prism cover test (APCT) served as the reference standard for strabismic patients.
Yang et al. then validated the effectiveness of a novel 3D Strabismus Photo Analyzer for estimating ocular alignment from images. 10 The analyzer calculated angle kappa, estimated from the offset between the corneal light reflex and the pupil center, from a single primary 2D image. The tool requires minimal operator input and, because of its 3D rather than 2D approach, does not rely on a constant Hirschberg ratio. Adjustments were made for age, angle kappa, and ophthalmic biometry.
Yang et al. followed up this study by assessing the efficacy of a selective wavelength filter with an infrared camera, combined with the 3D Strabismus Photo Analyzer described above. 11 Because the system worked with infrared images, it could measure latent strabismus, which is only discernible after fusion has been disrupted.
Almeida et al. built on their previous work in another analysis, using images previously diagnosed by a specialist. 12 An important change in protocol was the inclusion of patients with deviations of up to 90 prism diopters (PD), facilitating both initial checking and diagnosis. Low accuracy in the diagnosis of orthotropic patients was attributed to the mismatch in precision between the tool and the specialist, who was unaccustomed to working with such small shifts.
Valente et al. transferred techniques from static image analysis to digital video, proposing a computational methodology based on data extracted from a cover test video. 13 Building on the eye-region detection and search-space delimitation introduced by Almeida et al., they integrated eye-tracking software to classify strabismus by selecting the highest average deviation measure. Compared with previous relevant studies, Valente et al. claimed advantages in the affordability of equipment, the classification of multiple directions of deviation, and the diagnosis of nonapparent strabismus, although only videos of exotropic patients were available.
Using a different method to quantitatively measure horizontal strabismus, Dericioglu et al. used, and clinically validated, the relationship between the geometric corneal center and the corneal light reflex to calculate gaze angle and imaging distance. 14 A high correlation was reported between real and estimated gaze angles, and between real and estimated imaging distances. The error rate was not correlated with patient age or deviation angle.
Yehezkel et al. investigated dedicated occlusion glasses combined with eye-tracking software. 15 No significant interexaminer variability was detected for either APCT or the automated alternate cover test (ACT). The average automated test duration was 46 s. The automated test was significantly more repeatable, with an almost twofold reduction in the average standard deviation of horizontal and vertical deviation measurements.
Zrinscak et al. introduced eye-tracking software to detect manifest strabismus without the need for a skilled examiner. 16 They described the novel Strabiscope system, which calculates parameters for strabismus diagnosis. Strabismic participants had higher values for all measurements than the nonstrabismic control group.
Kang et al. evaluated an automated mathematical algorithm that quantitatively measures strabismus from images of the cardinal gaze positions, validated with confusion matrices. 17 Through direct application to clinical scenarios, this study improved on De Figueiredo’s earlier convolutional neural network (CNN) web application. 18 Its coverage of many strabismus categories was also an improvement on the algorithm proposed by Zheng, which screened for horizontal strabismus only. 19
WEARABLE DEVICES
Capo-Aponte examined the effectiveness of a fixed computerized oculomotor vision screening system. 20 Although the Pearson correlation was strong, Bland–Altman analysis showed moderate discrepancies in fusional vergence and monocular accommodative facility measurements. Left hyperdeviation was persistently overestimated and right hyperdeviation underestimated.
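Why a strong Pearson correlation can coexist with Bland–Altman discrepancies is easiest to see with a toy example. The Python sketch below uses invented readings, not data from Capo-Aponte or any included study: a hypothetical device that reads every deviation exactly 4 PD too high correlates perfectly with the reference, yet systematically overestimates. The helper names are illustrative only.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two paired samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mean_bias(reference: list[float], device: list[float]) -> float:
    """Mean of (device - reference): the bias line of a Bland-Altman plot."""
    return sum(d - r for r, d in zip(reference, device)) / len(reference)


# Hypothetical deviations in prism diopters (PD); the device adds a constant 4 PD.
reference_pd = [2.0, 5.0, 10.0, 15.0, 20.0, 30.0]
device_pd = [r + 4.0 for r in reference_pd]

print(round(pearson_r(reference_pd, device_pd), 6))  # 1.0: perfect correlation
print(mean_bias(reference_pd, device_pd))            # 4.0: systematic overestimation
```

Correlation measures whether two methods rise and fall together; agreement measures whether they give the same numbers, which is what clinical interchangeability requires.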
Other hardware devices include upgraded components that offer additional advantages. Bakker et al. introduced the Delft Assessment Instrument for Strabismus in Young Children, which combines infrared light-emitting diodes with a high-resolution stereo system and allows unrestrained head movement, for quick and reliable quantification of strabismus angles in young children. 21
Weber et al. developed novel video goggles to deliver a simple noninvasive test for strabismus. 22 The goggles used infrared light with liquid crystal display shutters and a projected laser target to measure ocular deviations on a nine-point target grid. Patients with visual suppression and comitant strabismus, who cannot be examined with a Hess chart, could be examined with the goggles.
Virtual reality (VR) also has a role to play in facilitating assessment. Miao et al. integrated a VR system into the measurement of ocular deviation in patients with strabismus, alternating fixation targets between the eyes to emulate a standard cover test. 23 The study contrasted a direct measurement (DM) strategy with stepwise approximation (SA), which measures ocular deviation through feedback calibration. Although both DM and SA performed excellently in orthotropia and exotropia, SA gave more stable results than DM, as judged by the 95% limits of agreement (LOA) of the difference.
Following on from the VR work of Miao et al., Yeh et al. researched the viability of an eye-tracking VR headset for strabismus measurement that simulates APCT. 24 Eye-tracking software recorded patients’ eye movements between two screens with alternating fixation targets. The differences between VR and APCT measurements had a large standard deviation of 5.77 PD, which was attributed to large degrees of strabismus and the 5-PD increments in the prism set.
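The 95% limits of agreement used in these VR studies follow directly from the standard deviation of the paired differences: bias ± 1.96 × SD. The Python sketch below is a generic Bland–Altman calculation, not the authors’ code; the function name and the paired readings are hypothetical. It also shows what the 5.77 PD standard deviation reported by Yeh et al. implies for the width of the limits, independent of the (unreported here) mean bias.

```python
import statistics


def limits_of_agreement(reference: list[float], index: list[float], z: float = 1.96):
    """Bland-Altman bias, SD of differences, and 95% limits of agreement."""
    diffs = [i - r for r, i in zip(reference, index)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample SD of the paired differences
    return bias, sd, (bias - z * sd, bias + z * sd)


# With an SD of 5.77 PD, each 95% limit lies about 11.3 PD from the mean bias.
half_width_pd = 1.96 * 5.77
print(round(half_width_pd, 2))  # 11.31

# Hypothetical paired readings (PD), for illustration only.
bias, sd, (lower, upper) = limits_of_agreement([10.0, 20.0, 30.0], [11.0, 22.0, 33.0])
print(bias, sd, round(lower, 2), round(upper, 2))  # 2.0 1.0 0.04 3.96
```

An interval roughly 22.6 PD wide means a single VR reading could differ from APCT by well over 10 PD, which is why a large SD matters clinically even when the mean bias is small.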
mHealth
Pundlik et al. pioneered a quantitative strategy that applied an automated photographic Hirschberg test to measure ocular deviation using the EyeTurn mobile application, which automatically processes the position of the corneal reflection relative to the globe center. 25 Agreement between app measurements and the cover test improved further after the cover test values were corrected for near fixation.
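The photographic Hirschberg principle used by EyeTurn, and by the earlier work of Almeida et al., converts the decentration of the corneal light reflex into an angle via the Hirschberg ratio, and that angle into prism diopters (PD = 100 × tan θ). The Python sketch below is a generic illustration, not the EyeTurn algorithm: the ratio of 12°/mm and the 11.7 mm iris diameter used to scale pixels to millimeters are assumed round values, and the function names are hypothetical. Relying on a single population-average ratio is the limitation noted later for such systems.

```python
import math

# ASSUMPTION: population-average Hirschberg ratio (degrees of eye rotation per
# mm of reflex decentration). Real systems may calibrate this per patient.
ASSUMED_HIRSCHBERG_RATIO_DEG_PER_MM = 12.0
# ASSUMPTION: visible iris diameter, used only to convert image pixels to mm.
ASSUMED_IRIS_DIAMETER_MM = 11.7


def degrees_to_prism_diopters(angle_deg: float) -> float:
    """One prism diopter deflects light 1 cm at 1 m, so PD = 100 * tan(angle)."""
    return 100.0 * math.tan(math.radians(angle_deg))


def hirschberg_deviation_pd(reflex_offset_px: float, iris_diameter_px: float,
                            ratio_deg_per_mm: float = ASSUMED_HIRSCHBERG_RATIO_DEG_PER_MM,
                            iris_diameter_mm: float = ASSUMED_IRIS_DIAMETER_MM) -> float:
    """Hypothetical helper: estimate ocular deviation in PD from one photo.

    reflex_offset_px: decentration of the corneal reflex from the reference
    point in the deviating eye; its sign convention is arbitrary here.
    """
    mm_per_px = iris_diameter_mm / iris_diameter_px
    offset_mm = reflex_offset_px * mm_per_px
    angle_deg = ratio_deg_per_mm * offset_mm
    return degrees_to_prism_diopters(angle_deg)


# A 5 px offset on an iris imaged at 117 px is 0.5 mm, i.e. 6 degrees.
print(round(hirschberg_deviation_pd(5, 117), 2))  # 10.51
```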
Ma et al. further explored the potential of mobile devices for automated strabismus diagnosis, formulating a “one-step streamlined screening solution” intended to allow comprehensive examination of children in resource-limited areas by low-skilled technicians. 26 The procedure proved highly scalable: leveraging artificial intelligence (AI) algorithms, it calculated values in only 10 seconds, giving a potential throughput of 200 children an hour.
Following Pundlik et al., Cheng et al. evaluated the feasibility of EyeTurn in a cross-sectional study of routine vision screening at a school, performed by an untrained school nurse using the photographic Hirschberg method. 27 The optimal threshold for strabismus detection was 3.0 PD, and regression analysis showed a high positive correlation.
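An “optimal” detection threshold such as the 3.0 PD cut-off is typically chosen from an ROC analysis. One common criterion is to maximize Youden’s J (sensitivity + specificity − 1); whether Cheng et al. used this criterion is not stated here, so the Python sketch below is a generic illustration. The function name, readings, and labels are hypothetical.

```python
def youden_optimal_threshold(scores: list[float], labels: list[int]) -> tuple[float, float]:
    """Return (threshold, J) maximizing Youden's J = sensitivity + specificity - 1.

    A case is called positive when its score is >= the threshold.
    labels: 1 = strabismus on the reference test, 0 = no strabismus.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best_t, best_j = None, -1.0
    for t in sorted(set(scores)):  # every observed score is a candidate cut-off
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < t and y == 0)
        j = tp / n_pos + tn / n_neg - 1.0
        if j > best_j:  # strict '>' keeps the lowest threshold on ties
            best_t, best_j = t, j
    return best_t, best_j


# Invented app readings (PD) and reference labels, for illustration only.
app_pd = [0.5, 1.5, 2.0, 4.5, 5.0, 7.0]
reference = [0, 0, 0, 1, 1, 1]
print(youden_optimal_threshold(app_pd, reference))  # (4.5, 1.0)
```

Other criteria (for example, fixing a minimum sensitivity for screening) can yield different cut-offs from the same data.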
To tackle the public health problem of amblyopia, Mesquita et al. designed a trial examining the concordance between a low-cost mHealth application for instant strabismus diagnosis and expert clinical evaluation by ophthalmologists. 28 The difference between measurements of horizontal and vertical deviations was attributed to the tangential angle at which the mobile camera observed the deviation.
Racano et al. then validated the 2WIN corneal reflexes app against the AAPOS 2013 guidelines, using a >8 PD threshold. 29 The difference from the reference standard, assessed with the Wilcoxon signed-rank test, was significant for vertical deviations (poor correlation) but not for eso- and exodeviations. Correlation was only fair for esotropia, compared with that for exotropia.
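The Wilcoxon signed-rank test compares paired measurements (app versus reference on the same eye) without assuming normally distributed differences: it ranks the absolute paired differences and asks whether positive and negative differences carry similar rank sums. The Python sketch below is a textbook version using the large-sample normal approximation without tie correction; it is not the software or settings Racano et al. used, and exact small-sample p-values (as computed by statistics packages) would differ. Function name and data are hypothetical.

```python
import math


def wilcoxon_signed_rank(x: list[float], y: list[float]) -> tuple[float, float]:
    """Paired Wilcoxon signed-rank test: returns (W+, two-sided p).

    Uses the normal approximation without tie correction, so it is only
    a rough guide for small samples.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]  # zero differences dropped
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    # Rank |differences|, giving tied values their average (1-based) rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    return w_plus, math.erfc(abs(z) / math.sqrt(2))


# Hypothetical readings (PD): the app reads higher than the reference every time.
reference_pd = [5, 8, 12, 15, 20, 25, 30, 35, 40, 45]
app_pd = [6, 10, 15, 19, 25, 31, 37, 43, 49, 55]
w, p = wilcoxon_signed_rank(app_pd, reference_pd)
print(w, p < 0.01)  # 55.0 True
```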
Finally, Bindiganavale et al. integrated VR by validating a VR approach to measuring torsional strabismus. 30 A virtual double Maddox rod (DMR) examination was implemented using a commercially available smartphone and VR viewer. VR-DMR measurements showed more bias at higher degrees of cyclodeviation. Of the participants, 54.8% found VR-DMR easier to use than standard DMR, and all participants were optimistic about using smartphone applications for testing.
ARTIFICIAL INTELLIGENCE
Fisher et al. first described an artificially intelligent expert system featuring a backpropagation learning method to progressively categorize strabismus. 31 This study was the first attempt to use real clinical data rather than a parametric biomechanical model. The StrabNet system was shown to be effective for diagnosis, and input from outside experts suggested potential for teaching and learning as well.
Chandna et al. further investigated StrabNet, simplifying the system to six directions. 32 StrabNet’s diagnosis matched the expert diagnosis in most cases, including those for which the tool had not been explicitly trained.
In 2018, Chen et al. applied CNNs to eye-tracking data acquired at nine gaze points to recognize strabismus, reducing labor costs and increasing diagnostic efficiency. 33 The highest accuracy was achieved by the Visual Geometry Group-S model, which misclassified only one strabismic and one nonstrabismic participant.
In 2018, Lu et al. described an application for automated strabismus detection using telemedicine, establishing a tele-strabismus dataset. 34 Beyond a threshold of 1500 training examples, detection results improved significantly. Compared with previous studies such as those of Valente et al. and Chen et al., Lu’s system has the advantage of not requiring on-site assistance from specialists. 13,33
De Figueiredo et al. designed an early prototype mobile app integrating AI, written in the Python programming language, with the aim of fully objective automated classification of eye versions. 18 Results were limited because several classes had few or no observations, lowering the overall global quality metrics.
Zheng et al. investigated AI for pediatric strabismus, applying a deep learning approach to screening photographs of children’s eyes for horizontal strabismus. 19 Promisingly, a difficult case of epicanthal pseudostrabismus was classified correctly. Relative to the previous literature, this system offered complete end-to-end learning without the need for manual adjustment.
Huang et al. also validated an automatic strabismus screening method using CNNs, in which a pretrained facial landmark detection model extracts the eye region for measurement of positional similarity. 35 They described a novel use of Otsu’s binarization and the hue saturation value (HSV) color model to mitigate confusion from lashes and canthi. In contrast to Zheng et al., manual adjustment was not needed to extract the eye region, owing to Huang’s use of a facial landmark detector. The method also improved on Almeida’s work by removing the need for on-site image acquisition and additional labor.
Huang et al. followed up this study a year later with an improved method that added an image processing component to the meta-learning MetaOptNet architecture. 36 The authors suggested data-scarce environments as a particular use case for this addition.
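Otsu’s binarization, used by Huang et al. to separate eye structures from lashes and canthi, picks the gray level that maximizes the between-class variance of the two resulting pixel groups. The Python sketch below implements the standard algorithm on a flat list of 8-bit intensities; it is not Huang et al.’s pipeline, omits their HSV step, and the pixel values are invented.

```python
def otsu_threshold(pixels: list[int], levels: int = 256) -> int:
    """Return the gray level t that maximizes between-class variance.

    Pixels <= t form one class (e.g. dark pupil or lashes), pixels > t the other.
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    weight_low, sum_low = 0, 0
    best_t, best_between = 0, -1.0
    for t in range(levels):
        weight_low += hist[t]
        sum_low += t * hist[t]
        weight_high = total - weight_low
        if weight_low == 0:
            continue
        if weight_high == 0:
            break
        mean_low = sum_low / weight_low
        mean_high = (sum_all - sum_low) / weight_high
        # Between-class variance, up to the constant factor 1 / total**2.
        between = weight_low * weight_high * (mean_low - mean_high) ** 2
        if between > best_between:
            best_t, best_between = t, between
    return best_t


# Invented intensities: a dark cluster (pupil-like) and a bright cluster (sclera-like).
pixels = [20, 22, 25] * 10 + [180, 190, 200] * 10
t = otsu_threshold(pixels)
mask = [p > t for p in pixels]  # True marks the bright class
print(t)  # 25
```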
META-ANALYSIS
For studies using remote automated methods (defined as those not involving human input, thereby excluding remote assessment) for which data were available, a meta-analysis was conducted to determine their overall effectiveness. Statistical calculations were performed with the analysis software Stata (StataCorp. 2023. Stata Statistical Software: version 18.0. College Station, TX). A random-effects model with restricted maximum likelihood (REML) estimation was used, pooling raw (untransformed) proportions to estimate each overall value. Eight studies using automated methods had data available on accuracy; from these, the pooled accuracy was 0.877 (0.806–0.949) (Fig. 5). A different subset of 11 studies reported sensitivity and specificity, giving a pooled sensitivity of 0.856 (0.805–0.907) (Fig. 6) and specificity of 0.900 (0.845–0.954) (Fig. 7).

Forest plot accuracy.

Forest plot sensitivity.

Forest plot specificity.
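The random-effects pooling described above can be sketched as follows. This Python example is a simplified illustration under stated assumptions, not a reproduction of the Stata analysis: it pools raw proportions with within-study variance p(1 − p)/n, estimates the between-study variance τ² by the standard REML fixed-point iteration, and uses a Wald 95% interval. It assumes every study proportion lies strictly between 0 and 1 (no continuity correction), and the function name and event counts are hypothetical.

```python
import math


def pool_proportions_reml(events: list[int], totals: list[int],
                          max_iter: int = 500, tol: float = 1e-12):
    """Random-effects pooled proportion with a REML estimate of tau^2.

    Returns (pooled proportion, (lower, upper) Wald 95% CI, tau^2).
    """
    props = [e / n for e, n in zip(events, totals)]
    variances = [p * (1.0 - p) / n for p, n in zip(props, totals)]

    def weighted_mean(tau2: float) -> tuple[float, list[float]]:
        weights = [1.0 / (v + tau2) for v in variances]
        mu = sum(w * p for w, p in zip(weights, props)) / sum(weights)
        return mu, weights

    tau2 = 0.0
    for _ in range(max_iter):
        mu, w = weighted_mean(tau2)
        # REML fixed-point update for the between-study variance, floored at 0.
        num = sum(wi ** 2 * ((p - mu) ** 2 - v) for wi, p, v in zip(w, props, variances))
        new_tau2 = max(0.0, num / sum(wi ** 2 for wi in w) + 1.0 / sum(w))
        if abs(new_tau2 - tau2) < tol:
            tau2 = new_tau2
            break
        tau2 = new_tau2

    mu, w = weighted_mean(tau2)
    se = math.sqrt(1.0 / sum(w))
    return mu, (mu - 1.96 * se, mu + 1.96 * se), tau2


# Invented counts: two clearly heterogeneous studies (10/100 and 90/100).
pooled, ci, tau2 = pool_proportions_reml([10, 90], [100, 100])
print(round(pooled, 3), round(tau2, 4))  # 0.5 0.3191
```

Because raw proportions are pooled on their natural scale, the interval is symmetric about the estimate, as are the intervals reported above; transformations such as the logit or Freeman–Tukey are often preferred when proportions sit near 0 or 1.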
RISK OF BIAS
Risk of bias was assessed using the QUADAS-2 tool across four domains: patient selection, index test, reference standard, and flow and timing. Two studies were judged to be at high risk of bias, seven unclear, and 25 low. High risk of bias was attributable to patient selection and to concerns about how well studies matched the review question.
GRAY LITERATURE REVIEW
No relevant articles were identified from review of the surrounding gray literature.
Discussion
There is a substantial clinical need for technological innovations permitting timely diagnosis and treatment of strabismus, especially in children below the age of seven, in whom amblyopia is still reversible. 37 Overall, the selected studies consistently support the safety, reliability, and feasibility of telemedicine technology for strabismus diagnosis across various metrics. Telemedicine modalities were also described qualitatively as successful in several domains relating to clinical efficacy, such as short examination times, patient satisfaction, and scalability.
Assuming that the gold standard of full eye examination has perfect sensitivity and specificity for strabismus diagnosis, the pooled values may be high enough for automated methods to begin to be considered in clinical practice, especially where access to conventional assessment is limited. However, considerable room for improvement remains before widespread adoption, as values of 0.9 or less mean that at least one in 10 patients with strabismus is missed and one in 10 without strabismus is incorrectly diagnosed. Interestingly, the pooled specificity was marginally higher than the sensitivity, indicating that the automated methods were slightly better at correctly identifying patients without strabismus than at detecting those with it. This could reflect an overly conservative tendency in automated methods, which may classify aberrant biomechanical parameters as falling within the range of physiological variation where a human examiner would not. It will be useful to follow whether this disparity persists as technologies are refined, as a consistently higher specificity could indicate greater utility for automated methods as a second-line confirmatory test following a more sensitive, human-mediated triage system. Developers of such methods could also take this into account by adjusting the built-in parameters of their algorithms to be less conservative.
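The arithmetic behind these error rates can be made concrete. The Python sketch below applies the pooled sensitivity (0.856) and specificity (0.900) to a screened cohort; the cohort size of 1000 and the 5% prevalence are hypothetical values chosen only for illustration, and the helper name is likewise hypothetical. It shows that roughly 14 of every 100 patients with strabismus would be missed, and that where strabismus is uncommon, false positives can far outnumber missed cases.

```python
def expected_errors(sensitivity: float, specificity: float,
                    cohort_size: int, prevalence: float) -> tuple[float, float]:
    """Expected false negatives and false positives in a screened cohort."""
    n_with = cohort_size * prevalence
    n_without = cohort_size - n_with
    false_negatives = n_with * (1.0 - sensitivity)
    false_positives = n_without * (1.0 - specificity)
    return false_negatives, false_positives


# Pooled estimates from this meta-analysis; cohort size and prevalence are hypothetical.
fn, fp = expected_errors(0.856, 0.900, cohort_size=1000, prevalence=0.05)
print(round(fn, 1), round(fp, 1))  # 7.2 95.0
```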
LIMITATIONS OF INCLUDED EVIDENCE
Study designs and settings were heterogeneous, with wide variation in the outcome measures used. Participant numbers in many studies were low, reducing statistical power. 30 Confirmation bias was introduced where blinding was absent. 5,8 Selection methods were highly biased through the arbitrary exclusion of patients on the grounds of young age and disability. 3,7,8 A subset of studies also excluded problematic diagnoses from the outset, artificially inflating efficacy. 11,16 Systems using population-average Hirschberg ratios were not controlled for ethnicity or sex, disproportionately affecting reliability for some demographic groups. 25
LIMITATIONS OF THE REVIEW
The validity of the meta-analysis in this review is limited by high heterogeneity and by the omission of several included studies whose outcome measures did not include accuracy, sensitivity, or specificity. As only one reviewer was available, no consensus process was possible during abstract and full-text review.
Conclusions
The future of strabismus assessment will likely involve integration of advanced, highly accurate systems into convenient platforms such as smartphone apps. Beyond diagnosis, rehabilitation, such as visual exercises for amblyopia, could also be delivered through telemedicine. For clinical validation, studies directly comparing the outcomes of discrete modalities should be designed to guide the direction of future research. This review strengthens the argument that advances in telemedicine technology have significant potential to improve the accuracy and availability of strabismus assessment.
Footnotes
Acknowledgment
Many thanks to University College London for supporting this work.
Authors’ Contributions
Dominic Wong: Conceptualization, Methodology, Software, and Writing—Original draft preparation. Malik Alsaif: Writing—Reviewing and Editing. Lloyd Bender: Supervision
Author Disclosure Statement
No competing financial interests exist.
Funding Information
No funding was received for this article.
