Abstract
Background:
The diagnostic accuracy of cystoscopy varies according to the knowledge and experience of the performing physician. In this study, we evaluated the difference in cystoscopic gaze location patterns between medical students and urologists and assessed the differences in their eye movements when simultaneously observing conventional cystoscopic images and images with lesions detected by artificial intelligence (AI).
Methodology:
Eye-tracking measurements were performed, and observation patterns of participants (24 medical students and 10 urologists) viewing images from routine cystoscopic videos were analyzed. The cystoscopic video was captured preoperatively in a case of initial-onset noninvasive bladder cancer with three low-lying papillary tumors in the posterior, anterior, and neck areas (urothelial carcinoma, high grade, and pTa). The viewpoint coordinates and stop times during observation were obtained using a noncontact type of gaze tracking and gaze measurement system for screen-based gaze tracking. In addition, observation patterns of medical students and urologists during parallel observation of conventional cystoscopic videos and AI-assisted lesion detection videos were compared.
Results:
Compared with medical students, urologists exhibited a significantly higher degree of stationary gaze entropy when viewing cystoscopic images (p < 0.05), suggesting that urologists with expertise in identifying lesions efficiently observed a broader range of bladder mucosal surfaces on the screen, presumably with the conscious intent of identifying pathologic changes. When the participants observed conventional and AI-assisted lesion detection images side by side, contrary to urologists, medical students showed a higher proportion of attention directed toward AI-detected lesion images.
Conclusion:
Eye-tracking measurements during cystoscopic image assessment revealed that experienced specialists efficiently observed a wide range of video screens during cystoscopy. In addition, this study revealed how lesion images detected by AI are viewed. Observation patterns of observers' gaze may have implications for assessing and improving proficiency and serving educational purposes. To the best of our knowledge, this is the first study to utilize eye tracking in cystoscopy.
University of Tsukuba Hospital, clinical research reference number R02-122.
Introduction
Eye tracking has gained attention as a means of expediting the training of physicians. Evaluating the differences in gaze patterns between less experienced and seasoned physicians during surgeries and examinations has the potential to assess a physician's proficiency and identify challenges in skill improvement. In addition to studies on eye tracking in gastrointestinal surgeries, 1–5 eye tracking has been reported in training with robot surgery simulators. 6 In urology, reports of eye tracking include its use in training with a ureteroscopy simulator 7 and the use of an interface with eye tracking during surgery to zoom the monitor screen. 8 However, few studies have investigated eye tracking in the context of cystoscopy and transurethral bladder tumor resection.
Cystoscopy is essential in urologic practice, particularly for nonmuscle-invasive bladder cancer, which has a high rate of intravesical recurrence requiring thorough preoperative evaluation, precise observation, and tumor resection during surgery, followed by vigilant postoperative monitoring. Reports indicate that up to 30% of small bladder tumors or flat lesions may go unnoticed, 9,10 suggesting variability in the diagnostic accuracy because of differences in physicians' knowledge and experience.
We have previously focused on reducing the oversight of lesions during cystoscopy and are conducting research to develop an artificial intelligence (AI)-assisted cystoscopy support system. 11,12 However, the specific gaze patterns of urologists during cystoscopy and how they perceive AI-detected lesions displayed on the monitor during the examination remain unknown. Herein, we evaluated the differences in the observation patterns of medical students and experienced physicians by conducting eye-tracking measurements during the viewing of cystoscopic images of the same cases. In addition, we performed eye-tracking measurements when the participants simultaneously observed conventional cystoscopic images and images with AI-detected lesions.
Materials and Methods
In this study, we used a cystoscopic video (3 minutes 40 seconds) obtained from a case of primary nonmuscle-invasive bladder cancer that was captured at the Yamaguchi University School of Medicine Affiliated Hospital. The case featured low-height papillary tumors (urothelial carcinoma, high grade, and pTa) in the bladder neck, posterior wall, and anterior wall, each of which occurred at three different locations. The participants consisted of 10 urologists (median experience, 10 [range, 4–33] years) and 24 medical students from the department of urology at University of Tsukuba Hospital who provided written informed consent.
A noncontact eye-tracking and gaze measurement system (Tobii Pro Nano/60 Hz, Tobii, Stockholm) was employed, utilizing a 16-inch (40.64 cm) laptop as the display for the monitoring screen (Fig. 1). The eye-tracking system recorded gaze coordinates and dwell time during the observation. Tobii Pro Lab was used for the data analysis.

The experimental setup with the eye-tracking and measurement systems for noncontact monitoring (left: conventional cystoscopic video, right: artificial intelligence-detected lesion video).
The study was divided into two sections. Section 1, the eye-tracking validation for cystoscopic gaze location, presented only regular examination footage. Section 2, the eye-tracking validation with an AI detection map, presented regular footage alongside images indicating AI-detected lesions (Supplementary Video S1). Section 1 was followed by Section 2. In Section 1, the analysis focused on stationary gaze entropy (SGE) 13 as an indicator of gaze dispersion and compared the values of medical students and physicians. The display area was divided into 16:9 segments, and the total dwell time for each segment was used to calculate the SGE (Fig. 2).

Example heat maps of the gazes according to each segment of the presented video divided into 16:9 cells.
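The SGE calculation described above can be sketched as follows. This is a minimal illustration rather than the analysis code used in the study, and it rests on several assumptions: "16:9" is read as a grid of 16 columns by 9 rows, SGE is taken as the Shannon entropy (in bits) of the share of dwell time per cell, and each valid 60 Hz gaze sample is counted as 1/60 second of dwell time. The function names and the screen size in pixels are hypothetical.

```python
import math


def bin_dwell_times(samples, width, height, cols=16, rows=9, sample_s=1 / 60):
    """Accumulate dwell time (seconds) per grid cell from (x, y) gaze samples in pixels.

    Assumption: every on-screen sample contributes one sampling period.
    """
    dwell = [0.0] * (cols * rows)
    for x, y in samples:
        if 0 <= x < width and 0 <= y < height:  # ignore off-screen samples
            col = int(x * cols / width)
            row = int(y * rows / height)
            dwell[row * cols + col] += sample_s
    return dwell


def stationary_gaze_entropy(dwell_times):
    """Shannon entropy H = -sum(p_i * log2(p_i)) of the dwell-time distribution."""
    total = sum(dwell_times)
    if total <= 0:
        raise ValueError("total dwell time must be positive")
    entropy = 0.0
    for t in dwell_times:
        if t > 0:  # empty cells contribute nothing to the sum
            p = t / total
            entropy -= p * math.log2(p)
    return entropy
```

Under these assumptions, a gaze fixed on a single cell gives an SGE of 0, and a gaze spread evenly over all 144 cells gives the maximum of log2(144), about 7.17 bits. A higher value therefore corresponds to a more dispersed viewing pattern.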
In addition, participants' gaze behaviors were divided into three intervals based on the video duration, and SGEs were measured for each interval. Section 2 included 5 urologists and 12 medical students as participants. In Section 2, the participants were asked to compare conventional endoscopy images with AI-detected lesion images. The display area was divided as described in Section 1, and clustering was performed using K-means clustering with k = 3, enabling pattern analysis.
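As an illustration of the clustering step, the sketch below runs a plain K-means (Lloyd's algorithm) with k = 3 on one feature vector per participant. It is not the study's implementation. The choice of features (share of dwell time on the conventional panel and on the AI panel), the farthest-point initialization, and all function names are assumptions made for the example, and the input values are hypothetical rather than study data.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k=3, max_iter=100):
    """Lloyd's algorithm with a deterministic farthest-point initialization."""
    # Start from the first point, then repeatedly add the point that is
    # farthest from the centroids chosen so far.
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(farthest))

    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:  # converged: assignments no longer change
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical (conventional share, AI share) values for seven viewers.
example = [
    (0.90, 0.10), (0.85, 0.15),                # mostly conventional footage
    (0.50, 0.50), (0.55, 0.45),                # both panels evenly
    (0.10, 0.90), (0.15, 0.85), (0.05, 0.95),  # mostly AI images
]
labels, centroids = kmeans(example, k=3)
```

The cluster indices returned by K-means are arbitrary, so each cluster would still need to be interpreted by inspecting its centroid before being named as a viewing pattern.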
Statistical analyses
The normality of SGE distributions was assessed using the Shapiro–Wilk test, with p ≥ 0.05 taken as consistent with a normal distribution. SGEs of students and physicians were compared using a t-test. When the entire video was divided into three intervals, SGEs were further compared using analysis of variance, with pairwise comparisons between intervals by Tukey's test. Statistical analyses were performed using Python version 3.10.12.
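For readers who want to see what the group comparisons compute, the sketch below derives the two test statistics using only the Python standard library. It is a hedged illustration, not the study's analysis script. It assumes a Student's t-test with pooled variance (the text does not say which variant was used) and a one-way analysis of variance that treats the three intervals as independent groups. The function names are hypothetical, and p-values, the Shapiro–Wilk test, and Tukey's test are deliberately left to a statistics package.

```python
from statistics import mean, variance


def pooled_t_statistic(a, b):
    """Student's two-sample t statistic (equal variances assumed) and its df."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    # Pooled variance weights each sample variance by its degrees of freedom.
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    se = (pooled_var * (1 / na + 1 / nb)) ** 0.5
    return (mean(a) - mean(b)) / se, df


def one_way_anova_f(*groups):
    """F statistic of a one-way ANOVA and its (between, within) df."""
    values = [v for g in groups for v in g]
    grand = mean(values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, (df_between, df_within)
```

With exactly two groups, the F statistic equals the square of the pooled t statistic, which gives a quick consistency check between the two functions.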
Ethics
This study was approved by the ethics committee of the University of Tsukuba Hospital (No. R02-122), and written informed consent was obtained from all the participants. The cystoscopic video used in this study was captured after obtaining written informed consent.
Results
Section 1: Eye tracking validation for cystoscopic gaze location
Figure 3 shows SGE trends for the medical students and physicians. The Shapiro–Wilk test gave p ≥ 0.05 for the SGE of each group, so normality was not rejected. Physicians had a significantly higher SGE than students (p < 0.001; Fig. 4), indicating that the more experienced urologists viewed the cystoscopic image more extensively than medical students. When the video was divided into three intervals, students had a significantly wider range of SGEs at 3/3 (the end of the video) than at 1/3 and 2/3 (the beginning and middle of the video, respectively; p < 0.05). For physicians, there were no significant differences in SGEs between intervals.

Comparison of stationary gaze entropy of students and physicians.

Comparison of each stationary gaze entropy when the video is divided into three intervals
Section 2: Eye tracking validation with an AI detection map
The results of the pattern classification using K-means clustering are illustrated in Figure 5. Three primary gaze patterns were identified: primarily referencing the conventional footage with occasional glances at AI-detected lesion images (Pattern A: Fig. 5A), evenly referencing both types of images (Pattern B: Fig. 5B), and predominantly referencing AI images without focusing on the conventional footage (Pattern C: Fig. 5C). Among the medical students, the distribution was as follows: 25% (three individuals) for Pattern A, 17% (two individuals) for Pattern B, and 58% (seven individuals) for Pattern C.

Differences in observation patterns when observing conventional cystoscopy and artificial intelligence-detected lesion videos simultaneously
Among physicians, the distribution was as follows: 40% (two individuals) for Pattern A, 40% (two individuals) for Pattern B, and 20% (one individual) for Pattern C. The observation patterns differed significantly between medical students and physicians, with a higher proportion of medical students prioritizing AI-detected lesion images as observed per their gaze patterns.
Discussion
The study revealed that urologists had greater SGE and observed a wider range of images when assessing cystoscopic images compared with medical students. This is presumably because urologists are empirically aware of the need to reliably identify lesions and efficiently observe the bladder mucosal surface on the screen. In addition, there were differences in the observation patterns when normal images and images with AI-detected lesions were simultaneously observed.
Currently, the research and development of a cystoscopy support system using AI is underway to reduce tumor oversight in cystoscopy. 14,15 We compared the tumor recognition rate of a cystoscopy system using static endoscopic images with that of physicians, and the AI and urologists reached almost identical diagnoses with comparable diagnostic accuracy. 12 In particular, expert urologists showed high specificity. Physicians focus on identifying lesions and normal tissue accurately, suggesting a particular focus on locations that are difficult to identify.
In addition, research is being conducted on where AI looks to make a diagnosis, with the possibility of displaying a probability map of potential tumor lesion sites in real time. 16 The conventional cystoscopy and AI-assisted lesion detection methods used in Section 2 were designed based on the results of this research. AI-assisted diagnosis, in which a medical doctor makes the final diagnosis, is termed computer-aided detection/diagnosis; current AI-assisted medical devices follow this principle. 17 However, discussions regarding how to effectively present the AI-generated diagnostic results to physicians, increase the diagnostic efficiency of inexperienced physicians, and improve the overall accuracy are lacking.
It is important that physicians do not become dependent on the AI's judgment and inhibit their spontaneous growth. In the case of cystoscopy, real-time diagnostic assistance may be effective in preventing missed lesion detection. However, excessive reliance on AI increases the risk of overmedication. In this study, Section 2 was conducted after Section 1. This implies that Section 2 was performed with the knowledge acquired from Section 1, and if we apply this to the actual situation of cystoscopy, it would be analogous to utilizing AI information during the second double-check observation after conducting the examination.
In other words, during the second double check, performed while reviewing the actual images, experienced physicians, who can capture AI results in their peripheral vision, draw upon their memory of the real images. In contrast, less-experienced students, who focus on AI results during the second double check, may lack the experience of simultaneously observing actual images. Physicians primarily examine the actual images and make their diagnosis, utilizing AI as a supplementary aid for observation.
In contrast, inexperienced students may be prone to neglecting the observation of actual images as they concentrate on the new information. To enhance diagnostic accuracy, it is valuable to apply insights from gaze measurement to the presentation of AI information, user interface design, and similar aspects.
The limitation of this study is its desk-based design, which may differ from actual cystoscopies performed in a clinical setting. In practice, many factors affect the accuracy of the examination because physicians must effectively observe the bladder while manipulating the cystoscope, diagnose the presence of lesions, and record results. Nevertheless, this study showed that eye tracking could be used to reveal areas of interest in cystoscopic images and develop criteria for objectively assessing the skill levels of experienced physicians.
For example, quantifying the direction, path, and frequency of gaze toward a region of interest during tumor examination and resection may reveal patterns of physicians' gaze movements. Comparing these patterns with eye-tracking data obtained from medical students and inexperienced physicians could be a useful tool for educational and clinical purposes. Furthermore, incorporating these patterns into AI-based cystoscopy support systems may enable the development of next-generation AI that reflects physician behavior.
Conclusions
Gaze measurements of cystoscopic videos revealed that experienced specialists effectively observed large areas of video images during cystoscopy. It also provided suggestions regarding how to perform lesion detection using AI. Observation patterns of an observer's gaze can be used for educational purposes to assess and improve proficiency. To the best of our knowledge, this study is the first to report eye-tracking measurements and evaluations in the context of cystoscopy.
Footnotes
Authors' Contributions
A.I. had full access to all the study data and was responsible for the integrity and accuracy of the data analysis. K.I. contributed equally to this study. Study concept and design were carried out by K.I., A.I., and H.N. Acquisition of data was by K.I., K.K., S.S., and A.I. Analysis and interpretation of data were by K.I. and H.N. Drafting of the article was done by K.I. and A.I.
Critical revision of the article for important intellectual content was carried out by K.I., A.I., K.K., S.K., and H.N. Statistical analysis was done by K.I. Funding was obtained by A.I. and H.N. Administrative, technical, or material support was provided by K.K., H.N., and M.S. Supervision was carried out by Y.O. and H.N.
Author Disclosure Statement
Atsushi Ikeda owns stock in Vesica Corporation, a bladder endoscopy AI startup. All other authors declare no conflicts of interest.
Funding Information
This study was supported by the Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research (JP21K16751 and JP23K08710). Part of this research was funded by the New Energy and Industrial Technology Development Organization under project JPNP20006.
Supplementary Material
Supplementary Video S1