Virtual Reality vs Dry Laboratory Models: Comparing Automated Performance Metrics and Cognitive Workload During Robotic Simulation Training

Abstract

Background:

This study compares surgical performance during analogous vesico-urethral anastomosis (VUA) tasks in two robotic training environments, virtual reality (VR) and dry laboratory (DL), to investigate transferability of skill assessment across the two platforms. Using computer-generated performance metrics and pupillary data, we evaluated how well each environment distinguishes surgical expertise and whether performance in the VR simulation correlates with performance on the live surgical robot in the DL.

Materials and Methods:

Experts (≥300 cases) and trainees (<300 cases) performed analogous VUAs during VR and DL sessions on a da Vinci robotic console following an Institutional Review Board (IRB) approved protocol (HS-16-00318). Twenty-two metrics were generated in each environment (kinematic metrics, tissue metrics, and biometrics). The DL included 18 previously validated automated performance metrics (APMs) (kinematics and event metrics) captured by an Intuitive system data recorder. In both settings, Tobii Pro Glasses 2 recorded the task-evoked pupillary response (reported as Index of Cognitive Activity [ICA]) to indicate cognitive workload, analyzed by EyeTracking cognitive workload software. Pearson correlation, Mann–Whitney, and independent t-tests were used for the comparative analyses.

Results:

Our study included six experts (median caseload 1300 [interquartile range 400–3000]) and 11 trainees (25 [0–250]). A total of 8/9 metrics directly comparable between VR and DL showed significant positive correlation (r ≥ 0.554, p ≤ 0.032); 5/22 VR metrics distinguished expertise, including task time (p = 0.031), clutch usage (p = 0.040), unnecessary needle piercing (p = 0.026), and suspected injury to the endopelvic fascia (p = 0.040). This contrasts with 14/22 APMs in DL (p ≤ 0.038), including linear velocities of all three instruments (p ≤ 0.038) and dominant-hand instrument wrist articulation (p = 0.013). Trainees experienced higher cognitive workload (ICA) in both environments when compared with experts (p ≤ 0.036).

Conclusions:

Most performance metrics between VR and DL exhibited moderate to strong correlations, showing transferability of skills across the platforms. Comparing training environments, APMs during DL tasks are better able to distinguish expertise than VR-generated metrics.

Introduction

Simulation training is used during the initial phase of the surgical learning curve to improve psychomotor skills without risking patient safety.¹ This is particularly true for robot-assisted surgeries since they require not only complex psychomotor skills but also the ability to fluently operate a robotic surgical system. Several different training modalities have been specifically developed for robotic surgery, including virtual reality (VR), dry laboratory (DL) (synthetic), and wet laboratory (human/animal tissue) models.²

Both VR and DL simulation environments have been shown to be effective for minimally invasive surgery training.³ Due to the complexity of the multijoint angulation of robotic instruments, DL simulation of robot-assisted surgery cannot simply be accomplished using a training kit or a training box. Instead, it requires a setup of the full robotic system in the operating room, making this training modality less accessible to trainees. For this reason, various VR simulators for robot-assisted surgery (Mimic dV-Trainer, da Vinci® SimNow®, and Simbionix RobotiX Mentor) have been developed. VR simulations are no longer limited to basic robotic surgical skills. They offer training modules for specific robotic procedures, both in part and in their entirety. These new simulation modules offer a realistic surgical setting (anatomic, tissue structure, and interaction) and require a combination of robotic skill elements to complete each task. The question remains whether skills gained using virtual simulator training ultimately transfer to the operating room.⁴,⁵

In our previous works, we developed and validated objective measurements of efficiency derived from robot-assisted surgeries to assess surgical performance. These objective, computer-generated motion-tracking measurements, introduced as automated performance metrics (APMs), have been linked to perioperative and long-term patient outcomes.⁶⁻⁸ In particular, APMs during the vesico-urethral anastomosis (VUA), the key reconstructive step of a robot-assisted radical prostatectomy, have been highlighted as important features for prediction of urinary continence recovery.⁸ We have also demonstrated that surgeon cognitive workload while operating, measured by pupillary response, can effectively differentiate surgical expertise.⁹

In this study, we evaluate performance during VUA training tasks using analogous VR and DL training models. We utilize computer-generated performance metrics provided by feedback in the VR environment and APMs generated in the DL. We also track pupillary response to measure the cognitive workload of the participant during the exercises. Using these metrics, we investigate whether a surgeon's performance in the VR setting correlates with their performance on the full da Vinci robot in the DL setting, as well as compare which set of metrics can better distinguish between experts and trainees. Ultimately, we aim to show that surgical skills acquired with the VR simulator transfer to live robotic surgery.

Materials and Methods

Study design

After obtaining institutional review board approval, participants of varying surgical experience at our institution performed a VR VUA task on the da Vinci SimNow (Intuitive Surgical, Sunnyvale, CA, USA) (Fig. 1A) and a DL VUA exercise on the da Vinci Xi System (Intuitive Surgical) (Fig. 1B). Participants were faculty surgeons, urology fellows, and urology residents stratified into two groups based on experience level: trainees (fellows and residents, <300 robotic console cases), and experts (attending surgeons, ≥300 robotic console cases) (cases defined by the number of robotic surgeries where the participant performed a significant portion of the surgery as the primary surgeon). We divided groups by participant training level based on our previous work distinguishing surgeon performance, and for this study, 300 cases provided that cutoff.⁹ Participants were randomized to perform either the VR simulation or DL first.

FIG. 1.

(A) Guided urethrovesical anastomosis by 3D Systems (on SimNow [Intuitive Surgical]). (B) VUA model by 3D Med (on a live da Vinci Xi Surgical System [Intuitive Surgical]). 3D = three-dimensional; VUA = vesico-urethral anastomosis.

VR VUA exercise

A guided VR simulation of VUA (3D Systems, Rock Hill, SC, USA) was performed on the da Vinci SimNow (Fig. 1A). A total of 12 stitches were guided by the VR system to complete the entire VUA. Participants were given standardized directions to ensure a baseline understanding of the use of the da Vinci SimNow. A total of 22 metrics were collected during the VR simulation, including 20 derived by an automated evaluation provided by the simulator. Metrics were categorized into five metric families: kinematic metrics (e.g., distance and path length), event metrics (e.g., console events [clutch usage]), tissue metrics (e.g., tissue handling technique [unnecessary needle piercing and wound separation]), duration (e.g., total task time), and one biometric (cognitive mental workload).

DL VUA exercise

Participants conducted the DL task on a live da Vinci Xi System using a VUA kit produced by 3-DMed® (https://www.3-dmed.com/product/vesico-urethral-anastomosis-kit/) (Fig. 1B). The participants were instructed to use the same anastomosis technique as in the simulation to keep the two tasks comparable, with a double-armed suture and 12 stitches in total. The positions of the 12 stitches were premarked on the models to faithfully mirror the location and order dictated by the VR simulation. During each task, robot system data were collected at a sampling rate of 50 Hz using a custom data recorder provided by Intuitive Surgical (Fig. 2A). In this study, a total of 22 metrics were collected in the DL (including 18 previously validated APMs) and examined for analysis.⁶⁻⁸ Metrics were categorized into four metric families: kinematic metrics (e.g., instrument movement efficiency), event metrics (e.g., clutch usage and unnecessary needle piercing), duration (e.g., total task time and active time), and one biometric (cognitive mental workload).
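To make the kinematic metrics more concrete, the following minimal sketch shows one way a path length and a mean linear velocity could be derived from instrument positions sampled at 50 Hz. It is an illustration only and not the data recorder's or the APM pipeline's actual computation: the function names, the input layout (a list of x, y, z positions in meters), and the sample values are all assumptions.

```python
import math

SAMPLE_RATE_HZ = 50  # sampling rate reported for the DL data recorder


def path_length(positions):
    """Sum of straight-line distances between consecutive 3D samples (meters)."""
    return sum(math.dist(a, b) for a, b in zip(positions, positions[1:]))


def mean_linear_velocity(positions, sample_rate_hz=SAMPLE_RATE_HZ):
    """Path length divided by elapsed recording time (meters per second)."""
    if len(positions) < 2:
        return 0.0
    elapsed_s = (len(positions) - 1) / sample_rate_hz
    return path_length(positions) / elapsed_s


# Hypothetical instrument-tip samples: 1 cm steps along x over 50 intervals (1 second).
samples = [(0.01 * i, 0.0, 0.0) for i in range(51)]
total_m = path_length(samples)
speed_m_per_s = mean_linear_velocity(samples)
```

In this sketch velocity is averaged over the whole recording; a metric restricted to active instrument time would divide by that shorter interval instead.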

FIG. 2.

(A) System event data recorder from Intuitive Surgical, Inc., a device that records synchronized kinematic/system event data and endoscopic video through direct connection to a da Vinci Surgical System. (B) Tobii Pro Glasses 2 from Tobii Technology, Inc., a wearable eye-tracking system.

Cognitive mental workload—biometric

Cognitive mental workload was a biometric assessed in both the DL and VR environments using task-evoked pupillary response (TEPR). Participants wore the Tobii Pro Glasses 2 wearable eye-tracking system (Tobii Technology, Inc.), which recorded TEPR by measuring pupil dilation at a sampling rate of 100 Hz, while performing both VR simulation and DL (Fig. 2B). These eye-tracking recordings were anonymized and sent to EyeTracking, Inc., for data processing through their EyeWorks™ software. The software's algorithms produced the Index of Cognitive Activity (ICA), a scaled metric from 0 to 1, reflective of TEPR and real-time cognitive workload, with greater values indicating higher cognitive workload. Measuring cognitive mental workload is an objective measurement of the surgeon's impression of each task's difficulty level and gives insight into the physiologic state of the surgeon in real time during each task.⁹

Statistical analysis

All data, including VR simulation computer-generated metrics, DL metrics (APMs), and biometrics (TEPR), were compared between experts and trainees using an independent t-test or Mann–Whitney test depending on whether the variable was normally distributed. We selected eight directly comparable metrics from the VR simulation and DL along with ICA and conducted a Pearson correlation analysis. Statistical analysis was done using IBM® SPSS® 24, with p < 0.05 considered statistically significant. The median and range were used to report performance metrics.
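As an illustration of the nonparametric group comparison described above, the sketch below computes a Mann–Whitney U statistic with an exact two-sided p value by enumerating every possible split of the pooled ranks. It is not the SPSS procedure used in this study: the function names and the sample values are hypothetical, and the normality check that selects between the t-test and the Mann–Whitney test is omitted.

```python
from itertools import combinations


def midranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_exact(group_a, group_b):
    """Return (U for group_a, exact two-sided p) by enumerating all group splits."""
    n_a, n_b = len(group_a), len(group_b)
    ranks = midranks(list(group_a) + list(group_b))
    offset = n_a * (n_a + 1) / 2
    u_obs = sum(ranks[:n_a]) - offset
    center = n_a * n_b / 2  # expected U under the null hypothesis
    extreme = total = 0
    for idx in combinations(range(n_a + n_b), n_a):
        u = sum(ranks[i] for i in idx) - offset
        total += 1
        # Count splits at least as far from the center as the observed one.
        if abs(u - center) >= abs(u_obs - center) - 1e-12:
            extreme += 1
    return u_obs, extreme / total


# Hypothetical task times in minutes (not study data).
experts = [10.1, 9.5, 11.0]
trainees = [15.4, 14.2, 12.8, 16.0]
u_stat, p_value = mann_whitney_exact(experts, trainees)
```

Full enumeration is practical at this study's group sizes (6 and 11 participants give 12,376 splits); larger samples would normally use a normal approximation instead.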

Results

Seventeen participants were enrolled in this study: six experts and eleven trainees. The median robotic console case experience was 1300 (interquartile range [IQR] 475–2625) for experts and 25 (IQR 0–113) for trainees. Two trainees and one expert were left-handed; the remaining participants were right-handed.

Translatability of metrics between training environments

We selected nine corresponding metrics from both the VR simulation and the DL session. Eight of the nine metrics generated from VR simulation had statistically significant associations with those captured from the DL session, including all three kinematic metrics, three of the four event metrics, duration, and the biometric (ICA). The total task time, dominant and nondominant instrument traveling distance, camera traveling distance, number of unnecessary needle piercings, number of times the clutch was used, and ICA showed significant correlation between the simulation and DL (Table 1). Cognitive mental workload (ICA) had strong to very strong associations (0.86–0.93, p < 0.001). Total task time and the kinematic metrics (dominant and nondominant instrument traveling distance and camera traveling distance) showed moderate to strong correlations (r = 0.65–0.77, p ≤ 0.008). Clutch use and the tissue handling metric (unnecessary needle piercing) showed moderate associations (r = 0.55, p = 0.032 and r = 0.65, p = 0.004, respectively).

Table 1.

Correlation Between Comparable Virtual Reality Simulation-Generated Metrics and Automated Performance Metrics

Metric type	Metric	DL (n = 17), median (min–max)	SIM (n = 17), median (min–max)	r	p
Duration	Total task time (minutes)	12.2 (7.2–35.0)	19.1 (10.3–32.3)	0.768	<0.001
Biometric	Average ICA	0.42 (0.18–0.67)	0.46 (0.20–0.66)	0.927	<0.001
Kinematic metrics	Path length: DH (m)	10.06 (6.48–21.27)	10.41 (6.37–16.40)	0.677	0.006
	Path length: NDH (m)	9.36 (6.04–17.78)	10.38 (4.83–14.03)	0.653	0.008
	Camera distance (m)	0.65 (0.10–1.54)	1.14 (0.39–2.45)	0.671	0.006
Event metrics	Out of view: DH (no.)	11 (1–62)	15 (2–43)	0.470	0.057
	Out of view: NDH (no.)	15 (0–38)	17 (3–47)	0.483	0.050
	Unnecessary needle piercing (no.)	16 (3–33)	29 (12–77)	0.653	0.004
	Clutch usage (no.)	9 (0–38)	12 (1–34)	0.554	0.032

DH = dominant hand; DL = dry laboratory; ICA = Index of Cognitive Activity; NDH = nondominant hand; SIM = simulation.
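For readers less familiar with the correlation coefficient r reported in Table 1, the sketch below shows how a Pearson r and its associated t statistic (with n − 2 degrees of freedom) can be computed for paired VR and DL measurements. The function names and the paired values are hypothetical and are not study data; the study's own analysis was run in SPSS.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length sequences."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need two sequences of equal length (at least 3 pairs)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    covariance_sum = sum(a * b for a, b in zip(dx, dy))
    return covariance_sum / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


def t_statistic(r, n):
    """t statistic (n - 2 degrees of freedom) for testing whether r differs from 0."""
    return r * math.sqrt((n - 2) / (1 - r * r))


# Hypothetical paired task times in minutes for five participants.
dl_minutes = [8.0, 10.0, 12.0, 15.0, 20.0]
vr_minutes = [12.0, 13.0, 18.0, 19.0, 30.0]
r = pearson_r(dl_minutes, vr_minutes)
t = t_statistic(r, len(dl_minutes))
```

The p value then follows from comparing t against a Student t distribution, which is left to a statistics package here.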

Distinguishability between expert and trainee surgeon performance

Virtual reality

The VR simulation computer-generated metrics could be grossly categorized into five metric families: kinematic metrics, event metrics, tissue metrics, duration, and the biometric ICA. A total of 5/22 metrics collected during the VR VUA simulation could distinguish surgeon expertise, which included none of the four kinematic metrics, one event metric, two tissue metrics, the duration metric, and the biometric (ICA). During VR VUA simulation, experts were more efficient than trainees, showing shorter task completion time (15.8 minutes vs 22.4 minutes, p = 0.031) (Table 2). For console operating skills, trainees used the clutch more than experts (17 vs 5, p = 0.040). For the tissue handling technique, less injury to the endopelvic fascia/urethral sphincter (0.5 vs 2, p = 0.040) and less unnecessary needle piercing (35.5 vs 52, p = 0.026) were reported for experts. In consideration of the cognitive mental workload, experts had a lower ICA, indicating less mental stress (0.29 vs 0.53, p = 0.036).

Table 2.

Comparing Virtual Reality Simulator-Generated Metrics Between Experts and Trainees

Metric type	Metric	Experts; n = 6, median (min–max)	Trainees; n = 11, median (min–max)	p
Duration	Total task time (minutes)	15.9 (10.4–21.7)	22.4 (13.5–46.9)	0.031
Biometric	Average ICA	0.29 (0.20–0.46)	0.53 (0.30–0.66)	0.036
Kinematic metrics	Total path length of instruments traveled out of view (m)	1.58 (9.01–29.48)	1.19 (0.19–36.34)	0.571
	Distance by camera (m)	1.13 (0.61–2.18)	1.12 (0.39–2.45)	0.661
	Path length: DH (m)	7.71 (7.04–11.54)	10.49 (6.37–13.64)	0.280
	Path length: NDH (m)	8.78 (5.87–10.96)	10.64 (4.83–13.66)	0.280
Event metrics	Clutch usage (no.)	5 (1–13)	17 (3–28)	0.040
	No. of movements: DH (no.)	816 (469–1034)	1008 (595–1455)	0.226
	No. of movements: NDH (no.)	789 (465–897)	943 (463–1186)	0.138
	Suture break count (no.)	0 (0–0)	0 (0–0)	1.000
	Instrument collisions (no.)	22 (10–41)	30 (9–49)	0.571
	Instruments out of view (no.)	99 (52–122)	67 (2–177)	0.412
	Total time instruments are out of view (minutes)	2.3 (1.3–3.4)	1.8 (0.1–6.4)	0.753
Tissue metrics	Suspected injury to the endopelvic fascia/urethral sphincter (no.)	1 (0–2)	2 (1–11)	0.040
	Unnecessary needle piercing (no.)	36 (19–42)	52 (29–97)	0.026
	Injury to the urethra (no.)	2 (1–9)	4 (1–14)	0.280
	Wound separation (mm)	0.53 (0–2.09)	1.05 (0–6.28)	0.661
	Suspected injury to the neurovascular bundle (no.)	0 (0–1)	0 (0–11)	1.000
	Improper suturing technique (no.)	0 (0–0)	0 (0–2)	0.226
	Injury to the bladder neck (no.)	0 (0–5)	1 (0–4)	0.412
	Suspected injury to the ureteral orifices (no.)	2 (0–3)	1 (0–3)	0.412
	Percentage of stitches within optimal depth (%)	91.7 (83.3–95.8)	79.2 (52.0–100)	0.138

DH = dominant hand; ICA = Index of Cognitive Activity; NDH = nondominant hand.

Dry laboratory

Similarly, APMs generated during the DL could be grossly categorized into four groups: kinematic metrics, event metrics, duration, and the biometric ICA; tissue handling metrics cannot currently be assessed during the DL. A total of 14/22 metrics collected during the DL VUA task could distinguish surgeon expertise and included ten of the 14 kinematic metrics, none of the four event metrics, all three duration metrics, and the biometric (ICA). Experts consistently demonstrated more efficient movement: shorter task completion time (10 minutes vs 15 minutes, p = 0.005); less distance traveled by the nondominant instrument (7.55 m vs 9.67 m, p = 0.013); greater movement velocity of both instruments (dominant and nondominant); and greater movement velocity of the camera (p ≤ 0.038) (Table 3). Experts also had less EndoWrist® instrument wrist articulation in the dominant instrument while performing the VUA (693 radians vs 863 radians, p = 0.013). Cognitive mental workload measurements again showed that expert participants had lower ICA (0.29 vs 0.43, p = 0.024).

Table 3.

Comparing Dry Laboratory Automated Performance Metrics Between Experts and Trainees

Metric type	Metric	Experts (n = 6), median (min–max)	Trainees (n = 11), median (min–max)	p
Duration	Total task time (minutes)	10.1 (7.2–12.2)	15.4 (9.4–35.0)	0.005
	Active time: DH (minutes)	9.3 (6.7–10.7)	14.8 (9.0–30.5)	0.009
	Active time: NDH (minutes)	8.0 (6.1–8.8)	14.5 (9.0–29.4)	0.005
Biometric	Average ICA	0.29 (0.18–0.41)	0.43 (0.23–0.67)	0.024
Kinematic metrics	Path length: DH (m)	9.53 (6.48–10.91)	10.09 (8.39–21.27)	0.180
	Path length: NDH (m)	7.55 (6.04–9.24)	9.67 (8.07–17.78)	0.013
	Camera distance (m)	1.29 (1.46–1.54)	0.61 (0.10–0.96)	0.441
	Linear velocity: DH (dm/s)	1.59 (1.47–1.72)	1.16 (0.78–1.64)	0.019
	Linear velocity: NDH (dm/s)	1.47 (1.13–1.56)	1.08 (0.86–1.49)	0.038
	Linear velocity: Camera (dm/s)	2.91 (2.21–3.45)	1.86 (1.43–2.32)	0.001
	Wrist articulation: DH (radian)	692.9 (583.7–758.8)	863.2 (671.4–2445.1)	0.013
	Wrist articulation: NDH (radian)	483.6 (376.3–810.2)	740.6 (556.0–1776.3)	0.069
	Pitch: DH (radian)	184.3 (160.0–221.0)	243.5 (190.9–671.2)	0.009
	Pitch: NDH (radian)	134.2 (105.9–226.3)	202.5 (149.4–477.2)	0.038
	Yaw: DH (radian)	203.6 (168.4–233.4)	275.7 (212.4–810.0)	0.005
	Yaw: NDH (radian)	151.3 (121.1–270.1)	237.1 (176.8–605.4)	0.038
	Roll: DH (radian)	305.0 (255.4–317.8)	349.1 (259.3–963.9)	0.027
	Roll: NDH (radian)	198.1 (149.3–313.9)	299.7 (229.8–693.7)	0.090
Event metrics	Clutch usage (no.)	8 (0–16)	9 (2–38)	0.377
	Out of view: DH (no.)	10 (4–27)	13 (1–62)	0.961
	Out of view: NDH (no.)	10.5 (4–38)	19 (0–37)	0.733
	Unnecessary needle piercing (no.)	8 (3–31)	17 (6–33)	0.301

DH = dominant hand; ICA = Index of Cognitive Activity; NDH = nondominant hand.

Discussion

This comparative study sought to evaluate the surgical performance of experts and trainees during analogous VR and DL VUA tasks and primarily investigate whether surgical performance in a simulated training environment correlates with surgical performance on the da Vinci robot. APMs have previously been validated to assess performance on the surgical robot during live surgery but have yet to be used to assess performance in simulated training environments, including VR environments, which provide their own set of metrics. This is the first time that APMs have been recorded in a training environment. We see that not only are APMs highly correlated with metrics from VR, but (in this study) APMs are also shown to be better distinguishers of expertise between analogous exercises in different training environments. Consequently, this highlights the value of APMs in the training environment and lays the foundation for further studies relating DL APMs to live APMs, which again have been predictive of patient outcomes. Proving that surgical skills gained in training environments transfer to live surgical procedures is instrumental to the future of training programs. In this study, we were able to correlate performance across two training environments with different methods of collecting performance metrics. Ideally, we could measure the same APMs in the DL setting and the VR simulation environment. Showing correlation of performance across training environments allows for confirmation that surgical skill improvement in the VR simulation faithfully correlates with improvement in live robotic surgery. The ability to seamlessly track performance and progression of skills during training, regardless of training medium, is vital to measuring progress. Correlating performance on a VR simulation is necessary given its usefulness as a training tool, being less expensive than a full robot and thus more available to training programs.

Our results indicate that some VR-generated metrics could distinguish the expertise of the participating surgeons. A total of 5/22 metrics collected were significantly different between the two groups, including total task completion time, clutch use, injury to the endopelvic fascia/urethral sphincter, and unnecessary needle piercing. When assessing tissue handling metrics, which were unique to the VR platform, we did not see statistically significant differences in tissue injuries (other than the endopelvic fascia/urethral sphincter), tissue approximation, or stitches within optimal depth. The ability to assess instrument interaction with tissue is a potential advantage of VR over DL environments, but the value of these metrics may currently be limited by the technology's ability to truly mimic these interactions.

On the other hand, surgical performance measured by APMs during the DL session showed significant difference between the experts and trainees more often than the metrics on the VR simulator. This suggests that APMs potentially provided more value in terms of assessing instrument movement efficiency. As APMs are time-based metrics generated using data from the robotic instruments and camera, they provide more granular kinematic data and thus more robustly distinguish motion differences between the two groups, especially during complex techniques such as suturing.

While the metrics provided in the DL and VR simulator are extrinsic factors that affect surgeon performance, the cognitive mental workload measures an intrinsic factor showing a real-time measurement of surgical stress levels. TEPR was measured in both training environments to assess and compare participant cognitive mental workloads through ICA values. During both VR and DL exercises, cognitive workload was able to distinguish between experts and trainees. Expert surgeons consistently demonstrated lower ICA values and therefore less mental stress. Between the two environments, all participants exhibited higher cognitive workloads during VR than during DL tasks. Our previous study has shown that under high mental workload conditions, experts and trainees display inverse relationships with kinematic metrics.⁹ In particular, that study showed that experts with high ICA values decrease instrument velocity, while trainees increase it. The perceived difficulty of the VR task, and the possibly related increase in cognitive workload, may have contributed to the inability to distinguish experts and trainees based on kinematic measures alone.

Our study has a few limitations. The sample size was relatively small and drawn from a single institution. Future studies should validate these findings at other centers. The metrics generated by the VR simulation and the APMs captured during the DL session were not completely identical, which prevented a perfect comparison. We utilized only one VR simulation model of the VUA. While there are other models, at present, the authors felt that the 3D Systems model was the most developed VR VUA simulation. We could not assess tissue handling metrics in the DL setting because of current technical limitations. Future studies with a larger sample size, homogeneous participants with comparable robotic surgical experience, the ability to measure identical APMs in the simulated environment and thus directly compare with surgical data from the operating room, and the use of DL models made with material of measurable deformity may improve the ability to correlate VR simulation performance with live robot performance.

Studies to confirm predictive ability (how performance on VR simulation anticipates future performance in DL settings or even in live surgery) in the future would further provide evidence in favor of robotic surgical training in a simulated environment. The current form of this VR simulator for VUA is limited to development of robotic control skills and instrument movement skills. Currently, it is not as well suited for assessing tissue handling. Further development of this VR platform with more realistic tissue deformation may augment its usefulness for advanced training.

Conclusions

Our study indicates a strong correlation of surgeon performance, as measured by computer-generated metrics, during training exercises in VR and DL environments, highlighting the transferability of skills between the two domains. However, when comparing VR-generated metrics and APMs in isolation relative to their domain, DL metrics were more capable of distinguishing expertise. Cognitive workload and surgeon surveys revealed that VR tasks are more difficult and less realistic than DL tasks.

Footnotes

Authors' Contributions

A.C. was involved in acquisition of data and drafting of the manuscript; J.C. was involved in drafting of the manuscript; S.M. was involved in acquisition of data; S.S.R. performed the critical revision of the manuscript; R.M. performed the analysis and interpretation of data and statistical analysis; S.M. performed the analysis and interpretation of data; J.H.N. was involved in conception and design, acquisition of data, and critical revision of the manuscript; and A.J.H. was involved in conception and design, critical revision of the manuscript, supervision, and obtaining funding.

Acknowledgment

The authors would like to acknowledge Anthony Jarc (Intuitive Surgical, Inc., Clinical Research, Norcross, Georgia, USA) for processing of automated performance metrics.

Author Disclosure Statement

A.J.H. has financial disclosures with Quantgene, Inc. (consultant), Mimic Technologies, Inc. (consultant), and Johnson & Johnson (consultant). All other authors have no conflicts to disclose.

Funding Information

This study was supported, in part, by an Intuitive Surgical Clinical Research Grant.

References

1. Issenberg, McGaghie, Hart, et al. Simulation technology for health care professional skills training and assessment. JAMA, 1999; 282:861–866.

2. Lovegrove, Elhage, Khan, et al. Training modalities in robot-assisted urologic surgery: A systematic review. Eur Urol Focus, 2017; 3:102–116.

3. Schreuder, Oei, Maas, Borleffs, Schijven. Implementation of simulation in surgical practice: Minimally invasive surgery has taken the lead: The Dutch experience. Med Teach, 2011; 33:105–115.

4. Abboudi, Khan, Aboumarzouk, et al. Current status of validation for robotic surgery simulators—A systematic review. BJU Int, 2013; 111:194–205.

5. Moglia, Ferrari, Morelli, et al. A systematic review of virtual reality simulators for robot-assisted surgery. Eur Urol, 2016; 69:1065–1080.

6. Hung, Chen, Jarc, et al. Development and validation of objective performance metrics for robot-assisted radical prostatectomy: A pilot study. J Urol, 2018; 199:296–304.

7. Hung, Chen, Che, et al. Utilizing machine learning and automated performance metrics to evaluate robot-assisted radical prostatectomy performance and predict outcomes. J Endourol, 2018; 438–444.

8. Hung, Chen, Ghodoussipour, Oh, Liu, Nguyen, et al. A deep-learning model using automated performance metrics and clinical features to predict urinary continence recovery after robot-assisted radical prostatectomy. BJU Int, 2019; 124:487–495.

9. Nguyen, Chen, Marshall, et al. Using objective robotic automated performance metrics and task-evoked pupillary response to distinguish surgeon expertise. World J Urol, 2020; 38:1599–1605.