Abstract
Introduction:
Robotic surgical performance, in particular suturing, has been linked to postoperative clinical outcomes. Before attempting live surgery, virtual reality (VR) simulators afford opportunities for training surgeons to learn fundamental technical skills. Herein, we evaluate the association of suturing technical skill assessments between VR simulation and live surgery, and functional clinical outcomes.
Materials and Methods:
Twenty surgeons completed a VR suturing exercise on the Mimic™ Flex VR simulator and the anterior vesicourethral anastomosis during robot-assisted radical prostatectomy (RARP). Three independent and blinded graders provided technical skill scores using a validated assessment tool. Correlations between VR and live scores were assessed by Spearman's correlation coefficients (ρ). In addition, 117 historic RARP cases from participating surgeons were extracted, and the association between VR technical skill scores and urinary continence recovery was assessed by a multilevel mixed-effects model.
Results:
A total of 20 (6 training and 14 expert) surgeons participated. Statistically significant correlations for scores provided between VR simulation and live surgery were found for overall and needle driving scores (ρ = 0.555, p = 0.011; ρ = 0.570, p = 0.009, respectively). A subanalysis performed on training surgeons found significant correlations for overall scores between VR simulation and live surgery (ρ = 0.828, p = 0.042). Expert cases with high VR needle driving scores had significantly greater continence recovery rates at 24 months after RARP (98.5% vs 84.9%, p = 0.028).
Conclusions:
Our study found significant correlations in technical scores between VR and live surgery, especially among training surgeons. In addition, we found that VR needle driving scores were associated with continence recovery after RARP. Our data support the association of skill assessments between VR simulation and live surgery and potential implications for clinical outcomes.
Introduction
Surgeon technical skill has been linked to postoperative clinical outcomes. 1 In particular, suturing performance during robot-assisted radical prostatectomy (RARP) has been linked to urinary continence recovery after surgery. 2–4 Before attempting live surgery, virtual reality (VR) simulators afford opportunities for training surgeons to practice with surgical instruments and learn fundamental technical skills.
Simulators have been developed for use in many different surgical fields, including general surgery, 5 spine surgery, 6 neurosurgery, 7 colorectal surgery, 8 and urology. 9 Procedure-specific simulators have also been created. For example, simulated exercises exist for laparoscopic cholecystectomy, 10 appendectomy, 5 antromastoidectomy, 11 carotid artery stenting, 12 and transurethral resection of bladder tumors, 13 as well as those specific to steps of surgical procedures such as the vesicourethral anastomosis (VUA) during RARP. 14–16
VR simulators have been found to be more cost-effective and efficient compared to live animal training models. 17,18 Validation assessments show that surgical residents feel they would benefit from practicing on simulators and they would increase their comfort in the operating room. 19 In addition to skill practice, VR simulators afford opportunities for training surgeons to receive formative feedback on their technical skills. 20 Furthermore, adjunctive training utilizing a VR simulator has been shown to be effective in reducing surgical operating times. 16
Our previous work has assessed associations of automated performance metrics (APMs) between two surgical settings. That study found many computer-generated measures of surgeon performance could be translated between a VR simulator and laboratory-based surgical robot. 21 These computer-generated metrics provide valuable insight into hand movements during suturing (measure of efficiency) but are not readily translated into direct actionable feedback.
Given the practical utility and proven benefit of VR simulators, we were interested in studying how suturing technical skills assessed in the VR simulation environment and live surgery correlated with each other. Herein, we evaluate the ability to transfer suturing technical skill assessments between VR simulation and live surgery. We further assess the ability of VR performance to anticipate a functional clinical outcome after RARP.
Materials and Methods
Training and expert urologic surgeons at our institution completed a VR suturing exercise on the Mimic™ Flex VR simulator. A priori, training surgeons were defined as those having a robotic surgery caseload ≤100 (e.g., junior urologic residents), and expert surgeons were defined as those having a robotic surgery caseload >100 (e.g., senior urologic residents, urologic robotic surgery fellows, urology faculty). The console caseload cutoff of 100 was based on a learning curve meta-analysis and a previous study from our laboratory. 22,23
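As an illustration only, the a priori grouping rule can be written as a short Python sketch. The cutoff of 100 comes from the text; the Surgeon record, its field names, and the sample caseloads are hypothetical and are not study data.

```python
from dataclasses import dataclass

# Cutoff from the text: caseload <= 100 is "training", > 100 is "expert".
CASELOAD_CUTOFF = 100


@dataclass(frozen=True)
class Surgeon:
    surgeon_id: str        # hypothetical de-identified label
    robotic_caseload: int  # number of robotic console cases


def experience_group(surgeon: Surgeon) -> str:
    """Return 'training' or 'expert' using the a priori caseload cutoff."""
    if surgeon.robotic_caseload < 0:
        raise ValueError("caseload cannot be negative")
    if surgeon.robotic_caseload <= CASELOAD_CUTOFF:
        return "training"
    return "expert"


# Invented example values, not study data.
cohort = [Surgeon("S01", 18), Surgeon("S02", 100), Surgeon("S03", 400)]
groups = {s.surgeon_id: experience_group(s) for s in cohort}
print(groups)
```

Note that a caseload of exactly 100 falls in the training group under this rule, because the cutoff is inclusive on the training side.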
The simulator exercise, “Tubes,” consisted of 2 opposed open-ended tubular segments with 16 alternating targets, 8 on each segment end (Fig. 1). A surgeon progressed through the exercise by throwing stitches at illuminated targets, which could appear from within the lumen of the tube segment or from the exterior. Surgeons completed one trial attempt followed by the graded study attempt. Participants had no prior experience on this VR simulator.

Fig. 1. A representative image of the simulator exercise. Color images are available online.
Historical videos of the same surgeons performing the VUA during RARP were also obtained. To ensure consistency between participants, the last eight stitches (four urethral and four bladder) of VUA were included for assessment. The most contemporary video sample with associated clinical data from a surgeon was utilized. For training surgeons, where clinical data were not available, we collected the videos within 2 months before completing the VR exercise.
After collection, live surgery and VR simulation video samples were de-identified and time annotated for each substep of a suturing motion (needle positioning, needle entry angle, needle driving, and needle withdrawal) as previously described. 24,25 Three independent and blinded graders received standardized training and provided technical skill scores guided by the validated assessment tool Robotic Anastomosis Competency Evaluation (RACE). 26 Scores were assigned for each substep of the suturing process. Each substep score was then summated to provide an overall score. Discrepant scores were discussed until group consensus was reached. Each skill domain score ranged from 0 to a maximum of 5; three domains were assessed allowing for a maximum overall score of 15. A high RACE skill score was defined as a 4 or 5, whereas a low score was a 3 or below.
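The scoring arithmetic described above (domain scores from 0 to 5, summed to an overall score, with 4 or 5 counted as high) can be sketched in Python as follows. This is a minimal illustration, not the RACE tool itself: the function names are ours, the three domain labels in the example are assumed placeholders, and the scores are invented.

```python
from typing import Mapping

MIN_DOMAIN_SCORE = 0
MAX_DOMAIN_SCORE = 5
HIGH_SCORE_THRESHOLD = 4  # 4 or 5 is "high"; 3 or below is "low"


def overall_score(domain_scores: Mapping[str, int]) -> int:
    """Sum the per-domain scores after checking each lies in 0-5."""
    for name, score in domain_scores.items():
        if not MIN_DOMAIN_SCORE <= score <= MAX_DOMAIN_SCORE:
            raise ValueError(f"{name} score {score} is outside 0-5")
    return sum(domain_scores.values())


def skill_level(domain_score: int) -> str:
    """Dichotomize a single domain score into 'high' or 'low'."""
    if not MIN_DOMAIN_SCORE <= domain_score <= MAX_DOMAIN_SCORE:
        raise ValueError(f"score {domain_score} is outside 0-5")
    return "high" if domain_score >= HIGH_SCORE_THRESHOLD else "low"


# Invented scores for one video sample; domain labels are assumptions.
sample = {"needle_positioning": 5, "needle_entry": 4, "needle_driving": 3}
print(overall_score(sample), skill_level(sample["needle_driving"]))
```

With three domains the overall score is bounded by 15, which matches the maximum stated in the text.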
RARP cases from 2016 to 2019, which were performed by participating surgeons in this study, were extracted from a prospectively collected database. Follow-up data at 3, 6, 12, and 24 months were obtained by chart review or telephone by an independent research coordinator utilizing patient-reported outcomes, including urinary continence recovery status. Continence was defined as zero pads or one safety pad per day. 27
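For illustration, the continence definition can be expressed as a small Python check, together with a simple recovery-rate calculation. The argument names (pads_per_day, safety_pad_only) are hypothetical and do not correspond to fields in the study database, and the sample reports are invented.

```python
from typing import Iterable, Tuple


def is_continent(pads_per_day: int, safety_pad_only: bool = False) -> bool:
    """Continent means zero pads, or a single pad worn only for safety."""
    if pads_per_day < 0:
        raise ValueError("pads_per_day cannot be negative")
    if pads_per_day == 0:
        return True
    return pads_per_day == 1 and safety_pad_only


def recovery_rate(reports: Iterable[Tuple[int, bool]]) -> float:
    """Fraction of patient reports that meet the continence definition."""
    reports = list(reports)
    if not reports:
        raise ValueError("no reports to summarize")
    continent = sum(is_continent(pads, safety) for pads, safety in reports)
    return continent / len(reports)


# Invented follow-up reports: (pads per day, pad worn only for safety?)
example_reports = [(0, False), (1, True), (1, False), (3, False)]
print(f"{recovery_rate(example_reports):.0%} continent")
```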
Suturing technical skill scores between training and expert surgeons were compared by the Wilcoxon rank sum test. Correlations between VR and live scores were assessed by Spearman's correlation coefficients (ρ). A multilevel mixed-effects model was used to test the association between technical skill and urinary continence recovery while adjusting for the clustering of surgeon data.
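The two rank-based statistics named above can be sketched in plain Python to show what they compute. This is a minimal illustration under stated assumptions, not the analysis code used in the study: it returns only the Spearman coefficient and the rank sum (Mann-Whitney U) statistic, it does not compute p-values, and it does not cover the mixed-effects model. The function names and the sample scores are invented for the example.

```python
from math import sqrt
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sd_x * sd_y)


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the tie-averaged ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


def rank_sum_u(group_a: Sequence[float], group_b: Sequence[float]) -> float:
    """Mann-Whitney U for group_a, derived from its rank sum."""
    ranks = average_ranks(list(group_a) + list(group_b))
    n_a = len(group_a)
    rank_sum_a = sum(ranks[:n_a])
    return rank_sum_a - n_a * (n_a + 1) / 2


# Invented overall scores for five surgeons, not study data.
vr_scores = [10, 12, 9, 14, 11]
live_scores = [12, 11, 10, 15, 13]
print(round(spearman_rho(vr_scores, live_scores), 3))
print(rank_sum_u([14, 15, 13], [10, 12, 14]))
```

In practice an established statistics package would be used to obtain p-values and to fit the mixed-effects model; the sketch only makes the rank arithmetic explicit.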
Our study complied with protocols approved by the University of Southern California's Institutional Review Board. All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki Declaration and its later amendments or comparable ethical standards. This article does not contain any study with animals performed by any of the authors. Informed consent was obtained from all individuals included in the study.
Results
A total of 6 training (median [IQR] caseload 18 [2–45]) and 14 expert surgeons (400 [150–725]) completed a VR simulation exercise and a live VUA.
In live surgery, expert surgeons averaged greater skill scores in all assessed domains, individually and overall. Statistically significant differences were seen for overall RACE score (median [IQR] 14 [14–15] vs 14 [10–14], p = 0.045) as well as the needle driving domain (5 [4–5] vs 4 [3–4], p = 0.009) (Fig. 2A). In VR simulation, experts again averaged greater individual and overall scores. We found that needle driving scores were significantly higher for experts (4 [4–5] vs 3 [3–4], p = 0.045) (Fig. 2B).

Across all experience groups (training and experts), statistically significant correlations between VR simulation and live surgery were found for overall and needle driving scores (ρ = 0.555, p = 0.011; ρ = 0.570, p = 0.009, respectively) (Fig. 3A). A subanalysis performed using only our training surgeon cohort found significant correlations for overall scores between VR and live surgery (ρ = 0.828, p = 0.042) (Fig. 3B). Other RACE skill domains did not reach significance, although needle driving trended toward significance (ρ = 0.783, p = 0.066) (Fig. 3B). When studied alone, the association between VR and live surgery in the expert surgeon cohort did not reach significance (ρ = 0.367, p = 0.178).

One hundred seventeen RARP cases with 2-year patient continence recovery data from 10 expert surgeons were available for analysis of continence recovery. Cases were not available from the other four expert surgeons as they were urologic fellows with a limited caseload at our institution. The average caseload per surgeon available for analysis was 12.2 (IQR 3.5–16). Median follow-up time was 517 days (IQR 335–784). At every interval after RARP, cases with high VR needle driving scores had greater continence recovery rates than cases with low scores, with a significant difference observed at 24 months (98.5% vs 84.9%, p = 0.028) (Table 1). For this cohort of surgeons, other skill domains in VR and live surgery did not reveal a similar relationship with continence recovery.
Table 1. Continence Recovery at 3, 6, 12, and 24 Months After Robot-Assisted Radical Prostatectomy
A high skill score was defined as a 4 or 5, whereas a low score was a 3 or below.
Discussion
VR simulators are an increasingly utilized surgical training tool. They provide opportunities for training surgeons to practice with surgical instruments and learn fundamental technical skills before attempting live surgery. With studies showing that VR simulators are more cost-effective and efficient compared to live animal models, along with surgical residents supporting their implementation for surgical training, VR simulation is a technology that will undoubtedly continue to expand in its use. 16–20 This foundational study further demonstrates the potential benefit VR may offer, showing that the same skill assessments made in VR simulation correlate with those made in live surgery. We also demonstrate a correlation between VR performance and a functional clinical outcome.
Not surprisingly, our study found that expert surgeons outperformed training surgeons in both assessed surgical settings. In live surgery, we saw significant differences for overall scores as well as in the needle driving domain. Other domains had a greater average score for expert surgeons, but the differences were trending or did not reach significance. In the VR setting, we again demonstrated superior performance in our expert surgeon cohort with differences reaching significance in the needle driving domain. Our findings align with previously published results demonstrating experts outperforming training surgeons on VR simulators. 24
Recently, greater emphasis has been placed on investigating suturing at the more granular substep level (i.e., needle positioning, needle entry, needle driving). 28 This increased level of detail allows for more precision when providing feedback to training surgeons as well as better accuracy when anticipating urinary continence recovery after RARP. 3,24 Our finding that expert surgeons outperformed training surgeons in specifically the needle driving domain aligns with previous studies' findings demonstrating greater expert performance overall, but more specifically, in the needle driving skill. 24
In addition, our study found that technical scores provided in VR and live surgery were significantly correlated. We saw moderately strong correlations for overall scores as well as in the needle driving domain. We were particularly interested in assessing correlations in our training surgeon cohort as we envision training surgeons utilizing VR simulators during their surgical education while preparing for live surgery. We demonstrated that training surgeons had a particularly strong association in overall scores provided between the two surgical settings. When assessed alone, expert surgeons' scores did not significantly correlate between surgical settings. Expert surgeons, with greater live surgical experience, may have a cognitive dissonance between the simulated (less real) and live environments, detracting from their performance in VR. This may explain the lack of correlation between surgical settings for our expert cohort, whereas our training cohort with limited experience in both environments showed a strong overall correlation.
To our knowledge, this is the first article to study the transfer of suturing technical skill assessments between VR simulation and live surgery. We saw significant correlations in the needle driving domain, which has been demonstrated as one of the robust factors that can be used in the prediction of continence recovery after RARP. 3 A previous article from our group assessed the transfer of APMs (measures of efficiency) between two surgical settings. That study found many computer-generated measures of surgeon performance could be translated between a VR simulator and laboratory-based surgical robot. 21 This study compares two similar platforms, but a key difference is our use of manually derived skill assessments.
In comparison to APMs, manually derived assessments can be more informative and a source of directly actionable feedback. Knowing that skill assessments can be transferred between a training and real-life surgical setting can allow for development of training curricula that incorporate VR simulation training and workshops alongside live surgical training in the operating room.
This study also investigated how technical skill assessments made in live surgery and VR correlate with functional clinical outcomes. Prior work has found that suturing performance during RARP, and specifically the VUA, is linked to urinary continence recovery after surgery. 2,3 Furthermore, suturing studied at the substep level was found to be more predictive of continence recovery. 3 Using urinary continence recovery data from our expert surgeon cohort, we found that VR needle driving scores correlated with continence recovery after RARP. As previously discussed, our findings suggest that this substep (needle driving) of suturing is particularly relevant to assess and to train for. VR simulation exercises can focus on specific aspects of the suturing process (e.g., wrist angulation in needle driving) to help surgeons improve their skill. This initial correlative finding in our foundational study warrants further evaluation in a larger multi-institutional cohort.
It should be noted that there are two distinct, but equally important, ways of interpreting our data. The first is that one common assessment tool demonstrated construct validity (i.e., detecting differences in expert and training surgeon performance) in two different surgical settings for assessment of technical skill. We found that expert surgeons outperformed training surgeons in both live surgery and VR simulation as graded using our clinically validated assessment tool. The second is that a surgeon demonstrated similar skills in both settings. Our data show that technical skill for participants, particularly training surgeons, strongly correlates between VR simulation and live surgery. These data support the ability to implement a common assessment tool to provide skill assessments to surgeons, while also supporting the use of VR simulation as a potential directly translatable modality for early surgery training.
Our foundational study is not without limitations. Our analyses of clinical outcomes were limited by a relatively small training surgeon cohort from which we had no clinical data. The training surgeons included in the study performed live surgical cases within the last year (2021), preventing us from studying long-term continence recovery at this time. In addition, to ensure standardization, our study only assessed part of a VUA, which may limit predictions of urinary continence recovery attributable to a single surgeon. Further research can expand our training and expert surgeon cohorts, include a full VUA with multiple examples from each surgeon, and allow greater time to collect data on clinical outcomes. Our study is also limited by not finding a correlation between live surgery RACE scores and urinary continence recovery.
A potential explanation is that we did not have enough variation when using RACE scores from only our expert surgeons (average was 14/15 [IQR 14–15]). With similar scores, we were not able to effectively correlate RACE scores with continence recovery. We predict that if we had greater variance in RACE scoring (i.e., training surgeons included), we could have achieved better accuracy in predicting continence recovery. Finally, it should be noted that our results demonstrate associations and not causative effects. We do not necessarily believe that suturing skills improve urinary continence outcomes. More likely, suturing skills are a good global assessment of surgeon performance, and urinary continence recovery is, in part, determined by surgeon performance. Our findings in this introductory study warrant a large-scale, multi-institutional study to further evaluate the associations demonstrated in this article.
Although these limitations exist, to our knowledge, this study is the first to correlate manually derived ratings of technical skill between VR simulation and live surgery. Future work can expand this study to further evaluate whether using VR can actually improve surgeon performance in real surgery and ultimately improve patient outcomes.
Conclusions
Our data support the transferability of skill assessments between VR simulation and live surgery and potential implications for clinical outcomes. We also show that surgeon skill, particularly among training surgeons, correlates strongly between the two surgical settings. Our work strengthens the value of VR simulation as a tool for preparing training surgeons for live surgery. We support the further development and implementation of training curricula utilizing VR simulators to shorten the learning curve of training surgeons.
Footnotes
Authors' Contributions
D.I.S. was in charge of project development, data collection and preparation, and article writing and editing.
R.M. performed data collection, data analysis, and article editing.
A.G. performed data collection and article editing.
T.F.H. performed data collection and article editing.
J.H.N. did article editing and project management.
A.J.H. was in charge of project development, data management, and article writing and editing.
Acknowledgments
An abstract from this article was presented at the 2021 American Urological Association meeting, which was held virtually. The abstract can be accessed using the following link: doi.org/10.1097/JU.0000000000002092.07.
Author Disclosure Statement
No competing financial interests exist.
Funding Information
Research reported in this publication was supported, in part, by the National Cancer Institute under award number R01CA251579-01A.
