Abstract
Background and Purpose:
Tracking the progression of technical skill acquisition during urology residency training is an essential yet challenging task that has been mostly based on anecdotal and subjective performance assessment. We evaluated five surgical tasks used at our institution to assess skill acquisition among residents over 4 consecutive years in an effort to determine appropriate skill testing for resident proficiency relative to level of training for future performance testing.
Methods:
Urology residents were tested yearly throughout the course of their residency with five surgical tasks in an open, laparoscopic, and robotic format. The five tasks were: (1) rings on a peg, (2) thread the rings, (3) cut the line, (4) hexagonal suturing, and (5) suture and knot tying. Evaluation was performed by a trained instructor to assess quantity and quality of the skill task performance.
Results:
The highest scores were obtained on all open tasks regardless of training level. Residents performed second best on robotic and lowest on the laparoscopic skill tasks. The score difference among surgery platforms was statistically significant (P<0.0005) across all tasks. Only Tasks 2 and 5, however, showed a statistically significant difference in overall quantity×quality score among residents of different postgraduate years (PGY) (P=0.03 and P=0.02, respectively). In addition, the quantity score for Task 5 showed a statistically significant difference among PGY levels (P=0.04). There was no statistically significant difference in time to perform tasks among PGY levels.
Conclusions:
The high-level Tasks 2 and 5 were the most useful in differentiating different levels of skill task competency among urology residents and appear to be most useful in assessing the degree of improvement among residents during training. These tasks have subsequently been worked into our institution's testing curriculum.
Introduction
It has been suggested that these predetermined proficiency levels be expert determined; what universally constitutes an expert, however, remains to be established. 8 It is important that performance scores be established at a level that adequately distinguishes the novice from the experienced resident and can be mastered with sufficient practice of the particular skill. 9 Other researchers have suggested that the ideal predetermined proficiency level should be one in which performance in the OR ceases to increase. 10 Not only must a good surgical curriculum clearly define the predetermined proficiency levels determined by construct and predictive validity testing, but equally important, it should provide the opportunity for practice that is distributed over time rather than a short concentrated session. Others have confirmed this concept of improved competence in skill acquisition on laparoscopic trainers with distributed practice. 11,12
The introduction of work-hour regulations in surgical training has created the potential for gaps in resident education that may be addressed through simulated training, thereby leading to an era in which surgeon level of expertise may increasingly be determined based on skill task performance of specific simulated tasks. 13 This is especially true for certain procedures, such as open and laparoscopic techniques, because exposure to these types of training may not be sufficient to attain proficiency. 14,15
We have instituted regularly scheduled surgical skill practice and testing of our urology residents during their entire training program. Residents participate in four all-day skill training sessions per year in robotic and laparoscopic surgery as well as two visiting professor courses. The resident curriculum also includes five to six robotic and laparoscopic skill training sessions in our surgical skills laboratory per year. We sought to evaluate urologic resident skill task performance for five specific surgical tasks using open, laparoscopic, and robotic surgical approaches over 4 postgraduate years of training. It was our goal to use the results of this study as a formative assessment to determine which tasks were appropriate to use on an ongoing basis for evaluating skill acquisition.
Materials and Methods
The resident technical skills testing protocol was performed under Institutional Review Board approval and after informed consent was obtained from each participant. All 17 urology residents at the University of California, Irvine, were recruited and participated in the study between 2008 and 2011. Data were evaluated for a limited number of participants from the beginning to end of their residency training, but a comparison was performed for all postgraduate year 2 (PGY-2) to PGY-5 level residents, in an attempt to track differences in performance scores and determine whether our testing could effectively and reliably differentiate between levels of training. Of those who participated in the study, two residents were consistently evaluated every year throughout the 4-year course of their training, and 15 were evaluated beginning at different levels throughout their training. Each resident had the opportunity to practice the various skill tasks throughout the year during multiple training sessions as mentioned previously and then participated in formal testing at least once a year with all five tasks during each evaluation session.
The five skill tasks were performed by each resident using open surgical instruments, in a laparoscopic box trainer using actual laparoscopic instruments, and with the Standard da Vinci robotic system. For the robot, residents were not required to set up the equipment in any way as part of the testing. All three components of the testing were completed consecutively during the same day, but in no specific order of testing. Residents were tested on different days during a given period because of scheduling conflicts. The five skill tasks used are shown in Figure 1. Although each skill task had specific maximum time limits, the time to complete was recorded for each and was usually less than the allotted time.

Figure 1. The five surgical skill tasks included: Task 1, the plastic rings must be transferred from the pegs on the left-hand side to the right-hand side and then back again; Task 2, a 2-0 prolene suture is threaded through the small metal rings; Task 3, the paper is cut along the solid dark line without cutting outside the solid area; Task 4, needle and suture are passed between the dots from outside to inside around the hexagonal figure; Task 5, a suture and surgeon's knot tie is performed with three executed knot ties.
Task 1, rings on a peg, involved removing and replacing rings on a peg within a 2-minute time limit. Task 2, thread the rings, involved threading a 12-cm long 2-0 prolene suture through 10 3-mm metal rings. The number of rings threaded during a 2-minute time limit was recorded. Task 3, cut the line, involved grasping a piece of paper, on which a curved image was marked, and cutting along the central line within a 2-minute time limit. Task 4, hexagonal suturing, involved suturing around a hexagon in a 3-minute period. The final task, Task 5, suture and knot tying, required performance of a surgeon's knot followed by two additional knots within a 1-minute time limit. All tasks were scored by trained evaluators overseen by the minimally invasive surgical education fellow, using a validated scoring checklist 16 shown in Appendix 1. Although only the score sheet for laparoscopic surgery is shown in Appendix 1, the same Objective Structured Assessment of Technical Skill scoring system was used for each surgical type.
Each skill task was assessed for quantity (amount of the task accomplished) and quality (how accurately the skill was performed) of performance. Quality values varied depending on the task being performed (see Appendix 1 for detailed account). The quantity was multiplied by the quality score to provide an overall score, a method previously used. 16 Statistical analysis was performed using repeated measures analysis of variance with surgery type as the repeated measure and resident year (PGY) as a grouping factor. The significance of the interaction effect for surgery type by PGY was used to test whether the improvement across PGY differed between surgery types. Pairwise comparisons between surgical techniques were made after adjustment for multiple comparisons using the Bonferroni method. Results were considered significant for P<0.05.
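The overall score described above is a simple product, and its ceiling for each task follows from the score ranges printed on the Appendix 1 score sheet. The following is a minimal Python sketch of that arithmetic, not the scoring software used in the study; the dictionary keys and function names are hypothetical, and the ranges are transcribed from the appendix as we read it.

```python
# Score ranges transcribed from the Appendix 1 score sheet:
# task -> (maximum quantity score, maximum quality score).
# Keys and function names are illustrative assumptions, not study terminology.
MAX_SCORES = {
    "rings_on_peg": (12, 4),
    "thread_the_rings": (11, 4),
    "cut_the_line": (18, 4),
    "hexagonal_suturing": (12, 4),
    "suture_and_knot_tying": (4, 8),
}


def product_score(task: str, quantity: int, quality: int) -> int:
    """Return quantity x quality after checking both against the task's ranges."""
    max_quantity, max_quality = MAX_SCORES[task]
    if not 0 <= quantity <= max_quantity:
        raise ValueError(f"quantity {quantity} outside 0-{max_quantity} for {task}")
    if not 0 <= quality <= max_quality:
        raise ValueError(f"quality {quality} outside 0-{max_quality} for {task}")
    return quantity * quality


def max_product_score(task: str) -> int:
    """Return the highest attainable quantity x quality score for a task."""
    max_quantity, max_quality = MAX_SCORES[task]
    return max_quantity * max_quality


if __name__ == "__main__":
    # Print the ceiling of the product score for each task.
    for task in MAX_SCORES:
        print(task, max_product_score(task))
```

Under these ranges, a resident who threads 7 rings with a quality rating of 3 would receive 21 of a possible 44 points on Task 2, and the ceiling for Task 5 would be 32 points.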
Results
Residents were tested at five time points over the course of 4 years: summer 2008, summer 2009, summer 2010, winter 2010, and summer 2011. The summer 2008 testing included seven urology residents, and all other testing sessions included eight residents, for a total of 39 participants. Of note, the same residents were not necessarily present for each time point tested, and of those tested, only two were present for each time point tested. Of the 39 participants, there were six female and 33 male subjects. The breakdown by PGY of training included: 10 PGY-2, 9 PGY-3, 10 PGY-4, and 10 PGY-5 residents.
Time
For Task 4, suturing a hexagon, residents at every PGY were stopped at the maximum of 3 minutes, having not completed the task with robotic technique; therefore, no statistical tests could be run for this task because all times were at the maximum value of 3 minutes, leaving no variability for analysis. For the four skill tasks that could be analyzed, time to complete the task did not differ significantly across the different PGY of training for any particular type of surgical technique. The difference in times to perform skill tasks comparing the open vs laparoscopic vs robotic technique was statistically significant for the four analyzed tasks (all P values were <0.0005), with the open technique consistently having the lowest times. For the rings on a peg task, the time for laparoscopic technique was significantly longer than the time for robotic technique (P<0.001 after adjustment for multiple comparisons). For all other tasks, the time for laparoscopic and robotic techniques did not differ significantly from each other.
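The "adjustment for multiple comparisons" reported here refers to the Bonferroni method named in Materials and Methods. With three surgical platforms there are three pairwise comparisons, so each raw P value is multiplied by three and capped at 1 before being compared with 0.05. The Python sketch below illustrates this under stated assumptions: the raw P values are hypothetical placeholders, not study data, and the function name is our own.

```python
def bonferroni_adjust(p_values):
    """Multiply each raw P value by the number of comparisons, capping at 1."""
    n_comparisons = len(p_values)
    return [min(1.0, p * n_comparisons) for p in p_values]


# Hypothetical raw P values for the three pairwise platform comparisons.
# These numbers are invented for illustration and are NOT results from the study.
raw = {
    "open_vs_laparoscopic": 0.0001,
    "open_vs_robotic": 0.004,
    "laparoscopic_vs_robotic": 0.30,
}

adjusted = dict(zip(raw, bonferroni_adjust(list(raw.values()))))

# A comparison is called significant when its adjusted P value is below 0.05.
significant = {name: p < 0.05 for name, p in adjusted.items()}

if __name__ == "__main__":
    for name in raw:
        print(name, adjusted[name], significant[name])
```

In this invented example, the two comparisons against the open technique remain significant after adjustment, whereas the laparoscopic vs robotic comparison does not, which mirrors the pattern reported for most tasks' completion times.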
Quantity
For the rings on a peg and cut the line tasks, Tasks 1 and 3, statistical comparisons could not be assessed because all values were at the maximum score for the open technique and so no statistical variability existed. Task 5, suturing and knot tying, demonstrated statistically significant increases in scores across PGY, with P=0.04. For Task 2, threading the rings, scores showed an increasing trend with increasing PGY, but it was not statistically significant, P=0.07. Differences between surgical technique performance scores were all statistically significant, P<0.0005, for the skill tasks that could be analyzed, including thread the rings, hexagonal suturing, and suture and knot tying. Quantity was highest for open technique and lowest for laparoscopic technique for each task with differences between surgical techniques significant at P<0.001 after adjustment for multiple comparisons.
Although there was no significant increase in quantity scores for thread the rings for open or laparoscopic technique (P>0.5), there was a significant increase in skill task quantity scores for robotic technique with increasing PGY (P=0.002), indicating residents displayed the greatest improvement when performing tasks with the robotic technique. For hexagonal suturing and suture and knot tying, surgery type by PGY interaction effects could not be demonstrated, with P values of 1.0 and 0.32, respectively.
Quality
Statistics could not be calculated for cut the line because all values were at the maximum for the open technique, meaning everyone achieved a perfect quality score of 4 for this task (see Appendix 1 for detailed account of quality requirements). There were no significant trends across PGY for any of the analyzed skill task quality scores: Task 1, rings on a peg, P=0.39; Task 2, thread the rings, P=0.587; Task 4, hexagonal suturing, P=0.14; and Task 5, suture and knot tying, P=0.12. The P values for the interaction effect between surgical technique type and PGY were also nonsignificant: Task 1, rings on a peg, 0.89; Task 2, thread the rings, 0.72; Task 4, hexagonal suturing, 0.28; and Task 5, suture and knot tying, 0.18. The difference between surgery techniques was statistically significant with P=0.02 for Task 1, rings on a peg, and P<0.0005 for Tasks 2, thread the rings; 4, hexagonal suturing; and 5, suture and knot tying. After adjustment for multiple comparisons, laparoscopic quality scores were significantly lower than open surgery for all tasks and significantly lower than robotic scores for Tasks 2, thread the rings; 4, hexagonal suturing; and 5, suture and knot tying.
Quality times quantity product score
All the quality×quantity product scores for Task 3, cut the line, were at the maximum for the open technique and therefore could not be tested for differences across PGY (Table 1). The quality×quantity product scores for Tasks 2, thread the rings, and 5, suture and knot tying, improved significantly with increasing PGY (P=0.03 and 0.02, respectively) (Figs. 2, 3). Quality×quantity product scores for Tasks 1, rings on a peg, and 4, hexagonal suturing, trended toward improvement with increased PGY level (Table 2), but these were not statistically significant: P=0.15 and 0.12, respectively.

Figure 2. Quality×quantity product scores (of a possible 44 points) for Task 2 (thread the rings) taken for each postgraduate year (PGY) (2–5) and separated by type of surgical technique: Open (diamonds), laparoscopic (triangles), and robotic (arrows). There is a statistically significant improvement in scores with increasing PGY (P value 0.03). Also demonstrated in the graph is the trend in scores by surgical technique with highest scores for open technique followed by robotic and, lastly, laparoscopic. Greatest improvements over the 4 years are shown here for robotic technique.

Figure 3. Quality×quantity product scores (of a possible 32 points) for Task 5 (suture and surgical knot tying) taken for each postgraduate year (PGY) (2–5) and separated by surgical technique: Open (diamonds), laparoscopic (triangles), and robotic (arrows). There is a statistically significant improvement in scores with increasing PGY (P value 0.02). Also demonstrated in the graph is a trend toward highest scores for open technique followed by robotic and, lastly, laparoscopic. Greatest improvements over the 4 years are shown here for robotic technique.
For the postgraduate year (PGY) P values, differences across PGY were tested for by looking at mean total scores by PGY averaged across surgery types. The surgery type P value tested for differences between surgery types by testing on average for all PGY if there was a difference across surgery type. The interaction P value tested if the change across time (PGY) differed between surgery types. Significant P values are in italic.
For Task 3, P values for surgery type could not be calculated because there was no variability in open scores.
RA=robot-assisted; lap=laparoscopy.
Differences in quality×quantity product scores between laparoscopic, robotic, and open surgery techniques were statistically significant for all skill tasks that could be statistically analyzed with P<0.0001. The highest quality×quantity scores were achieved with the open technique, followed by the robotic, and lastly the laparoscopic technique. All pairwise differences between surgical techniques were statistically significant after adjusting for multiple comparisons for all tasks other than rings on a peg. For Task 1, rings on a peg, scores for open surgery were significantly higher than for robotic and laparoscopic techniques; however, the latter two did not differ significantly from each other. Performance improvements by PGY were not the same for all surgery techniques. The greatest improvements were observed for the robotic technique compared with the open or laparoscopic technique (Table 1).
Discussion
Tracking the progression of technical skill acquisition during urology residency training is an essential yet challenging task that has been mostly based on anecdotal and subjective performance assessment. We sought to evaluate urology residents' skill task performance over 4 years of their training to determine the effect increased clinical experience and surgical skills practice would have on specific measures of resident surgical skill acquisition as measured with five specific surgical skill tasks in three different surgical formats.
The residents evaluated in the current study consistently received the highest scores for quantity, quality, and the product of the two when performing skill tasks in an open surgery format, a trend seen throughout all four PGY levels tested. The lowest performance scores were consistently demonstrated for the laparoscopic technique. Lower scores for laparoscopic tasks are not surprising because the skill set needed for laparoscopy is inherently more complex and challenging than that needed for open or robot-assisted surgery and requires considerably more concentrated practice to develop than the other two surgical techniques. 17
Higher-level skill tasks, such as threading the rings (Task 2) and suture and knot tying (Task 5), were the most differentiating for level of training among the urology residents and as such appear to be most useful in assessing level of improvement among residents during their training. Previous studies have also demonstrated the practicality of tasks like suturing and knot tying in differentiating skill acquisition and mastery and translation into a satisfactory performance in the OR. 18,19
It was also noted that robotic skill tasks demonstrated the greatest improvements with increased time spent in training, a significant finding given the changing climate of minimally invasive surgery and the push to increasingly move from laparoscopic to robot-assisted surgery. We do encourage and provide considerable robotic and laparoscopic skills training time in our residency training program as previously mentioned, but unfortunately did not tabulate this to determine effect on robotic skill task improvement with our regular assessment program.
Also of note, at our institution we do approximately 126 open surgical cases, 63 endoscopic or percutaneous, and 26 laparoscopic or robot-assisted cases per month at our main teaching center alone. Therefore, while residents do have greater exposure to open surgical cases, they get the same amount of exposure with laparoscopic and robotic cases, and so this improvement in robotic performance is not simply a result of increased exposure to this technique.
This study has several limitations to address. First, the sample size of our study group was relatively small with only 17 residents participating over the 4 years for a total of 39 subject points. We did have fairly equal representation of the various PGY levels of training within our study group, however. In addition, performance assessment was dependent on the Surgical Education Center staff, and over time, there were changes in the laboratory personnel who were responsible for the skill task scoring of the residents during training. We think this limitation was somewhat mitigated because a standardized validated scoring checklist was used throughout the 4-year assessment, and all personnel were trained on using this evaluation before testing by the minimally invasive surgical education fellow who oversaw not only training but every testing session. Still, interpretation differences may exist between scorers.
Finally, and perhaps most importantly, we did not set predetermined expert proficiency scoring guidelines for the residents before their testing. As a result, there was wide variation in the scores achieved by residents within any given residency year and even between different testing times for individual residents. This observation only highlights the benefits in surgical skill acquisition of training to predetermined proficiency levels. 5,10,20,21
It was the intention of this study to create required performance scores for use in the residency training program. We may have garnered more valuable and representative proficiencies for future skills assessment, however, if we had proactively created a reasonable baseline performance goal based on expert skill task scores. Nevertheless, we intend to use these data to establish minimum surgical skill performance levels for each PGY level of training for the different surgical platforms throughout our curriculum. We have already incorporated the data garnered from this study into our resident applicant skill testing during the most recent interview period. Applicants were tested on Tasks 1 (rings on a peg), 2 (thread the rings), and 5 (suture and knot tying) only, with Task 1 being a warm-up and performance on Tasks 2 and 5 being recorded for surgical skill proficiency.
Conclusions
Establishing formal surgical skill task performance assessments in urologic residency training for specific surgical techniques can assist in tracking progression of technical skill acquisition in trainees. Having a realistic and established baseline required performance score for these skill tasks can help residents practice and progress in their training. The high-level surgical skill tasks seem to be the most reliable and effective in demonstrating improvement in surgical skill performance as residents transition through their PGY levels of learning.
Footnotes
Disclosure Statement
No competing financial interests exist.
Appendix 1

| Time Allotted | Task | Quantity Score | Quality Score | Skill Assessment (Quantity × Quality) | Total Time |
|---|---|---|---|---|---|
| 2 min (max) | Rings on Peg (6 ring model). Start with rings on pegs; 2 graspers. Sequentially remove one ring at a time (place at base), then place rings back on pegs in same order. Can use 2 hands simultaneously. | 0–12: total # of rings placed + removed | 0–4: 0=unable to place or remove any rings; 1=dropped almost every ring, great difficulty; 2=dropped many rings, handled rings with minimal difficulty; 3=dropped only occasional ring, handled rings with fair ease; 4=dropped no rings and handled with ease | | |
| 2 min (max) | Thread the Rings (11 ring model). 2 needle drivers or graspers. Pass a 2-0 prolene suture (12 cm) through the rings in any order, but the suture must remain in the rings (can't pull through). | 0–11: total # of rings threaded | 0–4: 0=unable to thread any ring; 1=missed ring on first pass or inadvertent pull back on ∼50% of the rings; 2=missed ring on first pass on many of the rings + occasional inadvertent pull back; 3=missed ring on first pass only occasionally; 4=passed suture easily through all rings | | |
| 2 min (max) | Cut the Line (2 semicircle model). Scissors & graspers. Cut both sides as close to the inner black line. | 0–18: total length of line cut | 0–4: 0=unable to cut along line; 1=cut >3 mm off most of line (outside grey); 2=cut 2–3 mm off most of line; 3=cut 1 mm off most of line; 4=cut consistently on black line | | |
| 3 min (max) | Suturing (hexagonal grid on slab). 2 needle drivers; 4-0 vicryl/silk on RB1 needle. Starting at 12 o'clock position, continuously suture around a hexagonal figure (sides=1 cm). Must place figure-of-8 suture through each side, going through the 4 dots. | 0–12: total # of suture throws (2 per side) | 0–4: 0=unable to pass through any dots; 1=missed all dots when passing needle; 2=missed >50% of dots during suturing; 3=missed <50% of dots during suturing; 4=missed no dots when passing needle | | |
| 1 min (max) | Suturing & Knot tying (suture slab). 2 needle drivers; 0 vicryl/silk suture on SH needle. Pass needle through 2 dots then place surgeon's knot (double throw + single throw) followed by a 3rd knot. | 0–4: hit both dots (1) + # of knots (3) | 0–8 (dots + knots). Dots: 0=needle entry >3 mm from dot; 1=3 mm from dot; 2=2 mm from dot; 3=1 mm from dot; 4=on dot. Knots: 0=no knots; 1=all air knots; 2=mostly air knots; 3=occasional air knots; 4=all square knots | | |

TOTAL SKILL ASSESSMENT SCORE________
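
Task 5 is the only item on the score sheet whose quantity and quality scores are each built from two components (dots and knots). The Python sketch below shows one reading of how those components combine into the 0–4 quantity score, the 0–8 quality score, and the 32-point product. It is an illustrative interpretation of the score sheet, not the evaluators' procedure, and all function and parameter names are hypothetical.

```python
def task5_quantity(hit_both_dots: bool, knots_tied: int) -> int:
    """Quantity (0-4): 1 point for hitting both dots plus 1 point per knot (max 3)."""
    if not 0 <= knots_tied <= 3:
        raise ValueError("knots_tied must be between 0 and 3")
    return int(hit_both_dots) + knots_tied


def task5_quality(dot_rating: int, knot_rating: int) -> int:
    """Quality (0-8): dot-accuracy rating (0-4) plus knot rating (0-4)."""
    for name, rating in (("dot_rating", dot_rating), ("knot_rating", knot_rating)):
        if not 0 <= rating <= 4:
            raise ValueError(f"{name} must be between 0 and 4")
    return dot_rating + knot_rating


def task5_product(hit_both_dots: bool, knots_tied: int,
                  dot_rating: int, knot_rating: int) -> int:
    """Skill assessment score for Task 5: quantity multiplied by quality."""
    return (task5_quantity(hit_both_dots, knots_tied)
            * task5_quality(dot_rating, knot_rating))


if __name__ == "__main__":
    # A flawless attempt: both dots hit on target, three square knots.
    print(task5_product(True, 3, 4, 4))
    # A partial attempt: both dots hit 1 mm off target, two knots, occasional air knots.
    print(task5_product(True, 2, 3, 3))
```

On this reading, a flawless attempt reaches the 32-point ceiling, whereas an attempt with two knots and ratings of 3 on each quality component would score 3 × 6 = 18.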
