Abstract
Background:
Traditional stratification of expertise in laparoscopic simulation assigns participants to novice, intermediate, or expert groups based on case numbers. We hypothesized that expert video assessment might refine this discrimination of psychomotor expertise, especially in light of new measurable parameters.
Materials and Methods:
One hundred five participants performed a defined intracorporeal suturing task in the pediatric laparoscopic surgery simulator equipped with force-sensing capabilities. Participants were stratified into novice, intermediate, and expert groups via three classification schemes: (1) number of complex laparoscopic cases, (2) self-declared level of expertise, and (3) average expert rating of participants' videos. Precision, time to task completion, and force analysis parameters (FAP = total, maximum, and mean forces in three axes) were compared using one-way analysis of variance tests. P < .05 was considered significant.
Results:
Participants stratified on the basis of case numbers and on the basis of self-declared level of expertise had statistically significant differences in time to task completion, but no significant difference in FAP. When participants were restratified according to expert assessment of their video performance, time to task completion as well as total and mean forces in X, Y, and Z axes allowed discrimination between novices, intermediates, and experts, thus establishing construct validity for these metrics under the video-based stratification. Precision did not allow discrimination in any stratification scheme.
Conclusion:
Compared with traditional stratification, video assessment allows refined discrimination of psychomotor expertise within a simulator. Assessment of FAP may become a relevant tool for teaching and assessing laparoscopic skills.
Introduction
Laparoscopic simulation is used with increasing frequency in surgical training programs to allow for increased skill acquisition ex vivo, to assess trainee competency, and to increase technical practice in an era of reduced duty hours.1–3 A critical aspect of laparoscopic simulation is how to assess expertise, so that educators may set goals and benchmarks to which trainees can aspire.4–6 Laparoscopic expertise is most frequently defined by the number of surgical procedures performed, in combination with self-assessment (most often also based on numbers of procedures as well as professional status).7,8 However, with emerging measures of performance, the classification of expert status may need to be reconsidered. In particular, analysis of laparoscopic videos in the operating room or in the simulator offers significant potential to more precisely define laparoscopic expertise.9,10 Appropriate evaluation of expertise is crucial, not only in teaching and assessing skills but also in the research, development, and validation of novel educational methodology (e.g., curricula)11 and teaching tools (e.g., simulators using traditional as well as more advanced parameters for assessment, such as analysis of motion and force).12–14
While numerous articles describe novel approaches to the teaching and assessment of laparoscopic skills (motion and force analysis), none has re-examined the definition of expertise on which the demonstration of construct validity rests. There is emerging literature questioning how physicians define expertise and whether methods based on self-assessment or years in practice are reasonable.15–17 Our own group, with over a decade of experience in the development, validation, and analysis of laparoscopic simulation, has questioned the traditional methodology for defining psychomotor expertise.
As laparoscopic training advances, identifying how to optimally evaluate expertise and how to allow trainees of all levels to continue to improve their technical skills may be increasingly important. In this study, we aimed to assess whether expert video assessment may offer refinement in the assignment of psychomotor expertise, using our recently developed capability to measure force analysis parameters (FAP) during a defined intracorporeal suturing task in the pediatric laparoscopic surgery (PLS) simulator. We took this opportunity to cross-reference these new metrics (force) and traditional metrics (time to task completion and precision) with the traditional stratification scheme for assigning expertise, and to contrast those findings with a refined scheme involving expert scoring of participants' videos.
Materials and Methods
Candidates
One hundred five participants were recruited at an education booth of the 2016 International Pediatric Endosurgery Group (IPEG) conference to undertake a laparoscopic task. As per a protocol approved by the Hospital for Sick Children (REB# 1000025362), informed consent was sought. We asked participants to perform a defined intracorporeal suturing task in the PLS simulator (Fig. 1). Candidates were stratified in three different ways: (i) self-declaration: novice, intermediate, and expert; (ii) by case number: novice (<10 laparoscopic procedures/year), intermediate (10–50 laparoscopic procedures/year), and expert (>50 laparoscopic procedures/year); and (iii) video assessment by three independent experts: novice, intermediate, and expert.
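The two rule-based stratification schemes can be sketched as below. This is a minimal illustration, not the study's code: the case-number cut-offs follow the text, but the text does not state the scale the experts used or how the averaged rating was mapped back to a level, so the 1 to 3 scoring and the midpoint cut-offs (1.5 and 2.5) are hypothetical, as are the function names.

```python
def level_by_case_number(cases_per_year):
    """Case-number scheme from the text: <10 novice, 10-50 intermediate, >50 expert."""
    if cases_per_year < 10:
        return "novice"
    if cases_per_year <= 50:
        return "intermediate"
    return "expert"


# Hypothetical numeric coding of each expert's rating; the text does not give the scale.
RATING_SCORES = {"novice": 1, "intermediate": 2, "expert": 3}


def level_by_video_ratings(ratings):
    """Average the experts' ratings and map back using assumed midpoint cut-offs."""
    mean_score = sum(RATING_SCORES[r] for r in ratings) / len(ratings)
    if mean_score < 1.5:
        return "novice"
    if mean_score < 2.5:
        return "intermediate"
    return "expert"
```

Self-declaration needs no rule, since participants named their own level directly. Under these assumptions, a video rated novice by one expert and intermediate by two would average 1.67 and fall in the intermediate group.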

Fig. 1. Pediatric laparoscopic simulator.
Simulator
The dimensions of the PLS simulator were 18 cm (length) × 10 cm (width) × 9 cm (height), as outlined previously.18
Defined intracorporeal suturing task
A suture (4–0 silk on an RB-1 needle) was cut to 9–10 cm in length and placed at a predetermined position in the simulator. Participants were expected to grasp the suture, position it on a 3 mm laparoscopic needle driver, and pass the needle through two clearly drawn target points on either side of a slit in a Penrose drain. The Penrose drain (11 mm × 6 mm × 30 cm) was cut to a 15 mm length and secured with Velcro to an underlying board. After passing the needle through both targets in the Penrose drain, the participant was expected to tie one surgeon's knot (double throw) followed by two simple knots (single throw) to secure it.
Data collection and analysis
Videos were recorded and transmitted in real time with an HD Microsoft LifeCam HD-6000 high-speed USB camera, capable of recording at resolutions up to 1280 × 720 and speeds up to 30 frames per second. To measure force, we used a 3D mouse (DX-700034 SpaceNavigator for Notebooks; 3Dconnexion, MA), which provided data on translational and rotational force in the X, Y, and Z axes. The mouse was held in place by a second 3D-printed housing unit. The 3D mouse was connected to a personal computer and interfaced in Java using the Java Input (JInput) library. The mouse was polled at 40 Hz, providing force along all three axes along with the torque about each axis. The output of the mouse was calibrated to convert its movement into force measurements by applying a known force in each direction and mapping the mouse output to the applied force. Raw force data for each participant were exported as text files, processed in MATLAB, and converted into newtons (N). Mean force, maximum force, and the total force exerted over the time span of the task were measured in the X (“side to side”), Y (“in and out”), and Z (“up and down”) axes.
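The calibration and per-axis summary steps above can be sketched as follows. This is a minimal illustration, not the authors' MATLAB pipeline: it assumes a linear relationship between raw 3D-mouse output and applied force on each axis (fitted by ordinary least squares from the known-force trials), and it assumes that "total force" means the sum of absolute per-sample forces over the task, which the text does not define. The function and class names are hypothetical.

```python
from dataclasses import dataclass


def fit_linear_calibration(raw_readings, known_forces_n):
    """Fit force_N = gain * raw + offset for one axis by ordinary least squares.

    Assumption: the raw 3D-mouse output is linear in applied force.
    """
    n = len(raw_readings)
    mean_raw = sum(raw_readings) / n
    mean_force = sum(known_forces_n) / n
    s_xx = sum((x - mean_raw) ** 2 for x in raw_readings)
    s_xy = sum(
        (x - mean_raw) * (y - mean_force)
        for x, y in zip(raw_readings, known_forces_n)
    )
    gain = s_xy / s_xx
    offset = mean_force - gain * mean_raw
    return gain, offset


@dataclass
class AxisForceSummary:
    mean_n: float   # mean absolute force over the task (N)
    max_n: float    # peak absolute force (N)
    total_n: float  # assumed definition: sum of absolute per-sample forces (N)


def summarize_axis(raw_samples, gain, offset):
    """Convert one axis of raw samples (polled at 40 Hz) to newtons and summarize."""
    forces_n = [abs(gain * r + offset) for r in raw_samples]
    return AxisForceSummary(
        mean_n=sum(forces_n) / len(forces_n),
        max_n=max(forces_n),
        total_n=sum(forces_n),
    )
```

Applied once per axis (X, Y, and Z), this yields nine FAP values per participant. If total force were instead defined as a time integral (an impulse in N·s), each sample would be weighted by the 0.025 s polling interval implied by 40 Hz.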
Statistical analysis
Statistical analysis was performed using SPSS version 25 (Statistical Package for the Social Sciences; IBM Corporation, Armonk, NY). Results are reported as mean ± standard deviation (SD) for normally distributed data. To assess concordance between the video assessments, correlations between the raters' marks were determined using Pearson's correlation coefficients and kappa scores. Precision, time to task completion, and forces were compared across the classification schemes for assigning expertise using one-way analysis of variance (ANOVA) tests. Post hoc analysis was undertaken with least significant difference (LSD) tests. Statistical significance was set at P < .05.
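For readers less familiar with these tests, the one-way ANOVA F statistic and the Fisher LSD pairwise statistic can be sketched in a few lines. This is an illustrative sketch, not the SPSS procedure used in the study: it computes the test statistics only, and P values would come from the F and t distributions (for example, `scipy.stats.f_oneway` returns both F and P for the omnibus test). The function names are hypothetical.

```python
import math


def _mean(values):
    return sum(values) / len(values)


def one_way_anova(groups):
    """Return (F, df_between, df_within, ms_within) for k independent groups."""
    all_values = [v for g in groups for v in g]
    grand_mean = _mean(all_values)
    k, n = len(groups), len(all_values)
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (_mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of values around their own group mean.
    ss_within = sum(sum((v - _mean(g)) ** 2 for v in g) for g in groups)
    df_between, df_within = k - 1, n - k
    ms_within = ss_within / df_within
    f_stat = (ss_between / df_between) / ms_within
    return f_stat, df_between, df_within, ms_within


def lsd_t(group_a, group_b, ms_within):
    """Fisher's LSD t statistic for one pair, using the pooled ANOVA error term.

    Compare against a t distribution with df_within degrees of freedom.
    """
    se = math.sqrt(ms_within * (1 / len(group_a) + 1 / len(group_b)))
    return (_mean(group_a) - _mean(group_b)) / se
```

Here the groups would be, for example, the novice, intermediate, and expert values of one FAP under one stratification scheme. Fisher's LSD is conventionally applied after a significant omnibus F test, which is why some significant ANOVAs can still show a nonsignificant pairwise comparison.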
Results
Participants
One hundred five participants were recruited to undertake the task. Ninety-nine participants completed the task to generate videos that were assessed by the three independent, expert surgeons who were blinded to the identity of participants. Demographics of expertise level are outlined in Table 1. There were 20 novices (19%), 54 intermediates (51%), and 31 experts (30%) as defined by case numbers (Table 1). By self-assessment, 103 participants were willing to assign their own skill levels: 75 novices (71%), 14 intermediates (13%), and 14 experts (13%). By video assessment, there were 34 novices (34%), 50 intermediates (51%), and 15 experts (15%). The interobserver correlations between the video analysts were strong, with Pearson's correlation coefficients of 0.719, 0.787, and 0.912 (all two-tailed P < .001), and overall agreement was moderate (kappa = 0.571).
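The two agreement statistics reported above can be illustrated as follows. This sketch assumes that the overall kappa is Fleiss' kappa for three raters assigning each video to one of three categories; the text does not state which kappa variant was used, and the function names are hypothetical. Pearson's r is shown for one pair of raters' numeric marks.

```python
import math
from collections import Counter


def pearson_r(x, y):
    """Pearson correlation between two raters' marks for the same videos."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def fleiss_kappa(ratings):
    """Fleiss' kappa; ratings[i] lists each rater's category for video i.

    Assumes every video was rated by the same number of raters. Undefined
    when all ratings fall in a single category (chance agreement equals 1).
    """
    n_subjects = len(ratings)
    n_raters = len(ratings[0])
    counts = [Counter(r) for r in ratings]
    categories = sorted({c for r in ratings for c in r})
    # Observed agreement: proportion of agreeing rater pairs, averaged over videos.
    p_bar = sum(
        (sum(v * v for v in c.values()) - n_raters) / (n_raters * (n_raters - 1))
        for c in counts
    ) / n_subjects
    # Chance agreement from the overall proportion of each category.
    p_e = sum(
        (sum(c[cat] for c in counts) / (n_subjects * n_raters)) ** 2
        for cat in categories
    )
    return (p_bar - p_e) / (1 - p_e)
```

With three raters, `pearson_r` would be called once per rater pair (three coefficients, as reported), whereas Fleiss' kappa summarizes agreement across all raters in a single value.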
Table 1. Expertise Levels of Participants
Participants were assigned to expertise levels using three stratifications: (i) case numbers, with novice (<10 laparoscopic procedures/year), intermediate (10–50 laparoscopic procedures/year), and expert (>50 laparoscopic procedures/year) groups; (ii) self-assessment of skill level; and (iii) video assessment of skill level by the average rating of three independent experts.
Discrimination of expertise by case numbers, self-assessment, or video assessment
Time to task completion allowed discrimination between novice, intermediate, and expert groups regardless of the method used to assign level of expertise (case numbers, self-assessment, or video assessment) (Table 2). Precision of the task (mm from the desired placement marks for the laparoscopic suture) did not discriminate between groups regardless of the method used to assign level of expertise. Evaluation of expertise based on case numbers did not allow discrimination between expertise levels for total, maximum, or mean forces in any axis. Evaluation of expertise based on self-assessment allowed discrimination between novices, intermediates, and experts in the maximum force in the Y (“in and out”) axis, but in none of the other FAP. Evaluation of expertise based on video assessment allowed discrimination between novice, intermediate, and expert groups for total forces and mean forces in X (“side to side”), Y (“in and out”), and Z (“up and down”) axes, as well as in maximum forces in the X (“side to side”) and Y (“in and out”) axes.
Table 2. Discrimination Between Novice, Intermediate, and Expert Groups Based on Case Numbers, Self-Assessment, and Video Assessment Evaluating Outcomes, Including Time to Task Completion, Precision, and Total Forces
Where indicated, post hoc analysis failed to show a difference between the novice and intermediate groups.
Where indicated, post hoc analysis failed to show a difference between the intermediate and expert groups.
Discussion
While the traditional methodology for assigning expertise remains valid, we believe that refinement in the assessment and definition of true psychomotor expertise is timely and important. With the development of novel parameters for teaching and assessing laparoscopic skills (analysis of motion and of force), the ability to ascribe true psychomotor expertise is crucial for the development and validation of the methodology (programs) and tools (simulators and their novel equipment) used in education. We are not proposing that the traditional paradigm for assigning expertise is faulty. Rather, we suggest that further refinement in the methodology by which we define psychomotor expertise (with expert video assessment or additional new parameters) may improve the teaching and assessment of laparoscopic skills.
The relevance of refining the definition of expertise stems from the fact that laparoscopic simulation is gaining importance in the upskilling of residents and has been shown to have an impact on patient outcomes.2,3,19 This, combined with the development, validation, and evolution of emerging tools permitting the analysis of motion and force, will provide additional potential for the formative assessment of laparoscopic skills. The appropriate validation and implementation of these emerging tools for formative assessment may well hinge on a better understanding and further refinement of our definition of psychomotor expertise. Without this, we risk losing the full potential of real-time/formative assessment and its impact on technical performance. We need to refine our goals if we are to strive for goal-oriented and competency-based training.
Our own experience from years of developing simulators and striving to validate them is that not all participants classified as experts exhibit the technical skill expected of a psychomotor expert. Within a cohort of experts by case numbers or self-assessment, there are those who are clearly technical experts and those who are not.20 This study demonstrates that stratifying participants into novice, intermediate, and expert groups on the basis of expert video assessment not only preserves the ability of time to task completion to discriminate between participants of varying levels of expertise but also allows the newer FAP to become relevant metrics for discrimination. This is not the case with traditional assignment of expertise on the basis of case numbers and self-assessment: time to task completion remains a relevant metric for discrimination, but FAP are not (with the single exception of maximum Y-axis force under self-assessment).
The importance of the analysis of force in the evaluation of trainee skills lies in the relevance of force application to skill in tissue handling and security in knot-tying.21,22 The measurement of force can be integrated into basic simulators such as ours or into more expensive virtual reality simulation equipment and has construct validity as a measure of laparoscopic skill.23–25 Integrating the analysis of force into formative assessment parameters can provide more discrete real-time measures of technical skill and, in time, may be used to assess expertise.
In this study, as in other articles on the topic, analysis of total force allowed discrimination between participants of varying levels of expertise.23 Our analysis then broke down total, maximum, and mean force in each of the three axes, to discern whether participants of varying expertise applied different forces in each axis while completing the defined intracorporeal suturing task. In the PLS simulator, and for this defined task, all FAP (total, maximum, and mean) in the X axis (“side to side” movement) varied significantly across all levels of expertise (novice, intermediate, and expert), which suggests that forces in this axis are relevant to completing this task. There was also good discrimination between expertise levels in the Y axis (“in and out” movement) for all FAP, which may reflect exaggerated and unnecessary forces in this plane by nonexperts. The maximum force generated in the Z axis (“up and down” movement) in the PLS simulator did not allow discrimination between participants of varying levels of expertise, but the total and mean forces were significantly different between groups. This may simply reflect more unnecessary up-and-down movements by nonexperts, which experts avoid. Future analyses will attempt to break down the task into discrete subsegments and determine which portion of the task is most challenging from the perspective of forces applied, as we have done previously for motion analysis parameters.26
Other groups have used video assessment to add to, or form the basis of, their assessment of simulation skills. While video assessment is vulnerable to a degree of subjectivity, we show good interobserver correlation between the three independent experts in this analysis. Alternatives that may increase the objectivity and accuracy of video assessment include item response theory analysis, as incorporated into systems such as the Global Operative Assessment of Laparoscopic Skills, or other grading tools.24,25,27–29 A more structured video assessment may offer assessors a discrete and defined platform on which to base their analysis and increase its objectivity.25,29 Alternatively, assessment of skills using computational algorithms may further add to the objectivity of the assessment.29 In a future study, we anticipate undertaking a more thorough evaluation of the specific metrics that our expert video analysts use to assign skill levels, and assessing whether these metrics are reproducible with other experts and across different tasks.
This article is not the first to question traditional metrics and methodology for assigning the level of expertise; across many medical fields, the accuracy of physician self-assessment is limited.17,30 Our research provides preliminary evidence that self-assessment, even when based on the number of cases a surgeon performs, may not be the best way to assess pure psychomotor expertise, especially in light of emerging technologic advances in the assessment of technical skills. Video assessment may offer refinement in the assignment of level of expertise. Individual studies in fields as diverse as general medical care and cancer pain management show that physicians commonly struggle to accurately assess their own level of knowledge or skills.15,31–33 However, there is some controversy, particularly with regard to surgical skills: a meta-analysis of the reliability of self-assessment of technical skills in general surgery reviewed 12 articles and showed a significant correlation between self-appraisal and expert score.34 Intriguingly, self-assessment via videotaping may offer discrimination equivalent to peer or trainer assessment of videos, which may support the incorporation of video review and assessment into the evaluation of surgical expertise.17,34
In any field where technical skills are crucial, even among a cohort of experienced practitioners, there are those whose psychomotor performance is superior. These are the individuals who set the bar for all those who aspire to be experts, and take performance to new heights. As the methodology to assess technical performance evolves in laparoscopic simulation, educators may need to refine the definition of psychomotor expertise, to realign our educational goals and objectives.
Limitations
Our analysis, while the first to compare expertise assignment by case numbers and by video assessment in a simulator, remains somewhat limited by participant numbers. Our overall sample size is reasonable for laparoscopic simulation studies; however, in some stratification schemes (particularly self-assessment), individual cohorts become small, which may restrict the ability of the tests to discriminate between them. In addition, group sizes varied between the different modes of expertise assignment, which may have affected the power of the measures to detect differences between groups. While our ANOVAs showed significant variation between expertise groups, the means themselves were not markedly different, and in some cases the SDs (reflecting the spread) overlapped. This may indicate that the amount of variation is small, and it has not been established what amount of variation in force would have a clinical impact on either tissue or surgical outcomes. Furthermore, some ANOVAs that showed significant variation across the three groups did not show significance for one or more pairwise comparisons on post hoc analysis, perhaps because of limited numbers in one or more groups.
While undertaking this task, participants were not aware of what was being measured. It is unclear whether they could have reduced the forces they applied, or otherwise changed the outcomes, had they been informed of the outcomes under analysis. Future analyses will examine how much participants of varying skill levels can moderate their use of force.
One of the most important considerations in any evaluation of laparoscopic simulation is that it does not address the complex intraoperative decision-making that remains critical in laparoscopic interventions.35,36 These nontechnical skills are difficult to evaluate in a simulation system but should not be underestimated when determining overall expertise levels. While our chosen task (intracorporeal suturing) is common in surgical practice and well validated as a training task, the analysis of expertise may be further aided by adding more complex tasks to the evaluation of participants. Whether performance on such tasks is best assessed by traditional or newer metrics for assigning expertise (such as video assessment) remains to be seen.
Footnotes
Acknowledgments
We thank all the surgeons, residents, and students who contributed their time, valuable input, and expertise to this study.
Disclosure Statement
No competing financial interests exist.
