Abstract
Purpose:
We previously developed a pediatric thoracoscopic surgical simulator and demonstrated its construct validity. In this study, the same skill assessment experiments were conducted with an additional 31 surgeons, and the results of all 53 surgeons were analyzed.
Methods:
A suture pad with force sensors was placed in a rapid-prototyped chest model of a 1-year-old patient. Participants performed an endoscopic intracorporeal suturing and knot-tying task in both the pediatric chest model setup and a box trainer setup. Performance was evaluated using eight metrics: the 29-point checklist score, error score, number of manipulations, task completion time, force index, width of the pad's slit after suturing, and right and left tool path lengths.
Results:
The 53 participants included skilled surgeons certified by the Endoscopic Surgical Skill Qualification (ESSQ) system (n = 8) and unskilled surgeons without the certification (n = 45). In the pediatric chest model setup, the skilled surgeons performed significantly better than the unskilled surgeons in six metrics. In contrast, no significant differences between the two groups were observed in the box trainer setup. When the two setups were compared, the unskilled surgeons performed worse in the pediatric chest model than in the box trainer in six metrics, whereas the skilled surgeons performed equivalently in both setups except for task completion time and left tool path length.
Conclusions:
Our pediatric thoracoscopic surgical simulator was superior to the conventional box trainer for identifying skilled surgeons. The skilled surgeons performed well even in the intricate pediatric chest model, providing quantified targets for the training of young pediatric surgeons.
Introduction
Other researchers have developed realistic surgical simulators and demonstrated their construct validity and/or face validity. Face validity is the extent to which an examination resembles real-life situations. Plymale et al. 5 developed a laparoscopic pyloromyotomy simulator for training. Ieiri et al. 6 and Jimbo et al. 7 developed laparoscopic fundoplication simulators using an instrument motion tracking system. Obata et al. 8 developed a congenital diaphragmatic hernia simulator using synthetic organs and sensors. Barsness et al. developed simulators of tracheoesophageal fistula,9,10 congenital diaphragmatic hernia, 11 and duodenal atresia 12 using animal organs. They demonstrated the validity of their simulators using a questionnaire and the Objective Structured Assessment of Technical Skills (OSATS). However, a quantitative comparison of performance on a realistic pediatric simulator and on a conventional box trainer had not previously been reported.
We previously developed a pediatric thoracoscopic surgical simulator using a rapid-prototyped pediatric chest model and motion and force sensors to evaluate surgeons' performance of the intracorporeal suturing and knot-tying task. 13 We demonstrated its construct validity in skill assessment experiments using the video assessment method, the 29-point checklist method,14–16 and the error assessment sheet method,17,18 and compared the surgeons' performance on the pediatric chest model and on a box trainer. 19 Because the number of participants in our previous studies 13,19 was small, we grouped the surgeons into experienced and inexperienced groups by the number of laparoscopic fundoplication procedures they had performed. In the present study, we grouped the surgeons into skilled and unskilled groups by their certification status in the Endoscopic Surgical Skill Qualification (ESSQ) system developed by the Japan Society for Endoscopic Surgery. The ESSQ system certifies pediatric surgical experts by evaluating the number of pediatric surgeries performed (the surgical volume) and video-recorded operations. 20 The ESSQ qualification is difficult to obtain, and in our previous study we could not recruit a sufficient number of ESSQ-qualified surgeons. In the present study, the same experiments were conducted with an additional 31 surgeons, and the results of all 53 surgeons, including eight ESSQ-qualified surgeons, were analyzed.
Materials and Methods
The protocol of this study was approved by the Ethics Committee of the Graduate School of Medicine and Faculty of Medicine, The University of Tokyo (No. 10033). A detailed explanation of the study was provided to the examinees. Written informed consent was obtained from all examinees.
Experimental setup
We used the rapid-prototyped pediatric chest model of a 1-year-old patient from our previous studies.13,19 A suture pad with force-sensing capability (Suture Evaluation Simulator M57™; Kyoto Kagaku Co.) was placed in front of the third thoracic vertebra to simulate thoracoscopic repair of esophageal atresia (Fig. 1a). The model was placed in the left hemi-decubitus position (Fig. 1b). A 5-mm camera port was inserted in the fifth intercostal space at the lower edge of the scapula, and a 5-mm 30-degree endoscope was used. Two 3-mm ports were inserted in the third intercostal space on the midaxillary line and in the fifth intercostal space in the dorsal area. For comparison with the pediatric model, a commercial box trainer with a flexible camera (K-ZWEI™; B. Braun Aesculap) containing the same suture pad was prepared (Fig. 2). Force applied by the needle and surgical instruments deformed a sponge placed under the suture pad; this deformation was detected by photointerrupters, and the sum of the absolute displacements of the sponge's sidewalls was used as a force index.13,21 A small force index indicates gentle manipulation and a preferable needle insertion path. An electromagnetic motion tracking system (trakSTAR™, Mid-Range model; Ascension Technology Corporation) tracked the right and left instruments in real time at 20 Hz, and the path length of each instrument tip was calculated after the experiment.
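The two sensor-derived metrics described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the software used in the study: we assume the tracker delivers tip positions as (x, y, z) tuples in millimetres at 20 Hz, and that the force index sums the absolute sidewall displacements over every sample and every photointerrupter channel. The function names and input layouts are hypothetical.

```python
import math
from typing import Sequence, Tuple

# Hypothetical data layouts (assumptions, not the study's file format):
# - a tip trajectory is a sequence of (x, y, z) positions in mm sampled at 20 Hz
# - sponge displacement readings are one row per sample, one value per channel
Point = Tuple[float, float, float]

SAMPLING_HZ = 20  # tracker sampling rate stated in the text


def tip_path_length(trajectory: Sequence[Point]) -> float:
    """Path length of an instrument tip: sum of distances between consecutive samples."""
    return sum(math.dist(p, q) for p, q in zip(trajectory, trajectory[1:]))


def force_index(displacements: Sequence[Sequence[float]]) -> float:
    """Assumed force index: sum of absolute sidewall displacements over all samples and channels."""
    return sum(abs(d) for sample in displacements for d in sample)


def expected_samples(duration_s: float) -> int:
    """Number of tracker samples per instrument for a task of the given duration."""
    return round(duration_s * SAMPLING_HZ)
```

For example, a 196-s task yields `expected_samples(196)` = 3920 position samples per instrument. Summing straight segments between 20-Hz samples slightly underestimates curved motion and also accumulates tracker noise; the text does not state whether any smoothing was applied.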

Pediatric chest model setup

Box trainer setup. The suture pad is placed in the center of the box trainer.
Skills assessment experiment
A skills assessment experiment was previously conducted at a national conference for pediatric surgeons in Japan in 2013. Thirty pediatric surgeons were recruited for this first experiment, but data from some examinees were invalid because of mechanical problems with the system. 13 A second experiment was conducted at the same national conference in 2014. For participants who took part in both experiments, data from the second experiment were used in the analysis. The participants were grouped into the Skilled group and the Unskilled group according to their ESSQ certification status.
Each surgeon performed an endoscopic intracorporeal suturing and knot-tying task including three knots, first in the box trainer setup and then in the pediatric chest model setup. Before each measurement, participants practiced suturing for 5 minutes. They were instructed to perform the task accurately, safely, and quickly, and to close the cut in the suture pad using the intracorporeal slip knot technique if possible. A 5-0 PDS II suture with a 13-mm 3/8-circle needle (Z126H; Ethicon Endo-Surgery), with the thread cut to 100 mm, was used. The silicone rubber suture pad was replaced with a new one for each trial. In the pediatric chest model setup, the same pediatric surgeon operated the thoracoscope for all participants. When a participant failed to finish the task, he/she was instructed to restart it from the beginning. The task completion time, force index, width of the slit opening in the pad after suturing, and path lengths of the right and left instruments were measured.
Video-based skills assessment
The tasks were video recorded during the skills assessment experiments, and the videos were rated by two blinded pediatric surgeons using two methods: the 29-point checklist method and the suturing errors score sheet method.14–18 The score sheets used are shown in Figures 3 and 4. The 29-point checklist method was first reported by Moorthy et al. in 2004. 14 The checklist consists of 29 items in six categories, each scored as 1 or 0; we used it without modification. The suturing errors score sheet method was originally proposed by Van Sickle et al. for assessing suturing and knot-tying in laparoscopic Nissen fundoplication. 17 We used the original error definitions and added a new item, failure to slip knot, 19 as shown in Figure 4. When a participant tore the suture pad and therefore repeated the task, the video of the successful attempt was analyzed and no penalty was given, because some suture pads were fragile and repeated attempts did not necessarily indicate a lack of skill.

The score sheet of the 29-point checklist. The 29-point checklist method was first reported by Moorthy et al. 14 This checklist consists of 29 items in six categories. Each item is scored as 1 or 0.

The suturing errors score sheet. The suturing errors score sheet method was originally proposed by Van Sickle et al. 17 The original definition of the errors was used and a new item “Failure to slip knot” was added. 19 Each recorded video was divided into nine steps, and the observers checked for errors at each step. Multiple errors in a single step were counted as one error.
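The scoring rules above can be sketched in a few lines. This is a hypothetical illustration, not the authors' scoring tool, and the function names are ours. One rule needs interpreting: a strict cap of one error per step would limit the total to 9, yet the Results report error scores above 9 (e.g., a median of 10.5), so this sketch assumes that repeated occurrences of the same error type within a step count once, while different error types in the same step count separately.

```python
from typing import Sequence

NUM_CHECKLIST_ITEMS = 29  # 29 items in six categories, each scored 1 or 0
NUM_STEPS = 9             # each recorded video is divided into nine steps


def checklist_score(items: Sequence[int]) -> int:
    """Total 29-point checklist score: the sum of the binary item scores."""
    if len(items) != NUM_CHECKLIST_ITEMS or any(v not in (0, 1) for v in items):
        raise ValueError("expected 29 items, each scored 0 or 1")
    return sum(items)


def error_score(errors_by_step: Sequence[Sequence[str]]) -> int:
    """Total error score over the nine steps.

    Assumption: repeated occurrences of the same error type within one step
    count as a single error; distinct error types in a step are each counted.
    """
    if len(errors_by_step) != NUM_STEPS:
        raise ValueError("expected one list of observed errors per step")
    return sum(len(set(step)) for step in errors_by_step)


def averaged_score(observer_a: float, observer_b: float) -> float:
    """Average of the two blinded observers' scores, used once reliability is confirmed."""
    return (observer_a + observer_b) / 2
```

Averaging two observers' integer totals is what produces half-point values such as 10.5; for instance, `averaged_score(9, 12)` gives 10.5.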
Statistical analysis
All data are expressed as median values (interquartile range). The results were compared between the Skilled group and Unskilled group in each setup using the Wilcoxon rank-sum test. The differences in results between the box trainer setup and pediatric model setup in each group were also analyzed using the Wilcoxon rank-sum test. To determine the interrater reliability of the checklist score method and the error score method, Cronbach's alpha coefficient was used. After confirmation of the interrater reliability of the methods, the average of the two observers' scores was used for assessment. All analyses were performed using the JMP statistical software (SAS Institute, Inc.), and a P value <.05 was deemed statistically significant.
Results
In total, 53 examinees who participated in the 2013 and 2014 experiments were grouped into the Skilled group (n = 8, with ESSQ certification) and the Unskilled group (n = 45, without ESSQ certification). The characteristics of the examinees are summarized in Table 1.
Data are expressed as median values (interquartile range).
ESSQ system, endoscopic surgical skill qualification system.
Subjective video assessment by surgeons
The interrater reliability (Cronbach's alpha) of the 29-point checklist score method was 0.87, which was considered sufficiently high. The 29-point checklist scores of the Skilled and Unskilled groups are shown in Figure 5. The score of the Skilled group was significantly higher than that of the Unskilled group in the pediatric chest model setup [23 (2.6) vs. 20 (4.5), P = .009], but not in the box trainer setup [23 (5.4) vs. 21 (4.3), P = .183]. Within each group, scores did not differ significantly between the box trainer setup and the pediatric chest model setup (Skilled, P = .712; Unskilled, P = .322).

Comparison of the total score on the 29-point checklist between the Skilled and Unskilled groups. The central black line is the median, data within the box are the interquartile range, and the ends of the vertical line denote the range. The points denote outliers. *P < .05 by Wilcoxon rank-sum test. n, the number of examinees.
The interrater reliability of the suturing error score method was 0.88. The total error scores are shown in Figure 6. The score of the Skilled group was significantly lower than that of the Unskilled group in the pediatric chest model setup [5.5 (2.5) vs. 10.5 (5.5), P = .001]; however, no statistically significant difference was observed in the box trainer setup [8.3 (5) vs. 7.5 (4.5), P = .980]. There were no significant differences between the scores when using the box trainer setup and the pediatric chest model setup in the Skilled group (P = .268), but the unskilled surgeons showed a significantly higher error score in the pediatric chest model than in the box trainer (P = .001).

Comparison of the total error score between the Skilled and Unskilled groups. The central black line is the median, data within the box are the interquartile range, and the ends of the vertical line denote the range. The points denote outliers. *P < .05 by Wilcoxon rank-sum test. n, the number of examinees.
The number of needle manipulations is shown in Figure 7. The Skilled group showed a significantly smaller number of manipulations than the Unskilled group in the pediatric chest model setup [23 (10) vs. 29 (16), P = .027]; however, no significant difference was observed in the box trainer setup [18 (7.8) vs. 22 (9), P = .268]. There was no significant difference in the number of needle manipulations between the box trainer setup and the pediatric chest model setup in the Skilled group (P = .205), but the unskilled surgeons showed a significantly larger number of manipulations in the pediatric chest model than in the box trainer (P < .001).

Comparison of the number of needle manipulations between the Skilled and Unskilled groups. The central black line is the median, data within the box are the interquartile range, and the ends of the vertical line denote the range. The points denote outliers. *P < .05 by Wilcoxon rank-sum test. n, the number of examinees.
Objective assessment by sensors
The task completion time is shown in Figure 8. The Skilled group showed a significantly shorter completion time than the Unskilled group in the pediatric chest model setup [196 (76) sec. vs. 251 (153) sec., P = .015]; however, no statistically significant difference was observed in the box trainer setup [131 (80) sec. vs. 154 (87) sec., P = .157]. The task completion time of the pediatric chest model was significantly longer than that of the box trainer in both the Skilled group (P = .041) and the Unskilled group (P < .001).

Comparison of the task completion time between the Skilled and Unskilled groups. The central black line is the median, data within the box are the interquartile range, and the ends of the vertical line denote the range. The points denote outliers. *P < .05 by Wilcoxon rank-sum test. n, the number of examinees.
The force index is shown in Figure 9. Force index data were invalid for two unskilled surgeons in the box trainer setup and six unskilled surgeons in the pediatric chest model setup because of a failure of the force measurement system. The force index did not differ significantly between the Skilled and Unskilled groups in either the pediatric chest model setup [1470 (834) vs. 2010 (1210), P = .130] or the box trainer setup [1830 (2150) vs. 1500 (967), P = 1.000]. There was no significant difference in the force index between the box trainer setup and the pediatric chest model setup in the Skilled group (P = 1.000), but the unskilled surgeons showed a significantly larger force index in the pediatric chest model than in the box trainer (P = .027).

Comparison of the force index between the Skilled and Unskilled groups. The central black line is the median, data within the box are the interquartile range, and the ends of the vertical line denote the range. The points denote outliers. *P < .05 by Wilcoxon rank-sum test. n, the number of examinees.
The width of the slit opening in the pad after suturing is shown in Figure 10. The width of the slit in the Skilled group was significantly smaller than that in the Unskilled group in the pediatric chest model setup [0 (0.8) mm vs. 0.8 (1.5) mm, P = .048]; however, no significant difference was observed in the box trainer setup [0.2 (0.6) mm vs. 0.6 (1.7) mm, P = .353]. There were no significant differences in width of the slit opening between the box trainer setup and the pediatric chest model setup in each group (Skilled, P = .325; Unskilled, P = .937).

Comparison of the width of the slit opening in the pad after suturing between the Skilled and Unskilled groups. The central black line is the median, data within the box are the interquartile range, and the ends of the vertical line denote the range. The points denote outliers. *P < .05 by Wilcoxon rank-sum test. n, the number of examinees.
The path length of the right instrument's tip is shown in Figure 11. Instrument path data were invalid for four unskilled surgeons in the box trainer setup and for five unskilled surgeons and one skilled surgeon in the pediatric chest model setup because of a failure of the position tracking system. The Skilled group showed a significantly shorter path length than the Unskilled group in the pediatric chest model setup [2130 (1420) mm vs. 3550 (2240) mm, P = .019], but no significant difference was observed in the box trainer setup [2420 (861) mm vs. 2790 (1620) mm, P = .250]. There was no significant difference in path length between the box trainer setup and the pediatric chest model setup in the Skilled group (P = .685), but the unskilled surgeons showed a significantly longer path length in the pediatric chest model than in the box trainer (P = .027).

Comparison of the tip path length of the right instrument between the Skilled and Unskilled groups. The central black line is the median, data within the box are the interquartile range, and the ends of the vertical line denote the range. The points denote outliers. *P < .05 by Wilcoxon rank-sum test. n, the number of examinees.
The path length of the left instrument's tip is shown in Figure 12. Instrument path data were invalid for four unskilled surgeons in the box trainer setup and for five unskilled surgeons and one skilled surgeon in the pediatric chest model setup because of a failure of the position tracking system. There were no significant differences between the Skilled and Unskilled groups in either the pediatric chest model setup [4970 (1340) mm vs. 5640 (3200) mm, P = .289] or the box trainer setup [2570 (1260) mm vs. 2820 (1180) mm, P = .209]. The left path length in the pediatric chest model was significantly longer than that in the box trainer in both the Skilled group (P = .002) and the Unskilled group (P < .001).

Comparison of the tip path length of the left instrument between the Skilled and Unskilled groups. The central black line is the median, data within the box are the interquartile range, and the ends of the vertical line denote the range. The points denote outliers. *P < .05 by Wilcoxon rank-sum test. n, the number of examinees.
Discussion
In this study, the skilled surgeons performed significantly better than the unskilled surgeons in six metrics using the pediatric chest model, whereas the two groups showed similar results in all metrics using the box trainer. This result suggests that the task in a conventional box trainer is too easy to distinguish skilled from unskilled surgeons when all of them are qualified to perform endoscopic pediatric surgery. In contrast, the skilled surgeons performed significantly better in the pediatric chest model because the model closely replicates the actual situation of pediatric thoracoscopic surgery, so the task demands a high level of skill. Specifically, the experiment replicated the narrow pediatric workspace, the exact shape of the ribs, the port placement, and the presence of an assistant who manipulates the thoracoscope; these features represent technical challenges of actual surgery and distinguish our model from other simulators. Barsness et al. also reported that experienced surgeons performed the procedure better than novice surgeons in their thoracoscopic esophageal atresia/tracheoesophageal fistula repair simulator, which has a thoracic rib cage model similar to ours. 10 Unlike Barsness's simulator, which uses animal organs, ours uses a commercial suture pad with sensors inside the model. The suture pad was replaced with a new one in each trial, and differences between pads were negligible, whereas animal organs show wide variability. In the future, we plan to replace the suture pad with a sensorized artificial model with a more realistic shape and mechanical properties.
The skilled surgeons performed equivalently in both setups, except that their task completion time and left path length were worse in the pediatric model. This suggests that the skilled surgeons prioritized the quality of their performance over speed and efficiency, which are represented by the task completion time and tool path length.
Ieiri et al. 22 reported that pediatric surgeons who completed a 2-day endoscopic surgical training course designed for general surgeons showed faster and more efficient, but less precise, forceps manipulation than before training. This finding suggests that pediatric surgeons need specific training to learn the safe and precise surgical techniques required in this field. Because our pediatric chest model replicates the challenges that surgeons face in the actual pediatric surgical environment, it may be useful for learning pediatric-specific skills.
A limitation of the present study is the small number of ESSQ-qualified pediatric surgeons (n = 8); only 30 pediatric surgeons in Japan hold the ESSQ certification.
In conclusion, the pediatric chest model was more useful than the conventional box trainer for distinguishing skilled pediatric surgeons with ESSQ certification from other pediatric surgeons, because it more accurately replicates the difficulties associated with pediatric thoracoscopic tasks. The skilled surgeons showed precise and secure performance in both the pediatric chest model and the box trainer. In the future, the pediatric chest model may allow young pediatric surgeons to practice on a realistic and intricate model and to improve both their precision and speed in pediatric surgery.
Footnotes
Acknowledgments
The authors thank Prof. Yuji Nirasawa of Kyorin University, Dr. Kosaku Maeda of Hyogo Children's Hospital, and The Japanese Society for Pediatric Endosurgery & Surgical Techniques for kindly providing the opportunity to conduct the experiments. The authors also thank Mr. Yusuke Tsukuda and Mr. Hideyuki Sato for their help during the experiments.
This study was partially supported by a Grant-in-Aid for Scientific Research (B) (No. 26293378) and a Grant-in-Aid for Scientific Research (S) (No. 23226006) from the Ministry of Education, Culture, Sports, Science and Technology (MEXT), and by the project “Assessment methodology for innovative minimally invasive therapeutic devices, materials, and nano biodiagnostic devices” from the Accelerating regulatory science initiative, the Ministry of Health, Labour, and Welfare (MHLW), Japan.
Disclosure Statement
No competing financial interests exist.
