Abstract
Purpose:
Pediatric endoscopic surgery requires special surgical skills because of the small working space and tissue fragility. This article presents a video-based skill assessment method for endoscopic suturing using a pediatric chest model.
Materials and Methods:
A commercial suture pad was placed in a rapid-prototyped pediatric chest model of a 1-year-old patient to simulate the thoracoscopic repair of esophageal atresia type C. Twenty-eight pediatric surgeons (9 experts, 9 intermediates, and 10 trainees) performed an endoscopic intracorporeal suturing and knot-tying task both in the pediatric chest model and in a box trainer. The tasks were video-recorded and rated by two blinded observers using the 29-point checklist method and a suturing errors score sheet method. The task completion time and the number of needle manipulations were measured.
Results:
The expert group showed better performance than the intermediate and trainee groups in the pediatric chest model, and the differences were larger than those in the box trainer. Significant differences between the expert and the trainee groups were observed in the items related to safety such as the skills for keeping the needle in view at all times. Significant differences between the expert and intermediate groups were observed in the items related to task quality and efficiency such as the smoothness of knot tying and the number of needle manipulations.
Conclusions:
Video-based skill assessment of endoscopic suturing using the pediatric chest model and a box trainer distinguished pediatric endoscopic surgeons according to their clinical experience, and pediatric-specific skills were identified.
Introduction
P
Another problem in the pediatric domain is related to the training of minimally invasive surgical skills. Inexperienced pediatric surgeons have to improve those skills, but acquiring sufficient experience is difficult in the pediatric endoscopic surgery field. The wide variety of pediatric diseases, often accompanied by congenital anomalies, limits opportunities to experience many specific operations, and pediatric-specific training tools are not available. Commercially available box trainers are designed for training in adult laparoscopy, and child-size trainers of this type are not generally available. Ieiri et al. 3 reported that pediatric surgeons who underwent endoscopic surgical training together with general surgeons demonstrated better economy and faster speed in forceps manipulation, but the number of errors was increased. This suggests that the development of pediatric-specific training methods is essential.
Few studies on pediatric-specific skill assessment and training have been reported. Some studies showed that small box trainers or virtual reality trainers simulating a small pediatric body cavity are useful. 4,5 Training using animal models is also useful, 6 but it is costly and time-consuming and raises ethical problems. Recently, several realistic training models have been reported. Ieiri et al. 7 developed a suture ligature model of the crura of the diaphragm to assess the skills required for laparoscopic fundoplication. Davis et al. 8 and Barsness et al. 9 developed three-dimensional rapid-prototyped models simulating esophageal atresia and congenital diaphragmatic hernia. Unfortunately, these models are not commercially available, probably because of the small market for pediatric applications. Furthermore, companies and engineers do not recognize the need for pediatric-specific training tools because pediatric-specific skills have not been properly delineated in the field.
For these reasons, we have developed a three-dimensional rapid-prototyped pediatric chest model equipped with numerous sensors for quantitative assessment of pediatric surgical skills. This model has been used to evaluate intracorporeal suturing tasks by analyzing the sensor data. 10 Here, we present an analysis of video-based assessment of the suturing task in the pediatric chest model, compared with performance in a conventional box trainer. We demonstrate the usefulness of the pediatric chest model for assessing the specific skills required for pediatric minimally invasive surgery.
Materials and Methods
The protocol of this study was approved (protocol number 10033) by the Ethical Committee of the Graduate School of Medicine and Faculty of Medicine, The University of Tokyo. Examinees were well briefed on the experiment, and written informed consent was obtained from all.
Development of the pediatric chest model and assessment method
This section summarizes previously reported work 10 ; the data were collected in that study but are analyzed here. A pediatric chest model comprising a three-dimensional rapid-prototyped pediatric rib cage with accurate anatomical dimensions was developed using computed tomography volume data of a 1-year-old male patient. A commercially available suture pad (Suture Evaluation Simulator M57; Kyoto Kagaku Co., Kyoto, Japan) was placed inside the rib cage in front of the third thoracic vertebra to simulate the thoracoscopic repair of esophageal atresia type C (Fig. 1a). The pediatric chest model was arranged in the left hemidecubitus position. A 5-mm port was inserted as a camera port in the fifth intercostal space at the lower edge of the scapula, and a 5-mm 30° endoscope was used. Two 3-mm ports were inserted into the right thoracic cavity in the third intercostal space of the midaxillary line and in the fifth intercostal space of the dorsal area. This port placement is commonly used for surgery of esophageal atresia type C. For comparison with this pediatric model, a commercial box trainer with a flexible camera (K-ZWEI; B. Braun Aesculap, Tuttlingen, Germany) with the same suture pad was prepared (Fig. 2a).

Pediatric chest model setup.

Box trainer setup.
A skills assessment experiment was conducted at a domestic pediatric conference by recruiting 28 pediatric surgeons whose characteristics are shown in Table 1. Examinees were stratified into three groups according to their laparoscopic fundoplication experience: an expert group of 9 surgeons with a caseload of 20 or more; an intermediate group, also of 9 surgeons, with 1–19 cases; and a trainee group of 10 surgeons with no laparoscopic fundoplication experience.
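The grouping rule described above can be written as a short function. This is only an illustrative sketch of the stated thresholds; the function name `assign_group` and the group labels are hypothetical and are not part of the study materials.

```python
def assign_group(fundoplication_cases: int) -> str:
    """Map a laparoscopic fundoplication caseload to a study group.

    Thresholds follow the text: 20 or more cases = expert,
    1-19 cases = intermediate, no cases = trainee.
    """
    if fundoplication_cases < 0:
        raise ValueError("caseload cannot be negative")
    if fundoplication_cases >= 20:
        return "expert"
    if fundoplication_cases >= 1:
        return "intermediate"
    return "trainee"
```

For example, a surgeon with 19 cases falls in the intermediate group, and a surgeon with 20 cases falls in the expert group.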
JSES, Japan Society for Endoscopic Surgery; SD, standard deviation.
Each surgeon performed an endoscopic intracorporeal suturing and three knot-tying task in a box trainer setup (Fig. 2b) and then in the pediatric chest model setup (Fig. 1b). Before each measurement, they practiced suturing for 5 minutes. The examinees were instructed to perform the task accurately, safely, and quickly and to close the open cut of the suturing model using the intracorporeal slip knot technique if possible. A 5-0 PDS II suture with a 13-mm, 3/8-inch circle needle (Z126H; Ethicon Endo-Surgery, Cincinnati, OH), whose thread was cut at 100 mm, was used. The polyurethane rubber disposable suturing skin model was replaced with a new one for each trial. In the pediatric chest model setup, the thoracoscope was manipulated by the same pediatric surgeon. When a subject failed to finish the task, he or she restarted the task from the beginning. The tasks were video-recorded.
Video-based skills assessment
The videos were rated by two blinded pediatric surgeons using two evaluation methods: the 29-point checklist method and the suturing errors score sheet method.11–14 The rubrics we used are shown in Tables 2 and 3. The observers learned each scoring method using sample videos until their scoring results agreed. The 29-point checklist method was first reported by Moorthy et al. 11 in 2004. This checklist consists of 29 items in six categories, each scored as 1 or 0. The total score and the subtotal score of each category were compared among the three test subject groups for each setup. We modified the suturing errors score sheet method originally proposed by Van Sickle et al. 14 for the assessment of suturing and knot-tying in laparoscopic Nissen fundoplication. The original definition of the errors was used, and a new item was added as shown in Table 4. Each recorded video was divided into nine steps as defined in Table 5, and the observers checked for errors at each step. Multiple errors in a single step were counted as one error. The total error count and the subtotal score of each error item were compared among the three groups for each setup. When the task was repeated, the video of the successful trial was used for analysis, but a penalty was applied by decreasing the checklist score by 20% and increasing the error score by 20%. Additionally, the task completion time and the number of needle manipulations 14 were measured from the videos by one observer and compared among the three groups for each setup.
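The error counting and retry penalty described above can be sketched as follows. Two points are assumptions rather than statements from the text: that "multiple errors in a single step" means repeated occurrences of the same error item within one step, and that the 20% penalty is applied multiplicatively. The function names and the sample error labels are hypothetical.

```python
from collections import Counter

N_STEPS = 9      # each video is divided into nine steps
PENALTY = 0.20   # 20% penalty applied when the task had to be repeated


def count_errors(errors_by_step):
    """Count errors per item over the nine steps of one video.

    errors_by_step: list of nine lists of error item names.
    Assumption: an error item seen several times in one step counts once.
    """
    if len(errors_by_step) != N_STEPS:
        raise ValueError("expected one list of errors per step")
    counts = Counter()
    for step_errors in errors_by_step:
        counts.update(set(step_errors))  # set() collapses repeats in a step
    return counts


def apply_retry_penalty(checklist_score, error_score, repeated):
    """Lower the checklist score and raise the error score by 20%
    if the task was repeated (assumed multiplicative)."""
    if not repeated:
        return checklist_score, error_score
    return checklist_score * (1 - PENALTY), error_score * (1 + PENALTY)
```

Under these assumptions, the total error score of a video is `sum(count_errors(...).values())`, and a repeated trial with a checklist score of 25 and an error score of 5 would be analyzed as 20 and 6.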
Statistical analysis
The parameters were compared across the three groups using the Kruskal–Wallis test. The Steel–Dwass test was used to analyze the differences among the groups. To determine the interrater reliability of the checklist score method and the error score method, Cronbach's alpha coefficient was used. After confirmation of the interrater reliability of the methods, the average of the two observers' scores was used for assessment. All analyses were performed using JMP statistical software (SAS Institute, Inc., Cary, NC), and a P value of <.05 was deemed statistically significant.
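For readers unfamiliar with the Kruskal–Wallis test, the following is a minimal sketch of the statistic for three groups, with the usual tie correction and the chi-square approximation for the P value. It is not the JMP implementation used in the study, and it does not cover the Steel–Dwass pairwise comparison. With three groups the test has 2 degrees of freedom, for which the chi-square upper tail equals exp(-H/2). Function names are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values 1..n, giving tied values their mean rank.

    Returns the ranks (in input order) and the size of each tie group.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    tie_sizes = []
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        tie_sizes.append(j - i + 1)
        i = j + 1
    return ranks, tie_sizes


def kruskal_wallis_three_groups(a, b, c):
    """Return (H, P) for three independent groups (chi-square approximation)."""
    groups = [a, b, c]
    if any(len(g) == 0 for g in groups):
        raise ValueError("every group needs at least one observation")
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks, tie_sizes = average_ranks(pooled)
    weighted = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        weighted += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * weighted - 3 * (n + 1)
    correction = 1 - sum(t ** 3 - t for t in tie_sizes) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical")
    h /= correction
    p = math.exp(-h / 2)  # chi-square upper tail with 2 degrees of freedom
    return h, p
```

With small groups such as those in this study, the chi-square approximation is only approximate, so values from this sketch need not match those of a statistical package exactly.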
Results
There was a significant difference across the three groups for the total score of the 29-point checklist in the box trainer (P=.024) and in the pediatric model (P=.016) (Fig. 3). The score of the expert group was significantly higher than that of the trainee group in the box trainer (P=.047) as well as in the pediatric model (P=.027). The score of the intermediate group was close to that of the expert group and higher than that of the trainee group in the box trainer (P=.101), but lower than that of the expert group in the pediatric model (P=.094); neither of these differences reached statistical significance.

Comparison of the total score of the 29-point checklist. The central black line is the median, the data within the box are the interquartile range, and the ends of the vertical line denote the whole range. *P<.05 by Steel–Dwass test. EX, expert group; IN, intermediate group; TR, trainee group.
There was a significant difference across the three groups for the total error score in the box trainer (P=.010) and in the pediatric model (P=.032) (Fig. 4). The expert group showed significantly lower scores than the trainee group in the box trainer (P=.030) and in the pediatric model (P=.024). The score of the intermediate group was close to that of the expert group and lower than that of the trainee group in the box trainer (P=.051), but close to that of the trainee group in the pediatric model.

Comparison of the total score of the suturing errors score sheet. The central black line is the median, the data within the box are the interquartile range, and the ends of the vertical line denote the whole range. *P<.05 by Steel–Dwass test. EX, expert group; IN, intermediate group; TR, trainee group.
There was also a significant difference in the completion time across the three groups in the box trainer (P=.025) and in the pediatric model (P=.004) (Fig. 5). The expert group completed the task significantly faster than the trainee group both in the box trainer (P=.034) and in the pediatric model (P=.005). The task completion time of the intermediate group was close to that of the expert group in the box trainer, but longer than that of the expert group in the pediatric model. There was a significant difference across the three groups for the number of needle manipulations in the pediatric model (P=.011) but not in the box trainer (Fig. 6). The number of needle manipulations of the expert group was significantly lower than that of the intermediate group (P=.045) and the trainee group (P=.019) in the pediatric model.

Comparison of the completion time. The central black line is the median, the data within the box are the interquartile range, and the ends of the vertical line denote the whole range. *P<.05 by Steel–Dwass test. EX, expert group; IN, intermediate group; TR, trainee group.

Comparison of the number of needle manipulations. The central black line is the median, the data within the box are the interquartile range, and the ends of the vertical line denote the whole range. The points denote outliers. *P<.05 by Steel–Dwass test. EX, expert group; IN, intermediate group; TR, trainee group.
Comparison of the subtotal score of the checklist categories and error items showed significant differences across the three groups in the pediatric chest model but not in the box trainer for the items “Pulling the suture through” (P=.008), “Technique of knotting” (P=.003) (Fig. 7), and “Needle out of view” (P=.004) (Fig. 8). The score of the expert group was significantly higher than that of the trainee group for “Pulling the suture through” (P=.011); also, the score of the expert group for “Technique of knotting” was significantly higher than that of both the intermediate and the trainee groups (P=.034 and P=.003, respectively). Finally, the error score of the expert group for “Needle out of view” was significantly lower than that of the trainee group (P=.010).

Comparison of the subtotal score of categories in the 29-point checklist. The central black line is the median, the data within the box are the interquartile range, and the ends of the vertical line denote the whole range. *P<.05 by Steel–Dwass test. EX, expert group; IN, intermediate group; TR, trainee group.

Comparison of the subtotal score of “Needle out of view” in the suturing errors score sheet. The central black line is the median, the data within the box are the interquartile range, and the ends of the vertical line denote the whole range. The points denote outliers. *P<.05 by Steel–Dwass test. EX, expert group; IN, intermediate group; TR, trainee group.
The interrater reliability (Cronbach's alpha) for the 29-point checklist score method was 0.86, and that for the suturing error score method was 0.90. Both values were considered to be sufficiently high.
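As an illustration of how such a coefficient is obtained, Cronbach's alpha for two raters can be computed from the variance of each rater's scores and the variance of the summed scores. The sketch below uses the standard formula; the function name and the example scores are hypothetical and are not the study data.

```python
from statistics import variance


def cronbach_alpha(ratings_by_rater):
    """Cronbach's alpha, treating each rater as one 'item'.

    ratings_by_rater: one list of scores per rater, all covering the
    same videos in the same order.
    """
    k = len(ratings_by_rater)
    if k < 2:
        raise ValueError("need at least two raters")
    n = len(ratings_by_rater[0])
    if n < 2 or any(len(r) != n for r in ratings_by_rater):
        raise ValueError("each rater must score the same two or more videos")
    # Sum of the raters' scores for each video
    totals = [sum(scores) for scores in zip(*ratings_by_rater)]
    total_var = variance(totals)
    if total_var == 0:
        raise ValueError("summed scores have zero variance")
    rater_var = sum(variance(r) for r in ratings_by_rater)
    return k / (k - 1) * (1 - rater_var / total_var)


# Hypothetical checklist totals from two raters for four videos
rater_a = [20, 22, 25, 27]
rater_b = [21, 22, 24, 28]
alpha = cronbach_alpha([rater_a, rater_b])
```

Two raters who give identical scores yield an alpha of 1, and the value falls as their scores diverge.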
Discussion
The results show that the expert group performed endoscopic suturing significantly better than the trainee group in both setups for almost all metrics. This suggests that these video-based skill assessment methods are able to distinguish differences in relation to the clinical experience among the examinees. More important is that assessment in the pediatric chest model was able to detect relevant differences between the expert group and the intermediate group. The overall performance of the intermediate group was equivalent to or slightly worse than that of the expert group in the box trainer, whereas it was clearly worse in the pediatric chest model. This suggests that surgeons in the intermediate group lacked the surgical skills required for optimal achievement of the task in a small workspace, whereas more experienced surgeons had acquired these skills. Thus, the items that distinguish the experts from the intermediates must be the pediatric-specific skills that the expert surgeons acquired by performing many clinical procedures. Hence, video-based skills assessment using the pediatric chest model setup is better for the assessment of pediatric-specific expert skills than using the box trainer setup.
In this study, the video-based skill assessment items were more useful than completion time for detecting differences among the examinees. Faster surgery is always preferable, and thus the task completion time is often used as a valid metric for surgical skill assessment. 15 However, other metrics such as quality and safety are more important for clinical outcomes than speed. Because expert pediatric surgeons perform surgery precisely and gently, the 29-point checklist method designed for quality assessment and the suturing errors score sheet method designed for safety assessment are better for pediatric skill assessment. Another advantage of these methods is their high interrater reliability, probably in part because every checklist item was anchored by explicit criteria for that particular item to be scored as a 1 or a 0. 11
The analysis of the subtotal scores of all checklist categories and individual errors revealed significant differences between the expert and trainee groups in the pediatric chest model setup regarding items related to the ability to keep the needle in view at all times, techniques for avoiding possible tissue damage, and the knot-tying technique. These skills are related to safety and can be acquired in an initial phase of pediatric endoscopic surgical training. In the pediatric chest model setup, significant differences were observed between the expert group and the intermediate group in those items related to the number of needle manipulations and to the knotting technique. These skills are related to task quality and efficiency 14 and are required to become an expert pediatric endoscopic surgeon. In other words, the pediatric-specific expert skills can be related to task quality, efficiency, and safety.
Although intracorporeal suturing, especially with the slip knot technique, is rather unfamiliar to pediatric surgeons, it is nonetheless preferable for pediatric skill assessment compared with other tasks such as peg transfer, pattern cutting, ligating loop placement, or extracorporeal suturing. 4 Additionally, the PDS suture used in the experiment was not sufficiently flexible, and its manipulation took some time for inexperienced surgeons. As fumbles in the knot-tying task negatively influence both the 29-point checklist score and the suturing errors score, the difficulty of suture manipulation resulted in large differences in the scores between the expert group and the trainee group.
Limitations of the present study include the difficulty of grouping the examinees. They were divided into three groups on the basis of their laparoscopic fundoplication experience because this operation is the most common among all advanced endoscopic applications in the pediatric surgical field. We considered surgeons who had performed 20 or more fundoplication operations as experts equivalent to certified surgeons, although fundoplication is rarely performed in neonates and infants. The Japanese qualification system for pediatric endoscopic specialists requires applicants to have experience of at least 20 advanced endoscopic operations such as fundoplication. 2 Thus, this definition of expert was applied in this study, although it does not refer specifically to pediatrics. Another limitation relates to the fidelity of the pediatric chest model, which reproduced only some pediatric-specific features, including a small body cavity, interference by instruments and arms, a narrow endoscopic view, and the port placement. Simulation of hemorrhage, tissue fragility, and the tense atmosphere would improve the realism of the model in the future.
Future development of techniques for training for the identified pediatric-specific skills would make the pediatric chest model a good endoscopic surgical training and assessment platform for pediatric surgeons. The experimental results of the present study suggest that advanced learners such as the intermediate surgeons need training in a realistic setup, for which the pediatric chest model can be a good option. In the future, task performance improvement by training using the pediatric chest model will be investigated in both experimental and clinical settings.
In conclusion, video-based skill assessment of endoscopic suturing in a pediatric chest model and a box trainer was applied to evaluate pediatric endoscopic surgeons with different degrees of clinical experience. The expert group demonstrated a better performance than the intermediate and trainee groups in the pediatric chest model; these differences were larger than seen for the box trainer. The results suggest that this method can better assess pediatric-specific expert skills acquired by performing many clinical procedures. Safety was assessed by the ability of the operator to keep the needle in view at all times and by the techniques employed for avoiding possible tissue damage. Quality and efficiency, as a measure of advanced pediatric-specific skills, were assessed by smooth knot tying and efficient needle manipulation. The pediatric chest model with a training program for the identified pediatric-specific skills for safety, quality, and efficiency is a superior endoscopic surgical training and assessment platform for pediatric surgeons.
Footnotes
Acknowledgments
The authors thank Prof. Yuji Nirasawa of Kyorin University for kindly providing the opportunity for conducting the experiments. This study was partially supported by a Grant-in-Aid for Scientific Research (B) (number 26293378), Grant-in-Aid for Scientific Research (S) (number 23226006) from the Ministry of Education, Culture, Sports, Science and Technology, and the project “Assessment methodology for innovative minimally invasive therapeutic devices, materials, and nano-bio diagnostic devices” from the Accelerating Regulatory Science Initiative, Ministry of Health, Labour and Welfare, Japan.
Disclosure Statement
No competing financial interests exist.
