Abstract
Purpose:
Thoracoscopic lobectomy in infants requires advanced minimally invasive skills. Simulation-based education has the potential to improve complex procedural skills without exposing the patient to undue risks. The study purposes were (1) to create a size-appropriate infant lobectomy simulator and (2) to evaluate validity evidence to support or refute its use in surgical education.
Materials and Methods:
In this Institutional Review Board-exempt study, a size-appropriate rib cage for a 3-month-old infant was created. Fetal bovine tissue completed the simulator. Thirty-three participants performed the simulated thoracoscopic lobectomy. Participants completed a self-report, 26-item instrument consisting of 25 4-point rating scales (from 1=not realistic to 4=highly realistic) and one 4-point Global Rating Scale. Validity evidence relevant to test content and response processes was evaluated using the many-facet Rasch model, and evidence of internal structure (inter-item consistency) was estimated using Cronbach's alpha.
Results:
Experienced surgeons (observed average=3.6) had slightly higher overall ratings than novice surgeons (observed average=3.4, P=.001). The highest combined observed averages were for the domain Physical Attributes (3.7), whereas the lowest ratings were for the domains Realism of Experience and Ability to Perform Tasks (3.4). The global rating was 2.9, consistent with “this simulator can be considered for use in infant lobectomy training, but could be improved slightly.” Inter-item consistency for items used to evaluate the simulator's quality was high (α=0.90).
Conclusions:
With ratings consistent with high physical attributes and realism, we successfully created an infant lobectomy simulator, and preliminary evidence relevant to test content, response processes, and internal structure was supported. Participants rated the model as realistic, relevant to clinical practice, and valuable as a learning tool. Minor improvements were suggested prior to its full implementation as an educational and testing tool.
Introduction
Ericsson 12 identified deliberate practice in domain-specific activities and challenging the status quo as the two most consistent behaviors of expert performers. In the development of expert surgical performance, domain-specific activities that have similar cognitive, technical, and nontechnical challenges are exceedingly difficult to find outside of the index procedure. Fortunately, innovative strategies for simulation-based education have emerged to meet a surgeon's need for domain-specific activities and a challenging environment.
Meyerson et al. 13 created a novel adult video-assisted thoracoscopic surgery (VATS) lobectomy simulator using a box and a “blood”-infused heart/lung block of explanted porcine tissue. Data demonstrate that the VATS lobectomy simulator reliably discriminated between different learner groups and was a useful educational tool for teaching VATS lobectomy to general surgery residents.14,15 In our laboratory, we used similar principles to create several hybrid neonatal simulators for rare congenital anomalies, including esophageal atresia with tracheoesophageal fistula, duodenal atresia, and diaphragmatic hernia.16–19 In this work, we sought to develop and evaluate a novel hybrid simulator that replicates the cognitive, technical, and nontechnical skill challenges of performing a thoracoscopic lobectomy in an infant. The purposes of this study were (1) to create a realistic, size-appropriate infant lobectomy simulator and (2) to evaluate three levels of validity evidence (test content, response processes, and internal structure) to support or refute its use in pediatric surgical education.
Materials and Methods
Study setting
After review and exempt determination by Ann and Robert H. Lurie Children's Hospital of Chicago Institutional Review Board, data were collected during an advanced minimally invasive skills course offered in Chicago, IL. In total, 33 pediatric surgeons contributed to this study. Participants were categorized as “experienced” or “novice,” based on self-reported experience with thoracoscopic lobectomy: 11 surgeons were identified as “experienced,” having a mean of 13 (range, 4–50) self-reported thoracoscopic lobectomies; 20 surgeons were identified as “novice,” having a mean of 2 (range, 0–3) self-reported thoracoscopic lobectomies; and 2 participants did not report prior experience with thoracoscopic lobectomy.
Simulator
Literature review and computed tomography images were used to create computer-aided design drawings of an infant chest. The left side of the chest was then selectively converted into a three-dimensional printing file appropriate for rapid prototyping machinery. The left side of the ribcage was printed in acrylonitrile-butadiene-styrene plastic on a fused-deposition printer and then embedded in a base of platinum-cured silicone (Fig. 1). A mediastinal block of second-trimester fetal bovine tissue (Animal Technologies, Irvine, TX) was then surgically modified for use in the simulator. The left pulmonary artery and vein were selectively injected with “blood” (diluted ketchup solution), as previously described by Meyerson et al., 13 and the tissue was positioned in the simulator (Fig. 2). Participants were provided with 3-mm instruments and a 4-mm telescope (Karl Storz Endoscopy-America, El Segundo, CA) and a 3-mm vascular sealing device (Just Right Surgical, Boulder, CO) for completion of a left lower lobe lobectomy. The anatomic relationships among the bovine pulmonary artery, left lower lobe bronchus, and pulmonary vein are the same as those of the left lower lobe in infants. The only relevant anatomic difference with the bovine lung tissue is the lack of a complete major fissure, necessitating completion of the fissure prior to dissection of the vessels and bronchus.

Fig. 1. Three-dimensional printed, composite, left-sided infant ribcage for use in a simulated thoracoscopic lobectomy procedure.

Fig. 2. Internal view of a simulated infant thoracoscopic lobectomy.
Measures and rating procedures
All participants completed a self-report survey following their experience with the simulator. The 26-item survey consisted of 25 4-point rating scales measuring six domains (Physical Attributes, Realism of Materials, Realism of Experience, Ability to Perform Task, Value, and Relevance) and one 4-point Global Rating Scale to measure participants' overall impression of the simulator.
Analyses
To evaluate validity evidence, we used the Standards for Educational and Psychological Testing (Standards), the guide developed jointly by the American Educational Research Association, the American Psychological Association, and the National Council on Measurement in Education. 20 The current Standards framework identifies five sources of validity evidence: (a) test content, (b) internal structure, (c) response processes, (d) relationships to other variables, and (e) consequences of testing. In this work, we evaluated three of these sources: test content, response processes, and internal structure.
To analyze validity evidence from the three sources, we used methods from both modern measurement and classical test theories. To analyze validity evidence relevant to test content, we used a Rasch model to analyze the subjective measures from the self-report survey, focusing on two Rasch indices—observed averages and point-measure correlation statistics. 21 For the purpose of this study, high observed averages from the survey suggest high perceived value for the simulator, whereas positive point-measure correlations for the survey attest to the “psychometric soundness” of the items in each instrument. Once established, these conditions support the assumption that participants' ratings reflect the intended concepts—perceived value of the simulator and quality of performance during a lobectomy. To evaluate validity evidence relevant to response processes, we examined rating differences across participants' experience levels using a many-facet Rasch model. Analyses were performed using the Facets software version 3.68 (Linacre 2011). To evaluate validity evidence relevant to internal structure, we estimated inter-item consistency using Cronbach's alpha. Statistical analysis was performed using IBM SPSS statistical software (version 22.0; IBM Corp., Armonk, NY).
Results
Evidence relevant to test content
Observed averages
For the survey items, the combined observed averages of the six domains were 3.7 (Relevance to Practice), 3.7 (Physical Attributes), 3.6 (Value), 3.5 (Realism of Materials), 3.4 (Realism of Experience), and 3.4 (Ability to Perform Task). Closer examination indicated the highest-rated items from the survey were “Physical attributes—chest circumference and chest depth” (3.8), “Physical attributes—intercostal space” (3.7), “Value of the simulator as a training tool” (3.8), and “Relevance to practice” (3.8), whereas the lowest ratings were associated with “Realism of materials—mediastinum” (3.2) and “Ability to perform tasks—dissection/ligation of pulmonary artery and vein” (3.3). The observed average of the global opinion ratings was 2.9 (out of 4.0), indicating that, on average, participants believed the thoracoscopic lobectomy simulator “could be considered for training, but could be improved slightly.”
Point-measure correlations
For the survey, all of the 26 items had positive point-measure correlations (range, 0.36–0.76). This indicates that each item of the survey contributed useful information to the construct as a whole.
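To illustrate the statistic reported above, a point-measure correlation can be sketched as the Pearson correlation between the ratings given to one item and the Rasch measures (in logits) of the participants who gave them. The following is a minimal sketch under that assumption and is not the Facets implementation; the function name and the sample values are hypothetical and are not study data.

```python
import math


def point_measure_correlation(ratings, measures):
    """Pearson correlation between one item's ratings and person measures.

    ratings  -- observed ratings on a single item, one per participant
    measures -- the corresponding person measures (logits), same order
    """
    if len(ratings) != len(measures) or len(ratings) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    n = len(ratings)
    mean_r = sum(ratings) / n
    mean_m = sum(measures) / n
    # Sums of cross-products and squared deviations about the means.
    cov = sum((r - mean_r) * (m - mean_m) for r, m in zip(ratings, measures))
    ss_r = sum((r - mean_r) ** 2 for r in ratings)
    ss_m = sum((m - mean_m) ** 2 for m in measures)
    if ss_r == 0 or ss_m == 0:
        raise ValueError("correlation is undefined when either variable is constant")
    return cov / math.sqrt(ss_r * ss_m)


if __name__ == "__main__":
    # Hypothetical ratings (1-4) and person measures, for illustration only.
    item_ratings = [3, 4, 2, 4, 3, 3]
    person_measures = [0.4, 1.9, -0.8, 1.2, 0.1, 0.9]
    print(round(point_measure_correlation(item_ratings, person_measures), 2))
```

A positive value means that participants with higher overall measures tended to give the item higher ratings, which is the sense in which an item is said to contribute in the same direction as the rest of the instrument.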
Evidence relevant to response processes
Experienced surgeons (observed average=3.6) had slightly higher overall ratings than novice surgeons (observed average=3.4) (P=.001). Although practically minor, rating differences across experienced and novice surgeons did require deeper examination (Table 1). When participant groups (experienced versus novice) were individually examined, four items were ranked higher by the experienced surgeons: “Physical attributes—scale of tissue within the chest” (3.8 versus 3.6, P=.008), “Overall realism of materials” (3.8 versus 3.5, P=.008), “Value as a training tool” (3.9 versus 3.7, P=.003), and “Relevance to practice” (3.8 versus 3.7, P=.02). No detectable differences existed between participant group responses for the other 21 items. Although experienced surgeons' global rating (2.9) was slightly lower than novices' rating (3.0) (P=.02), the difference was practically insignificant.
Table 1 footnotes: 3=adequate realism as is, but could be improved; 3=difficult to perform; 3=some value.
Examination of item outfit statistics indicated acceptable variability for 22 out of the 24 items (Outfit mean square, <1.5). The item outfit statistic for “Realism of experience—chest wall resistance” was 1.65, indicating higher variability in responses than was expected. The item outfit statistic for “Relevance to practice” was very high (Outfit mean square, 4.13), indicating extreme variability in this specific item. Inconsistencies may highlight problematic response patterns such as carelessness, or item bias that can interfere with the measurement of the construct that is intended.
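For readers unfamiliar with the outfit statistic, it is the unweighted mean of squared standardized residuals between observed and model-expected ratings, with an expected value near 1.0. The sketch below assumes a simple Andrich rating scale model with one person measure per participant, one item difficulty, and shared category thresholds. It is a simplification of the many-facet model used in this study, and every name and number in it is hypothetical.

```python
import math


def category_probs(theta, delta, thresholds):
    """Category probabilities under an Andrich rating scale model.

    theta      -- person measure (logits)
    delta      -- item difficulty (logits)
    thresholds -- step thresholds between adjacent categories
    """
    # Cumulative log-numerators; the lowest category is fixed at 0.
    logits = [0.0]
    for tau in thresholds:
        logits.append(logits[-1] + (theta - delta - tau))
    peak = max(logits)  # subtract the maximum for numerical stability
    weights = [math.exp(v - peak) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]


def outfit_mean_square(observed, thetas, delta, thresholds, lowest=1):
    """Mean of squared standardized residuals for a single item."""
    squared_z = []
    for x, theta in zip(observed, thetas):
        probs = category_probs(theta, delta, thresholds)
        cats = range(lowest, lowest + len(probs))
        expected = sum(c * p for c, p in zip(cats, probs))
        variance = sum((c - expected) ** 2 * p for c, p in zip(cats, probs))
        squared_z.append((x - expected) ** 2 / variance)
    return sum(squared_z) / len(squared_z)


if __name__ == "__main__":
    # Hypothetical 4-category item (ratings 1-4) and three participants.
    thresholds = [-1.0, 0.0, 1.0]
    print(outfit_mean_square([3, 4, 2], [0.5, 1.5, -0.5], 0.0, thresholds))
```

Values well above 1.0, such as those exceeding the 1.5 criterion used here, arise when observed ratings sit far from what the model expects for participants at that measure.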
Evidence relevant to internal structure
Inter-item consistency was high both for the 21 items relevant to simulator quality (items 1–17, 23–26; α=0.92) and for the five items relevant to participants' ability to perform the critical tasks using the simulator (items 18–22; α=0.90). When adequately high, this index indicates that the assessment items are grouped appropriately and measure the same general construct. This offers evidence of internal structure and allowed us to make inferences from our findings with a high degree of confidence.
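As a reminder of how this index is obtained, Cronbach's alpha for k items is k/(k−1) multiplied by one minus the ratio of the summed item variances to the variance of the total score. The following is a minimal sketch of that formula using only the Python standard library. It is not the SPSS procedure used in the study, and the ratings shown are hypothetical.

```python
from statistics import variance


def cronbach_alpha(ratings):
    """Cronbach's alpha for a respondents-by-items table of ratings.

    ratings -- list of rows, one per respondent, each a list of item scores
    """
    n_items = len(ratings[0])
    if n_items < 2 or len(ratings) < 2:
        raise ValueError("need at least 2 items and 2 respondents")
    if any(len(row) != n_items for row in ratings):
        raise ValueError("every respondent must rate every item")
    # Sample variance of each item across respondents.
    item_vars = [variance([row[j] for row in ratings]) for j in range(n_items)]
    # Sample variance of each respondent's total score.
    total_var = variance([sum(row) for row in ratings])
    if total_var == 0:
        raise ValueError("alpha is undefined when total scores do not vary")
    return n_items / (n_items - 1) * (1 - sum(item_vars) / total_var)


if __name__ == "__main__":
    # Hypothetical 4-point ratings: five respondents, three items.
    demo = [
        [4, 4, 3],
        [3, 3, 3],
        [4, 3, 4],
        [2, 2, 3],
        [3, 4, 4],
    ]
    print(round(cronbach_alpha(demo), 2))
```

Alpha approaches 1 when respondents who rate one item highly also rate the other items highly, which is why a high value is read as the items measuring the same general construct.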
Discussion
In the adult surgical literature, several studies have demonstrated that simulation-based surgical education leads to improved operative performance, decreased operating times, fewer errors, and improved operative decision-making.22–25 Unfortunately, similar data do not exist for general pediatric surgery. A significant barrier to the acquisition of these data is the lack of relevant and realistic simulators that are able to address the unique needs of a pediatric surgical trainee. Additionally, the advanced technical proficiency of novice pediatric surgeons necessitates a much more comprehensive simulation experience to provide a challenging learning environment. To that end, we sought to create a size-appropriate infant lobectomy simulator and evaluate three levels of validity evidence—test content, response processes, and internal structure—to support or refute its use in pediatric surgical education.
Initial validity evidence suggests that our hybrid thoracoscopic lobectomy simulation model is relevant to clinical practice, is valuable as an educational tool, and realistically recreates the relevant size and anatomic features of a thoracoscopic lobectomy in an infant. These findings are supported by high observed averages across all domains, positive point-measure correlations for all items, and high estimates of internal consistency across all quality measures of the simulator. These results support validity evidence relevant to test content, response processes, and internal structure, as defined by the Standards for Educational and Psychological Testing. 20
Value was among the strongest domains, with “Value as a training tool” among the highest-scoring items on the survey. This finding strongly supports our ultimate development goal, to create a simulator that is useful for pediatric surgical education. It is also consistent with some of our previous work evaluating a thoracoscopic esophageal atresia/tracheoesophageal fistula repair simulator, for which “Value as a training tool” was similarly rated highly by participants. 18 Participant selection bias may be present in the rating for this particular domain, given that these data were collected during a course for advanced minimally invasive surgery for pediatric surgical trainees. However, previous esophageal atresia/tracheoesophageal fistula data collection at a national pediatric surgery meeting had similarly high Value ratings, without participants being enrolled in an educational course. 18
Ratings in the Physical Attributes domain were also high across all items measured, especially the attributes of the synthetic ribcage and resultant space within the infant chest. These findings were consistent with our expectations, given that exact measurements were taken of infant ribcages to arrive at the final dimensions of the chest. Perhaps the more valuable finding is that “Scale of the tissue inside the chest” was also highly rated. Although the tissue comes from a second-trimester fetal calf, there is a range of calf sizes during the second trimester of a bovine pregnancy. Yet, despite the variability, the tissue appears to be appropriate to the infant thorax dimensions. The scale of the tissue was also one of four items that experienced surgeons rated higher than novice surgeons. These findings further support the high realism of the simulator's physical attributes, as experienced surgeons would be expected to be able to make more accurate comparisons with infants undergoing a thoracoscopic lobectomy.
The Realism of Experience and Ability to Perform Tasks were the lowest rated of all of the domains. There are several factors that may account for these findings. First, bovine lung anatomy is notable in that it lacks a complete fissure separating the lower lobe from the upper lobe and lingular segment. Second, the vascular supply and bronchus to the lingular segment branch off within 1–2 cm of the first branches of the lower lobe. Although these two findings are not outside the realm of human anatomic variation, they are not commonly encountered in infants. Third, the pulmonary vein of the bovine tissue appears to be more fragile than the pulmonary vein in infants. Many novice participants struggled with the vein dissection, encountering “bleeding” (extravasation of ketchup from the lumen of the vein) during attempts to dissect out individual branches. Although these anatomic variations and vessel fragility are not entirely consistent with the majority of lobectomies in infants, they do create a more challenging operation for the learners. Taken in the context of the observations of Ericsson 12 on expert performance, “challenging the status quo” is one of the behaviors most commonly identified among experts in any given domain. Although serendipitous, the anatomic variations are ideal for an educational tool to be used by even the most experienced of learners. Additionally, most would argue that taking extreme care during dissection of the pulmonary vein is a valuable skill to learn early in the learning curve for thoracoscopic lobectomy.
The final factor affecting Realism of Experience and Ability to Perform Tasks was not specifically queried on the survey. However, participant comments noted that the position of the tissue within the space was too inferior and that the tissue was not well stabilized within the model. These are important design flaws as they could lead to learned behaviors that are not advantageous, or even safe, in the operating room. Specifically, trainees may learn to place ports into lower intercostal spaces, which then might lead to safety and efficacy concerns. Additionally, strategies for tissue retraction learned in simulation may be ineffective or result in inadvertent organ and tissue damage in the operating room. In these situations, deliberate practice of incorrect or flawed techniques could lead to worse patient outcomes, rather than improved quality and safety.
Finally, the global rating was 2.9, consistent with “this simulator can be considered for use in infant lobectomy training, but could be improved slightly.”
We have addressed the above-mentioned design flaws with significant structural modifications for future use, and the survey has been modified to allow continued evaluation of these quality measures of the simulator. Therefore, ongoing evaluation of validity evidence will include these key structural refinements.
The final domain evaluated was Value. Experienced surgeons rated both “Relevance to practice” and “Value as a training tool” higher than novice surgeons. Although statistically significant, the difference between the two groups was negligible. Yet, the overall perception of all participants is that the model is not only valuable for training purposes, but that it is relevant to their current or anticipated practice as a pediatric surgeon. It should be noted that the item outfit statistics for “Relevance to practice” did indicate significant variability in responses. This is not unexpected, given that the majority of participants were novice pediatric surgeons and did not have a specific practice against which to accurately compare the experience.
There are several limitations related to the interpretation and applications of the findings presented in this study. First, the data were collected during the course of the annual advanced minimally invasive surgery course offered to all pediatric surgery trainees in the United States and Canada. The “experienced” group was composed of advanced, technically skilled minimally invasive surgeons who volunteer their time every year to teach at the fellows' course. Although specific techniques and approaches to thoracoscopic lobectomy may vary among surgeons, the mindset of all faculty members is strongly focused on education. The relatively homogeneous nature of the experienced group may have decreased the variability of the ratings. Additionally, the “novice” group had a relatively narrow experience with infant lobectomy, which may have limited their ability to accurately evaluate the simulator. Expansion of the participant groups outside of fellow educational courses will likely increase the variability of ratings, thereby strengthening future modifications to ensure broad applicability to all learner groups. Finally, we have only begun to examine evidence relevant to test content, internal structure, and response processes. Validity evidence relevant to relationships to other variables and consequences of testing has not yet been evaluated.
Equally as important as the validity evidence are the barriers to full implementation of the model as an educational tool. We have not yet developed a comprehensive simulation-based curriculum that addresses all issues of surgical competency for thoracoscopic lobectomy. Key components of such a curriculum would include (1) mandatory participation, (2) proficiency-based milestones, (3) a distributed training schedule, and (4) some degree of overtraining. 26 Additionally, the cost of simulation-based training can be high, including the costs of the models, instruments, and/or equipment and personnel costs for comprehensive team simulation. Also, dedicated faculty surgeons need to be identified and trained in simulation-based educational practices, including (1) the provision of a safe and supportive learning environment, (2) training in debriefing strategies that facilitate performance improvement, and (3) education on the principles of deliberate practice, proficiency-based milestones, and objective assessments of performance.
Finally, patient-specific outcomes need to be measured in the settings of conventional training and simulation-enhanced training. It is only through improved patient outcomes that the value of simulation-based education can be fully realized. Content validity, as evaluated in this study, is only the first of many steps toward improving patient safety through simulation. Subsequent work on patient-specific outcome measures will be difficult, given the rarity of so many of the complex procedures performed by pediatric surgeons. Yet, operations that are both rare and at high risk for perioperative adverse events are perhaps best suited for simulation. With multi-institutional commitment, rigorous study design, and broad-based participation, these outcomes data are attainable once sufficient validity evidence exists to support the use of specific pediatric surgical simulation models.
In conclusion, we have successfully created a realistic and relevant thoracoscopic infant lobectomy simulation model. Initial validity evidence relevant to test content, response processes, and internal structure supports further structural refinement and the collection of additional validity evidence from a refined model.
Acknowledgments
The authors would like to thank Northwestern Simulation at Northwestern University Feinberg School of Medicine for the continued support of our research. We would also like to thank David Irvin, Manager of Simulation Operations, Northwestern Simulation, for his never-ending enthusiasm and commitment to the success of our educational research. Finally, we would like to add a special thank you to Shari Meyerson, MD, Division of Thoracic Surgery, Northwestern University Feinberg School of Medicine, for the inspiration and the assistance with development of a pediatric lobectomy model, using the techniques she had already perfected for adult VATS-assisted lobectomy. Without Dr. Meyerson's seminal work in thoracic education, none of this work would have been possible.
Disclosure Statement
No competing financial interests exist.
