Abstract
Background and Aims:
Studies comparing two-dimensional (2D) and three-dimensional (3D) laparoscopy have reported variable results. We aimed to review the literature and develop an appropriate instrument to compare 2D and 3D laparoscopy. We further aimed to use the extracted data to perform a pilot study.
Methods:
Sixty-seven recent articles on 3D laparoscopy were reviewed, and data on factors influencing outcome variables were extracted. These factors were used to design a pilot study of 28 novices with a randomized crossover design. The results were analyzed using descriptive statistics and Wilcoxon signed-rank tests.
Results:
Seven themes were identified as influencing the outcome of 3D studies: applied technique (1), experience of subjects (2), study design (3), learning curve (4), subjective qualitative reports (5), laparoscopic tasks (6), and chosen outcome variables (7). The five laparoscopic simulation tasks subsequently developed comprised placing a rubber band over hooks, ring and pearl transfer, threading a pipe cleaner through loops, and placing a suture. The pilot study showed a primary benefit of 3D laparoscopy that was unrelated to repetition. Two tasks served well to assess first-time performance, and two tasks promise to serve well to assess a learning curve if performed repeatedly.
Conclusion:
We were able to identify important issues influencing the outcome of studies analyzing 3D laparoscopy. These may help evaluate future studies. The developed tasks resulted in meaningful data in favor of 3D visualization, but further studies are necessary to confirm the pilot test results.
Introduction
Nevertheless, 3D systems have not been universally accepted and are not standard equipment at most hospitals. This is likely related to the high cost of these systems and the absence of clear data supporting a benefit. The studies published to date have had variable study designs, small numbers, or subjective data. We aimed to perform a qualitative analysis of the available literature to create a tool to evaluate 3D laparoscopy studies. Such a tool should allow an objective evaluation of existing studies and support high-quality future studies. We further aimed to perform a pilot study to compare two-dimensional (2D) and 3D laparoscopy.
Materials and Methods
The first part of our study was a qualitative analysis of the literature to find factors influencing the outcome of 3D laparoscopy studies. We focused on recent reviews and the references contained within those reviews. A literature search (PubMed/MEDLINE and Google Scholar) with the search terms “Three-dimensional,” “3D,” “laparoscopy,” and “review” was used to identify relevant articles. The search was limited to articles published since 2014 and written in English. The “Preferred Reporting Items for Systematic Reviews and Meta-Analyses” (PRISMA) guidelines were applied.8
Eligible reviews were identified and their references extracted. Duplicates were removed. The abstracts of all references were scanned and checked for relevance. Exclusion criteria were congress presentations, anecdotal reports about early experience, studies with a focus other than 3D laparoscopy, and studies analyzing robotic systems.
A qualitative study design was applied: full-text copies of all included articles were obtained. All articles were analyzed with a focus on study design and on the influencing factors and confounders that were considered or discussed. This was done as an iterative process by two of the authors separately. Relevant issues were identified, then discussed, adjusted if necessary, cross-checked with the articles, and refined in the process. Disagreements about the themes were resolved by consensus. The process continued until all authors were satisfied that the themes summarized the published influences.
In the second part of our study, we developed a study design for a simulation setup for novices, taking the summarized influencing factors into account. In addition, Gray's considerations on simulated task environments were applied as a conceptual framework.9 Simulation tasks used in prior studies were reviewed according to Gray's dimensions (complexity, tractability, correspondence, and engagement), and the tasks were adjusted accordingly to suit our research question and to focus on depth perception and varying levels of cognitive load. We further concentrated on keeping the simulation material low fidelity and cost-conscious.
We then tested the five resulting tasks on 28 novices using a randomized crossover design. Participants were laparoscopically naïve medical students doing an elective in either pediatric surgery or pediatrics at the children's hospital in Lucerne, Switzerland. All participants gave signed informed consent before starting the study. The tasks were done in pairs, with one student working on the simulation and the other driving the camera. The participants were randomized by drawing sealed envelopes to start each task in either 2D or 3D; the first attempt is referred to as period 1 and the second attempt as period 2. Either S.Z. or M.H.P. supervised the experiments at all times. Timekeeping and scoring were recorded on a grading form by S.Z., M.H.P., or a supervised student. Each session took between 2 and 3 hours. The experiment was carried out between March 2017 and December 2017. All tasks were performed in a standard laparoscopic simulator. Imaging systems from Karl Storz, Tuttlingen, Germany were used. The 2D imaging system consisted of a 2D/HD Hopkins® 30° laparoscope of 10 mm diameter, an Image1S™ camera head, and a 32″ full HD monitor. The 3D system consisted of the TIPCAM®1 S 3D laparoscope and camera (10 mm diameter, 30°) connected to a 32″ 3D monitor. Participants wore passive 3D glasses. Both monitors were mounted on adjustable racks to allow adjustment to participants' height.
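The allocation step of such a crossover design can be sketched in a few lines of Python. This is only an illustration and not the procedure used in the study, which relied on sealed envelopes. The function name `allocate_sequences`, the participant labels, the seed, and the assumption of a balanced 14/14 split are all hypothetical.

```python
import random

def allocate_sequences(participant_ids, seed=None):
    """Assign each participant to a crossover sequence: '2D->3D' or '3D->2D'.

    Assumption (not stated in the paper): the allocation is balanced, i.e.
    half of the slips say '2D->3D' and half '3D->2D' (one extra '2D->3D'
    slip if the number of participants is odd). The slips are then shuffled.
    """
    rng = random.Random(seed)
    n = len(participant_ids)
    slips = ["2D->3D"] * ((n + 1) // 2) + ["3D->2D"] * (n // 2)
    rng.shuffle(slips)
    return dict(zip(participant_ids, slips))

# Hypothetical participant labels P01..P28; the seed only makes the run repeatable.
allocation = allocate_sequences([f"P{i:02d}" for i in range(1, 29)], seed=2017)
```

Each participant then performs period 1 in the first modality of the assigned sequence and period 2 in the other.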
The results were analyzed using descriptive statistics and graphical displays such as boxplots of time versus method and period. Wilcoxon signed-rank tests were performed to assess the effect of the method of laparoscopy or of the period on each of the exercises, making use of the intraindividual comparisons enabled by the crossover design. Due to the exploratory nature of the study, no formal adjustment for multiplicity was applied, and a P value <.05 may be considered significant. From a statistical point of view, findings generated from this study should be supported by both a relevant difference in the descriptive statistics and a P value that is preferably substantially <.05 (indicating a potential effect even when the conduct of multiple comparisons on a limited sample size is taken into account).
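For readers unfamiliar with the test, the paired comparison described above can be sketched as follows. This is a minimal pure-Python version that uses the normal approximation with tie correction. The software and the exact variant (exact or approximate P value, continuity correction) used in the study are not reported, so this is an assumption. The example times are invented for illustration only.

```python
import math

def wilcoxon_signed_rank(x, y):
    """Two-sided Wilcoxon signed-rank test for paired samples x and y.

    Returns (w_plus, w_minus, p_value). Zero differences are dropped,
    tied absolute differences receive average ranks, and the P value uses
    the normal approximation with tie correction (no continuity correction).
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    # Indices of the differences, ordered by absolute size.
    order = sorted(range(n), key=lambda idx: abs(diffs[idx]))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        t = j - i + 1
        tie_term += t**3 - t
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (min(w_plus, w_minus) - mean) / math.sqrt(var)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return w_plus, w_minus, p_value

# Invented completion times in seconds for the same participants in 2D and 3D.
times_2d = [182, 240, 155, 300, 210, 198, 267, 175]
times_3d = [140, 201, 150, 262, 190, 205, 220, 160]
w_plus, w_minus, p = wilcoxon_signed_rank(times_2d, times_3d)
```

Pairing each participant's 2D time with the same participant's 3D time is what makes the comparison intraindividual. The same function can be applied to first versus second attempts to examine the period.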
Results
Factors influencing evaluation of 3D laparoscopy
Six relevant review articles were found during the initial literature search.7,10–14 These articles had a total of 254 references. After removing duplicates and articles not meeting the inclusion criteria, 67 articles were analyzed. Seven influencing themes were identified by the iterative analysis:
Applied technique
The currently used 3D technique, which applies binocular endoscopes and passive polarizing techniques, provides a better image than its predecessors and is considerably more comfortable to use. Studies using the other techniques may reach different results, and their data should therefore not be used to predict the value of today's technique.
Experience of subjects
The experience of laparoscopic surgeons seems to have an impact on study results. There is some evidence from simulation and clinical settings that 3D visualization brings more benefit for less experienced surgeons.6,15 Therefore, participants' prior experience in laparoscopic surgery should be evaluated, and authors should define whose performance they aim to assess. Generalizing results generated by novices to all laparoscopic surgeons is not viable.
Appropriate study design
A study design that eliminates bias related to repetition of tasks is essential. This is true for simulation settings and clinical studies. Comparisons of current 3D data with historic controls, and studies based on small numbers, nonstandardized procedures, or procedures with a large variation in performance time, are not clinically meaningful.16–19 For clinical cases, the design needs to be prospective, randomized, and case controlled. For simulation studies, there needs to be a crossover design, and the change in performance in the repeated task needs to be analyzed. If two cohorts are created and each cohort is only tested on one modality, very large numbers are needed to cancel out differences in individual aptitude and talent.
Taking single-repetition designs for an assessment of learning curve
Care should be taken that results from single-repetition studies are not used to draw conclusions on learning curves.1,19–23 Some studies found the difference between 2D and 3D performance indistinguishable after three24 and four25–27 repetitions. In contrast, inexperienced laparoscopic surgeons who started a task in 3D vision were better at performing the skill in 2D than participants who started with 2D vision.6 To answer the question of an accelerated learning curve with 3D, multiple-repetition studies need to be carried out.
Insufficient qualitative study designs
A number of studies report on participants' subjective preference for 3D laparoscopy.15,18,27,28 Reliable data should be generated using validated qualitative methods such as the National Aeronautics and Space Administration Task Load Index (NASA-TLX) questionnaire,20,22 which assesses workload dimension ratings.30 In addition, fatigue can be measured by two different methods: the simulation sickness questionnaire (SSQ) and the critical flicker fusion (CFF) test.29
Selection of the right laparoscopic tasks
Using the right simulation tasks to assess 3D simulation is of central importance. A number of studies have used nonvalidated homemade tasks. Others have used tasks that have been validated to assess laparoscopic skills but not to assess 3D performance. Furthermore, we believe that using tasks adapted from the “Fundamentals of Laparoscopic Surgery”31 to assess the performance of novices is an often-repeated mistake. From a medical education perspective, the “Fundamentals of Laparoscopic Surgery” combine tasks with a small cognitive load (transfer of elements, threading) with tasks that are much more difficult to accomplish and are not self-explanatory. Tying a knot intracorporeally without being taught to do so is a matter of trial and error. Mistry et al. came to a similar view when concluding that their applied MISTELS tasks may exceed a novice user's skill.32 We believe the choice of unsuitable tasks to be the main reason for the contradictory results of published data.5,19,32–37 For example, Tanagho et al. found significantly better results with 3D in low cognitive load tasks but did not see any difference in knot tying.35 Smith et al. provided instruction before assessment, used only one complex task, and required ten repetitions,38 and Mashiach et al. used low cognitive load tasks.39 Both studies showed faster performance and fewer errors with 3D. Their data seem to be scientifically meaningful, and their study designs should be repeated with larger cohorts.
Questionable outcome variables in clinical trials
For clinical trials, meaningful outcome criteria need to be defined. Many studies used clinical parameters to compare 2D and 3D laparoscopy. We believe these (e.g., early continence, a 3 mL difference in blood loss, length of hospital stay, or mortality) to be neither dependent nor meaningful outcome parameters.28,40–45 There is no evidence that the 3D technique affects these parameters. Instead, variables such as movement economy might be much better outcome parameters.46
One of the themes found in our qualitative analysis of the literature was not judged relevant for 3D studies: a number of studies emphasized that testing participants' stereoacuity is essential to interpret the data. Even though it is accepted that visual perception and stereoacuity vary significantly among individuals,47 their impact on task performance is unclear. A number of practicing surgeons seem to have little stereoacuity without it affecting their career,48 and Dion et al. found that individuals who took longer to discriminate how objects are positioned in relation to each other did not take longer to fulfill laparoscopic tasks. Even though their data rely on small numbers, they conclude that motor skills might be of more importance.49 Most importantly, only ∼2.7% of the population are believed to possess no wide-field stereopsis in one hemisphere.50 Since the absence of stereopsis occurs in only a small percentage of individuals, we believe this issue can be neglected.
Simulation tasks for novices analyzing 3D performance
Table 1 displays the developed tasks, the focus of the exercise, the grading of performance, and photos of the simulator. Cutoff times for the rubber band exercise were 120 seconds, and for all the other tasks 300 seconds. Points from full score were taken off for nonachievement, lost objects, and lack of accuracy.
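How the cutoff times enter the recorded data can be sketched as below. Only the two cutoff values (120 and 300 seconds) come from the text. The task labels, the `Attempt` record, and the convention of recording the cutoff time for an unfinished attempt are assumptions made for illustration. The point deductions are defined in Table 1 and are not reproduced here.

```python
from dataclasses import dataclass

RUBBER_BAND_CUTOFF_S = 120  # cutoff stated for the rubber band exercise
DEFAULT_CUTOFF_S = 300      # cutoff stated for all other tasks

@dataclass
class Attempt:
    task: str
    recorded_time_s: float
    completed: bool

def record_attempt(task, elapsed_s):
    """Cap the elapsed time at the task's cutoff and flag nonachievement.

    Assumption: an attempt that exceeds the cutoff is stopped, recorded with
    the cutoff time, and marked as not completed.
    """
    cutoff = RUBBER_BAND_CUTOFF_S if task == "rubber_band" else DEFAULT_CUTOFF_S
    completed = elapsed_s <= cutoff
    return Attempt(task, min(elapsed_s, cutoff), completed)

# Hypothetical task labels and times.
finished = record_attempt("ring_transfer", 150)
stopped = record_attempt("rubber_band", 150)
```

Times capped in this way are censored observations, which should be kept in mind when medians and quartiles are read.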
Pilot test results
The performance of 28 laparoscopically naïve medical students was analyzed as a pilot test. Figures 1 and 2 display boxplots of the times needed for each task, differentiated by method and by period, respectively. Medians, quartiles, and P values are shown in Tables 2 and 3. As a rule, all five tasks were done faster and with fewer mistakes in 3D than in 2D. Considering both the graphical and the P value data, this is least distinct in Task 1 and most distinct and significant in Tasks 2 and 3, where P values were <.001 and .002, respectively.

Figure 1. Median time differentiated by method (2D/3D). The boxplots show the times needed to complete each exercise in either 2D (upper boxplot) or 3D (lower boxplot). 2D, two dimensional; 3D, three dimensional.

Figure 2. Median time differentiated by period (first attempt/second attempt). The boxplots show the times needed to complete each exercise for either the first time (upper boxplot) or the second time (lower boxplot). 2D, two dimensional; 3D, three dimensional.
In addition, the tasks were predominantly done faster and better when performed for the second time, but only Task 1 reached significance (P = .006); all other tasks had P values >.05. An interesting adverse effect was noted in Task 5: a number of participants took longer when performing the task for the second time.
The quality of performance was distinctly better with 3D (full score n = 24 versus 17; 1 mistake n = 3 versus 4; 2 mistakes n = 1 versus 5; 3 and 4 mistakes only occurred in the 2D group). The number of mistakes made in the first and second attempts was similar.
The robustness of the one-dimensional comparisons (3D versus 2D and period 2 versus period 1) was assessed by supportive, more advanced evaluations such as stratified descriptive statistics and graphs, time-to-event approaches, and an ANOVA for a crossover design (with effects for method, period, sequence, and subject within sequence). Even though the available data may not always fully comply with some of the underlying model assumptions, these additional evaluations confirmed that the estimates and the main effects seen in the one-factor analyses are quite robust, and that other statistical approaches considering interactions, imbalances, or censoring would still lead to similar conclusions.
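The idea of separating a method effect from a period effect in a two-period crossover can be sketched with simple within-subject differences. This is not the ANOVA reported above but a simpler textbook estimate for a 2D-first/3D-first design. It assumes no carryover between periods. The function name and the example times are hypothetical.

```python
from statistics import mean

def crossover_effects(seq_2d_first, seq_3d_first):
    """Estimate method and period effects in a two-period crossover.

    Each argument is a list of (period1_time, period2_time) tuples, one per
    participant of that sequence. With d = period 1 minus period 2:
      2D-first sequence: d = (2D - 3D) + (period 1 - period 2)
      3D-first sequence: d = (3D - 2D) + (period 1 - period 2)
    so half the difference of the mean d isolates the method effect and
    half their sum isolates the period effect.
    """
    d_a = [p1 - p2 for p1, p2 in seq_2d_first]
    d_b = [p1 - p2 for p1, p2 in seq_3d_first]
    method_effect = (mean(d_a) - mean(d_b)) / 2  # mean 2D time minus 3D time
    period_effect = (mean(d_a) + mean(d_b)) / 2  # mean first minus second attempt
    return method_effect, period_effect

# Invented times in seconds, built so that 2D is 20 s slower than 3D
# and the first attempt is 5 s slower than the second.
seq_a = [(125, 100), (145, 120)]  # 2D in period 1, 3D in period 2
seq_b = [(105, 120), (95, 110)]   # 3D in period 1, 2D in period 2
method_effect, period_effect = crossover_effects(seq_a, seq_b)
```

A positive method effect means that 2D took longer than 3D, and a positive period effect means that the first attempt took longer than the second.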
Besides allowing a comparison of 3D visualization and repetition, this analysis allows an assessment of the quality of the developed tasks. Task 1 shows an advantage for both 3D and repetition. The times for Tasks 2 and 3 (graphic and P value) show a distinct difference between 3D and 2D, which exceeds the advantage of experience. The interquartile range, that is, the difference between the upper and lower quartiles, of the times is relatively small. In contrast, this range is stretched out in Task 4 for both 2D and the first attempt. The data from Task 5 are diverse, and statistical significance was not reached for method or period.
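The descriptive measures used here (median, quartiles, and interquartile range) can be computed as in the following sketch. The quartile convention used for Tables 2 and 3 is not stated, so the "inclusive" method of Python's `statistics.quantiles` is an assumption, and other conventions give slightly different quartiles. The function name and the times are hypothetical.

```python
from statistics import quantiles

def summarize(times):
    """Return lower quartile, median, upper quartile, and interquartile range."""
    q1, med, q3 = quantiles(times, n=4, method="inclusive")
    return {"q1": q1, "median": med, "q3": q3, "iqr": q3 - q1}

# Invented completion times in seconds for one task in one modality.
summary = summarize([95, 110, 120, 130, 150, 170, 240])
```

A small interquartile range indicates that most participants needed similar times, whereas a stretched range points to a larger influence of individual aptitude.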
Discussion
The qualitative analysis of the literature resulted in relevant themes influencing the outcome of 3D laparoscopy studies, which then enabled the development of focused simulation tasks for our pilot study. Even though the analysis was done as an iterative process by two raters individually, some issues might have been missed, which might weaken the evidence from this analysis.
Carrying out the simulation with medical students performing the developed tasks proved feasible. Time proved to be a reliable and easily comparable parameter, which additionally allowed a comprehensible way to show the results.
The data and our observations during the simulation show that Task 1 is comparatively easy to perform. Discrimination between repetitions, method, and aptitude is not possible since differences are too small to be meaningful. Therefore, we believe Task 1 to be useful to introduce novices to the simulation and the use of Maryland dissectors. We believe that it is not suitable to generate further evidence.
Tasks 2 and 3 are performed quicker and better when performed with 3D, and quicker and better when performed for the second time. The fact that the upper and lower quartiles are not spread out very far suggests that both tasks are easily comprehensible and feasible for novices. Aptitude does not affect results strongly, and the cognitive load is not too high. We therefore believe these two tasks to be well suited to assess first-time performance of novices.
Task 4, on the other hand, even though it shows a benefit for both 3D and experience, showed a wide spread between the upper and lower quartiles. The task's cognitive load and required practical skill are high. This task is affected by talent much more than the previous tasks. Since it is more complex, it promises to allow an assessment of change due to repetition and therefore an assessment of the learning curve.
The data from Task 5 are more difficult to interpret because of their diversity. The median time participants needed to complete the task would suggest an easy task with a low cognitive load. The relatively high number of nonachievements and the fact that some participants took considerably longer the second time contradict this. We believe that the task should be altered: more precise suturing should be required, and more stitches should be performed in one task. With these alterations, Task 5 should involve a high cognitive load and skill level, and therefore be as good a parameter as Task 4 to assess the learning curve. After these adjustments, the tasks need to be further validated.
Conclusion
We were able to identify important issues influencing the outcome of studies analyzing 3D laparoscopy, and thus develop simulation tasks that resulted in meaningful data.
The trend toward a benefit of 3D laparoscopy for novices seen in our pilot study needs to be verified in a study with a larger number of participants. Since the time required to finish our developed tasks proved to be a reliable outcome variable, we believe limiting the assessment to this variable might be acceptable. Beyond our study, other more complex tasks need to be developed and tested to correctly assess the benefit of 3D laparoscopy for experienced surgeons.
Ethical Approval
According to the ethical committee of the Department of Medicine at the University of Tuebingen, ethical approval is granted for all studies with an exclusive educational focus.
Footnotes
Disclosure Statement
No competing financial interests exist.
