An Objective Scoring System for Laparoscopic Nephrectomy

Abstract

Background and Purpose:

The laparoscopic approach is currently the recommended first-line modality for nephrectomy, which is one of the most frequently performed laparoscopic interventions in urology. From a skills acquisition and delivery perspective in minimally invasive urologic surgery, there is a paucity of objective scoring systems for advanced laparoscopic urologic procedures. We developed a system of direct observation with structured criteria to evaluate the surgical conduct of laparoscopic nephrectomy (LN), and we tested the application and preliminary validity of the scoring system.

Methods:

Sixty prerecorded cases of LN performed in four teaching hospitals were each analyzed by four mentors. Each mentor scored each case using a 100-point scoring system comprising 20 key steps for LN, each scored from 0 to 5. Steps included port placement and safety checks in addition to the operation itself. In addition, a negative marking system based on a 50-point index was deployed to penalize technically unsound techniques. The negative score was subtracted from the positive score to give the final score.

The final scores independently submitted for each recorded case were analyzed and compared. The system was then used to predict the experience of the surgeon in 10 pilot cases performed by five fellows and five experienced laparoscopic urologic surgeons. The independent assessors were blinded to the identity of the operating surgeon. A further 20 cases were reviewed: 10 performed by a trainee who had sufficiently completed training (as indicated by the recent award of a certificate of specialist training in urology) and 10 performed by a trainee who was not ready.

Results:

There was no significant difference between the scores submitted by the four mentors for each of the cases observed. There was a strong correlation between overall score and the seniority/experience of the performing surgeon; ie, the score predicted whether an experienced surgeon or a laparoscopic fellow performed the case. It also accurately distinguished a trainee who had sufficiently completed training from one who was “not ready.”

Conclusion:

The scoring system was a reliable tool for assessing the performance of LN and accurately predicts the level of experience of the surgeon. This system could be a useful supplementary tool for assessing the baseline skill and progress of trainees.

Introduction

Surgical competence combines technical skill and manual dexterity with knowledge and decision making. In the United Kingdom, assessment of a surgical trainee is the responsibility of the individual's trainer. Yet despite the introduction of Direct Observation of Procedural Skills and Procedure Based Assessment, it has been argued that this assessment remains largely subjective.¹ Current protocols also fail to include a targeted preoperative assessment or a discussion of the patient's radiologic investigations before surgery. With the widespread uptake of complex and technically challenging approaches, such as laparoscopic renal extirpative surgery, there is a need to develop not only robust training platforms but also optimal assessment tools.

In our experience, current assessment tools in laparoscopic surgery miss a vital part of surgical development.²⁻⁴ We developed a system of direct observation with structured criteria to evaluate the surgical conduct of laparoscopic nephrectomy (LN) and tested the application and preliminary validity of the new scoring system. Its reliability and validity were tested by analyzing the correlation among four different observers. Although this study focuses on radical nephrectomy, the authors believe that similar systems can be used to assess most advanced laparoscopic/minimally invasive urologic procedures.

Current limitations in the assessment of surgery

Any evaluation of surgical skill should be feasible (ie, the test must be practical and straightforward to administer), valid (ie, the test should be able to measure given outcomes), reliable (if the test were administered to the same person on separate days, the results should be similar), objective, and the results should be reproducible.¹ Direct observation of trainees without explicit criteria means that any judgment made on skill is subject to personal bias and opinion.⁵ The use of operation-specific checklists, however, has been shown to be valid,⁶ with high inter-rater reliability.⁷

From a skills acquisition and delivery perspective in minimally invasive urologic surgery, there is a paucity of objective scoring systems for advanced laparoscopic urologic procedures. Laparoscopic procedures are uniquely suited to the development of scoring systems, because the operative field of view can easily be recorded and reviewed later. If appropriate standardized criteria could be developed, assessment could therefore be carried out at a time convenient for the trainer, and the same criteria could serve as a self-assessment tool for the trainee. Any score generated could also be used by the trainee to track his/her progress with that procedure.

The instrument developed highlights each important step of LN and assigns it a numeric score (0–5) that depends on how successfully the step is completed. In addition, a negative marking system penalizes technically unsound techniques or errors. The final score is the positive score minus the negative score. This work was undertaken to improve the reliability and validity of urologic trainee assessment.

Methods

The LN scoring system was developed with the input of four separate consultant mentors and after analysis of previously recorded intraoperative videotapes. Twenty key steps were identified, each scored from 0 to 5. A score of 0 meant that the observed task was unsafe and/or performed in a manner that warranted major concern. A score of 1 meant that the step was performed safely but needed major improvement. Scores of 2, 3, and 4 meant that the step was performed safely but needed progressively less improvement (ie, further practice). A score of 5 meant that no further development was deemed necessary. The maximum possible mark for performing LN was therefore 100 (20 steps × 5 points) (Appendix 1, Score Sheet).
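The 0–5 anchors and the 100-point ceiling can be captured in a small lookup table. The sketch below is a hypothetical encoding, not part of the published score sheet: the names `STEP_SCORE_DESCRIPTORS` and `positive_score` are ours, and the wording for scores 2–4 paraphrases "varying degrees of improvement" as an assumption.

```python
from typing import Sequence

# Hypothetical encoding of the 0-5 step anchors described in the Methods.
# Wording for scores 2-4 is an assumed paraphrase of "varying degrees of improvement".
STEP_SCORE_DESCRIPTORS = {
    0: "Unsafe and/or performed in a manner that warrants major concern",
    1: "Performed safely but needs major improvement",
    2: "Performed safely; needs further practice",
    3: "Performed safely; needs less further practice than a score of 2",
    4: "Performed safely; needs less further practice than a score of 3",
    5: "No further development deemed necessary",
}
NUM_STEPS = 20
MAX_STEP_SCORE = max(STEP_SCORE_DESCRIPTORS)  # highest anchor, 5


def positive_score(step_scores: Sequence[int]) -> int:
    """Validate 20 step scores (each 0-5) and return their sum, the positive score (PS)."""
    if len(step_scores) != NUM_STEPS:
        raise ValueError(f"expected {NUM_STEPS} step scores, got {len(step_scores)}")
    for score in step_scores:
        if score not in STEP_SCORE_DESCRIPTORS:
            raise ValueError(f"step score {score!r} is outside 0-{MAX_STEP_SCORE}")
    return sum(step_scores)


# The ceiling follows directly from 20 steps x 5 points.
MAX_POSITIVE_SCORE = NUM_STEPS * MAX_STEP_SCORE
print(MAX_POSITIVE_SCORE)
print(STEP_SCORE_DESCRIPTORS[3])
```

Validating each entry against the anchor table, rather than only against a numeric range, keeps the descriptors and the arithmetic in one place if the anchors are ever reworded.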

The important steps identified are shown (Fig. 1) and include port placement and safety checks in addition to the actual case. The main subsections for scoring included (a) preoperative preparation, (b) intraprocedural score, and (c) postprocedural score. The preoperative preparation section included four key steps with a maximum of 20 points subdivided into (1) image discussion and interpretation including the World Health Organization checklist, (2) patient positioning and draping, (3) port positioning (site), and (4) port placement (physical insertion/safety).

FIG. 1.

Scoring system used in this platform.

The intraprocedural score included 12 key steps with a maximum of 60 points subdivided into (1) colonic mobilization, (2) identification of the gonadal vein, (3) identification of the ureter, (4) dissection of the ureter from psoas and retraction of lower pole, (5) dissection toward the pedicle along the right border of the inferior vena cava or left border of the aorta, (6) dissection of the renal pedicle, (7) clipping and/or stapling of the renal artery, (8) clipping of the renal vein, (9) respect for adrenals/spleen or liver, (10) safe dissection of the kidney from the renal bed, (11) safe insertion of an Endobag™, and (12) safe retrieval of the specimen.

The postprocedural score included four key steps with a maximum of 20 points subdivided into (1) hemostatic check (renal bed/intracavitary port site), (2) drain insertion, (3) closure including fascia, and (4) postoperative check.
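The three subsections and their step counts can be checked against the stated maxima of 20, 60, and 20 points. A minimal sketch, assuming step names transcribed from the Methods and Appendix 1 into a hypothetical `SECTIONS` mapping:

```python
MAX_STEP_SCORE = 5

# Step lists transcribed from the Methods text and the Appendix 1 score sheet.
SECTIONS = {
    "preoperative preparation": [
        "image discussion and interpretation including WHO checklist",
        "patient positioning and draping",
        "port positioning (site)",
        "port placement (physical insertion/safety)",
    ],
    "intraprocedural": [
        "colonic mobilization",
        "identification of gonadal vein",
        "identification of ureter",
        "dissection of ureter from psoas and retraction of lower pole",
        "dissection toward pedicle along right border of IVC or left border of aorta",
        "dissection of renal pedicle",
        "clipping and/or stapling of renal artery",
        "clipping of renal vein",
        "respect for adrenals/spleen or liver",
        "safe dissection of kidney from renal bed",
        "safe insertion of Endobag",
        "safe retrieval of specimen",
    ],
    "postprocedural": [
        "hemostatic check (renal bed/intracavitary port site)",
        "drain insertion",
        "closure including fascia",
        "postoperative check",
    ],
}


def section_maxima(sections):
    """Maximum attainable points per section: number of steps x 5."""
    return {name: len(steps) * MAX_STEP_SCORE for name, steps in sections.items()}


maxima = section_maxima(SECTIONS)
for name, points in maxima.items():
    print(f"{name}: {len(SECTIONS[name])} steps, max {points} points")
print("total:", sum(maxima.values()))
```

Keeping the steps as data means the sheet could be adapted to another procedure, as the Conclusion suggests, without changing the scoring arithmetic.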

The sum of all 20 step scores gave a preliminary score, termed the positive score (PS), ranging from 0 to 100. In addition, a negative marking system based on a 50-point index was incorporated to penalize technically unsound techniques. The resulting negative score (NS) ranged from 0 to 50 and comprised (1) activation of instruments, including diathermy, outside the target site (maximum penalty −10), (2) unsafe handling of tissue (maximum −10), (3) unsafe handling of the pedicle (maximum −20), and (4) hemostasis and organ retrieval at the end of the procedure (maximum −10) (Table 1). The final performance score (FPS) was calculated by subtracting NS from PS.
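The negative-score and FPS arithmetic is simple enough to sketch. This is a minimal illustration, not the authors' software, and the function names are hypothetical. It assumes each negative item is recorded as a total number of penalty points that is capped at its Table 1 maximum (the text does not say how individual errors map to points). It also does not floor FPS at zero, because the text does not say whether it is floored; by the arithmetic alone, FPS could range from -50 to 100.

```python
from typing import Mapping

# Maximum penalty per negative-marking item, as listed in Table 1.
NEGATIVE_ITEM_CAPS = {
    "instruments out of target site when activated": 10,
    "unsafe handling of tissue": 10,
    "unsafe handling of pedicle": 20,
    "haemostasis and organ retrieval at end of procedure": 10,
}
MAX_NEGATIVE_SCORE = sum(NEGATIVE_ITEM_CAPS.values())  # 50


def negative_score(penalties: Mapping[str, int]) -> int:
    """Sum penalty points (recorded as positive numbers), capping each item at its maximum."""
    unknown = set(penalties) - set(NEGATIVE_ITEM_CAPS)
    if unknown:
        raise KeyError(f"unknown negative items: {sorted(unknown)}")
    total = 0
    for item, points in penalties.items():
        if points < 0:
            raise ValueError("record penalties as positive points")
        # Assumed rule: points beyond the item's cap are ignored.
        total += min(points, NEGATIVE_ITEM_CAPS[item])
    return total


def final_performance_score(ps: int, penalties: Mapping[str, int]) -> int:
    """FPS = PS - NS; deliberately not floored at zero."""
    if not 0 <= ps <= 100:
        raise ValueError("PS must lie between 0 and 100")
    return ps - negative_score(penalties)


# Example: PS of 80, minor tissue-handling errors, pedicle penalties above the cap.
example = {"unsafe handling of tissue": 4, "unsafe handling of pedicle": 25}
print(final_performance_score(80, example))  # 80 - (4 + 20) = 56
```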

Table 1.

Negative Scoring

1. INSTRUMENTS OUT OF TARGET SITE WHEN ACTIVATED INCLUDING DIATHERMY	(−10 maximum)
2. UNSAFE HANDLING OF TISSUE	(−10 maximum)
3. UNSAFE HANDLING OF PEDICLE	(−20 maximum)
4. HAEMOSTASIS AND ORGAN RETRIEVAL AT END OF PROCEDURE	(−10 maximum)

After the scoring system was created, 60 prerecorded cases of LN performed in four teaching hospitals were each analyzed by four mentors using the system above, giving a total of 240 scores for analysis. Anonymity of the surgeon and the patient was preserved. The cases were performed by five trainees/fellows, each with experience of fewer than 15 LNs, and five experienced laparoscopic surgeons, each with more than 100 cases at the time of analysis. Thirty cases were analyzed in each group.

The system was then used to predict the experience of the surgeon in 10 pilot cases performed by five fellows and five experienced laparoscopic urologic surgeons. A harder task is distinguishing between a fellow who has sufficiently completed training and one who is not ready, or identifying a surgeon with unsafe practices. Furthermore, after the preliminary study, we studied a further 20 cases: 10 performed by a trainee who had sufficiently completed training (as indicated by the recent award of a certificate of specialist training in urology) and 10 by a trainee deemed “not ready.” The scoring system accurately predicted the category into which each trainee fell.

Results

Individual FPS given by each mentor to the trainee group, together with the mean FPS for each candidate, are shown in Table 2; the corresponding data for the expert group are shown in Table 3. The mean FPS of the trainee and expert groups were compared using a two-tailed unpaired t test: mean ± standard error of the mean (SEM) was 34.83 ± 1.154 for trainees and 79.96 ± 1.741 for experts, a statistically significant difference (P<0.0001) (Fig. 2). Cases performed by an expert thus scored more than double those performed by a trainee (79.96/34.83 ≈ 2.3). By contrast, the mean FPS given by each of the four mentors for the expert and trainee groups differed only slightly, and not significantly, between individuals (Table 4). There was, therefore, no observer-related skew in scoring; ie, none of the mentors tended to over- or underscore in this study (Fig. 3).
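The group summary and t test can be recomputed from the per-candidate averages in the last columns of Tables 2 and 3. The sketch below uses only Python's standard library. It computes Student's pooled-variance t statistic, which assumes equal variances because the text names only a "two-tailed unpaired t test." It reports both the sample SD and the SEM so the two dispersion measures can be compared with the ± values above. It stops short of a P value, which would need a t-distribution CDF (for example, `scipy.stats.ttest_ind`).

```python
import math
import statistics

# Per-candidate mean FPS, last column of Table 2 (trainees/fellows, <15 LNs).
TRAINEE_FPS = [
    41.5, 41.0, 37.5, 29.5, 26.0, 29.5, 38.8, 31.5, 39.3, 26.5,
    25.0, 24.8, 36.5, 36.8, 42.5, 39.0, 39.3, 37.5, 38.8, 32.8,
    32.0, 41.3, 23.5, 26.5, 36.5, 46.5, 36.5, 32.8, 44.0, 31.3,
]
# Per-candidate mean FPS, last column of Table 3 (experienced surgeons, >100 LNs).
EXPERT_FPS = [
    86.5, 91.0, 85.3, 80.5, 85.3, 82.0, 69.8, 71.5, 90.5, 90.3,
    69.3, 69.8, 78.5, 90.0, 78.3, 81.5, 85.8, 93.0, 69.8, 81.5,
    74.3, 89.5, 88.5, 75.5, 65.8, 61.0, 58.5, 88.5, 88.5, 78.5,
]


def summarize(values):
    """Return mean, sample SD (n - 1 denominator) and standard error of the mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return mean, sd, sd / math.sqrt(len(values))


def pooled_t(a, b):
    """Student's two-sample t statistic (pooled variance) and its degrees of freedom."""
    na, nb = len(a), len(b)
    pooled_var = (
        (na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)
    ) / (na + nb - 2)
    se_diff = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (statistics.mean(b) - statistics.mean(a)) / se_diff, na + nb - 2


trainee = summarize(TRAINEE_FPS)
expert = summarize(EXPERT_FPS)
t_stat, df = pooled_t(TRAINEE_FPS, EXPERT_FPS)
ratio = expert[0] / trainee[0]

for label, (mean, sd, sem) in (("trainee", trainee), ("expert", expert)):
    print(f"{label}: mean={mean:.2f}  SD={sd:.2f}  SEM={sem:.3f}")
print(f"t={t_stat:.1f} on {df} df; expert/trainee ratio={ratio:.2f}")
```

With these data, the SEMs come out at about 1.154 (trainees) and 1.741 (experts), matching the ± values reported above, while the sample SDs are about 6.3 and 9.5. The pooled t statistic is roughly 21.6 on 58 degrees of freedom, and the ratio of group means is about 2.3.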

FIG. 2.

Average final performance score (FPS).

FIG. 3.

Final performance score (FPS) average given by each mentor.

Table 2.

Scores for Trainees/Fellows (<15 Cases)

Candidate	Mentor 1 (FPS=PS-NS. Score range 0–100)	Mentor 2	Mentor 3	Mentor 4	Average score per individual candidate
1.	42	44	38	42	41.5
2.	40	44	42	38	41.0
3.	38	40	36	36	37.5
4.	30	32	28	28	29.5
5.	28	26	24	26	26.0
6.	28	30	32	28	29.5
7.	38	39	40	38	38.8
8.	33	32	30	31	31.5
9.	38	40	41	38	39.3
10.	28	26	25	27	26.5
11.	26	24	26	24	25.0
12.	26	25	23	25	24.8
13.	36	37	37	36	36.5
14.	37	36	38	36	36.8
15.	40	44	42	44	42.5
16.	37	40	41	38	39.0
17.	39	40	42	36	39.3
18.	38	36	37	39	37.5
19.	38	39	40	38	38.8
20.	34	33	32	32	32.8
21.	32	32	32	32	32.0
22.	44	40	41	40	41.3
23.	22	23	25	24	23.5
24.	28	26	25	27	26.5
25.	38	35	37	36	36.5
26.	46	48	47	45	46.5
27.	35	36	38	37	36.5
28.	33	32	34	32	32.8
29.	44	43	45	44	44.0
30.	32	32	30	31	31.3
Mean scores by individual mentors	33.6	35.1	33.8	34.3

FPS=final performance score; PS=positive score; NS=negative score.

Table 3.

Scores for Experienced Laparoscopic Surgeons (>100 Cases)

Candidate	Mentor 1 (Score 0–100)	Mentor 2	Mentor 3	Mentor 4	Average score per individual candidate
1.	85	88	87	86	86.5
2.	90	92	90	92	91.0
3.	84	85	87	85	85.3
4.	78	80	84	80	80.5
5.	85	83	85	88	85.3
6.	81	82	83	82	82.0
7.	69	72	68	70	69.8
8.	69	74	70	73	71.5
9.	90	92	90	91	90.5
10.	92	88	90	91	90.3
11.	68	72	69	68	69.3
12.	69	70	72	68	69.8
13.	77	79	80	78	78.5
14.	89	90	92	89	90.0
15.	78	76	79	80	78.3
16.	78	82	82	84	81.5
17.	85	88	84	86	85.8
18.	92	94	92	94	93.0
19.	68	70	72	69	69.8
20.	78	82	82	84	81.5
21.	76	74	72	75	74.3
22.	88	90	90	90	89.5
23.	91	87	86	90	88.5
24.	72	74	76	80	75.5
25.	65	68	62	68	65.8
26.	62	60	60	62	61.0
27.	56	60	56	62	58.5
28.	90	88	86	90	88.5
29.	89	90	88	87	88.5
30.	77	79	80	78	78.5
Mean scores by individual mentors	79	80.3	79.8	80.7

Table 4.

Comparison of Final Performance Scores Given by Each of the Four Mentors for the Expert and Trainee Groups

	Mentor 1 (Score 0–100)	Mentor 2	Mentor 3	Mentor 4
Mean scores by individual mentors for experienced laparoscopic surgeons	79.0	80.3	79.8	80.7
Mean scores by individual mentors for trainees	33.6	35.1	33.8	34.3
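Table 4 compares mentor means but does not report an agreement coefficient. One standard way to quantify inter-rater reliability in this design is the two-way random-effects, absolute-agreement, single-rater intraclass correlation, ICC(2,1) of Shrout and Fleiss. It is swapped in here as an illustration and is not the authors' analysis. The sketch applies it to the individual mentor scores in Table 2. Treating the four mentors as a random sample of raters is an assumption, and `icc_2_1` is a hypothetical helper name.

```python
# Individual mentor scores from Table 2 (rows: 30 trainee cases; columns: mentors 1-4).
TABLE2_SCORES = [
    [42, 44, 38, 42], [40, 44, 42, 38], [38, 40, 36, 36], [30, 32, 28, 28],
    [28, 26, 24, 26], [28, 30, 32, 28], [38, 39, 40, 38], [33, 32, 30, 31],
    [38, 40, 41, 38], [28, 26, 25, 27], [26, 24, 26, 24], [26, 25, 23, 25],
    [36, 37, 37, 36], [37, 36, 38, 36], [40, 44, 42, 44], [37, 40, 41, 38],
    [39, 40, 42, 36], [38, 36, 37, 39], [38, 39, 40, 38], [34, 33, 32, 32],
    [32, 32, 32, 32], [44, 40, 41, 40], [22, 23, 25, 24], [28, 26, 25, 27],
    [38, 35, 37, 36], [46, 48, 47, 45], [35, 36, 38, 37], [33, 32, 34, 32],
    [44, 43, 45, 44], [32, 32, 30, 31],
]


def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater."""
    n, k = len(ratings), len(ratings[0])  # n cases, k mentors
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between cases
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between mentors
    ss_error = ss_total - ss_rows - ss_cols                 # residual
    bms = ss_rows / (n - 1)
    jms = ss_cols / (k - 1)
    ems = ss_error / ((n - 1) * (k - 1))
    return (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)


icc = icc_2_1(TABLE2_SCORES)
print(f"ICC(2,1) for the trainee ratings: {icc:.3f}")
```

On the Table 2 ratings this gives a value above 0.9. That is consistent with the Results statement that no mentor systematically over- or underscored. The same function could be run on the Table 3 ratings.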

The system was then applied to predict the experience of the surgeon in 10 pilot cases, one performed by each of five fellows and five experienced laparoscopic urologic surgeons. Four independent assessors who had not previously participated in the study scored each case, blinded to the identity of the operating surgeon. The scores were then presented to the original panel of mentors, who were asked to distinguish experts from trainees. Based solely on the presented scores, all four original mentors distinguished experts from trainees with 100% accuracy.

Pearson's correlation coefficient was used to assess the relationship between experience and score, with significance accepted at P<0.05. There was a strong correlation between overall score and the seniority/experience of the performing surgeon; ie, the score accurately predicted whether a case had been performed by an experienced surgeon or a laparoscopic fellow.
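The text does not report the correlation coefficient or state how experience was coded. This sketch assumes, for illustration only, a hypothetical binary coding (0 = fellow with fewer than 15 LNs, 1 = surgeon with more than 100 LNs). Under that coding, Pearson's r is the point-biserial correlation, and it is linked to the pooled two-sample t statistic by r = t / sqrt(t² + df). The sketch uses the per-candidate means from Tables 2 and 3.

```python
import math

# Per-candidate mean FPS: last column of Table 2 (<15 LNs) and Table 3 (>100 LNs).
TRAINEE_FPS = [
    41.5, 41.0, 37.5, 29.5, 26.0, 29.5, 38.8, 31.5, 39.3, 26.5,
    25.0, 24.8, 36.5, 36.8, 42.5, 39.0, 39.3, 37.5, 38.8, 32.8,
    32.0, 41.3, 23.5, 26.5, 36.5, 46.5, 36.5, 32.8, 44.0, 31.3,
]
EXPERT_FPS = [
    86.5, 91.0, 85.3, 80.5, 85.3, 82.0, 69.8, 71.5, 90.5, 90.3,
    69.3, 69.8, 78.5, 90.0, 78.3, 81.5, 85.8, 93.0, 69.8, 81.5,
    74.3, 89.5, 88.5, 75.5, 65.8, 61.0, 58.5, 88.5, 88.5, 78.5,
]


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical binary coding of experience: 0 = trainee/fellow, 1 = experienced surgeon.
experience = [0] * len(TRAINEE_FPS) + [1] * len(EXPERT_FPS)
scores = TRAINEE_FPS + EXPERT_FPS
r = pearson_r(experience, scores)

# t statistic for H0: r = 0; with a binary predictor it equals the pooled two-sample t.
df = len(scores) - 2
t_from_r = r * math.sqrt(df / (1 - r ** 2))
print(f"r = {r:.3f}, t = {t_from_r:.1f} on {df} df")
```

If experience were instead coded as each surgeon's actual case count, r would differ, so the coding should be stated alongside any reported coefficient.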

Discussion

The assessment of laparoscopic surgical competence is a critical facet of minimally invasive surgical skill acquisition and delivery.⁸ In the United Kingdom, initiatives such as the Calman system, the European Working Time Directive, and the Hospital at Night project, coupled with ever-increasing health economic burdens, have made surgical skill acquisition very challenging. The reduced “working time” restricts trainees' exposure to the “experience” of surgery and makes the traditional apprenticeship mode of learning difficult.

With less time available to acquire optimal surgical competence, trainees who go on to become independent specialists could be suboptimally equipped in terms of skill set.⁹ With current training platforms, assessing laparoscopic competence has become a challenge. Current methods of skills assessment include procedure lists with evaluation of surgical logbooks, direct observation of procedural skills with or without clearly defined objective criteria, assessment of skills on animal models, and evaluation of videotapes. The reliability and validity of these modes vary. Review of procedure lists and logbooks is the least reliable and has poor validity, whereas direct observation of procedural skills and videotape analysis are known to have high reliability and are purported to be close to reality.

The assessment of technical skills by direct observation in the operating room is potentially subjective. Without objective scoring against set, specific criteria, this form of assessment risks being unreliable. Subjective assessments such as this inherently have poor test-retest reliability. There is also a risk of error from poor interobserver reliability: even experienced mentors have been shown to disagree considerably when rating the skill set of a given surgical trainee.⁸

Theoretically, a more robust and hence objective assessment requires set criteria against which technical/surgical skills can be assessed. Several objective modes of assessment have been explored. The use of surgical competence checklists has been shown to shift the assessor's role from interpreter toward observer, attenuating the subjective component.¹⁰ The widespread use of the objective structured clinical examination has led to the development of a similar concept for the assessment of technical skills.¹¹

The so-called objective structured assessment of technical skills (OSATS) comprises six stations at which trainees perform tasks on live animal and/or bench models within set time frames.¹² One drawback of OSATS is the resources needed to recruit trained mentors to perform the assessment; it is also time consuming. Video analysis of performance has been explored and appears to have the highest degree of reliability and validity. It has the added advantage that the rater can be blinded to the identity of the surgeon. Datta and associates¹³ showed the construct validity of the global rating scale, with an inter-rater reliability of 0.81 for retrospective video analysis.

In our study, we applied retrospective video analysis and made the assessment tool for LN blinded and objective. Applied to video analysis, the detailed, task-specific scoring system gave experienced laparoscopic surgeons a mean FPS more than double that of trainees (79.96/34.83 ≈ 2.3). This wide separation in scores supports the construct validity of the system. We found a strong correlation between overall score and the seniority/experience of the performing surgeon; ie, the system predicted with 100% accuracy whether an experienced surgeon or a laparoscopic fellow performed the case.

We acknowledge that our system may retain an element of inherent subjectivity. A degree of subjectivity, however, exists in all assessment platforms involving direct observation of procedural skills. With further validation, we suggest that the tool could be used to evaluate improvement with experience and, potentially, for graduation or maintenance of certification. Further targeted studies could examine its impact on trainees in different training schemes on a global and more diverse platform. Similar assessment tools could be applied to other laparoscopic and robotic urologic procedures in the future.

Conclusion

The concept has the advantage that it can be tailored to specific surgical procedures. Further large-scale studies using this concept are needed. The current scoring system could have a useful application in endourologic fellowship programs.

Footnotes

Disclosure Statement

No competing financial interests exist.


Appendix

Appendix 1.

Stilus Sarg Objective Scoring System for Laparoscopic Nephrectomy: Mentor Score Sheet

	INTERFACE OBSERVED	INDIVIDUAL SCORE	TALLY SCORE	NOTES
□	IMAGE DISCUSSION AND INTERPRETATION INCLUDING WHO CHECKLIST	0 1 2 3 4 5	□
□	PATIENT POSITIONING AND DRAPING	0 1 2 3 4 5	□
□	PORT POSITIONING (SITE)	0 1 2 3 4 5	□
□	PORT PLACEMENT (PHYSICAL INSERTION/SAFETY)	0 1 2 3 4 5	□
□	COLONIC MOBILISATION	0 1 2 3 4 5	□
□	IDENTIFICATION OF GONADAL VEIN	0 1 2 3 4 5	□
□	IDENTIFICATION OF URETER	0 1 2 3 4 5	□
□	DISSECTION URETER FROM PSOAS AND RETRACTION OF LOWER POLE	0 1 2 3 4 5	□
□	DISSECTION TOWARDS PEDICLE ALONG RIGHT BORDER OF IVC OR LEFT BORDER OF AORTA	0 1 2 3 4 5	□
□	DISSECTION OF RENAL PEDICLE	0 1 2 3 4 5	□
□	CLIPPING AND/OR STAPLING OF RENAL ARTERY	0 1 2 3 4 5	□
□	CLIPPING OF RENAL VEIN	0 1 2 3 4 5	□
□	RESPECT FOR ADRENALS/SPLEEN OR LIVER	0 1 2 3 4 5	□
□	SAFE DISSECTION OF KIDNEY FROM RENAL BED	0 1 2 3 4 5	□
□	SAFE INSERTION OF ENDOBAG	0 1 2 3 4 5	□
□	SAFE RETRIEVAL OF SPECIMEN	0 1 2 3 4 5	□
□	HAEMOSTATIC CHECK (RENAL BED/INTRA CAVITARY PORT SITE)	0 1 2 3 4 5	□
□	DRAIN INSERTION	0 1 2 3 4 5	□
□	CLOSURE INCLUDING FASCIA	0 1 2 3 4 5	□
□	POST OPERATIVE CHECK	0 1 2 3 4 5	□
NEGATIVE TALLY POINTS:
INSTRUMENTS OUT OF TARGET SITE WHEN ACTIVATED INCLUDING DIATHERMY				(−10 maximum)
UNSAFE HANDLING OF TISSUE				(−10 maximum)
UNSAFE HANDLING OF PEDICLE				(−20 maximum)
HAEMOSTASIS AND ORGAN RETRIEVAL AT END OF PROCEDURE				(−10 maximum)

References

1. Reznick. Teaching and testing technical skills. Am J Surg, 1993; 165:358–361.

2. Kommu, Rane. Laparoscopic urological training programmes: The need for a consensus on minimum standards. BJU Int, 2007; 99:489–491.

3. Kommu, Dickinson, Rané. Optimizing outcomes in laparoscopic urologic training: Toward a standardized global consensus. J Endourol, 2007; 21:378–385.

4. Kommu, Rane. Working party for urological studies in improving minimally invasive skill acquisition and delivery. BJU Int, 2007; 100:218.

5. Shah, Darzi. Surgical skills assessment: An ongoing debate. BJU Int, 2001; 88:655–660.

6. Watts, Feldman. Assessment of technical skills. In: Neufeld, Norman, eds. Assessing Clinical Competence. New York: Springer, 1985; 259–274.

7. Kopta. An approach to the evaluation of operative skills. Surgery, 1971; 70:297–303.

8. Grantcharov, Bardram, Funch-Jensen, Rosenberg. Assessment of technical surgical skills. Eur J Surg, 2002; 168:139–144.

9. Moorthy, Munz, Sarker, Darzi. Objective assessment of technical skills in surgery. BMJ, 2003; 327:1032–1037.

10. Scott, Valentine, Bergen, et al. Evaluating surgical competency with the American Board of Surgery In-Training Examination, skill testing, and intraoperative assessment. Surgery, 2000; 128:613–622.

11. Regehr, MacRae, Reznick, Szalay. Comparing the psychometric properties of checklists and global rating scales for assessing performance on an OSCE-format examination. Acad Med, 1998; 73:993–997.

12. Martin, Regehr, Reznick, et al. Objective structured assessment of technical skill (OSATS) for surgical residents. Br J Surg, 1997; 84:273–278.

13. Datta, Chang, Mackay, Darzi. The relationship between motion analysis and surgical technical assessments. Am J Surg, 2002; 184:70–73.