Competency-Based Training and Simulation: Making a “Valid” Argument

Abstract

The use of simulation as an assessment tool is much more controversial than its use as an educational tool. However, without valid simulation-based assessment tools, objectively assessing technical skill competencies within a competency-based medical education framework will remain challenging. The current literature on urologic simulation-based training and assessment uses a definition and framework of validity that is now outdated, probably owing to a lack of awareness rather than a lack of comprehension. This review provides the urologic community with an updated taxonomy of validity theory as it relates to simulation-based training and assessment, and translates the urologic simulation literature to date into this framework. Whereas the old taxonomy treated validity as a set of distinct subcategories and focused on the simulator itself, the modern taxonomy, into which we translate the published evidence, treats validity as a unitary construct focused on the interpretation of simulator data and scores.

Introduction

Within the health professions' education community, competency-based medical education (CBME) has regained traction as the prevailing curricular framework, with terms like “competency,” “entrustable professional activities,” and “milestones” well ensconced in the lexicon of the education community. Representing a departure from the traditional time-based model of training, CBME is an outcome-based approach to the design, assessment, and evaluation of a training program using predefined learner competencies as a framework.¹

As in conceptual frameworks of curriculum development and skill learning, central to CBME is the need for iterative assessment of learner abilities and competencies. In surgical disciplines like urology, this includes a significant emphasis not only on cognitive and attitudinal abilities but also on skill competencies.

The time spent in the operating room by modern surgical trainees in the CBME era no longer provides sufficient exposure to train and assess them adequately. To address this need, simulation-based training and assessment methods have become important adjunctive tools utilized by surgical training programs. This is clearly illustrated by the significant increase in simulation-based training research published over the past decade.² The literature supporting the role of simulation-based training methods is now quite robust, with much of the surgical literature focusing on simulators and the validation of simulators.

Validity theory has changed significantly since the urologic community adopted the notion that simulation-based training methods are effective complements to surgical training. As a result, much of the current literature on urologic simulation-based training and assessment uses a framework and definition of validity that is now outdated, likely owing to a lack of awareness rather than a lack of comprehension. Moreover, the literature on simulation-based assessment tools has been sparse over this same period. The use of simulation as an assessment tool is much more controversial than its use as an educational tool, but without valid simulation-based assessment tools, objectively assessing technical skill competencies in a CBME framework will remain challenging.

This review aims to provide the urologic community with an updated taxonomy of validity theory as it relates to simulation-based training and assessment, and to translate our simulation literature to date into this framework. In addition, we review the literature at various stages of the competence continuum (medical student, resident, and practicing physician), as simulation-based curricula and assessment tools will have an important role in improving our ability to accurately assess technical skill within a CBME training model.

Validity—the Old and the New

Over the past 60 years, several documents from three organizations have been published and disseminated to guide the development and use of assessment tests. In 1954, the first guide, titled “Technical Recommendations for Psychological Tests and Diagnostic Techniques,” was introduced by a committee of the American Psychological Association (APA). Through collaboration with the American Educational Research Association (AERA) and the National Council on Measurement in Education (NCME), several iterations of this “consensus standards” guide have since been published under the updated title “Standards for Educational and Psychological Testing.”

In 1974, the consensus standards included the framework of validity most often used in the simulation-based training research currently published by the urologic community: “valid instruments” with distinct “types of validity.” The process of validating a given test or simulator involved subjective approaches, namely face and content validity, and objective approaches, namely criterion and construct validity.³ Face validity was defined as whether a simulator represented what it was intended to represent, as judged by learners. Content validity was defined as whether a simulator realistically taught what it was supposed to teach, as assessed by educational content experts. Criterion validity referred to the correlation of performance scores on a simulator with another gold standard measure of skill, whereas construct validity referred to the ability of the simulator to distinguish expert performances from novice performances (Table 1). While this taxonomy may still have some relevance to simulators and simulation-based training, behavioral scientists no longer consider it acceptable for testing or assessment. Accordingly, this approach was removed from subsequent revisions of the consensus standards, beginning with the 1985 edition. These newer iterations have introduced several significant changes to the concept of validity, most of which have not been reflected in the current urologic education literature. The most recent guide was published in 2014.⁴

Table 1.

Old and Modern Validity Concepts

Old validity concept

A. Subjective approaches:
Face validity: Concerns the realism of the simulator: does the simulator represent what it is supposed to represent? Judged from the nonexperts' point of view.
Content validity: A formal evaluation by experts of the appropriateness of the simulator as a teaching modality; addresses the question, “does the simulator realistically teach what it is supposed to teach?”
B. Objective approaches:
Criterion validity: Concerns the degree of correlation between scores obtained on a new simulator and scores on another gold standard model (“concurrent validity”); correlation with future scores in the operating room is called “predictive validity.” An OSATS tool is needed to assess performance.
Construct validity: Concerns the ability of a simulator to distinguish between experienced and inexperienced persons and among different levels of experience.
In this concept, validity was considered a property of the simulator itself.

Modern validity concept

Validity is “unitary” and an “ongoing” process; it is treated as a “hypothesis,” and “validity evidence” should be collected either to accept or to refute this hypothesis.
Evidence of:
A. Content: evaluates the relationship between the simulator content, including scenarios, procedures, or scoring, and the target construct that the simulator was designed to measure.
B. Response processes: theoretical and empirical analyses evaluating data integrity, that is, the degree to which performances and ratings align with the intended construct.
C. Internal structure: the degree to which the relations among individual simulator task items conform to the construct (the concept of “reliability”).
D. Relation with other variables: evaluates the associations between assessment scores and other external variables or relevant criteria with which they have a plausible theoretical relationship.
E. Consequences: the impact, favorable or unfavorable, of interpreting simulator scores and of the actions or decisions based on them, such as credentialing, promotion, or privileging.
In this concept, validity applies to the application or interpretation of simulator scores, not to the simulator itself.

N.B.

OSATS = objective structured assessment of technical skills.

The core concept of the 2014 framework of validity for testing, based on the work of Messick, no longer refers to types of validity but rather to a unitary concept of validity in which all validity is construct validity.⁵ This concept was brought to the attention of, and translated for, the surgical community in 2010 by Sweet and colleagues and by Korndorffer and colleagues.⁶,⁷ Validity is defined as “the degree to which evidence and theory support the interpretation of assessment scores for proposed uses of tests.”⁴ This definition treats validity as a “hypothesis” for which evidence should be collected to either accept or refute it, and this evidence should come from multiple sources, including the test content, response processes, internal structure, relations to other variables, and consequences of testing.

Validity evidence is also required for each use of the test or simulator. First, validity evidence is specific to the population in which it was gathered: validity evidence for a simulator used to assess medical students does not necessarily make it a valid assessment tool for maintenance of certification of practicing physicians. Second, validity evidence must be considered in the context of the planned use of the results: the strength of validity evidence required for low-stakes formative assessments differs from that required for high-stakes summative assessments.

Finally, validity evidence does not pertain to the simulator or test itself but to the interpretation of its performance scores: can the scores obtained using a specific simulator be validly interpreted to make a judgment of competence or skill?

The validation process is the responsibility of both test/simulator developers and users. Developers are responsible for providing the rationale and evidence that support any test/simulator score interpretations for the particular uses they intend. Users, in turn, are responsible for evaluating that evidence in the particular setting in which the test is to be used.

Current validity taxonomy—the five sources of validity evidence

First introduced in the 1999 iteration of the consensus standards,⁸ the modern validity taxonomy does not delineate types or categories of validity. Instead, it describes a framework with five distinct sources of validity evidence that are used to support or refute a proposed interpretation of assessment scores (Table 1). Not all five sources of evidence are required for every assessment, and depending on the type of assessment, more emphasis on one or more sources may be necessary.⁹

Simulation-based assessment tools in a CBME continuum

The CBME model involves a continuum of learning not limited to residency training alone. It is a paradigm that focuses on the acquisition, maintenance, and enhancement of skills at various stages of competence, including undergraduate medical education, residency training, and continuing professional development. As part of a CBME framework, simulation-based training curricula and assessment tools can play a large role in developing and confirming competence within each stage of learning. Simulation-based training methods can reduce the educational footprint of training on patient outcomes while simultaneously allowing trainees and practicing physicians to acquire and maintain various skills.¹⁰ With the demonstration of robust validity evidence, simulation-based methods can also provide educators with the ability to accurately assess competence.

One of the most well-known and widely adopted simulation-based training curricula in surgery is the Fundamentals of Laparoscopic Surgery (FLS) curriculum.¹¹,¹² It not only serves as a comprehensive basic laparoscopic skills training module but also has validity evidence supporting its use as an assessment tool for trainees and practicing surgeons alike. In fact, the FLS curriculum is now a mandatory requirement for certification by the American Board of Surgery for all general surgery trainees in the United States.¹³ In the urologic community, there is growing evidence for the use of comparable simulation-based basic laparoscopic skills curricula: both the American Urological Association Basic Laparoscopic Urologic Skills (AUA BLUS) curriculum¹⁴,¹⁵ and the European Association of Urology European Basic Laparoscopic Urologic Skills (E-BLUS) program have been described for the training and assessment of residents and practicing urologists.¹⁶

While the urologic surgical education and simulation literature has grown significantly over the past decade, much of it still relies on an outdated concept of validity. There is also a relative paucity of data supporting the use of simulation-based assessment tools for the objective assessment of surgical skill. In particular, there are no data supporting the use of these tools for high-stakes assessments, a problem not only in the urologic literature but also in the surgical literature in general. Furthermore, studies demonstrating the translation of performance to the operating room as a result of a specific training curriculum (previously referred to as “predictive validity”) are rare, and attempts to link the impact of these interventions to improved patient outcomes are even rarer and methodologically flawed.

Crowd-sourced Assessment of Technical Skills (C-SATS) is a relatively recent approach in which assessments are obtained from an online community of lay persons (crowds) using expert-developed and validated evaluation tools.¹⁷ C-SATS assessments are comparable to those obtained from experts for both basic and advanced robotic skills.¹⁸,¹⁹ A recent report from the Michigan Urological Surgery Improvement Collaborative (MUSIC) showed strong correlations between crowd and peer-surgeon reviews of the robot-assisted radical prostatectomy (RARP) skills of 12 robotic surgeons, using the Global Evaluative Assessment of Robotic Skills (GEARS) and the Robotic Anastomosis and Competency Evaluation (RACE) (r = 0.78 and r = 0.74, respectively).¹⁹ In a later study, the same group reported a significant correlation between crowd reviews of the urethrovesical anastomosis and postoperative RARP outcomes, namely the urethral catheter replacement rate and the readmission rate.²⁰ Importantly, C-SATS assessments are also rapid and cost-effective.

The following tables, while not exhaustive, list studies published in the urologic literature that aimed to evaluate simulators or simulation-based assessment tools, together with the validity evidence presented in each study (Tables 2–5). As summarized in Table 6, most current studies on assessment provided evidence of relations with other variables, either by comparing the performance of experts and novices or by correlating performance with previous experience, and neglected other sources of validity evidence, especially response processes and consequences.

Table 2.

Simulation-Based Assessments in Endourology

Authors	Year	Participants	Assessment tool	Type	Test content	Response processes	Internal structure	Relation to other variables	Consequences
Sweet et al.²¹	2004	19 novices, 72 expert faculty	UofW TURP simulator	Part-task and VR model	User ratings only			Correlation to experience only
Rashid et al.²²	2007	72 urologists, 45 residents, and 19 novices	TURP simulator	VR				Correlation to experience and comparison among experts, residents, and novices
Kishore et al.²³	2008	10 urology residents	Cystoscopic and ureteroscopic OSATS	OSATS	Expert consensus using Delphi method and Trainee questionnaire	Tested the alignment of responses with the learning objective		Correlation of the cognitive and psychomotor skills with experience
Hudak et al.²⁴	2010	4 med students, 19 residents, 12 faculty	SurgicalSIM TURP simulator	VR				Correlation to experience only
Brewin et al.²⁵	2014	8 residents, 8 expert faculty	Bristol TURP simulator and OSATS scores	Part-task model	User ratings only			Correlation to experience only
Blankstein et al.²⁶	2015	15 residents	COOK Ureteroscopy model and GRS	Part-task model	User ratings only			Correlation to experience only
Aydin et al.²⁷	2015	25 med students, 14 residents, 7 expert faculty	The GreenLight Simulator	VR	User ratings only			Correlation to experience only
Argun et al.²⁸	2015	30 urology residents	Cystoscopic and ureteroscopic OSATS	OSATS			Item analysis (internal validity)	Correlation with training year
Aloosh et al.²⁹	2016	13 residents	URO Mentor	VR				Predictive validity	Possible transfer of skill from VR to OR
Noureldin et al.³⁰	2016	102 PVP-OSATS for residents, 14 PVP-OSATS for faculty experts	PVP-OSATS scoring system	GRS	Test blueprint, expert consensus		Reliability scores	Correlation to level of training, concurrent validity
Noureldin et al.³¹	2016	26 residents	PERC Mentor simulator and GRS	VR and GRS				Correlation to level of training

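GRS = global rating scale; OR = operating room; OSATS = objective structured assessment of technical skills; PVP = photoselective vaporization of the prostate; TURP = transurethral resection of the prostate; UofW = University of Washington; VR = virtual reality.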
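The per-category counts in Table 6 can be traced back to the individual rows of Tables 2–5. As an illustration for the endourology studies, the Python sketch below records which sources of validity evidence each study in Table 2 reported (a filled cell is counted as reported) and tallies them. The two-letter source codes, the study labels, and the helper name `tally` are our own hypothetical shorthand, not part of the original table.

```python
# Evidence sources reported by each endourology study, transcribed from Table 2.
# Hypothetical shorthand used only in this sketch: TC = test content,
# RP = response processes, IS = internal structure,
# REL = relation to other variables, CON = consequences.
SOURCES = ["TC", "RP", "IS", "REL", "CON"]

TABLE_2 = {
    "Sweet 2004 (ref. 21)": {"TC", "REL"},
    "Rashid 2007 (ref. 22)": {"REL"},
    "Kishore 2008 (ref. 23)": {"TC", "RP", "REL"},
    "Hudak 2010 (ref. 24)": {"REL"},
    "Brewin 2014 (ref. 25)": {"TC", "REL"},
    "Blankstein 2015 (ref. 26)": {"TC", "REL"},
    "Aydin 2015 (ref. 27)": {"TC", "REL"},
    "Argun 2015 (ref. 28)": {"IS", "REL"},
    "Aloosh 2016 (ref. 29)": {"REL", "CON"},
    "Noureldin 2016 (ref. 30)": {"TC", "IS", "REL"},
    "Noureldin 2016 (ref. 31)": {"REL"},
}


def tally(studies: dict[str, set[str]]) -> dict[str, int]:
    """Count how many studies reported each source of validity evidence."""
    return {src: sum(src in reported for reported in studies.values())
            for src in SOURCES}


counts = tally(TABLE_2)
for src in SOURCES:
    print(f"{src}: {counts[src]}/{len(TABLE_2)}")
```

The resulting counts (6, 1, 2, 11, and 1 of 11 studies) are the endourology figures reported in Table 6; the same bookkeeping applies to Tables 3–5.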

Table 3.

Simulation-Based Assessments in Robotic Urologic Surgery

Authors	Year	Participants	Assessment tool	Type	Test content	Response processes	Internal structure	Relation to other variables	Consequences
Gavazzi et al.³²	2011	18 novices, 12 experts	SEP Robot simulator	VR	User ratings			Correlation to experience
Hung et al.³³	2012	24 novices, 9 intermediates, 13 experts	Robotic Partial Nephrectomy porcine model and GOALS	Part-task model	User ratings			Correlation to experience
Lee et al.³⁴	2012	13 novices, 7 experts	MIMIC robotic surgery dV-Trainer	VR	User ratings			Correlation to experience
Kelly et al.³⁵	2012	18 med students, 9 residents, 2 fellows, 9 faculty	da Vinci Skill Simulator	VR	User ratings			Correlation to experience
Goh et al.³⁶	2012	25 residents, 4 faculty	Robotic Skill scoring system (GEARS)	GRS		Rater training, but raters not blinded	Reliability scores, item analysis	Correlation to experience	Formative and summative
Hung et al.³⁷	2013	38 residents, 11 faculty experts	Basic robotic skill tasks	Part-task, VR, and porcine models		Rater training		Correlation to experience, GEARS scores, cross-method validation
Alzahrani et al.³⁸	2013	13 med students, 18 residents, 4 fellows, 11 faculty	da Vinci Skill Simulator	VR	User ratings			Correlation to experience
Foell et al.³⁹	2013	45 residents, 8 faculty	da Vinci Skill Simulator	VR	User ratings	Rater training		Correlation to experience, concurrent validity
Raza et al.⁴⁰	2014	41 trainees, 20 faculty	RoSS robotic simulator	VR	Expert consensus for task selection			Correlation to experience
Whittaker et al.⁴¹	2016	20 med students, 13 residents, 13 faculty	RobotiX Mentor simulator	VR	User ratings			Correlation to experience
Ghani et al.¹⁹	2016	12 urologic robotic surgeons	GEARS and RACE	OSATS and C-SATS			Reliability scores	Association between the peer-review and the C-SATS
Aghazadeh et al.⁴²	2016	21 urologic robotic surgeons	da Vinci Skill Simulator and inanimate model tasks (FIRST)	VR and part-task models				Correlation with GEARS scores	Performance on dVSS simulator may predict robotic clinical performance (favorable consequence)
Noureldin et al.⁴³	2016	9 residents	da Vinci Skill Simulator	VR				Correlation to PGY level and competency
Hussein et al.⁴⁴	2017	28 trainee videos, 28 faculty videos	PACE robotic prostatectomy scoring tool	GRS	Test blueprint, expert Delphi methodology		Reliability scores	Correlation to experience	Competency-based training, procedure specific assessment
Raison et al.⁴⁵	2017	59 novices, 6 intermediate, 8 experts	ICARS nontechnical skill scoring tool	GRS	Test blueprint, Delphi methodology, expert consensus		Reliability scores, item analysis	Correlation to experience, concurrent validity comparison to NOTSS
Mills et al.⁴⁶	2017	10 attending robotic surgeons	da Vinci Skill Simulator	VR				The correlation between the simulator scores and the GEARS scores	Absence of correlation between dVSS scores and OR scores (unfavorable consequence)

C-SATS = Crowd-sourced Assessment of Technical Skills; GEARS = Global Evaluative Assessment of Robotic Skills; RACE = Robotic Anastomosis and Competency Evaluation.

Table 4.

Simulation-Based Assessments in Laparoscopic Urologic Surgery

Authors	Year	Participants	Assessment tool	Type	Test content	Response processes	Internal structure	Relation to other variables	Consequences
Vassiliou et al.⁴⁷	2005	17 residents, 4 faculty	Lap skill scoring system (GOALS)	GRS	Test blueprint	Rater training	Reliability scores, rater variance	Correlation to experience	Formative and summative
Kommu et al.⁴⁸	2011	n/a	Lap Nephrectomy scoring system	GRS	Test blueprint			Correlation to level of training
Sweet et al.¹⁴	2012	35 trainees, 81 practicing urologists	AUA BLUS tasks	GRS	Test blueprint, expert consensus		Reliability scores	Correlation to experience, convergent evidence
Lee et al.⁴⁹	2012b	16 residents (uro and anesthesia)	High-fidelity lap nephrectomy simulation and technical GRS and NOTSS	Immersive simulation	Test blueprint, Delphi methodology, scenario development	Familiarity assessed, raters trained	Reliability scores	Correlation to level of training	Team-based training
Alwaal et al.⁵⁰	2015	12 residents	LapSim simulator				Reliability scores	Correlation with GRS for lap. radical nephrectomy	Negative transfer of skills to the clinical practice (unfavorable consequences)
Kowalewski et al.¹⁵	2016	27 med students, 42 residents, 18 fellows, 37 faculty	AUA BLUS tasks and EDGE device	EDGE metrics, GRS (GOALS)	Test blueprint		Reliability scores, item analysis
Lee et al.⁵¹	2017	99 residents, 6 faculty	AUA BLUS tasks	GRS (GOALS) and C-SATS		Rater training	Reliability scores	Correlation to level of training, correlation with C-SATS scores, standard setting

Table 5.

Simulation-Based Assessments in Miscellaneous Urologic Procedures

Authors	Year	Participants	Assessment tool	Type	Internal structure	Relation to other variables	Consequences
Grober et al.⁵²	2004	50 residents	Silicone tubing model and live rat vas deferens	Bench and live		Association with previous microsurgery experience
Todsen et al.⁵³	2013	76 medical students	Urethral Catheterization Simulator	Urethral Catheterization OSATS	Inter-rater reliability, intraclass correlation coefficient	Correlation with clinical practice experience	Possible transfer of skills to the real patients
Vernez et al.⁵⁴	2017	25 medical students	Lap Mentor™ and Box Trainer	OSATS and GEARS and GOALS and C-SATS	Inter-rater agreement between expert assessment and C-SATS	Correlation between the crowd match rank and the final faculty match rank	Acceptance for urology residency

Table 6.

Frequency with Which Each of the Five Sources of Validity Evidence Was Reported in the Reviewed Articles

Category	Test content, n (%)	Response processes, n (%)	Internal structure, n (%)	Relation to other variables, n (%)	Consequences, n (%)
Endourologic (11 articles)	6 (54.5)	1 (9.1)	2 (18.2)	11 (100)	1 (9.1)
Robotic (16 articles)	10 (62.5)	3 (18.8)	4 (25.0)	16 (100)	4 (25.0)
Laparoscopic (7 articles)	5 (71.4)	3 (42.9)	6 (85.7)	6 (85.7)	3 (42.9)
Miscellaneous (3 articles)	0	0	2 (66.7)	3 (100)	2 (66.7)
Total (37 articles)	21 (56.8)	7 (18.9)	14 (37.8)	36 (97.3)	10 (27.0)
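Each percentage in Table 6 is the number of articles reporting a given source divided by the number of articles in that category. The following Python sketch, assuming conventional half-up rounding to one decimal place, recomputes the column totals and percentages from the per-category counts; the names `COUNTS`, `percent`, and `summarize` are illustrative only.

```python
from decimal import Decimal, ROUND_HALF_UP

SOURCES = ("Test content", "Response processes", "Internal structure",
           "Relation to other variables", "Consequences")

# Articles per category, and how many of them reported each source (Table 6).
COUNTS = {
    "Endourologic": (11, (6, 1, 2, 11, 1)),
    "Robotic": (16, (10, 3, 4, 16, 4)),
    "Laparoscopic": (7, (5, 3, 6, 6, 3)),
    "Miscellaneous": (3, (0, 0, 2, 3, 2)),
}


def percent(count: int, total: int) -> Decimal:
    """Percentage of `total`, rounded half-up to one decimal place."""
    value = Decimal(100 * count) / Decimal(total)
    return value.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


def summarize(counts):
    """Per-category (n, %) cells plus a pooled total row."""
    n_total = sum(n for n, _ in counts.values())
    col_totals = [sum(c[i] for _, c in counts.values())
                  for i in range(len(SOURCES))]
    rows = {name: [(k, percent(k, n)) for k in c]
            for name, (n, c) in counts.items()}
    rows[f"Total ({n_total})"] = [(k, percent(k, n_total)) for k in col_totals]
    return rows


for name, cells in summarize(COUNTS).items():
    print(name + ": " + ", ".join(f"{k} ({p})" for k, p in cells))
```

Using `Decimal` with `ROUND_HALF_UP` makes the rounding rule explicit, which matters for values that fall exactly on a tie, such as 3/16 = 18.75%.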

Future Directions

The majority of the published urologic literature on the validation of simulators or other simulation-based assessment tools appears to use the old framework of validity. As a result, almost all studies reported validity evidence based on relations with other variables in one form or another, because the old “construct” and “criterion” validity designs (expert-versus-novice comparisons and correlations with a reference measure) map onto this source of evidence. Many studies also included, at least at a cursory level, evidence based on test content and internal structure. Unfortunately, most studies have not addressed validity evidence based on response processes or consequences, and this represents a significant gap in the validity literature.

As the surgical training community moves toward a CBME paradigm enriched with simulation-based training and assessment methodology, it is important that the urologic community embraces the need for robust validity evidence to support the judgments of competency made using these simulation-based assessment tools, particularly if we are going to make high-stakes judgments of competence. As such, it is imperative that the urologic community moves away from the outdated and limited concept of “types” of validity and adopts the contemporary taxonomy of validity evidence, which espouses a unitary concept of validity in which all validity is construct validity and validity evidence comes from various sources. We must understand that we are not looking to validate a simulator or test itself, but rather looking for validity evidence to support the judgments we make using the “scores” that result from the simulator or test within a specified context.

Ethics Statement

This study was conducted according to the Declaration of Helsinki 2013 and its amendments.

Footnotes

Author Disclosure Statement

No competing financial interests exist.


References

Frank

, Snell

, Cate

, Holmboe

, Carraccio

, et al. Competency-based medical education: Theory to practice. Med Teach, 2010; 32:638–645.

McGaghie

, Issenberg

, Petrusa

, Scalese

. A critical review of simulation-based medical education research: 2003–2009. Med Educ, 2010; 44:50–63.

McDougall

. Validation of surgical simulators. J Endourol, 2007; 21:244–247.

American Educational Research

Association

, American Psychological

Association

, and National Council on Measurement in

Education

. Standards for Educational and Psychological Testing. Washington, DC: American Educational Research Association, 2014.

Messick

. Validity of psychological assessment: Validation of inferences from persons' responses and performances as scientific inquiry into score meaning. Am Psychol, 1995; 50:741–749.

Sweet

, Hananel

, Lawrenz

. A unified approach to validation, reliability, and education study design for surgical technical skills training. Arch Surg, 2010; 145:197–201.

Korndorffer

Jr. , Kasten

, Downing

. A call for the utilization of consensus standards in the surgical education literature. Am J Surg, 2010; 199:99–104.

American Educational Research

Association

, American Psychological

Association

, & National Council on Measurement in

Education

. Standards for Educational and Psychological Testing. Washington, DC: American Educational Research Association, 1999.

Downing

. Validity: On meaningful interpretation of assessment data. Med Educ, 2003; 37:830–837.

10.

Palter

, Grantcharov

. Simulation in surgical education. CMAJ, 2010; 182:1191–1196.

11.

Fraser

, Klassen

, Feldman

, Ghitulescu

, Stanbridge

, Fried

. Evaluating laparoscopic skills: Setting the pass/fail score for the MISTELS system. Surg Endosc, 2003; 17:964–967.

12.

Scott

, Ritter

, Tesfay

, Pimentel

, Nagji

, Fried

. Certification pass rate of 100% for fundamentals of laparoscopic surgery skills after proficiency-based training. Surg Endosc, 2008; 22:1887–1893.

13.

American Board of Surgery Booklet of Information–Surgery. www.absurgery.org/xfer/BookletofInfo-Surgery.pdf (Accessed July 25, 2017 ).

14.

Sweet

, Beach

, Sainfort

, Gupta

, Reihsen

, Poniatowski

, McDougall

. Introduction and validation of the American Urological Association Basic Laparoscopic Urologic Surgery skills curriculum. J Endourol, 2012; 26:190–196.

15.

Kowalewski

, Sweet

, Lendvay

, Menhadji

, Averch

, et al. Validation of the AUA BLUS Tasks. J Urol, 2016; 195(4 Pt 1):998–1005.

16.

Brinkman

, Tjiam

, Schout

, Muijtjens

, Van Cleynenbreugel

, Koldewijn

, Witjes

. Results of the European Basic Laparoscopic Urological Skills examination. Eur Urol, 2014; 65:490–496.

17.

Holst

, Kowalewski

, White

, Brand

, Harper

, et al. Crowd-sourced assessment of technical skills: An adjunct to urology resident surgical simulation training. J Endourol, 2015; 29:604–609.

18.

White

, Kowalewski

, Dockter

, Comstock

, Hannaford

, Lendvay

. Crowd-sourced assessment of technical skill: A valid method for discriminating basic robotic surgery skills. J Endourol, 2015; 29:1295–1301.

19.

Ghani

, Miller

, Linsell

, Brachulis

, Lane

, Sarle

, et al; Michigan Urological Surgery Improvement

Collaborative

. Measuring to improve: Peer and crowd-sourced assessments of technical skill with robot-assisted radical prostatectomy. Eur Urol, 2016; 69:547–550.

20.

Ghani

, Comstock

, Miller

, Dunn

, Kim

, et al; Michigan Urological Surgery Improvement Collaborative. Technical skills assessment of surgeons performing robot-assisted radical prostatectomy: Relationship between crowdsourced review and patient outcomes. J Urol, 2017; 197 (4S, Supplement):e609.

21.

Sweet

, Kowalewski

, Oppenheimer

, Weghorst

, Satava

. Face, content and construct validity of the University of Washington virtual reality transurethral prostate resection trainer. J Urol, 2004; 172(5 Pt 1):1953–1957.

22.

Rashid

, Kowalewski

, Oppenheimer

, Ooms

, Krieger

, Sweet

. The virtual reality transurethral prostatic resection trainer: Evaluation of discriminate validity. J Urol, 2007; 177:2283–2286.

23.

Kishore, Pedro, Monga, Sweet. Assessment of validity of an OSATS for cystoscopic and ureteroscopic cognitive and psychomotor skills. J Endourol, 2008; 22:2707–2711.

24.

Hudak, Landt, Hernandez, Soderdahl. External validation of a virtual reality transurethral resection of the prostate simulator. J Urol, 2010; 184:2018–2022.

25.

Brewin, Ahmed, Khan, Jaye, Dasgupta. Face, content, and construct validation of the Bristol TURP trainer. J Surg Educ, 2014; 71:500–505.

26.

Blankstein, Lantz, D'A Honey, Pace, Ordon, Lee. Simulation-based flexible ureteroscopy training using a novel ureteroscopy part-task trainer. Can Urol Assoc J, 2015; 9:331–335.

27.

Aydin, Muir, Graziano, Khan, Dasgupta, Ahmed. Validation of the GreenLight™ Simulator and development of a training curriculum for photoselective vaporisation of the prostate. BJU Int, 2015; 115:994–1003.

28.

Argun, Chrouser, Chauhan, Monga, Knudsen, et al. Multi-institutional validation of an OSATS for the assessment of cystoscopic and ureteroscopic skills. J Urol, 2015; 194:1098–1105.

29.

Aloosh, Noureldin, Andonian. Transfer of flexible ureteroscopic stone-extraction skill from a virtual reality simulator to the operating theatre: A pilot study. J Endourol, 2016; 30:1120–1125.

30.

Noureldin, Elkoushy, Aloosh, Carrier, Elhilali, Andonian. Objective structured assessment of technical skills for the photoselective vaporization of the prostate procedure: A pilot study. J Endourol, 2016; 30:923–929.

31.

Noureldin, Fahmy, Anidjar, Andonian. Is there a place for virtual reality simulators in assessment of competency in percutaneous renal access? World J Urol, 2016; 34:733–739.

32.

Gavazzi, Bahsoun, Van Haute, Ahmed, Elhage, et al. Face, content and construct validity of a virtual reality simulator for robotic surgery (SEP Robot). Ann R Coll Surg Engl, 2011; 93:152–156.

33.

Hung, Patil, Zehnder, Cai, Ng, Aron, Gill, Desai. Concurrent and predictive validation of a novel robotic surgery simulator: A prospective, randomized study. J Urol, 2012; 187:630–637.

34.

Lee, Mucksavage, Kerbl, Huynh, Etafy, McDougall. Validation study of a virtual reality robotic simulator—Role as an assessment tool? J Urol, 2012; 187:998–1002.

35.

Kelly, Margules, Kundavaram, et al. Face, content, and construct validation of the da Vinci skills simulator. Urology, 2012; 79:1068–1072.

36.

Goh, Goldfarb, Sander, et al. Global evaluative assessment of robotic skills: Validation of a clinical assessment tool to measure robotic surgical skills. J Urol, 2012; 187:247–252.

37.

Hung, Jayaratna, Teruya, et al. Comparative assessment of three standardized robotic surgery training methods. BJU Int, 2013; 112:864–871.

38.

Alzahrani, Haddad, Alkhayal, et al. Validation of the da Vinci surgical skill simulator across three surgical disciplines: A pilot study. Can Urol Assoc J, 2013; 7:E520–E529.

39.

Foell, Finelli, Yasufuku, et al. Robotic surgery basic skills training: Evaluation of a pilot multidisciplinary simulation-based curriculum. Can Urol Assoc J, 2013; 7:430–434.

40.

Raza, Froghi, Chowriappa, et al. Construct validation of the key components of Fundamental Skills of Robotic Surgery (FSRS) curriculum—A multi-institution prospective study. J Surg Educ, 2014; 71:316–324.

41.

Whittaker, Aydin, Raison, Kum, Challacombe, Khan, et al. Validation of the RobotiX Mentor robotic surgery simulator. J Endourol, 2016; 30:338–346.

42.

Aghazadeh, Mercado, Pan, Miles, Goh. Performance of robotic simulated skills tasks is positively associated with clinical robotic surgical performance. BJU Int, 2016; 118:475–481.

43.

Noureldin, Stoica, Kassouf, Tanguay, Bladou, Andonian. Incorporation of the da Vinci Surgical Skills Simulator at urology Objective Structured Clinical Examinations (OSCEs): A pilot study. Can J Urol, 2016; 23:8160–8166.

44.

Hussein, Ghani, Peabody, Sarle, Abaza, Eun, et al; Michigan Urological Surgery Improvement Collaborative and Applied Technology Laboratory for Advanced Surgery Program. Development and validation of an objective scoring tool for robot-assisted radical prostatectomy: Prostatectomy assessment and competency evaluation. J Urol, 2017; 197:1237–1244.

45.

Raison, Wood, Brunckhorst, Abe, Ross, et al. Development and validation of a tool for non-technical skills evaluation in robotic surgery-the ICARS system. Surg Endosc, 2017; 31:5403–5410.

46.

Mills, Hougen, Bitner, Krupski, Schenkman. Does robotic surgical simulator performance correlate with surgical skill? J Surg Educ, 2017; pii: S1931–S7204:30134–4.

47.

Vassiliou, Feldman, Andrew, Bergman, Leffondré, Stanbridge, Fried. A global assessment tool for evaluation of intraoperative laparoscopic skills. Am J Surg, 2005; 190:107–113.

48.

Kommu, Emara, James, Finnigan, Cartlidge, et al; Nair, for the Stilus Academic Research Group (SARG). An objective scoring system for laparoscopic nephrectomy. J Endourol, 2011; 25:1497–1502.

49.

Lee, Mucksavage, Canales, McDougall, Lin. High fidelity simulation based team training in urology: A preliminary interdisciplinary study of technical and nontechnical skills in laparoscopic complications management. J Urol, 2012; 187:1385–1391.

50.

Alwaal, Al-Qaoud, Haddad, Alzahrani, Delisle, Anidjar. Transfer of skills on LapSim virtual reality laparoscopic simulator into the operating room in urology. Urol Ann, 2015; 7:172–176.

51.

Lee, Andonian, Pace, Grober. Basic laparoscopic skills assessment study: Validation and standard setting among Canadian Urology Trainees. J Urol, 2017; 197:1539–1544.

52.

Grober, Hamstra, Wanzel, Reznick, Matsumoto, et al. Laboratory based training in urological microsurgery with bench model simulators: A randomized controlled trial evaluating the durability of technical skill. J Urol, 2004; 172:378–381.

53.

Todsen, Henriksen, Kromann, Konge, Eldrup, Ringsted. Short- and long-term transfer of urethral catheterization skills from simulation training to performance on patients. BMC Med Educ, 2013; 13:29.

54.

Vernez, Huynh, Osann, Okhunov, Landman, Clayman. C-SATS: Assessing Surgical Skills Among Urology Residency Applicants. J Endourol, 2017; 31(S1):S95–S100.