Abstract
Simulation for education and training in health-care professions has been widely applied. However, its value as an assessment tool for competence is not fully known. Logistical barriers of simulation-based assessments have led some health-care organizations to utilize computer-based case simulations (CCSs) for assessment. This article provides a review of the literature on the identification of psychometrically sound, CCS instruments designed to measure decision-making competence in health-care professionals. CINAHL, MEDLINE, and Ovid databases identified 84 potentially relevant articles published between January 2000 and May 2017. A total of 12 articles met criteria for inclusion in this review. Findings of these 12 articles indicate that summative assessment in health care using CCSs in the form of clinical scenarios is utilized to assess higher order performance aspects of competence in the form of decision-making. Psychometric strength was validated in eight articles and supported by four replication studies. Two of the eight articles reported evidence of construct validity and support the need for evidence based on a theoretical framework. This literature review offers implications for further research on the use of CCS tools as a method for assessment of competence in health-care professionals and the need for psychometric evidence to support it.
Competence and patient safety in health care were brought to the forefront in 2000 with the Institute of Medicine’s (IOM) To Err is Human: Building a Safer Health System. This report identified a systems-based, nationwide approach to quality improvement, one component of which centered on performance standards for patient safety (IOM, 2000). A subsequent report in 2001, Crossing the Quality Chasm: A New Health System for the 21st Century, challenged health-care organizations to respond to patient needs through the use of the best evidence (IOM, 2001). One such challenge contained in this report pointed toward incorporation of lifelong learning and the evaluation of performance standards in the processes of licensing and certification for health-care professionals (IOM, 2001).
In regard to certification, this standardized movement and level of accountability toward validation of competence prompted medical and nursing specialties to incorporate more rigorous methods in the maintenance of certification that focused on clinical practice assessment (Brennan et al., 2004). Simulation has been widely applied in education and training, but its value as an assessment tool is not fully known. Further, assessment with simulation that involves the use of computerized patient case scenarios is a novel and underreported approach in the assessment of competence (Cook & Triola, 2009). The purpose of this review was to identify psychometrically sound, computer-based case simulation (CCS) instruments designed to measure decision-making competence in health-care professionals and to identify studies that have utilized the identified instruments in other populations and settings.
Background
Simulation has a long-standing history in educational-based environments, with a goal often being the demonstration of competence, to include decision-making, through replication of patient care scenarios in a realistic environment (Steadman & Huang, 2012). Miller (1990) pointed out Webster’s definition of competence as “having sufficient knowledge, judgment, skill, or strength for a particular duty” (p. S63). He further elucidated that competence is not merely knowing (Harden & Gleeson, 1979) but includes the skill of acquiring information from a variety of diagnostic sources, having the ability to analyze and interpret that data, and translating such findings into a patient management plan (Harden & Gleeson, 1979).
Since the 1970s, simulation-based methods of assessment operationalized the assessment of clinical practice and decision-making with the development of the objective structured clinical examination (OSCE; Turner & Dankoski, 2008). The OSCE was developed in 1975 to assess decision-making competence of medical students on simulated patients (Turner & Dankoski, 2008). Advantages include its ability to directly assess aspects of competence beyond cognitive knowledge such as performance and patient management (Harden & Gleeson, 1979; Miller, 1990; Turner & Dankoski, 2008). These higher order domains of competence were introduced by George E. Miller in a hierarchical pyramid in 1990 (Appendix Figure A1). The higher aspects of the pyramidal framework, to include performance (“shows how”) and action (“does”), are the aspects where evaluation methods must be targeted in the assessment of competence. Miller (1990) further posited that demonstration of learning is best measured with simulation-based clinical scenarios and patient management problems.
OSCEs and other simulation-based methods of assessment have provided value in the demonstration of competence; however, logistical constraints and conflicting evidence regarding validity and reliability characteristics represent barriers to effective utilization in high-stakes testing environments such as those involving licensure and certification (Steadman & Huang, 2012; Turner & Dankoski, 2008). Recognizing these barriers, health-care credentialing organizations have taken steps toward an alternative use of simulation: CCSs or clinical scenario-type questions, which aim to circumvent the logistical and psychometric challenges of live simulation (Boulet, 2008; National Board of Certification and Recertification for Nurse Anesthetists, 2016). Often referred to as virtual patients (VPs), these CCSs have played an increasing role in education and assessment in the past decade. However, as pointed out by Cook and Triola (2009), their effective use requires evidence in the form of validation in order to guide the design and successful implementation into educational environments and high-stakes assessments.
While myriad simulation-based assessment tools have been identified in the literature, psychometric evidence is sparse (Ahmed et al., 2011; Cook, Brydges, Zendejas, Hamstra, & Hatala, 2013; Cook, Erwin, & Triola, 2010; Cook, Zendejas, Hamstra, Hatala, & Brydges, 2014; Duff, Miller, & Bruce, 2016; Edler et al., 2009; Feldman, Sherman, & Fried, 2004; Kardong-Edgren, Adamson, & Fitzgerald, 2010; Ryall, Judd, & Gordon, 2016; Van Nortwick et al., 2010). Such is also the case for simulation-based assessment tools that are computerized in the form of VPs (Cook et al., 2013, 2014). Further, the identification of computerized simulation-based assessment tools that correlate with a formal theoretical framework such as Miller’s pyramid is understudied, and pursuing it responds to Cook, Zendejas, Hamstra, Hatala, and Brydges’s (2014) recommendation for use of a formal validity framework for interpretation of validity evidence in guiding these tools’ designs. We have attempted to address these gaps with a review of the literature published from 2000 to 2017.
Method
Search Strategy
From March through May 2017, a comprehensive literature search was conducted in MEDLINE, the Cumulative Index to Nursing and Allied Health Literature (CINAHL), and Ovid for the years 2000–2017. The search aimed to identify CCS methods for assessing decision-making among health-care professionals that reported at least one psychometric analysis. The key words were combined with the logical operators “AND” and “OR” and included the following: computer-based, case simulations, clinical scenarios, decision-making, virtual patients, and assessment. This yielded 84 titles, 18 from CINAHL and 66 from MEDLINE. The abstracts for each of the 84 articles were reviewed. Those that discussed assessment methods that were computer-based and used the identified key words were included for review. An additional search was conducted to identify studies that were replications of the instruments identified in the original search. To accomplish this, the key word replication was added to the search, and the entire reference list of each included study was reviewed to identify whether the included articles were replications of previously identified instruments. The included articles were then entered into the search engine MEDLINE, one by one, and articles in the “Cited By” section of the search were scanned in order to identify similar instruments and/or studies that utilized the original instrument in the development of a subsequent instrument. This yielded 15 additional articles. All articles were reviewed and included if the article contributed to further validation and understanding of the original instrument.
Inclusion and Exclusion Criteria
The search was limited to English-language articles. The following inclusion criteria were used: (1) the article reported original or primary data on instrument development, or the study either replicated a previous pilot study or contributed to further validation and understanding of the original instrument; (2) the article discussed at least one measure of psychometric analysis (validity and/or reliability); (3) the instrument was developed to measure decision-making in the health-care profession; and (4) the instrument was computer based and involved clinical patient scenarios. Articles were excluded if the instrument involved the use of a simulator or mannequin, was not web based, or if the article was unpublished (such as abstracts or dissertations). Articles pertaining to the U.S. Medical Licensing Examination (USMLE, 2016) were also excluded because of confidentiality issues associated with USMLE testing data.
The articles retrieved reflect psychometrically analyzed, CCS instruments that measure decision-making in health-care professionals to include original and replication studies. The articles reporting on psychometric data were compared and critiqued on the following categories: study aims, instrument characteristics and scoring, sample, critical appraisal score, and study results in the form of reliability and/or validity.
A total of 12 articles met the inclusion criteria and were reviewed in full. Of the 12 articles, 4 articles represent studies that either utilized the original instrument in a subsequent study (Klemenc-Ketis & Kersnik, 2014) or based their study on a previous psychometrically analyzed instrument in a different population in order to provide further validation of a similar instrument (Fida & Kassab, 2015; Lin et al., 2013; Oliven, Nave, Gilad, & Barch, 2011).
Critical Appraisal
Methodological quality of articles reporting psychometric evidence was appraised utilizing a modified critical appraisal tool. The original tool, the McMaster Critical Review Form for Quantitative Studies (Law et al., 1998), was modified by Ryall, Judd, and Gordon (2016) in a systematic review of simulation-based assessments in health profession education. The Ryall et al. (2016) version was further modified to fit the critical appraisal tool to instruments that measured decision-making. The tool used in the current narrative review comprised 14 items and followed the scoring method used by Ryall et al. (2016). Each item was scored dichotomously (yes = 1/no = 0); items that were not addressed or not applicable were also scored 0. Each article reporting psychometric analysis was independently scored, and the critical appraisal score was used as a measure of methodological rigor, with a higher score indicative of higher methodological rigor.
Results
Study Aims
Eight of 12 articles expressed a desire to evaluate their instrument in terms of validity and/or reliability. The remaining four articles aimed to describe or evaluate the tool for its use in future studies (these are described under replication studies). Of note, only 5 of the 12 articles made reference to the ultimate aim of impacting society through translational patient outcomes with more sophisticated measures of competence (Guagnano, Merlitti, Manigrasso, Pace-Palitti, & Sensi, 2002; Kassab, Fida, & Ansari, 2014; Lin et al., 2013; Oliven et al., 2011; Yu, Straus, & Brydges, 2015). Three of the articles described the reliability performance of the instrument compared to a previously validated tool such as standardized patients (Guagnano et al., 2002), OSCEs (Fida & Kassab, 2015; Oliven et al., 2011), multiple-choice questions (MCQs; Fida & Kassab, 2015), and real patients (Fida & Kassab, 2015). One study aimed to report criterion validity evidence based on a previously validated tool of diagnostic reasoning (Jerant & Azari, 2004). Three studies collected construct validity evidence by way of factor analysis, which helped to identify the underlying variables for the assessment of decision-making (Kassab et al., 2014; Klemenc-Ketis & Kersnik, 2013; Yu et al., 2015). Yu, Straus, and Brydges (2015) collected content validity evidence with development of the scoring method by subject matter experts. One study reported the statistical analysis and scores of the case simulation’s diagnostic test-ordering portion, and how this information can be used as a formative feedback tool to promote learning (Kreiter et al., 2011).
While all studies evaluated the instrument performance from multiple perspectives, the psychometric measures of both validity and reliability were reported in only two of the eight studies (Klemenc-Ketis & Kersnik, 2013; Yu et al., 2015). This may be a reflection of the novelty of these instruments, as all eight of the identified instruments that reported on psychometric evidence were newly developed and piloted in these studies.
Instrument Characteristics and Scoring
The design of each instrument varied widely, but each was computer based and case based, involving clinical scenarios for which the examinee was required to utilize reasoning (decision-making) skills in order to manage each clinical case. Each instrument’s domains of decision-making included aspects of performance within each case: the ability to obtain a history and physical, diagnostic tests, differential diagnoses, ability to reach a diagnosis, and treatment interventions. All of the eight instruments included web-based virtual clinical vignettes; however, only three of the eight instruments were interactive, meaning the computer software allowed for patient management decisions by the examinee, while providing real-time and accurate feedback based on those decisions (Jerant & Azari, 2004; Oliven et al., 2011; Yu et al., 2015). The instrument in the Kassab et al. (2014) article was somewhat interactive as it provided performance feedback, but that feedback was generated after completing the case study versus real time as the examinee worked through the scenario.
The remaining four instruments (Fida & Kassab, 2015; Guagnano et al., 2002; Klemenc-Ketis & Kersnik, 2013; Kreiter et al., 2011) involved decision options that the examinee could select based on the clinical scenario that was presented. The Multimedia Integrated Pilot Project (MIPP) required the examinee to rate each option in terms of the appropriateness of each choice for the respective scenario; it also involved multiple-choice and true/false questions in terms of differential diagnoses and interpretation of diagnostic studies (Guagnano et al., 2002). The CCSs in the Fida and Kassab (2015) study, termed the DxR Clinician Program, required the examinee to collect a patient history, conduct a virtual physical examination, and order diagnostics, all based on the opening case scenario of the VP’s chief complaint. The virtual case-based assessment tool (Klemenc-Ketis & Kersnik, 2013) was essentially a report that the examinee prepared in a predefined format after being presented with a computerized clinical scenario. The examinee was required to work through the scenario and reach a diagnosis by the end. Laboratory Computer Assisted Patient Simulation (LabCAPS) focused much of its assessment on the reasoning skills underlying a diagnostic laboratory test. It featured a simulated test-ordering checklist from which the examinee could choose. In addition, it incorporated measurement objectives based on the same domains of performance as the other seven instruments (Kreiter et al., 2011).
Scoring also varied widely for each design modality. The most straightforward scoring of all eight instruments was that of the MIPP (Guagnano et al., 2002). Scoring was based simply on combining the number of correct answers from the CCS and the standardized patient simulation. The maximum possible score was 110, with a pass–fail cutoff of 66/110. Scoring for the CCSs in the Kassab et al. (2014) and Fida and Kassab (2015) articles was based on an overall performance score among three categories: clinical reasoning, diagnostic performance, and patient management. While each category was scored differently, the cumulative final score was weighted: clinical reasoning = 50%, diagnostic performance = 40%, and patient management = 10%. Scoring for two of the eight studies was based on a Likert-type scale that was based on the domains of decision-making for respective clinical scenarios (Klemenc-Ketis & Kersnik, 2013; Yu et al., 2015). The Likert-type scales differed slightly in regard to human raters and rating by the simulator; the virtual case-based assessment tool utilized a 5-point Likert-type scale from 1 = not acceptable to 5 = excellent, based on 10 items related to the decision-making process. Student performance was independently assessed by two human instructors with the final data set consisting of the two instructors’ mean scores on each item (Klemenc-Ketis & Kersnik, 2013). The diabetic ketoacidosis (DKA) simulator and scoring system was assessed on a 3-point Likert-type scale from 1 = unacceptable to 3 = acceptable. Scoring was based on each of the seven identified domains of DKA management for which the simulator tabulated the percentage of correct actions, identified critical errors, and then calculated a 3-point scoring scale per item ranging from 18 to 54. Scores of 18 were indicative of unacceptable performance, and scores of 54 were indicative of acceptable performance in all performance domains (Yu et al., 2015).
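The weighted final score described for the Kassab et al. (2014) and Fida and Kassab (2015) CCSs can be illustrated with a minimal sketch. The weights come from the text above; the function name, the category keys, and the assumption that each category score has already been rescaled to a common 0–100 range are illustrative only and are not taken from the DxR Clinician software.

```python
# Category weights as reported in the reviewed articles.
WEIGHTS = {
    "clinical_reasoning": 0.50,
    "diagnostic_performance": 0.40,
    "patient_management": 0.10,
}


def composite_score(category_scores):
    """Return the weighted sum of the three category scores.

    Assumption (not from the source): every category score is already
    expressed on the same 0-100 scale before weighting.
    """
    missing = set(WEIGHTS) - set(category_scores)
    if missing:
        raise ValueError(f"missing categories: {sorted(missing)}")
    return sum(WEIGHTS[name] * category_scores[name] for name in WEIGHTS)


# Hypothetical examinee: strong management, weaker diagnostic performance.
example = {
    "clinical_reasoning": 80,
    "diagnostic_performance": 70,
    "patient_management": 90,
}
print(round(composite_score(example), 2))
```

Because clinical reasoning carries half of the weight, a 10-point change in that category moves the final score five times as much as the same change in patient management.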
Scoring for both the web-based interactive VP (Oliven et al., 2011) and LabCAPS (Kreiter et al., 2011) was accomplished similarly to that of the MIPP instrument (Guagnano et al., 2002), with an assigned mean score. Oliven, Nave, Gilad, and Barch (2011) did not describe specific details as to how the mean grade was reached, but scores of LabCAPS (Kreiter et al., 2011) were reached by assigning a 1 for each correctly ordered diagnostic test and a −0.25 penalty for ordering a test that was keyed as incorrect or failing to order a test that was keyed as correct. The other phases of the study involving diagnostic hypothesis, diagnosis, and treatment were not scored in the pilot study.
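The LabCAPS test-ordering rule lends itself to a short worked example. The sketch below assumes that every test on the checklist is keyed as either correct or incorrect for the case; the function name and the laboratory test names are hypothetical and are not drawn from Kreiter et al. (2011).

```python
def labcaps_order_score(ordered, keyed_correct, keyed_incorrect, penalty=0.25):
    """Score a test-ordering checklist under the rule described above.

    +1 for each ordered test keyed as correct; -penalty for each ordered
    test keyed as incorrect and for each keyed-correct test not ordered.
    """
    ordered = set(ordered)
    keyed_correct = set(keyed_correct)
    keyed_incorrect = set(keyed_incorrect)

    hits = ordered & keyed_correct              # correctly ordered tests
    wrong_orders = ordered & keyed_incorrect    # ordered but keyed incorrect
    omissions = keyed_correct - ordered         # keyed correct but not ordered
    return len(hits) - penalty * (len(wrong_orders) + len(omissions))


# Hypothetical case key and examinee selection.
score = labcaps_order_score(
    ordered={"CBC", "troponin", "TSH"},
    keyed_correct={"CBC", "BMP", "troponin"},
    keyed_incorrect={"ESR", "TSH"},
)
print(score)  # two hits, one wrong order, one omission -> 1.5
```

Under this rule, an examinee who orders every test on the checklist is penalized for the incorrect ones, so indiscriminate ordering does not maximize the score.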
Scoring of the web-based DxR Clinician (Jerant & Azari, 2004) was unique in that evaluation of the student’s performance began upon arrival at the diagnosis, which occurred after the student moved through the case based on a yes/no algorithm. The student was either “moved up” or “moved down” based on the response given. An additional score was derived from points based on achievement at five different nodes in the scoring algorithm.
Sample
The sample sizes ranged widely from 13 (Kreiter et al., 2011), in their pilot study, to 262 (Oliven et al., 2011). Seven of the eight samples included medical students. The one study that did not include medical students was the study of Guagnano, Merlitti, Manigrasso, Pace-Palitti, and Sensi (2002), which included medical school graduates. Yu et al. (2015) included Years 1–3 medical students but also included postgraduates in Year 2 of internal medicine residency as well as attending physicians in the specialty of endocrinology.
The settings were largely medical schools within university settings. However, the validation phase of the Yu et al. (2015) study was based in a large urban academic health sciences center. Six of the eight studies took place outside the United States (Guagnano et al., 2002, in Italy; Kassab et al., 2014, and Fida & Kassab, 2015, in Bahrain; Klemenc-Ketis & Kersnik, 2013, in Slovenia; Oliven et al., 2011, in Israel; and Yu et al., 2015, in Canada). One study, Kreiter et al. (2011), was U.S.-based at the University of Kansas (for the pilot test administration) and at the University of Iowa (for the evaluation of the software as an educational tool). The other U.S.-based study was conducted at the University of California, Davis, School of Medicine (Jerant & Azari, 2004).
Critical Appraisal
The critical appraisal tool score was used as a measure of methodological rigor. Each of the eight articles was independently scored based on the modified McMaster Critical Review Form for Quantitative Studies (Ryall et al., 2016). All of the studies used convenience samples, and only one study reported power in order to detect statistical significance (Yu et al., 2015). The lack of reporting of power in the remaining seven studies is not unexpected given the novelty of each of these instruments; that is, there were no previous data on these instruments from which to determine the effect size. Yu et al. (2015) reported an estimated adequate sample size based on a power of 0.80 and explained that a power of 0.80 was justified based on the wide variation in expertise of the participants.
Critical appraisal scores ranged from 8 (Jerant & Azari, 2004; Oliven et al., 2011) to 12 (Yu et al., 2015). Scores were higher, with a rating of 10–12, for three studies (Klemenc-Ketis & Kersnik, 2013; Kreiter et al., 2011; Yu et al., 2015), two of which reported both validity and reliability measures (Klemenc-Ketis & Kersnik, 2013; Yu et al., 2015). Yu et al. (2015) is the only article that reported procedures for dropouts from the study and is also the only study that reported an institutional review board approval as well as informed consent.
Reliability and Validity Results
Reliability measures were reported in all but two studies (Jerant & Azari, 2004; Kassab et al., 2014). However, the aim of one study was to establish construct validity alone in order to identify the constructs of decision-making competence in their instrument (Kassab et al., 2014), while Jerant and Azari (2004) aimed to establish validity based on a criterion measure of diagnostic reasoning ability, the Diagnostic Thinking Inventory. Authors in both studies identified the absence of reliability measures as a limitation. The remaining six studies reported reliability measures utilizing interrater reliability (Guagnano et al., 2002); intraclass correlation coefficient (Klemenc-Ketis & Kersnik, 2013); internal consistency reliability using Cronbach’s α (Fida & Kassab, 2015; Klemenc-Ketis & Kersnik, 2013; Oliven et al., 2011; Yu et al., 2015); and correlation with other measures such as curriculum scores (Guagnano et al., 2002), OSCEs (Fida & Kassab, 2015; Oliven et al., 2011), MCQs (Fida & Kassab, 2015), short-answer questions (Fida & Kassab, 2015), and real patients (Fida & Kassab, 2015).
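For readers less familiar with internal consistency reliability, Cronbach’s α for a k-item instrument is conventionally computed as k/(k − 1) × (1 − Σ item variances / variance of total scores). The following is a minimal sketch of that standard formula using sample variances; the function name and the toy data are illustrative and do not reproduce the analyses of any reviewed study.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a table of scores.

    scores: list of rows, one row per examinee, each row holding that
    examinee's scores on the same k items (k >= 2, at least 2 examinees).
    """
    if len(scores) < 2:
        raise ValueError("need at least two examinees")
    k = len(scores[0])
    if k < 2 or any(len(row) != k for row in scores):
        raise ValueError("every row needs the same k >= 2 item scores")

    # Sample variance of each item across examinees.
    item_variances = [variance([row[i] for row in scores]) for i in range(k)]
    # Sample variance of each examinee's total score.
    total_variance = variance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("total scores do not vary; alpha is undefined")

    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Toy data: three examinees, two items on a 1-3 scale.
print(round(cronbach_alpha([[1, 2], [2, 1], [3, 3]]), 3))
```

Values closer to 1 indicate that the items vary together across examinees, which is the sense in which the α coefficients reported in the reviewed studies are interpreted.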
Validation of instruments was established in three of the eight studies (Kassab et al., 2014; Klemenc-Ketis & Kersnik, 2013; Yu et al., 2015). Validity measures included variations of factor analysis in all three studies in order to establish construct validity. Of the three studies reporting on validity measures, Yu et al. (2015) was the most thorough, also discussing content validity and criterion-related validity with other measures (Pearson’s correlation coefficient). Guagnano et al. (2002), Fida and Kassab (2015), and Oliven et al. (2011) reported correlation with other measures as mentioned; however, they did not identify this as a measure of criterion-related validity but rather reported it as a measure of reliability. Jerant and Azari (2004) failed to establish criterion validity and identified this as a methodological limitation; however, they also pointed out that their results are a first step toward filling an important research gap in validation evidence of CCS instruments.
Replication Studies
Four studies in this review did not report psychometric evidence but are included in this review as a replication of one of the validated instruments or as a means of identifying the study’s contributions toward the development of the validated tool. Klemenc-Ketis and Kersnik, in their 2013 pilot study, developed and validated a tool to be used in the assessment of undergraduate students’ decision-making based on questions asked by real patients in a virtual setting. In 2014, the authors utilized their previously validated tool in a sample of 147 fourth-year medical students for the purpose of assessing the decision-making process of family practice consultation and recognizing potential deficiencies (Klemenc-Ketis & Kersnik, 2014). This tool enabled the authors to conclude that medical education should devote more time to the consultative and holistic approach to health promotion in addition to the clinical management of problems.
The web-based VP system designed by Oliven et al. (2011) for educational purposes and for exams elaborated on the value of previous VP instruments’ abilities to replace the standard multiple-choice system through interactive dialogue with text entry (Bergin & Fors, 2003; Courteille, Bergin, Stockeld, Ponzer, & Fors, 2008). However, these studies’ implications for future research included assessing their impact on learning through assessment (Bergin & Fors, 2003) and the need for students to be trained beforehand to eliminate the need for assistants (Courteille et al., 2008). Oliven et al. (2011) addressed these issues through the design of multiple scenarios for practice at home, designing user-friendly capabilities with their instrument, and allowing for real-time feedback to the student. In addition, these authors stressed the importance of the material being directly related to the final examination in order to motivate the students to utilize the VP training system. This was shown to be evident by nearly all students’ entry to the VP site prior to the final examination.
Lin et al. (2013) conducted a feasibility study based on the previous pilot studies of Courteille, Bergin, Stockeld, Ponzer, and Fors (2008) and Oliven et al. (2011). Based on the success of these studies, authors aimed to assess the educational effectiveness of an integrated OSCE (iOSCE) using standardized patient and VP to improve students’ clinical skills and perceptions of the iOSCE. Conclusions were positive in that 100% of the users of the iOSCE found it to be very satisfactory in the evaluation of competency, and these authors also concluded that further refinement would be beneficial prior to implementation in a standard exam (Lin et al., 2013).
Fida and Kassab (2015) examined the psychometric properties of the CCS, DxR Clinician, for the assessment of medical students in a problem-based, integrated multisystem module. This study was a replication of Jerant and Azari (2004), in which no correlation was found between students’ clinical reasoning and their scores on the Diagnostic Thinking Inventory. Fida and Kassab (2015), however, established internal consistency reliability of the DxR Clinician and its measurement of competence in a construct similar to those measured by the OSCE but different from constructs measured through real patient encounter examinations.
Discussion
This review identified simulation-based instruments that are computer based, clinically relevant in the form of patient case scenarios, and aimed to measure the construct of decision-making. There is not only a lack of evidence about the effectiveness of simulation for assessment of competence (Boulet & Swanson, 2004; Clauser, Margolis, & Swanson, 2002), but there also exists a paucity of validity evidence for computer-based simulations in the form of patient case scenarios. This review helps to address both gaps.
The review had three primary strengths. First, instruments identified in the review were developed to measure the higher construct of decision-making competence, which supports a theoretical framework described by Miller (1990). All instruments were CCSs such as VPs. Second, three studies with high critical appraisal scores presented strong psychometric evidence, with two of the three reporting on both reliability and validity (Klemenc-Ketis & Kersnik, 2013; Yu et al., 2015). The two studies reporting strong evidence of construct validity support the need for validity evidence which aims to measure latent variables such as decision-making (Klemenc-Ketis & Kersnik, 2013; Kassab et al., 2014). Third, the inclusion of replication studies in this review strengthens the external validity of the original instrument and contributes to understanding the development and validation of other instruments which aim to measure a similar construct.
This review has limitations. The inclusion criteria were somewhat restrictive in that we sought to identify instruments in which authors aimed to measure the latent construct of decision-making using computer-based clinical patient scenarios. In addition, we aimed to identify instruments in their original state in order to describe the psychometric measures of the instrument. These criteria narrowed the results to just eight instruments in a specific health-care field: medicine. We cannot exclude the possibility of bias, in that the studies relied on convenience samples of medical students recruited within the authors’ own medical schools. Finally, quality evidence of the instruments was based on a critical appraisal tool that was modified by the authors to fit the tool to instruments that measured decision-making, so the appraisal itself may be subject to bias. Other tools in the literature (Medical Education Research Quality Instrument, Quality Assessment of Diagnostic Accuracy Studies) may have guided a more accurate appraisal of the studies for methodological rigor.
This review supports prior findings that validity evidence for CCSs is sparse, often lacks a theoretical or validity framework from which to guide and interpret evidence, and is concentrated within specific specialties such as medicine (Cook & Triola, 2009; Cook et al., 2013, 2014). The present review provides a detailed summary of psychometric evidence and identifies studies, past and present, that either contributed to the development and understanding of the instrument or replicated the instrument in a different setting and population. Few studies were found that have reported on the replication of their instruments.
Findings of this review contribute to existing psychometric evidence of web-based, technology-enhanced simulation methods which aim to measure latent constructs such as decision-making. While the current review includes a specified population of medicine, many health-care professionals participate in decision-making; however, tools specific to the criteria utilized in this review were not found in other professions. This review lends itself to the awareness that similar tools which may be utilized in other professions are widely underreported. In addition, findings of this review contribute to the findings of similar studies in that validation research in technology and simulation-based assessment is lacking.
Future researchers are encouraged to ground validity evidence in a clear theoretical framework to guide the evaluation of their instrument. Rigorous approaches to psychometric evidence advance the science of validation and help to assure the public that health-care professionals are held accountable to rigorous standards in the validation of competence. Investigating the existence of similar tools beyond medicine, in health-care professions such as advanced practice nursing disciplines, is a worthwhile endeavor for future research.
Appendix A
Appendix B
Review Articles.
| Authors | Subjects | Method | Results |
|---|---|---|---|
| Bergin and Fors (2003) | Three field tests: Test 1: N = 17; Test 2: N = 23; Test 3: N = 30 | Tests 1–3 involved a paper-based questionnaire post–ISP cases regarding opinions of the ISP as a learning tool. Tests 2 and 3 included interviews by the test leader. | Most students were positive about the ISP and the way it presented cases. Students stated the desire for future cases like ISP in their training. |
| Courteille, Bergin, Stockeld, Ponzer, and Fors (2008) | N = 110 medical students in Year 4; pilot study a | Twelve-module OSCE used to test a VP-based system | The VP system showed reliability in differentiating between students’ performances but confounding influence by the use of VP assistants was noted; results were affected by students’ perceptions of the usefulness of assistance with the VP |
| Fida and Kassab (2015) a | N = 130 medical students in Year 4 | Internal consistency reliability of exam scores in different test item formats, interitem correlations between different exam scores using Pearson’s correlation, and predictors of CCS scores using hierarchical stepwise linear regression | CCS (α = .862), SAQs (α = .817), OSCE (α = .767), and PCE (α = .644); CCS scores all positively correlated with diagnostic performance, clinical reasoning, and management scores (p ≤ .01); OSCE scores predicted 33% and 35% of the variance in scores of clinical reasoning and patient management components of CCS examination, indicating the CCS tests a similar construct as that tested in OSCEs, but different from that which is measured by PCEs |
| Guagnano, Merlitti, Manigrasso, Pace-Palitti, and Sensi (2002) | N = 80 medical student graduates | Pearson’s correlation and interrater reliability | Step 1, Step 2, and total MIPP scores were moderately correlated with curriculum scores; moderate correlation between scores for Steps 1 and 2 (r = .44, p < .001). Interrater reliability: H&P, diagnoses, management = 0.8–0.9; communication skills = 0.7–0.8 |
| Jerant and Azari (2004) | N = 89 medical students in Year 3 | Diagnostic Thinking Inventory (DTI) to assess pre- and postclerkship scores using Wilcoxon signed rank test and Spearman rank correlation | Mean DTI subscale scores improved from beginning to end of the year (p < .001); no significant correlation between clinical reasoning score or level of diagnostic performance scores |
| Kassab et al. (2014) | N = 245 medical students in Year 4 | Exploratory and confirmatory factor analysis and structural equation modeling | Exploratory factor analysis yielded four underlying latent variables (constructs): clinical skills, procedural skills, medical knowledge, and reasoning skills. Reasoning skills had heavy loadings from the CCS measures (clinical reasoning, diagnostic performance, and patient management); the four constructs moderately correlated with each other; confirmatory factor analysis indicated the four constructs tapped a common construct (competence). Fit indices improved and reached an acceptable level of fit in the third model with identification of a common domain called “competence.” |
| Klemenc-Ketis and Kersnik (2013) | N = 147 medical students in Year 4; pilot study a | ICC, reliability using Cronbach’s α, and validity using factor analysis with principal component analysis | ICC = .742 and α = .848. Factor analysis with principal component analysis revealed four factors: initial assessment, physical examination planning, planning patient management, and patient education/involvement |
| Klemenc-Ketis and Kersnik (2014) a | N = 147 medical students in Year 4 | Ten-item assessment tool on a 5-point Likert-type scale with minimum score of 5 points and maximum score of 50 points | Mean total score 35.1 ± 7.0 points. Students scored higher in initial assessment items and lower in patient education/involvement items. Female students scored significantly higher on total assessment and on initial assessment/patient education/involvement (p < .001). |
| Kreiter et al. (2011) | N = 13 medical students in Year 2 (pilot); N = 143 medical students in Year 4 | Reliability using Cronbach’s α and generalizability after initial scoring using classical test theory analysis | Pilot: α − G = .70, which suggested the potential to generate valid scores with larger groups of examinees |
| Lin et al. (2013) a | N = 30 medical students in Year 1 of internship | Feasibility study to assess the educational effectiveness of an integrated OSCE (iOSCE) using standardized and VPs | 100% of the sample found the iOSCE to be satisfactory. The iOSCE was rated 4.4 and 4.5 on a scale of 1–5 on two clinical cases in terms of ease and helpfulness |
| Oliven, Nave, Gilad, and Barch (2011) a | N = 262 medical students (year not specified) | Reliability using Cronbach’s α and Pearson’s correlation | Human OSCE: α = .65–.74; VP: α = .82–.89; correlation of VP, OSCE, and human OSCE (over 3 years): r = .68, p < .001; computerized examination: α = .82–.89 |
| Yu, Straus, and Brydges (2015) | N = 75 to include medical students in Years 1 and 2, postgraduate trainees (n = 2), and endocrinologists | Validity evidence using: content validity; internal consistency and reliability using Cronbach’s α and exploratory factor analysis; relations with other variables using analysis of variance and Pearson’s correlation; consequences to determine a pass–fail cut point for discrimination between staff and trainees | Content validity: Expert review by subject matter experts and a scoring system based on a preexisting framework; internal consistency and reliability: α = .795 for the seven subscales of DKA management; relations with other variables: significant group difference, F(3, 71) = 11.2, p < .001; consequences: Optimal cutoff score occurred at a simulator score of 75% demonstrating high sensitivity (cutoff score able to identify 94.7% of practicing physicians) but low specificity (cutoff score able to exclude 48.2% of trainees) |
Note. CCSs = computer-based case simulations; SAQ = short-answer questions; MCQs = multiple-choice questions; OSCE = objective structured clinical examination; PCE = patient-based clinical encounter; H&P = history and physical; VPs = virtual patients; DKA = diabetic ketoacidosis; ISP = interactive simulation of patient; MIPP = Multimedia Integrated Pilot Project; ICC = intraclass correlation coefficient.
a Replication studies.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
