Abstract
Objectives:
The provision of courses in complementary and integrative medicine (CIM) varies widely between medical schools. To effectively improve CIM education, it is essential to use robust evaluation instruments that measure the impact of different educational interventions. This review aimed to identify and critically appraise qualitative and quantitative instruments used to evaluate CIM courses in undergraduate medical education.
Methods:
A systematic review was conducted in PubMed/MEDLINE, LIVIVO, CINAHL/EBSCO, Scopus, Web of Science, and Ovid/Embase in January 2023. Eligible studies included complete evaluation instruments for medical students and reported learning outcomes. Data extraction included information on the study design, the educational intervention, the evaluation instrument, and the outcome measure (e.g., Kirkpatrick levels: 1 reaction, 2a attitudes, 2b knowledge/skills, 3 behavioral change, 4 results). Instruments were categorized as validated, nonvalidated, or qualitative and analyzed using descriptive statistics. Validated instruments were assessed for quality using standardized criteria.
Results:
Of the 1909 records identified, 263 were subjected to a full-text review and 100 studies met the inclusion criteria. Twenty-seven studies reported on 14 validated instruments, 7 reported on qualitative instruments, and 66 reported on nonvalidated instruments. Most studies were conducted in the United States (31) and Europe (28); 51 were cross-sectional studies and 42 were intervention studies. Most of the instruments were self-developed (50), addressed general aspects of CIM (53), and assessed student attitudes (74). None of the validated instruments covered Kirkpatrick level 1, and only one covered level 3. Measurement of levels 2b and 3 was usually based on subjective self-assessment. Qualitative instruments covered the widest range of outcomes overall. Validated instruments often had good content validity and internal consistency, but lacked reliability and responsiveness. Revalidation of translated or modified instruments was mostly inadequate.
Discussion:
This structured and comprehensive set of existing instruments provides a starting point for the further development of CIM course evaluation in undergraduate medical education. Future studies should prioritize the measurement of higher-level learning outcomes, such as behavioral change and impact on patient care. Comparative intervention studies between medical schools or with pre–post designs and follow-up evaluations are needed to assess the effectiveness of different teaching approaches. Regular revalidation of both existing and newly developed instruments is essential to ensure their applicability to different audiences and settings. Their structured and standardized use would promote evidence-based CIM training and understanding of its impact on student competencies and patient outcomes.
Introduction
The use of complementary medicine has increased significantly in recent decades and has recently stagnated at a high level of 45%–70% of the population. 1–5 A number of positive effects on health care have been shown, for example, on chronic diseases 6 or on the use of antibiotics. 7 It is therefore important to bring together conventional and complementary approaches in a coordinated way. 8 The potential benefits of integrative health have already been demonstrated in a variety of settings, including pain management, symptom relief in cancer patients and survivors, and programs to promote healthy behaviors. 9 Complementary medicine refers to the use of a nonmainstream approach alongside conventional medicine. 9 Integrative medicine goes beyond this, as “Integrative medicine and health reaffirms the importance of the doctor-patient relationship, targets the whole person, is informed by evidence, and employs all appropriate treatment, preventive, health-promoting, or lifestyle approaches, health care professionals and disciplines to achieve optimal health and healing; equally emphasizing the art and science of healing. It is based on a social and democratic as well as a natural and healthy environment.” 10
A review by Joyce et al. shows that medical students perceive that the teaching of complementary and integrative medicine (CIM) is not adequately covered in their curriculum and that sufficient CIM knowledge and skills are important for their future as doctors to be able to advise and refer patients. Therefore, CIM content should be introduced or expanded upon. 11 Soliman and Bilszta conducted a scoping review and found an inconsistent inclusion of CIM teaching, even within medical schools in the same country. Medical curricula that include CIM teaching and learning vary widely in both content and delivery. 12,13 Although there is a need to equip medical students with CIM-related knowledge and practice, there is no research on the optimal way to impart this knowledge. 14 In the absence of such evidence, it is difficult to define clear objectives for a CIM teaching program within a medical curriculum and to develop strategies for its implementation. 15,16 For example, while interprofessional education is a promising approach in many countries for training health professionals together to improve patient-centered collaboration, little is known about how teaching CIM can promote both: interprofessional collaboration and CIM practice. 17,18 Well-designed and validated program evaluation instruments allow medical researchers to assess whether a program or teaching format has worked, how it has worked, why it has worked, and to gather more information about the implementation process of the educational intervention. By engaging in this process, the researcher will develop a sound understanding of the relationships between programs, the contexts in which they operate, and the outcomes that result. 19 There is currently no overview of the instruments available to evaluate CIM teaching. A systematic review from 2012 shows that high-quality program evaluations available at that time were scarce. 14
In order to be able to further develop teaching in the field of CIM, it is necessary to identify suitable evaluation instruments for CIM training, which, in the best case, can be used across faculties. Quantitative instruments, such as closed-ended questionnaires, allow the measurement of antecedents, learning gains, and impact on patient care, for example, by identifying differences and correlations. The use of validated instruments can significantly increase the reliability and validity of the data collected. Qualitative tools, such as interview guidelines, can be used to explore influencing factors and underlying patterns and attitudes, even in small groups. 20
Objectives
This study aims to identify which qualitative and quantitative instruments are available to evaluate CIM courses and programs in undergraduate medical education and to assess the quality of the validated instruments.
Materials and Methods
The review was conducted by members of the Committee for Integrative Medicine and Pluralism of Perspectives of the German Association for Medical Education in cooperation with the Forum of University Working Groups on Naturopathic Treatment and Complementary Medicine. The authors conducted a systematic review in accordance with the Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA) 2020 checklist for abstracts (see Supplementary Table S1) and the guidelines detailed in the PRISMA checklist (see Supplementary Table S2). 21 The review protocol was prospectively registered with PROSPERO, number CRD42022354606.
Data sources and search strategy
The following databases were searched: PubMed/MEDLINE, LIVIVO, CINAHL/EBSCO, Scopus, Web of Science, and Ovid/Embase on January 5, 2023. The search term was composed of the following elements: “undergraduate education,” “complementary medicine,” and “questionnaire,” and was adapted to each database. The search strategies are listed in Supplementary Table S3. The findings were imported into the Rayyan research collaboration platform (https://www.rayyan.ai/). Duplicates were deleted.
Eligibility criteria
All studies that met the following criteria were included: The study describes or provides an instrument for the evaluation of CIM training or CIM aspects. The instrument described may be either qualitative (e.g., an interview guide) or quantitative (e.g., a questionnaire) or both. The target population of the instrument is undergraduate medical students or interprofessional student groups that include medical students.
Studies describing or providing instruments for examinations or accreditation procedures were excluded, as were studies in which the instrument described could not be sufficiently reconstructed (not all questions listed, measurement method not specified) and publications available only as abstracts or study protocols. There were no restrictions on the types of studies included.
In the review protocol, we had originally specified that only studies directly related to CIM training would be included. However, at the beginning of the search process, we realized that this meant that psychometric studies and instruments for recording current training or knowledge would not be included, although they might be suitable for use in teaching evaluation, and we therefore changed this specification.
Selection of studies
The screening and selection of studies took place in two steps. In the first step, the titles and abstracts were checked for suitability. Studies were sorted alphabetically by first author. The first 20 studies were reviewed by all researchers (A.H., B.S.-S., G.R., K.F., M.T.) in a blinded mode. Any conflicting reviews were then discussed by the whole group in order to refine the previously defined screening strategy. The remaining studies were screened blindly by at least two researchers each (title and abstract). Any discrepancies and/or conflicts were resolved through discussion by the whole group. If discrepancies remained, the study was retained. In a second step, all remaining studies were included in the full-text screening. Two researchers independently screened each full text to decide on final inclusion or exclusion in the review. Any discrepancies and/or conflicts were resolved through discussion in the research group meetings. To check the objectivity of the rating in both steps, interrater reliability was calculated using Cohen’s kappa. 22
Foreign language (not German, English, or Spanish) studies were translated into English using DeepL Pro, and a person with the relevant language skills was consulted if there were any uncertainties.
Data extraction
A Microsoft Excel spreadsheet was created containing the following information: identification/demographic information (DOI, first author); study information (country, study design, sample size); educational intervention, if indicated (educational approach, duration); instrument information (approach, target group, subject/domains, and, if indicated, the name of the questionnaire); quality of the instrument (development process, validation status); and outcome measures (interest in learning, personal experiences, learning experiences, didactic issues, Kirkpatrick levels of learning outcome, 23 individual benefits).
The following rules were defined to specify the outcome measures prior to data extraction:
Interest in learning: Respondents are asked if they are generally interested in the subject of CIM or if they would like to be taught in this subject area.
Personal experience: Respondents are asked if they have any experience with CIM, for example, if they have visited alternative practitioners or know CIM applications from their family or use them for self-treatment.
Learning experience: Respondents are asked if they have received CIM training, what was taught, and how much.
Didactic issues: Respondents are asked about CIM teaching methods, specific delivery, or the choice of content, for example, whether they prefer particular teaching methods, formats, or content, or whether they find the program useful at a particular stage in their studies.
Reaction or satisfaction (Kirkpatrick level 1): Respondents are asked how satisfied they are with the CIM training and how good they think the training was (only if a course was actually offered).
Attitudes or perceptions (Kirkpatrick level 2a): Respondents are asked about their attitudes, opinions, and evaluations of CIM in general or of individual procedures.
Knowledge and skills (Kirkpatrick level 2b): Respondents assess their own knowledge or skills, answer knowledge questions, or are observed performing.
Behavioral change or intention to change (Kirkpatrick level 3): Respondents are asked if they are already using what they have learned, if they expect to use it in the future, or if they think it will benefit them in their professional practice.
Results (Kirkpatrick level 4): Captures changes in patient care (e.g., in health or well-being), organizational change (e.g., in work units or institutions), or community change.
Individual benefits: Respondents are asked whether they feel that they have personally benefited and learned something for their own health or personal development.
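To make the coding rules above concrete, the following is a minimal sketch of how one extraction record per instrument could be represented and tallied. It is an illustration only, not the authors' spreadsheet: the class names `Outcome` and `InstrumentRecord`, the function `tally_outcomes`, and the two example records are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field
from enum import Enum


class Outcome(Enum):
    """Outcome measures as named in the coding rules (labels are paraphrased)."""
    INTEREST_IN_LEARNING = "interest in learning"
    PERSONAL_EXPERIENCE = "personal experience"
    LEARNING_EXPERIENCE = "learning experience"
    DIDACTIC_ISSUES = "didactic issues"
    KIRKPATRICK_1 = "reaction or satisfaction"
    KIRKPATRICK_2A = "attitudes or perceptions"
    KIRKPATRICK_2B = "knowledge and skills"
    KIRKPATRICK_3 = "behavioral change or intention to change"
    KIRKPATRICK_4 = "results"
    INDIVIDUAL_BENEFITS = "individual benefits"


@dataclass
class InstrumentRecord:
    """One row of a hypothetical extraction sheet."""
    first_author: str
    category: str  # "validated", "nonvalidated", or "qualitative"
    outcomes: set = field(default_factory=set)


def tally_outcomes(records):
    """Count how many instruments address each outcome measure."""
    counts = Counter()
    for record in records:
        counts.update(record.outcomes)
    return counts


# Invented example records, not data from the review.
records = [
    InstrumentRecord("Hypothetical A", "validated", {Outcome.KIRKPATRICK_2A}),
    InstrumentRecord(
        "Hypothetical B",
        "qualitative",
        {Outcome.KIRKPATRICK_1, Outcome.KIRKPATRICK_2A, Outcome.KIRKPATRICK_3},
    ),
]
counts = tally_outcomes(records)
```

Storing outcomes as a set reflects that an instrument can be assigned to several outcome measures at once.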
Data extraction was carried out by one researcher. A second researcher performed a quality control of extracted data. All missing data were marked as such in the Excel spreadsheet.
Data analysis and synthesis
The identified studies are divided into the following three categories: Studies with validated instruments, studies with nonvalidated instruments, and studies with qualitative instruments. An instrument is considered validated if at least one psychometric test has been carried out. All studies and the instruments described in them are analyzed using descriptive statistics and summarized in a table. In addition, the data extracted from each study are also summarized in tables. As we do not analyze the data collected in each study as such, we do not measure the effects or synthesize these results.
Quality assessment
As we refer to the instruments provided, the quality of the studies themselves and the data collected in them will not be subject to any structured quality assessment, and no reporting or safety assessment will be carried out in relation to the data collected. Instead, we include instruments from the category “validated” in the quality assessment using the following established criteria for measurement properties of questionnaires: content, criterion, and construct validity; internal consistency; reproducibility; responsiveness; floor/ceiling effects; and interpretability. 24 Our approach followed the steps and scoring strategy described by Orfanos et al. 25 Each item within the criterion was assessed by two researchers (A.H., B.S.-S.) for adequate design and statistical reporting. If the criteria for a particular psychometric property were met, a score of 2 was assigned. If the design of the criterion was questionable, if there was no clear description of the aspects outlined, or if the criteria for a particular psychometric property were only partially met, a score of 1 was assigned. If no information was provided for a particular criterion, or if the criterion was explicitly not met, a score of 0 was given. The total score for each measure and each category of quality criteria is given for all measures. The two researchers initially evaluated each criterion independently. If they disagreed, an agreement was reached through argumentative discussion. The results are tabulated including the total score achieved.
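The 0/1/2 scoring rule described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' procedure: the property list is taken from the text (the cited criteria may subdivide some properties, so the maximum total shown here is an assumption), and the names `RATING_SCORES`, `total_score`, and `disagreements` as well as the example ratings are hypothetical.

```python
# Properties as listed in the text; the cited criteria may subdivide some of
# them, so the maximum total computed below is an assumption of this sketch.
PROPERTIES = (
    "content validity",
    "criterion validity",
    "construct validity",
    "internal consistency",
    "reproducibility",
    "responsiveness",
    "floor/ceiling effects",
    "interpretability",
)

# met -> 2, questionable or partially met -> 1, not reported or not met -> 0
RATING_SCORES = {"met": 2, "partial": 1, "absent": 0}


def total_score(ratings):
    """Sum the scores over all properties; unrated properties count as absent."""
    return sum(RATING_SCORES[ratings.get(prop, "absent")] for prop in PROPERTIES)


def disagreements(rater_a, rater_b):
    """List the properties on which two independent raters differ."""
    return [
        prop
        for prop in PROPERTIES
        if rater_a.get(prop, "absent") != rater_b.get(prop, "absent")
    ]


# Invented ratings for one instrument by two raters.
rater_a = {
    "content validity": "met",
    "internal consistency": "met",
    "construct validity": "partial",
}
rater_b = {
    "content validity": "met",
    "internal consistency": "partial",
    "construct validity": "partial",
}

to_discuss = disagreements(rater_a, rater_b)  # properties needing consensus
consensus = dict(rater_a)  # assume discussion settled on rater A's rating
score = total_score(consensus)
max_score = 2 * len(PROPERTIES)
```

Listing the disagreements first mirrors the described workflow, in which both raters score independently and only conflicting criteria are resolved by discussion.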
Results
Search process and study selection
Our electronic search identified a total of 2329 records. After reviewing the titles and abstracts, we identified 263 potentially relevant studies. We obtained two further studies from the reference lists of the identified studies and finally included 265 studies in the full-text screening, resulting in 100 included studies. Thirty-five studies described an evaluation instrument, but lacked information on the type, sequence, or completeness of the questions and could therefore not be fully reconstructed. These studies were excluded. The search process is shown in Figure 1.

Figure 1. Flowchart of the search process. Adapted from Page et al. 21
Interrater reliability was at the level of substantial agreement in both steps (step 1: kappa = 0.671, standard error 0.025, 95% confidence interval [0.623–0.720]; step 2: kappa = 0.657, standard error 0.043, 95% confidence interval [0.572–0.742]).
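For readers unfamiliar with the statistic, the sketch below shows how Cohen's kappa is obtained from a table of two raters' include/exclude decisions, and how a 95% confidence interval follows from a reported standard error. The source does not state how its intervals were computed; the normal approximation (kappa ± 1.96 × SE) is an assumption here, and the function names and the example table are hypothetical.

```python
def cohens_kappa(table):
    """Cohen's kappa for a square agreement table.

    table[i][j] is the number of items that rater A assigned to category i
    and rater B assigned to category j.
    """
    n = sum(sum(row) for row in table)
    k = len(table)
    observed = sum(table[i][i] for i in range(k)) / n
    row_share = [sum(table[i]) / n for i in range(k)]
    col_share = [sum(table[i][j] for i in range(k)) / n for j in range(k)]
    expected = sum(row_share[i] * col_share[i] for i in range(k))
    return (observed - expected) / (1 - expected)


def normal_ci(estimate, standard_error, z=1.96):
    """Confidence interval under a normal approximation (assumed method)."""
    return estimate - z * standard_error, estimate + z * standard_error


# Invented screening decisions (rows: rater A, columns: rater B;
# category 0 = include, category 1 = exclude).
example_table = [[40, 10], [5, 45]]
kappa = cohens_kappa(example_table)

# Applying the assumed approximation to the step 1 values reported above.
lower, upper = normal_ci(0.671, 0.025)
```

With the step 1 values, this approximation gives bounds close to the reported interval; small differences in the third decimal place would be expected if the published kappa and standard error were rounded.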
A total of 14 different instruments were included in the 27 studies that reported validated instruments. Seven of the included studies reported a qualitative instrument, whereas 66 described nonvalidated instruments.
Characteristics of the included studies
Of the 100 studies identified, 31 were conducted in the United States and 28 in European countries. Fifty-one studies were cross-sectional studies and 42 referred to an educational intervention. Fourteen studies used a single-arm pre–post design, three studies included a control group, and four were randomized. The studies that included other students as well as medical students mostly included students from other health professions, such as pharmacy or nursing. Exceptions are the studies by Iktidar 2022 and Xie 2020, which also included students who had not studied medicine, health care, or any other patient-related subjects. 26,27 Of the 10 studies that included participants other than students, some included teachers 28–33 or professionals, 34–36 and one included patients. 37
The majority of the quantitative studies (n = 48 out of 93) were conducted with 100–499 participants. The four large studies with 1784, 38 2292, 27 2004, 39 and 2839 40 participants were national surveys. International studies were conducted by Schmidt et al. 2005 and Rees et al. 2009. 41,42
Table 1 provides an overview of the characteristics of the included studies.
Characteristics of the Included Studies
Multiple assignments possible.
CT, controlled trials; RT, randomized trial; RCT, randomized controlled trial.
Characteristics of the evaluation instruments
Half of the authors (50 out of 100) stated that they had developed the instruments themselves; 35 based them on previously published questionnaires. Five studies with validated instruments also consulted a panel of experts. 35,38,43–45 Other studies indicated that the development process was iterative, 46,47 involved internal revision, 48 or was carried out with a multidisciplinary team. 49
Most of the identified instruments related to CIM in general. Some combined CIM with other aspects, such as health issues or conventional medicine. Evaluation instruments on specific topics dealt with, for example, osteopathy, Chinese Medicine, kampo medicine, or acupuncture.
Most of the instruments, especially the validated ones, measure students’ attitudes (Kirkpatrick level 2a), for example, their attitudes toward specific therapeutic methods and concepts in relation to CIM in general. None of the validated questionnaires measured Kirkpatrick level 1. This is mainly because there are no intervention studies in this category and therefore reaction to or satisfaction with an educational intervention could not be measured. In contrast, of the seven studies describing qualitative instruments, five are intervention studies, four of which measure satisfaction with the intervention. 50–53 Kirkpatrick level 2b (knowledge and skills) was usually assessed by subjective self-assessment of the students in relation to their existing knowledge or knowledge acquisition. Only the unvalidated questionnaire by Karpa et al. 54 asked objective questions about CIM knowledge before and after the training program. Kirkpatrick level 3 (behavioral change) was also addressed more frequently in the qualitative instruments and referred mainly to the intention to change rather than actual changes. For example, the validated Furnam et al. 1999 questionnaire 45 included questions about treatment preferences. Other validated questionnaires also asked Kirkpatrick level 3 questions, that is, outside the validated range, to assess intended behavioral change, for example, whether students would recommend CIM methods or think they would use the knowledge gained in their future careers. 27,34,43,55,56 Similarly, five of the seven qualitative instruments only assessed whether students intended to recommend or practice CIM, or thought the gained knowledge would contribute to their work as physicians. 46,51–53,57 None of the instruments identified measured Kirkpatrick level 4 (results). Overall, the qualitative studies are the most comprehensive in terms of outcome measurement.
An overview of the characteristics of the evaluation instruments is given in Table 2.
Characteristics of the Evaluation Instruments
Data in brackets: additional questions, not within the validated scale.
As stated by the authors, multiple assignments possible.
CAM/CIM, complementary and alternative medicine/complementary and integrative medicine.
Table 3 lists the validated instruments and Table 4 lists the qualitative instruments. The nonvalidated instruments are provided in Supplementary Table S4 (instruments regarding CIM training in general) and Supplementary Table S5 (instruments for specific topics in CIM training).
Validated Evaluation Instruments for Complementary and Integrative Medicine Training
Extended: additional questions were added.
Area: as stated by the authors, areas in brackets not within a validated scale.
Data in brackets: additional questions, not within a validated scale.
Learning outcome described with Kirkpatrick level: 1 = reaction, satisfaction, 2a = attitude, 2b = knowledge and skills, 3 = behavioral change or intention to change.
+ qual: additional open questions.
CAIMAQ, Complementary Alternative and Integrative Medicine Attitude Questionnaire; CAM, complementary and alternative medicine; CHBQ, CAM Health Belief Questionnaire; FAN, Freiburg Questionnaire on Attitudes toward Alternative Medical Procedures; HCAMQ, Holistic Complementary and Alternative Medicine Attitude Scale; IAPSU, Instrumento de Avaliação da Promoção da Saúde na Universidade; IM, Integrative Medicine; IMAQ, Integrative Medicine Attitude Questionnaire; n.r., not reported; QACAM, Questionnaire on Attitudes toward Complementary Medical Treatment; QAPT, Questionnaire on Attitudes Toward the Use of Psychotherapeutic Help; TCM, Traditional Chinese Medicine.
Qualitative Evaluation Instruments for Complementary and Integrative Medicine Training
Area as stated by the authors.
Kirkpatrick level 1 = reaction, satisfaction, 2a = attitude, 2b = knowledge and skills, 3 = behavioral change or intention to change.
CAM, complementary and alternative medicine; CIM, complementary and integrative medicine; MBM, mind body medicine; n.r., not reported; TCM, Traditional Chinese Medicine.
Quality of the validated evaluation instruments
Two instruments were excluded from the quality assessment based on the standardized list 24 because the studies used an instrument that had been validated in another study, and was not originally aimed at medical students, without revalidating it. Kirsoy 66 administered the questionnaire developed in English by Hyland, 67 which was validated for use with patients, to medical students in Turkish. Teixeira et al. 44 used the Assessment Tool for Health Promotion at the University (IAPSU) 68 without revalidation. This questionnaire is self-administered and used by Brazilian researchers after data collection in nursing and physiotherapy courses.
The analysis of the 12 included instruments showed that only two achieved more than half of the total number of points. 35,77 While most of the instruments scored well in terms of content validity and internal consistency and had no floor/ceiling effects, there were considerable shortcomings in the areas of criterion validity, reproducibility, and responsiveness. The results of the analysis are shown in Supplementary Table S6. The lack of retesting is also evident in the translated instruments. The validated Complementary and Alternative Medicine Health Belief Questionnaire (CHBQ) 61 was translated into Danish, Serbian, Chinese, and German. The German translation was revalidated by Hinse et al. 65 using exploratory factor analysis, score, and split half reliability, but no further validation was carried out for the other translated questionnaires. Even when untranslated questionnaires were used repeatedly, they were often not revalidated, even though some questions were added and/or the target group was changed. 29,43,58,63,69
Discussion
The aim of this systematic review was to provide an overview of the quantitative and qualitative instruments available to evaluate CIM courses and programs in undergraduate medical education and to assess the quality of the validated instruments. A total of 100 studies were identified that included currently available evaluation instruments. Of these, 27 reported on 14 validated instruments, 7 on qualitative instruments, and 66 on nonvalidated, primarily quantitative instruments. Twelve validated original instruments were assessed in detail for quality. Both the low proportion of validated instruments overall and the results of the quality assessment show that there is a paucity of high-quality instruments.
Limitation
Due to database restrictions our results may be biased. We primarily searched English-language databases, so it is possible that additional instruments exist in other regions or have not been published. This may also explain the preponderance of studies from English-speaking countries. However, it is notable that there are only a few publications from Australia and Canada, and we found relatively few duplicates in five databases, despite similar search strategies. It is possible that the inclusion of other databases would have resulted in even more hits. When selecting the literature, there were many studies that did not include the complete evaluation tool or did not provide it as an appendix. In this case, the individual parts of the tool were reconstructed from the article, in particular from the tables and figures. It is possible that some aspects were overlooked. Many of the instruments were not primarily used to evaluate teaching. Their suitability for this purpose should be further investigated. In assessing the quality of the instruments, we made only a limited assessment of the quality of the underlying studies. This could have added further aspects to the quality assessment.
Study design
The studies identified are mainly cross-sectional studies that have been used to measure attitudes. Kirkpatrick emphasizes that the assessment of attitudes, knowledge, and behavioral changes requires a pre–post design and the use of a control group. 23 None of the studies met these criteria. Some of the studies were conducted as national surveys with a very large number of respondents. The aim of these studies was to determine the baseline situation rather than to evaluate a course. Nevertheless, we assume that the instruments could also be used for a pre–post study. In the case of the many nonvalidated cross-sectional and intervention studies with more than 100 participants, the opportunity for validation was missed.
Outcome measures of the instruments
Most of the instruments identified focused on assessing students’ attitudes and perceptions, and none of the validated instruments captured actual changes in behavior or patient care. This finding is consistent with other studies that have examined outcomes in CIM education. For example, Quartey et al. 14 also found a lack of studies that objectively assessed behavioral change and longer term objective outcomes. Soliman and Bilszta 12 noted that the instruments used mainly assessed student satisfaction and learning success through self-assessment, but rarely effectiveness in terms of changing clinical practice. The qualitative instruments included in this review covered the widest range of outcomes. In general, qualitative research aims to understand the full experiences of research participants, 82 so qualitative evaluation can provide a fuller understanding of the educational intervention and capture a wider range of targeted outcomes. Qualitative research is often criticized for a lack of validity or rigor. 82,83 However, qualitative evaluation methods have been found to follow a number of best practices, namely the use of direct data sources, structured data collection instruments, nonleading questions, and expert evaluators. 82 The analysis of best practices in qualitative evaluation in ethics education revealed that other best practices were missing: Data should be collected using multiple sources, methods, raters, and time points, and triangulation analyses should be used to assess convergence. 82 Qualitative evaluation methods in CIM education need to address these weaknesses by including multiple sources (students, patients or standardized patients, faculty), multiple methods (e.g., free-text options in questionnaires, short interviews, focus groups), multiple raters (e.g., faculty, examiners, students themselves), and multiple time points (e.g., mid-course, at the end of the course, months or years later).
Qualitative evaluation methods make an important contribution to identifying the advantages and disadvantages of different educational interventions. 84 However, evaluation methods need to be chosen according to their appropriateness for the purpose, particularly in relation to the resources available and the intended use. 85
Quality assessment
The quality assessment of the 12 validated questionnaires showed that most of them were of poor to medium quality. Relatively high scores were obtained for internal consistency. This may be because only a small proportion of the total number of instruments were categorized as validated and therefore included in this quality assessment. The lack of criterion validity, reproducibility, responsiveness, and interpretability may indicate that even many of the validated instruments described were developed to assess the quality of individual courses at the respective medical faculty and therefore do not claim general validity. The focus of some studies may not have been primarily on developing a valid instrument that could be used universally. As a result, there are few gold standards, group comparisons, or references to the general interpretation of the results. This would be necessary if instruments are to be used in medical education research to provide cross-cutting evidence, for example, by comparing the effectiveness of different educational interventions. Other systematic reviews evaluating questionnaires also show low scores, possibly for the same reason. For example, Orfanos et al. 25 used the same quality assessment tool 24 to rate the quality of eight therapeutic group process questionnaires. Overall scores ranged from 2 to 6 out of 18, with an average score of 3. Quartey et al. conducted a systematic review of CIM training for health professionals 14 and assessed the overall quality of the 10 relevant studies identified using the Medical Education Research Study Quality Instrument. 86 They found only moderate study quality, with the lowest mean scores for validity of evaluation tools and sampling. Another reason may be that further validation of some instruments is planned, but has not yet been carried out or published. 
However, the validation process of some of the validated instruments we identified was not continued in further applications in the assessment of CIM teaching. 11,12 Zhao et al. 87 used an instrument designed for the assessment of cross-sectional studies for their systematic review on CIM teaching among nursing students. The assessment tool did not include items to assess the survey instrument. Moreover, no follow-up was obtained in any of the 26 included studies.
Almost all identified instruments were originally developed and published in English. Some of the validated instruments were translated. Translation requires revalidation, 88 which has only been done for the German version of the CHBQ. 65 Validation in general is not about properties that can be determined once and for all; even a series of studies cannot achieve this. Reliability, and especially validity, are incremental, never-ending processes, as scales are continually used with different groups of people and in different settings, and their psychometric properties need to be established with them. 89 It is therefore important for researchers and teachers to embark on this journey and to develop their instruments in a theory-based and informed way. The evaluation instruments identified in our review were often not used a second time, and when they were, they were not revalidated even though the target groups differed or the instruments had been modified.
Future research
Based on our findings, future research should develop high-quality and comprehensive instruments for evaluating undergraduate medical education in CIM. The currently validated questionnaires, which mainly capture general attitudes toward CIM, contribute little to the development of CIM teaching. However, they can help to identify learning needs. Currently, the courses offered in the faculties vary widely in scope and content, so it is important to find out which teaching methods and content are useful and actually contribute to students’ skills development and improve patient care. This will require more intervention studies, ideally with a before-and-after survey and follow-up. To demonstrate the benefits of CIM teaching, it is necessary to include questions that cover not only the intended change in behavior but also the actual change in behavior and patient care outcomes. In addition, comparative studies could be conducted to examine the effects of different didactic implementations and settings. This also requires researchers to look beyond their own teaching responsibilities and (further) develop and validate instruments that can be used across faculties. Structured, repeated use of these newly developed instruments could contribute to a deeper knowledge of CIM teaching programs and their impact on student competencies and patient care.
Conclusions
This systematic review shows that there are still no reliable, comprehensive instruments for the evaluation of CIM programs. Although many instruments have been identified that can be used in undergraduate medical education, there are still no validated instruments to assess higher learning outcomes such as behavioral changes. The structured tabulation of the 100 identified studies with evaluation instruments and the identification of existing gaps is a first milestone for the structured development of high-quality and comprehensive questionnaires. The identified qualitative instruments cover the broadest range of outcomes and could be used for the development of new instruments. A systematic evaluation of CIM training and its impact is necessary to further develop CIM training and to demonstrate what works for whom and under what conditions.
Footnotes
Acknowledgment
The authors would like to thank Katrin Schüttpelz-Brauns for her methodological advice on conducting the systematic review.
Authors’ Contributions
A.H.: Conceptualization, data curation, investigation, methodology, validation, visualization, and writing—original draft. G.R.: Data curation, investigation, methodology, and writing—review and editing. M.T.: Data curation, investigation, methodology, and writing—review and editing. K.F.: Data curation, investigation, methodology, and writing—review and editing. B.S.-S.: Conceptualization, investigation, methodology, supervision, validation, and writing—review and editing.
Author Disclosure Statement
The authors declare that they have no conflicts of interest.
Funding Information
Language editing was supported by the German Association for Medical Education. The authors received no funding.
Supplementary Material
Supplementary Table SA1
Supplementary Table SA2
Supplementary Table SA3
Supplementary Table SA4
Supplementary Table SA5
Supplementary Table SA6
References