Abstract
Multisource feedback (MSF) is widely used in performance management and leadership development, yet there is still no consensus on whether it predicts improvements in employee performance. This study presents a protocol for a systematic review and meta-analysis investigating whether the use of person-moderated MSF predicts changes in individual work performance. The review is guided by an integrative framework drawing on feedback intervention theory and existing MSF models. A systematic search and meta-analytic synthesis of the empirical literature will be conducted, and potential moderators of the MSF–performance relationship will be examined to identify under what conditions effects occur. By clarifying whether and when MSF contributes to performance improvement, the study aims to provide evidence-based insights for researchers and practitioners designing feedback systems in organizations.
The review will be guided by the following questions:
1. Does the use of MSF predict changes in work performance?
2. Do certain moderators affect the relationship between MSF and changes in work performance? The potential moderators are:
• The number of feedback sources
• The characteristic of feedback (positive vs. negative vs. evenly balanced)
• Employee perceptions of and reactions to feedback
• Psychological individual differences
• Follow-up activities
• The intended purpose of MSF
• The format of feedback
• The delivery mode of feedback
• Socio-cultural individual differences
Keywords
Background
Description of the Condition
Executives in organizations seek ways to maintain and improve their employees’ performance. One of them is providing employees with feedback on how much progress they have made toward the goals they are supposed to accomplish. Some organizations try to enrich this feedback by offering information from multiple sources: supervisors, co-workers, subordinates, clients, etc. Multisource feedback (MSF), also known as 360-degree feedback, multi-rater feedback or full-circle feedback, gained popularity in the 1990s and now appears to be a common and well-consolidated management practice (Bracken et al., 2016; Church et al., 2019; Lawrence & Bachkirova, 2023; Lobo Moreno et al., 2021). However, estimates of its prevalence vary substantially. For example, survey research has variously found MSF to be used by 34% of HR leaders in the UK (Chartered Institute of Personnel and Development [CIPD], 2016), a third of US companies (Bracken et al., 2001), “up to 50% of medium and large organizations” in the US (Silverman et al., 2005) and up to 90% of Fortune 500 companies (Alexander, 2006; Edwards & Ewen, 1996). Despite these differing estimates, several sources confirm an increase in the use of MSF. For instance, according to 3D Group, the use of MSF grew from 27% in 2003 to 48% in 2013 (as cited in Bracken et al., 2016). The use of MSF seems to be intensifying with recent rapid technological development. Sources claim that the current MSF software market of 1,202.30 billion USD is expected to roughly double in the next eight years, reaching 2,539.99 billion USD by 2033 (Straits Research, as cited in 9cv9, 2025).
The management literature contains differing views on the value that MSF provides. Some popular commentators recommend it as a tool that can make employees feel they have a say in the management process, help managers enhance their skills, and lead to overall improved performance (e.g., Arruda, 2023; Forbes Coaches Council, 2017; Kaplan, 2011; Zenger, 2018). Others criticize its subjective nature and difficult implementation and claim that, when not used properly, MSF may do more harm than good (e.g., Buckingham, 2011; Mauer, 2025; Ryan, 2015). According to recent surveys, employee attitudes towards MSF also appear rather negative. For instance, LiveCareer reports that 74% of employees who received MSF felt that it was “unfair, biased or inaccurate”, and 79% would choose not to participate in MSF if they had the option (Escalera, 2025).
The academic sources on MSF also indicate that its results are not straightforward, but depend on a number of moderators, possibly explaining some of the contradictory opinions published. One source of complexity lies in the diverse purposes for which MSF is used, which are typically categorized as either developmental or decision-making in nature (Church et al., 2019). Subsequent research has further explored the contextual and boundary conditions under which MSF is effective, including organizational culture, feedback orientation, leadership level, rater composition, and the availability of coaching or follow-up interventions. As a result, the MSF literature has expanded to cover a wide range of specific topics, such as psychometric properties of MSF instruments, operationalization choices, rater training and preparation, and the use of MSF across different organizational and cultural contexts (Church et al., 2019).
Despite these advances, the impact of MSF on work performance remains unclear. Divergent outcomes associated with MSF can be attributed to variation in how MSF is conceptualized and defined, insufficient clarity regarding its intended purpose, misalignment between purpose and decision-making, differences in implementation practices – including different measurement approaches – and a lack of accountability mechanisms (Bracken et al., 2016; London & Smither, 2019). Furthermore, research on MSF has often been characterized by overgeneralizations and misinterpretations driven by methodological limitations and the inappropriate use of prior studies. One example discussed by Bracken et al. (2016) is Kluger and DeNisi’s (1996) meta-analysis of general feedback interventions, which was not designed to examine MSF yet is often invoked as evidence for its (in)effectiveness. Bracken et al. (2016) also challenge previous meta-analyses, such as the one by Smither et al. (2005), because of methodological decisions such as combining studies with different methodological designs. These issues complicate the interpretation of MSF’s outcomes, potentially explaining the contradictory conclusions found in both scholarly and practitioner-oriented sources.
Given these ambiguous outcomes and the complexity of the problem, the need for a rigorous meta-analysis on the relationship between MSF and performance becomes particularly salient. MSF is widely used in contemporary organizations, and the market for MSF tools and platforms continues to grow, reflecting sustained managerial interest and significant organizational investment, including software platforms, administrative resources, feedback processes, and coaching activities. It is therefore important to understand whether these investments translate into measurable performance improvements and to clarify what performance-related outcomes organizations can realistically expect from MSF, and under what conditions such outcomes are more likely to emerge. Notably, as early as 2016, Bracken et al. explicitly called for more meta-analyses addressing specific MSF-related questions, arguing that such work is essential for moving the field beyond polarized debates and anecdotal claims. However, such studies remain scarce. A systematic review and meta-analysis can therefore play a critical role in integrating fragmented findings and identifying key moderators that explain when and how MSF contributes to performance.
Description of the Intervention
MSF is a process that attempts to provide feedback to individuals (‘ratees’) from multiple feedback providers (‘raters’). The MSF process frequently starts with data collected through a multi-item survey. Raters are asked to evaluate ratees either quantitatively (e.g., on a Likert scale) or qualitatively (i.e., through open-text comments). The data are then aggregated and presented in a report that may include evaluative and descriptive components. The report is provided to the ratee through different delivery formats, such as a written or digital report (e.g., email or online platform), or through facilitated feedback sessions involving a coach, supervisor, or workshop. In addition to differences in delivery mode, MSF systems also vary in expectations regarding how the feedback should be used. In some cases, the report is primarily informational, whereas in others recipients are expected to discuss the results, develop action plans, or integrate the feedback into development or performance management processes. The feedback may therefore serve developmental purposes (e.g., informing learning goals), administrative purposes (e.g., supporting personnel decisions), or both. The process may conclude with a follow-up review or discussion intended to evaluate progress or the usefulness of the feedback.
In the literature, there is no single definition of MSF. The main differences concern the characteristics and number of sources. For example, 360-degree feedback typically requires ratings of a manager from the key constituencies representing the full circle of relevant viewpoints – subordinates, peers, supervisors (possibly including higher-level supervisors along with the direct supervisor), customers and suppliers who may be internal or external to the organization, and self-ratings. On the other hand, upward feedback, which according to several authors (e.g., Smither et al., 2005) is also considered a form of MSF, is based on evaluations from multiple subordinates. However, regardless of differences in how MSF is defined and applied, the intervention must contain the following elements:
a. Evaluation of the ratee’s behavior or performance,
b. Evaluation from at least two different sources (people),
c. Sharing of the evaluation results with the ratee.
In the section “Methodology: Criteria for including and excluding studies” we describe in more detail the definition of the MSF intervention used in the current review.
How the Intervention Might Work
Feedback Intervention Theory
The theory of MSF draws on the broader feedback intervention theory developed by Kluger and DeNisi (1996). This theory lays the ground for how feedback works, in general, to bring about a change in several outcomes, including performance. According to the theory, the feedback process starts with the cues present in the feedback message, which prompt self-regulation (Baumeister et al., 1994; Carver & Scheier, 1981). Depending on the situation and on the recipient’s developmental needs, feedback cues can direct recipients’ attention towards action according to one of three levels of regulation processes (Kluger & DeNisi, 1996):
a. Meta-task processes, which focus on ratees’ core behaviors and psychology: these processes link the feedback cues to goals that are higher level for the recipient’s self than the task that prompted the feedback, such as maintaining self-esteem or control. This can involve a focus on the self or on affective processes, with the result that fewer attentional resources are available for solving the task. Feedback cues that may activate meta-task processes include normative information, discouraging/praising messages, or feedback messages received from a human (versus via a computer interface).
b. Task-motivation processes, which compare the feedback cues to the standards for performance and identify any discrepancies. Based on the result, ratees may increase their effort (for negative discrepancies) or decrease it (for positive discrepancies). Furthermore, these processes monitor what happens as a result of the effort adjustment, and further maintain or adjust its levels. An example of cues that may activate task-motivation processes is a message about the progress regarding performance on a previous task.
c. Task-learning processes, which link the feedback cues to details of the task at hand, creating hypotheses about how to better perform the task and trying them out while monitoring the results. If results are as expected, then the hypothesis is confirmed and a new strategy for the task at hand has been created. Feedback cues that activate these processes are focused on the components or details of the task at hand.
According to feedback intervention theory, meta-task processes lead to decreases in performance, because they divert the attention away from the task. On the other hand, task-motivation processes and task-learning processes both increase performance. However, these links depend on moderators, such as task characteristics. If the task is known to the feedback recipient and therefore can be correctly executed with few attentional resources, then meta-task processes might actually improve performance. Figure 1 presents a schematic overview of the feedback intervention theory.
Figure 1. A schematic overview of the feedback intervention (FI) theory. Reproduced from Kluger & DeNisi (1996), Figure 5.
In the case of MSF, feedback intervention theory is useful in understanding what might be the mechanism behind action regulation following feedback delivery. The fact that feedback cues come from several sources can introduce other factors in interpreting these cues and activating subsequent action regulation processes. Some authors have developed theoretical models specific to MSF to explain its possible links to performance. We will describe one such model below.
Theoretical Model for Understanding Performance Improvement Following Multisource Feedback
The central assumption of MSF is that aggregated information from several raters will result in a more accurate and/or fuller representation of one’s work behavior or performance than feedback from a single rater, and therefore, will lead to superior improvements in performance. In this review, we focus on the extent to which MSF predicts change in performance as an intervention in its own right. We do not examine the accuracy of MSF as a measurement tool or gauge of performance.
Various attempts have been made to better understand the processes underlying the MSF-performance relationship. Some research suggests that the impact of MSF on behavior change or performance improvement is not direct; instead, there are certain factors that may influence the direction and strength of this relationship. For example, Smither et al. (2005) proposed a theoretical model to explain under what conditions MSF may improve performance (Figure 2). The model organizes existing research that attempts to explain the MSF-performance relationship. According to the authors, eight aspects of MSF are relevant for the extent of behavior improvement: characteristics of the feedback, initial reactions to feedback, ratee personality, feedback orientation, perceived need for change, beliefs about change, goal setting and taking action.
Figure 2. Theoretical model for understanding performance improvement following multisource feedback. Reproduced from Smither et al. (2005), Figure 1.
The first element of the model is the characteristics of feedback (i.e., positive or negative feedback), which affect the ratee’s initial reaction (positive or negative) to the received feedback. Reaction to feedback, in turn, influences the goals that the ratee sets. For example, Kluger and DeNisi’s (1996) meta-analysis demonstrated that strongly negative feedback, which threatens the ratee’s self-esteem and drives negative reactions, may be rejected or may lead ratees to abandon their goals. Goal setting has an impact on taking action, which determines changes in ratees’ performance. In other words, people who have clearly set goals are more likely to change their behavior to improve performance (Locke & Latham, 1990). Additionally, reactions to feedback, goal setting, and action-taking are influenced by ratees’ personality and feedback orientation. For example, personality variables such as conscientiousness and openness to experience were found to be positively related to setting and attaining goals after receiving feedback, and to performance improvement following feedback (e.g., Dominick et al., 2004). Moreover, people who are feedback-oriented (i.e., have a predisposition to seek and use feedback; London & Smither, 2002) show greater acceptance of the feedback they receive (Rutkowski et al., 2004). Goal setting and action-taking also depend on the ratee’s beliefs about change and perceived need for change. Specifically, beliefs that the change in behavior is possible and will result in some positive outcomes, as well as a perception that the change is necessary, are suggested to enhance the likelihood of setting performance improvement goals and action-taking (Smither et al., 2005).
Theory-Based Moderators
Based on the theoretical model by Smither et al. (2005) we propose the following list of moderators:
We complement the list of potential moderators with factors commonly discussed in the relevant literature as shaping the outcomes of MSF:
Why It Is Important to Do This Review
Several systematic reviews and meta-analyses relevant to the topic have been conducted. Among them, we can distinguish studies that focus on MSF as a predictor of change in performance from studies that examine the use/feasibility and psychometric properties of MSF.
1. Reviews on MSF as a predictor of performance outcomes:
Smither et al. (2005) published a systematic review and meta-analysis of 24 longitudinal studies on the effects of MSF on performance improvement. They found that improvement in direct report, peer, and supervisor ratings over time was generally small (i.e., there was little variation in these measures over time). The review lacks explicit and transparent quality appraisal of primary studies and does not address the risk of publication bias. The authors suggested further research could investigate “under what conditions and for whom is multisource feedback most beneficial”.
Ferguson et al. (2014) conducted a systematic review following PRISMA guidelines. They included 16 studies published up to November 2012, involving physicians in healthcare settings from Canada, the UK, the Netherlands, and the US. The review aimed to assess the impact of MSF on the professional practice of medical doctors and ascertain under what conditions MSF is most successful. The review found that MSF can lead to improved performance of medical doctors, contingent upon moderators (e.g., the credibility of raters, factors affecting acceptance of feedback, facilitation of feedback delivery). Overall, the findings support MSF as a promising intervention to improve the in-role competence of physicians in healthcare settings. The main limitations of the study were an exclusive focus on healthcare settings, risk of publication bias, and a design that does not allow MSF to be investigated as a predictor of change in performance.
2. Reviews that examine use/feasibility and psychometric properties of MSF:
We also note systematic reviews on the use and/or psychometric properties of MSF. This is not the focus of the proposed review, but it is a crucial assumption in our research questions. Al Khalifa et al. (2013) conducted a systematic review of 8 studies, published in English between 1975 and 2012, which aimed to describe the use and psychometric characteristics of MSF in healthcare settings. The review had an exclusive focus on surgeons in healthcare settings, contained a risk of publication bias (only published research in English was sought), and has limited external validity beyond this context. Overall, the study found that MSF is a feasible, reliable, and valid means of assessing surgeons on a broad range of soft skills.
Donnon et al. (2014) produced a systematic review with the purpose of investigating the reliability, generalizability, validity, and feasibility of MSF for the assessment of physicians. Forty-three English-language articles were included. The review’s results indicated that MSF was a valid method for providing feedback to physicians from a multitude of specialties about their clinical and nonclinical (i.e., professionalism, communication, interpersonal relationship, management) performance. Some limitations of the study were an exclusive focus on healthcare settings and the risk of publication bias.
The above-mentioned research has made a significant contribution to a better understanding of changes in performance following MSF intervention. Nevertheless, the identified meta-analyses and systematic reviews have serious limitations. First, the methodological quality of the included primary studies was often not assessed. Second, the existing reviews often did not address the risk of publication bias. Third, the context of the research was limited: three out of four identified reviews were conducted in a healthcare setting. The review of Smither et al. (2005), which addressed a broader organizational context, was published more than fifteen years ago. Thus, a review that would include state-of-the-art evidence on MSF and performance in an organizational context is needed.
The current review aims to investigate whether MSF predicts changes in work performance. The research will include evidence from broad organizational contexts and apply a high-quality, standardized procedure for a systematic review with meta-analysis, following the Campbell standards. Moreover, we will explore what moderators may enhance or diminish the relationship between MSF and work performance. We trust that the results of our research will give relevant and practical insights to practitioners, and will help them make better-informed and more accurate decisions on MSF use in their workplace.
Objectives
This systematic review aims to investigate whether the use of person-moderated MSF predicts changes in work performance. The review will be guided by the following questions:
1. Does the use of MSF predict changes in work performance?
2. Do certain moderators affect the relationship between MSF and changes in work performance? The potential moderators are:
• The number of feedback sources
• The characteristic of feedback (positive vs. negative)
• Employee perceptions of and reactions to feedback
• Psychological individual differences
• Follow-up activities
• The intended purpose of MSF
• The format of feedback
• The delivery mode of feedback
• Socio-cultural individual differences
Methods
Criteria for Considering Studies for This Review
Types of Studies
This systematic review focuses on MSF as a predictor of change in performance. For this reason, the designs of the included studies need to be adequate to establish a time-order relationship between variables. Studies must thus measure ratees’ performance at two or more points in time, with at least one of these performance measures obtained before MSF and another obtained afterwards.
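The time-order requirement above can be expressed as a simple screening check. The following is a minimal sketch only; the `Measurement` structure and the `has_pre_and_post` helper are hypothetical names introduced for illustration and are not part of the review protocol.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Measurement:
    """One performance measurement wave in a primary study (hypothetical structure)."""
    label: str
    before_feedback: bool  # True if collected before MSF was delivered


def has_pre_and_post(measurements: List[Measurement]) -> bool:
    """Return True if at least one wave precedes and at least one follows MSF."""
    has_pre = any(m.before_feedback for m in measurements)
    has_post = any(not m.before_feedback for m in measurements)
    return has_pre and has_post


# A study with a baseline and a follow-up wave satisfies the design criterion.
study = [Measurement("T1", True), Measurement("T2", False)]
print(has_pre_and_post(study))
```

A study reporting only post-feedback waves would fail this check and would not establish the required time order.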
We include intervention studies that contain relevant control groups (either single-source feedback or no feedback) to compare with the MSF treatment, as well as observational studies that have no comparison groups. Studies that meet these criteria could include:
• Experimental designs: randomized controlled trials
• Quasi-experimental designs: non-randomized controlled studies, non-controlled before-after studies, and interrupted time series.
Although authors emphasize the importance of separating the performance measure from the MSF evaluation (e.g., Bracken et al., 2016), for intervention studies comparing MSF with control conditions complete independence between the feedback intervention and the outcome measure is rarely achievable, because the feedback typically derives from the same MSF instrument used to assess change. Therefore, rather than requiring strict independence, we will include studies where performance outcomes were measured using the same MSF instrument across time, provided that ratings were aggregated across multiple raters and collected at a subsequent measurement wave. To mitigate potential bias, we will explicitly code the independence of the outcome measure, as well as other relevant methodological features (e.g., time lag between feedback and follow-up), and will examine them as methodological moderators and as part of the study quality assessment.
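As an illustration of how the independence of the outcome measure might be coded and then used to split studies for a moderator analysis, consider the sketch below. The `CodedStudy` fields and the `group_by_independence` helper are assumptions made for this example; the actual coding form may differ.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class CodedStudy:
    """Hypothetical coding record for one primary study."""
    study_id: str
    outcome_independent: bool    # True if the outcome is NOT derived from the MSF instrument
    lag_months: Optional[float]  # time lag between feedback and follow-up; None if not reported


def group_by_independence(studies: List[CodedStudy]) -> Dict[str, List[str]]:
    """Split study IDs into subgroups by whether the outcome measure is independent of MSF."""
    groups: Dict[str, List[str]] = {"independent": [], "msf_derived": []}
    for s in studies:
        key = "independent" if s.outcome_independent else "msf_derived"
        groups[key].append(s.study_id)
    return groups


coded = [
    CodedStudy("S1", True, 6.0),
    CodedStudy("S2", False, 12.0),
    CodedStudy("S3", False, None),
]
print(group_by_independence(coded))
```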
Studies conducted in laboratory settings will be analyzed separately from those conducted in employment settings on the grounds that they are likely to have lower ecological validity.
The following study designs will be excluded from this systematic review: case studies, cross-sectional studies, and qualitative studies.
We will include studies that are published or unpublished, in any form, in any language, as long as they meet all other eligibility criteria.
Types of Participants
In the current study, we will include adult participants (18 years old or older) who are employees at any level of seniority, working in any industry, on any type of employment contract (e.g. employed, self-employed), from any demographic or socio-economic groups; and adults recruited specifically in order to undertake work tasks in laboratory experimental studies outside of regular employment conditions.
Types of Interventions
The review will include studies which focus on MSF, understood as feedback aggregated from at least two sources. One individual who assesses a ratee’s performance (e.g., immediate supervisor, peer, subordinate, client, self-assessment) is considered one source of evaluation. The raters must be human – we will exclude both automated feedback generated by systems (e.g., KPI reports) and AI-based feedback produced by algorithms. While AI-supported feedback systems are becoming increasingly common in organizations and represent an important area for future research (e.g., Biswas et al., 2024; Kaliisa et al., 2026), the interaction between human and AI-based feedback constitutes a distinct and complex topic that warrants separate investigation. In this review, we focus specifically on human multisource feedback interventions, as understanding the effects of feedback aggregated from multiple human raters is an important step for evaluating and designing future hybrid feedback systems that may incorporate AI.
Both quantitative and qualitative feedback will be included. Quantitative feedback refers to numerical (or quantified) ratings of behaviors or competencies (e.g., Likert-scale evaluations), whereas qualitative feedback refers to narrative comments or open-text responses provided by raters. Feedback may be delivered in different formats, including written or digital reports (e.g., via email or an online platform), oral feedback only (e.g., through facilitated feedback sessions involving a coach, supervisor, or workshop) or a mix of both. Because MSF interventions often include multiple components (e.g., type of provided feedback or delivery mode), we will extract and code key features reported in the primary studies.
For intervention studies, eligible control groups include: no feedback; or single-source feedback.
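The intervention criteria can be summarized as a short eligibility check: at least two human raters, and evaluation results that are shared with the ratee. This is a hedged sketch; `Rater` and `is_eligible_msf` are hypothetical names, and the choice to simply ignore non-human sources when counting raters (rather than excluding the whole study) is an assumption of this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Rater:
    """One feedback source (hypothetical structure)."""
    role: str       # e.g. "supervisor", "peer", "subordinate", "client", "self"
    is_human: bool  # False for automated reports or AI-generated feedback


def is_eligible_msf(raters: List[Rater], shared_with_ratee: bool) -> bool:
    """Check the two core criteria: two or more human sources, and feedback shared with the ratee."""
    human_raters = [r for r in raters if r.is_human]  # assumption: non-human sources are not counted
    return len(human_raters) >= 2 and shared_with_ratee


raters = [Rater("supervisor", True), Rater("peer", True), Rater("KPI report", False)]
print(is_eligible_msf(raters, shared_with_ratee=True))
```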
Types of Outcome Measures
Primary Outcomes
The primary outcome of the review is individual performance, meaning the extent to which a person accomplishes their goals and produces the intended results, without consideration of the costs or inputs needed for achieving these results, or follow-on impacts on wider organizational performance. More specifically, our definition of performance will cover:
In the current review, for all four types of performance outcomes, we will include studies using objective measures (e.g., financial performance, number of correct answers), as well as studies in which performance is measured subjectively (e.g. through ratings by individuals, including as part of the MSF tool).
Performance outcomes can be measured using either standardized or unstandardized instruments. The primary focus of this review is on behavioral change, measured as changes in performance ratings over time within observational studies; ratings must be collected at multiple time points using the same instrument. In observational studies, these measurements can be a part of the MSF instrument itself (i.e., changes in MSF ratings over time) or independent assessments collected in addition to the MSF tool; we will combine these types of measures in the meta-analysis. In intervention studies, outcome measures should ideally be independent from the MSF intervention and consistent across comparison groups. However, because intervention studies using fully independent performance measures are rare in the MSF literature, we will combine results from these studies with those from studies in which performance outcomes are derived from the MSF instrument itself. If sufficient high-quality intervention studies with independent performance measures are available, we will conduct a secondary analysis to examine the causal impact of MSF on performance, analyzing these studies separately. To address the potential dependency between intervention and outcome, we will code whether performance measures are independent or MSF-derived and examine this feature as a methodological moderator in the analysis. In all cases, outcome variables must be measured at different time points (at least two measurements) using the same tool. If a performance measure is reported at more than two time points, we will select one time point before and one after the action of sharing feedback, prioritizing the points with the largest number of observations.
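The rule for studies reporting more than two time points (one wave before and one after feedback, favoring the waves with the most observations) could be implemented along these lines. The `Wave` structure and `select_waves` function are illustrative assumptions; ties are resolved here by taking the first wave listed, which the protocol does not specify.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Wave:
    """One measurement wave of the performance outcome (hypothetical structure)."""
    label: str
    before_feedback: bool
    n_observations: int


def select_waves(waves: List[Wave]) -> Optional[Tuple[Wave, Wave]]:
    """Pick the pre- and post-feedback waves with the largest number of observations."""
    pre = [w for w in waves if w.before_feedback]
    post = [w for w in waves if not w.before_feedback]
    if not pre or not post:
        return None  # the study lacks a pre or a post measurement
    best_pre = max(pre, key=lambda w: w.n_observations)
    best_post = max(post, key=lambda w: w.n_observations)
    return best_pre, best_post


waves = [Wave("T1", True, 120), Wave("T2", False, 80), Wave("T3", False, 95)]
selected = select_waves(waves)
print([w.label for w in selected])
```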
We expect that most measures of performance outcomes will be subjective, i.e., a result of self-evaluation and/or evaluation by other individuals, but some studies might provide objective measures of performance (e.g., sales results, financial performance). The main focus of this meta-analysis is on objective performance. However, we will also consider studies that, at the second measurement point, use surveys on perceived change in performance, and we will run an additional analysis examining whether MSF predicts perceived changes in performance. Studies that use qualitative measurement methods or studies that measure only attitudinal outcomes (e.g., satisfaction, commitment) will be excluded.
When multiple measures of the same outcome are reported, we will prioritize those with the largest sample size (i.e., the least missing data), objective rather than subjective outcomes, and measures with stronger psychometric properties.
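One way to apply these priorities is a lexicographic ranking, sketched below. Treating the three criteria as strictly ordered (sample size first, then objectivity, then psychometric quality) and using a single reliability coefficient as a stand-in for psychometric properties are both assumptions of this example, as are the names `OutcomeMeasure` and `pick_measure`.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class OutcomeMeasure:
    """One reported measure of a performance outcome (hypothetical structure)."""
    name: str
    sample_size: int
    is_objective: bool
    reliability: Optional[float]  # assumed proxy for psychometric quality; None if not reported


def pick_measure(measures: List[OutcomeMeasure]) -> OutcomeMeasure:
    """Select one measure by sample size, then objectivity, then reliability."""
    def priority(m: OutcomeMeasure):
        rel = m.reliability if m.reliability is not None else -1.0  # unreported ranks lowest
        return (m.sample_size, m.is_objective, rel)
    return max(measures, key=priority)


candidates = [
    OutcomeMeasure("supervisor rating", 100, False, 0.90),
    OutcomeMeasure("sales results", 100, True, None),
    OutcomeMeasure("peer rating", 80, False, 0.95),
]
print(pick_measure(candidates).name)
```

If the criteria were instead meant to be weighed jointly, a scoring rule would replace the tuple comparison.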
Secondary Outcomes
None
Potential Adverse Effects
While MSF is generally intended to improve performance and development, we will also consider its potential adverse effects. These may include, but are not limited to, negative emotional reactions (e.g., distress, embarrassment, frustration), decreased motivation, perceived unfairness or bias in ratings, and declines in work performance. We will code and report any such adverse outcomes described in the included studies to provide a balanced and comprehensive assessment of the effects of MSF.
Search Methods for Identification of Studies
Our search strategy will focus on the overall HR management activities linked to MSF literature, and limit results to the following:
• Research reported from 1975 to 2026.
• Research in any of these languages: English, Italian, Polish, Romanian, and Spanish (languages spoken by the authors/reviewers).
• Research focused on MSF use for improvements in work performance.
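The date and language limits can be applied mechanically to exported bibliographic records, as in the sketch below. The `Record` structure and `within_search_limits` helper are hypothetical, and the third limit (topical focus on MSF and work performance) is left to human screening rather than encoded here.

```python
from dataclasses import dataclass

# Limits taken from the search strategy above.
SEARCH_LANGUAGES = {"English", "Italian", "Polish", "Romanian", "Spanish"}
FIRST_YEAR, LAST_YEAR = 1975, 2026


@dataclass
class Record:
    """One bibliographic record exported from a database (hypothetical structure)."""
    title: str
    year: int
    language: str


def within_search_limits(record: Record) -> bool:
    """Return True if the record falls inside the date range and language set."""
    in_range = FIRST_YEAR <= record.year <= LAST_YEAR
    return in_range and record.language in SEARCH_LANGUAGES


print(within_search_limits(Record("Example title", 1998, "Polish")))
```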
The search strategy will be used in a number of electronic information sources to obtain a comprehensive body of research. These sources will include subject-specific, multi-disciplinary bibliographic databases and special grey literature databases and repositories, alongside selected websites.
Additionally, we will manually browse the reference lists of previous literature reviews, and the tables of contents of relevant journals. We will run a ‘known-item’ search in Google Scholar using the titles of selected articles, to follow up their citations: the documents which cite articles selected in the previous search rounds will be reviewed for inclusion.
Finally, we will set search alerts in key bibliographic databases to update the initial set of bibliographic records obtained from the systematic search, to be as exhaustive as possible for the last year of the chronological coverage established in the search strategy (2026). The records from the alerts will be used only if they arrive before the end of the study selection process, that is, before the screening of all titles and abstracts is complete.
Electronic Searches
Searching Online Bibliographic Databases
A broad range of bibliographic databases will be the primary search outlets used to gather relevant studies. The databases to be included in our search are the following:
Specialized Databases
• ABI/INFORM (ProQuest) • BazEkon (https://bazekon.uek.krakow.pl/en/pomoc) [Polish bibliographic and full-text literature database in the field of economics and related disciplines] • Business Source Ultimate (EBSCO) • The Campbell Library • The Cochrane Library • CINAHL Complete (EBSCO) • EconLit (American Economic Association via ProQuest) • ERIC (U.S. Department of Education via EBSCO) • ESSPER: periodici italiani di economia, scienze sociali e storia (https://www.biblio.liuc.it/scripts/essper/default.asp) • Medline (NLM-NIH via ProQuest) • PAIS Index (ProQuest) • PubMed (NLM-NIH) • Psychology & Behavioral Sciences Collection (EBSCO) • PsycINFO (APA via ProQuest) • PubPsych (ZPID-Leibniz Institute for Psychology Information) • Sociological Abstracts (ProQuest)
Multidisciplinary Databases
• Clase: Citas Latinoamericanas en Ciencias Sociales y Humanidades (https://clase.unam.mx/) • CORE: The world’s largest collection of open access research papers (https://core.ac.uk/) [includes a vast number of journal portals (like Dialnet, Redalyc, Scielo, etc.) and metadata aggregators (like Directory of Open Access Journals-DOAJ): https://core.ac.uk/data/providers/] • Emerald Insight (includes all “Emerald Management” sets alongside all the other Emerald content) • Indices CSIC (https://indices.csic.es/) • International Bibliography of the Social Sciences (IBSS) (ProQuest) • Pascal & Francis (CNRS) (https://pascal-francis.inist.fr/) • SCIPIO: Romanian editorial platform (https://www.scipio.ro/en/) • Social Sciences Premium Collection (ProQuest)
Citation Indexing Databases
We will perform subject searches, as well as searches for the papers that cited studies eligible for inclusion. • Google Scholar (https://scholar.google.com/) • Lens.org (https://www.lens.org/) • Scopus (Elsevier) • Web of Science Core Collection (Clarivate) [includes Science Citation Index Expanded (SCIE), Social Sciences Citation Index (SSCI), Arts & Humanities Citation Index (AHCI), Emerging Sources Citation Index (ESCI), and Conference Proceedings Citation Index (CPCI)]
Special Databases (Dissertations, Working Papers, Reports and Grey Literature in General)
Nowadays, special databases cover an extensive spectrum of grey literature, including conference proceedings, preprints, working papers, white papers, and reports, among others. We will thus search: • BASE: Bielefeld Academic Search Engine (https://www.base-search.net/) • Dissertations and Theses (ProQuest) • NDLTD: Networked Digital Library of Theses and Dissertations (https://search.ndltd.org/) • OpenGrey.eu (https://www.opengrey.eu/) • Research Papers in Economics (RePEc) IDEAS: https://ideas.repec.org/ • Social Sciences Research Network (SSRN): https://www.ssrn.com/en/
Terms, Descriptors, Keywords for the Search Queries
Each bibliographical database has its own specificities regarding the query language, the search interface and the level of terminological control used in the subject fields devoted to content description. As a general approach to this issue, we will use the best combination of subject headings/thesaurus terms and keywords for the formulation of the search queries, independently of the functionalities of each database. When available, the searches will prioritize the title, abstract, keyword and subject heading/descriptor fields. For the grey literature websites and repositories, advanced keyword searching will be used if available.
The searches will be performed in English, Italian, Polish, Romanian and Spanish. They will intersect a set of related, specific or synonymous terms for “multisource feedback”, which represents the tool under study in this review, with a set of terms related to the outcomes linked to the use of the tool, e.g., “performance” or “attainment of goals”, or with the intervention settings where the tool is used, e.g., “work settings” or “companies” (for more details see Appendix 1. Keywords that cover the main concepts being examined in the review).
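As an illustration of the intersection logic described above, the following minimal sketch assembles a Boolean query string from three term sets. The function names are hypothetical, the term lists contain only the examples mentioned in this protocol rather than the full sets in Appendix 1, and the generic AND/OR syntax is an assumption that would need to be adapted to each database platform.

```python
def or_block(terms):
    """Join quoted terms with OR and wrap them in parentheses."""
    return "(" + " OR ".join(f'"{t}"' for t in terms) + ")"


def build_query(tool_terms, outcome_terms, setting_terms):
    """Intersect the tool terms with the outcome terms or the setting terms."""
    return (
        f"{or_block(tool_terms)} AND "
        f"({or_block(outcome_terms)} OR {or_block(setting_terms)})"
    )


# Illustrative term sets only; the complete lists are in Appendix 1.
tool = ["multisource feedback", "360-degree feedback", "multi-rater feedback"]
outcome = ["performance", "attainment of goals"]
setting = ["work settings", "companies"]

query = build_query(tool, outcome, setting)
print(query)
```

Keeping each concept in its own list makes it straightforward to translate the term sets into the other review languages while reusing the same query structure.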
The search strategy developed for PsycINFO is included in the appendix as an example, which could be considered a pattern for other databases (see Appendix 2). This search will be translated/adapted across the aforementioned database platforms. The searches will be limited to articles reported from January 1, 1975 onward, and in the languages specified above.
Searching Other Resources
Hand Searching
Apart from searching for additional references in the reference lists of the included studies and relevant meta-analyses, we will screen tables of contents in the following journals for additional relevant studies: • Human Resource Management (ISSN 1099-050X): https://onlinelibrary.wiley.com/journal/1099050x • Human Resource Management Journal (ISSN 0954-5395): https://onlinelibrary.wiley.com/journal/17488583 • Journal of Applied Psychology (ISSN 1939-1854): https://www.apa.org/pubs/journals/apl/ • Journal of Business and Psychology (ISSN 1573-353X): https://www.springer.com/journal/10869/ • Personnel Psychology (ISSN 1744-6570): https://onlinelibrary.wiley.com/journal/17446570 • The International Journal of Human Resource Management (ISSN 1466-4399): https://www.tandfonline.com/toc/rijh20/current
We will also manually search the proceedings of management conferences for reports or publications. Many of them are indexed in the aforementioned electronic bibliographic databases, but the coverage may be uneven. The following conference proceedings/publications are not fully indexed in databases and will be screened for relevant studies on their openly available websites, including only research from January 2000 onward: • Annual Meeting of the Academy of Management: https://aom.org/events/annual-meeting/past-annual-meetings • EAWOP Congress: https://www.eawop.org/past-congresses • International Conference on Advances in Management Sciences (ICAMS): https://www.icams.org/ • SIOP Annual Conference: https://www.siop.org/Annual-Conference
Moreover, to ensure full coverage of relevant information resources, we will consult field experts and practitioners. We will use a snowball sampling technique, asking them which seminal works they consider most important, which work has had the greatest impact on the field in the last 10 years, and which recent work has captured their attention.
Application of Inclusion Criteria - Examples
In this section, we provide examples of studies likely to be included and excluded in the review.
One study likely to be eligible for inclusion in the review is a study by Bailey and Fletcher (2002), which explores the impact of multiple source feedback on management development. This longitudinal study involved 104 managers from a large private-sector service organization. The participants were evaluated at two time points with two years between administrations. At both measurement points, the managers received a written feedback report with ratings from one superior, one or more first-level subordinates and one or more second-level subordinates. The raters evaluated the target managers on four competency dimensions related to their interpersonal behaviors. The outcome variables studied were: (1) changes in co-workers’ perceptions of their target manager’s competence, (2) changes in target managers’ development needs over time, (3) factors influencing a target manager’s revised self-assessment and co-workers’ ratings, (4) changes in congruence between self and co-workers’ ratings and (5) the relationship of feedback to the organization’s formal performance appraisal process. The authors found that significant increases in managers’ competence were perceived by the managers themselves and by their subordinates, development needs were seen to reduce, and self and co-workers’ ratings were largely seen to become more congruent. However, contrary to the authors’ hypothesis, the co-workers’ feedback at Time 1 was not predictive of targets’ self-assessments at Time 2.
A paper by Heslin and Latham (2004) provides another example of a study which appears to meet our inclusion criteria. In this quasi-experimental study, 70 managers in the Australian taxation division of an international professional services firm received upward feedback from 3 to 9 of their subordinates. The job performance of the managers was measured twice with a behavioral observation scale as part of an MSF intervention. Thirty-five managers from the treatment sample received feedback right after the first assessment (Time 1); six months later, their performance was evaluated again (Time 3). Moreover, between these two assessment points, the authors measured managers’ learning goal orientation and self-efficacy (Time 2). The managers from the comparison group received feedback only after the second performance evaluation (Time 3). Results revealed that the subordinates perceived their managers’ performance to be significantly higher in the second assessment, compared to both initial performance and subordinate ratings of a comparison group. Self-efficacy moderated this finding, suggesting that it plays a key role in determining behavioral reactions to upward feedback. Moreover, the managers’ learning goal orientation correlated significantly with their subsequent performance.
An example of an ineligible study is Lockyer et al. (2003). The purpose of this study was to examine the likelihood that professionals (surgeons) would change their practice in response to the feedback they receive from multiple sources, which is consistent with the objectives of our systematic review. However, surgeons’ performance was evaluated only once, on the occasion of the MSF intervention (Time 1). At the second measurement point (Time 2), surgeons were asked to self-assess whether the feedback they received was likely to lead them to implement a change. According to our inclusion criteria, information that only indicates ratees’ perception of change in their behavior is insufficient to investigate whether MSF predicts change in performance.
Data Collection and Analysis
Selection of Studies
The screening procedure will include two steps. First, the titles and abstracts of all candidate studies will be reviewed by two reviewers (two out of the three authors: EW, IC or JG). The reviewers will use the criteria included in Appendix 3, Level 1: Screening of titles and abstracts, in order to decide if the study should go to the next phase of screening. Second, the papers included in the first stage will be screened based on the full article text. Again, a double coding procedure will be used. The relevance of the study will be assessed according to the checklist in Appendix 3, Level 2: Eligibility decision based on information retrieved from full text. The articles which pass both screening stages will be included in the review and will be the basis for data extraction. Each excluded study will be assigned a specific reason for exclusion. Furthermore, we will produce a PRISMA flow chart to illustrate the flow of studies through the process.
The two-step screening procedure explained above will not exclude articles based on whether the reported estimates are usable in the review or not.
In case of disagreement between the reviewers at any stage of the screening procedure, the full review team will discuss the inclusion decision until it is resolved.
Data Extraction and Management
The research team will extract the relevant information from each of the selected papers. The information will be entered into a Study Coding Form (Appendix 4). The information from each study will be retrieved twice, by two different reviewers (two out of the three authors: EW, IC or JG); similarly, the risk of bias and quality of each study will be assessed separately by two reviewers. Any discrepancies in extracted information and evaluation will be discussed and resolved by the review team. The Study Coding Form and screening checklists will serve as the coding manual. Coders will be trained by jointly reviewing these materials and pilot-coding a small set of studies to ensure consistency before commencing full data extraction. If any relevant information is missing from reports of included studies, we will contact the authors of the article to request it. Furthermore, information about each included study will be presented in a descriptive table, including all the elements that will be extracted using the information extraction form.
Assessment of Risk of Bias in Included Studies
All the studies included in the review will be assessed for risk of bias. In order to analyze the quality of evidence related to each of the key outcomes we will use (1) an adaptation of the Cochrane Collaboration’s tool for assessing risk of bias (Higgins et al., 2011; the draft of this adapted version is provided in Appendix 5) and (2) the GRADE approach (GRADE Working Group, 2004; Guyatt et al., 2008).
The assessment of the risk of bias will cover the following potential sources of bias: 1. Selection bias (any variables that may be confounded with predictors at the outset), e.g., participants in MSF programs are only high-performing leaders in the organization; 2. Performance bias (any variables confounded with predictors during exposure), e.g., participants in MSF programs also benefit from other developmental interventions like training or mentoring; 3. Reliability and validity of measures for each key construct (unreliable/invalid measures produce results that are unreliable/invalid), e.g., the survey used to collect MSF has weak psychometric properties, or the questions used to collect qualitative input are leading or double-barreled; 4. Detection bias (any indicators that outcome measures differ depending on exposure to predictors), e.g., MSF participants are evaluated by a different number of sources; 5. Attrition bias (any evidence that people exposed to interventions are retained in the study at higher or lower rates than others), applicable to controlled studies, e.g., MSF participants have higher or lower job turnover rates than those who do not go through MSF; 6. Conflict of interest (any reason that authors have a vested interest in results of the study), e.g., the authors are commercially selling the MSF tool that they present data on; 7. Selective reporting (any evidence that some results are not reported), e.g., the authors mention that several outcomes will be measured to assess performance, but then they do not report data on all of them.
We added the domains ‘Reliability and validity of measures’ and ‘Conflict of interest’ to address specific risks in MSF studies, where performance outcomes may be derived from the intervention itself or influenced by vested interests of authors or organizations.
Assessment of Heterogeneity
To assess heterogeneity, we will calculate τ², the Q statistic, and I², as these indicators provide complementary information about between-study variability (Higgins et al., 2003). We will also report prediction intervals, which estimate the range of true effects in future studies and provide a more interpretable measure of heterogeneity than the Q-test alone (Borenstein et al., 2021). We expect to find both before-and-after single-group studies and before-and-after controlled studies that meet the eligibility criteria. We will conduct a moderator analysis to test if the results from the two types of study design are similar. If the results are homogeneous, we will pool the effect sizes across designs; otherwise, data will be analyzed separately in the meta-analysis.
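To make the relationship between the three heterogeneity indicators concrete, the sketch below computes Q, I² and τ² from a list of effect sizes and their sampling variances. It assumes the DerSimonian-Laird moment estimator for τ², which is only one of several available estimators and is not necessarily the one that will be used in the review; the function name and the input values are hypothetical. Prediction intervals are not covered because they require a t-distribution quantile.

```python
def heterogeneity(effects, variances):
    """Return (Q, I2 in percent, DerSimonian-Laird tau2)."""
    weights = [1.0 / v for v in variances]
    sum_w = sum(weights)
    # Fixed-effect (inverse-variance) weighted mean
    mean_fe = sum(w * y for w, y in zip(weights, effects)) / sum_w
    # Cochran's Q: weighted squared deviations from the pooled mean
    q = sum(w * (y - mean_fe) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    # Scaling constant of the DerSimonian-Laird estimator
    c = sum_w - sum(w ** 2 for w in weights) / sum_w
    tau2 = max(0.0, (q - df) / c)
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, i2, tau2


# Hypothetical effect sizes with equal sampling variances
q, i2, tau2 = heterogeneity([0.1, 0.3, 0.5], [0.01, 0.01, 0.01])
print(round(q, 3), round(i2, 1), round(tau2, 4))
```

Q tests whether the observed dispersion exceeds what sampling error alone would produce, I² expresses the excess as a proportion of total variability, and τ² puts it on the scale of the effect size.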
Assessment of Publication Biases
To assess publication bias, we will examine funnel plot asymmetry for each meta-analysis, as asymmetry may indicate selective publication of studies with significant or favorable results (Sterne et al., 2011). We will also conduct statistical tests for small-study effects, such as Egger’s regression test and Begg’s rank correlation test, where applicable. We expect to include both before-and-after single-group studies and before-and-after controlled studies that meet the eligibility criteria. If sufficient studies are available, we will conduct subgroup or sensitivity analyses to evaluate the robustness of the findings to potential publication bias. When asymmetry or small-study effects are detected, we will discuss their possible impact on the meta-analytic conclusions and interpret the pooled effect sizes accordingly.
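The logic of Egger’s regression test mentioned above can be sketched as follows: the standardized effect (effect divided by its standard error) is regressed on precision (the inverse of the standard error), and an intercept that departs from zero suggests funnel plot asymmetry. This is a minimal ordinary least squares illustration with a hypothetical function name and made-up data; it returns the intercept and its standard error but does not compute a p value.

```python
import math


def egger_intercept(effects, std_errors):
    """Return (intercept, standard error of the intercept) of Egger's regression."""
    x = [1.0 / se for se in std_errors]                  # precision
    y = [e / se for e, se in zip(effects, std_errors)]   # standardized effect
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    resid_var = sum(r ** 2 for r in residuals) / (n - 2)
    se_intercept = math.sqrt(resid_var * (1.0 / n + mean_x ** 2 / sxx))
    return intercept, se_intercept


# Hypothetical data in which smaller studies (larger SE) show larger effects
ses = [0.05, 0.10, 0.20, 0.25]
effects = [0.2 + 1.0 * se for se in ses]
intercept, se_intercept = egger_intercept(effects, ses)
print(round(intercept, 3))
```

In practice, the ratio of the intercept to its standard error would be compared against a t distribution with n − 2 degrees of freedom.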
Unit of Analysis Issues
We consider the primary unit of analysis for this review to be a research study, understood as a distinct sample of study participants involved in a common research project. Multiple reports (e.g., publications, technical reports, etc.) from a common research study will be coded as a single study. That is to say, a research study will only be treated as unique if the study sample does not include participants included in any other coded study. In case of doubt, we will contact the author for clarification. Multiple effect sizes will be coded, if possible, from each study.
If a study includes more than two intervention arms, only information about intervention (MSF) and control groups (single-source feedback, no feedback) that meet the eligibility criteria will be included. In case of the studies with more than two comparison groups or measures repeated more than once, the results will be synthesized using multivariate methods where possible, and separate meta-analyses will be conducted for different common end points. If a study includes several eligible MSF interventions (e.g., groups with different numbers of feedback sources), these interventions will be combined to create a single pair-wise comparison for the analysis corresponding to the first objective, and then analyzed separately, if necessary, for moderator analysis.
We will consider inclusion of studies with non-standard designs that can violate the assumption of independence (cluster-randomized trials and cross-over trials), with the following procedure: in case of cluster-randomized trials the information will be analyzed at the individual level, while accounting for the clustering in data. For the studies that correctly performed randomization on the clusters, the analysis will be based on a direct estimate of the required effect measure (e.g., an odds ratio with its confidence interval) from an analysis that properly accounts for the cluster design. These effect estimates and their standard errors will be meta-analyzed using the generic inverse-variance method in R. The model appropriate for the analysis (e.g., multilevel model, variance components analysis, generalized estimating equations) will be determined in consultation with statisticians.
For the studies in which the randomization was performed on the individuals rather than the clusters, the analysis will be based on the following information: • the number of clusters (or groups) randomized to each intervention group; or the average (mean) size of each cluster; • the outcome data ignoring the cluster design for the total number of individuals (for example, number or proportion of individuals with events, or means and standard deviations); and • an estimate of the intracluster (or intraclass) correlation coefficient (ICC).
Following the recommendations of Rao and Scott (1992), the “effective sample size” will be calculated as the original sample size divided by the design effect, 1 + (M – 1) × ICC, where M is the average cluster size and ICC is the intracluster correlation coefficient. Additionally, sensitivity analyses will be conducted.
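The adjustment described above amounts to a short calculation, sketched here with hypothetical function names and illustrative input values (a sample of 200 individuals, an average cluster size of 21 and an ICC of 0.05).

```python
def design_effect(mean_cluster_size, icc):
    """Design effect: 1 + (M - 1) * ICC."""
    return 1.0 + (mean_cluster_size - 1.0) * icc


def effective_sample_size(n, mean_cluster_size, icc):
    """Original sample size divided by the design effect."""
    return n / design_effect(mean_cluster_size, icc)


# Illustrative values only
n_eff = effective_sample_size(n=200, mean_cluster_size=21, icc=0.05)
print(round(n_eff, 1))
```

With these values the design effect is 2, so the 200 clustered observations carry roughly the information of 100 independent ones. An ICC of zero leaves the sample size unchanged.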
Regarding the cross-over trials, we will first assess the risk of bias related to this study design, mainly carry-over and period effects, and examine the data provided in the study. Based on this information, we will choose one of the recommended types of analysis: 1. A paired t-test (Elbourne et al., 2002), if neither carry-over nor period effects are present, and at least one of the following pieces of information is available: − individual participant data from the paper or by correspondence with the trialist; − the mean and standard deviation (or standard error) of the participant-specific differences between experimental intervention (E) and control intervention (C) measurements; − the mean difference and one of the following: (i) a t-statistic from a paired t-test; (ii) a P value from a paired t-test; (iii) a confidence interval from a paired analysis; − a graph of measurements on experimental intervention (E) and control intervention (C) from which individual data values can be extracted, as long as matched measurements for each individual can be identified as such. 2. Analysis of data only from the first period, in case of carry-over effect. 3. Imputation of missing standard deviations.
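Two of the data situations listed under the paired analysis lead to the standard error of the mean difference through simple arithmetic, as the following sketch shows. The function names and numbers are hypothetical, and the cases involving a P value or a confidence interval are omitted because they require t-distribution quantiles.

```python
import math


def se_from_sd_of_differences(sd_diff, n):
    """Standard error of the mean difference from the SD of paired differences."""
    return sd_diff / math.sqrt(n)


def se_from_t_statistic(mean_diff, t_stat):
    """Standard error recovered from a paired t-statistic (t = mean_diff / SE)."""
    return abs(mean_diff / t_stat)


# Illustrative values only
print(se_from_sd_of_differences(sd_diff=3.0, n=36))
print(se_from_t_statistic(mean_diff=2.0, t_stat=4.0))
```

Either route yields a mean difference with its standard error, which is the input needed for generic inverse-variance pooling.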
Handling Dependent Effect Sizes
We expect that some of the included studies may report multiple effect sizes per study (e.g., multiple performance outcomes, or multiple rater sources), which creates statistical dependence. To account for this, we will use a multilevel meta-analytic model, treating effect sizes as nested within studies. This approach allows us to model within-study correlations, produce unbiased estimates, and include all relevant effect sizes without discarding data. Additionally, we will conduct sensitivity analyses using robust variance estimation (RVE) to ensure that our results are not sensitive to the unknown dependence structure.
Measures of Treatment Effect
To establish whether MSF predicts change in performance, we will calculate the effect size between the pre-intervention and the post-intervention measures of outcomes, i.e., performance.
We expect to find a variety of approaches to calculating and reporting effect sizes in the included studies, such as correlation coefficients, standardized mean differences, and odds ratios. We will extract all the information regarding effect sizes reported by the studies, as well as all the raw data and valid n values. Furthermore, we expect that some studies will report values such as regression coefficients. If this is the case, we will consult the Campbell policy on synthesizing bivariate and partial correlations (Aloe et al., 2016) and will act accordingly. We will subsequently use the information extracted from included studies to calculate one effect size estimate for each independent sample. The effect size indicator which we will use for this is the correlation coefficient r. For determining it, we will use the effect size calculator developed by David Wilson (https://www.campbellcollaboration.org/escalc/html/EffectSizeCalculator-Home.php).
For the purpose of the meta-analysis, we will transform the correlation coefficients into Fisher’s z values, as advised in Erez et al. (1996). Afterwards, the results will be converted back to correlation coefficients to ease interpretation.
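The transformation and its inverse can be sketched as below. The function names are hypothetical; the sampling variance 1/(n − 3) is the standard large-sample approximation for Fisher’s z and is included here as an assumption about how the transformed values would be weighted.

```python
import math


def r_to_z(r):
    """Fisher's z transformation of a correlation coefficient."""
    return math.atanh(r)  # equals 0.5 * ln((1 + r) / (1 - r))


def z_to_r(z):
    """Back-transformation from Fisher's z to a correlation coefficient."""
    return math.tanh(z)


def z_variance(n):
    """Approximate sampling variance of Fisher's z for sample size n."""
    return 1.0 / (n - 3)


# Illustrative values only
z = r_to_z(0.30)
print(round(z, 4), round(z_to_r(z), 2), z_variance(103))
```

Pooling is done on the z scale, where the variance depends only on sample size, and the pooled value is then passed through the back-transformation for reporting.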
Dealing With Missing Data
If any relevant information is incomplete, we will contact the authors of the article to obtain the missing data, and search for additional reports of the study. We will consult our advisory team to choose the most suitable strategy for dealing with missing data.
Data Synthesis
To synthesize the results, we will first calculate one effect size for each independent sample identified in the primary studies that will be included in the systematic review. To pool the effect sizes across studies, we will calculate the inverse variance weight for each study, which will be used to calculate a weighted average effect size. This procedure aims to give more weight to studies that have more precise estimates (larger samples and lower SE) compared to studies that offer less precise estimates (smaller sample and higher SE).
For the meta-analytic calculation, we will use a random-effects model, because we do not expect all studies to estimate the same population parameter. To graphically display the pooled effect size, together with descriptive information about the included studies, we will use forest plots.
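The weighting procedure described in this section can be illustrated with a minimal sketch. It assumes that the between-study variance τ² has already been estimated elsewhere and is passed in as a number; setting it to zero reproduces the fixed-effect inverse-variance average, while a positive value gives random-effects weights. The function name and the data are hypothetical, and this is not the multilevel model planned for the main analysis.

```python
import math


def pooled_estimate(effects, variances, tau2=0.0):
    """Inverse-variance weighted mean and its standard error.

    tau2 is the between-study variance (assumed to be estimated separately).
    """
    weights = [1.0 / (v + tau2) for v in variances]
    sum_w = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum_w
    se = math.sqrt(1.0 / sum_w)
    return pooled, se


# Illustrative values only: the more precise study receives more weight
fixed, fixed_se = pooled_estimate([0.2, 0.4], [0.01, 0.03])
random_, random_se = pooled_estimate([0.2, 0.4], [0.01, 0.03], tau2=0.01)
print(round(fixed, 4), round(random_, 4))
```

Adding τ² to every sampling variance makes the weights more similar across studies, so the random-effects estimate moves toward the unweighted mean and its standard error increases.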
Planned Moderator Analyses
The primary list of moderators includes those suggested by theory and previous literature. As already listed above, these are: a. The number of feedback sources b. The characteristic of feedback (positive vs. negative vs. evenly balanced) c. Employee perceptions of and reactions to feedback d. Psychological individual differences e. Follow-up activities f. The intended purpose of MSF g. The format of feedback h. The delivery mode of feedback i. Socio-cultural individual differences
For psychological individual differences and socio-cultural individual differences, we will conduct the moderator analysis on any variable in each category for which we find enough studies reporting data to warrant the analysis.
In addition, we plan to explore methodological moderators, such as study design, type of outcome measure, or risk of bias, to assess whether these features influence the estimated effects.
We will use meta-regression to test moderator effects. Continuous moderators will be included directly, and dichotomous or categorical moderators will be dummy coded. Where data permit, multiple moderators will be included simultaneously in the same model to explore joint effects.
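As a simplified illustration of dummy coding and weighted regression, the sketch below fits a meta-regression with a single moderator using inverse-variance weights. It is a fixed-weight, one-predictor example with hypothetical function names and made-up data; the planned analysis, with a between-study variance component and several moderators at once, would rely on dedicated software.

```python
def dummy_code(labels, reference):
    """Code a dichotomous moderator as 0 (reference category) or 1."""
    return [0.0 if label == reference else 1.0 for label in labels]


def weighted_meta_regression(effects, variances, moderator):
    """Weighted least squares with one moderator; returns (intercept, slope)."""
    weights = [1.0 / v for v in variances]
    sum_w = sum(weights)
    x_bar = sum(w * x for w, x in zip(weights, moderator)) / sum_w
    y_bar = sum(w * y for w, y in zip(weights, effects)) / sum_w
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(weights, moderator))
    sxy = sum(
        w * (x - x_bar) * (y - y_bar)
        for w, x, y in zip(weights, moderator, effects)
    )
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Illustrative values only: study design as a dichotomous moderator
design = ["single-group", "single-group", "controlled", "controlled"]
x = dummy_code(design, reference="single-group")
intercept, slope = weighted_meta_regression(
    [0.1, 0.1, 0.3, 0.3], [0.01, 0.01, 0.01, 0.01], x
)
print(round(intercept, 3), round(slope, 3))
```

With a dummy-coded moderator, the intercept is the weighted mean effect in the reference category and the slope is the difference between the two categories. A continuous moderator, such as the number of feedback sources, can be passed to the same function without recoding.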
Treatment of Qualitative Research
We do not plan to include qualitative research.
Software
In the current study we will use the following programs and tools: 1. Zotero – for the collection, organization, and citation of reviewed studies. 2. Abstrackr – for the initial screening of titles and abstracts. 3. Covidence – for manual screening of studies, coding and data extraction. 4. Microsoft Office (MS Word) – for writing and documenting the review. 5. R/RStudio – for analyzing study results. The packages that we plan to use for meta-analytic analyses in R/RStudio include metafor for multilevel random-effects meta-analysis and meta-regression, and robumeta or clubSandwich for sensitivity analyses using robust variance estimation (RVE).
Subgroup Analysis and Investigation of Heterogeneity
We will examine potential sources of heterogeneity across studies. Specifically, we will analyze whether differences in effect sizes are associated with study characteristics, for example, “risk of bias”. Additionally, we will explore the effect of theory- and literature-driven moderators (see section ‘Theory-based moderators’) as well as methodological moderators, including study design type (before-and-after single-group vs. controlled studies) and outcome measurement features.
To quantify heterogeneity, we will calculate τ², Q, and I² statistics, which provide complementary information about between-study variability. We will also report prediction intervals to indicate the range of true effects expected in future studies. If heterogeneity is low and results across study designs are similar, effect sizes will be pooled; otherwise, analyses will be conducted separately by study design.
Sensitivity Analysis
We will perform sensitivity analyses to investigate whether the findings from our systematic review are affected by any arbitrary or unclear decisions that may have been made. Specifically, sensitivity analyses using robust variance estimation (RVE) will be conducted to account for dependent effect sizes. Additionally, we will examine the impact of excluding studies with high risk of bias, extreme effect sizes, or specific methodological choices (e.g., selection of time-points for pre- and post-intervention measures) to ensure that results are robust to these decisions.
Supplemental Material
Supplemental material - Multisource Feedback and Work Performance
Supplemental material for Multisource Feedback and Work Performance by Emilia Wietrak, Iulia Alina Cioca, Jonny Gifford, Cristóbal Urbano in Campbell Systematic Reviews.
Footnotes
Acknowledgements
We thank Julia Littell, systematic review expert, for her guidance in protocol writing, Michael Borenstein, Hannah R. Rothstein, and Ariel Aloe for their advice on meta-analysis, and the Campbell editor and reviewers for their valuable comments and constructive feedback on drafts of this protocol.
Author Contributions
C
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Declaration of Conflicting Interests
Iulia Cioca collaborates as a freelancer on projects with a consultancy company which delivers multisource feedback services, such as design of feedback questionnaires and coaching sessions delivering results to individuals. However, she is not personally involved in such work, and her current and past projects from the last 2 years do not focus on multisource feedback. She could potentially be involved in future commercial work leveraging the results of the current review. The other authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Preliminary timeframe
We plan to submit a draft review by 30 September 2026 and the final draft review by 30 December 2026.
Sources of support - Internal sources
No sources of support provided
External sources
No sources of support provided
Supplemental Material
Supplemental material for this article is available online.
References
