Abstract
Given recent calls for enhancing multilevel leadership research, the present study uses event-level data to test the proposition that follower leadership perceptions result from aggregated personal observations of leader behavior. Although this proposition implies causality, it has been mostly tested with between-person retrospective data, coupled with correlational study designs. Our data refer to the perceived nature of leadership messages communicated during the most recent conversation with followers. These event-level data were used in a feedback intervention coupled with goal setting and population norms devised to change leader verbal behavior. The study was designed as a randomized field experiment in a manufacturing company in which supervisors in the experimental group received two feedback sessions regarding the extent to which conversation participants perceived their messages as transformative, transactive (corrective), or passive. Supervisors in the control group received no feedback. The data indicated significant changes in verbal leader behavior for experimental group supervisors, remaining unchanged in the control group. Such changes resulted in matching changes in follower leadership perceptions, measured 8 weeks before and after the intervention. Furthermore, performance outcomes improved in the experimental group, remaining unchanged in the control group. Implications for leadership research are discussed.
Keywords
One of the key propositions in leadership research is that observable leader behavior gives rise to follower leadership perceptions and job performance (Avolio, Walumbwa, & Weber, 2009; Epitropaki & Martin, 2004; Lord & Maher, 1991). In other words, it is assumed that aggregated perceptions of leader behavior serve as a proximal antecedent of follower leadership perceptions. Such aggregation can occur by combining retrospective data collected cross-sectionally from members in multiple groups, or by combining event-level data collected repeatedly from a smaller sample of individual group members. The former design uses between-group or between-individual variability for testing this proposition whereas in the latter design, within-group/individual variability resulting from leader behavior across different events is being used for testing the same proposition (Hoffman & Lord, 2013; Kaiser, Hogan, & Craig, 2008; Morgeson, 2005). In terms of the level-of-analysis framework, the former approach uses person- or group-level data, whereas the latter approach uses event-level data (Fisher, 2008; Hoffman & Lord, 2013; Morgeson, 2005).
Although using a person/group level of analysis simplifies research design, such data are subject to a number of validity threats: (a) stored leadership memories relating to different events are likely to be biased due to situational variability factors such as event salience and followers’ affect or expectations (Dinh, Lord, & Hoffman, 2012) and (b) the assumed encoding of observed leader behaviors into basic leadership categories (e.g., implicit leadership theory; Lord & Maher, 1991) compromises the reliability of memory-based follower perceptions (Fisher, 2008; Morgeson, 2005).
By contrast, using event-level data to measure leader behavior offers more valid information for the following reasons. First, being temporally proximal to the observed leader behavior, event-level data are affected more by actual action characteristics than by their stored representations. Event-level ratings therefore offer a more reliable source of data for testing the leader behavior–follower perceptions relationship (Fisher, 2008). Second, given that leader behavior is characterized by complexity, having to deal with a variety of competing operational demands, event-level perceptions can partial out such complexity by virtue of being targeted on a single, time-bounded event (Uhl-Bien, Marion, & McKelvey, 2007). Third, event-level information is stored in episodic, rather than semantic, memory, which minimizes the effect of cognitive schemata and implicit leadership categories on recall and subsequent leadership perceptions (Martell & Evans, 2005).
At the same time, given that leadership theories inherently involve different levels of analysis, the first conceptual multilevel frameworks for studying managerial behavior were published by prominent leadership scholars (Dansereau, Alutto, & Yammarino, 1984; Dansereau & Yammarino, 1998a, 1998b; Hunt & Dodge, 2000). Leadership research has thus become a primary antecedent of the multilevel paradigm in organizational behavior studies at large (Kozlowski & Klein, 2000). Several reviews of the leadership literature have revealed, however, that fewer than 50% of conceptual or empirical leadership papers have applied the appropriate level of analysis (Dionne et al., 2014; Markham, 2010; Yammarino, Dionne, Chun, & Dansereau, 2005). There is thus a need to correct this paucity of multilevel leadership studies.
Our study was designed, therefore, to test the leader behavior–follower perception relationship with event-level rather than person-/group-level data (Fisher, 2008; Hoffman & Lord, 2013). In addition to filling the gap in multilevel leadership studies, our study was designed to test (cross-level) causality between observed leader behavior and follower leadership perceptions and job performance. Namely, it was designed as a randomized field experiment in which verbal leader behavior was manipulated in the experimental group, remaining unchanged in the control group.
Practically speaking, given the scarcity of event-level research, this study should provide a supplemental test of the leader behavior–follower perception relationship. For reasons described below, this study used informal daily conversations between leaders and their group members as its target event category. In doing so, this study is expected to demonstrate that everything a leader says (or fails to say) is observed by exchange participants and influences their leadership perceptions and job performance. In effect, this study puts the following statement to an empirical test: “Leaders live in fish bowl and are always being watched. They should always be conscious of that fact and take advantage of it” (Gene Klann, Center for Creative Leadership). Making managers better aware of this leadership aspect constitutes the study’s primary practical implication.
Choice of Leader–Follower Conversations as Focal Events
Observational studies of managers have indicated that workplace leaders spend most of their time communicating with others in the organization (Kotter, 1982; Samra-Fredericks, 2000). This agrees with the fact that leadership has been traditionally characterized in terms of relational dynamics in which someone influences other group members toward the achievement of group goals (see reviews in Hosking, 1988; Uhl-Bien, 2006). Considering that language constitutes the primary medium of social interaction (Deetz, 2003; Phillips & Oswick, 2012), it follows that much of leader behavior is observed during verbal exchanges with followers. The present study uses, therefore, the most recent leader–follower conversation as the target episode category for collecting event-level data.
This choice is based on three arguments. First, daily supervisory exchanges during informal work-related conversations offer multiple opportunities for observing leader (verbal) behavior. Second, given the high frequency and spread of conversations with group members, within-group variation is to be expected, allowing the use of event-level data for testing the leadership behavior–follower perception relationship. Third, asking respondents to rate leader (verbal) behavior based on the last conversation (subject to the condition that it has taken place during the last few hours) ensures temporal proximity, which improves data reliability for the reasons discussed above. Using Hoffman and Lord’s (2013) event taxonomy, daily conversations can be characterized by the following event dimensions: micro-level events that are embedded within larger macro-level events, dynamic events that continue to unfold in time, familiar events that are amenable to interpretation using memory-based schemas, and ordinary events that constitute part of expected work practices.
Learning Goals Strategy for Leadership Training
The study design reflects our intention to combine the use of event-level data with testing of causality between leader behavior and follower perceptions. Although causality can be tested with a number of experimental designs, we decided to frame our study as a leadership-training project, allowing modification of leader behavior in the experimental group while leaving it unchanged in the control group. Post-training changes in the former group’s leadership perceptions would offer evidence of causal relationships between the two variable categories. It is to the description of our training strategy that we turn next.
Our strategy for leadership training involved follower feedback regarding observed leader behavior, coupled with learning goals associated with the development or acquisition of new leadership (verbal) practices as leverage for change. Feedback data were based on the distinction between the manifest content of verbal behavior during conversations, relating to the explicit vocabularies used by the leader, and latent content, relating to implicit meanings as construed by exchange partners (Phillips & Oswick, 2012; Suddaby & Greenwood, 2005). Given that followers are assumed to collect evidence regarding (subjectively construed) managerial behavior and use it to deductively shape their leadership perceptions, it follows that the latent content of such behavior must serve as the pertinent information (Suddaby & Greenwood, 2005). Consequently, our feedback data have been based on managerial behavior as perceived or interpreted by exchange partners.
Given the robust effect of goal setting on motivation and performance (Latham & Locke, 1991), feedback data in our intervention were accompanied by learning goals focusing on the acquisition of new leadership (verbal) practices. This was done based on meta-analytic results indicating that feedback intervention effects were greater when accompanied by goal setting (Kluger & DeNisi, 1996). Unlike performance goals, learning goals focus on knowledge or skill acquisition (e.g., develop new customer-service strategies) rather than on improving performance outcomes (e.g., greater speed or lower costs). The purpose of learning goals is to stimulate exploration of new ways of doing things, whereas the purpose of performance goals is to keep doing things in the same proven ways yet with greater effort and/or persistence (Seijts & Latham, 2001; Winters & Latham, 1996).
Learning goals are considered more appropriate for individuals who have not yet learned how to perform a complex task or are being expected to identify new ways for performing their task. Given that goal-setting effectiveness depends on goal specificity (Latham & Locke, 1991), learning-goal specificity can be based on quantifying the number of new skills to be acquired (e.g., develop three new strategies for improving customer service) or achieving a certain mastery score for newly acquired skills (Seijts & Latham, 2001; Winters & Latham, 1996). As noted below, we adopted the latter approach by asking leaders to set goals specifying aspired levels for each of the three leadership behavior dimensions, that is, transformational, transactional, and passive behaviors (Bass, 1985).
Choice of Targeted Leadership Behaviors
Given the availability of different leadership categories with which implicit leadership theories can be formulated (House et al., 2002; Lord & Maher, 1991), our choice was guided by two selection criteria. First, the selected training framework had to be well established in leadership research, allowing for the incorporation of population norms as benchmarks helping participants to interpret their feedback data. Second, leadership categories in the selected training framework had to be compatible with the kinds of worker performance outcomes expected to improve following the intervention.
Starting with the second criterion, given a series of near misses and minor injuries, senior management in the manufacturing plant where our project had taken place expected it to improve supervisory leadership in general and safety leadership in particular. Combining generic and safety leadership is scientifically justifiable given meta-analytic results indicating a strong relationship between generic and safety leadership outcomes (Nahrgang, Morgeson, & Hofmann, 2011). Such data support the often-espoused assumption that leaders who care for followers’ development and psychological welfare extend it under hazardous work conditions by caring for their physical well-being (Barling, Loughlin, & Kelloway, 2002; Hofmann, Morgeson, & Gerras, 2003; Zohar, 2003). Given such expectations regarding outcomes of our training, leadership facets targeted for change in our training project had to be compatible with those of followers’ safety behavior.
Analyses of safety behavior indicate that it has two dimensions, identified interchangeably as safety discipline or compliance and safety participation or citizenship (Hofmann et al., 2003; Neal & Griffin, 2006; Zacharatos & Barling, 2003). The first dimension, safety discipline/compliance, refers to adherence to company safety procedures and/or government regulations. Such adherence provides a behavior-based layer of risk control (Reason, 1997). The second dimension, safety participation/citizenship, concerns safe conduct in dynamic work environments in which self-regulated behavior becomes no less important than adherence to safety rules and procedures. Meta-analytic results indicate that when jobs vary in terms of levels of routinization, as was the case in the present company, leadership-based improvement of safety behavior affects both behavioral dimensions (Christian, Bradley, Wallace, & Burke, 2009; Clarke, 2012; Nahrgang et al., 2011).
Considering the dimensionality of safety behavior and the first selection criterion mentioned above, we chose the full-range leadership (FRL) model (Bass, 1985) as the intervention’s conceptual framework. Our choice was based on the fact that its two primary leadership facets, that is, transactional and transformational leadership, can be naturally mapped onto the two safety behavior dimensions. Furthermore, the FRL model is one of the most widely used in leadership research, offering population norms that can be used as benchmarks in feedback-based training interventions (Bass & Avolio, 2003).
Transactional leadership, the first FRL facet, is composed of management-by-exception (MBE) and contingent reward (CR) practices, consisting of monitoring or enforcing follower compliance with (safety) rules and regulations and providing supportive recognition (or enforcing obedience) for working by the rules (Bass, 1985). Bass and Avolio labeled the former as corrective leadership and the latter as constructive leadership. Both practices seemingly relate to the first dimension of safety behavior, that is, safety compliance. That is, by focusing on deviations from safety rules and regulations and rewarding attempts aimed at reducing such deviations, transactional leadership is expected to promote safety compliance.
Although Bass (1985) identified corrective and constructive leadership practices as dimensions of transactional leadership, meta-analytic studies using the FRL model indicate that the correlation between transformational and constructive leadership equals or exceeds .80 (DeRue, Nahrgang, Wellman, & Humphrey, 2011; Judge & Piccolo, 2004; Van Knippenberg & Sitkin, 2013). Such a strong correlation implies substantial overlap between the two FRL model elements, casting doubts about its discriminant validity.
As noted by DeRue and colleagues, although transformational leadership focuses on change-oriented behavior and constructive leadership on task-oriented behavior, their strong correlation apparently results from a shared set of relation-oriented behaviors (DeRue et al., 2011). Such overlap apparently results from the fact that leaders must identify needs, desires, and individual capabilities to offer motivationally relevant consequences. It follows, thus, that constructive practices must be based on a mixture of task and relational orientations. Such a mixture makes it harder to discriminate between transformational and transactional practices. Relevant empirical evidence for this line of reasoning is provided by meta-analytic results, indicating that the correlation between transformational and corrective practices is substantially weaker (r
Recalling that our study was framed as a leadership-training project using feedback intervention coupled with goal setting, we decided to exclude constructive behavior from our intervention. This decision was based on meta-analytic results indicating that one feedback characteristic affecting training outcomes is ease of comparison with training goals (Kluger & DeNisi, 1996). Namely, when feedback data reflecting one’s current situation can be easily compared with the desired situation, it increases feedback intervention effectiveness. A similar idea is expressed by the proximity compatibility principle (Wickens & Carswell, 1995), which argues that feedback must be matched to the processing requirements of the task to be performed. When feedback data closely resemble the kind of desired behaviors, it enhances training outcomes (Atkins, Wood, & Rutgers, 2002; Kulhavy, White, Topp, Chan, & Adams, 1985). Given such evidence, we decided to increase feedback intervention effectiveness by excluding constructive practices from transactional leadership. From now on, we will refer to it as transactional (corrective) leadership, characterized by actively keeping track of irregularities, deviations, or infractions of (safety) rules and procedures.
Offering feedback data regarding perceived transactive (corrective) supervisory behavior, coupled with learning goals motivating its amplification, is expected to increase its frequency. Followers witnessing or observing cross-situational increase in supervisory attention to (safety) rule violations are expected to modify their leadership perceptions, resulting in higher transactional (corrective) leadership scores. These arguments lead to the following hypothesis:
Such change is expected to motivate followers to enhance safety compliance behavior while performing their job. This expectation is based on findings indicating the effect of leadership on safety compliance (Zohar, 2002) as well as meta-analytic results supporting the same relationship (Christian et al., 2009). Increased safety compliance by team members is expected, in turn, to reduce controllable risks and hazards around workstations (Glendon, Clarke, & McKenna, 2006), resulting in higher safety audit scores. These arguments lead to the following hypothesis:
The second FRL facet, transformational leadership, focuses on empowering followers by broadening their needs and abilities, such that they can develop new ways of performing the job and overcome unanticipated interferences or deal with obstructions to job performance (Bass & Avolio, 1997). Such practices have been shown to promote organizational citizenship behavior at large (Podsakoff, MacKenzie, Moorman, & Fetter, 1990) and safety citizenship behavior in particular (Hofmann et al., 2003). Consequently, an increase in transformative leader behavior in routine conversations due to our intervention is expected to raise follower transformational leadership ratings, leading to the following hypothesis:
Such a change is expected to motivate followers’ safety citizenship behavior, resulting in higher scores. This line of reasoning has been supported in a meta-analysis indicating that transformational leadership was more strongly related to safety citizenship than to safety compliance, with the reverse being true for transactional leadership (Christian et al., 2009). Furthermore, given the demonstrated effect of group members’ safety citizenship on reduction of risks and hazards around their workstations (Glendon et al., 2006), it is expected to result in higher safety audit scores. These arguments lead to the following hypothesis:
Passive leadership, constituting a secondary FRL facet, is also known as non-leadership or Laissez-Faire (LF; Bass, 1985). It denotes evasion of any kind of interaction with followers or failing to take leadership actions. In that sense, although passive leadership is negatively correlated with transformational and transactional leadership (Judge & Piccolo, 2004), it does not simply represent the lower end of these two leadership facets. Rather, its (reversed) scores capture the extent of engagement, motivation, or commitment to exert one’s leadership role by influencing group members’ behavior (Bass, 1985). Previous research indicated that passive leadership predicts traditional performance criteria (Komaki, Zlotnick, & Jensen, 1986; Yukl, Wall, & Lepsinger, 1990), as well as safety performance criteria (Komaki, Barwick, & Scott, 1978; Reber & Wallin, 1984; Zohar, 1980; Zohar & Fussfeld, 1981).
Although passive leadership does not map directly on either safety behavior dimension, given its demonstrated effect on safety performance outcomes, we decided to include it in our intervention, preserving the three-facet structure of the FRL model. As a result, an intervention-induced decrease in the prevalence of passive leader behavior in routine conversations is expected to modify follower leadership perceptions, resulting in lower passive leadership scores. Such a change leads to the following hypothesis:
Given the evidence regarding the negative relationship between passive leadership and safety performance, it is expected that decreased passivity will increase follower safety motivation, leading to higher safety behavior and audit scores. These arguments lead to the following hypothesis:
Team Processes as Covariate
Given that team processes have been shown to mediate the effect of team leadership on safety performance, they were included as a covariate in our statistical models, allowing a more stringent test of our hypotheses. The three primary team processes (i.e., coordination, cooperation, and communication) were shown by a recent meta-analysis to be among the strongest contextual factors affecting safety behavior and injury outcomes (Clarke, 2010). Such meta-analytic data reinforce previous literature reviews indicating that sharing any kind of task-related information among team members resulting in improved coordination or cooperation is likely to improve the team’s safety performance (Hofmann & Stetzer, 1996; Zacharatos, Barling, & Iverson, 2005).
Method
Participants and Procedure
The project took place in a mid-size heavy manufacturing company in Europe, producing components for large outdoor metal structures such as bridges or cranes. Due to the ever-changing design of its products, worker jobs are characterized by low routinization. Safety behavior in such work environments requires workers to exhibit both safety compliance and initiative (Glendon et al., 2006). The manufacturing division employs 364 workers, divided into 26 all-male work teams. Team size ranges between 8 and 16. Average worker age was 42.7 years (SD = 8.9), with an average company tenure of 9.3 years (SD = 4.1). Supervisors, serving as team leaders, were slightly older and had longer tenure at the company (mean age = 47.1 years, SD = 7.7; mean tenure = 14.5 years, SD = 6.9). Supervisors were all male, and they reported directly to the manufacturing division manager, who held the position of vice president and belonged to the company’s senior-management team.
Experimental Design
The project was designed as a randomized field experiment in which half the supervisors in the manufacturing division (N = 13) were randomly assigned to the experimental group, receiving two individual feedback sessions 6 weeks apart. The other half (N = 13) received no feedback, although their workers were equally contacted for data collection purposes (see detailed description in the next section). Randomized assignment to study groups counteracts or significantly reduces threats to internal validity such as imitation of treatment effects, compensatory equalization of treatment, and small sample size (Cook & Campbell, 1976). In effect, random assignment with control group inclusion is the gold standard of field research due to its ability to test causality while minimizing most sources of bias in reported results.
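The random assignment described above can be illustrated with a minimal sketch. The article does not report how the randomization was carried out, so the supervisor identifiers, the seed, and the function name below are hypothetical; only the 13/13 split of 26 supervisors comes from the text.

```python
import random

def assign_supervisors(supervisor_ids, seed=None):
    """Randomly split supervisors into two equal-sized study groups."""
    rng = random.Random(seed)          # local generator so the split is reproducible
    shuffled = list(supervisor_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {
        "experimental": sorted(shuffled[:half]),
        "control": sorted(shuffled[half:]),
    }

# Hypothetical identifiers standing in for the 26 supervisors.
supervisors = [f"S{i:02d}" for i in range(1, 27)]
groups = assign_supervisors(supervisors, seed=2024)  # seed value is arbitrary
print(len(groups["experimental"]), len(groups["control"]))
```

Fixing the seed only serves to make the illustration repeatable; any procedure that gives every supervisor the same chance of landing in either group would serve the same purpose.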
Altogether, the 12-week intervention phase was composed of two data collection and feedback cycles, each lasting 6 weeks. Data collection from workers was conducted during the first 3 weeks per cycle, followed by 3 weeks for feedback provision. At the project’s conclusion, supervisors in the control group received printed individualized feedback based on the two cycles of their worker ratings, accompanied by a standard explanation regarding the interpretation of these data. Furthermore, supervisors in both groups received at that time a brief summary of intervention results, which they were asked to share with their workers.
Employing a before-after (i.e., pre- and post-test) design, all company workers were asked to complete a questionnaire delivered 6 to 8 weeks before and 6 to 8 weeks after the end of intervention. Given that the intervention phase lasted 12 weeks, questionnaires were filled out 28 weeks (i.e., 7 months) apart. The questionnaire included scales measuring transformational, transactional (corrective), and passive leadership, safety behavior, and team work. Questionnaires were filled out during work hours with workers arriving at pre-arranged times without their supervisor. Questionnaires were completed anonymously and collected immediately by members of the research team, who guaranteed absolute confidentiality before aggregating the data for group-level analyses. The overall response rate was 86%, resulting in a sample of 313 respondents filling out both questionnaires.
Given that the project was initiated with a public announcement by senior management, describing it as a leadership-training project designed to improve supervisory safety leadership, supervisors in both study groups were equally aware of this project and their role in it. As will be noted in the Discussion section, this awareness exposed control-group supervisors and workers to vicarious learning effects, offering a more stringent test of our hypotheses. Overall, 22 workers refused to participate in this process, resulting in a sample of 175 individuals in the experimental group and 167 in the control group.
Training Strategy
As noted above, the intervention phase consisted of two cycles involving data collection and feedback delivery, lasting 6 weeks each. The data collection procedure used the following steps. First, the sequence of calling workers by their cell phone in each workgroup was randomly determined to ensure random event sampling. Having answered the phone, the worker was asked whether he had 3 to 5 min to answer a short list of questions regarding the last conversation with his direct supervisor. Once he agreed, the caller read each of the six checklist items, requesting the respondent to rate each item on a 3-point scale. This procedure was adopted following a pilot study, indicating that by reducing the number of checklist items to six (i.e., using two items for measuring leader behavior associated with each of the three FRL facets) and reducing the customary 5-point rating scale to a 3-point scale, it was possible to complete the interview in 3 to 5 min. Phone calls continued until seven to nine workers had responded in each work team in the experimental and control groups, resulting in discourse-based data for supervisory feedback sessions.
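The calling procedure described above amounts to a randomized call order with a stopping rule. The following sketch illustrates that logic under stated assumptions: the worker identifiers, the set of unwilling workers, the `agrees_to_answer` callback, and the target of eight respondents (a value within the reported range of seven to nine) are all hypothetical and are not taken from the study materials.

```python
import random

def sample_respondents(team_members, agrees_to_answer, target, seed=None):
    """Call workers in random order until `target` of them have responded."""
    rng = random.Random(seed)
    call_order = list(team_members)
    rng.shuffle(call_order)            # random call sequence within the team
    respondents = []
    for worker in call_order:
        if len(respondents) >= target:
            break                      # stopping rule: enough ratings collected
        if agrees_to_answer(worker):   # worker has 3 to 5 min and consents
            respondents.append(worker)
    return respondents

# Hypothetical team of 14 workers, two of whom decline to take part.
team = [f"W{i:02d}" for i in range(1, 15)]
unwilling = {"W03", "W11"}
respondents = sample_respondents(
    team, lambda w: w not in unwilling, target=8, seed=7
)
print(len(respondents))
```

If fewer workers agree than the target requires, the function simply returns everyone who responded, which mirrors a team in which the call list is exhausted.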
The feedback delivery process proceeded as follows. Averaged follower ratings for each of the three leadership behavior facets, presented as bar graphs, provided each supervisor with individualized feedback. Three comparison benchmarks were incorporated into the feedback display as follows: (a) the median score of the other supervisors in the experimental group, (b) population norms for each leadership behavior facet (adapted from Bass & Avolio, 1999), and (c) learning goals, identifying the aspired score for each leadership behavior facet as set by each participant after reviewing his first set of feedback data. Using meta-analytic results regarding factors affecting feedback effectiveness (Kluger & DeNisi, 1996), feedback provision in this project incorporated the following features: (a) making recipients believe that feedback information is accurate, (b) making the feedback non-threatening, (c) suggesting a need for change by using a personally relevant benchmark criterion, (d) making recipients believe that change is feasible, and (e) offering feedback by means of a neutral facilitator (Smither, London, & Reilly, 2005).
Feedback sessions lasted between 30 and 45 min, and they were conducted at each supervisor’s office. We used four hourly paid undergraduate students for data collection and two graduate students for conducting the feedback sessions. Feedback facilitators received a brief training session, focusing on three issues: (a) helping supervisors to interpret their individual bar graphs and compare them with benchmarks, (b) highlighting the fact that the data represent leader behaviors as perceived by followers during recent conversations, and (c) demonstrating how to select self-set goals designed to increase desired leadership behavior scores for future leader–follower conversations.
Measures
Leader behavior ratings were obtained during structured phone interviews in which interviewers used a checklist whose six items referred to leader behaviors associated with the three facets of the FRL model (Bass & Avolio, 1997). Interviewed workers had to respond to each item by using a 3-point rating scale, indicating the extent to which their supervisor had displayed that behavior during their last conversation. The scale ranged from 0 (none) to 1 (little) to 2 (much). Instructions indicated that there was no wrong answer as the rating had to do with perceived supervisory behavior. Transformative behavior items included “Expressed confidence in my ability to solve problems on my own” and “Listened attentively to my concerns.” Transactive (corrective) behavior items included “Spoke about falling behind schedule or not working hard enough” and “Talked about a mistake or failure to work by the book.” Passive behavior items included “Remained nonspecific or vague; did not speak clearly” and “Made me feel he doesn’t care about the way I do my work.” Given two items for each facet, internal consistency was tested by computing the correlation between items across the entire sample (i.e., workers belonging to experimental and control groups), based on data collected before the first feedback session. Pairwise correlation coefficients were as follows: .70 (transformative behavior), .84 (transactive/corrective behavior), and .76 (passive behavior). Consequently, behavior scores were computed as the average rating of both items for each respondent. Supervisory feedback was presented graphically as a three-bar diagram representing the mean ratings of seven to nine randomly chosen respondents in each team.
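The scoring steps described above (averaging the two items per facet for each respondent, averaging across a team's respondents to obtain the three bars, and correlating the two items of a facet) can be sketched as follows. The item ordering, the function names, and all ratings below are invented for illustration and are not study data.

```python
from statistics import mean

# Assumed item order: two transformative, two transactive, two passive items.
FACETS = {"transformative": (0, 1), "transactive": (2, 3), "passive": (4, 5)}

def facet_scores(ratings):
    """Average the two 0-2 item ratings of each facet for one respondent."""
    return {facet: mean(ratings[i] for i in idx) for facet, idx in FACETS.items()}

def team_feedback(all_ratings):
    """Mean facet score across a team's respondents (one bar per facet)."""
    per_respondent = [facet_scores(r) for r in all_ratings]
    return {facet: mean(s[facet] for s in per_respondent) for facet in FACETS}

def pearson_r(x, y):
    """Pearson correlation between two equally long rating lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return cov / (ss_x * ss_y) ** 0.5

# Invented ratings from seven respondents in one hypothetical team.
team_ratings = [
    [2, 2, 0, 1, 0, 0],
    [1, 2, 1, 1, 0, 0],
    [2, 1, 0, 0, 1, 0],
    [1, 1, 2, 2, 0, 1],
    [0, 1, 1, 2, 1, 1],
    [2, 2, 0, 0, 0, 0],
    [1, 1, 1, 1, 0, 0],
]
feedback = team_feedback(team_ratings)
item_r = pearson_r([r[0] for r in team_ratings], [r[1] for r in team_ratings])
print(feedback, round(item_r, 2))
```

In the study, the inter-item correlations were computed across the entire sample rather than within a single team; the single-team correlation here only demonstrates the calculation.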
Transformational leadership was measured before and after intervention using the 12 highest-loading items related to the four factors of this construct in the Multifactor Leadership Questionnaire (MLQ; Bass & Avolio, 1997). Given meta-analytic results indicating strong relationships between the four factors (Judge & Piccolo, 2004), we followed the practice of combining them into a single score (e.g., Carless, 1998; Howell & Hall-Merenda, 1999; Judge & Bono, 2000). Items were accompanied by a 5-point rating scale ranging from 1 (not at all) to 5 (completely agree). Sample items include, “My supervisor talks enthusiastically about what needs to be accomplished,” “Acts in ways that build our respect,” and “Suggests new ways of looking at how to complete assignments.” Scale items were based on a validated MLQ translation that has been widely used in this country. Coefficient alpha reliability of this scale was .79.
Transactional (corrective) leadership was measured before and after intervention using six items of the MBE (active) sub-scale of MLQ (Bass & Avolio, 1997). Items were accompanied by the same 5-point rating scale described above. Sample items include, “Focus attention on irregularities, mistakes, exceptions, and deviations from standards”; “Keep track of all mistakes”; and “Watch for any infractions of rules and regulations.” Coefficient alpha reliability of this scale was .92.
Passive leadership was measured before and after intervention using six items of the LF sub-scale of MLQ (Bass & Avolio, 1997). Items were accompanied by the same 5-point rating scale described above. Sample items include “Avoid making decisions,” “Delay responding to urgent questions,” and “Fail to follow-up requests for assistance.” Coefficient alpha reliability of this scale was .79.
Safety behavior was measured before and after intervention using a six-item scale developed by Griffin and Neal (2000). Items were accompanied by a 5-point rating scale ranging from 1 (completely disagree) to 5 (completely agree). Scale items refer to the two dimensions of safety behavior, that is, compliance and initiative. Given strong correlations between both dimensions in previous studies (Griffin & Neal, 2000) and in the current study (r = .80), they were combined into a single safety behavior score. Sample items include, “I carry out my work in a safe manner,” “I use the correct safety procedures for carrying out my job,” and “I put in extra effort to improve the safety of the workplace.” Coefficient alpha reliability of this scale was .81.
Safety audits were performed before and after intervention by two externally affiliated safety experts who remained unaware of the study rationale and of the allocation of work teams to experimental and control groups. Audits were performed independently by each auditor following the European Commission Guide (1995), requiring auditors to judge risk protection based on three to five walk-around tours. Prior to the start of the project, team supervisors were asked to go over the risk/hazard checklist in the audit guide and identify items that were under their team members’ control. Risks affected by senior-management action were removed from the list, resulting in a list of 13 items accompanied by a 10-point rating scale ranging from 1 (poor risk protection) to 10 (excellent risk protection). Items cover issues such as horizontal/vertical risks, hand-held equipment risks, fire and electrical risks, hazardous material exposure, and collective and personal protection. Exploratory factor analyses revealed a single global factor after the removal of two items, with an internal consistency of .77. Such auditing allows comparison between work teams performing different kinds of jobs. Scoring reliability was tested by comparing the audit scores for each work team between the two auditors, resulting in rs = .81 (p < .01).
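The inter-auditor check above reports rs, which we read as a Spearman rank correlation; that reading is an assumption, since the text does not spell the statistic out. Under that assumption, the computation can be sketched as below, with hypothetical team-level audit scores and hypothetical function names.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg
        i = j + 1
    return ranks


def spearman_rs(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Hypothetical mean audit scores (1-10 scale) for six work teams, one list per auditor.
auditor_a = [7.5, 6.0, 8.2, 5.1, 9.0, 6.8]
auditor_b = [7.0, 6.5, 8.0, 5.5, 8.8, 6.1]
rs = spearman_rs(auditor_a, auditor_b)
```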
Team work was measured with eight items taken from the team interaction frequency and openness sub-scales of the Team Climate Inventory (Anderson & West, 1998). Items were accompanied by a 5-point rating scale ranging from 1 (not at all) to 5 (completely agree). Sample items include, “We keep in touch with each other as a team,” “We interact frequently,” and “We share information generally in the team rather than keeping it to ourselves.” Coefficient alpha reliability for this scale was .79.
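The coefficient alpha values reported for the scales above follow the standard formula, alpha = k/(k − 1) × (1 − sum of item variances / variance of total scores). A minimal sketch with hypothetical ratings (three items, six respondents) is given below; the data and the function name `cronbach_alpha` are illustrative only.

```python
from statistics import variance


def cronbach_alpha(items):
    """Coefficient alpha; `items` holds one list of ratings per scale item."""
    k = len(items)
    totals = [sum(row) for row in zip(*items)]       # total score per respondent
    item_var_sum = sum(variance(col) for col in items)
    return (k / (k - 1)) * (1 - item_var_sum / variance(totals))


# Hypothetical 1-5 ratings: three items answered by six respondents.
ratings = [
    [4, 5, 3, 2, 4, 5],
    [4, 4, 3, 2, 5, 5],
    [5, 4, 2, 3, 4, 4],
]
alpha = cronbach_alpha(ratings)
```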
Results
Descriptive statistics for variables in our statistical models are presented in Table 1. Data in this table are presented separately above and below the diagonal, allowing comparisons between experimental and control groups and pre- and post-intervention measurements. As can be observed, the pattern of correlations remains stable for both groups, indicating that our intervention had no effect on relationships among variables, just their means. Furthermore, whereas group means for the experimental and control groups were quite similar at Time 1 (T1; pre-intervention) across all variables, the mean of transformational leadership increased and that of passive leadership decreased quite substantially at Time 2 (T2; post-intervention) for the experimental group, remaining essentially stable for the control group. Contrary to expectation, however, the mean of transactional (corrective) leadership remained unchanged for the experimental group.
Table 1. Inter-Correlations and Descriptive Statistics of Variables Used for Before- and After-Training Comparisons.
Note. Time 1 (T1; pre-training) data are above diagonal; Time 2 (T2; post-training) data are below diagonal. Experimental group n = 154; control group n = 159. If r > .19, then p < .05; if r > .29, then p < .01.
Table 2 presents the descriptive statistics of within-group differences between the two feedback sessions concerning leadership behaviors during sampled conversations. The data in this table indicate that whereas at baseline (i.e., first feedback session), there were no observable differences between the experimental and control groups, such differences emerged from the second feedback session. ANOVA results for transformative behavior were as follows: Group (Experimental vs. Control; F = 238.2; p < .01), Time (Feedback 1 vs. 2; F = 182.1; p < .01), and Interaction (Group × Time; F = 337.3; p < .01). Results for transactive (corrective) behavior were as follows: Group (F = 16.8; ns), Time (F = 26.1; ns), and Interaction (F = 42.9; ns). Finally, results for passive behavior were as follows: Group (F = 268.7; p < .01), Time (F = 377.1; p < .01), and Interaction (F = 345.2; p < .01). These results support Hypotheses 3a and 5a, as transformative behavior increased and passive behavior decreased significantly, but the hypothesized increase in transactive (corrective) behavior (Hypothesis 1a) did not take place. As evidenced by the mean scores in Table 2, the latter can be attributed to the high baseline level for transactive (corrective) behavior at the beginning of intervention, resulting apparently in a ceiling effect for both experimental and control groups.
Table 2. Descriptive Statistics of Supervisory Behaviors as Perceived by Workers During Routine Leader–Member Exchanges.
Note. Numbers refer to group means; numbers in parentheses refer to standard deviations. Numbers are derived from a 3-point rating scale (0-1-2) filled out by exchange recipients. Given that supervisors in control group received no feedback, their data reflect exchange recipients’ ratings in line with the two feedback sessions.
Table 3 presents the within-unit intraclass correlation coefficient (ICC[1]) values measured before and after intervention for each of the (group-level) training outcome variables (Bliese, 2000; Kozlowski & Klein, 2000). The table also includes ICC values for team work, serving as a control variable in the statistical models described below. As can be seen in this table, although ICC values remained significant before and after intervention, their levels increased after intervention for the transformational and passive leadership variables. Such an increase indicates greater within-group homogeneity, apparently resulting from our intervention.
Table 3. Within-Unit ICC Values Measured Before and After Training for Group-Level Variables: Transformational, Transactional (Corrective), and Passive Leadership, Team Work and Safety Behavior.
Note. ICC values refer to the ICC(1) statistic. ICC = intraclass correlation coefficient.
p < .001.
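For readers less familiar with the ICC(1) statistic, it can be obtained from a one-way ANOVA with work team as the grouping factor: ICC(1) = (MSB − MSW) / (MSB + (k − 1) × MSW), where MSB and MSW are the between- and within-team mean squares and k is the team size. The sketch below uses hypothetical ratings and takes k as the average team size, which is a simplifying assumption when teams are unequal.

```python
def icc1(groups):
    """ICC(1) from a one-way ANOVA; `groups` is one list of ratings per work team."""
    n_groups = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (n_groups - 1)
    ms_within = ss_within / (n_total - n_groups)
    k = n_total / n_groups  # average team size (exact when teams are equal-sized)
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical leadership ratings from three teams of three workers each.
teams = [[4, 5, 4], [2, 3, 2], [3, 3, 4]]
icc = icc1(teams)
```

Higher values indicate that a larger share of rating variance lies between teams, that is, greater agreement among members of the same team.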
Given the hierarchical structure of our data (i.e., subjects nested in time due to repeated measurements and in organizational units) and their within-unit homogeneity, the data were analyzed with a mixed-effects model, using SAS Mixed Procedure software (SAS version 9.3). The effects of intervention on the subsequently measured levels for each of the three leadership perception facets were tested with linear mixed models. The independent variables were as follows: Time (T = 0 before intervention; T = 1 after intervention), Group (G = 1 for experimental group; G = 2 for control group), the interaction between time and group, and a control variable identified as team work. The repeated option offered in SAS Mixed Procedure was used to control for differences between measurement times and organizational units and for correlations between measurement times within units. The same statistical model was used for testing the remaining hypotheses regarding expected main and interaction effects between time and group for each of the dependent variables.
Results of our mixed-effects model analysis, as presented in Table 4, compare the effects of training on leadership perceptions using their respective variance terms, such that the sources of non-independence can be determined. The interaction term in this table (i.e., Time × Group) represents the difference in change between Time = 0 (before training) and Time = 1 (after training) for the experimental and control groups. Interaction effects in this table depend on the size (and statistical significance) of differences between the two study groups whose repeated measurements are nested in time (i.e., from Time = 0 to Time = 1). As can be seen in Table 4, the Time × Group interaction terms were statistically significant with regard to transformational and passive leadership perceptions but not transactional (corrective) leadership. These results support Hypotheses 3b and 5b, as transformational leadership scores increased and passive leadership scores decreased significantly following the intervention, but the hypothesized increase in transactional leadership (Hypothesis 1b) due to intervention did not happen. The shapes of these interactions are presented graphically in Figures 1 and 2.
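The logic of the Time × Group term can be illustrated with raw cell means: it is the pre-to-post change in the experimental group minus the corresponding change in the control group. The sketch below is a simplified illustration with hypothetical scores, not the mixed-effects model itself; it ignores the nesting of workers in teams, the team work covariate, and the covariance structure handled by the SAS procedure.

```python
from statistics import mean


def interaction_contrast(exp_pre, exp_post, ctl_pre, ctl_post):
    """Difference in pre-to-post change between experimental and control groups."""
    exp_change = mean(exp_post) - mean(exp_pre)
    ctl_change = mean(ctl_post) - mean(ctl_pre)
    return exp_change - ctl_change


# Hypothetical transformational leadership scores (1-5 scale).
contrast = interaction_contrast(
    exp_pre=[3.0, 3.2, 2.8], exp_post=[3.8, 4.0, 3.6],
    ctl_pre=[3.1, 2.9, 3.0], ctl_post=[3.0, 3.1, 2.9],
)
```

A contrast near zero would indicate parallel change in both groups; the sign of the corresponding model coefficient depends on how Group is coded.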

Figure 1. Interaction between Group and Time for transformational leadership change.

Figure 2. Interaction between Group and Time for passive leadership change.
Table 4. Results of Mixed-Effect Models Comparing the Effect of Training on Leadership Dimensions in Experimental and Control Groups.
Note. Time concerns the main effect of training (T = 0 before training; T = 1 after training); Group concerns the main effect of experimental design (G = 1 for experimental group; G = 2 for control group); T × G interaction concerns the differential effect of intervention on both groups. T = Time; G = Group.
*p < .05. **p < .001.
The same mixed-effects statistical model was used for testing the effect of training-induced changes in supervisory leadership behavior on followers’ safety behavior. Given the fact that both safety behavior dimensions (i.e., compliance and citizenship) were strongly related, we tested relevant hypotheses using a single score representing their mean level. Table 5 presents the main and interaction effects of Group and Time intervention variables on safety behavior as well as the incremental effect of transformative, transactive (corrective), and passive leadership behavior on safety behavior after controlling for the main and interaction effects of intervention. As can be seen in this table, main and interaction effects of training on safety behavior were significant, supporting Hypotheses 2a, 4a, and 6a. In addition, transformative and passive leadership behaviors offered incremental prediction of worker safety behavior, while transactive (corrective) leadership behavior had no such effect. These results add support for Hypotheses 4a and 6a while failing to support Hypothesis 2a.
Table 5. Results of Mixed-Effect Models Comparing the Incremental Effect of Leadership Dimensions on Safety Behavior in Experimental and Control Groups.
Note. Time concerns the main effect of training (T = 0 before training; T = 1 after training); group concerns the main effect of experimental design (G = 1 for experimental group; G = 2 for control group); T × G interaction concerns the differential effect of intervention on both groups. T = Time; G = Group.
*p < .05. **p < .001.
Finally, testing the main and interaction effects of intervention on the (independently measured) safety audit scores led to the following data: Time (F = 4.21; p < .05), Group (F = 5.87; p < .01), Time × Group interaction (F = 4.41; p < .05), and Team work (F = 4.10; p < .05). Figure 3 describes the shape of interaction for safety audits. Tests of the incremental effect of modified leadership behaviors on safety audits (i.e., after controlling for Time, Group, and Time × Group interaction) led to the following results: transformative leadership behavior (F = 3.81; p < .05), transactive (corrective) behavior (F = 8.14; p < .01), and passive behavior (F = 1.16; ns). These results support Hypotheses 2b and 4b, while failing to support Hypothesis 6b. Notably, unlike the case with safety behavior as dependent variable, these data support the incremental effect of transformative and transactive (corrective) leadership behaviors while failing to support the effect of passive behavior on safety audits.

Figure 3. Interaction between Group and Time for safety audit change.
Effect-size estimates, using the … log L_M is the maximum log likelihood of the model and log L …
Table 6. Effect-Size Estimates Using …
Note. p < .001.
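The name and formula of the effect-size statistic are truncated in the text above. One likelihood-based measure often used with mixed models, and consistent with the surviving definition of log L_M, is the likelihood-ratio R², computed as 1 − exp(−(2/n)(log L_M − log L_0)), where log L_0 is the log likelihood of an intercept-only model and n is the number of observations. That this is the statistic reported in Table 6 is an assumption. The sketch below shows the computation with hypothetical log likelihoods.

```python
from math import exp


def likelihood_ratio_r2(loglik_model, loglik_null, n_obs):
    """Likelihood-ratio R^2 (assumed statistic): 1 - exp(-(2/n) * (logL_M - logL_0))."""
    return 1 - exp(-2 * (loglik_model - loglik_null) / n_obs)


# Hypothetical maximized log likelihoods for a fitted and an intercept-only model.
r2 = likelihood_ratio_r2(loglik_model=-410.0, loglik_null=-440.0, n_obs=313)
```

Under this reading, the change in R² between models with and without the intervention terms would correspond to an increase in explained variance.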
Discussion
Results of this field experiment support the inferred causality between perceived changes in verbal leader behavior and corresponding changes in follower leadership perceptions and task (safety) performance. Specifically, induced changes in leadership messages resulting from the feedback and goal-setting intervention produced subsequent changes in follower leadership perceptions and role behaviors. Given that most empirical evidence concerning this proposition has been based on correlational data, our study design, allowing testing of causal relationships between variables, offers a contribution to the leadership literature. An additional contribution concerns the incorporation of a levels-of-analysis framework into leadership research. Using event-level data referring to randomly sampled events (i.e., leader–follower exchanges), our study adds evidence for the leader behavior–follower leadership perceptions proposition beyond the support offered by the traditional between-person design.
The fact that our training proved effective in increasing transformational and decreasing passive leadership while failing to change transactional (corrective) leadership suggests that contextual variables may affect the outcomes of our intervention strategy. In the present case, relevant contextual variables seem to relate to the framing of our intervention as a safety leadership-training project. In the context of safety leadership, given the prevalence of workarounds under regular (if hazardous) work conditions (Halbesleben, 2010), feedback data indicating high transactive (corrective) message scores are likely to be considered evidence of effective managerial practice, even if the scores exceed population norms. This interpretation is supported by two outcomes in our study. First, the pattern of our participants’ self-set learning goals was characterized by setting higher-level goals for transformative messages, lower-level goals for passive messages, and same-level goals for transactive (corrective) messages, resulting in corresponding mean difference scores of 0.85, −0.76, and 0.09 on a 3-point scale (see Table 2). Such data imply that supervisors considered the current (high) level of transactive (corrective) messages satisfactory in the context of safety management.
Second, our results indicated that, unlike the case for follower safety behavior as the outcome criterion, transactive (corrective) leadership messages had the strongest effect on safety audits. Given that safety audits, focusing on the extent of deviation from rules, regulations, and standards, have been shown to offer strong prediction of accidents (Glendon et al., 2006), their scores constitute a reliable safety metric. Hence, because our training was framed in the context of safety leadership, reducing transactive (corrective) messages to their normative level apparently made little sense to participating supervisors.
An added contribution of this study concerns its use of verbal (i.e., discourse-based) data, which has been little incorporated into leadership research despite the primacy of social interaction in leadership roles (Phillips & Oswick, 2012). By using feedback based on such data, this study expands the small body of research focusing on language as a primary dimension of leader behavior. Given the premise that leaders must talk with group members to influence and organize their behavior (Phillips & Oswick, 2012), and availability of quantitative discourse analytic methodologies (Alvesson & Karreman, 2000; Phillips & Oswick, 2012), inclusion of organizational discourse in leadership research offers an important contribution. We hope that our study stimulates leadership scholars to incorporate discourse-based research into their laboratory or field-based studies.
It is worth noting that, although our feedback data were event based, they offered no contextualized information or episodic details. Although such information could have increased feedback effectiveness (Amabile, Schatzel, Moneta, & Kramer, 2004; Dinh & Lord, 2012), we decided to avoid using it based on pragmatic and ethical considerations. Namely, given that the inclusion of specific episodic details would have revealed each respondent’s identity, it would have become impossible to maintain anonymity and confidentiality. Given the reporting relationships between leaders and followers, this would have resulted in significantly reduced response rates (or positively biased responses), affecting the intervention efficacy.
If our results are replicated and shown to be generalizable, this study offers two practical implications. First, our intervention was designed to be cost effective. Whereas most leadership interventions are conducted by experts recruited within or outside the organization, our intervention was conducted by undergraduate and graduate students who had undergone minimal training. Direct costs of the project were largely based on hourly student payment rates, whereas indirect costs were largely associated with employee time lost to our 5-min phone interviews. Costs were further curtailed by limiting the intervention to two individualized feedback sessions at participant offices. Effect-size estimates provided in Table 6 indicated that our intervention increased explained variance of transformational leadership by 16% and passive leadership by 18%. Although, due to the use of incompatible statistics, it is difficult to compare our effect sizes with the meta-analytic estimates reported in Avolio, Walumbwa, et al. (2009), our results suggest rather substantial effects in terms of explained variance.
Second, our intervention was designed to expand leadership-training strategies. As noted by literature reviews and meta-analyses, most leadership development studies used a lecture accompanied by informal discussion, role playing, and/or vignette-based exercise (Avolio, Reichard, Hannah, Walumbwa, & Chan, 2009; Collins & Holton, 2004). By contrast, our strategy offers event-based feedback and goal-setting intervention, using in-field leader behavior during routine conversations as raw data. Our use of in-field data, as opposed to classroom workshops, resembles the experience-based leadership development strategy (McCall, 2010; McCall & Hollenbeck, 2007).
Study Strengths and Weaknesses
Methodological strengths of this study concern (a) designing the study as a field experiment consisting of random assignment of supervisors to the experimental and control groups; such design allowed for the testing of causal relationships between leader behavior and follower leadership perceptions; (b) inclusion of a level-of-analysis framework into leadership research by using event-level data coupled with random event sampling (i.e., random sampling of both interviewed group members and interview timing); and (c) inclusion of verbal data in leadership research that have been little studied despite being the primary medium for interacting with followers.
The fact that our project was designed as a randomized field experiment in which supervisors were randomly allocated to the two study groups might have resulted in a mix of study strengths and weaknesses. Given such a design, supervisors and workers could have observed or communicated with members of the other study group. This could have affected our results in a number of ways: (a) reducing effect sizes for the experimental group due to vicarious learning by control-group members, (b) offering a more stringent test of our hypotheses, and (c) resulting in oppositely signed outcomes of experimental group selection on follower ratings; that is, higher ratings for those expecting the intervention to work and/or lower ratings for those who have had higher expectations due to the intervention. Given such complexity, there is a need for replicating our results to test their reliability.
Study weaknesses relate largely to five issues. First, our intervention was framed as safety leadership rather than team leadership training. This resulted from the fact that company management approved our project based on its expected safety benefits, considering that safety served as one of the company’s strategic goals at that time. As noted above, such framing seems to have affected our outcomes in terms of participants’ resistance to decreasing their transactive (corrective) leadership behavior despite its high baseline level.
Second, our feedback intervention was confined to shop-floor supervisors. Given that they constitute the lowest level of management, there is a need for expanding this intervention to include higher-level managers in the organization. Assuming a top-down process in which senior-management messages influence lower-level managers, it can be expected that subjecting higher-level managers to our intervention would result in changes filtering down to lower levels, promoting system-wide changes in the organization.
Third, our intervention adopted the leadership profile recommended by the FRL model (Bass, 1985; Bass & Avolio, 1999), which has recently come under conceptual and methodological criticism (Van Knippenberg & Sitkin, 2013). Although we took into account some of these issues, stemming largely from significant overlaps between model elements, there is a need for replicating this study with other leadership models.
Fourth, although event-level analysis offers a number of methodological advantages, it also suffers from a number of limitations. Foremost among them is recency error, whereby a leader’s behavior in the most recent conversation serves as the unit of analysis for feedback data. Although we have tried to minimize such error by averaging some 10 conversations with followers, this unit of analysis is likely to misrepresent leaders’ habitual, schemata-based practices. Therefore, there is a need to study cross-level relationships by using group- and event-level data to better understand processes affecting the emergence of follower leadership perceptions (Morgeson, 2005).
Finally, our intervention was confined to face-to-face exchanges between leaders and their followers. Given the prevalence of written communication via email or smartphones, there is a need to replicate this study while using other communication media.
In conclusion, this study used event-level data and randomized field experimental design to test causal relationships implicated by the leader behavior–follower leadership perceptions proposition. We hope that our study will stimulate leadership scholars to employ multilevel methodology in the investigation of long-held propositions and new conceptual models.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Associate Editor: William Gardner
