Abstract
Activism seeking to improve labor conditions in global supply chains has led many transnational corporations to adopt codes of conduct and to monitor suppliers for compliance. Drawing on thousands of audits conducted by a major social auditor, the authors identify structural contingencies in the institutional environment and program design under which codes and monitoring are more likely to be associated with improvements in conditions. At the institutional level, suppliers improve more when they face greater risk that nongovernmental organizations and the press will expose harmful working conditions. They also improve more when their buyers have experienced negative publicity for supply chain labor abuses. At the program design level, suppliers improve more on average when audits are pre-announced, when auditors are highly trained, and especially when both elements are present. Extended analysis of variations across violation types reveals nuances to these findings. For instance, pre-announced audits were followed by greater improvement in occupational safety and health practices but not child labor practices. These findings can inform strategies for improving supply chain working conditions.
Widely reported labor abuses and pressure from consumers and labor activists have prompted high-profile brands to adopt formal organizational structures, such as codes of labor conduct and social monitoring, to address risks in their global supply chains. Nike developed an extensive program of codes and monitoring following reports of child labor in its supplier factories; Apple strengthened its supplier monitoring program in response to worker suicides and reports of rampant labor abuses at its supplier Foxconn’s factory in China; and prominent European apparel brands enhanced their existing programs following the collapse of the Rana Plaza building that housed their suppliers, killing more than 1,000 Bangladeshi factory workers.
Beyond such high-profile cases, codes and monitoring have become ubiquitous in global value chains. To avoid negative publicity generated by accidents and activism, thousands of transnational corporations (TNCs), including all US Fortune 500 companies, have adopted codes of conduct that require their suppliers to meet specified workplace standards (McBarnet 2007), and many conduct social audits to monitor and assess suppliers’ adherence to those codes (Short, Toffel, and Hugill 2016). Codes and monitoring are also used by multi-stakeholder initiatives—such as the Roundtable on Sustainable Palm Oil, the Electronics Industry Citizenship Coalition, and the Ethical Trading Initiative—that provide collective fora for private regulation of supply chain practices. Hundreds of thousands of audits are conducted on behalf of TNCs and multi-stakeholder initiatives each year (Gould 2005). Some activist groups have claimed that social auditing is an $80 billion/year industry (AFL-CIO 2013).
While TNCs generally adopt codes and monitoring for business purposes, including information gathering and reputation management, these organizational structures have been embraced by nongovernmental organizations (NGOs) and other international organizations as part of a broader strategy to improve conditions for supply chain workers (Utting 2005; LeBaron and Lister 2015). Codes and monitoring are central to the strategy of NGOs that alternately agitate against and partner with TNCs to encourage ever-stricter standards and more robust monitoring. United Nations initiatives in the area of business and human rights, such as the Global Compact and the Guiding Principles on Business and Human Rights, rely on private standards and monitoring to improve companies’ global labor and other human rights practices (Ruggie 2008). Responding to these calls for improvement, leading social auditing firms advertise that their monitoring services will help “both suppliers and customers in implementing sustainable business practices and improving workplace conditions in global supply chains” (UL Responsible Sourcing 2015 [emphasis added]; see also Elevate 2016; Intertek 2016).
However, it is not clear when, or even whether, codes and monitoring improve supplier labor practices. Many observers have argued that such organizational structures are, at best, window dressing (Esbenshade 2004; Frynas 2005; Seidman 2007; Barkemeyer 2009) or, worse, “organized hypocrisy” (Lim and Tsutsui 2012: 69)—that is, a calculated ploy to undermine labor organizing efforts (Justice 2006) and fend off more stringent state regulation (Utting 2005; Barkemeyer 2009; Shamir 2011). Certainly, codes and monitoring bear all the hallmarks of symbolic structures that are likely to be decoupled from actual supplier practices (Meyer and Rowan 1977).
Nevertheless, we argue that decoupling is not inevitable. We extend the literature on decoupling by investigating the structural contingencies that account for variation in the extent to which supply chain codes and monitoring are associated with improvement in working conditions. Specifically, we ask: How do suppliers’ compliance improvement rates differ depending on variations in institutional compliance pressures and in the design of monitoring programs? We also examine the interaction among program design elements. Identifying the conditions under which codes and monitoring are associated with actual improvements in labor conditions is critically important because they are the dominant mode of labor standards regulation in global value chains. We make no claims about their efficacy relative to other vehicles for improving labor standards, including more robust state-based regulation or worker organizing and empowerment. Rather, our study explores how the efficacy of the existing, highly institutionalized system of codes and monitoring could be enhanced.
We test our hypotheses using a novel data set of thousands of audits for code-of-conduct compliance conducted at nearly 5,000 factories spanning 13 industries in 66 countries by one of the world’s largest supply chain auditing firms. We use regression analysis to estimate a model that predicts a factory’s improvement in code compliance based on our hypothesized institutional pressures and program design attributes, controlling for audit and institution characteristics that might also affect improvement. By identifying structural contingencies that favor coupling, our findings challenge the assumption that codes and monitoring will inevitably be decoupled from practices and suggest key considerations that should inform the design and implementation of monitoring strategies to improve conditions in global supply chains.
Decoupling of Labor Codes and Labor Practices
Meyer and Rowan (1977) defined formal organizational structures as sets of practices and procedures that embody rationalized concepts of organizational legitimacy and theorized the conditions under which such structures would be decoupled from actual organizational practice. The decoupling literature strongly suggests that formal organizational structures like labor codes of conduct and social monitoring are likely to be adopted symbolically and decoupled from the suppliers’ actual labor practices.
First, there is a strong consensus in the literature that organizational structures adopted to gain legitimacy with external stakeholders rather than to satisfy the task-related efficiency demands of production will be implemented symbolically and decoupled from practices (Meyer and Rowan 1977; Boiral 2007; Bromley and Powell 2012). It would be difficult to find organizational structures that more “dramatically reflect the myths of their institutional environments instead of the demands of their [organizations’] work activities” (Meyer and Rowan 1977: 341) than codes and monitoring. Companies initially adopted codes and monitoring to deflect negative publicity and preserve brand reputation (Bartley and Child 2014). However, some of the substantive changes in production practices that such structures demand—minimum wage requirements, overtime restrictions, and freedom of association rights—are fundamentally at odds with the economic logic of global value chains that seek to minimize production costs (Locke 2013).
Second, symbolic structures are more likely to be decoupled in contexts in which efficiency demands are strong and not tempered by countervailing institutional pressures (Meyer and Rowan 1977; Meyer, Boli, Thomas, and Ramirez 1997). Suppliers to global value chains face intense efficiency demands to produce high volume at low cost (Gereffi, Humphrey, and Sturgeon 2005), and many are in countries with weak regulatory institutions and lax enforcement of labor standards.
Finally, resource constraints impede substantive implementation of formal organizational structures (Meyer et al. 1997; Bromley and Powell 2012; Lim and Tsutsui 2012). Suppliers operate on razor-thin margins and many lack the resources to effectively implement code requirements. Thus, the axioms that emerge from the decoupling literature suggest that formal organizational structures like codes and monitoring are likely to be ceremonial window dressing, “implemented, evaluated, and monitored so weakly that they do little to alter daily work routines” (Bromley and Powell 2012: 489) in ways that improve workers’ conditions.
Against this grain, a growing research stream focuses on the conditions under which formal organizational structures adopted symbolically are nevertheless implemented substantively or are coupled with organizational practices (Bromley and Powell 2012; Bartley and Egels-Zandén 2015). Consistent with the decoupling literature, most studies that find coupling attribute it to coercive institutional pressures, particularly to forms of state power, such as regulatory inspection and enforcement (Dobbin and Kelly 2007; Short and Toffel 2010; Marquis and Qian 2014). Other studies identifying successful coupling of symbolic structures have been of voluntary programs implemented in the context of broader, legally backed state regulatory regimes, such as US anti-discrimination law (Kalev, Dobbin, and Kelly 2006) and environmental law (Potoski and Prakash 2005).
Studies specifically investigating suppliers’ compliance with labor codes of conduct similarly find that codes and monitoring are associated with better working conditions when combined with robust government regulatory efforts (e.g., Rodríguez-Garavito 2005; Seidman 2007; Amengual 2010; Locke, Rissing, and Pal 2013; Toffel, Short, and Ouellet 2015; Amengual and Chirot 2016; Distelhorst, Hainmueller, and Locke 2017). In addition to the coercive power of the state, studies have found that institutional pressures from unions (Bartley and Egels-Zandén 2015; Oka 2015), a free press (Toffel, Short, and Ouellet 2015), NGOs (Seidman 2007; Fransen 2012; Zajak 2017), and brands (Oka 2010a; Bartley and Egels-Zandén 2015) can induce suppliers to couple their symbolic commitments to codes with their actual labor practices.
Some qualitative studies have expanded the decoupling literature’s traditional focus on coercive institutions to investigate how the activities of individual actors inside organizations can create contingencies that promote coupling (Espeland 1998; Hallett 2010; Overdevest 2010; Tilcsik 2010; Bartley and Egels-Zandén 2015). For example, in a study of Indonesian apparel and footwear factories, Bartley and Egels-Zandén (2015) found that the coupling of labor codes of conduct and supplier labor practices was contingent on union members’ ability to forge and leverage relationships with brands, international NGOs, and global unions to pressure suppliers to meet their code commitments.
Bartley and Egels-Zandén (2015) highlighted the importance of identifying contingencies that can promote coupling in contexts where theory conventionally predicts a high likelihood of decoupling. To date, such research has focused largely on the “thorny, on-the-ground processes” (Bartley and Egels-Zandén 2016: 233) that couple organizational structures and practices in a single firm (Hallett 2010; Tilcsik 2010) or in a few firms in the same institutional context (e.g., Bartley and Egels-Zandén 2015). We extend this research with a large-scale study that examines structural contingencies of coupling across a range of institutional contexts. Our hypotheses investigate the coupling potential of structural contingencies at two levels: institutional and programmatic. Specifically, we hypothesize how the coupling of labor codes of conduct and supplier labor practices is likely to be associated with 1) institutional pressure from civil society groups on suppliers and brands and 2) key design features of the monitoring programs that brands adopt. We focus on these factors because they are structural contingencies over which brands and/or activists might exercise influence. Brands, for instance, must choose how to design their auditing programs, and activists must choose where to target their limited resources and what types of pressure to apply. Our hypothesized variables yield insights that can inform these decisions.
Our work builds on existing large-scale studies of coupling and labor codes of conduct in three important ways. First, we examine contingencies at the level of program design that other studies have overlooked (e.g., Toffel, Short, and Ouellet 2015; Bird, Short, and Toffel 2019) or have rejected as explaining compliance outcomes. Indeed, recent studies that explain coupling outcomes based on local institutions (Distelhorst, Locke, Pal, and Samel 2015) or social structures governing labor relations (Distelhorst et al. 2017) explicitly note that monitoring system design is not driving these outcomes. This gap in the literature mirrors the broader literature on decoupling where, with few exceptions (e.g., Kalev, Dobbin, and Kelly 2006), little attention has been paid to the influence of program design on coupling outcomes. Second, we examine and find contingencies operating at multiple levels: from the details of program design to broad institutional pressures. Existing studies tend to focus on one level—for instance, either institutional (Toffel, Short, and Ouellet 2015) or organizational (Distelhorst et al. 2017; Bird, Short, and Toffel 2019)—or they fail to identify hypothesized contingencies at multiple levels (Distelhorst et al. 2015).
Finally, we measure coupling by the improvement in suppliers’ compliance with the labor standards contained in codes of conduct. Most research in this field has measured levels of supplier compliance at a moment in time rather than improvement over time (e.g., Egels-Zandén 2007; Locke, Qin, and Brause 2007; Oka 2010a, 2010b; Ang, Brown, Dehejia, and Robertson 2012; Bartley and Egels-Zandén 2015; Toffel, Short, and Ouellet 2015). The extent to which suppliers change their practices over time to conform to codes of conduct is a critical marker of whether codes and practices are coupled within supplier organizations. In our empirical context, however, little is known about why some suppliers improve their compliance more than others. A few studies have observed that, in the aggregate, supplier compliance with codes has improved over time (Locke, Qin, and Brause 2007; Shea, Nakayama, and Heymann 2010; Nadvi, Lund-Thomsen, Xue, and Khara 2011; Ang et al. 2012; Locke, Rissing, and Pal 2013; Toffel, Short, and Ouellet 2015). However, these studies have not hypothesized conditions associated with improvement.
We build on a handful of studies that have examined empirically the factors associated with code compliance improvement. Some measure improvement at the national or regional level, leaving open the question of what accounts for variation in individual suppliers’ improvement (Weil and Mallo 2007; Ang et al. 2012). Others identify characteristics shared among small samples of firms that all improved their code compliance, but these studies fail to identify factors that explain why some factories improved more than others (e.g., Egels-Zandén 2007; Locke, Rissing, and Pal 2013). A few studies have revealed variation in improvement across factories based on the internal management structures in supplier organizations (Oka 2015; Distelhorst et al. 2017; Bird et al. 2019), but they do not look at influences outside the organization. Our study contributes to this literature by identifying both institutional and programmatic contingencies that explain variation in supplier improvement.
Institutional Compliance Pressures
In their fight to improve supply chain labor conditions, “[a]ctivists’ main weapon against corporations is their ability to threaten corporate reputations by exposing malfeasance” (King 2014: 203). Global buyers are particularly sensitive to the possibility that their suppliers’ labor abuses will be exposed, because even one incident in the supply chain can damage carefully cultivated corporate reputations (Oka 2010a). We explore two distinct but related sources of institutional pressure that increase reputational risk for global buyers: 1) the ability of civil society actors to discover and expose supplier abuses and 2) buyers’ past exposure to negative publicity about their suppliers’ practices.
Institutional Pressure in Suppliers’ Domestic Environments
Although government inspection regimes are often weak in countries where suppliers are located, research suggests that civil society actors, such as NGOs and the press, can provide monitoring functions and expose wrongdoing (Seidman 2007; Fransen 2012; Zajak 2017). The local press and local NGOs play symbiotic roles in transnational advocacy networks that promote global norms such as labor standards and human rights (Keck and Sikkink 1998: 3). The high-profile, international NGOs that are often at the center of such networks depend on local NGOs to collect information about violations of global norms by local actors. In fact, some recent studies have found local organizations to be more important than their global counterparts in transnational advocacy campaigns (Zajak 2017). The strategy of the global anti-sweatshop movement has been to work with local NGO partners to identify which local firms supply targeted global brands and to do the painstaking investigative work required to reveal exploitive labor practices at these suppliers (Bartley and Child 2014; Zajak 2017). Local NGOs, in turn, depend on domestic media and domestic channels of communication to make exploitive practices known to their more powerful international counterparts in the advocacy network (Keck and Sikkink 1998; Bartley and Child 2014; Zajak 2017). The more free and open these information channels, the more likely local abuses are to attract local and ultimately international attention, condemnation, and discipline (King 2014).
Research on civil society pressure and standards compliance has focused on how monitoring by civil society actors affects compliance levels at a particular point in time. Toffel et al. (2015), for instance, demonstrated that suppliers in countries with more press freedom exhibit greater compliance with codes of conduct. However, it is not clear that the mechanisms fostering high compliance levels will also foster improvements in compliance. Indeed, in high-compliance environments, there may be less room to improve because of a lack of low-hanging fruit (Chatterji and Toffel 2010), or because there may simply be less pressure to improve if conditions are not so bad. Similarly, it is not clear that the presence of civil society actors like NGOs will be related to compliance improvement in the same way that it is related to compliance levels. For instance, domestic environments with very low or very high labor-standards compliance might attract more NGOs, but a distinct and important question is whether their presence will be associated with improvements in compliance.
We argue that institutional pressure generated by civil society actors such as the press and NGOs, and particularly those two in combination, will be associated with improvements in supplier compliance. Suppliers whose failure to meet the global norms prescribed by codes of conduct is documented in audits become attractive targets for transnational advocacy networks seeking to raise international labor and human rights standards.¹ These networks seek to identify violators of global norms and induce them to change (Keck and Sikkink 1998: 3). Local NGOs in these networks see their role as “promoting change by reporting facts” that can attract the attention and support of international NGOs, press, and policymakers (Keck and Sikkink 1998: 19). A free local press helps them discover and publicize labor abuses at suppliers, increasing the prospect that the suppliers—and others like them—will be disciplined by their buyers or will suffer domestic political, legal, or economic consequences (Fransen 2012; Berliner et al. 2015; Zajak 2017).
We therefore hypothesize:
Targeted Institutional Pressure on Buyers
The impact of the institutional pressures described above depends not only on the probability that suppliers’ wrongdoing will be exposed but also on buyers’ reaction to such revelations. The reputational stakes of exposure are higher for some buyers than for others. Buyers with particularly high-value reputations might be acutely sensitive to negative publicity (Abito, Besanko, and Diermeier 2016). Indeed, in the supply chain labor context, research has demonstrated that highly reputation-conscious buyers are more likely to work with suppliers that better comply with labor standards (Oka 2010a).
Less is known, however, about how buyers that have experienced negative reputation events might respond to the threat of additional reputational damage. It has been theorized that firms facing the prospect of reputational threats from activists, as many multinational buyers do, will attempt to forestall trouble by investing in self-regulatory activities (Abito et al. 2016). Buyers that have already suffered reputational shocks through criticism by or confrontation with activists are especially likely to invest in self-regulatory measures to avoid additional reputational harm (Abito et al. 2016). Because a damaged reputation can invite more activism, firms that have suffered reputational shocks are further incentivized to protect themselves through self-regulation (Abito et al. 2016; Dorobantu, Henisz, and Nartey 2017). Empirical studies confirm that companies with more reputational damage in the past are more likely to take actions to protect their reputation (Kotchen and Moon 2012; McDonnell and King 2013; McDonnell et al. 2015). What these studies do not reveal, however, is whether such protective measures reduce the social harms that gave rise to the reputational harm.
We argue that buyers who have had their reputation compromised by past revelations about their suppliers’ harmful practices are more likely than other buyers to make efforts to ensure that their suppliers correct abuses. We therefore hypothesize:
Design of Monitoring Program
A monitoring system has numerous components, from the stringency of the underlying substantive standards to the frequency and rigor of inspections to the composition of the inspection team. All of these design features have implications for suppliers’ social compliance and compliance improvement. We focus on auditor training and the pre-announcement of audits because, as we argue below, these are features that might facilitate the transfer of compliance-related knowledge from auditors to suppliers, thus promoting improved labor practices.
Auditor Training
There is much skepticism about whether social auditing can foster improvement, and questions have been raised about the competence of auditors and the integrity of the auditing process (O’Rourke 2002; Esbenshade 2004; LeBaron and Lister 2015). Critics charge that auditors lack the knowledge and independence to detect labor abuses (O’Rourke 2002; Esbenshade 2004; Locke, Amengual, and Mangla 2009; AFL-CIO 2013), that they shade their findings depending on the client’s perceived interests (LeBaron and Lister 2015), and that some are outright corrupt (Clean Clothes Campaign 2005). Others believe that auditors are easily misled by suppliers that maintain fake wage and hours records and coach their workers to lie to auditors about working conditions (Power, Ng, and Singh 2008; AFL-CIO 2013; LeBaron and Lister 2015). Some, including critics of social auditing, have suggested that more highly trained auditors could be more effective (Locke et al. 2009; AFL-CIO 2013), and research has indeed found that better-trained auditors identify more violations (Short et al. 2016).
We argue here that training will likewise enable auditors to help suppliers improve following an audit. Studies have documented that social auditors play an important pedagogical role, often instructing factory managers how to remedy the violations discovered (Amengual 2010). Our conversations with auditors and managers at social auditing firms indicate that the auditors’ training typically teaches them how to find violations and what conditions tend to cause them. Such training helps auditors identify root causes and develop compliance solutions. Recent evidence indicates that government inspections can prompt improved working conditions (Levine, Toffel, and Johnson 2012), suggesting that inspectors might play a dual role of assessing conditions and suggesting how to improve. Studies in the knowledge transfer literature find that certain types of training can improve the ability to convey information in personal interactions (Thompson, Gentner, and Loewenstein 2000; Loewenstein, Thompson, and Gentner 2003; Nadler, Thompson, and Van Boven 2003), and that information is more likely to be absorbed and acted upon when it comes from a source perceived to have expertise (Borgatti and Cross 2003; Thomas-Hunt, Ogden, and Neale 2003; Reinholt, Pedersen, and Foss 2011). We therefore hypothesize:
Signaling a Cooperative Approach to Social Auditing
Significant debate surrounds the approach buyers should take to audits, including whether they should be conducted in a policing style or a more cooperative style and, relatedly, whether they should be announced in advance. Some argue that pre-announcing gives suppliers time to cover up bad behavior (Clean Clothes Campaign 2005; AFL-CIO 2013; LeBaron and Lister 2015), and empirical evidence supports this concern (Gray 2006; Marks 2012; Toffel et al. 2015). Worker rights advocates have therefore long favored unannounced audits (Frenkel and Scott 2002).
Although it seems clear that unannounced audits will reveal more information about supplier wrongdoing, it is less clear whether they will foster improvement. It is possible that suppliers might be motivated to improve their practices if they know that they can be caught at any time through a surprise audit. However, most buyers do not conduct audits regularly enough for unannounced audits to operate as a serious deterrent. Moreover, studies have suggested that a punitive, policing-style approach to monitoring can undermine compliance by dampening intrinsic motivations to comply (Short and Toffel 2010). That approach can also foster resentment among the regulated community that can lead to backlash against regulatory requirements (Bardach and Kagan 1982; Ayres and Braithwaite 1992). Social auditors report that conducting unannounced audits “to catch managers unaware . . . aggravates the relationship between buyers and suppliers” and that such “tensions make it difficult to achieve any sustainable change” (Gould 2005: 28).
A consensus has begun to emerge among academics and practitioners that suppliers are more likely to improve with a less punitive, more cooperative approach to monitoring (Locke et al. 2009). Rather than using audits to detect violations and threaten sanctions, the cooperative approach provides an opportunity “to engage in a process of root-cause analysis, joint problem solving, information sharing, and the diffusion of best practices that is in the mutual self-interest of the supplier, the auditors, and the global corporations for which they work” (Locke et al. 2009: 321). The underlying theory, developed most extensively by Ayres and Braithwaite in Responsive Regulation (1992), is that regulators’ signals of cooperation will be reciprocated with compliance. Studies have suggested that a cooperative approach to monitoring can help buyers, suppliers, and auditors develop trusting relationships that are more likely than punitive, arm’s-length approaches to improve compliance (Frenkel and Scott 2002; Locke and Romis 2007).
Although we do not observe the micro-level interactions among the buyers, suppliers, and auditors in our data and so cannot assess whether these parties have trusting or cooperative relationships, we argue that buyers formally signal trust and a cooperative approach to monitoring when they give suppliers advance notice of audits. Our conversations with ethical supply chain managers and social auditors consistently indicated that unannounced audits convey distrust and a punitive or policing approach to monitoring, with auditors sometimes denied entry to factories, whereas announced audits convey a more trusting and cooperative approach. At the very least, announcing an audit indicates trust in the formal economic sense of making the buyer vulnerable to the possibility of opportunism on the part of suppliers who might use the time afforded by advance notice to hide their misdeeds (Mayer, Davis, and Schoorman 1995). To be clear, our argument is not that providing advance notice of audits will cause suppliers to improve more rapidly; instead, we argue that announced audits signal to suppliers that they are subject to a more cooperative (less punitive) monitoring regime. These suppliers are more likely than others to perceive that they are trusted by their buyers and, consequently, these suppliers will be more willing to reciprocate that perceived trust by investing in improvement. Thus, although an announced audit might uncover fewer violations, we hypothesize:
Auditor Training in the Context of a Cooperative Approach to Auditing
Auditing might be particularly effective in improving supplier practices when more knowledgeable auditors engage with suppliers that are willing and able to receive the information. A substantial literature suggests that individuals and organizations share and absorb knowledge more effectively in collaborative, cooperative, and trusting relationships (Szulanski 1996; Dyer and Chu 2003; McEvily, Perrone, and Zaheer 2003). For instance, Cheng, Yeh, and Tu (2008) showed that the transfer of green production practices from buyers to suppliers is most effective when buyers let suppliers participate in decision making and when those buyers and suppliers trust one another. Buyers and suppliers surveyed by Oka (2010b) similarly reported more learning about compliance with workplace standards in trusting relationships. As we argued earlier in the article, announcing audits signals trust in the supplier; we therefore expect suppliers who receive advance notice to be more receptive to the knowledge auditors convey in those audits. Because highly trained auditors likely will have more and/or higher-quality information to convey, we hypothesize the following moderated relationship:
Recognizing that the hypothesized conditions might lead different types of violations to improve at disparate rates, we include below an extension to our analysis that disaggregates our dependent variable—improvement across all labor practices—to better understand which types of labor practices improve more (or less) rapidly under which conditions.
Data and Method
Empirical Context and Sample
We tested our hypotheses using data from code-of-conduct audits that were carried out by a large social auditing company (henceforth, the “social auditor”) that requested anonymity. During our sample period, the social auditor served Fortune 500 companies and was accredited to conduct audits of several leading social compliance standards. It operated in more than 100 countries and its staff spoke more than 20 languages. The data include audits conducted from 2004 through 2009, the most recent six-year period for which we could obtain access. Various characteristics of the audits, auditors, and audited suppliers were provided, including unique identifiers (but not names) for the auditors, the suppliers, and the buyers on whose behalf the audits were conducted. While many buyers issue their own supplier codes of conduct, our discussions with the social auditor revealed that the differences between these codes are slight, which gave us confidence in treating all of them in the same manner.
Because our empirical specification requires data from a supplier’s focal (current) audit and its prior audit, our sample is limited to those suppliers for which our data include at least two audits. Our estimation sample consists of 8,677 focal audits conducted at 4,940 suppliers spanning 13 industries in 66 countries. In our sample, factories are audited an average of every 202 days, with an interquartile range of 83 to 293 days. Those categorized as annual audits are conducted every 344 days on average. The most common industries in our sample are garments, accessories, electronics, and toys (Table 1). The majority of the audits took place in China; many of the rest took place elsewhere in Asia (Bangladesh, India, Indonesia, the Philippines, and Vietnam) and in North America (Mexico and the United States) (Table 2). Auditors tend not to specialize by industry, but instead are assigned to audits largely based on their geographic proximity (to minimize travel costs and time) and their availability and to ensure that every audit team includes a trained lead auditor. 2 The average auditor in our sample conducted audits of factories in nearly 5 of the 13 industries in our sample, and nearly 25% conducted audits in 8 or more industries.
Table 1. Industry Composition of Audits and Audited Suppliers
Table 2. Location of Audits and Audited Suppliers
Dependent Variable
Our dependent variable measures the change in a supplier’s compliance between its prior and focal audits. Audit data are the only available large-scale measure of supplier compliance with private labor standards, and audits are the most commonly used measure of labor standards compliance in the literature on global value chains (e.g., Locke et al. 2007; Oka 2010a, 2010b; Ang et al. 2012; Distelhorst et al. 2015; Toffel et al. 2015).
To avoid undue influence of outliers, we used a metric akin to the difference in log violations but which is calculable even when violation counts are zero. To calculate improvement, we divide the number of violations from the focal audit plus 1 by the number of violations from the previous audit plus 1, take the natural log of that ratio, and then multiply the result by −1 so that higher values reflect greater improvement:
$$\text{Improvement}_{i,t} = -\ln\left(\frac{V_{i,t} + 1}{V_{i,t-1} + 1}\right)$$
where $V_{i,t}$ is the number of violations for supplier $i$ audited at time $t$ that pertain to child labor, forced or compulsory labor, working hours, occupational safety and health, minimum wage, treatment of foreign workers and subcontractors, and disciplinary practices (there are 75 possible violations across these domains) and where $V_{i,t-1}$ is the comparable figure from that supplier’s prior audit. 3 We add 1, the minimum non-zero value in the data, to both the numerator and the denominator to avoid losing observations in which either the current or prior audit yielded zero violations. 4 This metric, rather than the simple difference in violations, facilitates proportional comparisons between suppliers. 5 It also provides a more reliable estimate than a percentage change metric, which can be overly sensitive to outliers and can inflate large changes. 6 Our log form reduces skewness 7 and enables a straightforward interpretation of our coefficients as elasticities. Multiplying the log ratio by −1 results in larger values corresponding to greater improvement, which eases interpretation.
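As a minimal illustration of the improvement metric described above, the following Python sketch computes it from two violation counts. The function name and the example counts are hypothetical and are not drawn from the audit data.

```python
import math


def improvement(violations_focal: int, violations_prior: int) -> float:
    """Return -ln((focal + 1) / (prior + 1)).

    Written as a difference of logs, which is algebraically identical
    to the negated log ratio described in the text.
    """
    if violations_focal < 0 or violations_prior < 0:
        raise ValueError("violation counts must be non-negative")
    return math.log(violations_prior + 1) - math.log(violations_focal + 1)


# Hypothetical (prior, focal) violation counts for four suppliers.
examples = [(7, 5), (0, 0), (3, 0), (2, 6)]
for prior, focal in examples:
    print(f"prior={prior} focal={focal} improvement={improvement(focal, prior):.3f}")
```

Positive values indicate fewer violations at the focal audit than at the prior audit, negative values indicate more, and pairs with zero violations remain calculable because 1 is added to each count.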
Independent Variables
The risk that the labor abuses documented in a social audit will be exposed and sanctioned depends on press freedom and NGO presence. We measure press freedom using the Press Freedom Index from Reporters without Borders, which reflects the extent to which journalists in a given country faced direct and indirect threats such as imprisonment, physical attacks, and censorship in a given year, a metric used by others for the same purpose (e.g., Faccio 2006; Cannizzaro and Weiner 2015). We reverse-code the raw Press Freedom Index so that higher values indicate greater press freedom, rescale the result to range from 0 to 1, and take the log (after adding 1) to reduce skew. We measure NGO density as the number of NGOs in the supplier’s country per million population, an approach used by others (e.g., Hafner-Burton and Tsutsui 2005; Chih, Chih, and Chen 2010), which we also log to reduce skew. We obtained NGO data from the Union of International Associations and population data from the US Census Bureau’s International Data Base. Both the press and NGOs are critical actors in the transnational advocacy networks that we theorize will generate exposure risks; it is therefore important to include both variables in our analysis. Because press freedom and NGO density are highly correlated (ρ = 0.83), we use principal components analysis as a data reduction technique (Hair, Anderson, Tatham, and Black 1998; Kennedy 2008), an approach others have used for the same purpose (e.g., Gulati and Sytch 2007; Perkins 2014; Guillén and Capron 2015). The first component’s eigenvalue of 1.85 is the only one to exceed the conventional threshold of 1, and it explains 92.3% of the variance between press freedom and NGO density. We refer to this first component as supplier environmental pressure. 8
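The data reduction step can be sketched for the two-variable case, where the principal components of a correlation matrix have a closed form. This is an illustrative sketch only: the article does not state which software or exact procedure was used, the input values below are invented toy numbers, and the function names are hypothetical.

```python
import math
from statistics import mean, pstdev


def standardize(values):
    """Convert values to z-scores using the population standard deviation."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def first_component(x, y):
    """First principal component of two positively correlated variables.

    For the 2x2 correlation matrix [[1, r], [r, 1]] the eigenvalues are
    1 + r and 1 - r; when r > 0 the leading eigenvector is (1, 1) / sqrt(2).
    """
    zx, zy = standardize(x), standardize(y)
    r = mean([a * b for a, b in zip(zx, zy)])  # Pearson correlation
    eigenvalue = 1 + r
    share = eigenvalue / 2  # proportion of total variance explained
    scores = [(a + b) / math.sqrt(2) for a, b in zip(zx, zy)]
    return r, eigenvalue, share, scores


# Toy country-level values (assumed, already logged): press freedom, NGO density.
press_freedom = [0.10, 0.35, 0.40, 0.62, 0.80]
ngo_density = [0.5, 1.1, 1.9, 2.0, 3.2]

r, eigenvalue, share, scores = first_component(press_freedom, ngo_density)
print(f"correlation={r:.3f} eigenvalue={eigenvalue:.3f} share={share:.1%}")
```

With only two inputs, a single component exceeds the eigenvalue threshold of 1 whenever the variables are positively correlated, so the composite score is effectively a rescaled sum of the two standardized measures.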
Institutional pressure to improve labor standards is directed not only at suppliers but also—in fact, largely—at multinational buyers. To operationalize that dimension of institutional pressure, we rely on negative media reports, as others have done for similar purposes (Fiaschi, Giuliani, and Nieri 2013; Kölbel, Busch, and Jancso 2017). In particular, we consider whether a supplier serves a buyer that had recently been associated with supply chain labor abuses revealed in a news article or NGO report. To measure this, we relied on the database of media articles and reports on supply chain labor abuses compiled by the Business & Human Rights Resource Centre (BHRRC). BHRRC serves as a labor abuse information clearinghouse and research organization to “track the human rights policy and performance of over 7000 companies in over 180 countries, making information publicly available” (Business & Human Rights Resource Centre 2017). It gathers news articles from around the world linking companies to human rights abuses and conveys this information to its 177,000 monthly website visitors and via its e-newsletter issued to thousands of subscribers, including activists, businesses, governments, global media, and investors. BHRRC invites companies to respond to any post that names them and reports that 86% of companies do so (Business & Human Rights Resource Centre 2017), suggesting that companies are attentive to these reports. We measure targeted buyer pressure by taking the number of times the buyer on whose behalf the audit was conducted appeared in articles in this database during the year prior to the audit, adding 1, and logging that sum to reduce skew. 9
We calculate auditor training as the number of audit training courses an auditor had taken, based on data provided by the social auditor. Auditors are trained on various topics, including: audit skills to help auditors identify violations; substantive issues relevant to a specific industry, region, or supplier (for example, subcontracting in the garment industry); and the requirements of specific auditing protocols that certain clients have adopted, such as SA8000. Because audits are typically conducted by an audit team, we measure maximum auditor training as the largest number of training courses that any one team member had undergone by the time the audit was conducted, which we log after adding 1 to reduce skew and then standardize to facilitate interpretation. 10 The maximum number of training courses for audit teams averaged 6.9. We use the maximum training of any one team member because this measures the greatest potential to identify code-of-conduct violations and to transfer knowledge on how to remediate them.
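A minimal sketch of how the maximum auditor training variable could be constructed is shown below. The team compositions are hypothetical, and the use of the population standard deviation for standardizing is an assumption, since the article does not specify which form was used.

```python
import math
from statistics import mean, pstdev


def max_training_logged(team_course_counts):
    """Log (after adding 1) of the largest course count on an audit team."""
    return math.log(1 + max(team_course_counts))


# Hypothetical audit teams: training courses completed by each team member.
teams = [[2, 6], [11], [0, 3, 4], [7, 7]]

logged = [max_training_logged(team) for team in teams]

# Standardize across audits (population SD is an assumption).
mu, sigma = mean(logged), pstdev(logged)
standardized = [(value - mu) / sigma for value in logged]

for team, z in zip(teams, standardized):
    print(team, round(z, 3))
```

Taking the maximum rather than the team mean means that adding a less-trained member to a team leaves the measure unchanged, which matches the stated rationale of capturing the team's greatest potential to identify violations and transfer knowledge.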
Whether an audit was expected or a surprise was measured by announced, a dichotomous variable coded 1 when the supplier had advance notice of the audit date and 0 for unannounced audits, based on data from the social auditor. Whether an audit is announced or unannounced is typically determined by the buyer. In our sample, 76% of the audits were announced. 11
Audit-Level Control Variables
We control for audit-level factors by constructing variables from data provided by the social auditor. We control for the violations in the prior audit because suppliers whose prior audit yielded many violations face an opportunity set that differs from those with a “cleaner” history, which may influence their likelihood of improvement. Audits in our sample have a maximum of 75 violations, but such high counts were very rare. Violations (prior audit) is the number of violations from a prior audit, top-coded at the 99th percentile of the sample distribution (25 violations) to reduce the potential impact of outliers and then logged (after adding 1).
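The top-coding and logging of prior violations can be sketched as follows. The cap of 25 comes from the text; the function name and example counts are hypothetical.

```python
import math

TOP_CODE = 25  # 99th percentile of the sample distribution, per the text


def violations_prior_control(count: int, cap: int = TOP_CODE) -> float:
    """Top-code the prior-audit violation count at the cap, then take ln(1 + x)."""
    if count < 0:
        raise ValueError("violation counts must be non-negative")
    return math.log(1 + min(count, cap))


# Hypothetical prior-audit counts, including one above the cap.
for count in (0, 7, 25, 40):
    print(count, round(violations_prior_control(count), 3))
```

Counts above the cap are treated as equal to the cap, so a handful of extreme audits cannot dominate the control variable.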
Because prior research indicates that auditing is less stringent when suppliers pay their own auditors (Jiang, Stanford, and Xie 2012; Duflo, Greenstone, Pande, and Ryan 2013; Short and Toffel 2016), we created dummy variables to indicate who paid for the audit: paid by the supplier or third party, paid by the buyer (on whose behalf the audit was conducted), and paid by unknown entity (when we lacked information about who paid).
Re-audits typically have a more limited scope, tending to focus on concerns raised at the prior audit. Because this could mechanically affect improvement rates, we include three dummy variables as controls: prior audit was re-audit, but focal audit was not; focal audit was re-audit, but prior audit was not; prior and focal audit were re-audits. The baseline condition is that neither was a re-audit.
Our interviews with social auditors (at the firm that provided our data and at others) indicated that the number of staff hours required to conduct an audit is a reasonable proxy for factory size and complexity, which could be associated with improvement but for which we lack direct measures. In addition, more staff hours in a prior audit might offer more opportunity to transfer information between the audit team and the supplier. 12 We therefore control for audit duration (prior audit), which we calculated by taking the log (after adding 1) of the number of staff hours required to conduct the prior audit.
Audit teams including individuals who had previously audited the supplier have been shown to report fewer violations than teams whose members have no prior history there (Short, Toffel, and Hugill 2016). We therefore created previous auditor, a dummy coded 1 when at least one member of the audit team had participated in the prior audit of that supplier and 0 otherwise. Because suppliers may remediate compliance problems identified at prior audits and thus face increasing mitigation costs, we create audit sequence as a count variable to denote each supplier’s first audit in the estimation sample, its second audit, and so on. 13 In our models, we flexibly control for audit sequence by including a dummy for each value, which avoids imposing the assumption that audit sequence has a linear influence on improvement.
Because an audit team’s gender composition has been shown to affect audit results (Short, Toffel, and Hugill 2016), we include three dummy variables: all-female audit team, mixed-gender audit team, and all-male audit team. We use dummies instead of a ratio because 82% of our sample’s audit teams were single gender: all-female (50%) and all-male (32%). The remainder were evenly split (15%), or had another mixed ratio (3%).
We control for team experience, which has been shown to affect reported violations (Short, Toffel, and Hugill 2016). We measure the maximum auditor tenure of each team as the maximum years of service with the social auditor among all team members. We include in our model both maximum auditor tenure and its squared value because the influence of experience on reported violations has been found to be nonlinear (Short, Toffel, and Hugill 2016). 14
Institutional Control Variables
Several factors pertaining to the supplier’s institutional environment have been shown to affect violation rates (Toffel, Short, and Ouellet 2015) and could affect improvement rates; we therefore control for them at the prior audit. We include only the prior audit values for these country-level variables because they are very stable over the period of time between two consecutive supplier audits. A supplier country’s dependence on foreign direct investment (FDI) might influence the extent to which the supplier perceives the need to respond to international pressure to improve how its factories are managed. We therefore control for each supplier country’s percentage of gross domestic product (GDP) made up of FDI (FDI inflows) in the year of the prior audit, based on World Bank data. 15
Because domestic legal protections for labor rights could influence suppliers’ perception of how much pressure they are under, we obtain labor laws scores from Mosley (2011). 16 These scores measure the extent to which domestic law provides collective labor rights such as the right to join unions and strike, whether government approval is required for collective bargaining, and whether laws restrict worker rights in export processing zones (Greenhill, Mosley, and Prakash 2009). Because these scores are available only through 2002 (before our sample period begins), we use the 2002 values for all years of our analysis. Studies have used this index to measure the stringency of country-level workers’ rights protections generally (e.g., Greenhill, Mosley, and Prakash 2009; Dean 2015; Toffel, Short, and Ouellet 2015; Fransen and Burgoon 2017) on the basis that collective rights are foundational to other workers’ rights, for example, wages, benefits, and working conditions (Greenhill, Mosley, and Prakash 2009).
Because country-level wealth and differences in wealth between supplier and buyer countries could influence improvement rates, we control for GDP per capita (prior audit) and GDP per capita in buyer country (prior audit), obtained from World Bank data. We control for potential differences in coercive pressure that buyers might exert based on their size by creating buyer employment (prior audit), logging employment to reduce skew. To do so, we obtained annual values of employment from Amadeus, Capital IQ, Hoovers, or Thomson ONE Banker. 17
Summary statistics are reported in Table 3. Correlations are reported in Table A.1 in the Online Appendix.
Table 3. Summary Statistics
Notes: † indicates logged. § indicates logged, then standardized. N = 8,677 audits, except 7,774 for targeted buyer pressure, 4,338 for audit duration (prior audit), 4,523 for previous auditor (prior audit), 8,668 for previous auditor (focal audit), and 5,355 for buyer employment. FDI, foreign direct investment; GDP, gross domestic product; SD, standard deviation.
Estimation and Results
We test our hypotheses by estimating a model that predicts improvement based on the independent and control variables described above and several additional control variables explained below. Our model uses log versions of our continuous independent and control variables (except supplier environmental pressure, which was created using principal components analysis) both to reflect our sense that they have a diminishing marginal influence on improvement and to limit the potential impact of outliers. In our specification, each observation includes variables measured at an establishment’s focal audit and prior audit; therefore, all establishments in our estimation sample have had at least two audits.
Whereas our hypothesized variables pertain to a supplier’s prior audit, these same factors pertaining to the focal audit might influence the number of violations reported in that focal audit, which is used to construct our dependent variable. Since failing to account for these factors could bias our estimates, we also control for maximum auditor training (focal audit) and announced (focal audit). Controlling for maximum auditor training at the focal audit prevents us from misattributing a supplier’s reduction of violations to situations in which an establishment’s focal audit team was less highly trained than the prior one. Similarly, controlling for whether the focal audit is announced or unannounced prevents us from mistakenly attributing a supplier’s reduction of focal-audit violations to situations in which advance warning allowed it to fix or hide problems. We do not include the focal-audit values of supplier environmental pressure or targeted buyer pressure because they are very stable over time (their respective correlations between prior and focal audits are 0.99 and 0.89), and including them would substantially increase multicollinearity while adding almost no new information.
Because several audit design elements and audit team characteristics at the prior audit could influence violations recorded in that audit, and because these same factors at the focal audit could influence violations recorded in that audit, we include most audit-level controls—paid by supplier or third party, paid by unknown entity, re-audit, previous auditor, all-female audit team, mixed-gender audit team, and maximum auditor tenure—in the model twice to control for them at both the prior and the focal audits.
We also include industry fixed effects and year fixed effects to control for potential differences in improvement rates between suppliers in different industries and between the years in our sample. Because suppliers might respond differently to buyers in institutional contexts exerting varying levels of pressure (Toffel, Short, and Ouellet 2015), we include fixed effects for buyer countries. 18 We log maximum auditor training (prior audit) and audit duration (prior audit) to reduce skew and then standardize them to facilitate an elasticity interpretation of coefficients in response to a one standard deviation change. We use the log form of all other continuous variables to facilitate their interpretations as elasticities.
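Read together, the preceding paragraphs imply a specification along the following lines. This is a reconstruction for orientation only: the symbol names are illustrative assumptions rather than notation taken from the article, with $\mathbf{X}_{i,t-1}$ standing for the prior-audit controls and $\mathbf{Z}_{i,t}$ for the focal-audit controls.

```latex
\begin{aligned}
\text{Improvement}_{i,t} ={}& \beta_1\,\text{SupplierEnvPressure}_{i,t-1}
  + \beta_2\,\text{TargetedBuyerPressure}_{i,t-1} \\
&+ \beta_3\,\text{MaxAuditorTraining}_{i,t-1}
  + \beta_4\,\text{Announced}_{i,t-1} \\
&+ \mathbf{X}_{i,t-1}'\boldsymbol{\gamma}
  + \mathbf{Z}_{i,t}'\boldsymbol{\delta}
  + \alpha_{\text{industry}} + \alpha_{\text{year}}
  + \alpha_{\text{buyer country}} + \varepsilon_{i,t}
\end{aligned}
```

Under this notation, $\beta_1$ through $\beta_4$ correspond to H1 through H4, and the moderated relationship in H5 would add a term of the form $\beta_5\,(\text{MaxAuditorTraining}_{i,t-1} \times \text{Announced}_{i,t-1})$.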
Empirical Results
For context, we note that suppliers in our sample averaged 7.2 violations in their prior audit and 5.6 violations in their focal audit, an average improvement of 1.6 violations. This 22% improvement rate (calculated as 1.6 ÷ 7.2) corresponds to the sample average improvement rate of 0.22 reported in the summary statistics (Table 3). 19
We estimate our models using ordinary least squares (OLS) regression, clustering standard errors by the supplier’s country, the most aggregated level of our explanatory variables.
We test H1–H4 with model 1 and report results in Table 4. The statistically significant positive coefficient on supplier environmental pressure (prior audit) (β = 0.089; p < 0.01) indicates that suppliers tend to improve more in countries in which civil society monitoring has greater potential to expose noncompliance with codes of conduct, which supports H1. The coefficient magnitude indicates that a one-standard-deviation increase in supplier environmental pressure (prior audit) (such as a change from Vietnam to the Philippines) is associated with an increase in improvement from an average of 22% to 32.6%, based on average predictions across our sample. 20 This 32.6% improvement from the baseline average of 7.2 violations constitutes a reduction of 2.3 violations, which is nearly one and a half times the sample average reduction of 1.6. Support for H1 is robust to replacing supplier environmental pressure (prior audit) with its two underlying elements, which yields statistically significant positive coefficients on both standardized press freedom (prior audit) and standardized NGO density (prior audit). This provides further evidence that suppliers operating in institutional environments with greater press freedom and NGO pressure tend to improve more than suppliers in countries with less of those.
Table 4. Regression Results
Notes: Ordinary least squares (OLS) regression coefficients with standard errors clustered by supplier country in brackets. N = 8,677 observations (each based on two consecutive audits) from 4,940 factories. † indicates logged. § indicates logged, then standardized. Baseline (omitted) categories are paid by the buyer for focal and prior audit when neither was a re-audit. All models include dummy variables to indicate instances in which the following variables were missing data and thus recoded to 0: targeted buyer pressure (prior audit) (903 audits), audit duration (prior audit) (4,339 audits), previous auditor (prior audit) (4,154 audits), previous auditor (focal audit) (9 audits), and buyer employment (3,322 audits). FDI, foreign direct investment; GDP, gross domestic product.
**p < 0.01; *p < 0.05; +p < 0.10.
The statistically significant positive coefficient on targeted buyer pressure (prior audit) (β = 0.037; p < 0.01) illustrates greater average improvement for suppliers to buyers that have already been publicly exposed for harms to workers in their supply chain, which supports H2. The coefficient magnitude indicates that a one-standard-deviation increase in targeted buyer pressure (prior audit) is associated with an increase in improvement from an average of 22% to 23.2%, based on average predictions across our sample. 21
The statistically significant positive coefficient on the standardized maximum auditor training (prior audit) (β = 0.031; p < 0.01) indicates that greater improvement tends to follow audits conducted by better-trained audit teams, which supports H3. 22 The coefficient magnitude indicates that, on average, suppliers realize an additional 3.1 percentage point improvement when their prior audit was conducted by a team whose best-trained auditor had one standard deviation more training than the average team’s best-trained auditor (that is, 12.7 training courses versus the average of 6.9). Such suppliers average a 25.1% reduction (the sum of the 0.22 sample average and the 0.031 coefficient), which corresponds to a reduction of 1.8 violations from the prior to the focal audit, or 0.2 violations more than the average reduction of 1.6 violations.
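The conversion from an improvement rate to a count of violations can be traced with a few lines of arithmetic. The sketch below follows the text in treating the improvement rate as a proportional reduction applied to the average prior-audit violation count; the function and variable names are hypothetical.

```python
def violations_reduced(improvement_rate: float, baseline_violations: float) -> float:
    """Violations eliminated when a proportional rate is applied to a baseline count."""
    return improvement_rate * baseline_violations


BASELINE_VIOLATIONS = 7.2   # sample average at the prior audit
AVERAGE_RATE = 0.22         # sample average improvement
AVERAGE_REDUCTION = 1.6     # sample average reduction in violations
TRAINING_COEFFICIENT = 0.031

rate = AVERAGE_RATE + TRAINING_COEFFICIENT
reduced = violations_reduced(rate, BASELINE_VIOLATIONS)
extra = reduced - AVERAGE_REDUCTION

print(f"rate={rate:.3f} reduced={reduced:.1f} extra={extra:.1f}")
```

The same calculation applied to the 32.6% figure reported for supplier environmental pressure gives roughly 2.3 violations, in line with the text.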
The statistically significant positive coefficient on announced (prior audit) (β = 0.049; p < 0.01) indicates that greater improvement follows announced audits than unannounced audits, which supports H4. 23 Predictive margins indicate that suppliers whose prior audit was announced experienced an average 23.2% improvement, compared to 18.2% for suppliers whose prior audit was unannounced. Applied to the average 7.23 violations in the prior audit, this is an average decline of 1.67 violations following announced audits versus 1.34 after unannounced audits. This average differential of 0.33 violations per audit corresponds to one more violation being mitigated after three announced audits than after three unannounced audits. These results are largely identified based on differences between factories because only a small fraction of the factories in our sample have prior audits that are a mix of announced and unannounced audits.
To test H5, we add a term that interacts maximum auditor training (prior audit) and announced (prior audit) and report the results as model 2 in Table 4. The statistically significant positive coefficient on the interaction term (β = 0.051; p < 0.01) indicates that better-trained audit teams at prior audits tend to prompt more improvement when those prior audits were announced than when they were unannounced, which supports H5. Figure 1 plots the average predicted effects of maximum auditor training (prior audit) on improvement for observations in which prior audits were announced or unannounced. The upward-sloped dashed line indicates that for announced audits, better-trained auditors at the prior audit prompt more improvement. The relatively flat solid line indicates that for unannounced audits, suppliers’ improvement rates are largely unaffected by how well trained the prior audit team was.

Figure 1. Average Predicted Improvement Values Based on Varying Amounts of Maximum Auditor Training at Prior Audits That Were Unannounced or Announced
Supplementary Analysis
We conducted several additional analyses to assess the robustness of our results. Our results are robust to several alternative estimation approaches, including: two-way clustering standard errors at the supplier-country and buyer-company levels; a cross-sectional model (i.e., one observation per factory) using factory means of all variables; and a random-effects model using factory-level random effects (not reported). While we believe our dependent variable is a well-designed interpretable metric robust to outliers, we acknowledge its complexity. Therefore, we assessed whether our results were sensitive to this metric by estimating models that instead predict the number of violations cited in the focal audit, controlling for the number of violations cited in the prior audit and including all other independent and control variables from our primary specifications. These negative binomial regression 24 results, reported in Table A.2 in the Online Appendix, confirm all inferences from our primary models and thereby indicate that our results are robust to this alternative specification. 25
Because improvement might depend on the time between the prior and focal audits, we re-estimated our models predicting improvement rate, an alternative dependent variable that explicitly accounted for the time lag since the prior audit. Improvement rate is calculated by dividing improvement (our primary dependent variable) by the log of the number of days since the prior audit. These results, too, yield statistically significant support for our hypotheses except that auditor training remains a significant predictor of improvement only in the presence of announced audits.
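The alternative dependent variable can be sketched as the primary improvement metric divided by the log of the elapsed days. The function name and inputs are hypothetical, and the guard against gaps of one day or less is an added assumption, since the log of 1 is zero.

```python
import math


def improvement_rate(violations_focal: int, violations_prior: int,
                     days_since_prior: float) -> float:
    """Improvement metric divided by ln(days since the prior audit)."""
    if days_since_prior <= 1:
        raise ValueError("days_since_prior must exceed 1 so that its log is positive")
    improvement = math.log(violations_prior + 1) - math.log(violations_focal + 1)
    return improvement / math.log(days_since_prior)


# Same violation counts, evaluated at the sample's average gap and at a longer gap.
print(round(improvement_rate(5, 7, 202), 4))
print(round(improvement_rate(5, 7, 344), 4))
```

For a given change in violations, a longer interval between audits yields a smaller improvement rate, which is the adjustment this alternative measure is meant to capture.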
We coded our variables irrespective of whether the focal and prior audit were conducted for the same brand. Estimating our primary models on the subsample of 6,458 observations in which an establishment’s focal and prior audits were conducted for the same brand yielded insights similar to those of our primary approach. The one difference is that the estimates on this subsample yield evidence of an auditor training effect only in the presence of an announced audit (H5), but not an overall effect (H3).
Because so many of the observations in our sample are from China, we examined whether our primary results held when estimating our models on just the 6,294 observations from the 3,378 factories in China. To do so, we omit our factory-country variables and the country-year-level variables (to avoid multicollinearity concerns, as they changed only slightly over time) and cluster standard errors by factory. All the hypothesized effects continue to yield statistically significant coefficients of the same sign and similar magnitude (though in some cases with larger standard errors, likely attributable in part to the smaller sample size).
We estimated models that accounted for the fact that some buyers always sought unannounced audits, some always sought announced audits, and some sought a mix. Specifically, we added two control variables to our primary models: one dummy variable indicating audits on behalf of buyers that always specified announced inspections and another indicating audits on behalf of buyers that always specified unannounced inspections. Audits conducted on behalf of buyers that specified a mix were the omitted category. The results are virtually identical to those of our primary models. Overall, our primary results proved markedly stable throughout various robustness tests.
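The two buyer-policy dummies could be derived from audit records along the following lines. This is an illustrative sketch: the record layout, buyer identifiers, and function name are hypothetical.

```python
from collections import defaultdict


def buyer_policy_dummies(audits):
    """Map each buyer to (always_announced, always_unannounced) indicator values.

    `audits` is an iterable of (buyer_id, announced) pairs, where `announced`
    is truthy for announced audits. Buyers with a mix receive (0, 0), the
    omitted category.
    """
    seen = defaultdict(set)
    for buyer_id, announced in audits:
        seen[buyer_id].add(bool(announced))
    return {
        buyer_id: (int(flags == {True}), int(flags == {False}))
        for buyer_id, flags in seen.items()
    }


# Hypothetical audit records.
audits = [("A", 1), ("A", 1), ("B", 0), ("C", 1), ("C", 0)]
dummies = buyer_policy_dummies(audits)
print(dummies)
```

Each audit would then inherit the pair of indicator values assigned to its buyer.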
Extensions
Our primary analysis measures improvement based on the difference between the total number of violations cited in the focal audit and in the prior audit, aggregating several types of violation to capture improvement comprehensively. Recognizing that different types of violations might improve at different rates under the hypothesized conditions (Barrientos and Smith 2007; Ruwanpura 2012; Egels-Zandén and Lindholm 2015; Stroehle 2017), we disaggregated our dependent variable by violation type to better understand which categories are more likely to improve under which conditions. To explore how our hypothesized variables influence these categories, we estimated models that predicted improvement in each of the violation categories making up our improvement variable for which at least 10% of audits exhibited variation; these were child labor, working hours, minimum wage, and occupational safety and health (OSH). 26 We created an improvement metric for each category by applying the same formula used to create our primary improvement metric. The pairwise correlations among these four variables range from 0.20 to 0.34. We report in Table 5 the results of OLS regression models that predict each improvement metric based on the specifications used in our main models, except that our control for a supplier’s prior violations corresponds to the specific violation category being predicted. The results broadly validate the mechanisms we theorize above and also indicate that our hypothesized variables are associated with varying degrees of improvement across different violation types, thus highlighting important nuances of our primary results.
Table 5. Extension Results
Notes: Ordinary least squares (OLS) regression coefficients with standard errors clustered by supplier country in brackets. N = 8,335 observations (each based on two consecutive audits) from 4,870 factories, slightly smaller than in Table 4 because of a few missing values of violation category counts. † indicates logged. § indicates logged, then standardized. See Table 4 for additional notes. FDI, foreign direct investment; GDP, gross domestic product; OSH, occupational safety and health.
**p < 0.01; *p < 0.05; +p < 0.10.
First, we find that under conditions of greater institutional pressure in suppliers’ domestic environments, significantly more improvement occurs in all four subcategories (child labor, working hours, minimum wages, and OSH violations). This finding confirms the importance of institutional pressure to improvement on all of these dimensions.
Second, we find more improvement in working hours, minimum wage, and OSH violations among suppliers of buyers directly targeted by institutional pressure. These findings support our intuition that reputation-sensitive buyers will be more proactive in seeking improvements to supply chain working conditions. OSH violations have been a focus of activist organizing and media exposés, making them highly salient to brands with reputation concerns; these brands may apply more pressure to improve supplier conditions. Working hours and wages are dimensions that buyer sourcing practices can influence, and buyers facing more institutional pressure might be especially attentive to this. For example, such buyers might be more prone to avoiding frequent change orders or delaying orders, thus reducing suppliers’ need to work excessive hours. Such practices can also mitigate the risk of workplace injuries. Similarly, buyers concerned about their reputations might be more likely to avoid bargaining to minimize suppliers’ profit margins to the point that suppliers feel compelled to constrain wages in ways that violate codes of conduct. We were somewhat surprised that we did not find evidence of greater improvements in high-reputation-salience violations such as child labor among suppliers of buyers facing more institutional pressure. Nonetheless, we note that the coefficient on this violation type remains positive and its lack of significance may be due to statistical power: Factories in our sample were much less likely to exhibit variation over time in the number of child labor violations than in the other types of violations. Of the observations in our sample, 66% have the same number of child labor violations in the focal and prior audits, compared to 17 to 33% for violations of working hours, minimum wage, and OSH.
Third, we find that visits by highly trained auditors lead to significantly more improvement in child labor, wages, and OSH scores. These findings are consistent with our hypothesis that better-trained auditors can better convey compliance information. Improvements in child labor scores may be attributable to suppliers’ greater willingness to receive and follow advice from more highly trained auditors. Payroll and OSH practices can be complex and/or technical, and a knowledgeable auditor can provide useful guidance about how to maintain effective record-keeping systems and remedy workplace hazards.
Finally, we find that supplier improvement following announced audits is driven primarily by improvement in OSH violations—precisely the kind that may require transfer of knowledge between auditors and suppliers regarding buyer expectations and best practices. Notably, compliance with child labor restrictions improves significantly less following announced audits than unannounced audits. Because most factory managers are well aware that these are zero-tolerance violations that are grounds for contract termination, an exchange of information is typically not required to convey compliance expectations and practices. We therefore do not expect formal signals of greater trust to foster improvements in this area. In fact, our extension suggests that announcing audits could exacerbate such violations if it is taken as a signal of leniency rather than of trust.
Discussion
Our findings reveal several important structural conditions under which codes of conduct and monitoring regimes adopted by TNCs are associated with improvements in suppliers’ labor practices. Suppliers are more likely to improve when local and global institutional pressures generated by civil society activism create greater risk that harms to workers will be discovered and publicized; the more institutional pressure buyers face from additional revelations, the more their suppliers improve. In addition, we find that suppliers improve more not only with external institutional pressures but also when monitoring programs are designed in ways that facilitate knowledge transfer. Suppliers improved to a greater degree following audits with advance notice, particularly in areas such as OSH, where compliance assistance can be most helpful. We also find that suppliers are more likely to improve when their auditors are highly trained, but only when those auditors conduct pre-announced audits. These findings make important contributions to the literatures on supply chain labor standards regulation, private politics, and decoupling.
First, our finding that highly trained auditors are associated with greater improvement is a corrective to the literature’s pervasive auditor skepticism (O’Rourke 2002; Esbenshade 2004; Power, Ng, and Singh 2008; Locke, Amengual, and Mangla 2009; AFL-CIO 2013; LeBaron and Lister 2015), suggesting the important role auditors can play given the proper tools. While some have suggested that better-trained auditors can be more effective (Locke, Amengual, and Mangla 2009; AFL-CIO 2013), to our knowledge ours is the first study to empirically document this conjecture in this context. At the same time, our finding that even highly trained auditors add no significant compliance improvement through unannounced audits suggests the limitations of training and the need to consider program design holistically rather than piecemeal.
Second, our finding that pre-announced audits were followed by greater improvement in OSH practices but not in child labor practices adds nuance to the debate surrounding whether audits should be announced or unannounced. The improvement in OSH practices is consistent with 1) the prediction that compliance can develop iteratively in response to cooperative gestures by those implementing the rules (e.g., Axelrod 1984; Scholz 1984; Ayres and Braithwaite 1992) and 2) qualitative studies finding better compliance with labor codes of conduct by suppliers in trusting and cooperative relationships with buyers (Frenkel and Scott 2002; Locke and Romis 2007). But our finding that announced audits impeded improvement in compliance with child labor standards suggests that cooperative signals from buyers may lead some suppliers to believe they can get away with such violations, consistent with research that is skeptical of the rigor of pre-announced audits (O’Rourke 2002; Esbenshade 2004; Clean Clothes Campaign 2005; Gray 2006; Power, Ng, and Singh 2008; AFL-CIO 2013; LeBaron and Lister 2015; Short, Toffel, and Hugill 2016).
These findings add an important empirical dimension to this highly polarized and largely theoretical debate by documenting the difficult tradeoffs involved in designing monitoring regimes. There can be a tradeoff between violation discovery and compliance improvement. Thus, buyers must consider which to prioritize depending on the aims of their monitoring regimes, such as whether they primarily seek to collect the most complete and accurate information, to catch suppliers committing particularly harmful violations, or to improve working conditions by creating conditions for cooperation with suppliers. In addition, there is a risk that the signals buyers send to foster cooperation and improvement might be perceived by some suppliers as laxity, encouraging noncompliance. Buyers who seek to foster cooperation and improvement should think carefully about the signals they send to suppliers to ensure that those signals are not misinterpreted as license to violate the code. Our finding that highly trained auditors were associated with accelerated improvement in child labor practices as well as in other violation categories suggests that these tradeoffs might be reconciled to some degree through better training of auditors. Further research should explore how monitoring approaches can be deployed and combined to leverage their comparative advantages.
Third, the extension disaggregating our dependent variable contributes to a body of research addressing how improvement varies across different violation categories. Studies have shown, for instance, that labor code compliance tends to improve more rapidly in some categories, such as health and safety, than in others, such as freedom of association (Barrientos and Smith 2007; Ruwanpura 2012; Egels-Zandén and Lindholm 2015; Stroehle 2017). Our findings extend existing research that identifies conditions associated with improvement in wages and hours requirements but not OSH requirements. Oka (2015) found that unionized suppliers improve their compliance with wage, hours, and leave standards much more substantially than with OSH standards because unions in developing economies—where most suppliers grappling with code compliance are located—tend to prioritize pocketbook issues over OSH issues. Distelhorst et al. (2017) similarly found that wages and hours practices, but not OSH compliance, improve in factories that adopt lean manufacturing practices. We extend this work by identifying institutional and program design features associated with improvements in OSH violations as well as these other categories. These findings suggest the importance of alternative pathways of influence to improve working conditions overall.
Fourth, our finding of greater improvement among suppliers to reputation-compromised buyers suggests that exposure by activists can have substantive impacts beyond the largely symbolic responses identified in the private politics literature. Prior research documents that activism prompts firms to adopt symbolic structures such as “impression management tactics” (McDonnell and King 2013: 411), public “concessions” to conform to activists’ demands (Eesley and Lenox 2006; King 2008), and CSR officer positions or board committees (McDonnell et al. 2015). However, this research has not revealed whether activism is related to changes in organizational behavior that align it more closely with the activists’ normative goals. Our finding that buyers tainted by negative media coverage are especially prone to working with suppliers that are more rapidly improving their working conditions suggests that these reputational risks may prompt firms to take substantive and not merely symbolic measures to avoid further reputation damage. Moreover, our finding that the more pressure buyers experience in their institutional environment, the more likely their suppliers are to improve is an important strategic insight for activists considering how to target their resources. Research has shown that firms that are repeatedly targeted by activists become more receptive to future activist challenges, in part through the adoption of formal corporate structures to manage social responsibility issues (McDonnell et al. 2015). We demonstrate that frequent targets may go beyond formal organizational responses to actually improve the practices that prompted activists’ objections.
Finally, demonstrating the relationship between program design features and performance improvement adds an important dimension to understandings of decoupling. With few exceptions (e.g., Kalev, Dobbin, and Kelly 2006), program design has been ignored as a coupling determinant, and we are aware of no study that investigates how design features create contingencies for one another. We suspect that decoupling studies have devoted little time to the design features of organizational structures adopted in response to activism because this literature has long theorized that such structures are likely to be merely symbolic. Our findings challenge that theoretical premise by suggesting that the difference between symbol and substance may depend, in part, on how organizational structures are designed.
Limitations and Future Research
Our study has limitations but also invites promising future research. Because all the suppliers in our sample were audited, we address why some audited suppliers improve more rapidly than others, but not whether auditing is more effective than other interventions, such as more stringent government regulation, legally binding international standards, or labor union activities. These are vital research questions.
Our findings are subject to several data limitations. Violations cited in social audits are an imperfect measure of objective labor conditions. Audits present an incomplete snapshot of factory practices, and research suggests that social auditing suffers from biases due to the misaligned incentives of auditors, inadequate training of auditors, and suppliers’ elaborate efforts at subterfuge (O’Rourke 2002; Esbenshade 2004; LeBaron and Lister 2015). In our analysis, we make extensive efforts to address potential biases by controlling for several factors known to affect auditors’ detection of violations, including who pays for the audit, ongoing auditor-supplier relationships, and the gender composition of audit teams (Short, Toffel, and Hugill 2016) and by omitting from our data those violation categories such as freedom of association and collective bargaining where auditors have been found notoriously incapable of reliably identifying violations (Anner 2012; Egels-Zandén and Merk 2014).
Moreover, we examine factories that faced at least two social audits by a single firm. By omitting those audited just once, we exclude audits that buyers might have initiated as a first step toward establishing a supplier relationship that was subsequently abandoned. Our focus on a single auditing firm has the advantage of providing comparable auditor training data, but it does not enable us to compare practices between auditing firms. Omitting certain types of code violation from our analysis enhanced the reliability of our improvement measure, but it leaves future research to determine whether the factors we found to predict improvement would also do so with the types of violation we omitted, particularly those concerning freedom of association and collective bargaining. Data limitations also prevented us from controlling for some of the factors that prior studies have found to be predictive of regulatory compliance, such as firm size and regulatory enforcement practices. Although we believe that our proxies for key independent variables are reasonable, we cannot rule out the possibility that they are incomplete. Finally, it is possible that alternative governance structures might influence improvement, including whether audits are conducted by brand staff or by a third-party auditing firm and whether codes are sponsored by brands, multi-stakeholder regimes, or NGOs. Our study paves the way for others to examine these and other factors.
Conclusion
As the anti-sweatshop movement makes the “TNC into the central locus of struggle over labor rights and globalization” (Bartley and Child 2014: 657), it is crucial to understand whether formal organizational structures such as codes of conduct and supplier monitoring can produce meaningful social change. We identify conditions at the institutional and program design levels under which these formal organizational structures are associated with measurable improvements in working conditions. Our findings suggest key considerations that should inform social monitoring and activist targeting strategies aimed at raising labor standards in global supply chains.
Supplemental Material
Supplemental material, ILRR_Short-et-al_Supplemental_Online_Appendix for Improving Working Conditions in Global Supply Chains: The Role of Institutional Environments and Monitoring Program Design by Jodi L. Short, Michael W. Toffel and Andrea R. Hugill in ILR Review
Footnotes
Acknowledgements
We gratefully acknowledge the research assistance provided by Melissa Ouellet as well as by Chris Allen, John Galvin, Erika McCaffrey, and Christine Rivera. We appreciate the helpful comments provided by Xiang Ao, Andrew Marder, Bill Simpson, Benjamin van Rooij, and the participants of the University of Michigan Ross School of Business strategy seminar, the Ohio State University Fisher College of Business management sciences seminar, the MIT Sloan School of Management operations management seminar, the UC Irvine Long US–China Institute workshop, the Tulane Law School Murphy Institute workshop series, the Duke Strategy/EDGE Center Seminar, and the University of Texas–Austin McCombs School of Business operations management seminar. We appreciate the financial support provided by Harvard Business School’s Division of Research and Faculty Development.
For information regarding the data and/or computer programs used for this study, please address correspondence to
1
In a very different context, studies have suggested that activists and the press are less likely to target the worst-behaved companies than companies that already have strong reputations for social responsibility performance (Luo, Meier, and Oberholzer-Gee 2012; Bartley and Child 2014) or companies that have already adopted extensive organizational structures to implement their corporate social responsibility (CSR) initiatives (McDonnell, King, and Soule 2015). Those studies focused on activism directed toward branded multinational companies with reputations to protect and argued that activists can exercise more leverage over such firms because they face greater financial consequences of reputational damage. Such is not the case with suppliers in developing countries. They are unbranded, largely invisible to consumers, and thus more insulated from reputational threat. We believe that activists will select their targets very differently in these contexts and will attempt to identify the worst practices by local suppliers in order to gain the most leverage over their global brand targets.
2
Nearly 80% of the auditors in our sample conducted audits in just one country. Of those who worked in more than one, most went only to nearby countries.
3
The maximum possible number of violations in each category was: child labor (7), forced or compulsory labor (5), working hours (7), occupational safety and health (31), minimum wage (15), treatment of foreign workers and subcontractors (4), and disciplinary practices (6). We excluded violation categories that, according to our data provider, do not apply to all suppliers (dormitory conditions and canteen violations) or were interpreted dissimilarly by auditors in different countries (freedom of association, the right to organize and bargain collectively, legal or client requirements), or where research has shown codes to diverge from one another in their details (freedom of association, nondiscrimination clauses) (O’Rourke 2002; Rodríguez-Garavito 2005). Our decision to omit violations of freedom of association and collective bargaining is also supported by research demonstrating that auditors often fail to properly identify these types of violations (Anner 2012; Egels-Zandén and Merk 2014).
4
Though only 4% of the prior audits in our sample had zero violations, such suppliers might be distinctively capable of exemplary performance and allowing these observations to drop out of the sample risks introducing bias. Adding 0.1 to the numerator and denominator, instead of adding 1, yielded nearly identical results.
5
For example, our metric considers the proportional reduction from 12 to 6 violations at a large supplier to be equivalent to a small supplier’s reduction from 4 to 2 violations, whereas a difference metric would consider the former to be three times the magnitude of the latter.
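The contrast drawn in this footnote can be checked with a short sketch. It is a simplified illustration only: the function names are hypothetical, and the plain proportional reduction used here stands in for the study's improvement metric, which (per the preceding footnote) also adds a constant to the numerator and denominator.

```python
def proportional_reduction(prior: int, focal: int) -> float:
    """Share of prior-audit violations eliminated by the focal audit."""
    return (prior - focal) / prior


def absolute_reduction(prior: int, focal: int) -> int:
    """Raw drop in the violation count (a 'difference metric')."""
    return prior - focal


# Violation counts taken from the footnote's example.
large_supplier = (12, 6)
small_supplier = (4, 2)

# A proportional measure treats the two suppliers identically.
print(proportional_reduction(*large_supplier))  # 0.5
print(proportional_reduction(*small_supplier))  # 0.5

# A difference measure makes the large supplier's change three times as big.
print(absolute_reduction(*large_supplier))  # 6
print(absolute_reduction(*small_supplier))  # 2
```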
6
For example, skewness declines by a factor of 10, from a value of 4.2 for percentage change to a mere 0.4 for our improvement metric, and kurtosis declines by a factor of nearly 7, from 30.5 for percentage change to 4.5 for improvement.
7
The simple ratio of violations at an establishment’s focal audit to those at its prior audit is highly skewed: it ranges from 0 to 19 and has a mean of 1.1, a standard deviation of 1.3, skewness of 4.2, and kurtosis of 30.5. Models that estimate this simple ratio as a dependent variable would be quite vulnerable to outliers driving their results.
8
A robustness test that includes press freedom and NGO density in our models instead of supplier environmental pressure yields the same inferences as our primary models.
9
Because many buyers appeared in no such articles, we conducted robustness tests that measured this using a dummy (rather than a count) coded 1 for an audit of a supplier whose buyer was featured in at least one article in this database in the prior year, and coded 0 otherwise. Estimates yielded nearly identical results when we added this dummy to our primary specification and when we substituted it for our primary measure of pressure on buyers (prior audit).
10
Robustness tests (not reported) indicate that using a team’s average auditor training instead of its maximum auditor training yields nearly identical results.
11
We found no evidence that a supplier’s prior violation count or the duration of the buyer-supplier relationship affected the propensity for a supplier’s audit to be announced (versus unannounced). Specifically, we estimated a logistic regression that predicted whether an audit was announced (versus unannounced) based on the duration of the buyer-supplier relationship (proxied by whether an audit was the 2nd, 3rd, 4th, or 5th or more conducted of this supplier for the same buyer), buyer size (log employment), and country (dummies), controlling for supplier industry (dummies) and violations reported in the prior audit. Results indicate that buyer-supplier relationship duration is not a significant determinant of an audit being announced or unannounced. Larger buyers were more likely to have announced audits and the supplier-country dummies were jointly significant, as were the buyer-country dummies. The regression results that test our hypotheses (reported in
) are unlikely to be contaminated by omitted variable bias associated with factors that predict whether an audit will be announced or unannounced, because those regression models control for the statistically significant factors correlated with this decision.
12
We include in our model a dummy variable to denote the nearly 50% of observations for which the number of staff hours required to conduct the prior audit was missing from the database and where we thus recoded auditor exposure (prior audit) observations from missing values to 0. This common econometric approach is algebraically equivalent to recoding those missing values with the variable’s mean (
: 62).
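The equivalence described in this footnote can be illustrated with a small sketch. It assumes a model with an intercept, one regressor with missing values, and a missing-value indicator; the data are made up for illustration, and the helper `ols` is a hypothetical pure-Python solver, not the estimation code used in the study. With the indicator in the model, filling the missing values with 0 or with the observed mean should give the same intercept and slope, and only the indicator's own coefficient should change.

```python
def ols(X, y):
    """Least-squares coefficients via Gauss-Jordan on the normal equations."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))]
         for i in range(k)]
    for col in range(k):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        A[col] = [v / A[col][col] for v in A[col]]
        for r in range(k):
            if r != col:
                factor = A[r][col]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]


# Hypothetical data: None marks a missing regressor value.
x_raw = [2.0, None, 4.0, 5.0, None, 7.0, 3.0, None]
y = [1.1, 0.7, 1.9, 2.6, 0.4, 3.2, 1.4, 0.9]

observed = [v for v in x_raw if v is not None]
x_mean = sum(observed) / len(observed)


def design(fill):
    """Columns: intercept, x with missing values replaced by `fill`, missing dummy."""
    return [[1.0, fill if v is None else v, 1.0 if v is None else 0.0]
            for v in x_raw]


b_zero = ols(design(0.0), y)     # recode missing values to 0
b_mean = ols(design(x_mean), y)  # recode missing values to the observed mean

print(b_zero[:2])
print(b_mean[:2])
```

The two printed pairs (intercept and slope) agree up to floating-point error because the fill value only shifts the regressor along the indicator column, which the indicator's coefficient absorbs.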
13
A supplier’s first observation in the estimation sample (audit sequence = 1) incorporates information from its focal audit (i,t) and prior audit (i,t-1) because our dependent variable incorporates both of their audit scores. Audit sequence = 1 for 57% of the observations in our estimation sample, 2 for 22%, 3 for 10%, and 4 or more for 10%.
14
Robustness tests (not reported) indicate that using teams’ average auditor tenure yields nearly identical results.
15
FDI inflows measures net inflows of foreign direct investment (that is, inflows less divestment during the previous year) used to acquire a lasting management interest (that is, 10% or more of a company’s voting stock was purchased by international entities) in companies in the supplier’s country. It is composed of equity capital, earnings reinvestment, and other short-term and long-term capital, as shown in the country’s balance of payments.
16
We find nearly identical results when, as a robustness test, we substitute for labor laws two alternative measures of the stringency of the domestic legal environment: the World Bank rule of law score and the number of ILO labor treaties the country has ratified.
17
The social auditor enabled us to append these variables to a list of buyer companies (which they provided to us without any other data) and subsequently provided the de-identified data set.
18
While we have 17 buyer countries in our sample, 89% of the observations correspond to just two. We therefore pursue a more conservative approach of including buyer-country fixed effects, controlling for differences in prosocial attitude in the buyer country as well as for all other buyer country attributes that are relatively stable during our sample period.
19
Of the variation in the number of violations, 16.3% comes from the factory-country level, 48.3% comes from the factory level, and the remaining 35.4% comes from other factors at the observation level. We also decomposed variation in improvement (our dependent variable) by using a mixed model to estimate our primary specification and found that 5.2% of the variation in improvement comes from the factory-country level, 5.0% comes from the factory level, and the remaining 89.9% comes from other factors at the observation level.
20
The 32.6% is calculated by adding to 0.22 the product of 0.089 (the coefficient on supplier environmental pressure) and 1.19 (the standard deviation of supplier environmental pressure).
21
The 23.2% is calculated by adding to 0.22 the product of 0.031 (the coefficient on pressure on buyers) and 0.40 (its standard deviation).
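The arithmetic in the two preceding footnotes can be reproduced with a few lines. This is a minimal sketch using only the numbers stated there; the function name and the label `baseline` for the 0.22 value are our own.

```python
def one_sd_effect(baseline: float, coefficient: float, std_dev: float) -> float:
    """Baseline plus the coefficient scaled by one standard deviation."""
    return baseline + coefficient * std_dev


first = one_sd_effect(0.22, 0.089, 1.19)   # footnote 20
second = one_sd_effect(0.22, 0.031, 0.40)  # footnote 21

print(f"{first:.1%}")   # 32.6%
print(f"{second:.1%}")  # 23.2%
```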
22
Our finding that a better-trained audit team at the prior audit leads to more improvement would risk being driven by regression to the mean if our specification only measured audit team training associated with the prior audit. However, our models also control for the focal audit team’s training, which mitigates that risk.
23
If announcing the prior audit gave factories time to hide or solve problems, prior audits would yield fewer violations than they otherwise would, which would bias against our hypothesized result; the falsely depressed baseline violation count would make it more difficult to observe subsequent improvement.
24
Because the dependent variable of this model, number of violations, exhibits overdispersion—the ratio of the variance to the mean is 4.4 (that is, 24.7 / 5.6)—we followed the conventional approach of using negative binomial regression rather than Poisson. Moreover, a likelihood-ratio test yields a chi-squared value of 6,931, which indicates that the probability that we would observe these data conditional on alpha equaling 0 (an assumption underlying the Poisson estimator) is virtually zero.
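The overdispersion check described here amounts to comparing a count variable's variance with its mean. The sketch below reproduces the reported ratio and applies the same check to a made-up list of violation counts; the function name and the sample data are hypothetical, and the likelihood-ratio test itself is not reproduced.

```python
from statistics import mean, variance


def dispersion_ratio(counts):
    """Variance-to-mean ratio; values well above 1 suggest overdispersion."""
    return variance(counts) / mean(counts)


# Ratio implied by the variance (24.7) and mean (5.6) reported in the footnote.
print(round(24.7 / 5.6, 1))  # 4.4

# Hypothetical violation counts, for illustration only.
sample_counts = [0, 1, 2, 3, 5, 8, 13, 21]
ratio = dispersion_ratio(sample_counts)
print(ratio > 1)  # True
```

A Poisson model assumes this ratio is close to 1, which is why a ratio of this size points toward negative binomial regression.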
25
We also estimated OLS models that predicted two other alternative dependent variables: the difference in violations between the focal and prior audit and the difference between the logs of these values. These models yielded results substantially similar to those of our primary models and of the alternative models reported in Table A.2 in the
. Specifically, they yielded the same inferences for our institutional features and stronger evidence of an auditor training effect, but they showed evidence of an effect of announced audits only when that variable was interacted with auditor training.
26
The scores for disciplinary practices, forced or compulsory labor, and treatment of foreign workers and subcontractors changed in fewer than 10% of audits. To avoid generalizing from such limited variation, we did not estimate models to predict those three categories.
References
