Abstract
We build on and extend methodological developments in computerized textual analysis and apply them to develop and validate a measure of the widely used concept of organizational culture. Although organizational culture is studied in a variety of domains of management scholarship, its measurement relies primarily on survey questionnaires. In this study, we extend computerized textual analysis by introducing the capabilities of natural language processing (NLP). NLP comprises artificial intelligence techniques that make it possible to analyze text at the multiword (phrase) level rather than only at the single-word level. We follow recommendations for establishing construct validity and demonstrate that the measure of organizational culture dimensions outlined in the study has content validity, external validity, dimensionality, and predictive validity.
Computerized textual analysis has emerged as a viable measurement technique that can be further improved to develop useful and valid measures of management concepts. Its emergence as a practical tool reflects both the nature of the data and significant advances in computational techniques. Short, Broberg, Cogliser, and Brigham (2010), for example, noted that computerized textual analysis has an advantage in entrepreneurship research because, whereas survey research depends on respondents’ willingness to participate, documents are readily available (p. 321). Illia, Sonpar, and Bauer (2014) pointed to the advantages of computerized textual analysis over manual content coding and developed and applied co-occurrence analysis, using a computerized package, to measure impression management.
We build on and extend methodological developments in computerized textual analysis. Specifically, we follow and enhance the approach suggested in Short et al. (2010) by incorporating and illustrating the use of natural language processing capabilities for developing and validating a measure of the widely used concept of organizational culture. Although the organizational culture concept is used in a variety of domains of management scholarship, its measurement is based primarily on survey questionnaires. Van den Berg and Wilderom (2004) called for a broader array of approaches to measure organizational culture, including “questionnaires, archived materials, observation schemes, and even field experimentation” (p. 576). Although archived documentary sources can be used to measure organizational culture and other management concepts (B. Lee, 2012), they present a new set of challenges (see Illia et al., 2014; Short et al., 2010). These challenges are both methodological and computational, and leveraging advances in other disciplines (e.g., computational linguistics) can help overcome them.
Our review of management studies using computerized content analysis (e.g., Gibson & Gibbs, 2006; Illia et al., 2014; McKenny, Short, & Payne, 2013; Short et al., 2010) revealed an emphasis on word-level analysis for construct measurement. Short et al. (2010, p. 341) highlighted the limitation of word-level analysis: “CATA [computer-aided text analysis] is less context sensitive than human coders for detecting the meaning of a word within a sentence.” By using natural language processing (NLP) to identify larger meaningful chunks of text, we seek to improve the context sensitivity of computerized textual analysis in management scholarship.
The appeal of NLP for textual analysis is that it makes multiword, phrase-level analysis possible. Although several text analysis programs allow users to search for phrases, few allow a construct score to be generated from phrase-level dictionaries. Whether existing text analysis programs offer phrase-level search, however, is not the focus of this article.
We focus instead on the value of using phrase-level data dictionaries to generate construct scores. Thus far, management researchers have used single-word dictionaries (e.g., Illia et al., 2014; Short et al., 2010). Arguments and evidence in favor of phrase-level analysis have also been advanced by researchers who use computerized textual analysis for other applications, such as language translation and topic modeling. For example, Hirschberg and Manning (2015, pp. 261-262) identify the phrase-based approach as the key advance in 50 years of machine translation research. This advance markedly improved the quality of machine translation of human languages and was instrumental in the development and success of Google Translate. Single words (also known as unigrams) generate inferior results compared with phrases (N-grams; Arnon & Snider, 2010; Goel, Gangolly, Faerman, & Uzuner, 2010).
We believe this is because multiword phrase-level analysis retains the intended meaning better than single-word-level analysis. Consider the phrase “new product,” which refers to new product development. Single-word level analysis divides the phrase into the two component words “new” and “product” and uses this information for construct measurement. By treating each word as a separate analytical entity, single-word level textual analysis strips a word of its linguistic context. Neither word—“new” nor “product”—independently captures the intended meaning of the phrase “new product” as in new product development. Using the word “new” by itself may pose complications. For example, new may co-occur with words—such as issue, leadership, market, regulation—which would produce combinations distinct from the meaning implied in the phrase “new product.”
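To make the “new product” example concrete, the following minimal Python sketch contrasts single-word counting with contiguous phrase counting on an invented sentence. The sentence and the helper names (`tokenize`, `count_unigram`, `count_phrase`) are hypothetical illustrations, not part of any program discussed in this article.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def count_unigram(tokens, word):
    """Single-word matching: every occurrence of the word counts."""
    return Counter(tokens)[word]

def count_phrase(tokens, phrase):
    """Phrase matching: count only contiguous occurrences of the full phrase."""
    target = tokenize(phrase)
    n = len(target)
    return sum(tokens[i:i + n] == target for i in range(len(tokens) - n + 1))

text = ("Our new product pipeline is strong. We faced new regulation "
        "and new leadership entered a new market.")
tokens = tokenize(text)
print(count_unigram(tokens, "new"))          # 4
print(count_phrase(tokens, "new product"))   # 1
```

Single-word counting credits all four uses of “new” (including “new regulation” and “new market”), whereas phrase matching isolates the one occurrence that carries the intended meaning.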
Proximity matching allows searches for words that occur within a set number of characters or words of each other. For example, this approach allows “new and improved product” or “product was new” to be counted as “new product.” As we discuss in a later section, phrases are an essential linguistic component, and NLP researchers have developed algorithms for detecting linguistic components (e.g., Byrd, Justeson, & Katz, 1995). These artificial intelligence algorithms can identify language components, allowing us to further enhance extant text analysis methodologies by including multiword, phrase-level analysis. Before discussing the study’s contributions, we note that other prominent traditions in organizational research, especially discourse analysis, approach linguistic and organizational analysis in profoundly different ways (Hardy, 2001, 2004; Phillips & Hardy, 2002; Phillips, Lawrence, & Hardy, 2004).
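Word-based proximity matching can be sketched as follows. This is a minimal illustration under stated assumptions: the window is counted in words rather than characters, word order is ignored, and no stemming is applied (so “products” would not match “product”). The function name `proximity_match` is hypothetical.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def proximity_match(tokens, w1, w2, window):
    """Count occurrences of w1 that have w2 within `window` tokens on either side."""
    hits = 0
    for i, tok in enumerate(tokens):
        if tok != w1:
            continue
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        if any(tokens[j] == w2 for j in range(lo, hi) if j != i):
            hits += 1
    return hits

for s in ["new and improved product", "product was new",
          "the new regulation affected every product line"]:
    print(s, "->", proximity_match(tokenize(s), "new", "product", window=3))
```

With a three-word window, the first two phrases count as “new product” (prints 1 for each), while the third does not (prints 0) because “product” lies four words away; widening the window trades precision for recall.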
Our study makes two contributions. First, we introduce NLP capabilities into computer-aided content analysis methodology. The use of phrases for construct measurement extends techniques such as co-occurrence analysis (Illia et al., 2014) that seek to take advantage of improved computational capabilities. The second contribution is a substantive one. In proposing and developing a computer-aided text analysis measure of organizational culture, we pay careful attention to construct validity concerns (e.g., Neuendorf, 2002; Potter & Levine-Donnerstein, 1999; Short et al., 2010; Weber, 1990). We follow recommendations for establishing construct validity and demonstrate that the measure of organizational culture dimensions outlined in the study has content validity, external validity, dimensionality, and predictive validity. As a result, it is possible to measure organizational culture retrospectively for firms by using relevant archival and documentary sources.
This article is structured as follows. First, we provide a brief overview of NLP evolution and applications. Second, we review approaches for computerized textual analysis in extant management research. Third, we discuss NLP techniques, programs and linguistic components. Fourth, we develop a coding dictionary for organizational culture using deductive and inductive approaches. Finally, we conduct tests for content validity, external validity, dimensionality, and predictive validity and conclude by summarizing key contributions and future applications.
Natural Language Processing—Evolution and Applications
Nadkarni, Ohno-Machado, and Chapman (2011) provide a brief overview of the historical evolution of NLP and note that “NLP began in the 1950s as the intersection of artificial intelligence and linguistics…. Currently, NLP borrows from several, very diverse fields” (p. 544). Generally speaking, NLP is a research area that uses computers to comprehend and analyze human-produced texts (Liddy, 2001). Three excellent books for researchers new to NLP are Indurkhya and Damerau (2010), Jurafsky and Martin (2014), and Manning and Schütze (1999).
To take stock of NLP research within management and related disciplines, we identified 50 articles from the Business Source Premier database. All articles in our review sample included the keyword “natural language processing” in the article abstract. Because the Business Source Premier database covers a broad domain of sources, a large proportion of the search results (despite targeted search criteria) were not related to the management discipline. We therefore screened the search results in three rounds.
Our review suggests that NLP techniques have been applied to diverse research topics such as biofuel patents (Kessler & Sperling, 2016), construction safety (Tixier, Hallowell, Rajagopalan, & Bowman, 2016), electronic negotiations (Sokolova & Szpakowicz, 2007), fraudulent annual reports (Goel et al., 2010), group behaviors (Crowston, Allen, & Heckman, 2012), news-driven trading (Pröllochs, Feuerriegel, & Neumann, 2016), and online product reviews (Ullah, Amblee, Kim, & Lee, 2016). Along with different topics, our review indicates varying research purposes and NLP applications.
At the most basic level, researchers using NLP decide whether to adopt a “bag-of-words” model, a “bag-of-phrases” model, or both (Ullah et al., 2016). Some studies suggest that the bag-of-phrases model is preferable (Arnon & Snider, 2010; Goel et al., 2010; T. Lee, 2007) because phrases better capture linguistic context (Ong et al., 2005). With the bag-of-phrases model, a researcher must then decide which N-grams (contiguous sequences of n items), such as bigrams and trigrams, to work with (Ullah et al., 2016).
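The difference between the two models can be sketched in a few lines of Python. This is an illustrative sketch only; the function names and the choice of n = 1 to 3 are assumptions, not settings from the studies cited above.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return the contiguous n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bag_of_ngrams(tokens, sizes=(1, 2, 3)):
    """Count unigrams (bag-of-words) together with bigrams and trigrams (bag-of-phrases)."""
    bag = Counter()
    for n in sizes:
        bag.update(ngrams(tokens, n))
    return bag

tokens = "enhance existing products and services".split()
print(ngrams(tokens, 2))
# ['enhance existing', 'existing products', 'products and', 'and services']
```

Note that raw N-grams also include grammatically meaningless spans such as “products and”; this is one motivation for the linguistically informed phrase extraction discussed later.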
Within our review sample we noted three key research purposes: classification, information extraction, and decision making. The research purpose determines the type of NLP application, and some researchers broadly classify NLP applications as shallow or deep (T. Lee, 2007). Shallow NLP applications focus on generating textual components, whereas deep NLP applications emphasize generating meanings. When NLP was applied for classification, we noted two application scenarios. In the first scenario, researchers had predetermined categories and used shallow NLP (relying on linguistic algorithms) to build data dictionaries by extracting textual units. After extraction, multiple coders manually assigned the textual units to the predetermined categories (e.g., Crowston et al., 2012; Pandey, Pandey, & Miller, 2017). NLP in this case was used to assist coders. In the second scenario, researchers used deep NLP to inductively generate ontologies: they represented each document in the corpus as a vector of textual units (words or phrases), applied clustering to identify initial concepts, and then used unsupervised machine learning to identify subelements (e.g., T. Lee, 2007). The second NLP application within our sample was information extraction. Researchers used NLP for information extraction by applying dictionaries, manually coded rules, and statistical filters (e.g., Li, Cai, & Kamat, 2016). When relevant dictionaries did not exist, researchers used statistical algorithms for information extraction (Ong et al., 2005).
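The vector representation that precedes clustering in the second scenario can be sketched as follows. This does not reproduce T. Lee (2007); it is a hedged illustration that represents each document as counts of its phrases and compares documents by cosine similarity, a common input to clustering algorithms. The toy documents and function names are invented.

```python
import math
from collections import Counter

def doc_vector(units):
    """Represent a document as counts of its textual units (words or phrases)."""
    return Counter(units)

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

docs = {
    "d1": doc_vector(["new products", "customer demand", "new products"]),
    "d2": doc_vector(["new products", "customer demand"]),
    "d3": doc_vector(["cost control", "budget discipline"]),
}
for x in docs:
    for y in docs:
        if x < y:
            print(x, y, round(cosine(docs[x], docs[y]), 3))
```

Documents d1 and d2 share phrases and score close to 1, while d3 shares none with either and scores 0; a clustering step would then group d1 and d2 together.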
The third NLP application in our review sample was decision making. One example was sentiment analysis (Pröllochs et al., 2016), in which shallow NLP processing is a precursor step followed by one of several approaches for generating sentiment scores: dictionaries, coded rules, or machine learning. Another example was the use of NLP for predicting outcomes (Sokolova & Szpakowicz, 2007; Tixier et al., 2016). As with sentiment analysis, approaches for predicting outcomes included dictionaries, coded rules, and unsupervised or supervised machine learning. Machine learning requires a sample large enough for training; if a large sample is unavailable, a rule-based approach is the preferred option (Ku & Leroy, 2014). Research on NLP algorithms, applications, and tools is ongoing, and there is no consensus on which approach to prediction (dictionary-based, dictionary-plus-rule-based, supervised machine learning, or unsupervised machine learning) generates more accurate and reliable outcomes. We present summary findings from our review in Table 1.
Summary Findings From a Review of 50 Articles From the Business Source Premier Database With the Keyword “Natural Language Processing” in the Abstract.
a. In the case of 5 articles, the term NLP was used but NLP was not applied.
b. For this group of articles, one article did not state whether words and/or phrases were used, and one article used both types of machine learning (ML).
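The dictionary-plus-rule-based approach to sentiment scoring described above can be illustrated with a small, hypothetical sketch: a toy sentiment lexicon plus a single negation rule. The word lists, the three-token negation window, and the function name are assumptions for illustration only and are not taken from Pröllochs et al. (2016) or any other study in the review.

```python
import re

# Hypothetical toy lexicons for illustration; not from any published dictionary.
POSITIVE = {"growth", "strong", "improved"}
NEGATIVE = {"decline", "weak", "loss"}
NEGATORS = {"not", "no", "never"}

def sentiment_score(text, window=3):
    """Dictionary lookup plus one rule: a negator within `window` preceding tokens flips polarity."""
    tokens = re.findall(r"[a-z]+", text.lower())
    score = 0
    for i, tok in enumerate(tokens):
        if tok in POSITIVE:
            polarity = 1
        elif tok in NEGATIVE:
            polarity = -1
        else:
            continue
        # Rule: look back a few tokens for a negator and reverse the polarity if found.
        if any(t in NEGATORS for t in tokens[max(0, i - window):i]):
            polarity = -polarity
        score += polarity
    return score

print(sentiment_score("Sales growth was strong"))   # 2
print(sentiment_score("Demand was not strong"))     # -1
```

A pure dictionary approach would score the second sentence as positive; the single rule corrects it, which illustrates why rules can substitute for training data when samples are small.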
Several researchers and websites list NLP tools (e.g., eMerge, 2016; Haas, 2008; Neubig, 2016). Neubig (2016) lists almost 50 NLP tools for different NLP tasks, such as dependency parsing (e.g., CaboCha and MaltParser), finite state modeling (e.g., Kyfd and OpenFST), general NLP libraries (e.g., NLTK and OpenNLP), language modeling (e.g., IRSTLM and kenlm), machine learning (e.g., AROW++ and Classias), phrase structure parsing (e.g., Berkeley Parser and Stanford Parser), pronunciation estimation (e.g., KyTea and mpaligner), and speech recognition (e.g., CMU Sphinx and Juicer).
Because our primary interest is in text analysis, we also reviewed contemporary lists of text analysis software. In one comprehensive list, Klein (2014) lists 62 text analysis programs grouped into two categories: (a) linguistic, dealing with language, and (b) content, dealing with human communication. The content programs are further classified into qualitative, event data, quantitative with category systems, quantitative without category systems, coding of open-ended questions, and text summarization. Of the 62 tools on Klein’s (2014) website, the popular ones include ATLAS.ti, Concordance, Diction, General Inquirer, QSR NVivo, Textstat, WordSmith, WordStat, and T-Lab. Interestingly, there was no overlap between the NLP tools list (Neubig, 2016) and the text analysis software list (Klein, 2014). We expected at least the general NLP libraries NLTK and OpenNLP, which are not application-specific, to appear on the text analysis list. This absence suggests that NLP capabilities are in the early stages of being explored and used in textual analysis research.
Computerized Textual Analysis in Extant Research
The benefits of computer programs in textual analysis have been discussed over the past two decades (e.g., Illia et al., 2014; Krippendorff, 2004; Morris, 1994; Short & Palmer, 2008; Weber, 1990). Researchers agree that computer programs help overcome limitations of manual coding. With appropriate computer programs, textual analysis can be done faster, with less rater bias and fewer reliability concerns (Short et al., 2010). It is surprising, then, that only a limited number of studies within the management discipline apply computerized textual analysis (for notable exceptions, see Gibson & Gibbs, 2006; Illia et al., 2014; McKenny et al., 2013; Short et al., 2010). B. Lee (2012) argues that textual analyses, both manual and computerized, are underutilized in organizational research, in part due “to a belief that documents are too readily available to be of any real use” and also due to a lack of clarity regarding “how to source, verify and analyse them to answer research questions” (p. 390).
Some scholars have responded to concerns about lack of procedural clarity for textual analysis and outlined guidelines for future research. For example, Short and Palmer (2008) outlined and illustrated use of the DICTION software for assessing mission statements. In a later study, Short et al. (2010) outlined steps for assessing construct validity when computerized textual analysis methodology is used for measuring constructs. In both studies, (a) the unit of textual analysis was at the single-word level and computations derived from word occurrences were used for construct measurement and (b) custom dictionaries, deductively and manually prepared a priori, were input into the computer program to facilitate faster and more reliable coding. In a more recent article, Illia et al. (2014) question the use of a priori defined dictionaries. They argue that text must be analyzed in context and advocate an inductive approach in which researchers develop dictionaries through analysis of the textual data. Their study introduces co-occurrence text analysis methodology with the software program ALCESTE. Illia et al. (2014) note that one benefit of co-occurrence text analysis methodology is that “The researcher analyses quotations only once these have been identified as being either typical or untypical of a specific co-use of words. … Thus, a significant advantage … is that human bias is controlled, as human coding is guided by an informative report.” (p. 356).
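As a rough illustration of what a “co-use of words” is, the sketch below counts how often pairs of content words appear in the same sentence. It is a simplified stand-in, not the ALCESTE procedure used by Illia et al. (2014), which relies on its own segmentation and classification procedures; the stopword list, example text, and function name are invented for illustration.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical minimal stopword list for illustration.
STOPWORDS = {"the", "a", "and", "of", "to", "we", "our", "is", "are", "in", "with"}

def cooccurrence(text):
    """Count unordered pairs of content words that appear in the same sentence."""
    pairs = Counter()
    for sentence in re.split(r"[.!?]+", text):
        words = sorted({w for w in re.findall(r"[a-z]+", sentence.lower())
                        if w not in STOPWORDS})
        pairs.update(combinations(words, 2))
    return pairs

text = ("Our customers value quality. We serve customers with quality products. "
        "Costs are rising.")
pairs = cooccurrence(text)
print(pairs.most_common(1))
```

Pairs that recur across sentences (here “customers” with “quality”) are the kind of pattern that co-occurrence analysis surfaces for the researcher to inspect before any human coding.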
The two approaches, Short and Palmer (2008) and Short et al. (2010) on one side and Illia et al. (2014) on the other, can be compared by examining the characteristics of their data dictionaries. Table 2 presents different characteristics of data dictionaries employed in textual research; it is based on our review and understanding of textual analysis across multiple disciplines. The data dictionary for entrepreneurial orientation in Short et al. (2010) was manually generated, with Rodale’s thesaurus as the primary source of dictionary items. The dictionary comprised individual words and was evaluated using construct validity tests. The data dictionary for impression management in Illia et al. (2014) was manually generated with machine help (co-occurrence analysis), with a corpus of documents as the primary source of dictionary items. The dictionary comprised individual words. The data dictionary we develop in this study is manually generated with machine help (applying NLP capabilities to extract phrases), with Rodale’s thesaurus and a corpus of documents as the two primary sources of dictionary items. Our dictionary comprises phrases and was evaluated using construct validity tests.¹
Characteristics of Data Dictionaries Used in Computerized Textual Analysis.
Note: Short et al. (2010) suggest using an inductive approach to complement the deductive approach.
Our measurement approach—like prior studies that have used data dictionaries for construct measurement—is rooted in principles of compositionality. In an essay on compositionality in the Stanford Encyclopedia of Philosophy, Szabó (2013) emphasizes compositionality, a foundation of contemporary semantics, by noting that “anything that deserves to be called a language must contain meaningful expressions built up from other meaningful expressions.” The principle of compositionality postulates that the meaning of an expression in a language can be determined by the meanings of the constituents of the expression (i.e., semantics) and its structure (i.e., syntax, logic). Although the compositionality thesis is widely accepted, we acknowledge that there are dissenting voices who argue that “meanings of larger expressions seem to depend on the intentions of the speaker, on the linguistic environment, or on the setting in which the utterance takes place” (Szabó, 2013).
On a similar note, Velardi, Fasolo, and Pazienza (1991, p. 154) observe that “despite the interest that semantics has received from the scholars of different disciplines since the early history of humanity, a unifying theory of meaning does not exist.” In Szabó’s (2013) terms, “compositionality … is about bottom-up meaning-determination,” whereas the context principle (Frege, 1953; Reck, 1997; Wittgenstein, 2009) is about “top-down meaning-determination.” The researcher (e.g., psychologist, lexicographer, text engineer) and his or her research objectives determine how semantic knowledge is modeled (Velardi et al., 1991, p. 154). Text analysis researchers have systematically coded textual data by analyzing “patterns of emphasized data in text” (McTavish & Pirro, 1990, p. 245). It is common practice in content analysis to define and develop dictionaries for targeted latent meanings (e.g., Hart, 1984; McTavish & Pirro, 1990; Tetlock, 1981), which are then applied to textual data. In the next section we discuss the two types of phrases we extracted to develop the data dictionary.
The Linguistic Components: Noun Phrases and Verb Phrases
According to the Dictionary of Linguistics and Phonetics (Crystal, 2011, p. 432), a sentence is the largest structural unit (linguistic component) in the grammar of a language. There are several competing models for analyzing sentence structure (Colmerauer, 1975; Pereira & Warren, 1980; Woods, 1970), but all models include two of the most frequently occurring linguistic components: noun phrases (NPs) and verb phrases (VPs). NPs are “the constructions into which nouns most commonly enter, and of which they are the head word” (Crystal, 2011, p. 333). VPs are “equivalent to the whole of the predicate of a sentence, as is clear from the expansion of S as NP+VP in phrase-structure grammar” (Crystal, 2011, p. 510). Given the critical role of NPs and VPs in the grammar of a language, in this study we extract and work with these two linguistic components.
NPs and VPs can be extracted from textual content using NLP software applications, which are available both as freeware and as commercial packages. At the core of all NLP software applications are algorithms for tokenization, sentence segmentation, part-of-speech tagging, chunking, and parsing. NLP algorithms have been developed and patented by artificial intelligence, linguistics, and text engineering researchers (e.g., Barker & Cornacchia, 2000; Byrd et al., 1995; Popescu & Etzioni, 2007). These algorithms have been incorporated into NLP software applications, including OpenNLP, MuNPEx, LingPipe, FastNPE, and NPtool. We tested three NLP software applications (two freeware programs and one commercial package) on the following quote from Apple’s 10-K filing for 2012:
“Due to the highly volatile and competitive nature of the industries in which the Company competes, the Company must continually introduce new products, services and technologies, enhance existing products and services, and effectively stimulate customer demand for new and upgraded products.”
We found the OpenNLP application in the General Architecture for Text Engineering (GATE) environment (Cunningham et al., 2011) to be the most effective at extracting NPs and VPs; the other tools extracted fewer phrases. In Table 3 we compare how our input statement was parsed at the single-word level and at the phrase level. Parsing at the single-word level resulted in 37 unique words, whereas parsing for NPs and VPs extracted eight NPs and four VPs.
Results of Text Parsing at the Single Word and Multiword Phrase Levels.
Note: Single words were extracted with ATLAS.ti software. Multiword phrases were extracted with the OpenNLP English software.
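The chunking step behind Table 3 can be sketched in a simplified form. The example below is not OpenNLP; it is a hypothetical rule-based noun phrase chunker in plain Python that operates on part-of-speech tags we assigned by hand (Penn Treebank style) to a fragment of the Apple quote. It groups an optional determiner, any adjectives or -ing modifiers, and one or more nouns into an NP. Statistical chunkers such as OpenNLP’s learn chunk boundaries from annotated data and may, for example, group coordinated nouns differently.

```python
# Hand-assigned (hypothetical) POS tags for a fragment of the Apple quote.
TAGGED = [
    ("the", "DT"), ("Company", "NNP"), ("must", "MD"), ("continually", "RB"),
    ("introduce", "VB"), ("new", "JJ"), ("products", "NNS"), (",", ","),
    ("services", "NNS"), ("and", "CC"), ("technologies", "NNS"), (",", ","),
    ("enhance", "VB"), ("existing", "VBG"), ("products", "NNS"),
    ("and", "CC"), ("services", "NNS"),
]

def chunk_nps(tagged):
    """Greedy left-to-right NP chunking: DT? (JJ|VBG)* NN+ (any tag starting with NN)."""
    nps, i, n = [], 0, len(tagged)
    while i < n:
        j = i
        if tagged[j][1] == "DT":          # optional determiner
            j += 1
        while j < n and tagged[j][1] in ("JJ", "VBG"):   # premodifiers
            j += 1
        k = j
        while k < n and tagged[k][1].startswith("NN"):   # one or more nouns
            k += 1
        if k > j:                         # at least one noun: emit the chunk
            nps.append(" ".join(w for w, _ in tagged[i:k]))
            i = k
        else:                             # no NP starts here; move on
            i += 1
    return nps

print(chunk_nps(TAGGED))
```

Even this crude chunker keeps “new products” and “existing products” as distinct units, which is the distinction that single-word parsing loses.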
The parsing and extraction results in Table 3 underscore the benefits of NLP capabilities. As the results in Table 3 show, with “multiword” level NP and VP extraction we can differentiate between “new products” and “existing products.” With single-word-level parsing the three words “new,” “existing,” and “product” do not retain the complete linguistic meaning. In a recent study, Arnon and Snider (2010) found that text “comprehenders” are sensitive to the frequencies of multiword phrases and that this effect is not reducible to frequencies of individual words or substrings. Their finding further supports our emphasis on including “multiword” phrases instead of limiting analysis to single words for computerized textual analysis. In the next section, we illustrate how NLP capabilities can be used in computerized content analysis by developing a measure for organizational culture.
Natural Language Processing Techniques in Textual Analysis: An Illustration Using the Organizational Culture Construct
Two recent studies offer comprehensive guidelines for conducting research using documents (B. Lee, 2012) and for computer-assisted textual analysis (Short et al., 2010). Whereas B. Lee’s (2012) guidelines broadly apply to approaches for document analysis, Short et al.’s (2010) guidelines apply specifically to studies that use computerized textual analysis for construct measurement. According to Short et al., the following need careful attention when using computerized textual analysis: (a) content validity, (b) external validity, (c) dimensionality (for multidimensional constructs), and (d) predictive (nomological) validity. Short et al. synthesize prior studies on computerized textual analysis, highlight conflicting or vague recommendations, and propose alternatives for future studies. In this study, we follow the approach suggested in Short et al., with modifications as necessary. Our approach is outlined in Table 4, and we discuss each step in the subsections below.
Outline of Steps Followed for Preparing Data Dictionary for Organizational Culture.
Preparatory Steps—The Organizational Culture Construct and Its Definitions, Dimensions, and Measurement
To take stock of definitions, dimensionality, and measures of the organizational culture construct, we conducted a comprehensive review of recent empirical articles. Using the Business Source Premier database, we searched for all peer-reviewed, scholarly articles published on the topic of organizational culture from January 2008 to December 2012. We did not use filters to limit search results by research discipline or journal ranking. However, we ensured that organizational culture was the research focus of each study: the article title or abstract had to contain the phrase “organizational culture.” Our search resulted in 356 articles from multiple disciplines, such as management, marketing, psychology, sports marketing, software engineering, and health care. We then narrowed our focus to empirical articles and reviewed the conceptualization and operationalization of the organizational culture construct; studies that did not provide satisfactory operationalization details were excluded from our review sample.
We found that all studies in our review sample referred to seminal organizational culture studies (e.g., Deal & Kennedy, 1982; Denison, 1990; Hofstede, 1980; Kotter & Heskett, 1992; Martin, 1992; Schein, 1985) for the conceptual definition of organizational culture. Although more than 100 variant definitions of organizational culture exist (Detert, Schroeder, & Mauriel, 2000), we found common themes across them. These themes include assumptions, attitudes, beliefs, values, and norms of behavior that are shared by individuals in organizations and manifested through perceptions, behavior, work practices, and organizational artifacts. Our review also indicated that researchers have advanced the notion of heterogeneous organizational cultures and multiple subcultures within organizations (e.g., Sackmann, 1992; Trevino & Nelson, 2010). Organizational subcultures can exist at different levels, such as groups, teams, departments, and hierarchical levels in an organization. These subcultures compete for dominance while avoiding organizational collapse (e.g., Martin, 1992). Despite these subcultures, the organizational culture espoused by top management predominates; it permeates and influences subcultures at different organizational levels and is shared and enacted by lower-level employees and work units (Meyerson & Martin, 1987).
Several instruments for measuring organizational culture have been proposed over four decades of academic interest in the topic (e.g., Cameron & Quinn, 1999; Denison & Mishra, 1995; O’Reilly, Chatman, & Caldwell, 1991; Van den Berg & Wilderom, 2004). Although all instruments operationalized organizational culture as a multidimensional measure, they differed in research focus and in the organizational level at which culture was measured. According to our review, the three most widely used measurement instruments (for the years 2008 through 2012) were the Denison Organizational Culture Survey (DOCS; Denison & Neale, 2000), the Organizational Culture Assessment Instrument (OCAI; Cameron & Quinn, 1999), and the Organizational Culture Profile (OCP; O’Reilly et al., 1991). In Table 5 we summarize the research focus and the organizational culture dimensions and subdimensions of measurement instruments that have been used in recent empirical studies. In Table 6 we synthesize the organizational culture dimensions that are common across the DOCS, OCAI, and OCP measurement instruments. After reviewing the three measurement instruments (original versions and subsequent variants), we found that six dimensions of organizational culture (competitiveness, control- and coordination-oriented, customer-oriented, human-resource-oriented, innovation- and learning-oriented, and team-oriented) have been used most frequently in recent empirical studies of organizational culture. We work with these six dimensions as we develop the computerized content analysis measure of organizational culture.
Summary of Measurement Instruments for Organizational Culture Most Frequently Used in Recent Research (2008-2012).
Compilation of Common Organizational Culture Dimensions From the DOCS, OCAI, and OCP Instruments.
Deductively Generated Data Dictionary Items
After establishing a working definition and dimensionality for the organizational culture construct, we proceeded with the dictionary development. There are two primary approaches—deductive and inductive—for developing the coding dictionary for construct measurement using computer-aided text analysis. The deductive approach entails two sequential steps: (a) preparation of an initial coding dictionary from theoretical dimensions established in prior research and (b) expansion of the initial coding dictionary by adding an exhaustive list of items from thesauruses and synonym finders (e.g., McKenny et al., 2013; Short et al., 2010). In contrast, the inductive approach entails generating word lists from a corpus of documents (e.g., Illia et al., 2014). We decided to adopt both approaches—deductive and inductive—for generating a comprehensive coding dictionary for the six dimensions of organizational culture.
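Whichever approach generates the dictionary items, the resulting dictionary is applied to text in the same way. The sketch below shows one plausible scoring scheme for a phrase-level dictionary: count phrase occurrences per dimension and normalize per 1,000 words. The two-dimension dictionary, its entries, and the per-1,000-words normalization are hypothetical assumptions for illustration, not the dictionary or scoring rule developed in this study.

```python
import re

# Hypothetical phrase entries for two of the six dimensions (illustration only).
CULTURE_DICT = {
    "customer-oriented": ["customer demand", "customer satisfaction"],
    "innovation- and learning-oriented": ["new products", "research and development"],
}

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def count_phrase(tokens, phrase):
    """Count contiguous occurrences of a (possibly multiword) phrase."""
    target = tokenize(phrase)
    n = len(target)
    return sum(tokens[i:i + n] == target for i in range(len(tokens) - n + 1))

def dimension_scores(text, dictionary, per=1000):
    """Phrase hits per dimension, normalized per `per` words of text (assumed scheme)."""
    tokens = tokenize(text)
    total = len(tokens)
    scores = {}
    for dim, phrases in dictionary.items():
        hits = sum(count_phrase(tokens, p) for p in phrases)
        scores[dim] = per * hits / total if total else 0.0
    return scores

print(dimension_scores("We introduce new products to meet customer demand.", CULTURE_DICT))
```

Normalizing by document length keeps long and short letters comparable; other choices (raw counts, proportions of sentences) are equally possible and would need their own validity checks.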
We first tried to replicate the deductive approach illustrated in recent research (e.g., McKenny et al., 2013; Short et al., 2010). We started with an initial set of words and phrases extracted from the DOCS, OCAI, and OCP instruments (see the last column of Table 6) and looked up synonyms in Rodale’s Synonym Finder; we began with this resource because it has been used in recent studies (e.g., McKenny et al., 2013; Short et al., 2010). We found, however, that Rodale’s Synonym Finder and similar sources were poor matches for the words and phrases in Table 6: matching entries were unavailable for more than half of our initial words and phrases. As a result, we were only partially successful in compiling coding dictionary items using the deductive approach. We next turned to the inductive approach, which entailed generating items for the coding dictionary from a sample of relevant organizational documents (e.g., Illia et al., 2014).
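The deductive expansion step, and the coverage problem it ran into, can be sketched as follows. The `SYNONYMS` mapping is a hypothetical stand-in for a thesaurus lookup (its entries are invented and are not taken from Rodale’s Synonym Finder), and the seed terms are illustrative rather than the actual items from Table 6.

```python
# Hypothetical synonym lookup standing in for a printed thesaurus (invented entries).
SYNONYMS = {
    "competitive": ["aggressive", "rivalrous"],
    "teamwork": ["collaboration", "cooperation"],
}

# Illustrative seed terms; multiword seeds often lack thesaurus entries.
SEEDS = ["competitive", "teamwork", "empowerment", "customer focus"]

def expand_deductively(seeds, synonyms):
    """Add thesaurus synonyms to each seed; report seeds without entries and coverage."""
    expanded, missing = set(), []
    for term in seeds:
        expanded.add(term)
        if term in synonyms:
            expanded.update(synonyms[term])
        else:
            missing.append(term)
    coverage = 1 - len(missing) / len(seeds) if seeds else 0.0
    return expanded, missing, coverage

expanded, missing, coverage = expand_deductively(SEEDS, SYNONYMS)
print(sorted(expanded))
print(missing, coverage)
```

Tracking coverage explicitly makes the shortfall visible: when many seeds (especially multiword phrases) have no entries, the deductive route alone cannot produce a comprehensive dictionary.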
Inductively Generated Data Dictionary Items
Various organizational texts and documents have been used for content analysis (for a comprehensive review, see Duriau, Reger, & Pfarrer, 2007). These include mission statements (e.g., David, 2003; Morris, 1994; Pearce & David, 1987), letters to shareholders (e.g., D’Aveni & MacMillan, 1990; Geppert & Lawrence, 2008; Short et al., 2010), management discussion and analysis sections (e.g., Brown & Tucker, 2011; Bryan, 1997), and annual reports (e.g., Bowman, 1984; Loughran & McDonald, 2011). After researching and reviewing samples of organizational documents that have been content analyzed in prior research, we decided to use the annual letter to shareholders to inductively generate items for the coding dictionary. At this stage, an important question we deliberated on was the following: Can we assume that the annual letter to shareholders will reflect the organizational culture at a given time?
As discussed earlier, the literature on organizational culture suggests that subcultures are prevalent within organizations: a firm can have a culture espoused by top management as well as subcultures within business units, departments, or project groups (e.g., Sackmann, 1992; Trevino & Nelson, 2010). Prior research has also argued that, despite the subcultures that may exist within organizations, the organizational culture espoused by top management retains a strong influence (e.g., Martin, 1992; Meyerson & Martin, 1987). Indeed, the organizational culture espoused by top management is the focus of our study. We therefore assume that a high-level organizational communication such as the annual letter to shareholders or the annual report is likely to include information about the six organizational culture dimensions: competitiveness, control- and coordination-oriented, customer-oriented, human-resource-oriented, innovation- and learning-oriented, and team-oriented. This assumption rests on evidence that authors’ mental models can be mapped by studying the frequency of concepts in a given text (Carley, 1997; D’Aveni & MacMillan, 1990) and that shareholder letters signal the major topics and themes that pertain to top managers (Barr, Stimpert, & Huff, 1992). Furthermore, shareholder letters reflect top management’s values, beliefs, and ideologies (Short et al., 2010), and construct measures generated from letters to shareholders and annual reports have been used to test relationships with organizational outcomes (e.g., Bowman, 1984; Michalisin, 2001).
Although coding dictionaries have been inductively compiled from documents in prior management studies (e.g., Illia et al., 2014), the approach outlined in our study is new because it introduces NLP capabilities for coding dictionary development. According to Morris (1994), artificial intelligence-based content analysis studies are the least prevalent in management research. Our coding dictionary development sample comprises 50 randomly selected companies from the 2012 Fortune 500 list. The Fortune 500 list, which lists the largest companies in the United States each year, has been used by management scholars for over five decades (e.g., Cho & Pucik, 2005; Fombrun & Shanley, 1990; Ridge, Aime, & White, 2014). We also reasoned that a sample of Fortune 500 companies would represent successful companies in which effective leadership (e.g., Bass & Avolio, 1993; Schein, 1985) has established strong organizational cultures (Denison & Mishra, 1995). We extracted two linguistic components, noun phrases (NPs) and verb phrases (VPs), from the annual letters to shareholders using the OpenNLP software. Figure 1 outlines the steps and settings we followed for extracting NPs and VPs. We focused on NPs and VPs in accordance with the emphasis NLP places on these two linguistic components. Table 7 lists the 50 Fortune 500 companies in our coding dictionary sample and summarizes, for each company, the number of NPs and VPs extracted and the unique word count from the letter to shareholders. In total, the OpenNLP program extracted 14,741 NPs and 5,883 VPs from the 50 letters to shareholders.
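The chunking step can be illustrated with a minimal sketch. This is not the OpenNLP chunker used in the study: it assumes tokens have already been part-of-speech tagged with Penn Treebank tags, and it treats a noun phrase as a maximal run of determiner, possessive, adjective, number, and noun tags that contains at least one noun, and a verb phrase as a maximal run of modal, adverb, and verb tags that contains at least one verb. The tag sets and the function name `chunk` are illustrative assumptions.

```python
from itertools import groupby

# Illustrative tag sets (assumption): Penn Treebank tags allowed inside each chunk type.
NP_TAGS = {"DT", "PRP$", "JJ", "JJR", "JJS", "CD", "POS", "NN", "NNS", "NNP", "NNPS"}
VP_TAGS = {"MD", "RB", "VB", "VBD", "VBG", "VBN", "VBP", "VBZ"}


def _category(tag):
    """Map a POS tag to the chunk type it can belong to, or None."""
    if tag in NP_TAGS:
        return "NP"
    if tag in VP_TAGS:
        return "VP"
    return None


def chunk(tagged):
    """Return (noun_phrases, verb_phrases) from a list of (token, tag) pairs."""
    nps, vps = [], []
    # Group consecutive tokens that share a chunk category.
    for cat, group in groupby(tagged, key=lambda pair: _category(pair[1])):
        group = list(group)
        tags = [tag for _, tag in group]
        text = " ".join(token for token, _ in group)
        # Keep NP runs only if they contain a noun, VP runs only if they contain a verb.
        if cat == "NP" and any(t.startswith("NN") for t in tags):
            nps.append(text)
        elif cat == "VP" and any(t.startswith("VB") for t in tags):
            vps.append(text)
    return nps, vps
```

For example, the tagged sentence “Our customers have steadily increased their loyalty.” yields the NPs “Our customers” and “their loyalty” and the VP “have steadily increased”. A real chunker uses a trained model rather than fixed tag runs, so adjacent noun phrases that this sketch would merge are usually separated correctly.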

Steps for extracting noun phrases and verb phrases using OpenNLP within the GATE architecture environment.
50 Randomly Selected Companies From the 2012 Fortune 500 List and the Count of Noun Phrases, Verb Phrases, and Unique Words Extracted From the 2012 Letter to Shareholders.
We reviewed the lists of NPs and VPs and made two changes: we replaced proper nouns and numbers with placeholders. For example, “LabCorp’s customers” and “Apache’s customers” were both edited to “<>’s customers,” and “$1.5 million” was edited to “# million.” After these two edits to anonymize the NPs and VPs, we removed duplicates from the lists, which reduced them to 8,185 unique NPs and 2,783 unique VPs.
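The two anonymization edits and the de-duplication step can be sketched as follows. The list of proper nouns is assumed to be supplied by the researcher (the text does not say how proper nouns were identified), and the number pattern is an illustrative assumption that maps amounts such as “$1.5” to “#”.

```python
import re

# Illustrative pattern (assumption): optional dollar sign, digits, optional decimals.
NUMBER = re.compile(r"\$?\d[\d,]*(?:\.\d+)?")


def anonymize(phrase, proper_nouns):
    """Replace known proper nouns with '<>' and numbers with '#'."""
    for name in proper_nouns:
        phrase = re.sub(r"\b" + re.escape(name) + r"\b", "<>", phrase)
    return NUMBER.sub("#", phrase)


def unique_phrases(phrases, proper_nouns):
    """Anonymize every phrase, then drop duplicates while keeping first-seen order."""
    seen, result = set(), []
    for phrase in phrases:
        edited = anonymize(phrase, proper_nouns)
        if edited not in seen:
            seen.add(edited)
            result.append(edited)
    return result
```

With the proper nouns `["LabCorp", "Apache"]`, the inputs “LabCorp’s customers”, “Apache’s customers”, and “$1.5 million” collapse to the two unique phrases “<>’s customers” and “# million”.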
Coding Data Dictionary Items
Two coders then coded the combined data dictionary items for the organizational culture dimensions and their indicators (as in Table 6). Both coders had more than 5 years of practitioner experience and graduate degrees in management. The coders were provided copies of Table 6 and the DOCS, OCAI, and OCP measurement instruments. After the first round of coding, intercoder agreement for the six dimensions, measured with Holsti’s (1969) coefficient, ranged from 50% to 65%. We convened a meeting with the two coders in which we discussed the reliability findings, revisited the dimensions in Table 6, and discussed examples of items on which the coders agreed and disagreed. The two coders then independently completed two additional rounds of coding, separated by another meeting. Intercoder agreement improved to over 90% for five dimensions and to 84% for one dimension. At this stage, we decided to proceed with construct measurement and validation using only the phrases on which the coders agreed. A partial phrase list for the six dimensions is presented in Table 8. To recapitulate, we compiled a coding dictionary for the six dimensions of organizational culture in two steps: (a) a deductive approach and (b) an inductive approach in which we employed NLP capabilities to generate phrase lists from a corpus of documents.
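Holsti’s (1969) coefficient for two coders is commonly computed as 2M / (N1 + N2), where M is the number of coding decisions on which the coders agree and N1 and N2 are the numbers of decisions made by each coder. The sketch below computes it overall and, as one possible per-dimension variant (an assumption; the text does not specify how the per-dimension figures were obtained), restricted to the items each coder assigned to a given dimension.

```python
from collections import Counter


def holsti(coder_a, coder_b):
    """Holsti's coefficient of reliability, 2M / (N1 + N2), for two coders."""
    m = sum(a == b for a, b in zip(coder_a, coder_b))
    return 2 * m / (len(coder_a) + len(coder_b))


def holsti_by_dimension(coder_a, coder_b):
    """Per-dimension variant (assumption): 2 * M_d / (N1_d + N2_d)."""
    n_a, n_b = Counter(coder_a), Counter(coder_b)
    # M_d counts items both coders placed in dimension d.
    m = Counter(a for a, b in zip(coder_a, coder_b) if a == b)
    return {d: 2 * m[d] / (n_a[d] + n_b[d]) for d in set(n_a) | set(n_b)}
```

When both coders code the same list of items, the overall coefficient reduces to simple percentage agreement, which is why Holsti’s measure is often criticized for not correcting for chance agreement.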
Sample of Organizational Culture Coding Dictionary Items for the Six Culture Dimensions.
Validity Tests
According to Nunnally (1978, p. 86), “After a model has been chosen for the construction of a measuring instrument and the instrument has been constructed, it is necessary to inquire whether the instrument is useful scientifically.” The usefulness of a measurement instrument is referred to as its validity. Holsti (1969, p. 142) emphasized usefulness as well and defined validity “as the extent to which an instrument is measuring what it is intended to measure.” Scholars have expressed concerns that rigorous validity tests are missing in studies that use content analysis techniques (e.g., Neuendorf, 2002; Potter & Levine-Donnerstein, 1999; Short et al., 2010; Weber, 1990). In response to construct validity concerns in studies that use computerized content analysis, Short et al. (2010) outline and illustrate construct validity tests that can be adopted and replicated by researchers relying on computerized content analysis for construct measurement.
The validity tests described below are modeled after Short et al.’s (2010) recommendations. Although we conducted similar validity tests, we selected sampling frames relevant to our study context. Instead of the S&P 500 and Russell 2000 Index firms used by Short and coauthors, we reviewed published academic research and commercial samples to identify the sampling frames best suited for our validity tests. We determined that the annual “Admired List” published by Fortune magazine in partnership with the Hay Group was well suited for our validity tests. 2 The annual “Admired List” of companies published by Fortune magazine has been used in several management studies (e.g., Bear, Rahman, & Post, 2010; Cho & Pucik, 2005; Fombrun & Shanley, 1990). We used three samples from the Fortune 2012 admired lists for the validity tests: the list of 50 Most Admired Companies, the list of 60 champions (the top-ranked company in each industry), and the list of 58 contenders (companies that competed for most admired status but were ranked lowest in their industry). We also generated a larger sample of 300 companies from the 2014 Fortune 500 list.
For the samples derived from the 2012 admired lists, we compiled the annual reports and letters to shareholders; for the sample derived from the 2014 Fortune list, we compiled the 2014 annual reports. Commonly used content analysis programs do not support generating construct scores based on phrase occurrences. We therefore wrote code in two programming languages, Python and Visual Basic for Applications, to extract organizational culture scores. 3 The rationale for writing code in two different languages was to verify the reliability of the generated output. We also verified that our custom code used the same normalization procedure employed by DICTION: the number of occurrences in the text normalized per 500 words to standardize scores for document length. In the paragraphs below, we discuss the tests, samples, and results used to establish content validity, external validity, dimensionality, and predictive validity.
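A minimal sketch of phrase-based scoring with per-500-word normalization, assuming case-insensitive matching at word boundaries and a simple regular-expression word count. This is not the study’s own Python or VBA code, and DICTION’s exact tokenization may differ; the function name and the example dictionary are hypothetical.

```python
import re

# Simple word pattern (assumption): letters, digits, and apostrophes.
WORD = re.compile(r"[A-Za-z0-9'’]+")


def culture_scores(text, dictionary, per=500):
    """Score each dimension as phrase occurrences per `per` words of text."""
    n_words = len(WORD.findall(text))
    scores = {}
    for dimension, phrases in dictionary.items():
        hits = sum(
            len(re.findall(r"\b" + re.escape(p) + r"\b", text, flags=re.IGNORECASE))
            for p in phrases
        )
        # Normalize for document length; an empty text scores zero.
        scores[dimension] = hits * per / n_words if n_words else 0.0
    return scores
```

For the 10-word text “We put customers first. Our customers trust our team spirit.” and a hypothetical dictionary with “customers first” and “customers trust” under a customer dimension and “team spirit” under a team dimension, the scores are 100.0 and 50.0. Because each phrase is counted separately, overlapping phrases would both be counted; how to handle such overlaps is a design choice this sketch leaves open.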
Content Validity
Content validity is the “adequacy with which a specified domain of content is sampled” (Nunnally, 1978). According to Holsti (1969, p. 143), this type of validity is the one most frequently used by content analysts, and it depends on the informed judgment of the researcher(s). Nunnally (1978, p. 92) urges careful planning of construct measurement procedures, arguing that rather than being tested after the measurement procedure is constructed, content validity can be “ensured by the plan and procedures of construction.” Short et al. (2010) make two recommendations to ensure that researchers make informed judgments and maintain content validity: (a) use deductive procedures based on a priori theory to create coding schemes and (b) solicit reviews by content experts to judge whether the coding dictionary contents adequately represent the construct and its dimensions. We followed both recommendations in developing the organizational culture coding dictionary. We employed deductive procedures based on a priori theory to establish the construct definition, the dimensions, and the initial contents of the coding dictionary. After the coding dictionary was compiled using phrase-level coding, we solicited expert input and modified the coding dictionary accordingly. Our five experts included distinguished social scientists who have published research on organizational culture in leading journals and practitioners who have served as senior executives of, or strategy consultants to, large private sector organizations. Two experts held MBA degrees and three held doctoral degrees. Their average work experience was 23.4 years, and they had held managerial positions for an average of 10.4 years; one expert was the retired CEO of a multinational corporation who had held managerial positions for 30 years.
Our experts had, on average, 7.75 years of consulting experience and 18 years of teaching experience. Each expert was provided an Excel list of 648 data dictionary items and asked to assign each item to one of seven options: none, or one of the six organizational culture dimensions. For each dictionary item, we determined how many experts agreed with our classification from the previous step. The percentage agreement for each organizational culture dimension is reported in Table 9. We took action when only one expert or no expert agreed with our classification. For these items, if we observed a consensus (more than two experts assigning the item to the same category), we reassigned the item; if there was no clear consensus, we deleted the item.
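The reassignment rule described above can be expressed as a small decision function. The five-expert panel and the threshold of more than two experts come from the text; the function name and return values are illustrative, and treating a consensus on “none” as a reassignment to “none” (effectively removing the item) is an assumption.

```python
from collections import Counter


def review_item(our_dimension, expert_votes, consensus=3):
    """Apply the expert-panel rule to one dictionary item.

    Keep the item if at least two experts agree with our classification;
    otherwise reassign it if `consensus` or more experts chose the same
    alternative category, and delete it if there is no such consensus.
    """
    agree = sum(vote == our_dimension for vote in expert_votes)
    if agree > 1:
        return ("keep", our_dimension)
    others = Counter(vote for vote in expert_votes if vote != our_dimension)
    if others:
        category, count = others.most_common(1)[0]
        if count >= consensus:  # "more than 2 experts" with a five-member panel
            return ("reassign", category)  # a "none" consensus drops the item (assumption)
    return ("delete", None)
```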
Expert Panel Evaluations.
External Validity
According to Campbell and Stanley (1963), external validity addresses questions of generalizability: “To what populations, settings, treatment variables, and measurement variables can this effect be generalized?” (p. 175). Short et al. (2010) recommend three steps for establishing external validity in studies that employ computerized content analysis for construct measurement: (a) select appropriate samples and relevant narrative texts to examine the construct of interest, (b) use more than one sample to evaluate the study’s predicted relationships, and (c) compare two relevant samples when possible. To address questions of generalizability, we tested the organizational culture measure outlined in this study on two subsamples from the 2012 admired lists: the champions (top-ranked) and the contenders (lowest-ranked) in each industry. We generated culture scores from two annual organizational documents, the letter to shareholders and the annual report. Although both documents have been used for content analysis by management scholars (e.g., Bowman, 1984; D’Aveni & MacMillan, 1990; Geppert & Lawrence, 2008; Short et al., 2010), they vary in length, in the details they include, and in the regularity with which they are released. We noticed inconsistency in length, content, and detail among the 2012 letters to shareholders; the annual reports, in comparison, were more comprehensive and consistent. Our coding dictionary was derived in part from the letters to shareholders of 50 randomly selected Fortune 500 companies; we believe that the comprehensive letters in that sample compensated for the brief ones, allowing us to generate a comprehensive coding dictionary.
We generated organizational culture scores from the letters to shareholders and the annual reports for all companies in the two study samples. We then conducted independent-samples t tests comparing the champions and contenders samples, separately for the scores from each type of document. The results of the independent-samples t tests (t statistics and p values) are in Table 10. Table 10 also shows statistical significance after correcting for multiple comparisons using the Benjamini-Hochberg (1995) procedure, with the false discovery rate controlled at .05. When we compared the champions (top-ranked) and contenders (lowest-ranked) samples, we did not observe differences in the scores generated from the letters to shareholders. However, the t tests using scores generated from the annual reports show that champion firms on average score higher on the organizational culture dimensions than contender firms, and the difference is statistically significant for three dimensions: competitiveness, customer-oriented, and team-oriented. This finding is as expected because champion firms are industry leaders with strong reputations for their financial performance and organizational strengths (e.g., Cho & Pucik, 2005; Fombrun & Shanley, 1990). These companies are reputed to have strong organizational cultures and practices. Thus, our tests do not support external validity for scores generated from letters to shareholders; for scores generated from annual reports, however, there is reasonable support for the generalizability of our measurement approach.
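The sketch below shows a pooled-variance (Student’s) t statistic and the Benjamini-Hochberg step-up procedure using only the standard library. Which t test variant the study used is not stated, so the pooled-variance form is an assumption; in practice, p values could be obtained with `scipy.stats.ttest_ind`. The BH function finds the largest rank k with p(k) ≤ (k/m)q and flags every hypothesis ranked at or below k, which controls the false discovery rate rather than the familywise error rate.

```python
import math
import statistics


def pooled_t(a, b):
    """Independent-samples t statistic assuming equal variances (assumption)."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    pooled_var = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (statistics.mean(a) - statistics.mean(b)) / se


def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans: True where the hypothesis is rejected."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest rank whose p value falls under its step-up threshold.
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * q:
            k_max = rank
    reject = [False] * m
    for rank, i in enumerate(order, start=1):
        reject[i] = rank <= k_max
    return reject
```

With p values of .01, .04, .03, and .20 and q = .05, only the first hypothesis is rejected, because .03 exceeds its threshold of .025 and no larger rank qualifies.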
Independent Samples t test Comparisons of Companies Ranked as Industry Champions and Industry Contenders.
Note: The results of this table are based on computer-aided text analysis using the data dictionary for organizational culture dimensions presented in Table 5.
a Significance was determined by correcting for multiple comparisons using the Benjamini-Hochberg (1995) procedure, with the false discovery rate controlled at .05.
Dimensionality
Short et al. (2010, p. 337) recommend inspecting the correlation matrix of the scores generated by computer-aided text analysis for “significant but not perfect correlations between dimensions of a multidimensional construct.” To test dimensionality, we used a sample of 300 companies from the 2014 Fortune 500 list and compiled organizational culture scores from each company’s 2014 annual report. The correlation matrix for the culture scores, firm profitability measures, and firm market value is reported in Table 11. The correlations among the six culture dimensions are statistically significant with two exceptions (between control- and coordination-oriented and customer-oriented, and between customer-oriented and team-oriented). Two correlations are greater than .60, and the median shared variance between two organizational culture dimensions is 13.1%. To further examine the evidence on dimensionality, we took two additional steps.
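The median shared variance can be computed as the median of the squared pairwise correlations among the dimension scores. A minimal standard-library sketch; the function names are illustrative, and the input is assumed to map each dimension name to a list of firm-level scores in the same firm order.

```python
import statistics
from itertools import combinations


def pearson(x, y):
    """Pearson correlation between two equal-length score lists."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def median_shared_variance(scores):
    """Median r-squared over all pairs of dimensions in `scores`."""
    r2 = [pearson(scores[a], scores[b]) ** 2 for a, b in combinations(scores, 2)]
    return statistics.median(r2)
```

With six dimensions there are 15 pairs, so the reported 13.1% corresponds to the eighth-largest squared correlation.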
Bivariate Correlations of Organizational Culture Dimensions and Firm Performance Measures for a Sample of 300 Companies From the 2014 Fortune 500 List.
Note: Holsti’s (1969) inter-coder agreement proportions are in italics along the diagonal. The results of this table were based on computer-aided text analysis using the data dictionary for organizational culture presented in Table 5.
*p < .05. **p < .01. ***p < .001.
a Semipartial correlations controlling for firm size.
First, we reviewed published studies for reported correlations between culture dimensions. Several studies used two or more culture dimensions but did not report bivariate correlations between them (e.g., Deshpande, Farley, & Webster, 1993; Özçelik & Taymaz, 2004). Among the studies that do report bivariate correlations, Cho and Pucik (2005) studied innovation and quality and reported a correlation of .88, and Gordon and DiTomaso (1992) focused on stability and adaptability and reported a correlation of .75. Denison and Mishra (1995), on the other hand, examined involvement, consistency, adaptability, and mission and reported correlations ranging from .22 to .65. These previously reported correlations led us to expect that bivariate correlations between organizational culture dimensions can range from low to high.
Next, we examined a first-order CFA model (see Figure 2) that tested the hypothesis that organizational culture is a multidimensional construct composed of six factors: competitiveness, control-oriented, customer-oriented, HR-oriented, innovativeness, and team-oriented. To conduct this test, we randomly assigned the phrases of each culture dimension to three parcels. For example, we created three indicators of competitive culture by assigning one third of the “competitive culture” phrases to Parcel 1, another third to Parcel 2, and the final third to Parcel 3. Thus, for the six culture dimensions, we had 18 indicators (three per dimension). In Figure 2, each indicator has a nonzero loading on the organizational culture factor it was designed to measure and a zero loading on all other factors, and the six organizational culture factors (dimensions) are allowed to correlate.
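Random parceling can be sketched as a seeded shuffle followed by a round-robin split into three parcels. The seed, the function names, and the choice to score each parcel as the sum of its phrases’ normalized frequencies are assumptions; the text does not state how parcel scores were aggregated.

```python
import random


def make_parcels(phrases, n_parcels=3, seed=0):
    """Randomly split one dimension's phrases into near-equal parcels."""
    shuffled = list(phrases)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    return [shuffled[i::n_parcels] for i in range(n_parcels)]


def parcel_score(phrase_rates, parcel):
    """Indicator score for one firm (assumption): sum of per-500-word phrase rates."""
    return sum(phrase_rates.get(phrase, 0.0) for phrase in parcel)
```

Fixing the seed makes the parcel assignment reproducible; rerunning the CFA with several seeds is one way to check that the results do not hinge on a particular random split.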

First-order CFA model to test the hypothesis that organizational culture is a multidimensional construct.
We implemented the CFA in AMOS. Before estimating the CFA model, we tested for multivariate and univariate normality. A critical ratio of 325.411 for the normalized estimate of multivariate kurtosis indicates nonnormality in the sample (Byrne, 2010). We therefore used both maximum likelihood (ML) and asymptotically distribution-free (ADF) estimators to assess and compare model fit (Olsson, Foss, Troye, & Howell, 2000). ADF makes no assumptions about the distribution of the data and estimates the degree of skewness and kurtosis in the raw data (Harrington, 2009; Kline, 2005). One concern with ADF is the required sample size: simpler models require between 200 and 500 cases, and complex models need larger samples (Kline, 2005). The fit statistics for the hypothesized model using the ML estimator are χ2(120, n = 300) = 571.376, CFI = .820, NNFI (TLI) = .770, SRMR = .077, RMSEA = .112. The fit statistics using the ADF estimator are χ2(120, n = 300) = 341.873, CFI = .711, NNFI (TLI) = .632, SRMR = .151, RMSEA = .079. The parameter estimates for the model are shown in Figure 2. As expected, correlations between some latent factors are high (e.g., between competitiveness and innovativeness), and factor loadings for some indicators are low by traditional CFA standards. Although AMOS generated the model fit statistics and parameter estimates, it warned that the solution was inadmissible. We examined the modification indices to diagnose sources of misspecification in the CFA model; however, the suggested changes did not make substantive sense.
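RMSEA can be recomputed from the reported χ2, degrees of freedom, and sample size with the common point-estimate formula sqrt(max(χ2 − df, 0) / (df(n − 1))). Some programs use n rather than n − 1, which makes little difference at n = 300. For the first-order model, the ML value χ2(120) = 571.376 gives about .112 and the ADF value χ2(120) = 341.873 gives about .079, matching the figures above.

```python
import math


def rmsea(chi2, df, n):
    """Point estimate of RMSEA from a model chi-square, its df, and sample size."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))
```

Applied to the second-order model’s χ2 values (655.410 and 365.331, each with 129 degrees of freedom), the same formula gives about .117 and .078, respectively.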
Given the medium-to-high correlations between the latent variables, we tested a second-order CFA model (Byrne, 2010), shown in Figure 3. The model tests the hypotheses that (a) the indicators are explained by six first-order factors (competitiveness, control-oriented, customer-oriented, HR-oriented, innovativeness, and team-oriented); (b) each indicator has a nonzero loading on the first-order factor it is intended to measure and a zero loading on the remaining five factors; and (c) covariation among the six first-order factors is explained by their regression on the second-order factor (organizational culture). The fit statistics for the hypothesized model using the ML estimator are χ2(129, n = 300) = 655.410, CFI = .790, NNFI (TLI) = .751, SRMR = .087, RMSEA = .117. The fit statistics using the ADF estimator are χ2(129, n = 300) = 365.331, CFI = .692, NNFI (TLI) = .635, SRMR = .131, RMSEA = .078. The parameter estimates for the model are shown in Figure 3. Unlike for the model in Figure 2, AMOS did not issue an inadmissibility warning for the model in Figure 3. Taken together, the correlation matrix, the comparable correlations reported in published research, and the factor analysis results lend support to the dimensionality of the measure.

Second-order CFA model to test the hypothesis that organizational culture is a multidimensional construct.
Predictive (Nomological) Validity
Predictive validity involves using “an instrument to estimate some important form of behavior that is external to the measuring instrument itself” (Nunnally, 1978, p. 87). In other words, it entails determining the extent to which the proposed measures can predict theoretically related constructs (Cronbach & Meehl, 1955). We found precedents for examining the relationship between one or more dimensions of organizational culture and firm performance (profitability and market value; e.g., Artz, Norman, Hatfield, & Cardinal, 2010; Cho & Pucik, 2005; Simpson, Siguaw, & Enz, 2006). We therefore ran regressions to test the extent to which the organizational culture scores generated for our validation sample (300 companies from the 2014 Fortune 500 list) could predict profitability and market value after controlling for organizational size (number of employees). We measured profitability as the three-year averages of return on assets (ROA), return on equity (ROE), and return on investment (ROI) for 2013, 2014, and 2015. 4 We measured market value as the three-year average of Tobin’s Q for the same years. We first used OLS regression to test for predictive validity. Because regression diagnostics for the OLS models indicated the presence of outliers, we also ran robust regressions (an MM-type estimator with a bisquare psi function) for the same models (Aguinis, Gottfredson, & Joo, 2013). The results of the OLS and robust regression analyses are reported in Table 12. As the results show, robust regression reduced the overall model standard error and the standard error of each predictor in all models. We focus on the robust regression results in the following paragraphs.
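The following is a simplified sketch of bisquare-weighted robust regression via iteratively reweighted least squares (IRLS) for a single predictor. It is not a full MM estimator: an MM estimator starts from a high-breakdown S-estimate and holds that residual scale fixed, whereas this sketch starts from OLS and re-estimates the scale from the median absolute residual at each iteration. The tuning constant 4.685 is the conventional bisquare value (about 95% efficiency under normal errors); the function names are illustrative.

```python
import statistics


def weighted_line(x, y, w):
    """Weighted least-squares intercept and slope for one predictor."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return ybar - slope * xbar, slope


def bisquare(u, c=4.685):
    """Tukey bisquare weight for a scaled residual u."""
    return (1 - (u / c) ** 2) ** 2 if abs(u) < c else 0.0


def robust_line(x, y, max_iter=100, tol=1e-10):
    """Bisquare M-estimate by IRLS, starting from OLS (simplification of MM)."""
    intercept, slope = weighted_line(x, y, [1.0] * len(x))  # OLS start
    for _ in range(max_iter):
        resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
        # MAD-type scale: median absolute residual rescaled for normal errors.
        scale = statistics.median([abs(r) for r in resid]) / 0.6745
        if scale == 0:  # at least half the points fit exactly
            break
        weights = [bisquare(r / scale) for r in resid]
        new_intercept, new_slope = weighted_line(x, y, weights)
        converged = abs(new_slope - slope) < tol and abs(new_intercept - intercept) < tol
        intercept, slope = new_intercept, new_slope
        if converged:
            break
    return intercept, slope
```

On ten points that lie near y = 2x + 1 except for one gross outlier, OLS gives a slope above 5, while the bisquare fit recovers a slope close to 2 because the outlier’s weight falls to zero. Starting from OLS can leave the estimate near a poor solution when outliers have high leverage, which is the motivation for the S-estimate starting point in MM estimation.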
OLS and Robust Regression Analyses for Organizational Culture Measures Predicting Profitability and Market Value.
Note: Standardized regression coefficients are reported in the table. Organizational culture dimensions were measured using computer-aided text analysis and the data dictionary for organizational culture dimensions presented in Table 5.
*p < .05. **p < .01. ***p < .001.
a Change in R² between Model 2, with organizational culture dimensions and firm size as regressors, and Model 1, with firm size as the regressor.
The multiple R² was higher for the profitability measures (ROE = .075, ROA = .143, ROI = .133) than for market value (Tobin’s Q = .053). This finding is consistent with studies reporting that culture measures are better predictors of short-term firm performance (e.g., Denison & Mishra, 1995; Kim Jean Lee & Yu, 2004). The results of the four robust regression models suggest that control- and coordination-orientedness has a positive relationship with ROE, ROA, ROI, and market value, although this relationship is not statistically significant. Innovation- and learning-orientedness has a positive relationship with ROE, ROA, ROI, and market value, and this relationship is statistically significant for three of the four dependent variables. For the remaining four organizational culture dimensions, the regression results suggest a negative relationship with the four dependent variables. Because the zero-order correlations are positive, these negative coefficients are likely due to a suppressor effect arising from the strong correlations among the culture dimensions. Overall, these findings suggest that some culture dimensions (e.g., innovation) have a stronger relationship with firm performance than others. In their study of Singaporean firms, Kim Jean Lee and Yu (2004, p. 348) report similar findings: in their sample, innovativeness and team-orientedness differed significantly across industries.
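A two-predictor example with hypothetical correlations shows how a predictor can have a positive zero-order correlation with the outcome but a negative standardized coefficient when it is highly correlated with another predictor. The standardized coefficients follow from the correlations as β1 = (ry1 − ry2 r12) / (1 − r12²) and β2 = (ry2 − ry1 r12) / (1 − r12²); the function name is illustrative.

```python
def standardized_betas(r_y1, r_y2, r_12):
    """Standardized OLS coefficients for two predictors from their correlations."""
    denom = 1 - r_12 ** 2  # shrinks toward zero as the predictors become collinear
    beta_1 = (r_y1 - r_y2 * r_12) / denom
    beta_2 = (r_y2 - r_y1 * r_12) / denom
    return beta_1, beta_2
```

With ry1 = .30, ry2 = .20, and r12 = .80, the second predictor’s coefficient is about −.11 despite its positive zero-order correlation of .20, while the first predictor’s coefficient rises to about .39.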
Our finding that innovation has a positive relationship with profitability and market value is consistent with empirical results reported in management studies (Artz et al., 2010; Cho & Pucik, 2005; Simpson et al., 2006). As we reviewed prior studies to assess how our findings compared with previously reported results, we found that studies adopt one of two approaches: they either examine one or two specific industries or control for industry-level differences by dummy coding industries. The 300 firms in our sample for testing predictive validity belonged to 137 different SIC codes; therefore, we considered alternative options for controlling for industry effects. Strategy research has emphasized the competitiveness of technology-intensive industries, so we divided our sample into two groups, technology-intensive and non-technology-intensive companies, using the Bureau of Labor Statistics list of 31 SIC codes (Office of Technology Policy, 2001, p. 1-5). We reran the robust regression analyses after controlling for industry type (high-tech vs. low-tech); the results are in Table 13.
Robust Regression Analyses for Organizational Culture Measures Predicting Profitability and Market Value After Controlling for Industry Effect.
Note: Standardized regression coefficients are reported in the table. Organizational culture dimensions were measured using computer-aided text analysis and the data dictionary for organizational culture dimensions presented in Table 5.
a Change in R² between Model 2, with organizational culture dimensions, high-tech status, and firm size as regressors, and Model 1, with firm size as the regressor.
*p < .05. **p < .01. ***p < .001.
The results in Table 13 suggest that high-tech industries have higher profitability than low-tech industries but lower market value. This finding is consistent with Lindenberg and Ross’s (1981, p. 29) finding that firms in competitive industries have lower q ratios. Compared with the robust regression results in Table 12, the coefficient estimates for the culture dimensions in Table 13 are mostly similar.
We also tested the interaction effects of high-tech sector status with the six organizational culture dimensions; the results are in Table 14. To address the multicollinearity concerns that arise when testing main and interaction effects, we mean-centered the continuous variables. However, as the VIF values indicate, mean-centering did not resolve the multicollinearity. The results in Table 14 provide weak support for interaction effects; given the high VIF values, however, we cannot draw credible inferences from these results.
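Variance inflation factors can be read off the diagonal of the inverse of the predictors’ correlation matrix. Because correlations do not change when a variable is shifted by a constant, mean-centering can reduce the correlation between a main effect and its product term but cannot reduce collinearity among the culture dimensions themselves. The sketch below uses a hand-written Gauss-Jordan inversion so that it needs only the standard library; the function names are illustrative.

```python
def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment with the identity matrix.
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]  # zero here means a singular (perfectly collinear) matrix
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vifs(corr):
    """VIF of each predictor: diagonal of the inverse correlation matrix."""
    inv = invert(corr)
    return [inv[i][i] for i in range(len(corr))]
```

For two predictors correlated at .80, each VIF is 1 / (1 − .64), or about 2.78; uncorrelated predictors have VIFs of exactly 1.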
Robust Regression Analyses for Organizational Culture Measures Predicting Profitability and Market Value—Testing for Interaction Effects of High-Tech Sector Status.
Note: Standardized regression coefficients are reported in the table. Organizational culture dimensions were measured using computer-aided text analysis and the data dictionary for organizational culture dimensions presented in Table 5.
*p < .05. **p < .01. ***p < .001.
As Kim Jean Lee and Yu (2004) note, only a handful of studies have examined the relationship between different culture dimensions and firm performance. Despite this limited body of evidence, some findings from our tests for predictive validity are similar to those reported in other studies.
Discussion
Contribution of This Study
Our study makes two important contributions. First, we introduce artificial intelligence-based NLP capabilities into computer-aided content analysis methodology. There is growing interest in computer-aided content analysis for measuring management concepts, and although significant advances have been made in recent studies (e.g., Illia et al., 2014; McKenny et al., 2013; Short et al., 2010), NLP capabilities can greatly enhance computer-aided content analysis. Second, we propose and develop a computer-aided content analysis-based measure of organizational culture, responding to calls for alternative measures of organizational culture (Van den Berg & Wilderom, 2004). We follow recommended guidelines for establishing construct validity and demonstrate that the measure of organizational culture dimensions outlined in the study has content validity, external validity, dimensionality, and predictive validity.
Benefits of This Approach, Future Research, and Limitations
The “organizational culture” construct has been used in numerous studies of various organizational phenomena, including alliances, joint ventures, mergers, strategic change, strategy, sustainability, innovation, leadership, and trust. However, large-sample empirical studies with this construct as a study variable are difficult because of a lack of adequate data sources and measurement options. To date, survey questionnaires have dominated large-sample studies of the organizational culture construct (e.g., Chatterjee, Lubatkin, Schweiger, & Weber, 1992; Denison & Neale, 2000). Interviews (e.g., Glaser, Zamanou, & Hacker, 1987; Lodorfos & Boateng, 2006) have been used in some studies as well. However, both surveys and interviews face two critical contingencies: (a) researchers’ ability to recruit participants in sufficient numbers and (b) participants’ perceptual screens, which shape their recall of details about prior events. These two contingencies constrain data collection and construct measurement, especially in large-sample studies of prior events. We therefore believe that alternative approaches, especially the systematic analysis of texts, offer important advantages over survey and interview methods for measuring the organizational culture construct.
In this article, we presented an alternative approach for measuring the organizational culture construct, in which we applied NLP capabilities to develop a data dictionary for organizational culture. We emphasized and demonstrated the use of phrases in data dictionaries for computerized content analysis. Although a shift from word-level to phrase-level analysis is in keeping with trends in computational linguistics, future studies can test whether this shift improves construct measures. In addition, computational linguistics researchers are exploring both supervised and unsupervised machine learning; future studies can also explore the application of machine learning techniques to computer-aided textual analysis.
Our study has several limitations. Our measurement approach does not specify the targets of the terms used in the corpus of text. Although phrases capture linguistic meaning better than individual words, discourse analysis, a prominent approach in organizational analysis, takes a different view and draws on context to uncover meaning. Alvesson and Kärreman (2000, p. 1126) contrasted two major approaches to discourse analysis: the “study of the social text” and the “study of social reality.” The latter approach, which employs a social constructivist epistemology, takes a clearly distinct view of the relationship between language and social reality: “It examines how language constructs phenomena, and not how it reflects and reveals it. Discourse analysis is thus distinguished by its commitment to a strong social constructivist view and in the way it tries to explore the relationships between text, discourse, and context” (Phillips & Hardy, 2002, p. 6). Embracing this view of discourse analysis suggests a need to go beyond the “content of texts” and also study “where texts emanate from, how they are used by organizational actors, and what connections are established among texts” (Phillips et al., 2004, p. 646).
Our reliance on externally facing documents makes approaches like ours vulnerable to impression management. Although we sought to establish convergent validity between the culture scores generated through computer-aided content analysis and an alternative measurement approach, we were unable to identify an appropriate index or score. Future studies should prioritize finding or generating alternative scores for tests of convergent validity. Finally, our measurement approach is more etic than emic. Using internal organizational documents instead of externally facing documents such as annual reports may be useful in developing an emic perspective.
Conclusion
Recent interest in computer-aided textual analysis for developing alternative measures of constructs such as entrepreneurial orientation (Short et al., 2010), psychological capital (McKenny et al., 2013), and impression management (Illia et al., 2014) is very encouraging. However, to leverage the immense potential of computer-aided textual analysis for construct measurement, it is essential for management researchers to also incorporate cutting-edge research from other disciplines that analyze text (e.g., computational linguistics). Sophisticated algorithms and capabilities for textual analysis exist (e.g., Byrd et al., 1995) and can be applied to develop coding dictionaries that include linguistic components such as noun phrases (NPs), verb phrases (VPs), adjectives, and adverbs. We expect that this study, in which we illustrate the benefit of phrase-level analysis and develop and validate a measure of organizational culture, will encourage further applications of NLP capabilities in management research.
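As a rough illustration of how such linguistic components might be extracted as candidate dictionary entries, the following Python sketch chunks adjective-noun sequences (a simplified stand-in for NPs) and adjacent adverb-verb pairs from a part-of-speech-tagged sentence. It does not reproduce the Byrd et al. (1995) tools; the sentence is hypothetical and its Penn Treebank tags are assigned by hand, whereas in practice a part-of-speech tagger would supply them.

```python
from typing import List, Tuple

# A token paired with its part-of-speech tag, e.g. ("ideas", "NNS").
TaggedToken = Tuple[str, str]

def chunk_noun_phrases(tagged: List[TaggedToken]) -> List[str]:
    """Return simplified NPs: zero or more adjectives (JJ*) followed by one or more nouns (NN*)."""
    phrases: List[str] = []
    current: List[str] = []
    has_noun = False
    for word, tag in tagged:
        if tag.startswith("NN"):
            current.append(word)
            has_noun = True
        elif tag.startswith("JJ") and not has_noun:
            current.append(word)
        else:
            # Any other tag (or an adjective after a noun) closes the current chunk;
            # adjectives not followed by a noun are discarded.
            if has_noun:
                phrases.append(" ".join(current))
            current = [word] if tag.startswith("JJ") else []
            has_noun = False
    if has_noun:
        phrases.append(" ".join(current))
    return phrases

def adverb_verb_pairs(tagged: List[TaggedToken]) -> List[str]:
    """Return adjacent adverb-verb pairs (RB* followed by VB*), e.g. 'openly share'."""
    return [
        f"{w1} {w2}"
        for (w1, t1), (w2, t2) in zip(tagged, tagged[1:])
        if t1.startswith("RB") and t2.startswith("VB")
    ]

# Hypothetical sentence with hand-assigned Penn Treebank tags (assumption: a real
# pipeline would obtain these from a POS tagger).
tagged_sentence: List[TaggedToken] = [
    ("We", "PRP"), ("openly", "RB"), ("share", "VBP"), ("new", "JJ"),
    ("ideas", "NNS"), ("and", "CC"), ("reward", "VBP"), ("innovative", "JJ"),
    ("team", "NN"), ("members", "NNS"), (".", "."),
]

print(chunk_noun_phrases(tagged_sentence))  # ['new ideas', 'innovative team members']
print(adverb_verb_pairs(tagged_sentence))   # ['openly share']
```

Candidate phrases extracted in this way would still need review, for example by content experts, before being included in a coding dictionary.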
Acknowledgments
We are grateful to Associate Editor Daniel Newman and three anonymous reviewers for their comments, which have greatly improved the manuscript. We are grateful to our content experts for their time and help. Finally, we want to thank Randy Davis and Jian Xie for crucial advice and support.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
