Abstract
Despite the proliferation of published work using the U.S. Office of Personnel Management's (OPM) Federal Employee Viewpoint Survey (FEVS) data, the scholarly community to date lacks a review of the practices and value associated with how scholars have used the survey data in their research. We turn a lens on the public administration research that has used the FEVS to this point. We assess the extent to which peer-reviewed studies follow the fundamental criteria of conducting empirical studies using survey data, from accepted guidelines and practices for preinferential evaluations of survey data to the reporting of baseline and advanced standards and practices of analytical methods for measurement and quantitative analysis. Our review provides an overarching appraisal of public management scholarship employing the FEVS, which can strengthen the partnership between OPM and public administration scholars as they jointly continue improving the survey instrument and pursue questions critical to effective governance.
Introduction
Fernandez, Resh, Moldogaziev, and Oberfield (2015) leveled an assessment of the utility and promise of the U.S. Office of Personnel Management's (OPM) Federal Employee Viewpoint Survey (FEVS), a collection of attitudinal and perceptual data from hundreds of thousands of federal employees, largely representative of the federal workforce, within and across agencies and programs. Their review found dozens of peer-reviewed journal articles, books, and other publications that examine a variety of issues central to public administration. Despite the proliferation of published work based on FEVS data over the past decade or so, the field of public administration had until then lacked a comprehensive assessment of the scholarly use, value, and practices emerging from these surveys. Fernandez et al. (2015) provided a thorough examination of the survey and its relative utility to research, and made recommendations that led to substantive changes in the survey (Personnel Management in Agencies, 2016); however, we still lack a systematic assessment of how public administration scholars have used the survey data in their research. The solitary exception is a study by Somers (2018), who has offered an assessment of the psychometric properties and qualities of the FEVS, albeit only in select studies that have built constructs using items from FEVS data.
In this article, we take stock of peer-reviewed publications and the research questions scholars have explored using FEVS data. We consider the practices of current survey data use, evaluate the practices related to methods and modeling, and identify an array of best practices and guidance for public administration scholars. By doing so, the article explores the prospects for improving the presentation and academic value in the use of FEVS data or, for that matter, the secondary use of any survey data.
We identify the best criteria for future use of the survey and for proper selection of reporting practices and analytical tools, from preinferential assessment of the survey attributes, selection of valid and reliable items and measures of constructs claimed from the FEVS data, establishment of more rigorous tests of causality, and data aggregation practices utilized in longitudinal studies (agency- and subagency-level analyses), to data mergers from other sources to mitigate potential common source or method biases. Such criteria will be of great value for current and future scholars who use survey data (FEVS and others), reviewers and readers evaluating scholarly submissions, as well as practitioners seeking to apply the empirical findings from academic research to public sector organizations. Therefore, our article turns the lens from FEVS survey quality concerns in Fernandez et al. (2015) to the quality of scholarly work that utilizes FEVS. At the same time, our article is broader in its scope than Somers's (2018), in that construct measurement quality, while extremely important, is only one of the many considerations in this study. The three articles, however, are complementary and we encourage both public administration scholars and practitioners to consider them concurrently when employing the FEVS data in future research and to consider best practices overall in relation to survey research in public administration.
Public Management Research Using FEVS: Criteria for Review
Scholars have used FEVS data to produce numerous publications that examine a variety of issues central to public management, including public sector leadership, organizational trust, performance management, employee empowerment, job satisfaction, turnover, autonomy, and innovativeness. We consider the range of uses, approaches and applications, and value of the scholarship employing these data for explanatory, empirical modeling, and analysis. To do so, we extend the bibliometric and psychometric analyses in Fernandez et al. (2015) and Somers (2018) of all published, peer-reviewed journal articles that use FEVS data as a key data source. We follow Fernandez et al. by using Harzing's Publish or Perish software (2012), with the key phrases "Federal Human Capital Survey" and "Federal Employee Viewpoint Survey."
The aggregate results for our search produced more than a thousand documents in which the key terms "Federal Human Capital Survey" or "Federal Employee Viewpoint Survey" were used in the text of a respective piece of scholarship from 2000 to 2016. We removed all documents except published, peer-reviewed journal articles, excluding all books, dissertations, non-peer-reviewed articles, and so on. This produced 56 documents, of which 48 use some segment of FEVS data for empirical modeling; the other eight used the data for descriptive purposes only. To assess research utilizing FEVS data, we then coded each of the 48 articles according to two sets of criteria: (a) preinferential analysis standards, referring to reporting practices of survey data use and suitability before the empirical analysis was performed, and (b) applications, reporting practices of methods, and modeling choices after data selection. For the latter, we distinguish practices that meet the needs of the proposed research questions (parsimony) from measurement constructs or analytical approaches that are, or would be, superior (complexity) and therefore necessary.
We stress, however, that for users of FEVS (and for users of any other survey data source), there are certain key preinferential criteria that must be considered by all researchers. At the same time, there are steps that must be tailored to each individual study, some of which can be more general (such as reporting of distributions or steps in data cleaning) and others more specific (selection of analytical tools, measurement approaches, or a wide spectrum of postestimation tests and solutions). While we attempt to summarize what these practices are, we do not want to give the impression, whatever the final count of practices one discovers, that each empirical study must execute all of them without question. Analytical tools must fit a researcher's need, but if and when selected, there are ways in which a scholar must ensure the quality of an empirical application and the robustness of proposed constructs and models. With this important qualification in mind, we present the criteria for assessing the selected articles next.
The first set of criteria regarding the preinferential exploration of survey data in each of the 48 articles is derived from the established core standards of best versus condemned scholarly practices that exist in survey research science. Because FEVS is a secondary data source, one that is already collected by the OPM, its users must adhere to certain reporting and technical assessment practices. There are established guidelines for survey research in the code of ethics of the American Association for Public Opinion Research (AAPOR, 2015) and the report on best practices for survey research by the National Science Foundation Advisory Committee for the Social, Behavioral and Economic Sciences (NSF SBE report by Krosnick, Presser, Fealing, & Ruggles, 2015), core books on survey research science and psychometrics (Fields, 2002; Kraut, 1996; Rea & Parker, 2014), as well as proposed peer review guidelines in a number of disciplines that utilize survey data (Bennett et al., 2011; Chambers & Licari, 2009; Dale, 2006; Fincham & Draugalis, 2013; Kelley, Clark, Brown, & Sitzia, 2003; Li et al., 2014). Based upon this rich and relatively uniform base of research principles and preferred practices, we identify the following set of preinference standards or practices for survey data quality, representativeness, and generalizability:
Does the study report and discuss the technical features of data collection in FEVS? What was the sampling approach in the survey and what biases can it introduce to the study? Are sampling weights introduced in the survey and how were they computed?
What is the nonresponse rate in the survey and were the potential nonresponse biases recognized in the study? What was the final sample size in the survey and what is the error rate attributable to nonresponse rate?
Does the study compare the mix of characteristics in FEVS with population features? What are the potential selectivity issues in the sample generation process? How were such biases mitigated or dealt with?
What is the level of data in the study? Was the microlevel of survey data preserved or did the study aggregate the cross sections to generate longitudinal data sets at subagency or agency levels of analyses? What are the potential information benefits or costs of such aggregation?
If subsamples of FEVS are used, does the study justify the decision? What are the potential information benefits or costs of such data separation?
We provide three tables that organize the 48 articles that comprise our analysis. Table 1 reports the coding results for the preinferential criteria. As we summarize our findings for each preinferential analysis standard, we explain our rationale for its inclusion in the next section "Preinferential Analysis Standards." We argue that, prior to using FEVS, all scholars must seek to convince the readers and convey to them that the survey is the right data set for their study.
Table 1. Preinferential Analysis Standards Usage With FEVS Data.
Note. FEVS = Federal Employee Viewpoint Survey; HLM = hierarchical linear modeling; IRS = Internal Revenue Service; OCC = Office of the Comptroller of the Currency; SES = Senior Executive Service.
The second set of criteria concerns applications, reporting practices of methods, and selection of modeling techniques that were employed after FEVS data were chosen. We explore the range of quantitative applications and methods of analyses; the extent to which researchers have established reliable and valid measures of various constructs found in the organizational behavior, human resources management, and industrial-organizational psychology literatures; the extent of rigor used in tests of causality and/or association; the ways in which FEVS survey data have been merged with data from other sources; and how scholars have used this survey to push forth the boundaries of public administration research. Given that researchers utilize FEVS for answering a variety of research questions, five baseline criteria are more generic and ensure parsimony of a given model, whereas five advanced criteria are specific to each article, if and when necessary, likely requiring more complex solutions.
To assess the relative use of such empirical practices, we coded each of the 48 articles according to baseline criteria that are parsimonious with the research questions asked. The baseline standards are as follows:
1. Is the survey data set appropriate for tackling the research question? Does the study justify why FEVS is more suitable relative to other sources and types of data? Does the study note the limitations of survey data use, if any?
2. What type of modeling does the study use for explanatory analysis? Is the statistical method appropriate for testing the proposed model?
3. Given the research question, does the study establish face and content validity for the measures of both the dependent variable(s) (DVs) and main independent variable(s) (IVs) of interest using items from FEVS (this applies to both single and multiple items from FEVS)?
4. Does the study report goodness-of-fit statistics for the main inferential model? Does the study offer pre- and postestimation robustness tests? With regard to model stability, does the study address issues of endogeneity?
5. Does the study account for common method or common source bias in the main inferential model? In what way? Are the limitations of using FEVS as a sole source discussed?
Advanced criteria, though increasing in complexity, are often applied for tackling more intricate questions, fitting multifaceted measurement objectives, or controlling for potential biases. Such advanced standards, if and when necessitated by research questions and subsequent methodological and statistical concerns, are as follows:
6. Does the study quantitatively test scales using FEVS data for reliability? Does the study discuss the benefits and limitations of using multidimensional scales?
7. Does the study establish convergent and discriminant validity of the main variables of interest that use items from FEVS?
8. Does the study test for mediating or moderating impacts through interactions among various constructs in the main inferential model?
9. Does the study utilize structural equation modeling (SEM) to take advantage of advanced SEM properties for survey data analysis?
10. Does the study replicate the analyses using holdout (part) samples or analyses with versus without certain cohorts of respondents in FEVS?
Table 2 reports the coding for the second set of 10 criteria. In the following section titled "Applications, Reporting Practices of Methods, and Modeling Choices After Data Selection," we summarize the findings for Criteria 3 to 7. Although the articles were coded on Criteria 1 and 2, we felt we could not undertake a systematic assessment of the suitability of FEVS data for tackling the research question or the appropriateness of the modeling approach for explanatory analysis in a post hoc fashion. In a recent assessment of FEVS, Fernandez et al. (2015) explain the rationale for creating the FEVS and the content of the survey. As they note, the FEVS was designed expressly to track efforts within the federal government to manage human capital, not for use by researchers to answer questions of interest primarily to the scholarly community. Hence, we suspect there are more than a few instances in which other data sources would have proved more suitable than FEVS for answering the relevant research question. The articles were also coded on Criteria 8 to 10, but space limitations preclude a detailed summary of the results from the coding.
Table 2. Public Administration Journal Articles, Coded According to the Use of FEVS Measures.
Note. FEVS = Federal Employee Viewpoint Survey; Face = face validity check; IV = independent variable; DV = dependent variable; Conv. = convergent validity check; Disc. = discriminant validity check.
As we summarize the findings for each of these criteria, for both baseline and advanced standards, we again explain our rationale for their inclusion in the analysis. A caveat is in order before we proceed with our exploration of studies using FEVS. While we take stock of scholarly practices along the standards we identified, it is conceivable that not all of them are explicitly present in the final published articles. We trust that, even if not directly addressed in print, the relevant and necessary questions were raised at academic and professional conferences and during the peer review process. The standards we identify are nonetheless important for the quality of research using FEVS data; they may have been addressed, implicitly or explicitly, by the authors or requested by discussants and reviewers, if and when necessary, and evaluated in the final versions of the articles or in the documents generated during peer feedback and review. We are mindful of the space limitations that journals must adhere to, which no doubt put pressure on the authors of the articles. However, these standards, some certainly study specific, must still be mentioned to ensure research quality, reliability, replicability, and scholarly knowledge generation.
Briefly, the empirical articles we analyze evaluate a fairly broad range of outcomes of interest or DVs. The two most commonly used DVs are job satisfaction and turnover intention, each appearing in eight articles. The former was typically measured using a single item for overall job satisfaction and the latter with a series of categorical indicators measuring whether or not an employee intended to leave his or her current job with the federal government, and if so, for another job with the federal government, another government entity, or the private sector. The prevalence of these two DVs in this literature is not surprising, insofar as they represent two sufficiently narrow and singular attitudes toward work that can be measured with a single survey item (see Judge & Church, 2000; Wanous, Reichers, & Hudy, 1997). Perceived performance was the third most commonly used DV, appearing in seven articles, and usually measured with a single item capturing the employee's perception of his or her work unit performance. An employee's level of trust in leaders/supervisors and individual motivation to innovate each appeared in two articles as DVs.
Preinferential Analysis Standards
Truman (1945) has outlined the uses and benefits of surveying public opinion and surveying administrators to achieve better results. According to him, survey data can be incorporated in studying, systematizing, and remodeling of organizations. This is important both for people in the government and outside the government workforce. Daneke and Klobus-Edwards (1979) and Robbins (1999) have identified the basic criteria of conducting quality survey research in public management and for public managers and proposed a basic set of "procedural guidelines" for reporting. A report to the National Science Foundation by Krosnick et al. (2015) highlighted the practices of using survey data in social sciences and the ways for improving transparency and dissemination of such data sets looking forward. What is more, there is an ongoing concerted push toward more transparency of published work, broadly in the social sciences and specifically in public management, all to enhance openness and transparency of research projects, materials, data sets, and analytical codes.
Technical Features of FEVS Data
The first preinference standard that we use in our evaluative criteria is whether a given scholar reports and discusses the technical features of data collection in FEVS. This should, at a minimum, include what sampling approach was used in the survey and the biases that can be introduced as a result of this approach as well as the relative sampling weights that are recommended by OPM (and whether those weights were then applied in the secondary analysis). Per Table 1, we find that about half of the articles (25/48) in our analysis fail to mention anything regarding the data collection or sampling and weighting procedures. About a third of the articles (17/48) mention that the sampling is agency stratified, whereas a minority of the articles (16%) discuss the sampling strategy and explicitly use weighted variables.
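To illustrate why the application of weights matters, the following is a minimal sketch of a design-weighted mean compared with an unweighted one. The scores and weights are invented for illustration and are not actual FEVS values or OPM weights; the function name is likewise hypothetical.

```python
def weighted_mean(values, weights):
    """Design-weighted mean: sum(w_i * y_i) / sum(w_i)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(w * y for y, w in zip(values, weights)) / total_weight


# Hypothetical 1-5 satisfaction scores for five respondents.
scores = [5, 4, 4, 2, 1]
# Hypothetical sampling weights: the last two respondents each stand in
# for many more employees than the first three do.
weights = [10, 10, 10, 50, 50]

unweighted = sum(scores) / len(scores)
weighted = weighted_mean(scores, weights)
print(f"unweighted mean = {unweighted:.2f}, weighted mean = {weighted:.2f}")
```

In this toy case the two estimates diverge because the respondents with large weights report lower scores, which is the kind of difference a reader cannot assess when a study is silent on weighting.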
Nonresponse Bias
Next, we assess whether the nonresponse rate in the survey and potential nonresponse biases are acknowledged. This particular criterion is largely ignored in the public management scholarship using FEVS. Indeed, only Choi and Rainey (2014) provide any discussion at all on nonresponse rates in FEVS. Although most report the sample size, and many report the response rate, we could not find any studies in our sample that meet the criterion of acknowledging potential nonresponse biases. Indeed, the 2017 FEVS wave indicated a drop of 9.4% in participation as job satisfaction stayed flat. Although speculation has attributed decreases in participation to drops in job satisfaction, such potential biases will not necessarily be revealed in the data themselves. If data missingness is not at random, then the mechanism that causes missingness must be accommodated in one's modeling approach to make correct inference. Only a careful assessment of nonrespondents may reveal potential reasons for such changes, and if not directly modeled, such potential biases due to nonresponse should be explicitly recognized.
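The logic of nonrandom missingness can be sketched with a small deterministic example. It assumes a hypothetical population of five employees and two invented sets of response propensities; none of the numbers come from FEVS.

```python
def expected_respondent_mean(values, response_probs):
    """Expected mean among respondents when unit i responds with probability p_i."""
    numerator = sum(p * y for y, p in zip(values, response_probs))
    denominator = sum(response_probs)
    return numerator / denominator


# Hypothetical population: one employee at each satisfaction level 1-5.
population = [1, 2, 3, 4, 5]
true_mean = sum(population) / len(population)

# Case 1: everyone is equally likely to respond (missing completely at random).
equal_propensity = [0.4] * 5
# Case 2: more satisfied employees are more likely to respond (not at random).
satisfaction_driven = [0.1, 0.2, 0.4, 0.6, 0.7]

mean_equal = expected_respondent_mean(population, equal_propensity)
mean_driven = expected_respondent_mean(population, satisfaction_driven)
print(true_mean, round(mean_equal, 2), round(mean_driven, 2))
```

Both scenarios have the same overall response rate (40%), yet only the second overstates satisfaction, so the response rate alone does not reveal the bias.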
Sample Bias
The third preinference standard we derived from the best practices scholarship addresses whether the study compares the FEVS sample characteristics with the larger population of federal employees. Again, only Choi and Rainey (2014) provide a comprehensive accounting of this basic tenet. At the same time, several other studies in our analysis acknowledge the limitations of inference from the sample to the larger population (e.g., Bertelli, 2006) or discuss selectivity bias as a potential risk (e.g., Fernandez, Cho, & Perry, 2010). By and large, however, we find most studies fail to discuss characteristics of the respondents at all and how they compare with the population (41 of 48 articles).
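A comparison of sample and population characteristics can be as simple as the sketch below, which reports the gap between sample and population shares and a chi-square goodness-of-fit statistic. The categories, counts, and population shares are hypothetical placeholders, not figures from FEVS or any OPM workforce file.

```python
def composition_gaps(sample_counts, population_shares):
    """Sample share minus population share for each group."""
    n = sum(sample_counts.values())
    return {g: sample_counts[g] / n - population_shares[g] for g in population_shares}


def chi_square_gof(sample_counts, population_shares):
    """Chi-square goodness-of-fit statistic: sum((observed - expected)^2 / expected)."""
    n = sum(sample_counts.values())
    return sum(
        (sample_counts[g] - n * population_shares[g]) ** 2 / (n * population_shares[g])
        for g in population_shares
    )


# Hypothetical respondent counts and hypothetical workforce shares.
sample_counts = {"supervisor": 300, "non_supervisor": 700}
population_shares = {"supervisor": 0.2, "non_supervisor": 0.8}

gaps = composition_gaps(sample_counts, population_shares)
chi2 = chi_square_gof(sample_counts, population_shares)
print(gaps, chi2)
```

Here supervisors are overrepresented by ten percentage points; reporting such a gap (and the weighting or modeling step taken in response) is what this criterion asks for.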
Level of Analysis
We also account for the number of articles that preserved the survey data at the microlevel in the analyses or whether researchers aggregated the cross sections to generate longitudinal data sets at subagency or agency levels of analyses. Either approach is acceptable depending on the relationships the authors are attempting to model. However, when data are aggregated, according to best practices literature, the potential information benefits or costs of aggregation should be acknowledged and discussed. Most of the studies in our analysis (39/48) leverage either microlevel data alone or both microlevel and aggregate data. Some of these use multilevel estimation approaches to model aggregate covariance on individual-level data. Of the 11 studies that employ aggregate data, only Oberfield (2014a) discusses the relative risks of data aggregation.
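One way to make the information cost of aggregation concrete is to ask how much of the individual-level variation survives once responses are collapsed to agency means. The sketch below computes that share as the between-group sum of squares over the total sum of squares; the agency names and responses are invented for illustration.

```python
def between_group_share(groups):
    """Share of the total sum of squares that lies between group means."""
    all_values = [v for values in groups.values() for v in values]
    grand_mean = sum(all_values) / len(all_values)
    ss_total = sum((v - grand_mean) ** 2 for v in all_values)
    ss_between = sum(
        len(values) * (sum(values) / len(values) - grand_mean) ** 2
        for values in groups.values()
    )
    return ss_between / ss_total


# Hypothetical individual responses nested in two hypothetical agencies.
agencies = {
    "agency_a": [1, 3, 5],
    "agency_b": [3, 5, 7],
}

share = between_group_share(agencies)
print(f"share of variance retained by agency means: {share:.2f}")
```

In this toy case roughly a quarter of the variation is between agencies, so an agency-level analysis would discard most of the individual-level information, which is the kind of trade-off authors are expected to discuss.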
Subsampling
Our last preinference criterion addresses whether subsamples of FEVS are used and whether a study justifies the decision of doing so. In other words, we are interested in whether researchers acknowledge the advantages and disadvantages of focusing on a subsample and whether generalizability is discussed as a result. We find most studies employ the entire sample in their analysis (31/48). For those that use a subsample, very few acknowledge the benefits or risks of doing so. Of the nine articles that select a subsample of FEVS, Fernandez and Moldogaziev (2013a, 2013b, 2015) acknowledge and justify their omission of a certain cohort of respondents from empirical analyses.
Applications, Reporting Practices of Methods, and Modeling Choices After Data Selection
An early set of basic "rules of thumb" for public administrators on how to report survey measures and regression analyses is identified in Daneke and Klobus-Edwards (1979). These proposed rules largely comport with our baseline standards. Recent survey research science guidelines also offer a variety of practices for approaching empirical studies using survey data, which encompass both best and condemned practices (AAPOR, 2015; Bennett et al., 2011; Kelley et al., 2003). One of the long-standing fundamental concerns has been whether a survey-generated data set is right for tackling a given research question. Therefore, we begin by evaluating the 48 articles for their justifications of why FEVS is a suitable data source and whether they note the limitations of survey data use, if any. To do so, a fundamental baseline assessment would be the extent to which a researcher has established face and content validity with the measures and data they employ.
Face and Content Validity
How many studies using FEVS data have established face and content validity for their measures of DVs and IVs? Face validity is the weakest and the simplest standard. As cited in Melaia, Abratt, & Bick (2008), Earl Babbie describes face validity as the degree to which measures "jibe with our common agreements and our individual mental images concerning a particular concept" (p. 237). Determination of face validity typically relies on expert judgment and establishing logical consistency of a measure with the relevant scholarship on the topic. Content validity, another standard that relies heavily on expert judgment, refers to the degree to which a measure reflects a specific domain of content or range of meanings included within a concept (Messick, 1989). Establishing content validity requires clearly specifying the domain of content, including all of the concept's dimensions, developing a representative sample of the content, and operationalizing the content into indicators.
To evaluate the research in Table 2 based on these standards, we looked at both the DVs and the main IVs of interest in a given study and determined whether the author(s) relayed the validity and dimensional inclusiveness of the measures they used from the FEVS through direct comparisons with previous theoretical and empirical work related to the construct. We found that 38% of the articles in our analysis that used FEVS data for their main IV did not explicitly state or describe how they address face and content validity, however self-evident such validity may appear. Similarly, one third of the articles did not do so for DVs operationalized using FEVS data. This could be problematic because of OPM's apparent disuse of established, validated measures for various organizational behaviors as argued in Fernandez et al. (2015) and Somers (2018).
The importance of face and content validity is particularly relevant when a single survey item is used to measure a given construct. When multiple items are used to construct scales, the establishment of face validity is often inherent in the process of constructing a latent measure. When using single items, however, consistent or constant nonrandom measurement error could be a threat to both the internal validity and reliability of a measure if the item does not comprehensively reflect a construct (Langbein & Felbinger, 2006). Our review of published articles shows that 10 of the 23 studies that used a single-item IV from FEVS offer details of assessments for face and/or content validity. Of the 28 studies that used a single-item DV from FEVS data, 20 established face and/or content validity of that construct.
Internal Reliability
Certain advanced measurement standards may not be needed for all research questions; whether they apply depends on the research question and/or subsequent methodological and statistical concerns. For instance, a number of studies employ multidimensional constructs that use more than a single item in a scale. Somers (2018), for example, evaluates the quality of psychometric measurements that empirical articles have constructed using items from FEVS. He finds there exists considerable variation in how public administration scholars use items from the survey and that there is a considerable bias toward convergent validity. Because multidimensional measures are pervasive in organizational behavior research (Edwards, 2000), it is often necessary to construct scales using several survey items that capture multiple dimensions of a latent construct (Spector, 1992). To ensure the measure is reliably capturing the construct proposed by the researcher, such measures are tested for random measurement error. In other words, if "the responses to each (measured) question should be a linear function of the (unmeasured) underlying construct" (Langbein & Felbinger, 2006, p. 210), then the expected value of the error term in that linear equation should be small. A minimum expectation of measures used in organizational behavior research is that the "variables within a scale should measure the same construct and thus the measure should have acceptable internal reliability" (Fields, 2002, p. xix). This is normally calculated with a coefficient alpha (Cronbach, 1951), which is "designed to assess the relative contribution of systematic covariation between pairs of indicators compared to that of a pattern of randomness" (Langbein & Felbinger, 2006, p. 210).
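For reference, coefficient alpha for a scale of k items is commonly written as follows, where the symbols are our notation rather than the cited authors': \sigma^2_i is the variance of item i and \sigma^2_X is the variance of the summed scale score.

```latex
\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k}\sigma^{2}_{i}}{\sigma^{2}_{X}}\right)
```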
We use this as our basis for assessing whether a piece of research in our analysis tests for the internal reliability of a scaled measure for either a DV or IV. We find that 69% (33/48) of the studies in our analysis employ scales for IVs; of the 28 studies using a scaled, multi-item DV from FEVS data, only 18 (64%) explicitly test for internal reliability. We find these numbers to be indicative of how FEVS data are currently being employed in public management research. Without a quantitative assessment of the extent to which multiple indicators are systematically correlated to one another, it is difficult to determine whether the measure (when used for multivariate analysis) reflects a common indicator or is susceptible to random noise (Fong, Ho, & Lam, 2010).
Moreover, the articles in our analysis that test for internal reliability of measures do not always explicitly indicate whether the items included in a given scale maximize the internal reliability of that scale, or whether a subgroup of fewer items would produce maximal reliability (Langbein & Felbinger, 2006). As Fields (2002) notes, "measures with more items will yield higher coefficient alpha values than measures with fewer items, other things being equal" (p. xix). Suppose that two scales could capture correlated but distinct constructs. An alpha measuring one collective combination of the items comprising both of these distinct scales could yield an indication of high internal consistency (Spector, 1992). This is so because alpha is a function of "both the number of items [comprising a scale] and their magnitude of correlation" (Spector, 1992, p. 31). Therefore, it is important to test whether alpha would increase with the omission of any particular item. One may find that the exclusion of any given item actually increases the overall reliability of the measurement. Furthermore, such analysis should accompany (and not take the place of) a logical and theoretical founding of face and content validity (as discussed above). Reliability analysis provides adequate and quantitative assurance to item selection for a given scale, but measurement ultimately relies on the conceptual development of the construct. Therefore, statistical reliability tests do nothing by themselves to guarantee that a scale is actually measuring a given construct.
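The item-omission check described above can be sketched in a few lines. The example computes coefficient alpha from item and total-score variances and then recomputes it with each item dropped in turn. The response matrix is invented (the fourth item is deliberately out of step with the other three), and the function names are hypothetical.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Coefficient alpha for a respondents-by-items matrix (list of lists)."""
    k = len(rows[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_variances = [variance([row[i] for row in rows]) for i in range(k)]
    total_variance = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


def alpha_if_item_deleted(rows):
    """Alpha recomputed with each item dropped in turn (item index -> alpha)."""
    k = len(rows[0])
    return {
        drop: cronbach_alpha(
            [[v for i, v in enumerate(row) if i != drop] for row in rows]
        )
        for drop in range(k)
    }


# Hypothetical responses: 5 respondents x 4 items on a 1-5 scale.
responses = [
    [4, 5, 4, 2],
    [2, 2, 3, 5],
    [5, 4, 5, 1],
    [1, 2, 1, 4],
    [3, 3, 3, 3],
]

alpha_full = cronbach_alpha(responses)
alphas_without = alpha_if_item_deleted(responses)
best_drop = max(alphas_without, key=alphas_without.get)
print(f"alpha with all items: {alpha_full:.3f}")
print(f"alpha rises most when item index {best_drop} is dropped: "
      f"{alphas_without[best_drop]:.3f}")
```

A statistical improvement of this kind is only a prompt for scrutiny: whether the flagged item belongs in the scale remains a conceptual question, as the paragraph above stresses.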
Convergent and Discriminant Validity
The concepts social science researchers study are typically embedded in a theoretical context and network of meaning (Cronbach & Meehl, 1955) referred to in the literature as a nomological network. This nomological network predicts a specific pattern of relationships (strong, weak, or nonexistent) between the focal concept and other concepts. It also predicts how the dimensions of a concept will be related to each other and to the underlying construct. Assessing convergent and discriminant validity involves estimating the degree to which empirical relationships among dimensions and constructs reflect the pattern of relationships predicted by the nomological network (Furr & Bacharach, 2008; Spector, 1992). Specifically, convergent validity refers to the degree to which measures of dimensions and constructs are correlated with other dimensions and constructs as predicted by theory.
Discriminant validity, however, refers to the extent to which dimensions and constructs are uncorrelated with other dimensions and constructs as predicted by theory (Messick, 1989). When validating measures of a multidimensional higher order construct, a researcher would want to demonstrate that measures of the separate dimensions are correlated with the latent construct as predicted by theory (convergent validity), while providing evidence of greater intradimensional than interdimensional correlation among measures (discriminant validity; Messick, 1989). A variety of statistical techniques are used to assess convergent and discriminant validity, from rudimentary techniques such as correlation matrices, to more sophisticated ones such as factor analysis, to theory-driven confirmatory factor analysis (CFA) and related diagnostic tests (e.g., the average variance extracted statistic). While Somers (2018) offers an excellent discussion on the benefits of CFA for the use of FEVS data, which we support as a superior standard, for the purposes of this study, we adopt a much simpler threshold and count any test of convergent/discriminant validity.
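At the rudimentary end of that range, a correlation-based check compares how strongly items correlate within a dimension against how strongly they correlate across dimensions. The sketch below does this with a hand-written Pearson correlation; the two dimensions and their item responses are invented, and this is an illustration of the simple threshold, not a substitute for CFA.

```python
from itertools import combinations, product
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mean_abs_corr(pairs):
    """Average absolute correlation over an iterable of (x, y) item pairs."""
    pairs = list(pairs)
    return sum(abs(pearson(x, y)) for x, y in pairs) / len(pairs)


# Hypothetical item responses (six respondents) for two dimensions.
dimension_a = [[1, 2, 3, 4, 5, 6], [2, 1, 4, 3, 6, 5]]
dimension_b = [[1, 6, 2, 5, 3, 4], [2, 5, 1, 6, 4, 3]]

# Intradimensional: pairs of items drawn from the same dimension.
within = mean_abs_corr(
    list(combinations(dimension_a, 2)) + list(combinations(dimension_b, 2))
)
# Interdimensional: pairs with one item from each dimension.
between = mean_abs_corr(product(dimension_a, dimension_b))

print(f"mean |r| within dimensions:  {within:.3f}")
print(f"mean |r| between dimensions: {between:.3f}")
```

A pattern in which the within-dimension figure clearly exceeds the between-dimension figure is consistent with the convergent and discriminant expectations described above, though it does not test the factor structure itself.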
We find that 48% of articles using scales for a main IV of interest utilize factor analysis to establish convergent and discriminant validity of the measures they have constructed. At the same time, the use of a single item as a measure should not preclude the researcher from establishing whether this item loads independently onto a separate factor from other items used in the empirical model. Including articles that use single FEVS items as an IV of interest, we find that 41% (9/22) establish convergent/discriminant validity through either exploratory or confirmatory factor statistical techniques. Of articles using FEVS data for the DV, about 25% (10/40) validate the measure using such statistical analyses.
A variety of modeling and analytical solutions are adopted in empirical studies using FEVS. When adopted, such techniques must be relevant for tackling the research questions at hand. We include Table 3, summarizing the modeling approaches and diagnostics that scholars have used when employing FEVS data. Broadly speaking, while model fitting often focuses on direct associations between variables, there are often nondirect relationships that are extremely important as well. In approximately one third of the articles, the authors went beyond independent effects by also including interaction or moderating factors. This is a promising trend, given the prevalence of situational and contingency theories and nonlinear relationships in public management. However, in terms of mediating effects, only three articles employed a statistical approach that enables authors to estimate mediating or indirect effects, in addition to direct ones. This is striking, given that much of management theory points to mediating effects, usually in the form of managerial interventions or other external stimuli conditioning employee attitudes and cognition, which, in turn, influence behavior (e.g., motivation and leadership theories, most of which exhibit this causal structure as in Bass & Bass, 2008; Latham, 2012).
Table 3. Public Administration Journal Articles Using FEVS, Coded According to Modeling Approaches and Diagnostics.
Note. CMB = common method bias check; Endo = endogeneity check; FEVS = Federal Employee Viewpoint Survey; GLM = generalized linear model; HLM = hierarchical linear modeling; Intx = use of interactions; MLM = multilevel model; MNLM = multilevel nonlinear model; OLS = ordinary least squares; OPM = Office of Personnel Management; SEM = structural equation modeling; SLM = standard linear regression model.
Model Specification
A general practice in empirical analyses is to ensure that the final models are properly specified and are chosen over competing models because of superior fit. This is typically done using a number of pre- and postestimation tests. Models depend on basic distributional assumptions about the variables used as well as about the error terms in the estimated equations. For instance, in applying ordinary least squares (OLS), a researcher must verify that the assumptions under which OLS is the best linear unbiased estimator (BLUE) hold; in applying ordered regression, a researcher must ensure that the parallel regression (proportional odds) assumption is not violated; and so on. We consider any discussed or footnoted information on model fit statistics as acceptable. Generally speaking, almost all the articles in the sample discuss or report relevant model fit statistics, as they should.
A special case of model (mis-)specification is the potential presence of endogeneity. The complexity of relationships among organizational behaviors often means that two-way causal relationships are likely at play. Trust in organizational leadership, for example, should be intensely affected by one's job satisfaction and vice versa. When such concerns are present, structural equations are a common method for determining the relative endogeneity of the relationships among a set of variables by modeling them as "an entire system in order to see all the feedback loops involved" (Studenmund, 2006, p. 476). However, there are other steps that can be taken using FEVS data that can account for the simultaneity of a set of variables.
Not all variables are necessarily endogenous, however. There are fixed characteristics, such as an individual's race or gender, which will not change due to a relative change in a corresponding variable of interest. Also, lagged endogenous variables, by virtue of their temporal precedence, ". . . are not simultaneously determined in the current time period" (Studenmund, 2006, p. 476). A multistage regression can be employed, in which potentially endogenous variables are "instrumented" for by exogenous predictors in earlier stages. These predictors absorb the correlated errors between an IV and a DV without being directly associated with the DV of the later-stage regression. Finally, endogeneity can also be mitigated by tackling common source and common method issues, discussed below.
Although these criteria do not comprehensively cover the methods through which endogeneity can be reasonably accounted for (and such coverage is beyond the scope of this article), we accepted a simple, explicit recognition of the potential for endogeneity among variables of interest, recognizing that mere association among various organizational behavior constructs is theoretically interesting and worthy of analysis. That is, endogeneity can be addressed in the hypothesis development stage, where one discusses the potential impact of endogeneity, if any. Standard tests for endogeneity may follow, or the studies may recognize and admit limitations of their findings, should endogeneity indeed be consequential for identification of the estimators in the reported model. Of the 48 articles in our analysis, 22 explicitly recognized the potential for endogeneity in their analyses. Only one study in the sample actually tested for the possibility of endogeneity through a two-stage least squares instrumentation. However, several of the articles tested for interaction effects using measures of exogenous indicators, and the practice of recognizing potential simultaneity bias in the remaining studies gives us some confidence that the models are robust.
Twenty of the 48 articles examined used single Likert-type scale survey items as DVs. Authors used a variety of regression models for these, from binary logit and probit respecifications of survey items to estimations of categorical DVs, including ordered logit, ordered probit, and multinomial logit. Of the nearly a dozen articles using ordered logit or ordered probit, six acknowledge violations of the critical parallel regression assumption and opt for multinomial regression instead.
Survey measures from FEVS lend themselves to multilevel (hierarchical or mixed-level) modeling (MLM) as a promising method of analysis, especially when combining FEVS individual-level data with subagency- or agency-level information. As Heinrich and Hill (2010) argue, it is "challenging to think of a governmental context in which a multilevel conceptualization would not be appropriate, even if the relevant data were not available to explore the multilevel relationships empirically" (p. 836). In the context of U.S. federal agencies, in particular, researchers have the advantage of working with variables that exhibit a great deal of organizational variance. Researchers can exploit this variation in organizational attributes to test the impact these attributes have on individual perceptions and behaviors, as measured by the FEVS. In addition, the construction of models where individuals are nested within various agency settings allows us to learn how the effects of different individual-level predictors of theoretical interest vary across these settings, based on higher level organizational characteristics. To measure these impacts, an intercept-and-slopes-as-outcomes model conceptualizes the indirect and direct influence of organizational embeddedness traits on individual perceptions and behaviors. We find an emerging use of MLM in research employing FEVS data, where 11 studies rely on MLM to model individual behavior nested within organizations.
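For readers less familiar with the intercept-and-slopes-as-outcomes specification mentioned above, its standard two-level textbook form can be written out as follows. The notation is an assumption introduced here for illustration: Y_{ij} is an outcome for employee i in agency j, X_{ij} an individual-level predictor, and W_j an agency-level characteristic.

```latex
\begin{align}
  % Level 1: employees within agency j
  Y_{ij} &= \beta_{0j} + \beta_{1j} X_{ij} + r_{ij} \\
  % Level 2: agency-specific intercepts and slopes as outcomes
  \beta_{0j} &= \gamma_{00} + \gamma_{01} W_j + u_{0j} \\
  \beta_{1j} &= \gamma_{10} + \gamma_{11} W_j + u_{1j} \\
  % Combined (mixed) form after substitution
  Y_{ij} &= \gamma_{00} + \gamma_{01} W_j + \gamma_{10} X_{ij}
            + \gamma_{11} W_j X_{ij} + u_{0j} + u_{1j} X_{ij} + r_{ij}
\end{align}
```

In the combined form, \gamma_{01} captures the direct influence of the agency characteristic on the outcome, while the cross-level interaction \gamma_{11} captures how that characteristic conditions the individual-level relationship.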
A number of studies use structural equation models (SEMs). SEM is a family of techniques designed to test models made up of a presumed set of causal relationships or links between constructs as represented by IVs, mediating variables, and DVs. The limitations of SEM are known (Anderson & Gerbing, 1988; Biddle & Marlin, 1987; MacCallum & Austin, 2000), including the familiar challenge of establishing causality using cross-sectional data that one encounters with standard regression techniques. Nevertheless, the use of SEM enables public management scholars to take greater advantage of the data available in FEVS to develop and estimate models that more explicitly represent the theories on which they are grounded, many of which include complex patterns of relationships between variables, mediating effects, and simultaneous feedback loops. Moreover, by using SEM (either CFA alone or in combination with a structural model), researchers can avoid having to wrongly assume the lack of measurement error in their analysis and can instead directly estimate the measurement error associated with each latent variable in the model.
Common Source Bias
Finally, research that relies exclusively on FEVS data may be susceptible to common method or common source biases, that is, forms of systematic error variance among variables measured using the same data collection instrumentation technique or the same data source. One potential limitation in the use of self-reported survey data, a situation that may lead to common method bias, is when survey measures and scales are improperly designed or administered. Common method bias is generally believed to produce artificially inflated correlations (Crampton & Wagner, 1994). In some cases, however, it can also deflate correlations (Cote & Buckley, 1988; Podsakoff et al., 2003). Crampton and Wagner (1994) concluded that although researchers need to be aware of possible common method bias, overall this problem appears to enjoy an unduly high level of attention. These authors find that on average the common method bias "is neither dominant nor absent" (Crampton & Wagner, 1994, p. 73). Indeed, in public administration and human resource management scholarship, there has been a tendency to exaggerate claims about common method bias (George & Pandey, 2017). We remain agnostic on this debate.
Nevertheless, we believe that researchers should take steps to detect the presence of this form of bias and mitigate it. A basic test used to detect the presence of common method bias is the Harman test. This involves performing a factor analysis of all the indicators in the study, with a resulting one-factor solution indicative of a common method bias. It is unclear, however, whether the Harman test is useful at all. A more powerful test is a series of CFAs comparing the model fit statistics for a single-factor model versus those from multifactor models. Even then, statistical techniques may not necessarily remedy problems (e.g., Favero & Bullock, 2014). Other scholars question how much it really matters if a question is about testing perceived outcomes, not certain objective or actual organizational factors (Fernandez et al., 2015; Moldogaziev & Resh, 2016).
Common source bias, however, can be mitigated by pooling several data sources, such as combining FEVS data with organizational data. Combining administrative data or measures of exogenous events or stimuli with single-level, cross-sectional survey data allows researchers to model exogenous impacts on a construct measured using items from the survey data, a step that is relevant both to common source bias and to the endogeneity concerns discussed above. Eighteen out of 48 papers used more than one data source in their empirical analysis. Of the remaining 29 papers that relied exclusively on FEVS data, only 12 provided an acknowledgment of the potential for common method or common source biases or included any tests for such potential biases.
Conclusion and Next Steps
Various methods of analysis employing FEVS data have addressed important questions of organizational performance, leadership, job satisfaction, employee turnover, and other relevant human resource management practices. Using known criteria for best and condemned practices in the use of survey data, we offer an assessment of these existing studies. Despite advances in the field of survey research, G. Lee, Benoit-Bryan, and Johnson (2012) find that survey research in the field of public management is of low quality. The evidence we provide above of public management scholarship employing FEVS data may affirm this assessment to some extent. Both OPM and management scholars are interested in a high-quality FEVS; the two can and must learn from and work with one another. Orr and Bennett (2012) forewarn that "coproducing academic-practitioner" research is not without obstacles. If done properly, however, such coproduction can enrich knowledge and expertise in both the practice and theory-building realms, help make better policy choices, and make research more useful to communities.
Although the focus of the present study was to provide an inventory of the research produced in peer-reviewed academic journals through FEVS data analysis and the corresponding value of that research, there are considerable efforts at descriptive analysis generated by agencies and "good government" groups using this data source as well. Moreover, academic research in the form of technical reports, books, and journal articles has employed this survey for descriptive analysis. Although such resources are not included in our investigation, we acknowledge that their exclusion is a considerable shortcoming of this article.
We also recognize that our study has provided a broad overview of an assembly of individual research efforts, each of which provides potentially important insights that should not be overlooked as distinct contributions. Indeed, one takeaway of this study should be that FEVS data are being used to produce (or contribute to) sophisticated and policy-relevant models that yield profitable insights into public management theory and research. However, we contend that there are basic criteria that scholars, reviewers, and consumers of research need to pay attention to for research using FEVS data to provide informative and useful insights into how public organizations function. In this study, we have provided the first comprehensive review of public management research using FEVS data, covering the practices that one must follow both when analyzing survey data and when using them in empirical studies. We maintain that the overall trends in this research are continuously improving, based on the generally increasing counts of preanalysis, baseline, and advanced standards that are met by public management peer-reviewed studies.
Overall, preinference analysis is largely unacknowledged in public management research that uses FEVS or other commonly used secondary survey data. Whether this reflects the norms established by journals, or whether these issues are addressed during the peer review process, remains unclear. We note that the journals that most frequently publish work using FEVS in our analysis show no greater consistency across preinference standards than others. We cannot reject the possibility that many of these procedures and diagnostics are excluded from final publications and provided to reviewers only. Therefore, one particular takeaway from our analysis is that public management research that leverages FEVS should more explicitly apply the best practices for preinference established in the survey research literature. In particular, we would encourage our colleagues to make an explicit effort to enhance the openness and transparency of their research for replication and evaluation purposes. This can be accomplished as part of a recent push in the social sciences, including public management journals, for advance filing of research proposals, data sets, code, and any additional materials that did not make it into the final versions of articles. Such efforts are both laudable and necessary. Whether materials are hosted by authors directly on their personal pages, on journal websites, or in third-party depositories, more transparency in research is necessary.
Fernandez et al. (2015) provided a critical assessment of the FEVS instrument, finding weaknesses in its content, design, and implementation. They offered a set of recommendations for refining the survey and its implementation with the aim of improving the quality and value of the data. Their efforts subsequently led to a proposed OPM regulation that revised many of the statutorily required questions in the survey "to stronger, relevant and unambiguous questions as well as questions that capture a single concept" (Personnel Management in Agencies, 2016). If research is to build on the dialogue established by Fernandez et al. (2015) between public management researchers and OPM on the future use of the FEVS and the potential for refining this instrument, then we as a research community must abide by certain standards of secondary data use and rules of empirical analysis.
Finally, the use of employee surveys such as the FEVS is a burgeoning trend, with the Organisation for Economic Co-operation and Development (OECD, 2017) reporting that close to 90% of the countries examined use such surveys to gauge employee perceptions of their organizations, the work environment, and management. As in the United States, approximately half of the countries undertake annual surveys of the entire public service, and in several cases, the surveys match or exceed the breadth of topics covered in the FEVS (OECD, 2017). The use of these data sets by academics is likely to grow significantly, raising many of the concerns and challenges discussed in this article about preinferential exploration of survey data and about the applications, methods, and reporting of findings. Our hope is that scholars in other countries using employee survey data for research will abide by the best practices outlined in this article and avoid at least some of the pitfalls encountered by those using FEVS data to study public organizations and management.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
