Abstract
A critical comparison of the agency identifier codes in the Federal Employee Viewpoint Survey (FEVS) and FedScope data sets reveals that three distinct types of issues occur when researchers attempt to merge the data sets: (a) a single agency is assigned different codes across data sets; (b) a single code is assigned to different agencies across data sets; and (c) a single code is assigned to two or more agencies in the FEVS data set and a separate agency in the FedScope data set. Between 2013 and 2016, these issues are present in almost all major federal departments. Compatibility issues between the agency identifiers could cause the user to drop observations unnecessarily or unknowingly combine two different agencies’ data improperly. If uncorrected, these issues will distort the analysis of studies that rely on this combination of data. However, researchers can correct for these issues and still use Office of Personnel Management (OPM) identifiers to combine data across multiple data sets.
The U.S. Office of Personnel Management (OPM) is one of the key sources of data on human resources and performance-related metrics for federal employees. In addition to serving as the human resources and policy guide for the U.S. federal government, OPM also routinely collects demographic and survey data on all federal employees. These data are invaluable for both researchers studying federal issues and for practitioners working to provide data-driven responses to issues impacting their organization. However, a critical comparison of the agency identifier codes in the Federal Employee Viewpoint Survey (FEVS) and FedScope data sets reveals that three distinct types of issues will occur when researchers attempt to merge the data sets:
An agency is assigned one agency identifier code in the FEVS data set and a different code in the FedScope data set;
The same agency identifier code is used for one agency in the FEVS data set and another agency in the FedScope data set; and
A single agency identifier code is assigned to two or more agencies in the FEVS data set and a different agency in the FedScope data set.
If these issues are not corrected, observations for multiple agencies will be dropped from the analysis because not all variables have a value assigned, and the researcher will inadvertently combine data from two different agencies into a single observation. Left uncorrected, these issues represent a substantial threat to the validity of research that combines the FEVS and FedScope data sets.
Although the FEVS is used as a data source in dozens of peer-reviewed articles and other publications, few had critically evaluated this data set before the articles by Fernandez et al. (2015), Somers (2018), and Resh et al. (2019). In their article, Fernandez et al. (2015) state that one of the “advantage[s] of using the FEVS data is that respondents are coded by agency in a manner that allows researchers to easily merge the survey data with many other sources in the federal government” (p. 388). This is enabled because “[e]ach survey response is coded according to the federal government agency in which he or she works . . . [and] this agency coding scheme, or ones very much like it, are used regularly by federal agencies such as OPM (e.g., FedScope) . . .” and others “to gather, analyze, and report statistics” (Fernandez et al., 2015, p. 388). However, as this article illustrates, although the data sets appear to use the same coding scheme, the codes are not the same between the FEVS and FedScope data sets.
The FEVS data has most commonly been used by public management researchers to conduct cross-sectional research. However, “[m]any critiques in the current public management literature have been levied against cross-sectional survey research” (Stritch, 2017, p. 221). Recently, researchers have begun to think more critically about some of the issues inherent in using cross-sectional data and to develop approaches to address the potential for introducing threats to their findings from both common methods (Jakobsen & Jensen, 2015) and common source bias (Favero & Bullock, 2015; Meier & O’Toole, 2013). However, these approaches do not address the fundamental issue that “[r]esearchers cannot use static data to directly test dynamic theories” (Stritch, 2017, p. 222, citing Chan, 1998). Given this, increased attention has recently been devoted to longitudinal and panel data because both may allow researchers to identify change and causal relationships (for a discussion of additional issues to be aware of when using longitudinal and panel data, see Stritch, 2017).
Researchers can use the FEVS to conduct panel data research by using OPM’s survey weights to aggregate the individual employee responses by organizational unit and then analyzing these units over several years. The questions that can be explored using the FEVS data are further expanded when researchers combine this data set with other information such as the data provided by FedScope (e.g., Caillier, 2016; Cohen et al., 2016; S. Lee, 2018; S. Lee et al., 2018; Moon, 2018), the Merit Systems Protection Board (Rubin & Kellough, 2011), the No FEAR Act (Rubin & Alteri, 2019), the Central Personnel Data File (Lewis & Pitts, 2018), and others. Combining data sets helps the researcher to avoid single source bias, a common problem with research that relies upon the FEVS. However, to combine the FEVS data set with others, researchers must either find a variable common to the data sets or hand code identifiers into their data sets. There is a coding scheme that is common to FEVS data and other OPM data sets. However, as this research note illustrates, this agency-specific code cannot be used without correcting for the discrepancies in how individual codes are used across the data sets.
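The aggregation step described above can be sketched in Stata as follows. This is a minimal illustration on invented toy data, not the procedure used in any of the cited studies. The variable names q1 (a survey item) and postwt (the survey weight) are assumptions, so check the codebook for the year you are using.

```stata
* Toy stand-in for an individual-level FEVS extract (all values invented).
* plevel1 is the FEVS agency code; q1 and postwt are ASSUMED names for a
* survey item and the OPM survey weight.
clear
input str4 plevel1 int year byte q1 double postwt
"AG14" 2015 4 1.0
"AG14" 2015 2 3.0
"AG07" 2015 5 2.0
"AG07" 2015 3 2.0
end

* Weighted mean of the item for each agency-year
collapse (mean) q1_mean = q1 [pweight = postwt], by(plevel1 year)
list
```

Repeating this for each survey year and appending the results would give one observation per agency per year, which is the unit later merged with FedScope.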
Accordingly, first this research note describes the FEVS and FedScope OPM data sets. Second, I discuss the discrepancies in how agency identifier codes are used across the data sets and describe the impact of this issue. Finally, I describe the steps a researcher can take to ensure that their analyses are correct.
OPM Data Sets
OPM is one of the central managers of data concerning the U.S. federal workplace. Although OPM maintains a number of raw data sets, two of the most frequently used data sets are the annual results of the FEVS and workforce data available on the FedScope website. Although each data set can be used independently, it is common for researchers to combine the FEVS and FedScope data sets. Recently, this combination of data sets has been used to explore whether turnover intentions are a good predictor of future turnover (Cohen et al., 2016), the impact of inclusive management on effectively managing diversity (Moon, 2018), and the impact of representative hiring on employee perceptions and discrimination complaints (Alteri, 2018). Given the usefulness of combining these data sets to examine a wide range of management and personnel questions, it is imperative that researchers understand the problems they may encounter when doing so.
FEVS Data
The FEVS data sets contain individual-level responses to a survey that is collected annually by OPM. Although this survey is voluntary, approximately 40% of all federal employees participate each year. This data set is widely used by researchers, practitioners, and individuals assessing the organizational culture and health of the federal workforce. Specifically, researchers have used the FEVS to examine perceptions of equity and fairness based on the federal employee’s sexual orientation (Lewis & Pitts, 2017), procedural justice perceptions in the Department of Defense (Rubin & Weinberg, 2016), the impact of leadership styles on organizational commitment (Moldogaziev & Silvia, 2015), and performance-based human resource management (H. W. Lee, 2017), among others.
Each FEVS data set includes a stratified random sample of federal employee opinions on a variety of workplace issues including personal work experiences, work unit, agency, supervisor, leadership, satisfaction, work/life, and demographics (OPM, 2016, p. 1). To ensure that the federal employees responding to the survey are representative, the OPM (2016) has designed the FEVS so that the resulting estimates of perceptions are statistically reliable not only at the overall Federal workforce (i.e., government wide) level but also at the level of pre-identified work units and senior leader status (i.e., whether a member of the Senior Executive Service (SES) or equivalent) (p. 1).
For discussion on other strengths and limitations of the FEVS, see Fernandez et al. (2015), Somers (2018), and Resh et al. (2019).
When these data sets are downloaded, a 4-digit alpha-numeric code is attached to each observation. This code is automatically assigned by OPM and identifies the department and agency to which the observation is tied. For example, AG07 means that the observation is for the Department of Agriculture (AG) and the agency is the Food and Nutrition Service (+07) (2016 FEVS). This 4-digit code is named “plevel1” in the FEVS data sets. A second agency identifier, “plevel2,” is also included in data sets prior to 2016.
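As a small illustration of the code structure, the sketch below separates the department prefix from the agency suffix. It assumes, based only on the AG07 example above, that the first two characters identify the department and the last two the agency. The new variable names dept and subagency are hypothetical.

```stata
* Toy list of FEVS agency codes
clear
input str4 plevel1
"AG07"
"AG14"
end

* Characters 1-2: department prefix; characters 3-4: agency suffix
generate str2 dept = substr(plevel1, 1, 2)
generate str2 subagency = substr(plevel1, 3, 2)
list
```

Splitting the code this way can make it easier to review one department at a time when comparing codes across data sets.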
FedScope Data
FedScope is OPM’s repository for workforce data on federal employees. FedScope draws data from Enterprise Human Resources Integration-Statistical Data Mart (EHRI-SDM) and is available online at www.fedscope.opm.gov. It includes department and agency-level metrics on employee accessions, length of service, salary, education level, federal grade level, duty location, occupation, pay, supervisory status, work schedule, and a wealth of demographic data on employee ages, ethnicities, races, and genders. Although the data does not identify individual employees, it does include the most comprehensive summary data available on federal employees.
FedScope contains a user-friendly interface that you can use to look up data on specific agencies or departments. It also contains a function that allows the user to download data sets to be used in analysis. As with the FEVS data, when you use this function a 4-digit alpha-numeric code is automatically attached to the agency-level observation. Although FedScope data sets also include a title of the agency, this field is more difficult to use across data sets and most researchers rely on the 4-digit code when working with multiple observations.
Compatibility Issue Between OPM Data Sets
When you download the data sets, both the FEVS and FedScope include a 4-digit code used to identify the agency or department. The codes are determined by OPM and cannot be customized by the user when the data set is downloaded. The presumption is that these 4-digit codes are the same across all OPM data sets. Researchers use these codes to match or join data sets because matching these codes is an efficient way to combine multiple data sources, while ensuring that you are matching like organizational units. Matching or joining data sets using these codes has been a more reliable way to ensure that you are correctly matching a given agency or department’s data across multiple data sets. Other methods, such as hand entering data into a master data set or matching based on the agency or department name are more time consuming and subject to keying error.
For many agencies, the code is the same in both the FedScope and FEVS data. However, for a substantial number of agencies, the codes do not match across the FedScope and FEVS data sets. For those agencies that do not have identical codes in both data sets, one of three issues is present (see Table 1). The first issue occurs when an agency is assigned one 4-digit code in the FEVS data set and a different code in the FedScope data set. Uncorrected, this issue will result in the agency's data being dropped from the analysis because not all variables included in the analysis have a value assigned. The second issue occurs when the same 4-digit code is used for one agency in the FEVS data set and another agency in the FedScope data set. If this issue is not corrected, then the data for two different agencies will be combined when the data sets are merged, distorting the results of the analysis. These issues are not limited to departments that have reorganized. Between 2013 and 2016, issue Types 1 and 2 occurred in almost every major department within the federal government, including the Departments of Agriculture, the Army, Commerce, Justice, Labor, Energy, Education, Health and Human Services, Homeland Security, Housing and Urban Development, Interior, Transportation, Treasury, and Veterans Affairs, as well as the Environmental Protection Agency and the Securities and Exchange Commission.
Table 1. 2015 Examples of Coding Mismatch Between the Federal Employee Viewpoint Survey and FedScope Data Sets.
Note. The italicized print indicates a mismatch. FEVS = Federal Employee Viewpoint Survey.
The third issue occurs when a single 4-digit code is assigned to two or more agencies in the FEVS data set and a different agency in the FedScope data set. This issue is partially due to the fact that prior to 2016, OPM used a secondary agency identifier for smaller organizational units, a variable called “plevel2.” Although this means there are unique identifiers within the FEVS data, a researcher who is not familiar with this change and does not correct for it may be unable to merge the data sets, because the values of the key variable in the master file used to merge or join are not unique. After the duplication is corrected in the FEVS data, the researcher then needs to correct for the discrepancy between the codes in the FEVS and FedScope data sets. Between 2013 and 2015, the third issue occurred in agencies in the Departments of Agriculture, Health and Human Services, Housing and Urban Development, Interior, Treasury, and the National Aeronautics and Space Administration.
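One way to check whether a master file is affected by the third issue is to test the key variable for uniqueness before merging. The sketch below uses a toy agency list. The plevel2 value "AG10FA" is invented for illustration and is not an actual OPM code.

```stata
* Toy FEVS agency list (the plevel2 value is INVENTED for illustration)
clear
input str4 plevel1 str6 plevel2 str40 agency
"AG10" ""       "Farm and Foreign Agriculture Service"
"AG10" "AG10FA" "Farm Service Agency"
"AG14" ""       "Agricultural Marketing Service"
end

* dup counts how many other rows share the same plevel1 value
duplicates tag plevel1, generate(dup)
list plevel1 plevel2 agency if dup > 0

* isid exits with an error when the key is not unique;
* capture traps the error and stores the return code in _rc
capture isid plevel1
display "isid return code: " _rc
```

A nonzero return code from isid signals that a one-to-one merge on plevel1 would fail until the duplicated codes are resolved.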
One of the issues inherent to the FEVS is “the aggregation of data at the subagency or agency level results in a significant loss of information” (Fernandez et al., 2015, p. 385). This loss of data is further compounded when the FEVS data sets are combined with other data sets because all data sets must contain the same panel of agencies or subagencies for comparison. Given this, it is imperative that researchers take steps to ensure that observations are not dropped from their final data sets unnecessarily and that they are not inadvertently combining data from two agencies into the same observation.
Steps to Correct for Data Set Compatibility Issues
To correct for the issues described above, the user should select a master data set and then check each observation in the secondary data sets to ensure that each agency uses the same codes. If the codes do not match, the user must replace the code in the secondary data set with the code that is assigned to that agency in the master data set. As this is a time-consuming process that will likely need to be repeated several times before the data sets are correctly matched, I recommend capturing these commands in some type of reproducible file (e.g., a Stata do-file).
To better illustrate the steps needed to correct for this issue, I describe the process used in a recent project. First, I examined the data with which I was working, in this case the FEVS and FedScope data sets. In this project, I was using a panel data methodology to analyze aggregate FEVS data at the agency-level and included FedScope data as control variables. A quick comparison of these two data sets revealed that there were substantially more agency and subagency level observations in the FedScope data than in the FEVS data (524 observations, as compared with 219 observations in 2016). Given this, I used the FEVS data as the master file because I would not be able to use the FedScope observations that did not match to FEVS data points and coding according to the smaller population was more efficient.
Second, I used the FEVS agency codebook to compare the agency name assigned to each 4-digit code with the agency name assigned to the same code in the FedScope data. Note that, when downloaded, the FedScope data set returns a variable named “Employmentasvalues” that is a combination of OPM’s 4-digit code (named Agency_ID in the example below) and the agency name. This variable can be separated into two variables using the following Stata commands:

    rename Employmentasvalues Agency
    split Agency, parse(-) limit(2)
    rename Agency1 Agency_ID
    drop Agency
    rename Agency2 Agency_Name
    drop if Agency_ID=="All Agencies"
A review of these data sets reveals that there are three separate issues that will either cause the researcher to drop observations unnecessarily or improperly combine the data for two different agencies (see Table 1).
The first type of issue occurs when one agency has different codes across data sets. For example, in 2015, the agency Agricultural Marketing Service was assigned the 4-digit code “AG14” in the FEVS and “AG02” in FedScope. This means that, if not corrected, the data for the agency Agricultural Marketing Service will be dropped from the analysis because not all variables have values assigned. The second type of issue occurs when the 4-digit code is assigned to two different agencies. If this happens, then the researcher will inadvertently combine the data for two different agencies when they merge the data sets and the coding software will not return an error message when the data for the two agencies is merged. For example, in 2015, the code “AG07” was assigned to the Food, Nutrition and Consumer Services Agency in the FEVS data and assigned to the Rural Housing Service in FedScope.
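The first two issue types can be surfaced before any recoding by merging the two agency lists on the code and inspecting the result. The following is a minimal sketch on toy data built from the 2015 examples above. The name spellings, the variable names fevs_name and fs_name, and the two flag variables are assumptions for illustration. Because names are rarely spelled identically across data sets, the flagged rows are candidates for manual review, not confirmed errors.

```stata
* Toy FedScope agency list (name spellings are ASSUMED)
clear
input str4 Agency_ID str40 fs_name
"AG02" "AGRICULTURAL MARKETING SERVICE"
"AG07" "RURAL HOUSING SERVICE"
end
tempfile fedscope
save `fedscope'

* Toy FEVS agency list, with plevel1 already renamed to Agency_ID
clear
input str4 Agency_ID str40 fevs_name
"AG14" "AGRICULTURAL MARKETING SERVICE"
"AG07" "FOOD, NUTRITION AND CONSUMER SERVICES"
end

merge 1:1 Agency_ID using `fedscope'

* Candidate Type 1: a code present in only one data set
generate byte flag_unmatched = (_merge != 3)
* Candidate Type 2: a code that matched but carries different names
generate byte flag_name_diff = (_merge == 3 & fevs_name != fs_name)
list, sepby(_merge)
```

In this toy case, AG02 and AG14 are flagged as unmatched and AG07 is flagged as a name mismatch, mirroring the Agricultural Marketing Service and AG07 examples in the text.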
Finally, the third type of issue occurs because for a select number of agencies the same code is applied to multiple agencies within the same FEVS data set and that code is assigned to a wholly separate agency in the FedScope data set. This issue occurs because, prior to 2016, the OPM included an additional agency identifier in the data set called “plevel2.” For most agencies, this value in the data set is blank, but for a select number of agencies, OPM assigns a value for both “plevel1” and “plevel2” (e.g., 11 agencies in 2016 had more than one organizational unit assigned to them, impacting a total of 25 organizational units). Depending on the way the researcher chooses to combine the data sets, the coding software may return an error if this is not corrected prior to merging because the 4-digit code in the master file is not unique.
To illustrate this issue type, in 2015, the code AG10 is assigned to both the Farm and Foreign Agriculture Service and Farm Service Agency in the FEVS data, and to the Office of the Secretary of Agriculture in the FedScope data. For one of the FEVS observations, Farm Service Agency, there is a value populated in the “plevel2” field, but for the other FEVS observation assigned to this code, there is no value in the “plevel2” field. This means that there are always unique identifiers in the FEVS data, but that if you use data sets prior to 2016, you will need to ensure that these unique identifiers all appear in the same variable. There are a number of ways to correct for this issue, but for those agencies that had a value in the secondary identifier, “plevel2,” I first replaced the 4-digit code (“plevel1”) with the “plevel2.” Doing so ensured that I had a unique agency identifier for all of the FEVS observations. After doing so, I then made changes to FedScope data to ensure that the agencies matched prior to merging.
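The replacement of “plevel1” with “plevel2” can be sketched as follows. This is one possible approach, not necessarily the exact commands used in the project described here. It writes the combined identifier to a new, hypothetically named variable (fevs_id) so that the original codes are preserved, and the plevel2 value "AG10FA" is invented for illustration.

```stata
* Toy FEVS agency list (the plevel2 value is INVENTED for illustration)
clear
input str4 plevel1 str6 plevel2 str40 agency
"AG10" ""       "Farm and Foreign Agriculture Service"
"AG10" "AG10FA" "Farm Service Agency"
"AG14" ""       "Agricultural Marketing Service"
end

* Single identifier: plevel2 where it is populated, otherwise plevel1
generate str6 fevs_id = plevel1
replace fevs_id = plevel2 if plevel2 != ""

* Stops with an error if the combined identifier is still not unique
isid fevs_id
list
```

Keeping plevel1 intact makes it easy to trace which observations were changed when the do-file is rerun on another year of data.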
For all agencies whose 4-digit codes do not match across the master (FEVS) and secondary (FedScope) data sets, you must replace the code in the secondary data set with the code in the master data set. To make this replacement, you should use the agency name in the FedScope data set to uniquely identify the agency. As the spelling or abbreviations for particular agencies change over time in FedScope, this may mean using several replace commands for the same agency. For example, Department of Education agencies that are not otherwise classified may be labeled “Dept Ed,” “Department of Education—Other,” “Education Other,” and so on. This means that you will need to run the “replace” command illustrated below with each variation of the agency’s name that is included in your secondary data set. You can use the following Stata command to replace these codes:

    replace Agency_ID="AG14" if Agency_Name=="AGRICULTURAL MARKETING SERVICE"
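When an agency appears under several spellings, the separate replace commands can be collapsed into one with inlist(). The sketch below is an illustration on toy data: the FedScope codes "ED00" and "ED01" and the target code "EDZZ" are hypothetical placeholders, not actual OPM codes, and only two of the name variants mentioned above are shown.

```stata
* Toy FedScope extract; the ED codes are HYPOTHETICAL placeholders
clear
input str4 Agency_ID str40 Agency_Name
"ED00" "Dept Ed"
"ED01" "Education Other"
"AG02" "AGRICULTURAL MARKETING SERVICE"
end

* One replace covering several spellings of the same agency
replace Agency_ID = "EDZZ" if inlist(Agency_Name, "Dept Ed", "Education Other")
replace Agency_ID = "AG14" if Agency_Name == "AGRICULTURAL MARKETING SERVICE"

* Guard: stops the do-file if any listed spelling kept its old code
assert Agency_ID == "EDZZ" if inlist(Agency_Name, "Dept Ed", "Education Other")
list
```

The closing assert is optional, but it documents the intended end state and fails loudly if a later edit to the do-file breaks the recoding.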
Note that the third type of error will require at least two edits before the data are matched correctly.
Despite these limitations, the 4-digit code still represents the best way of combining data for agencies across multiple data sets. The OPM data sets are an unmatched source of workforce data on federal employees. However, researchers should take caution to ensure that they are not unnecessarily excluding observations or matching agencies’ data incorrectly, thereby distorting their results.
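Once the codes have been harmonized, a guarded merge can confirm that no FEVS agency is lost. The following is a minimal sketch on toy data, assuming FEVS is the master file as in the project described above. The variables avg_salary and q1_mean, all values, and the code "AG99" are invented for illustration.

```stata
* Toy FedScope data after recoding (values and "AG99" are INVENTED)
clear
input str4 Agency_ID double avg_salary
"AG14" 70000
"AG07" 65000
"AG99" 80000
end
tempfile fedscope
save `fedscope'

* Toy agency-level FEVS master file
clear
input str4 Agency_ID double q1_mean
"AG14" 3.5
"AG07" 4.0
end

* Keep FEVS agencies only; FedScope-only rows are discarded
merge 1:1 Agency_ID using `fedscope', keep(master match)

* Stop the do-file if any FEVS agency failed to find a FedScope match
assert _merge == 3
drop _merge
list
```

This check catches only dropped observations. A matched code can still pair two different agencies, so the agency names should be compared as well before relying on the merged file.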
Conclusion
The FEVS and FedScope data sets were not created explicitly for academic researchers (Fernandez et al., 2015). Instead, they were created for use by managers and administrators within the federal government. Given this, perhaps the reason for the coding discrepancies between the two OPM data sets discussed in this article is that they were not intended to be combined. They were created for different agency-specific audiences and with different uses in mind. However, these issues do not and should not preclude researchers from combining OPM data sets in different ways than the agency originally intended. An invaluable contribution that the academy can make to the world of practice is to approach this information in new and different ways. However, we must also realize that we are using this data for purposes other than the one for which it was created and take active steps to ensure that we are combining these data sets appropriately.
Embracing existing agency data also requires us to continue to critically evaluate our data sources and the assumptions that we as researchers make about them. As this research note has highlighted, we cannot simply use agency data without accounting for its intended use and the nuances that accompany this design. If we fail to do so, then the disconnect between what the agency data was intended for and what we, as researchers, are using it for represents a substantial threat to the validity of research we produce.
Footnotes
Acknowledgements
I wish to thank the reviewers for their insight and comments, as well as Dr. Ellen Rubin for encouraging me to write this paper.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
