Abstract
Impact evaluation plays a critical role in determining whether federally funded research programs in science, technology, engineering, and mathematics are wise investments. This paper develops quantitative methods for program evaluation and applies this approach to a flagship National Science Foundation–funded education research program, Research and Evaluation on Education in Science and Engineering (REESE). Results of three different bibliometric analyses all point to the same conclusion: REESE is an interdisciplinary research program that attracts highly productive investigators who exhibit an additional increase in their productivity rate as a result of receiving REESE funding. Limitations of the bibliometric approach are discussed, and directions are provided for the future of impact evaluations of research programs intended to serve the public good.
Raising the quality of math and science education in the United States has become a high priority for the federal government (National Academies, 2007), with President Obama even calling this “our generation’s Sputnik moment” in his 2011 State of the Union speech. The National Science Foundation (NSF) pursues this agenda through funding of programs intended to increase the pool of talent that will produce the nation’s future innovators in science, technology, engineering, and mathematics (STEM; National Science Board, 2007). NSF funds more than 200 STEM education programs, but the impacts of these programs are often poorly understood (Mervis, 2013). In an era of heightened scrutiny of federal spending, the paucity or absence of impact evidence means potentially strong STEM programs are subject to elimination, while ineffective programs may be maintained (Government Accountability Office [GAO], 2011). Public monies are spent on these programs, and impact evaluation plays a critical role in determining whether these monies are well spent. An accurate gauge of impact, and hence of the return on investment, requires careful assessment and quantitative evaluation (GAO, 2012). In this paper, we develop rigorous methods for assessing the added value of federal spending on education research using as our example a flagship NSF-funded STEM research program, Research and Evaluation on Education in Science and Engineering (REESE).
As described in several program solicitations, REESE “seeks to advance research at the frontiers of STEM learning, education, and evaluation, and to provide the foundational knowledge necessary to improve STEM teaching and learning at all educational levels and in all settings” (National Science Foundation, 2010). The REESE program had by the end of 2012 invested more than $330 million in a portfolio of over 500 basic education research projects that foster innovation in STEM teaching and learning, advance theory and method, and coordinate efforts to accumulate knowledge and influence practice. REESE also encourages interdisciplinary approaches; thus the great majority of these education research projects include investigators with backgrounds in a variety of STEM disciplines employing a wide range of research methodologies. As a result, any approach to evaluating this portfolio needs to be able to encompass these project differences in terms of disciplines represented, questions asked, and methods used. Our approach to evaluating the impact of REESE or other similar research programs involves the use of conventional, replicable bibliometric measures of impact (see Figure 1). We do not undertake a comprehensive evaluation but instead aim to show how a common evaluation method can be reliably used to partially evaluate program effectiveness.

Figure 1. Bibliometric approach to evaluating programs of research
Evaluating Research Programs: A Bibliometric Approach
The use of bibliometrics or “scientometrics” for conducting large-scale evaluations has been much studied and refined (Kroc, 1984; Vinkler, 2000, 2008). When evaluating groups of researchers, such as departments, universities, or even nations, the “value of using citation-based institutional rankings as science-and-technology indicators is ‘obvious’” (Borgman & Furner, 2002, p. 19).
Bibliometric analysis is of similar “obvious” use when no common disciplinary or methodological metric exists to evaluate an entire program of research. For example, the use of publication records to evaluate federally funded research programs and institutions has been a focus of Georgia Tech’s Research Value Mapping Program (http://archive.cspo.org/rvm/), including analyses of the impact of Center Grants on publication rates (Gaughan & Bozeman, 2002) and how such changes in productivity are tied to social networks (Dietz & Bozeman, 2005). More recently, this approach has been extended to include the characteristics of the researchers themselves (Sabharwal & Hu, 2013). Our paper builds on this work by viewing all researchers funded by a single federal program as another kind of “group” whose impact can be evaluated using bibliometric indicators.
Yet, admittedly, bibliometric analysis is flawed on many levels (Seglen, 1997; Skoie, 1999). For example, publication decisions are subject to the vagaries of human judgment (e.g., reviewers and editors), publication counts can vary across research fields and may exclude certain products (e.g., books, reports), and measures of journal impact are influenced to some extent by factors unrelated to scientific quality. On the other hand, as has been pointed out previously (Trochim, Marcus, Mâsse, Moser, & Weld, 2008), bibliometrics has improved in rigor and quality over time (Rosas, Kagan, Schouten, Slack, & Trochim, 2011; Schoepflin & Glänzel, 2001) and is recommended by influential scientific advisory groups, such as the Institute of Medicine (2004), for consideration in large program evaluations. Moreover, REESE is a basic research program and thus is particularly well suited to bibliometric analysis since peer-reviewed journal articles are the primary indicator of project “success.” A limitation of this quantitative approach, however, is that we will be unable to answer the question of why REESE might be having an impact on the academy or whether there is a similar impact on policy or practice.
Another potential challenge common to any program evaluation is obtaining the "counterfactual" needed to rigorously assess the effect or impact of the program (Moffitt, 1991). Ideally, one would randomly assign participants to treatment and control groups (e.g., in program vs. not in program); however, the mechanisms for conducting such experimental evaluations of federally funded research programs are quite limited. A regression discontinuity design has been proposed as a strong nonexperimental alternative that can replicate experimental results (Schneider, Carnoy, Kilpatrick, Schmidt, & Shavelson, 2007; Shadish, Cook, & Campbell, 2002; Steiner, Wroblewski, & Cook, 2009). Applied to REESE, this design would compare applicants whose proposals narrowly received an award with applicants whose proposals narrowly missed funding. Unfortunately, data on researchers who applied for but did not receive a REESE award are available only to NSF staff and contractors (Kostoff, 1995). Instead, we carry out three separate bibliometric analyses that use different comparison groups (i.e., counterfactuals) to test our hypothesis that receiving an NSF grant increases the academic productivity of REESE principal investigators (PIs; see Figure 1). These analyses include the following:
Comparing the productivity of REESE PIs to similar researchers using a large, nationally representative sample of STEM doctorate recipients;
Comparing the productivity of PIs before and after receiving the REESE award using an interrupted time series analysis; and
Comparing the quality of journals where REESE PIs publish to journals in disciplines where education research is typically published.
Although each of the three analyses has limitations, as will be discussed below, their combined results potentially strengthen the basis on which to make inferences regarding the academic impact of the REESE program.
Comparing the Productivity of REESE PIs With Other STEM Researchers
The first analysis compares REESE PIs to a nationally representative sample of researchers in STEM disciplines, specifically, the 2008 Survey of Doctoral Recipients. This analysis is not aimed at providing a causal estimate of the impact of REESE but instead provides a baseline comparison of the overall productivity of REESE PIs to a group of researchers matched on key characteristics. This description sets the stage for the quasi-experimental method described in the next section and for the analysis of the relative quality of REESE-funded journal articles presented last.
Method
Each of the three bibliometric analyses in this paper uses journal publication data collected from different subsets of 323 investigators representing 402 projects funded by NSF’s REESE program from mid-2006 until March 2012. As part of an ongoing effort to collect publications funded by the REESE program, the Center for Advancing Research and Communication in STEM (ARC; the resource network for the REESE program) asked REESE PIs to supply the center with current curriculum vitae (CVs). If we did not receive a response, or if the version sent was out of date, we searched online for either a more recent CV or an updated list of publications. This multimode data collection process resulted in 238 complete, up-to-date CVs, a response rate of 74%. The purpose of this data collection was twofold. First, demographic information about REESE investigators could be used to help isolate researcher characteristics that might be related to project productivity and impact. Second, CVs contain publication information beyond that directly attributed to REESE awards and thus are a potential source of data for comparing the productivity of REESE investigators with similar investigators who did not receive REESE funding (Dietz, Chompalov, Bozeman, Lane, & Park, 2000).
Samples and data sources
For a comparison group, we applied for and received permission to use restricted-use data from NSF's Survey of Doctoral Recipients (SDR). The SDR is a longitudinal survey of individuals who have received doctoral degrees from U.S. institutions in science (including the social sciences), engineering, or health fields. It is sponsored primarily by NSF and has been conducted roughly every 2 years since 1973. The SDR provides a well-suited comparison group for REESE PIs because individuals in both samples obtained a doctoral degree and about 80% of REESE PIs obtained their doctorate in fields of study covered by the survey. We used the 2008 SDR both because it was the most recent survey that included our key outcome, research productivity, and because 2008 is a midpoint in the history of the REESE program. This overlap in time allowed us to compare the research productivity of REESE PIs to a contemporaneous, nationally representative sample of doctorate recipients involved in similar research activities, in the process creating a national "benchmark" against which other STEM research programs might also be evaluated. The total SDR sample size for the 2008 survey was 40,093, of which 81% completed the questionnaire. When weighted, these respondents represent 752,000 individuals in the United States who hold research doctoral degrees in science, engineering, or health fields.
Data preparation
We took several steps to make the REESE and SDR samples as comparable as possible. First, to match the profile of SDR respondents, we constructed a REESE analytic sample of PIs who had received a doctorate by 2008 in a field other than the humanities or education, since these fields are not covered by the SDR. These restrictions excluded 73 of the 238 REESE PIs with complete, up-to-date CVs, leaving a sample of 165 PIs. Second, we restricted publication years in the REESE sample to 2004 through 2008, matching as closely as possible the period covered by the 2008 SDR (i.e., publications since October 2003). Third, to facilitate the comparison of SDR respondents and REESE PIs by doctoral field of study, the SDR fields "biological, agricultural, and environmental life sciences" and "physical sciences" were combined into a single category labeled hard sciences. Finally, we included only employed SDR respondents who indicated that their work had a research focus. We also split both samples into those working in academic and nonacademic institutions: because the pressure to publish is likely weaker outside the academy, a pooled comparison could be biased in favor of REESE PIs, who are more likely to work in academic settings. Our final analytic sample consists of 165 REESE PIs and approximately 8,400 SDR respondents, representing 205,500 STEM doctorate recipients who were employed in a research position as of October 2008. We could instead have drawn a random sample of 165 SDR respondents, but doing so would have reduced the SDR sample size and the degrees of freedom for the t test, ultimately yielding a less precise estimate of the difference in research productivity between groups.
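The sample flow for this comparison is simple arithmetic on the counts reported in the Method and Data preparation sections. The short sketch below only restates those counts; the constant names are illustrative, not taken from the authors' materials.

```python
# Counts as reported in the text; constant names are illustrative only.
PIS_IN_FRAME = 323       # REESE investigators funded mid-2006 to March 2012
COMPLETE_CVS = 238       # complete, up-to-date CVs obtained
EXCLUDED_FOR_SDR = 73    # no doctorate by 2008, or doctorate in humanities/education

response_rate = COMPLETE_CVS / PIS_IN_FRAME        # share of PIs with usable CVs
sdr_analytic_n = COMPLETE_CVS - EXCLUDED_FOR_SDR   # REESE PIs in the SDR comparison

print(f"response rate: {response_rate:.0%}")                  # response rate: 74%
print(f"REESE analytic sample for SDR comparison: {sdr_analytic_n}")  # ... 165
```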
Measures
The key measure of research productivity used in this analysis is the number of peer-reviewed journal articles published or accepted for publication. For SDR respondents, we relied on responses to the question, "Since October 2003, how many articles, (co)authored by you, have been accepted for publication in a refereed professional journal?" For comparability with the SDR, only peer-reviewed journal articles published from 2004 through 2008 were included in the productivity measure for REESE PIs. Although there is no universal CV format, nearly every REESE PI listed journal publications with all the elements necessary for analysis (e.g., year, title, authorship, journal name). As described above, we also included only those REESE PIs with complete listings of journal articles. Articles listed as "under review" or "submitted" were excluded. Each journal was checked against online databases and/or the journal's website to verify that it was peer reviewed; publications in journals that could not be verified were excluded. For SDR respondents, the year and field of the doctorate, as well as employment status and sector of employment, were obtained from the survey. For REESE PIs, doctorate information was obtained from their CVs, and employment data were provided by NSF administrative sources.
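The inclusion rules for REESE PIs' CV entries amount to a small filter. The following is a minimal sketch under stated assumptions: the `CVEntry` record, its `status` field, and the `count_articles` helper are hypothetical, and the set of verified peer-reviewed journals is assumed to have been built beforehand by the manual checks described above.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set


@dataclass
class CVEntry:
    year: Optional[int]   # publication year as listed on the CV
    journal: str
    status: str           # hypothetical field: "published", "in press", "submitted", ...


EXCLUDED_STATUSES = {"under review", "submitted"}


def count_articles(entries: Iterable[CVEntry], peer_reviewed: Set[str],
                   first_year: int = 2004, last_year: int = 2008) -> int:
    """Count CV entries that meet the productivity measure's inclusion rules."""
    count = 0
    for e in entries:
        if e.status.strip().lower() in EXCLUDED_STATUSES:
            continue  # not yet accepted for publication
        if e.year is None or not first_year <= e.year <= last_year:
            continue  # outside the window matched to the 2008 SDR item
        if e.journal not in peer_reviewed:
            continue  # peer review could not be verified
        count += 1
    return count


# Hypothetical CV: only the first two entries satisfy every rule.
cv = [
    CVEntry(2005, "Journal A", "published"),
    CVEntry(2008, "Journal B", "in press"),
    CVEntry(2007, "Journal A", "submitted"),
    CVEntry(2003, "Journal A", "published"),
    CVEntry(2006, "Newsletter C", "published"),
]
print(count_articles(cv, peer_reviewed={"Journal A", "Journal B"}))  # 2
```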
Results
Table 1 shows the composition and productivity of the SDR and REESE analytic samples by doctorate field of study, years since doctorate, and sex. In the SDR sample, those in the health and psychological sciences publish more than those in other fields of study, males publish more than females, and as expected, researchers working in academic institutions publish more than those in nonacademic jobs. The pattern of productivity in the REESE sample differs somewhat, with the largest number of publications in the fields of psychology and computer science and with a smaller difference between the publication rates of women and men.
Table 1. Comparison of SDR Population Working Primarily in Basic or Applied Research and REESE PIs: Research Productivity by Field of Study, Years Since Doctorate, and Sex
Note. SDR = Survey of Doctoral Recipients; REESE = Research and Evaluation on Education in Science and Engineering; PI = principal investigator.
For both REESE and SDR respondents, academic institutions include 4-year colleges or universities, medical schools (including university-affiliated hospitals or medical centers), and university-affiliated research institutes.
For both REESE and SDR respondents, employment in nonacademic institutions includes private/for-profit sector; self-employed (including self-employed or business owner in a nonincorporated business); private nonprofit; federal government and state/local government; 2-year colleges, community colleges, or technical institutes and other precollege institutions; and employers not broken out separately.
REESE sample was created by excluding PIs who met any of the following criteria: received their PhD after 2008, received a PhD in education or the humanities, did not provide a curriculum vitae (CV), or provided a CV that was not current through 2008.
*p < .05. **p < .01. ***p < .001 (based on a two-sample t test that adjusted for unequal variances and sample sizes).
Comparing the productivity of the two groups, Table 1 shows that REESE PIs published significantly more research articles during the 5-year period (M = 11.40) than SDR respondents (M = 9.65), based on a two-sample t test that used the Satterthwaite approximation to account for unequal variances and sample sizes. Within individual fields of study, REESE PIs and SDR respondents did not differ significantly in research productivity, although the gaps in psychology and computer science approached statistical significance. REESE PIs with more recent doctorates significantly outperformed SDR respondents with equally recent doctorates (p < .05), and female REESE PIs were more productive than their female SDR peers (p < .05).
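The Satterthwaite-adjusted comparison can be sketched from summary statistics alone. This is an illustration, not the authors' code: the function name `welch_t` is hypothetical, the SDs in the usage line are made-up placeholders (the Table 1 SDs are not reproduced here), and the SDR count is the approximate unweighted n, ignoring survey weights. The function returns Welch's t statistic and the Welch–Satterthwaite degrees of freedom; for a p value, SciPy's `scipy.stats.ttest_ind_from_stats(..., equal_var=False)` runs the same unequal-variance test from summary statistics.

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample t statistic for unequal variances, with
    Welch-Satterthwaite approximate degrees of freedom."""
    v1 = sd1 ** 2 / n1   # squared standard error of the first mean
    v2 = sd2 ** 2 / n2   # squared standard error of the second mean
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Means and REESE n from the text; both SDs are HYPOTHETICAL placeholders,
# and 8,400 is the approximate unweighted SDR count.
t, df = welch_t(11.40, 9.0, 165, 9.65, 10.0, 8400)
print(f"t = {t:.2f}, df = {df:.0f}")
```

Because the small REESE group dominates the denominator of the degrees-of-freedom formula, the effective df is close to the REESE sample size rather than to the combined n, which is why the unequal-variance adjustment matters here.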
These results may not come as a surprise, since REESE PIs represent a highly selective group of researchers. However, the bottom rows of Table 1 show no statistically significant difference in productivity between REESE PIs and SDR researchers working in 4-year colleges or universities. We should also note that excluding books and book chapters likely disadvantages REESE PIs: as education researchers, they are much more likely to regard these as viable publication outlets than are the SDR researchers, who publish primarily in the natural sciences (Larivière, Archambault, Gingras, & Vignola-Gagné, 2006). Conservatively, then, we can say that REESE PIs are at least as productive as their SDR peers. What this analysis cannot distinguish, however, is whether this high level of productivity reflects the selection of strong PIs for REESE awards (i.e., a selection effect) or the financial and academic boost provided by the award itself (i.e., a causal effect). We explore the latter possibility next.
Comparing the Productivity of PIs Before and After Their REESE Award
A second method of using bibliometric data to measure the impact of a research program is to examine how the productivity of investigators changed as a result of receiving the grant (Gaughan & Bozeman, 2002). As discussed above, when random assignment is impractical, the evaluation ideally involves comparing the academic productivity of two matched groups, both of which applied for the grant but one group received the award (treatment group) and the other did not (control group). However, such a regression discontinuity design is not always possible (Hedges & Hanis-Martin, 2009; Moffitt, 1991); in the present circumstance, a cooperative agreement with NSF prohibited ARC from accessing NSF records of those who applied for but did not receive REESE funding. An alternative potentially useful strategy is to estimate changes in the quantity of researchers’ publications before and after receipt of an award (Trochim et al., 2008).
Interrupted time series has been proposed as a strong quasi-experimental design for obtaining such before-and-after estimates (Schneider et al., 2007; Shadish et al., 2002; Steiner et al., 2009). In this design, evidence of a causal impact comes from a change in the level or slope between the two lines estimated from the series of observations before and after the treatment. In other words, the causal hypothesis is that the series will exhibit an "interruption" at, or soon after, the time the treatment was delivered (Bloom, 2003; Steiner et al., 2009). In our analysis, the pretreatment and posttreatment series correspond to rates of publication observed before and after receiving a REESE award, and we examine whether receiving the award changes the level and slope of academic productivity. For each PI, we have an average of about 19 years of data before and 4 years after the REESE award. Although our analysis is strengthened by the many years of data prior to the receipt of REESE funding, this method cannot detect an impact of REESE that is delayed beyond our average 4-year postaward window. Barring such a lagged effect, our interrupted time series analysis provides a quasi-causal estimate of the impact of REESE on PIs' research productivity.
Method
As before, much of the data for this analysis was obtained from the 238 complete, up-to-date CVs we collected from the 323 REESE PIs in our sample (Dietz, 2004).
Sample and measures
For this interrupted time series analysis, we restricted the 238 PIs with complete CVs to those who received their REESE award before 2009 (n = 124), because at least 4 years of postaward publications were needed to describe the change in research productivity after the award. Unlike the prior comparison between REESE PIs and STEM doctorate recipients working in research positions, this analysis included PIs with education doctorates, PIs without a doctorate, and PIs working in nonacademic settings. That is, it relies on all PIs who received a REESE award in 2006, 2007, or 2008 and for whom we have a complete list of publications, regardless of highest degree attained, field of study, or employment position. As before, we used all peer-reviewed journal publications, not just those attributed to the REESE grant. Books and book chapters are also likely to be important publication outlets for REESE PIs; however, there is no agreed-upon method for combining them with journal articles into a single "composite" measure of productivity (Ramsden, 1994, p. 209), and our samples of books and book chapters are too small for separate analyses. Although perhaps not the ideal metric, peer-reviewed journal articles make up nearly two thirds of the publications by REESE PIs over the analysis period.
Data analysis
We estimated a random-intercept hierarchical linear model in which the dependent variable was the number of journal articles each REESE PI published in each calendar year of his or her career. Because years are nested within PIs, the model's standard errors are adjusted for the clustering of annual records within PIs. Covariates adjusted for productivity differences attributable to factors other than a REESE award: year of highest degree, receipt of funding from other federal agencies or programs, rank of the institution where the PI attained his or her highest degree, field of study of the highest degree, and sex. The model included a time variable centered on the year the PI received his or her first REESE award (coded 0); calendar years before and after the award were coded –1, –2, …, –40 and +1, +2, …, +7, respectively. Effects of time are therefore interpreted as mean annual changes in productivity over PIs' careers. A second variable, time after, captured annual changes in productivity after the award year: it was coded 0 for the award year and all prior years and incremented by 1 for each subsequent year (+1 to +7). Time after thus represents additional annual changes in productivity over and above the mean changes observed over the career trajectory, and therefore suggests productivity changes attributable to the award. Linear and quadratic terms were modeled for both time and time after, with the quadratic terms capturing nonlinear (accelerating or decelerating) effects of time on PIs' publication rates. The estimated model is
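$$Y_{ti} = b_0 + b_1 t + b_2 t^2 + b_3 \mathit{After}_{ti} + b_4 \mathit{After}_{ti}^2 + \sum_c b_c x_{ci} + u_i + r_{ti}$$

where $Y_{ti}$ is the number of journal articles PI $i$ published in year $t$, $i$ indexes REESE PIs, $t$ is time, $\mathit{After}$ is time after the REESE award, $\sum_c b_c x_{ci}$ are the covariates included in the model, $u_i$ is random variation between PIs that is not a function of time, and $r_{ti}$ is random year-to-year variation within PIs.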
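The coding of the two time variables can be made concrete with a short sketch. The helper `code_years` and its field names are hypothetical. It builds one record per calendar year for a single PI, with `time` centered on the first REESE award year and `after` held at 0 through the award year and counting up afterward, plus the squared terms the model uses. In practice these rows would be joined to each PI's annual article counts and covariates before fitting a random-intercept model, for example with statsmodels' `smf.mixedlm(...)` and `groups` set to a PI identifier. That choice of tooling is an assumption, not the software the authors report.

```python
def code_years(first_year, last_year, award_year):
    """One record per calendar year of a PI's career, carrying the
    interrupted-time-series variables described in the text."""
    rows = []
    for year in range(first_year, last_year + 1):
        time = year - award_year   # 0 in award year; -1, -2, ... before; +1, +2, ... after
        after = max(0, time)       # 0 through the award year, then 1, 2, ...
        rows.append({
            "year": year,
            "time": time,
            "time_sq": time ** 2,
            "after": after,
            "after_sq": after ** 2,
        })
    return rows


# Hypothetical PI: career data from 1990, first REESE award in 2007, data through 2012.
for row in code_years(1990, 2012, 2007)[-7:]:
    print(row)
```

Keeping `time` running through the postaward years while `after` starts at zero is what lets the `after` coefficients be read as a change relative to the preaward career trend, rather than as a separate trend.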
Results
Table 2 shows the coefficients for the random-intercept hierarchical linear model predicting number of journal publications and adjusting for covariates. As can be seen from these results, REESE PIs published, on average, 0.072 journal articles per year (SE = 0.014) over the course of their careers, with a significant quadratic increase in the publication rate (b = 0.001, SE = 0.0004). After receipt of the REESE award, the PIs’ publication rate increased by an additional 0.360 journal articles per year (SE = 0.131) to produce a postaward publication rate of 0.43 articles per year. This means that, on average, PIs publish one additional journal article over a 3-year cycle during the period after the award (0.36 × 3 years), as compared to the period before the award. PIs publish this additional article above and beyond what they would have published throughout their careers, and this additional article is associated with receiving a REESE award.
Table 2. Coefficients From Random-Intercept Hierarchical Linear Model Predicting Number of Journal Articles Published Per Year, PIs With a REESE Grant Awarded in 2006–2008
Note. PI = principal investigator; REESE = Research and Evaluation on Education in Science and Engineering.
The beneficial effects of the REESE award were robust to the inclusion of gender, receipt of federal research funding in addition to REESE, year and field of study of highest degree attained, and ranking of institution where this highest degree was attained.
We also explored whether the impact differed across subgroups of REESE PIs by examining whether changes in productivity differed for Faculty Early Career Development (CAREER) awardees. CAREER awards are specifically designed to give promising young scholars an early "boost" to help them establish academic careers. Building on the original multivariate model, we did not find a statistically significant interaction of award type (CAREER vs. other types) with time or time after (ps > .1). However, the large postaward difference in publications between CAREER and other REESE awardees, although nonsignificant in this underpowered analysis, suggests that the CAREER award may be particularly beneficial for academic productivity. This is evident in Figure 2: although CAREER awardees already had a higher publication trajectory before the award, their postaward increase in publication rate was greater than that of other REESE awardees. The majority of CAREER awardees are women (62%), and they appear to benefit the most from receiving a CAREER award (results not shown here). This increase in productivity likely reflects more than one process. On the one hand, the REESE program may be attracting particularly promising young female scholars who are on the cusp of receiving greater recognition for their scientific contributions. On the other hand, a CAREER award provides support at a critical time for young researchers, and this support may be especially beneficial for women, who often face greater competing demands early in their careers (Sabharwal, 2013). In any case, additional research is needed to replicate these findings in a larger sample.

Figure 2. Annual number of journal publications before and after receiving a REESE award, by type of award (CAREER versus other REESE awards)
Comparing the Impact of REESE Publication Outlets With Standard Education Research Journals
A third method of using bibliometric information for program evaluation examines the quality rather than the quantity of research output. Although there is no universally accepted system of journal ranking, more or less generally accepted rankings exist for publications in particular fields of study (Thomson Reuters, 2012; see Schneider, 2009, for a more extensive discussion of issues and concerns regarding the ranking of scientific publications). The most widely used measure of journal quality is the journal impact factor (JIF) published in the Thomson Reuters Journal Citation Reports (JCR), which was devised to provide a quantitative evaluation of leading journals by relating the citations a journal receives to the number of "citable" items it publishes (McVeigh & Mann, 2009). By aggregating articles' cited references, the impact factors published in the JCR provide a measure of research influence and impact at the journal and discipline levels (Thomson Reuters, 2008). Although we endorse the recent call by the San Francisco Declaration on Research Assessment (2013) not to use the JIF to "assess an individual scientist's contributions," we agree with this body of experts that "the peer-reviewed research paper will remain a central research output that informs research assessment," particularly for assessing the impact of entire programs of research, such as REESE.
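For reference, the standard two-year JIF for a given year divides the citations a journal receives in that year to items it published in the two preceding years by the number of citable items it published in those two years. A minimal sketch, with a hypothetical function name and made-up counts:

```python
def two_year_jif(cites_in_year, citable_items, year):
    """Two-year journal impact factor for `year`.

    cites_in_year[y]: citations received during `year` by items the journal
                      published in year y.
    citable_items[y]: number of citable items (e.g., articles, reviews)
                      the journal published in year y.
    """
    cites = cites_in_year.get(year - 1, 0) + cites_in_year.get(year - 2, 0)
    items = citable_items[year - 1] + citable_items[year - 2]
    return cites / items


# Hypothetical journal: 300 citations in 2010 to its 150 citable items from 2008-2009.
print(two_year_jif({2009: 120, 2008: 180}, {2009: 60, 2008: 90}, 2010))  # 2.0
```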
The main limitation of the JIF for this purpose is its high variability across academic disciplines, which may reflect disciplinary differences in citation practices and in the extent to which citations go to journals indexed by the JCR (Althouse et al., 2009). To the extent that the JCR excludes influential journals in certain disciplines, the JIF remains problematic. However, differences in citation practices have been partly addressed by the recent inclusion of the article influence (AI) score, whose normalization makes it more suitable than the JIF for comparing journals across disciplines. In the AI score, citations are also weighted: citations from heavily cited journals count for more than citations from less-cited journals, and citations from long reference lists count for less, all of which "would be expected to lessen or remove the large differences between fields that are evident in the Impact Factor" (Arendt, 2010). For this reason, we use the AI score in the analyses below.
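The weighting idea behind the AI score can be sketched as a PageRank-style power iteration on a journal citation matrix. This is a simplified illustration, not the JCR's production algorithm. The toy citation counts, the damping value of 0.85, and the function name `article_influence` are assumptions. Columns are normalized by each citing journal's total outgoing citations, so a citation from a long reference list counts for less. Journal influence is the stationary weight of the iteration, so citations from influential journals count for more. Per-article scores are then scaled so that the article-weighted mean across journals is 1.0, mirroring the normalization to a mean of 1.0.

```python
def article_influence(C, articles, alpha=0.85, iters=500):
    """Simplified, Eigenfactor-style per-article influence scores.

    C[i][j]: citations from journal j to journal i (self-citations assumed removed).
    articles[i]: articles journal i published in the citation window (must be > 0).
    """
    n = len(C)
    total = sum(articles)
    a = [x / total for x in articles]                          # each journal's share of articles
    out = [sum(C[i][j] for i in range(n)) for j in range(n)]   # reference-list length of journal j
    # Column-normalize: a citation from a long reference list carries less weight.
    H = [[C[i][j] / out[j] if out[j] else 0.0 for j in range(n)] for i in range(n)]
    pi = a[:]
    for _ in range(iters):
        # Journals that cite nothing spread their weight by article share.
        dangling = sum(pi[j] for j in range(n) if out[j] == 0)
        pi = [alpha * (sum(H[i][j] * pi[j] for j in range(n)) + dangling * a[i])
              + (1 - alpha) * a[i] for i in range(n)]
    # Citations received, each weighted by the influence of the citing journal.
    weighted = [sum(H[i][j] * pi[j] for j in range(n)) for i in range(n)]
    w_total = sum(weighted)
    # Per-article score, scaled so the article-weighted mean is 1.0.
    return [(w / w_total) / share for w, share in zip(weighted, a)]


# Toy data: journal 0 is cited heavily by journals 1 and 2 and publishes few articles.
C = [[0, 10, 2],
     [5, 0, 1],
     [1, 1, 0]]
articles = [50, 100, 200]
scores = article_influence(C, articles)
print([round(s, 2) for s in scores])
```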
Method
To assess quality, we focus here on publications that stem from the REESE award. To collect information on REESE-funded publications, ARC used a multistep process, beginning with electronic correspondence asking PIs to submit all published research products (journal articles, books, book chapters, and conference proceedings) that resulted from their NSF funding. Next, the ARC team used NSF's public FastLane system (http://www.nsf.gov/awardsearch/) to collect information on nonresponders and on publications from expired projects. Additionally, ARC searched Google Scholar for REESE publications using project award numbers, yielding a "virtual" response rate of 100%. This data set was then updated through March 2012, yielding data on a total of 402 projects, their investigators, and their products. Citation counts for each publication were obtained by searching titles in both Google Scholar and Web of Science. The AI score of each publication's journal was also recorded from the 2009 and 2010 editions of the JCR, where available.
Results
Of the 752 journal articles published by REESE projects funded from September 2006 to December 2011, 419 appear in journals indexed by the Web of Science, and 348 appear in journals with an AI score in the 2010 JCR. Averaged across these 348 REESE publications, the mean AI score is 2.33 (SD = 3.35, median = 1.44, range = 0.10–16.82), more than double the normalized mean score of 1.0 for all journals in the JCR. As shown in Figure 3, this difference is even greater when the average REESE AI score is compared with the influence scores of journals in which education researchers typically publish. For example, the JCR includes 139 journals in the Education and Educational Research category, of which 102 have AI scores; the average AI score for these 102 journals is 0.48. Similarly, the average AI score is 0.67 for 97 sociology journals and 0.92 for 444 psychology journals. This finding suggests that REESE projects are publishing in some of the most influential journals in their fields or are finding more influential publication outlets in other disciplines.
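The comparisons in this paragraph reduce to simple ratios that can be computed directly from the reported means. The sketch uses only figures given in the text; the dictionary labels are shorthand. Two caveats apply. First, the REESE mean averages over articles, whereas the category means average over journals, so the ratios are descriptive rather than formal tests. Second, only 348 of the 752 articles carry an AI score at all.

```python
REESE_MEAN_AI = 2.33   # mean AI across 348 REESE articles (2010 JCR)
BENCHMARKS = {         # mean AI of journals in each category, as reported
    "All JCR journals (normalized)": 1.00,
    "Education": 0.48,
    "Sociology": 0.67,
    "Psychology": 0.92,
}

# How many times larger the REESE mean is than each benchmark.
ratios = {name: round(REESE_MEAN_AI / mean, 2) for name, mean in BENCHMARKS.items()}

# Coverage of REESE articles by the citation databases.
articles, in_wos, with_ai = 752, 419, 348

print(ratios)
print(f"indexed in Web of Science: {in_wos / articles:.1%}")   # 55.7%
print(f"with an AI score:          {with_ai / articles:.1%}")  # 46.3%
```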

Figure 3. Impact of Research and Evaluation on Education in Science and Engineering (REESE) publications compared to Journal Citation Reports disciplinary averages
Notably, 22 REESE publications appear in journals with AI scores above 8.0. These include 12 publications in Science (the second-highest-ranked journal in the Web of Science’s Multidisciplinary Sciences category), 4 in Psychological Bulletin (the second-highest-ranked journal in the Psychology category), and 2 in Nature Reviews Neuroscience (the top-ranked journal in the Neurosciences category). In addition, 11 REESE publications appear in the Proceedings of the National Academy of Sciences, the third-highest-ranked journal in the Multidisciplinary Sciences category, and 10 appear in Educational Researcher, which has the highest impact factor of the 184 journals in the Education and Educational Research category. Overall, 96 REESE journal publications appear in journals with AI scores above 2.0. Consistent with the multidisciplinary goals of the REESE program, REESE projects appear to be publishing in influential disciplinary as well as multidisciplinary journals. These placements suggest that REESE projects are having a substantially higher-than-average impact on educational research throughout the academy.
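Tallies such as the 22 publications in journals with AI scores above 8.0 and the 96 above 2.0 amount to threshold counts over per-publication records. A minimal sketch, assuming each scored article is stored as a hypothetical (journal, AI score) pair; the function names and placeholder records are illustrative and do not come from the REESE data set. The comparisons are strict (greater than the threshold) to match “above 8.0” and “above 2.0.”

```python
from collections import Counter


def count_above(records, threshold):
    """Count publications whose journal AI score is strictly greater than threshold.

    records: iterable of (journal_name, ai_score) pairs, one pair per publication.
    """
    return sum(1 for _journal, ai in records if ai > threshold)


def journals_above(records, threshold):
    """Publications per journal, restricted to journals whose AI score exceeds threshold."""
    return Counter(journal for journal, ai in records if ai > threshold)


if __name__ == "__main__":
    # Hypothetical placeholder records; the journal names and scores are not real data.
    records = [
        ("Journal A", 9.0),
        ("Journal A", 9.0),
        ("Journal B", 3.1),
        ("Journal C", 0.7),
    ]
    print(count_above(records, 8.0))  # publications in journals with AI > 8.0
    print(count_above(records, 2.0))  # publications in journals with AI > 2.0
    print(journals_above(records, 8.0).most_common())  # per-journal breakdown
```

Grouping with Counter yields the per-outlet breakdown (how many high-AI articles appeared in each journal) in the same pass over the records.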
Discussion
The REESE program has been ongoing since 2006, and here we report, for the first time, an evaluation of its impact on research quantity and quality. In brief, three objective, replicable, and independent bibliometric indicators converge on the finding that REESE awards go to investigators who are already highly productive and that receiving an award is associated with still higher scientific productivity and impact. Results of the three separate analyses show, respectively, that REESE investigators (a) outperform a national population of scientists in equivalent fields in number of publications; (b) publish at a higher rate after receiving the award than before it; and (c) publish in higher-quality, more influential journals than their peers. However, each indicator has its own limitations: the inherent difficulty of accurately “matching” a particular group of NSF-funded researchers with a national sample of doctoral recipients spanning a range of STEM fields and occupations; the lack of a randomized control group with which to establish the causal role of a REESE award in increasing productivity; and the problems of using JIFs to evaluate the quality of publications, particularly across academic disciplines. Nevertheless, the three analyses point to a similar conclusion: REESE is a research program that attracts highly productive investigators and provides support that appears to make those investigators even more productive.
Of course, academic productivity is only one measure of program success. A larger challenge may be assessing program impact in the broader public sphere. In education, where so much of what is studied lies within that sphere, measuring impact beyond traditional academic products is critical. However, impact beyond publication output is not well assessed at present; assessing it would require a comprehensive framework encompassing a broader set of concepts, metrics, and analytic strategies. Potential measures of impact relevant to the public sphere include media coverage, provision of training, intervention development and implementation, commercial applications, and impact on legislation. Analytics, defined broadly as data-driven decision making, is one possible source of information on how to model the impact of educational research on teaching and learning (Van Barneveld, Arnold, & Campbell, 2012). However, using analytics for evaluation requires that relevant data be made publicly available in a convenient and timely fashion. For example, Google collects and disseminates information on issued patents, school boards routinely post the minutes of their meetings, and the Library of Congress has begun to archive websites and federal legislative records to preserve important historical materials for future generations of researchers. Such digital archives make it possible to go beyond bibliometrics and establish additional transparent, replicable benchmarks that follow impact from the academy through the halls of Congress and into the classroom. A broader range of metrics of this kind could not only promote more objective, evidence-based decisions about federal spending on education research but also help garner public support for STEM research programs such as REESE.
