Abstract
In this article, I review, comment upon, and assess some of the suggestions for evaluating scientific merit offered by contributors to this symposium. I ask the reader to take the perspective of the individual who has the final say in making a tenure, promotion, or hiring decision. I also ask that one imagine the difference between the fallible human state we are in on such an occasion and what it would be like to be omniscient when making such decisions. After adopting the terminology of “deep” and “surface” eminence, I consider what an omniscient being would take into account to determine eminence and to guide decision-making. After discussing how some proposed improvements in assessing merit might move us closer to wise decisions, I conclude by noting that both data and judgment are, and will continue to be, necessary. A clerk cannot determine eminence.
In his opening remarks for this symposium, Bob Sternberg confessed that going up for tenure may have been the second most stressful time in his life, only exceeded by the threat of death (Sternberg, 2016, this issue)! That is a dramatic indication of how important and how fraught such evaluations can be. He also remarks: “Evaluations today are almost certainly more valid than they were in the past” (p. 878). With one caveat, I definitely agree with that: I would substitute “can be more valid” for “are almost certainly more valid,” because the improvements in evaluation procedures that he and others in this symposium talk about are not universally used. But, of course, improvement does not mean “mission accomplished.”
Suppose you have a vote or, more dramatically, have the final say in a tenure decision. You review the materials and then have to make a yes or no choice. You’re in the world of a classic 2 × 2 decision table. On one dimension, we have what you would decide if you were omniscient; and on the other dimension, we have your actual, human, and possibly fallible, decision. As always, we’d like to live strictly on the main diagonal of such a table—deciding “yes” only when the person is or will truly be a hit and “no” only when he or she correctly should be rejected. But, of course, we don’t live in such a neat world. I was once a colleague of a highly influential and highly cited scientist (not in psychology) whose work was highly admired by others in his discipline and who, outside of medical schools, received the most Federal grants in the entire state. Earlier in his career he had been denied tenure at a major university in the University of California system. Most of us would conclude that UC had added an entry in its “miss” cell.
Let’s consider what you would take into account if you were omniscient and making a call about the research and scholarship contributions of our tenure candidate. Career-long eminence, or at least being eminent at the end of one’s career, would be the touchstone of a positive decision. (One nice thing about being omniscient is that seeing the future is no problem.) But what is eminence?
In this little thought experiment, I propose that there are two types: surface eminence and deep eminence. Deep eminence means that your omniscient self knows, and maybe future generations will also know (although that is not guaranteed), that the candidate is doing (or will do) work that moves some part of the discipline toward what we might metaphorically call “capital-T Truth.” But it’s pretty likely at this time that you and I, as well as the outside letter writers, don’t really know whether the candidate will make such contributions. We must assess whether the candidate is moving, or seems to have a good chance to move, the subfield forward on the basis of a small sample—his or her past and current work—where “forward” is based purely on one’s opinion of the direction the field should move. And that opinion of what I’m calling surface eminence constitutes the basis for our mere-mortal judgment of “tenurability.” Such surface eminence is all we have—though we might take solace from something my mother once said: “What else can you scratch but the surface?”
Our contributors to this symposium have critically and usefully analyzed the surface indicators and they have proposed additions to them. I consider them attempts to sharpen and extend the suggestions from Sternberg and Gordeeva (1996) as summarized in Sternberg’s Introduction (2016): “that high-impact work has six characteristics: quality of presentation, theoretical significance, practical significance, substantive interest, methodological interest, and value for future research” (p. 878).
One big, highly positive change over the past half century was also pointed out by Bob Sternberg: Many sources of explicit, person-centered bias (e.g., the candidate’s gender or ethnicity) have been wrung out of the system or at least very largely reduced. Years ago, a then senior scholar and member of the National Academy of Sciences told me that, early in his career, he was kept out of an Ivy League school because of its Jewish quota. Although Eagly and Miller (2016, this issue) are skeptical that these biases are completely gone, they do note that the current dominance of men on lists of eminence “may be a vestige of the earlier era of female exclusion.” They go on to examine whether commonly used surface indices of eminence may be biased against women. After a review of the evidence, they do not come to a clear conclusion that such biases still exist. Of course, this and related topics will (and should) remain subjects of continued interest and investigation.
In his contribution, Diener (2016, this issue) discusses two approaches for assessing eminence. In one, he proposes to elevate the key unit of eminence from the individual to the academic department. He argues that it will be more efficient overall if we focus on that level—though he notes that the department-level measure depends upon the status of its individual members. Diener suggests that the discipline will progress more rapidly if the most highly productive individuals are allowed to be even more productive, which will occur by further unburdening these worthies from their teaching responsibilities.
I have three quick reactions to this point. One is that, to a considerable extent, his proposal has already been adopted. The typical teaching assignments in research universities are very substantially lower than they were in the days when modern experimental psychology took off. And it is not unusual to see less productive scholars with teaching assignments that involve, say, larger sections of undergraduates, as well as carrying out other service activities. Second, in many places it is still possible for successful grantees to “buy out” some of their teaching time. By definition, these are members of the publishing crew or they would not have the grant money that allows this exchange. And third, and most importantly, let’s revisit what a university is for. One of its primary goals is to develop the human capital of society. In order to keep faith with the funders of (at least the public) universities, we should be leery of allowing that mission to slip too low in our goal hierarchy. I find Alfred North Whitehead’s (1929) aspirational description of the primary reason for these great institutions to be quite compelling: The universities are schools of education and schools of research. But the primary reason for their existence is not to be found in either the mere knowledge conveyed or in the mere opportunities for research afforded members of the faculty. . . . The justification for a university is that it preserves the connection between knowledge and the zest for life by uniting the young and the old in the imaginative consideration of learning. (p. 97)
Diener also wisely suggests we keep in mind that “In order to produce high-performance work systems, we need to coordinate selection with continuing education, performance appraisals, tracking of talent, and rewards and compensation for excellence” (p. 910). He describes the successful application of this approach at the University of Illinois, home to a highly ranked psychology department. I’ll return to the role of assessment in fostering the careers of faculty members. However, in terms of my imaginary assignment that you be the final decision maker on tenure cases, let’s revisit our inquiry about the proper bases for these performance appraisals and the subsequent tenure decision, and their connection to eminence.
The h index receives a lot of attention in this symposium, especially from Ruscio (2016, this issue), who thinks it can be massaged in such a way that it deserves at least two cheers, and from Simonton (2016, this issue), whose cheers are more muted. As a reminder, h is the largest number such that an individual scholar has h publications each cited at least h times. So if a person with 50 publications has 20 that have each been cited 20 times or more, and the remaining 30 have fewer citations than that, that person’s h index is 20. The claim is that h simultaneously combines an assessment of productivity and impact. You can’t have an h index of 20 if you’ve only published 15 papers, and you can’t be a 20 if only 5 of your papers have been cited 20 times or more.
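For readers who want to see the definition in operational form, here is a minimal sketch of the h computation. The function name and the citation counts are hypothetical illustrations of the 50-publication example above, not data from any contributor to the symposium.

```python
def h_index(citations):
    """Return the largest h such that h papers each have at least h citations."""
    # Rank papers from most cited to least cited.
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank      # the top `rank` papers all have at least `rank` citations
        else:
            break         # every later paper has even fewer citations
    return h


# Hypothetical scholar: 50 publications, 20 of them cited 20 times or more.
example = [25] * 20 + [3] * 30
print(h_index(example))          # 20

# Productivity limit: 15 papers cannot yield an h of 20, however well cited.
print(h_index([100] * 15))       # 15

# Impact limit: only 5 papers cited 20 times or more.
print(h_index([30] * 5 + [2] * 45))  # 5
```

The last two calls correspond to the two limiting cases just described: h is capped both by how much one has published and by how often that work is cited.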
Ruscio argues that, if we are careful, we can make a big improvement in the simple h index. Furthermore, that improvement can help us assess our tenure-seeking candidates and, more generally, assess surface eminence at any point in anyone’s career. The key insight is the instruction to relativize the individual’s h index by comparing it to others at peer institutions (a) at the same point in their careers, and (b) in the same subdiscipline. Ruscio reminds us that h is a measure of cumulative productivity and impact and therefore is expected to increase over one’s career—hence the importance of (a). And (b) helps because, for understandable reasons, the average frequency of publishing can vary across subfields. Ruscio also notes that the database we interrogate must comprehensively cover the outlets in which our candidate and his or her cohorts might publish, and the searches must be carried out at the same point in time.
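One way to picture what “relativizing” might look like in practice is the sketch below. It is an assumption of mine, not Ruscio’s actual procedure: it expresses a candidate’s h as a standard score against peers in the same subdiscipline who are at roughly the same career point. The function name, the record fields, the one-year window, and the peer values are all hypothetical.

```python
from statistics import mean, stdev


def relative_h(candidate_h, peers, subfield, years_since_phd, window=1):
    """Standard score of candidate_h within a matched peer cohort.

    Peers are matched on (a) career stage, within `window` years, and
    (b) subdiscipline. Returns None when the cohort is too small or has
    no variation, since a standard score is then undefined.
    """
    cohort = [
        p["h"]
        for p in peers
        if p["subfield"] == subfield
        and abs(p["years_since_phd"] - years_since_phd) <= window
    ]
    if len(cohort) < 2:
        return None
    spread = stdev(cohort)
    if spread == 0:
        return None
    return (candidate_h - mean(cohort)) / spread


# Hypothetical peer records, all assumed to be drawn from the same
# database at the same point in time.
peers = [
    {"h": 4, "subfield": "social", "years_since_phd": 6},
    {"h": 6, "subfield": "social", "years_since_phd": 6},
    {"h": 8, "subfield": "social", "years_since_phd": 7},
    {"h": 15, "subfield": "neuroscience", "years_since_phd": 6},  # excluded by (b)
    {"h": 30, "subfield": "social", "years_since_phd": 25},       # excluded by (a)
]

print(relative_h(10, peers, "social", 6))  # 2.0
```

The point of the sketch is only that the same raw h can look very different depending on the comparison group; the choice of peer institutions, window, and database remains a matter of judgment.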
Along with a thoughtful discussion of impact metrics, Nosek et al. (2010, pp. 1288–1289) provide an example of this procedure. They plot cumulative citations against years since obtaining the Ph.D. for faculty in social-personality programs in all U.S. and Canada Ph.D. granting institutions. Their Figure 2 limits the range of time since the Ph.D. to 10 years—the important part of the curve for tenure decisions. Upon inspection, the average regression line grows slowly over that range, as we would expect given both that these are beginning faculty members and the restriction of range relative to the total career data shown in their Figure 1. However, they report that early career award winners were almost 1 SD (.88) above the cumulative impact for that sample. In addition, the slope of the curve representing +1 SD is considerably steeper and is clearly affected by a small number of outliers.
Citation data thus appear to be somewhat predictive of early prominence, at least at the extremes of the citation distribution. It will take longitudinal studies to actually see how predictive these early data are. By necessity they constitute a relatively small sample, though it is the sample that the tenure decision makers must work with. There will be a problem with longitudinal studies though. If a person does not get tenure, then the likelihood is that the next job will, on average, be at an institution with fewer resources in support of research. So a “no” may appear to be a correct rejection, but the prime reason may be in the environment rather than in the person.
As mentioned, aside from scientific productivity, Simonton (2016) is not yet convinced about the h index, particularly “when it is necessary to make fine distinctions among researchers for purposes of hiring, promotions, or awards . . .” (p. 889). He suggests that other indices may discriminate better. More seriously, he has found that the relationships between predictor and criterion variables “tend to be small to moderate—certainly not large enough to make very fine discriminations among scientists” (p. 889). Simonton also notes that there is more consistency in judging important contributions in the “hard” sciences than in psychology. However, relativizing by subdiscipline as advocated by Ruscio (2016) and Nosek et al. (2010) should help avoid that problem. In general, using data that have more than just face validity is more helpful than not using them.
Simonton (2016) also raises the problem of interjudge reliability among journal advisory editors (and, for that matter, perhaps intrajudge reliability across time for the editor). That can affect the number of papers an individual gets published in high impact journals. In the end, he thinks that the best indicators of surface eminence have insufficiently high predictive validities and interjudge reliabilities to accurately do the job. Finally, and somewhat depressingly, he asks, “what’s wrong with posthumous eminence . . . ?” (p. 891).
Roediger (2016, this issue) has a seemingly even more depressing answer to that question: Except for a very, very few extraordinarily transformative people, most researchers should just forget about posthumous recognition. Those future ingrates are going to forget about you. I’ve been privileged to have a small connection with two really incredible intellects (among others). When I was a postdoc, I sat in on a course taught by Noam Chomsky and he was a guest in my home about 5 years later. He’ll make it for 100 years from now or more. I also took courses from and had some other cool interactions with Paul Meehl (1920–2003), whom I consider the smartest psychologist I’ve ever seen in action. (He was also a professor of philosophy, neurology, psychiatry, and law; and I think the only person trained in clinical psychology who has ever been elected to the National Academy of Sciences.) His early conjecture that schizophrenia has a genetic component is now so commonplace as to be “obvious,” and I suspect that few except historians will soon cite that conjecture. However, his critique of null hypothesis testing is gaining ground again, so maybe he will be known for quite a while. But given Roediger’s data, I don’t think he will make it for 100 years. Paraphrasing Roediger, if Meehl is forgotten, so will you and I be.
But that’s okay. For one thing, I think that the ideas of, for example, clinical versus statistical prediction and the hypothetical construct are likely to be around for a very long time—the latter is a contribution from Cronbach and Meehl—so many individuals move the field forward and live on via their work. Thus, deep eminence can and usually does come without a name on the label, as it were.
Given that, Simonton’s point is not so depressing after all. And anyway, name-labeled posthumous eminence just won’t be as much fun as great dinner conversations at the meetings, colloquium interactions, polite and occasionally tense interactions with journal editors, invitations to write a handbook or Annual Review chapter, to be on an editorial board, or just the quiet activity of trying to get one’s thoughts straight and to write them down. Posthumous fame is not as interesting as trying to figure out which way is forward in your subdiscipline and how your work can nudge it in that direction. This perspective is a variation on Feist’s view (2016, this issue) that both extrinsic and intrinsic motives guide the working scientist. Also, we should remember to use our assessment attempts as occasions to provide feedback and advice to our colleagues.
To conclude, I want to tie these points to the hiring and the tenure and promotion decisions that may weigh heavily on you, no matter whether you are the decision maker or the one whom that decision most directly affects. My best estimate is that I’ve read and acted upon somewhere between 350 and 400 promotion binders and have interviewed and participated in hiring about half that number of junior faculty. Not being omniscient, I’ve lost sleep over quite a few of my decisions. However, given the contributions of the symposium contributors, along with the experiential basis just mentioned, here is what I look for in the research vector of a job candidate:
Some publications—though, after an entry number, the impact on me is not primarily driven by quantity but more by my assessment of the quality of the work. That is a tough evaluation to make when the topic is outside my areas of expertise, of course, so I am influenced by the reputation of the journals and, indirectly, by the opinion of their editorial boards.
I know that a job talk is a very small sample of behavior, but I value it as an indicator of teaching ability and, importantly, as an indicator of the candidate’s “take” on the big problem and where his or her work fits within it. I want to know what the candidate thinks the big issue(s) are and where he or she thinks progress can be made. If someone doesn’t mention those points in the job talk, I will ask him or her, often in private, “What is the big problem you care about? Six years from now what will you be known for as your contribution to working on that problem?” I believe that people who have an answer to that question have their eye on the big picture, will more likely have their work cited, and have a good shot to be on the path to eminence.
And here is what I think one looks for in the research section of a tenure binder:
Visibility “out of town,” as indexed by publications and citations, and I would now ask science departments to provide something akin to a relativized h index. A successful grant proposal is a big plus, as would be an invitation to join the editorial board for a journal in the subdiscipline.
Most of all, is the work contributing to what seems a reasonable answer to an important question? Do judicious outside letter writers make the case for a positive answer to that question?
We cannot avoid making judgments. Though appropriate counting is a start, we have to “weigh” things as well as count them. The contributors to this symposium have helped to provide judicious suggestions and advice about how to proceed, including suggestions for improving what we count. We should, of course, strive to continue to improve so we can increase the chance that our decisions primarily are “hits” and “correct rejections.”
Footnotes
Declaration of Conflicting Interests
The author declared no conflicts of interest with respect to the authorship or the publication of this article.
