Abstract
This study addresses validity issues in evaluation that stem from Ernest R. House’s book, Evaluating With Validity. The authors examine American Journal of Evaluation articles from 1980 to 2010 that report the results of policy and program evaluations. The authors classify these evaluations according to House’s “major approaches” typology (Systems Analysis, Behavioral Objectives, Decision Making, Goal-free, Professional Review, Art Criticism, Quasi-legal, and Case Study) and the types of validity (measurement, design, interpretation, use) the evaluations consider. Analyzing the intersection of evaluation type and validity type, the authors explore the status of House’s standards of Truth, Beauty, and Justice in evaluation practice.
The passing of three decades since the publication of Ernest R. House’s Evaluating With Validity (1980) invites reflection on the implications of his insights for current evaluation research and practice. Not just the passing of time but the centrality to the field of evaluation of the issues that House raised warrant focus on his work. For example, American Evaluation Association President Leslie Cooksy designated the 2010 annual conference theme as “Evaluation Quality,” with House’s (1980) book as the “starting point for our exploration and discussion of this theme” (Cooksy, 2010, 6). With the three standards for quality of “truth, beauty, and justice,” Cooksy asked conference applicants and attendees to consider: “How do our evaluations embrace and inform truth? beauty? justice?” and “How do we balance dimensions of evaluation quality when they seem in opposition to one another?” among other questions (Cooksy, 2010, 6–7). The conference program included 12 sessions and round tables based on these issues, and an additional 10 papers that explicitly considered truth, beauty, and justice as evaluation standards, standards that House originated 30 years earlier.
The primary question addressed in this study is “to what extent has House’s (1980) work been relevant to evaluation research that followed?” Specifically, we consider how House’s framework has been reflected in the American Journal of Evaluation (AJE), our field’s main journal, over the past 30 years. What are the interactions between House’s evaluation types and validity types? And what are the implications of the observed interaction for evaluation research and practice?
In brief, we find that all of House’s evaluation types appear among AJE-published evaluations, though some categories are more prevalent than others. Also, evaluation types based on objectivist epistemology are more likely to consider House’s “truth” dimension of validity (i.e., content, internal, or external validity) and are less concerned with the “beauty” dimension (i.e., consequential or communicative validity). Contrary to our expectation, however, validity as defined in House’s concept of beauty is addressed in all types of evaluation, whereas validity as “justice” (i.e., pragmatic validity) appears in only some evaluation types.
This article proceeds as follows. We first discuss House’s (1980) standards of truth, beauty, and justice. We define the multiple meanings of validity and our use of these concepts for understanding evaluation. We then review House’s categories of evaluation. Against this background, we propose our research hypotheses, describe our methods, and present our findings, discussing what they reveal about House’s work through the lens of published evaluation research.
Truth, Beauty, and Justice
House (1980) considers truth, beauty, and justice as standards for evaluation. To understand this framework, consider the three strata of evaluation that House laid out based on the relationship between an evaluator and a decision maker (pp. 19–20). If the evaluator and decision maker are the same person, then selecting a set of standards is relatively easy. The evaluator can apply his or her standards of evaluation and exercise judgment as the decision maker. While it can be challenging to carry out this judgment, those who evaluate are also the ones making decisions based on the evaluation. If the evaluator and decision maker are different people, the evaluator’s credibility can be an issue. The decision maker must consider whether the facts from the evaluation are correct as well as whether the evaluator is truthful. Here, communication and credibility are essential, reflecting a relationship between the two parties involved. When a public program or policy is the object of evaluation, the evaluation holds collective value. Decision makers may include a large number of interested parties. In addition to truthfulness and credibility, evaluation must consider justice for stakeholders. Truthfulness, credibility (beauty), and justice are standards for evaluation that House (1980) addresses.
According to House, what “truth” means in evaluation varies from its use in traditional research. Truth in evaluation moves away from convincing people with proofs toward persuading them with multiple reasons and evidence. This view goes beyond seeking certainty of knowledge pursued within the intellectual tradition of Descartes and Mill. Because evidence is often equivocal, evaluation functions as persuasion rather than truth-seeking. Even if evaluation cannot produce definite truth, “it can provide the credible, the plausible, and the probable” evidence (House, 1980, p. 72). Evaluation results may not be certain but can still be useful and open to argumentation. Because an argument that works for one kind of audience does not necessarily resonate with others, House argues, the audience for evaluation needs to be narrowed from a universal audience to a more particular one.
Whereas truth concerns logic, beauty as an evaluation standard considers coherence and credibility (House, 1980). House considers a story as the basic underlying structure of an evaluation. Therefore, at a minimum, evaluation must tell a coherent story. The story’s content and presentation play an important role in persuading readers. They will interpret an evaluation story, with this interpretation varying from reader to reader (House, 1980, p. 102). If a reader finds the story credible, he or she is more likely to value the evaluation result and subsequently take related action. To be credible, an evaluation story must be coherent with explicit and implicit sequences of events. While the persuasiveness of the evaluation may largely rely on explicit logical argumentation, the coherence of the story can reinforce the argument’s persuasiveness and improve its credibility.
As a final standard for evaluation, House proposes the concept of justice within the context of three philosophical traditions: utilitarianism, pluralism/institutionalism, and Rawls’ justice-as-fairness. In House’s view, “none of these dominant theories of justice is entirely satisfactory as a base for evaluation” (p. 134). Although he regards justice-as-fairness as superior to utilitarianism as a theory of justice, he prefers a pluralist/institutionalist conception (p. 135). In addition, “an appropriate theory of justice would take into consideration the values of moral equality, moral autonomy, impartiality, and reciprocity, as well as the aggregative principle of utility” (pp. 135–136). The evaluator’s theory of justice operates implicitly and subtly rather than determining the type of evaluation that the evaluator undertakes.
Validity Typology
Although House discusses validity only with regard to truth and logic (see pp. 85–95), we assert that his three standards can be reinterpreted more generally as types of “validity” in evaluation, per the title of his book. The concepts of validity in evaluation are rooted in Campbell and Stanley (1966), Cook and Campbell (1979), and Cronbach (1982), whose subsequent influence has been crucial (e.g., Chen, Donaldson, & Mark, 2011). Most broadly, validity refers to whether one measures what one intends to measure, given the assumption of an objective reality. The notion of validating evaluation through correspondence with such a reality can be narrow or misleading from a subjectivist perspective (House, 1980; Kvale, 1995). In comparison to “truth” alone, the concept of “beauty” is proposed as a standard for evaluation that involves validity defined by the interpretive characteristics of stories. Finally, in a pragmatist tradition, “justice” is an evaluation standard that considers the extent to which knowledge connects to action. To validate evaluation results is to act with some beneficial intent. Defining “how we know” in different philosophical perspectives connects foundational discussions of validity to House’s consideration of beauty and justice as standards for evaluation and to evolving analyses of validity in evaluation (Chen et al., 2011; Kane, 1992; Mark, 2011; Scriven, 1993; Shapiro, 1989).
One way to frame the relationships among the various concepts of validity is presented in Figure 1. Here, we make a distinction between the theoretical and the empirical, and among the measurement, design, interpretation, and evaluation-use definitions of validity. The framework is largely based on Cronbach’s validity framework, which includes the elements of inference that one draws from research (Conrad & Conrad, 1994; Mark, 1986), and which we assume is familiar to evaluators. We extend Cronbach’s validity framework, embracing the different epistemological perspectives such as the role of interpretation and use in evaluation.

Figure 1. Concepts of validity in evaluation.
As shown in Figure 1, we identify dimensions of measurement, design, interpretation, and use and how they relate to one another. For example, construct validity is a kind of measurement validity that draws on connections between theoretical conceptualization and empirical observation. The elements within the box of Figure 1 focus on measurement and design validity and their implications, whereas those outside it concern generalization and the extension of ideas into the practical or applied arena.
Within the measurement, design, interpretation, and use categories, the following are specific definitions of validity types that are common in research and evaluation.
Measurement validity
Within the class of measurement validity are what others commonly refer to as face validity, content validity, construct validity, and, in some cases, predictive validity (Conrad & Conrad, 1994; Gaber & Gaber, 2010).
Face validity is “defined as a post facto assessment” of whether a measure “appears to make sense without having to give, or expecting to hear, detailed reasons” (Gaber & Gaber, 2010, p. 138). It considers, “on the face of things, does this research make sense?” (Gaber & Gaber, 2010, p. 139).
Content validity deals with “the representativeness or sampling adequacy of the content—the substance, the matter, the topics—of a measuring instrument” (Gaber & Gaber, 2010, p. 139).
Construct validity, also called theoretical validity, examines “the representativeness of the relationships in the identified variables” with regard to known relationships in other, similar research (Gaber & Gaber, 2010, p. 139). In other words, “construct validity deals with the relationships between the measurement process or operations and the theoretical variables or constructs” (Chen & Rossi, 1987, p. 99).
Predictive validity, related to construct validity, refers to measures having “statistical correlation …, indicating correspondence between test results and some external criteria” (Kvale, 1995, p. 22).
For simplicity, we will use the term content validity to capture this whole class of measurement-related validity, which reflects a component of what House (1980) called “truth.”
Design validity
In evaluation, internal validity refers to the ability of an evaluation design to support causal inference, and we group within this the notion of statistical conclusion validity. External validity refers to the generalizability of an evaluation result to larger populations, settings, and treatments. These validity types are developed from the psychometric or statistics tradition (Mark, 1986) and introduced to assure the “truth” of evaluation results.
We posit that, together, the measurement and design validity types described here are akin to House’s evaluation standard of truth.
Interpretation
As discussed in Kvale (1995), validity is not limited to measurement quality and evaluation design strength but also includes interpretation of evaluation results. Evaluations that emphasize interpretation of results may still study the relationship between treatments and outcomes even if they are not quantitative. Qualitative evaluation approaches might be more likely to consider these issues of interpretation, involving consequential or communicative validity.
Consequential validity examines the social implication of evaluation findings (Kane, 2006). It addresses intended or unintended consequences of measurement and assessment. This kind of validity inevitably involves stakeholders to balance and resolve various interests, opinions, and values of evaluation (Brandon, Lindberg, & Wang, 1993; House, 1993).
Communicative validity refers to “the validity of knowledge claims in a dialogue,” involving “a conversation about the social reality: What is a valid observation is decided through the argumentation of the participants in a discourse” (Kvale, 1995, p. 30). Validity may be established through a practice similar to judicial procedures in courts.
These two types of validity, under the umbrella of interpretation validity, can serve as a proxy for House’s beauty standard.
Use
Kvale (1995) suggests that validity can also refer to the use of evaluation results. Whereas other fields of research may be less interested in use, evaluation’s applied, change-oriented nature makes use more central (Patton, 1997). We align use with the concept of pragmatic validity, which emphasizes knowledge as an action. “Whereas communicative validity includes an aesthetic dimension, pragmatic validation is verification in the literal sense—‘to make true’” (Kvale, 1995, p. 32). Thus, “it goes further than communication; it represents a stronger knowledge claim than a mere agreement through a dialogue” (Kvale, 1995, p. 33). The two types of pragmatic validation are (1) “whether a knowledge statement is accompanied by action” and (2) “whether it instigates changes of action” (Kvale, 1995, p. 34). In the pragmatic approach, “truth is whatever assists us to take actions that produce the desired results” (Kvale, 1995, p. 35). The ideal in any theory of justice may actually be achieved when an action is taken for the desired result, not simply when a consensus is achieved in a dialogue. We argue that pragmatic validity represents one embodiment of justice as a standard for evaluation: to bring the desired result to society for disadvantaged groups or for the practitioners themselves.
Thus, we propose grouping validity into the three main classes described above: validity that pertains to measurement or evaluation design (truth), validity that pertains to interpretation of a story for others and society (beauty), and validity that pertains to an evaluation’s use (justice). We collapse face validity, content validity, and predictive validity into content validity, given that they are all concerned with accuracy in the measurement of constructs. Table 1 compresses and aligns these categories of validity with House’s concepts of truth, beauty, and justice.
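The grouping just described can be written as a simple lookup. The sketch below is only an illustration: the dictionary name, the function name, and the lowercase labels are assumptions introduced here and are not part of the authors’ coding instrument.

```python
# Illustrative lookup from the six coded validity types to House's three
# standards. All names and labels here are hypothetical, for illustration only.
VALIDITY_TO_STANDARD = {
    "content": "truth",          # stands in for the measurement-related class
    "internal": "truth",         # design validity (causal inference)
    "external": "truth",         # design validity (generalizability)
    "consequential": "beauty",   # interpretation validity
    "communicative": "beauty",   # interpretation validity
    "pragmatic": "justice",      # use validity
}


def standards_for(validity_types):
    """Return the set of House's standards touched by the given validity types."""
    return {VALIDITY_TO_STANDARD[v] for v in validity_types}


# An evaluation coded for internal and communicative validity touches two standards.
print(sorted(standards_for(["internal", "communicative"])))
```

A set is used for the result because an evaluation that addresses several validity types within one standard should count toward that standard only once.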
Table 1. Alignment of Concepts of Validity With House’s Truth, Beauty, and Justice
Evaluation Typology
In his categorization, House (1980) proposed eight types of evaluation: Systems Analysis, Behavioral Objectives, Decision Making, Goal-free, Professional Review, Art Criticism, Quasi-Legal, and Case Study. According to House, each of these types poses different questions, results in different types of outcomes, and varies in its epistemological underpinning. Table 2, adapted from House, summarizes his typology and underlying key questions and outcomes. We also include the epistemological orientation House identified for each evaluation type (objective or subjective).
Table 2. House’s Major Evaluation Approaches

Figure 2. Yearly distribution of articles, 1980–2010.
As shown in Table 2, the first four of House’s evaluation types stem from an objectivist perspective, whereas the last four are subjectivist. According to House, in the objectivist tradition, knowledge is “the pooled sum of individual observations” (1980, p. 51). These are the evaluation types that draw on the so-called objective technician and that technician’s point of view to draw conclusions about program effectiveness.
The first of these objectivist evaluation types, Systems Analysis, is what we might think of as a traditional impact or efficiency analysis, including cost–benefit analysis. House referred to the Planning, Programming, and Budgeting System as an example of Systems Analysis and to Rivlin’s Systematic Thinking for Social Action (1971) and Rossi, Freeman, and Wright’s Evaluation: A Systematic Approach (1979) as books that espouse the systems analysis approach.
Second, Behavioral Objectives evaluation is concerned with productivity. The evaluator examines the extent to which the program is achieving its intended objectives as stated in its goal statements. House offers an example of this type of evaluation as management-by-objectives. We interpret this type of evaluation as exploring the processes and implementation procedures associated with program “success.”
Third, the Decision-Making approach considers utility maximizing, effectiveness, and quality control. This approach advocates that “the evaluation must be structured by the actual decision to be made” (House, 1980, p. 28). Decision makers’ values, opinions, and criteria are used to determine the success of a program, assuming that such an inclusion will increase use of the evaluation’s results. Often, surveys and interviews, with questions targeted at decision makers’ information needs, are used to gather data to determine effectiveness of a program or parts of a program.
The fourth type of objectivist evaluation, Goal-free evaluation, considers program effects, but does not judge only a subset of outcomes; instead, it considers all possible outcomes, whether intended or unintended, without weighting any as more important than others. Inspired by Scriven’s (1993) belief that evaluators should remain uninformed of program goals so as not to become biased by the goals, this approach seeks to determine the needs of the program’s target groups versus the needs of the program developers (House, 1980). House notes that there are few applied examples of this approach in the social sciences.
Subjectivist epistemology, characterizing the last four approaches, is “less interested in arriving at a proposition that is ‘true’ (in the generalizable sense) than in relating the evaluation to a particular experience of the audience” (House, 1980, p. 56). From this perspective, improving the understanding of particular stakeholders is an evaluation goal; therefore, it requires active participation of individuals in the evaluation process. For example, House’s Professional Review might take the form of an accreditation review and considers professional acceptance, as in a scorecard or benchmarking process, to be a measure of success. This approach assumes that professionals are best qualified to assess programmatic merits. The institutions to be accredited often form committees and task groups to gather materials, meet with reviewers, and maintain an active role in the evaluation process.
Next, House’s Art Criticism is essentially a critical review that questions whether a program would be approved by a connoisseur or whether the audience’s appreciation would be increased. Based on Eisner’s (1979) application of the methods of traditional art critics to educational programs, this approach treats familiarity with the program as crucial. It assumes that evaluation through criticism will eventually improve standards and performance.
Another subjectivist approach, Quasi-Legal evaluation, is an adversary approach that has been used to evaluate programs with a panel serving as judges and testimony presented as one would see in a trial. In some cases, individuals are appointed as judges to make decisions, while in others a mock jury is established. While the procedures may vary, Quasi-Legal evaluation centers on evidence for and against a program, with a one-sided resolution being the expected outcome.
Finally, House describes a Case Study evaluation as one that considers diversity of perspectives in judging mostly formative questions. This approach is qualitative in methodology and presentation. Its purpose is to improve the understanding of the reader or audience of the evaluand. Unlike Art Criticism, which relies on the experiences and knowledge of the evaluator, the case study evaluator gathers observations of a program from as many perspectives as possible, including his or her own.
In brief, House categorizes evaluations based on the questions they ask, which also tend to relate to their epistemological approach. A few of House’s categories may not be intuitive and can be open to interpretation, especially those that rest on the subjectivist epistemology.
Research Hypotheses
An inevitable connection exists between how we know (i.e., epistemology) and what is legitimate (i.e., validity). House discusses truth as “an ideal that can only be approximated through an interplay of introspection and public verification” (1980, p. 88) and contends that in evaluation truth is often conflated with objective knowledge. Objectivity in evaluation is the notion that the evaluators are removed from the process, making results the product of instruments and procedures rather than intersubjectivity. This view of evaluation leads to our first hypothesis: Evaluations based on an objective epistemology will be more likely to consider validity types that align with House’s truth dimension. Therefore, these evaluations will focus on internal, external, or content validity, including the analysis and estimation of program effects.
House argues that “imagery, dramatic structure, and mode of presentation are central considerations for the import of an evaluation” (1980, p. 100). The subjective interpretation of events sets the tone and structure of an evaluation as well as the interpretation of evaluation results by evaluators and audience. As such, we hypothesize that evaluations based on a subjectivist epistemology will be more likely to consider validity types that align with House’s beauty dimension. These will be more interested in the people involved in the process and understanding varying interpretations of program effects by various stakeholders.
Finally, validity claims pertaining to House’s concept of justice may appear in evaluations of both objective and subjective epistemological types, but in different ways. From a more objective view, a set standard of measures can be used and decisions on what to do can be based strictly on those measures, rather than on unintended findings or consequences of the evaluation. From a subjectivist perspective, House notes that “What the evaluator believes is right and the prevailing conception of justice significantly affect the evaluation” (1980, p. 120). To that end, evaluators can interpret results to influence decisions based on personal beliefs and the beliefs of those involved in the evaluation. Thus, we expect that more objectivist evaluations will consider issues of justice secondarily to issues of truth, reflecting an assumption that truth concerns drive justice concerns. In contrast, we expect that more subjective evaluations are likelier to accord primacy to justice.
Methodology
Following Christie and Fleisher’s (2010) definition of what makes an article an “evaluation,” we selected those articles that “describe the evaluand, the methods used, and the results” (p. 329). This definition excludes articles that consider theory or conceptual issues in evaluation or pertain solely to teaching or the process of conducting evaluation. Although many other sources for evaluation reports exist, we drew exclusively from AJE because it is the journal of the evaluation field’s professional association.
We began by reviewing the abstracts of articles from 1994 forward, resulting in 139 candidates for our list. AJE did not publish abstracts from 1980 to 1993, so we scanned those articles to determine which ones reported the results of evaluations. This generated another 42 candidates. After reading each article in more detail and coding for selected characteristics, we determined that some of the initially selected articles did not fully meet our criteria for being actual evaluations. For example, in some studies, the evaluation results were not discussed sufficiently to warrant the study’s inclusion as an evaluation per se. Ultimately, our sample included 106 articles drawn from all of the work published in AJE from 1980 (Volume 1, Issue 1) through 2010 (Volume 31, Issue 3). Figure 2 shows the distribution of articles by publication year.
Our coding documented the following characteristics of the evaluation detailed in each article:
What is the type of evaluation, according to House’s (1980) typology: Systems Analysis, Behavioral Objectives, Decision Making, Goal-Free; Art Criticism, Professional Review, Quasi-Legal, or Case Study?
What issues pertaining to validity are raised and/or what kinds of validity are relevant: Content validity, Internal validity, External validity, Consequential validity, Communicative validity, and/or Pragmatic validity?
We randomly assigned articles to each of the three coauthors, convening to ensure consistent application of the coding scheme and discussing specific articles whenever a coder had questions. After a first round, we clarified and finalized the codes and reviewed all articles to ensure consistency.
We searched articles for the use of the word “validity” and then applied codes according to the type of validity explicitly discussed. More often, however, we ended up coding implicit uses of validity. For example, if an article discussed the challenges associated with making causal claims, we would code it as considering issues of internal validity. If an article discussed generalizability—regardless of whether it also used the term “external validity”—we coded that article as dealing with issues pertaining to external validity. If an evaluation discussed stakeholder input in interpreting evaluation results, we coded it as addressing consequential validity. Articles that reported the impact of evaluation results—be it to specific stakeholders, agencies, or the public—or made specific statements about the use of the research in follow-up studies were coded as focusing on pragmatic validity.
To address our research questions, we cross-tabulated the evaluation type codes with the validity codes. Because a given article could address multiple validity types, in addition to examining evaluation type by individual validity type (where each evaluation could be counted more than once), we cross-tabulated evaluation type by combined validity types (where each evaluation was counted only once, corresponding to its particular subset of validities).
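The two tabulations can be sketched in a few lines. The example below uses invented records rather than the study’s data, and the variable names and the " + " label format are assumptions made for illustration. It shows how the same coded articles yield a by-individual-type table, in which an article may be counted several times, and a by-combination table, in which each article is counted once.

```python
from collections import Counter

# Hypothetical coded articles: (evaluation type, set of validity types coded).
# These records are invented for illustration; they are not the study's data.
coded_articles = [
    ("Systems Analysis", {"content", "internal"}),
    ("Behavioral Objectives", {"content"}),
    ("Decision Making", {"pragmatic"}),
    ("Case Study", {"communicative", "pragmatic"}),
    ("Behavioral Objectives", set()),  # no validity type addressed
]

# Tabulation 1: evaluation type by individual validity type.
# An article coded for k validity types contributes k counts.
by_individual = Counter()
for eval_type, validities in coded_articles:
    for validity in validities:
        by_individual[(eval_type, validity)] += 1

# Tabulation 2: evaluation type by combined validity types.
# Each article contributes exactly one count, keyed by its full subset.
by_combined = Counter()
for eval_type, validities in coded_articles:
    combo = " + ".join(sorted(validities)) or "none"
    by_combined[(eval_type, combo)] += 1

print(sum(by_individual.values()))  # total counts, articles may repeat
print(sum(by_combined.values()))    # equals the number of articles
```

Sorting the validity labels before joining them keeps the combination key independent of the order in which the codes were recorded.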
Results
Overall, the vast majority of evaluations addressed one or more types of validity, as shown in Figure 3. Just 5% made no mention—explicitly or implicitly—of validity. The two most common validity types were content (or measurement) validity (42%) and internal validity (40%). From one fifth to one quarter of the evaluations considered consequential, communicative, and pragmatic validity types, while just 10% addressed external validity. Recall that these are not mutually exclusive categories; any one article could address more than one validity type. (Appendix A presents the results for combined validity types.)

Figure 3. Distribution of validity types in American Journal of Evaluation (AJE) articles.
All of House’s evaluation types are represented among AJE-published evaluations. As shown in Figure 4, the vast majority (about 71%) fall into three categories: Behavioral Objectives (29%), Decision Making (28%), and Systems Analysis (14%). Goal-Free evaluations comprise another 9% of the sample. AJE has largely published evaluations based on objectivist epistemology in House’s categorization but also has included other evaluation types such as Art Criticism (5%), Professional Review (2%), Quasi-Legal (1%), and Case Study (12%).

Figure 4. Distribution of House’s evaluation types in American Journal of Evaluation (AJE) articles.
Interaction Between Evaluation and Validity
Table 3 presents the interaction between evaluation and validity types. Interestingly, the four objectivist types (Systems Analysis, Behavioral Objectives, Decision Making, and Goal-Free) addressed the widest variety of validity types. Neither internal nor external validity was addressed in the subjectivist types (Art Criticism, Professional Review, Quasi-Legal, and Case Study). (Appendix A shows the distribution of evaluation types by combined validity categories.)
Table 3. Evaluation Type by Validity Type
Note: Sample is the 106 evaluations published in American Journal of Evaluation, 1980–2010.
Figure 5, based on Appendix A, shows that 54 of the evaluations considered at least one validity type that fell within House’s truth dimension. All of those evaluations represented an objectivist epistemology. No subjectivist evaluation solely discussed truth-related validity. The 18 evaluations that addressed validity types in the beauty dimension reflected all evaluation categories. Pragmatic (justice) validity alone was discussed in six evaluations, representing the Behavioral Objectives, Decision Making, Professional Review, and Case Study types. Some evaluations considered validity types that reflected more than one of House’s three standards. For example, nine evaluations considered validity types from both the beauty and the justice dimensions. Four evaluations addressed validity from both the truth and the beauty perspectives, while another four discussed validity pertaining to both truth and justice. No evaluation discussed validity from all three standards. The evaluations that addressed validity from more than one of House’s standards included both objectivist and subjectivist types.

Figure 5. Venn diagram of validity types by evaluation types. Notes. Evaluation types are abbreviated as follows: Systems Analysis (SA), Behavioral Objectives (BO), Decision-making (DM), Goal-free (GF), Professional Review (
Discussion
The intent of this study was not to assess House’s influence in general but to consider how published evaluations—classified according to his evaluation typology—considered issues of validity that deal with the three pillars of truth, beauty, and justice. We believe that these pillars can be interpreted as representations of various types of validity in evaluation and social research.
House’s broad spectrum of evaluation types provides a way to examine the nuances of evaluation. All of House’s categories of evaluation appear in the sample but not all are well represented. The least represented evaluation types are ones that pay greater attention to validity types aligned with House’s beauty standards. As hypothesized, validity types corresponding to House’s truth dimension appear in evaluations that are rooted in an objectivist epistemology. These same validity types were less likely to be discussed in subjectivist evaluation types. Contrary to our hypothesis, validity types reflecting House’s beauty dimension appear in all types of evaluation. This indicates that communicative validity, inclusiveness of multiple views, and the social implications of evaluation results are relevant across a wide spectrum of evaluation types. Though somewhat less prevalent than the other validity types, pragmatic validity—which aligns with House’s justice standard—could be found in both objectivist and subjectivist evaluations, even though it was not considered by every evaluation type.
Although we expected a relationship between subjectivist evaluation types and beauty-related validity, we cannot conclude that specific validity types are tied to subjectivity; we observed no clear relationship between validity and epistemological stance. Varied perspectives on validity can provide a more comprehensive evaluative look at a program. Truth and beauty are not at odds, although the results suggest there might be a trade-off between the two at times. House asserts that truth might take precedence when truth and beauty are in conflict, thereby privileging truth in a relative sense. However, it is clear that evaluations can benefit from both viewpoints: truth focuses on the “basic facts” of the narrative, while beauty gives authenticity to the story behind the data through interpretation. That said, the types of validity associated with beauty were less commonly found than those associated with truth, most likely because of the lower representation of subjectivist approaches to evaluation among evaluation types. Moreover, the focus on truth in evaluation might reflect the prevalence of this type of validity in mainstream texts and classrooms. Given that our findings underscore the relevance of beauty and justice validity to evaluation, expanding our awareness of these dimensions in evaluation education could enrich future evaluation practice.
As we hypothesized, notions of justice appear in both objectivist and subjectivist approaches to evaluation, but their frequency is relatively low, especially given the importance House places on this perspective. Of course, the justice-related implications and impact of a program might occur well after an evaluation is concluded, and thus not be represented via the publication process. To the extent that a concern for justice characterizes our society overall, it provides a context within which we formulate evaluation questions and carry out evaluation research. Consistent with this notion, House sees theories of justice as general guides for evaluators, rather than encompassing specific criteria for them to follow. He argues for a theory of justice that “takes into consideration the values of moral equality, moral autonomy, impartiality, and reciprocity, as well as the aggregate principle of utility” (1980, p. 136), and acknowledges that these criteria are difficult to link to validity. Pragmatic validity considers the relationship between evaluation results and noticeable actions or changes in programs or systems. Such changes may simply not be discussed in manuscripts because they may be part of the social and political context that supports the research. Moreover, the changes generated by evaluation might be seen as just or unjust depending upon the stakeholders’ perspectives applied to them. All of these considerations suggest that caution be exercised in interpreting our results on pragmatic validity.
With respect to the truth standard, House defines it in a fashion that is broader than the term’s conventional meaning, which is closely tied to fact-finding, proof, and objectivity. The validity types examined for House’s truth standard—especially content validity, construct validity, predictive validity, statistical conclusion validity, and internal validity—resonate more with the “truth-seeking” aspect of evaluation than that of persuasion or argumentation. We hope that the additional validity types included—face validity and external validity—capture some of the extended meaning of House’s truth standard. In addition, House (2011) believes that “conflict-of-interest threats” should be included within Campbell’s validity framework, given the growth of research funding from federal agencies, private companies, and foundations. Finally, it is important to note that the validity types included in this study do not exhaust the categorization schemes developed and discussed in other research traditions (e.g., consistency, credibility, and authenticity in Guba & Lincoln, 1985) that may be relevant to House’s evaluation standards.
Conclusion
This article takes an empirical approach to assessing the implications of “validity” in House’s (1980) Evaluating With Validity for program evaluation. Specifically, we analyze the evaluations published in AJE with regard to variables relevant to House’s work. We classify the articles using his evaluation taxonomy and also characterize the evaluations’ attention to validity to give a sense of how the two interact in published work.
This study does not intend to judge the overall impact of House’s contributions to evaluation. If it did, we might have considered the more than 600 citations of his work computed by Google Scholar or the roughly 140 citations recorded in the Web of Science (as of February 2012). Instead, our research illustrates how a subset of his concepts can be used to analyze evaluation publications. Our approach to examining House’s relevance is constrained by our decision to look exclusively at evaluations, rather than at all types of articles or other citations. What AJE publishes has evolved over time, with conceptual articles being more frequent in the early years and formal evaluations less frequent during that period.
Although our sample was not representative of all evaluations occurring over the past three decades, it provides insight into how evaluation considers validity. Our results are unlikely to be generalizable to all evaluations, but our research approach could be applied elsewhere. Indeed, it would be instructive to apply this approach to other evaluation journals, as well as to evaluations that are not published in academic venues.
In sum, three main observations from this study offer lessons for evaluation practice. First, our empirical assessment highlights that certain types of evaluations, as classified by House, are not frequently used or reported. Specifically, Art Criticism, Professional Review, and Quasi-Legal evaluations are not common. This finding admits two readings: either evaluators should think more broadly about their toolbox, or House’s classification is not a useful representation of the types of research published in the field.
A second observation is that the vast majority of AJE-published evaluations consider truth: This includes all evaluation types but is dominant among those that follow a more objectivist approach. Considerations of beauty cross all types of evaluations—be they objectivist or subjectivist in their epistemological orientation—but are usually secondary in frequency to truth. This truth–beauty balance suggests that evaluators might do well to spend more time considering issues of interpretation, if they truly value this activity, since it currently seems to be a lower priority in the published work we analyzed.
Finally, considerations of pragmatic validity—House’s conception of justice—appeared least often and waned over time in our sample. Once again, if evaluators believe these issues are important, they should publish details about the ways in which evaluation results have been used for change, if known. Of course, we might choose to use a different lens to view this finding: A democratic context committed to justice provides the foundation for, and is implicit in, the evaluation work we engage in to achieve the changes we seek in the world.
Appendix A
Evaluation Type by Mutually Exclusive Validity Type
| Validity | Systems analysis | Behavioral objectives | Decision making | Goal-free | Art criticism | Professional review | Quasi-legal | Case study |
|---|---|---|---|---|---|---|---|---|
| # Evaluations | 15 | 31 | 30 | 9 | 5 | 2 | 1 | 13 |
| None | 2 | 0 | 1 | 1 | 0 | 0 | 0 | 1 |
| Content-only | 2 | 9 | 3 | 0 | 0 | 0 | 0 | 0 |
| Internal-only | 3 | 3 | 9 | 1 | 0 | 0 | 0 | 0 |
| External-only | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 |
| Consequential-only | 1 | 0 | 0 | 0 | 2 | 0 | 0 | 1 |
| Communicative-only | 0 | 0 | 0 | 2 | 0 | 1 | 0 | 2 |
| Pragmatic-only | 0 | 2 | 1 | 0 | 0 | 1 | 0 | 2 |
| Content + Internal | 2 | 5 | 7 | 0 | 0 | 0 | 0 | 0 |
| Content + Communicative | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 |
| Content + Pragmatic | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 1 |
| Content + Internal + External | 2 | 1 | 1 | 0 | 0 | 0 | 0 | 0 |
| Content + Consequential + Communicative | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 |
| Internal + External | 1 | 0 | 2 | 0 | 0 | 0 | 0 | 0 |
| Consequential + Communicative | 0 | 2 | 1 | 1 | 1 | 0 | 1 | 3 |
| Consequential + Pragmatic | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 |
| Consequential + Communicative + Pragmatic | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 1 |
| Communicative + Pragmatic | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 |
| Other | 2 | 4 | 0 | 0 | 0 | 0 | 0 | 1 |
Notes: Sample is the 106 evaluations published in American Journal of Evaluation, 1980–2010.
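
The tallies reported for Figure 5 can be reproduced from the table above by collapsing each validity combination into House’s standards and summing across evaluation types. The following is a minimal sketch of that arithmetic, not the authors’ own procedure. It assumes the mapping used in the text (content, internal, and external validity under truth; consequential and communicative validity under beauty; pragmatic validity under justice) and omits the “None” and “Other” rows, which cannot be assigned to a standard from the table alone. The names `DIMENSION`, `APPENDIX_A`, `venn_regions`, and `region_by_type` are illustrative.

```python
from collections import Counter

# Column order follows the Appendix A header.
EVAL_TYPES = [
    "Systems analysis", "Behavioral objectives", "Decision making", "Goal-free",
    "Art criticism", "Professional review", "Quasi-legal", "Case study",
]

# Assumed mapping of validity types to House's standards, as described in the text.
DIMENSION = {
    "content": "truth", "internal": "truth", "external": "truth",
    "consequential": "beauty", "communicative": "beauty",
    "pragmatic": "justice",
}

# Appendix A rows, transcribed as given ("None" and "Other" rows omitted).
APPENDIX_A = {
    ("content",): [2, 9, 3, 0, 0, 0, 0, 0],
    ("internal",): [3, 3, 9, 1, 0, 0, 0, 0],
    ("external",): [0, 1, 1, 1, 0, 0, 0, 0],
    ("consequential",): [1, 0, 0, 0, 2, 0, 0, 1],
    ("communicative",): [0, 0, 0, 2, 0, 1, 0, 2],
    ("pragmatic",): [0, 2, 1, 0, 0, 1, 0, 2],
    ("content", "internal"): [2, 5, 7, 0, 0, 0, 0, 0],
    ("content", "communicative"): [0, 1, 1, 0, 0, 0, 0, 0],
    ("content", "pragmatic"): [0, 1, 1, 1, 0, 0, 0, 1],
    ("content", "internal", "external"): [2, 1, 1, 0, 0, 0, 0, 0],
    ("content", "consequential", "communicative"): [0, 0, 0, 0, 1, 0, 0, 1],
    ("internal", "external"): [1, 0, 2, 0, 0, 0, 0, 0],
    ("consequential", "communicative"): [0, 2, 1, 1, 1, 0, 1, 3],
    ("consequential", "pragmatic"): [0, 0, 0, 1, 0, 0, 0, 1],
    ("consequential", "communicative", "pragmatic"): [0, 1, 1, 1, 1, 0, 0, 1],
    ("communicative", "pragmatic"): [0, 1, 1, 0, 0, 0, 0, 0],
}


def dimensions_of(validity_types):
    """Return the set of House's standards touched by a validity combination."""
    return frozenset(DIMENSION[v] for v in validity_types)


def venn_regions(table):
    """Count evaluations in each Venn region (each combination of standards)."""
    regions = Counter()
    for validity_types, counts in table.items():
        regions[dimensions_of(validity_types)] += sum(counts)
    return regions


def region_by_type(table, dims):
    """Break one Venn region down by evaluation type."""
    target = frozenset(dims)
    totals = [0] * len(EVAL_TYPES)
    for validity_types, counts in table.items():
        if dimensions_of(validity_types) == target:
            totals = [t + c for t, c in zip(totals, counts)]
    return dict(zip(EVAL_TYPES, totals))


regions = venn_regions(APPENDIX_A)
for dims, n in sorted(regions.items(), key=lambda kv: -kv[1]):
    print(" + ".join(sorted(dims)), n)
```

Under these assumptions, the regions come out as 54 (truth only), 18 (beauty only), 6 (justice only), 9 (beauty and justice), 4 (truth and beauty), and 4 (truth and justice), matching the counts reported for Figure 5. Calling `region_by_type(APPENDIX_A, {"truth"})` shows that the truth-only evaluations fall entirely within the four objectivist types.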
Acknowledgments
The authors thank Tom Schwandt for organizing the American Evaluation Association Fall 2010 Research Conference panel, along with panel participants and attendees, where a preliminary version of this work was first presented. The authors also recognize Andrea Mayo and Sang-Eun Lee for research assistance.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
