Abstract
Evaluability assessment (EA) has potential as a design option for thesis and dissertation studies, serving as a practical training experience for both technical and nontechnical evaluation skills. Based on a content review of a sample of EA theses and dissertations from graduate professional degree programs, the authors of this article found that some technical skills, such as understanding the knowledge base of evaluation, evaluation design, qualitative methods, identifying data sources, data collection, and data analysis, were demonstrated through the EA thesis and dissertation studies. However, the review also indicated a lack of understanding of EA and its essential elements and application of standards of quality evaluation. Recommendations are offered to enhance the quality of EA theses and dissertations, the learning experiences of students, and, ultimately, evaluation capacity building.
Program evaluation thesis and dissertation studies require students to apply technical and nontechnical knowledge and skills learned through coursework, providing an avenue for practical experiences in evaluation training. This is particularly important for students in graduate professional degree programs, such as master’s programs in social work and criminal justice or professional practice doctoral programs in education and psychology. Unlike students in graduate evaluation training programs, these students may only take one course in program evaluation, leaving little opportunity for applying evaluation knowledge and skills (Morris, 1994). Further, professional practice doctoral programs are not intended to prepare students for academic careers but are intended for individuals planning to “practice” in their fields (Willis, Inman, & Valenti, 2010). A program evaluation thesis or dissertation can provide a practical training experience for students in professional programs that is more relevant to their career goals than a more traditional research study.
There is increasing interest in applied theses and dissertations for students in graduate professional degree programs, often conducted by students as practitioners. For example, action research has become more common among both doctoral students for dissertation research (van der Meulen, 2011; Zambo, 2011) and master’s students for completing their thesis studies (Zuber-Skerritt & Fletcher, 2007; Zuber-Skerritt & Perry, 2002). In fact, there are books and websites devoted to writing action research theses and dissertations (see, e.g., Herr & Anderson, 2005).
However, there are no published studies on the program evaluation thesis or dissertation, and little guidance is available for conducting them beyond what may be included in university- or program-specific procedural guidelines. Given the importance of the thesis and dissertation as a culminating experience, the time required by students and faculty in their development, and the potential of the experience to provide practical training for students in graduate professional degree programs, this is an area where research is needed. This article includes the following sections: background, method, results, key findings, discussion and recommendations, and final remarks and conclusion.
Background
Practical Experiences in Evaluation Training
The need for technical and nontechnical skill development has been well described in the teaching of evaluation literature (Altschuld, 1999; Mertens, 1994; Patton, 1990; Stevahn, King, Ghere, & Minnema, 2005), and most advocate for practical experiences for developing these skills (see, e.g., Dewey, Montrosse, Schroter, Sullins, & Mattox, 2008; Leviton, Collins, Laird, & Kratt, 1998; Morris, 1994; Trevisan, 2004). As Trevisan (2004) noted in a review of the literature on the topic, “one of the most enduring recommendations in literature about the teaching of evaluation is that students receive hands-on or practical experiences during their education” (p. 256). Practical experiences can develop skills needed to successfully deal with the real-world challenges of conducting evaluation, including technical skills, such as handling missing data, and nontechnical skills, such as communicating with evaluation clients.
Questions remain, however, about how common and effective practical experiences are in university coursework in program evaluation. Dewey, Montrosse, Schroter, Sullins, and Mattox (2008) identified gaps between the skills of recent graduates of evaluation training programs and the skills needed in entry-level evaluation jobs, including interpersonal skills, writing, project and team management, research design, and evaluation theory. They noted that most of the gaps reflect a disconnect between conceptual and real-world evaluation practice. For example, students may learn about research design in their coursework but lack the skills needed to choose feasible designs for real-world contexts. They suggest that including practical experiences in university evaluation programs through practica, internships, and graduate assistantships is the obvious solution to decreasing the gaps identified. Students in graduate professional degree programs would likely experience greater gaps, given limited evaluation coursework and opportunities for evaluation practica, internships, and graduate assistantships. Although their futures may not involve working as professional evaluators, many will assume evaluation responsibilities in their work (Datta, 2006), and all could benefit from skill development in program evaluation.
In addition, others have noted a focus on technical skills in evaluation coursework, such as quantitative and qualitative methods (Stevahn et al., 2005), and a lack of training in the nontechnical skills needed for program evaluation, such as negotiation and dissemination (Owen, 2007; Stevahn et al., 2005). Perhaps the greatest barrier to including both technical and nontechnical skills training through didactic and practical experiences is time. For both single course projects and practicum experiences, evaluation instructors have noted the difficulties of students completing an evaluation within the time frame of a university course and semester. Others have commented on the additional time needed for planning and monitoring evaluation projects by instructors as well as evaluation clients (Trevisan, 2004). Although there is agreement on the need for these experiences, there are time constraints associated with providing them and limited research literature on their effectiveness.
The Program Evaluation Thesis and Dissertation
Practical experiences in coursework, such as role-play, simulation, course projects, and practica (Trevisan, 2004) can offer opportunities for developing technical and nontechnical skills in evaluation. However, this skill development can also be a significant part of the thesis and dissertation process. In particular, (a) theses and dissertations are not bounded by course units and academic terms the way a traditional course is, (b) students conducting thesis or dissertation studies are at the end of their coursework and have had time to gain needed foundational skills, (c) the project-oriented nature of a thesis or dissertation and the size and scope of this work allow for depth of learning, (d) students conducting thesis or dissertation studies are generally not enrolled in a full load of coursework and expect to devote a large amount of time to their study, and (e) faculty supervising thesis and dissertation studies expect to devote a large amount of time to supervision and may receive some form of credit for serving as a committee chair or member. In addition to supporting development of technical and nontechnical skills necessary for evaluator competence, program evaluation theses and dissertations provide opportunities for students to target real-world problems in their studies. This is particularly important for students in graduate professional degree programs because it makes the thesis and dissertation processes more relevant to future professional work.
Although there are no published studies about program evaluation theses and dissertations, there is interest in the topic, as evidenced by Nick Smith’s regularly offered American Evaluation Association (AEA) Pre-Conference Workshop, “How to Prepare an Evaluation Dissertation Proposal.” The workshop has been offered as part of every AEA annual meeting since 2006 (Nick Smith, personal communication, September 27, 2013).
In addition, in their book on developing a dissertation proposal, Krathwohl and Smith (2005) include a section specific to program evaluation dissertations. In it, they offer several considerations for a student embarking on a program evaluation dissertation, including whether the study will be applied or basic research. The authors note that most evaluations are applied research or “application studies whose local findings are intended to be useful to an organization or program as well as contribute to the doctoral student’s education” (p. 174). On the other hand, a program evaluation dissertation as a basic research study focuses on generalizing findings more so than supporting decision making at the local level. Regardless of whether the program evaluation is an applied or basic study, considerations to be addressed differ from those in other types of dissertation studies and include questions about the audiences for the evaluation, the stakeholders for the evaluation, how criteria for judging progress or merit will be defined and by whom, and what evaluation approach will be used. These considerations are important to students and faculty involved in program evaluation thesis or dissertation work.
In addition, quality program evaluations meet standards of utility, feasibility, propriety, accuracy, and evaluation accountability (Yarbrough, Shulha, Hopson, & Caruthers, 2011). These standards include and extend the commonly cited criteria for quality research, which focus on trustworthiness and/or validity of research findings. Further, cultural competence in evaluation practice has become increasingly understood as critical to quality program evaluation (see AEA, 2011). Thus, students conducting program evaluation theses and dissertations can use The Program Evaluation Standards (Yarbrough et al., 2011) along with other guidance, such as AEA’s Statement on Cultural Competence in Evaluation (AEA, 2011) and Guiding Principles for Evaluators (AEA, 2004), to support them in their work.
Evaluability Assessment for Program Evaluation Thesis and Dissertation Studies
As with program evaluation in general, evaluability assessment (EA) has been used for thesis and dissertation studies. Initially used as a pre-evaluation strategy to determine an existing program’s readiness for outcome evaluation, EA has evolved into a type of evaluation that can be used at any point in a program’s development and implementation (Smith, 1989; Thurston & Potvin, 2003), including prior to program implementation or even throughout a program’s life cycle (Thurston & Potvin, 2003). In addition to determining a program’s readiness for further evaluation, the EA process and results can be used for evaluation planning; formative evaluation; understanding program culture and context; increasing stakeholder involvement, collaboration, and communication; identifying promising practices; meeting accountability and performance measurement requirements; and organizational learning and evaluation capacity building (Leviton, Kettel Khan, Rog, Dawkins, & Cotton, 2010; Trevisan, 2004; Trevisan & Walser, 2014). EA may also result in process use; that is, learning that occurs during the evaluation process that directly benefits stakeholders (Yarbrough et al., 2011).
Joseph Wholey and his colleagues at the Urban Institute are credited with developing EA in the late 1970s (Wholey, 1979). Since then, other evaluators have contributed additional EA models and added to the literature on EA (see, e.g., Jung & Schubert, 1983; Leviton et al., 2010; Rog, 1985; Rutman, 1980; Schmidt, Scanlon, & Bell, 1979; Smith, 1989; Thurston & Potvin, 2003; Trevisan, 2007; Trevisan & Walser, 2014). Although EA models vary across authors, all include four essential elements: involving program stakeholders, developing a program theory, gathering feedback on program theory, and using the EA. If these elements are not included, the approach used is not characteristic of EA. For example, Smith’s (1989) often cited model includes the following steps that integrate the four essential elements of EA: (1) determine purpose, secure commitment, and identify work group members; (2) define boundaries of program to be studied; (3) identify and analyze program documents; (4) develop/clarify program theory; (5) identify and interview stakeholders; (6) describe stakeholder perceptions of program; (7) identify stakeholder needs, concerns, and differences in perceptions; (8) determine plausibility of program model; (9) draw conclusions and make recommendations; and (10) plan specific steps for utilization of EA.
Fitzpatrick, Sanders, and Worthen (2011) classify EA as an evaluation approach oriented to decisions to be made about a program. Decision-oriented approaches were designed to promote evaluation use among decision makers, such as administrators, managers, policy makers, boards, program staff, and other stakeholders. Owen (2007) classifies EA as a “clarificative” evaluation approach and notes that although it can still be used as a pre-evaluation strategy, EA can also stand alone as an evaluation approach for determining the essential features of a program. He further notes that stakeholder involvement in the EA process has become a greater focus, and the results of EA may be particularly useful to program developers and policy makers. Finally, Wholey (2004) identifies EA as one of three evaluation approaches he has promoted to increase the benefits of evaluations, given the resources needed to conduct them. As an evaluation theorist, Wholey has been most closely associated with evaluation use (Alkin & Christie, 2004). Thus, the use of EA by key stakeholders for decision making is central to EA, as is stakeholder involvement.
Given its reliance on stakeholder involvement to ensure understanding of a program, including program logic and context, data collection to gain stakeholder perspectives of a program and its implementation, and the expectation that EA results are used for decision making, EA has potential as a design option for thesis and dissertation studies and as a practical learning experience for both technical and nontechnical evaluation skills. The level of stakeholder involvement in all phases of EA provides an opportunity for developing the interpersonal skills needed for conducting evaluation. In their article describing the use of EA for a course project, Leviton, Collins, Laird, and Kratt (1998) noted that the project required skills in working within real-world constraints, group collaboration, and understanding program delivery. Thus, the use of EA for thesis and dissertation studies can result in meaningful information for programs and practice, exemplifying the type of technical and nontechnical skills important to quality program evaluation.
Finally, there has been a resurgence of interest in and use of EA across disciplines (Davies, 2013; Leviton et al., 2010; Trevisan, 2007; Trevisan & Walser, 2014); however, there is a lack of studies on EA methodology. Thus, theses and dissertations that focus on evaluation knowledge (e.g., EA methodology, the utility of EA) in addition to program and field-specific knowledge (e.g., EA results for a reading program) would contribute to evaluation theory and practice.
Study Purpose
EA is an evaluation approach that has been used for conducting thesis and dissertation studies in graduate professional degree programs. As with program evaluation in general, there is a lack of knowledge about the frequency, types, and quality of EA theses and dissertations. Likewise, there are no published studies about the effectiveness of EA theses and dissertations as practical training experiences. Given the potential of EA as a design option for thesis and dissertation studies and for teaching technical and nontechnical skills, the purpose of this article is to describe a sample of EA theses and dissertations from graduate professional degree programs and provide related recommendations for improved use of EA. Although the focus of the article is on EA, findings may also be useful to those conducting and supervising thesis and dissertation studies using other approaches to program evaluation.
Research Questions
The following research questions guide the study:
1. What was the purpose of the EA thesis and dissertation studies?
2. What EA model was used?
3. What procedures were used to conduct the EA thesis and dissertation studies?
4. How were the EA thesis and dissertation study results presented and used?
Method
The WorldCat database was queried to obtain dissertations and theses for this study. The search was limited to the years 2000–2013. The rationale for this time frame is that it covers a recent set of years and would provide the most current representations of practice in using EA for a thesis or dissertation. It also provides a reasonable length of time in which to obtain enough theses and dissertations to assess practice.
Inclusion and Exclusion Criteria
The key words “EA” and “EA dissertation” were used for the search. This generated 24 theses and dissertations. Other key words were used such as “pre-evaluation,” but these key words did not generate any theses or dissertations. Despite repeated inquiries, five theses and dissertations could not be obtained. In addition, three PhD dissertations were found. Since the focus of this study is on EA use in the context of professional master’s and doctoral programs, PhD dissertations that employed EA were eliminated from the review. We reviewed the 16 theses and dissertations that were obtained to determine if they met criteria for inclusion in the analysis, that is, if the authors identified a published EA model and/or included EA literature in their theses and dissertations, thus indicating that their studies were designed as EAs or to include EA as a component of a larger study. It should be noted that the sample was chosen to develop a description of current EA use for theses and dissertations as opposed to describing good examples of EA use for theses and dissertations; therefore, quality of the EA studies was assessed through the review process, not for sample selection. In addition, the inclusion criteria noted previously required that the theses and dissertations be program evaluations using EA. However, criteria did not preclude studies that included evaluation questions about a program and questions about EA theory, practice, and/or methodology with the intention of generalizing results or otherwise contributing to academic knowledge.
Sample
In total, 13 theses and dissertations met criteria for inclusion and were retained for analysis: one EdD dissertation, two PsychD dissertations, five MSW theses, three MS theses, one MIT thesis, and one MEd thesis. All theses and dissertations were for discipline-specific, professional degree programs.
Procedures
The content review form used to analyze the theses and dissertations consists of 25 questions within the following categories: (a) EA model, (b) EA purpose, (c) EA procedures, (d) EA results, and (e) EA use and thesis/dissertation format. These categories were developed to address the research questions. Each category consists of four to six questions. Question development was guided by the authors’ expertise and experience related to EA and thesis and dissertation work, features of the sample of EA theses and dissertations that were central to their description, literature related to dissertations, and The Program Evaluation Standards (Yarbrough et al., 2011). The development process is described in more detail in the subsequent sections, and the resulting content review form is located in Appendix A.
To develop the content review form, the authors first developed an initial set of questions and applied them to three theses/dissertations independently. These questions dealt with, for example, whether the EA studies used a particular EA model, whether Institutional Review Board (IRB) approval was obtained, and how results were presented. Once the reviews were completed, the authors compared their answers to the questions. This process generated further discussion and debate about the questions themselves, what constitutes a legitimate answer to a particular question based on interpretation, and additional questions that might be relevant. The content review form was then revised by eliminating, adding, and refining questions. The authors then applied the revised questions to the same three theses/dissertations, made comparisons, and discussed answers. The same process was applied using a “final” revised form with two additional theses/dissertations not previously reviewed.
In addition, question development was informed by literature specific to conducting dissertations and extrapolating these guidelines to thesis work. In particular, the professional practice dissertation guidebook by Willis, Inman, and Valenti (2010) was used to inform the development of questions dealing with specific features of the dissertation (or thesis), such as alignment of research questions with EA (that is, for program evaluation, can the questions be addressed by EA?); alignment of research questions, methods, and results; the role of theory; and the purposes for which dissertation work is conducted. For example, regarding the role of theory, Willis et al. define a theory-based study as one that begins with a theory that is applied to a practical problem or tested. The theory provides the framework for conducting the dissertation. This is contrasted with a theory-informed study, in which a theory or multiple theories are used to inform the study but are not the focus. They also provide a continuum of seven purposes for doing a dissertation. The continuum ranges from more traditional purposes to emerging purposes including (1) test a theory and develop implications for practice, (2) evaluate the universal effectiveness of a professional practice, (3) objective description, (4) evaluate the local effectiveness of a professional practice, (5) develop a solution to a local problem or issue, (6) hermeneutic understanding, and (7) storytelling and narrative inquiry. Thus, some of the questions developed for the content review form for this study include references to theory-based and theory-informed studies as well as the continuum of dissertation purposes.
Additionally, based on a study of the California State University EdD initiative, Auerbach (2011) offered suggestions related to EdD dissertations that include the importance of contributing to knowledge and practice in the student’s local context, the student describing the research setting and his or her role, and presenting dissertation findings to different audiences using different formats to disseminate the work and increase its use. These suggestions informed question development as well. For example, to be identified as a thesis or dissertation that contributes to professional knowledge, the work must contribute to knowledge and practice in the local context in which the study was conducted. Finally, The Program Evaluation Standards (Yarbrough et al., 2011) were used in the question development process, particularly for questions related to accuracy, propriety, and utility standards.
Coding
Once the content review form was finalized, each author independently applied it to a set of five theses/dissertations. This process was continued in increments of five theses/dissertations to complete the 13 reviews. At the end of each set of reviews, the authors compared their responses to each of the 25 questions. Any differences were reconciled to achieve 100% agreement. This was done until all theses and dissertations were coded.
An incremental process was chosen to provide a controlled, measured coding process that allowed examination of agreements and disagreements between authors as coding progressed. In turn, the examination of agreements and disagreements from earlier increments provided a means to better calibrate coding between the two authors for subsequent increments.
Note that theses were treated the same as dissertations. Because we could not find any literature or frameworks specific to theses, the ideas used to classify and categorize dissertations were applied to theses.
Analysis
To gain an understanding of the accuracy of the coding system as well as the amount of work needed to achieve reconciliation of all codings, a measure of agreement was computed. This required counting agreements and disagreements between authors before reconciliation. Agreements by authors were straightforward and readily counted. Disagreements were of two types: (1) complete disagreement between authors and (2) partial disagreement that required revision to both authors’ responses in order to reconcile.
Interrater agreement was computed by category (that is, by section of the content review form): the number of agreements (the number of possible agreements minus the number of complete and partial disagreements) was divided by the total number of possible agreements for that category and multiplied by 100 to convert to a percentage. For example, the EA Model section of the content review form consists of six questions. With 13 dissertations and theses, there are 78 possible agreements. This formed the denominator of the proportion statistic, the numerator being the number of actual agreements. Table 1 provides the percentage agreement for each category as well as an overall agreement estimate obtained by summing across categories. Agreement indices range from a low of 78.4% for the EA Procedures category to a high of 93.6% for the EA Use/Format category. The overall percentage agreement is 81.5%. As can be seen in the table, a high level of agreement between authors was obtained before reconciliation took place.
Table 1. Percentage Agreement for Each Content Review Form Category and Overall.
Note. EA = evaluability assessment.
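The percentage agreement computation described above can be sketched in a few lines of Python. This is a minimal illustration only, not the authors' actual procedure or tooling: the function name `percent_agreement` and its parameters are hypothetical, and the disagreement counts in the example (10 complete, 4 partial) are assumed values, since the text does not report raw counts. They were chosen because 14 disagreements out of 78 possible agreements gives about 82.1%.

```python
def percent_agreement(n_questions, n_studies, n_complete, n_partial):
    """Percentage agreement for one content review form category.

    All names here are hypothetical. Complete and partial disagreements
    are both counted against agreement, as described in the text.
    """
    possible = n_questions * n_studies      # e.g., 6 questions x 13 studies = 78
    disagreements = n_complete + n_partial
    if possible <= 0:
        raise ValueError("there must be at least one possible agreement")
    if disagreements > possible:
        raise ValueError("disagreements cannot exceed possible agreements")
    agreements = possible - disagreements
    return 100.0 * agreements / possible


# Illustrative (assumed) counts for a six-question category across 13 studies.
example = percent_agreement(n_questions=6, n_studies=13, n_complete=10, n_partial=4)
print(round(example, 1))  # 82.1
```

Treating partial disagreements the same as complete disagreements makes the estimate conservative: any response that needed revision during reconciliation counts against initial agreement.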
A widely accepted way to consider the magnitude of interrater agreement or reliability estimates is offered by Cicchetti (1994). Cicchetti suggests that interrater reliability estimates in the range of .7–.8 can be thought of as fair, those in the range of .8–.9 as good, and those higher than .9 as excellent. For this study, with estimates expressed as percentages, the interrater agreement estimate for EA Use/Format at 93.6 can be thought of as excellent; the estimates for the EA Model, EA Purpose, and EA Results categories at 82.1, 81.6, and 80.8, respectively, can be thought of as good; and the estimate for the EA Procedures category at 78.4 can be thought of as fair.
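The banding just described can be expressed as a small lookup, shown here as a hedged sketch. The function name `agreement_band` is hypothetical. The text gives overlapping range endpoints (.7–.8, .8–.9), so assigning each boundary value to the higher band is an assumption, as is the "below fair" label for values under 70, which the text does not name. The category values are those quoted in the paragraph above.

```python
def agreement_band(percent):
    """Map a percentage agreement estimate to a descriptive band.

    Boundary handling (80 -> "good", 90 -> "excellent") and the
    "below fair" label are assumptions made for this sketch.
    """
    if percent >= 90:
        return "excellent"
    if percent >= 80:
        return "good"
    if percent >= 70:
        return "fair"
    return "below fair"


# Category estimates as quoted in the text.
estimates = {
    "EA Use/Format": 93.6,
    "EA Model": 82.1,
    "EA Purpose": 81.6,
    "EA Results": 80.8,
    "EA Procedures": 78.4,
}
bands = {category: agreement_band(value) for category, value in estimates.items()}
for category, band in bands.items():
    print(f"{category}: {band}")
```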
There were two central reasons for computing initial interrater agreement estimates. The first is replicability. Should replication of the study be desired, given two different raters with essentially the same training and the same data set, the interrater agreement estimates provide a measure of stability of ratings. This would be particularly important if, given time and resources, the decision was made not to work to reconcile differences and achieve full agreement in a replication of the study. Given the relatively high magnitude of the interrater agreement estimates in this study, it can reasonably be inferred that codings would be similar with two different raters, and thus, findings would likely be similar as well.
The second reason for computing interrater agreement estimates is credibility. That is, the interrater agreement estimates provide an indication of the initial starting point between the two raters for reconciling differences. Moreover, the estimates provide an indication of just how far apart the raters were in their ratings and by implication, the work needed to reconcile differences and achieve agreement. Without estimates of initial agreement, it is not possible to determine the level and depth of the differences between the raters. Given the high level of initial agreement, raters started in a strong position toward agreement as they began work to reconcile differences. Further, the amount of work necessary for achieving agreement on all ratings was relatively modest, given the magnitude of the agreement estimates.
Results
Results of the content review of EA theses and dissertations are organized as three sections, which are based on themes derived from the content review: adherence to a specific EA model and purpose, research procedures, and EA results and use. Detailed results for each content review form question are provided in Appendix B.
Adherence to a Specific Evaluability Assessment Model and Purpose
The purposes of all of the theses and dissertations reviewed fell into the broad category of evaluating the local effectiveness of a professional practice (100%, n = 13) on the Willis et al. (2010) continuum; therefore, all were intended to contribute to professional knowledge and could be characterized as applied evaluation studies (Krathwohl & Smith, 2005). In addition, more than half of the theses and dissertations (69.2%, n = 9) included determining program evaluability as a study purpose. Those that did not include determining program evaluability as a study purpose instead included purposes such as evaluation planning, formative evaluation, implementation assessment, and logic modeling. The thesis and dissertation purposes and/or research questions focused on answers to be generated by the EA (84.6%, n = 11) more so than the EA method (15.4%, n = 2). More than half of the purposes and/or questions were appropriate for an EA (69.2%, n = 9), and alignment among purpose/questions, procedures, results, and recommendations was evidenced in more than half of the studies (61.5%, n = 8). Alignment was characterized as appropriate procedures for addressing purpose and/or research questions, results presented in response to purpose and/or research questions, and recommendations deriving from the results.
Most of the thesis and dissertation authors also identified an existing EA model from the literature as the framework for conducting their studies (76.9%, n = 10) and developed a program theory or model (84.6%, n = 11), often in the form of a logic model, as part of the EA process. Common EA models cited included those of Wholey (1979) and Rutman (1980). However, only 38.5% (n = 5) of the authors adhered to the model they identified. This was often due to authors not implementing the full EA model; that is, they cited the model to be used but did not implement all steps or components of the model. As previously mentioned, although the specific components of EA models differ across authors (see, e.g., Jung & Schubert, 1983; Leviton et al., 2010; Rog, 1985; Rutman, 1980; Schmidt et al., 1979; Smith, 1989; Thurston & Potvin, 2003; Wholey, 1979), the essential elements of EA, across all models, include stakeholder involvement, developing a program theory, gathering feedback on program theory, and using the EA. Fewer than half of the authors (46.2%, n = 6) sufficiently involved stakeholders in the EA process, and although the majority (84.6%, n = 11) of the authors developed a program theory as part of their studies, not all did.
Some authors used additional evaluation approaches or data collection methods in conjunction with EA (30.8%, n = 4). Some used additional research approaches such as case study and action research; others used additional evaluation types or approaches such as formative evaluation, process evaluation, and outcome evaluation; or included instrument pilots as part of EA. EA was included in the review of the literature sections of all of the theses and dissertations (100%, n = 13).
Research Procedures
Interview and document review or analysis were the most commonly used data collection procedures, with all of the authors using interviews (100%, n = 13) and most also using document review or analysis (92.3%, n = 12). Other common data collection procedures were observation (30.8%, n = 4), focus group interviews (30.8%, n = 4), and surveys or questionnaires (30.8%, n = 4). Although data collection procedures involved stakeholders, sufficient representation of stakeholders was evidenced in fewer than half (46.2%, n = 6) of the thesis and dissertation studies. Finally, data analysis procedures were sufficiently described and appropriate for the task at hand in more than half (69.2%, n = 9) of the theses and dissertations.
Fewer than half (46.2%, n = 6) of the authors specified that they obtained IRB approval for conducting their studies. Although 61.5% (n = 8) of the authors described their relationship with the study site or sample, fewer than half (37.5%, n = 3) of those who indicated that a dual role existed described how the relationship was handled in terms of potential bias or conflict of interest (e.g., a student conducting an EA for graduate work and also an employee of the organization in which an EA is being conducted).
Evaluability Assessment Results and Use
The majority of authors presented the results of their study in narrative and tables (46.2%, n = 6) or in narrative format alone (46.2%, n = 6). Few of the theses and dissertations followed a traditional five-chapter format (23.1%, n = 3). The number of chapters varied, with many including six chapters, and the content often differed from what is typically included in a traditional thesis or dissertation. Additionally, the focus of the presentation of results was on the results of the EA in most of the theses and dissertations (76.9%, n = 10), with fewer authors focusing on the EA process (23.1%, n = 3).
The use of more than half of the studies (61.5%, n = 8) was consistent with common expectations for EA use. When this was not the case (38.5%, n = 5), it was largely due to authors not adhering to or fully implementing an EA model, which would require that they develop a program theory or model, sufficiently involve stakeholders in the EA, and facilitate use of results and recommendations. In some cases (30.8%, n = 4), the purposes and questions of the theses and dissertations were not appropriate for EA.
All theses and dissertations (100%, n = 13) were characterized as theory informed (Willis et al., 2010). Further, most of the theses and dissertations had the potential to contribute to professional knowledge—for example, to program knowledge and professional practice (92.3%, n = 12), and only one study (7.7%) had the potential to contribute to academic knowledge and demonstrated sufficient scope for generalizability of findings or scholarly publication. Few of the authors addressed or showed support for validity or trustworthiness of findings (23.1%, n = 3), and fewer than half (38.5%, n = 5) described study limitations.
Key Findings
Key findings are presented in Table 2 along with related results. Findings are categorized as strengths of the theses and dissertations, areas where some improvement is needed, and weaknesses of the theses and dissertations. An area was categorized as needing some improvement when more than half of the theses and dissertations included an important characteristic but a substantial share did not, thus leaving room for improvement.
Key Findings and Related Results Categorized as Strengths, Some Improvement Needed, and Weaknesses.
Note. EA = evaluability assessment; IRB = Institutional Review Board.
a. The number who cited an EA model was 10; the percentage was calculated with a denominator of 10 instead of 13.
b. The number who had a dual role was 8; the percentage was calculated with a denominator of 8 instead of 13.
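The effect of the denominator choice described in the table notes can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of the original analysis; the function name `pct` and the constant names are hypothetical, and the counts are those reported in the Results section.

```python
def pct(count: int, denominator: int) -> float:
    """Return count as a percentage of denominator, rounded to one decimal."""
    return round(100 * count / denominator, 1)

TOTAL_STUDIES = 13  # all theses and dissertations reviewed
CITED_MODEL = 10    # authors who cited an EA model (table note a)
DUAL_ROLE = 8       # authors who had a dual role (table note b)

# Three authors described how a dual role was handled.
handled_of_dual_role = pct(3, DUAL_ROLE)       # subgroup denominator
handled_of_all = pct(3, TOTAL_STUDIES)         # full-sample denominator

# Five authors adhered to the EA model they cited.
adhered_of_citers = pct(5, CITED_MODEL)        # subgroup denominator
adhered_of_all = pct(5, TOTAL_STUDIES)         # full-sample denominator

print(handled_of_dual_role, handled_of_all)
print(adhered_of_citers, adhered_of_all)
```

With the subgroup denominator, the dual-role figure matches the 37.5% reported in the text, whereas the full-sample denominator would give a noticeably lower value. Stating the denominator alongside each percentage avoids this ambiguity.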
Discussion and Recommendations
Overall, several strengths of the EA thesis and dissertation studies were identified. First, most included development of a program theory model. As an essential element of any EA, developing a program theory model shows understanding of the role of program theory in EA and supports the accuracy of evaluation findings, specifically “Explicit Program and Context Descriptions” as described in The Program Evaluation Standards (Yarbrough et al., 2011). It further demonstrates development of evaluator competencies including understanding the knowledge base of evaluation and specifying program theory, both of which are “systematic inquiry” competencies according to Stevahn, King, Ghere, and Minnema (2005).
Second, the thesis and dissertation authors used data collection procedures commonly used for EA. All used interviews and most also used document review. Observation, focus group interviews, and surveys and questionnaires were used as well. This is similar to findings in the published literature on EA (Trevisan, 2007) and demonstrates understanding of data collection procedures that are appropriate for the approach, supporting the accuracy of evaluation findings (Yarbrough et al., 2011). It also demonstrates development of evaluator competencies in systematic inquiry, such as knowledge of qualitative methods, developing evaluation designs, identifying data sources, and collecting data (Stevahn et al., 2005). In addition, alignment among study purpose/questions, procedures, results, and recommendations was evidenced in more than half of the theses and dissertations, and most data analysis procedures were sufficiently described and appropriate. This further supports the accuracy of evaluation findings (Yarbrough et al., 2011) and development of evaluator competencies in systematic inquiry (Stevahn et al., 2005). Thus, pedagogical elements necessary for the development of technical skills in evaluation, such as skills in evaluation design, data collection, and data analysis were generally evidenced.
In addition to strengths, several areas for improvement in EA theses and dissertations were identified. The following is a discussion of these findings and related recommendations.
Recommendation 1: Be Clear About Evaluability Assessment Intent, Model, and Modifications
More than half of the EA theses and dissertations cited determining the evaluability of a program as a study purpose; however, EA was not always used this way and was sometimes confused with other evaluation activities such as evaluation planning, formative evaluation, implementation assessment, and logic modeling. Although these evaluation activities can be done as part of EA and/or as a purpose of EA, they can also be done independent of the approach. Without clarity, it was not possible to determine if or how these activities were connected to EA. Further, most of the authors did not adhere to the EA model they specified, nor did they describe modifications or provide rationale for not implementing the model as intended. Lack of adherence was largely due to authors not implementing the full model. For example, two, or 15.4%, of the authors did not develop a program theory. Because development of program theory is an essential element of EA, in our view, even 15.4% is too high a number of studies without program theory. In addition, many did not sufficiently involve stakeholders in the EA and some did not use the results of the EA as is expected for EA use.
Similar confusion related to the purpose of EA has been noted in the EA literature, as has inconsistency in EA implementation and use (Smith, 2005; Trevisan, 2007; Trevisan & Walser, 2014). Trevisan (2007) found that among published EA studies, few authors explicitly identified the model used and clearly followed it. Additionally, when other evaluation activities or uses were included in studies, such as formative evaluation and process evaluation, no differentiation or explanation was provided. Thus, the issues related to implementation and use identified in this review are potentially a reflection of confusion related to the implementation and use of EA in actual practice, similar to what was observed in the published literature. Note that the parallel confusion appears to be among nonevaluators; that is, discipline-based faculty and students in graduate professional degree programs in this review and nonevaluation professionals in practice identified in the published literature (Trevisan, 2007).
Based on these findings, one recommendation is that those conducting an EA thesis or dissertation use EA to address study purposes and questions that are consistent with the purposes and intended uses of EA; be clear about the EA model used, including the author and steps undertaken; and fully implement the model specified. Any change to the model should be described and justified. Any negative impacts on the study findings, conclusions, and recommendations, due to the revision of the model, should be identified and discussed. This recommendation supports the accuracy of evaluation findings and evaluation accountability as described in The Program Evaluation Standards (Yarbrough et al., 2011) along with development of evaluator competencies in systematic inquiry, including understanding the knowledge base of evaluation and developing evaluation designs (Stevahn et al., 2005). It could also help to alleviate some of the confusion about EA implementation and use as found in this review and may further help to decrease confusion in actual practice, particularly among nonevaluators.
Recommendation 2: Involve Stakeholders in the Evaluability Assessment Process
Stakeholders were not sufficiently involved in more than half of the theses and dissertations reviewed. For EA, stakeholder involvement includes stakeholders as data sources (e.g., interview participants) and stakeholders as participants in the EA process (e.g., involvement of key personnel and/or an EA work group in the conduct of the EA). As mentioned, stakeholder involvement is an essential element of any EA and is perhaps one of the most beneficial aspects of conducting an EA thesis or dissertation. That is, involving stakeholders sufficiently in the EA process has strong potential for developing the nontechnical skills, such as interpersonal skills, which are difficult to learn through coursework but key to effective evaluation practice.
Given this finding, we recommend that students undertaking EA thesis or dissertation studies involve stakeholders as both data sources and participants in conducting the EA. This involvement is needed to fully adhere to the EA process and supports utility, feasibility, propriety, and accuracy standards (Yarbrough et al., 2011) as well as several evaluator competencies that fall into the categories of “Professional Practice,” “Situational Analysis,” “Project Management,” and “Interpersonal Competence” (Stevahn et al., 2005).
Recommendation 3: Obtain IRB Approval and Control for Bias and Conflicts of Interest
Fewer than half of the authors of the theses and dissertations reviewed for this study specified that they had obtained IRB approval. Given that data are collected from people in any EA, IRB approval should be obtained in each case and clearly documented in the thesis or dissertation. Also, over half of the theses and dissertations were conducted by students who were employed by the agency in which the EA was being conducted or had some other relationship with the agency. In short, the student had a dual role. To be sure, we do not view the idea of a dual role in and of itself as a problem. Moreover, students in graduate professional degree programs are often employed in a professional setting. Conducting an EA in this setting is convenient and meaningful for them and a potential benefit for the agency. However, fewer than half of these students described how they dealt with potential issues of conflict of interest and bias, given this dual role. Thus, to ensure quality studies and sound decision making, it is important that students in graduate professional degree programs using EA for their thesis or dissertation, particularly within an agency that they have some type of relationship with, think through the ethical complexities of such work, describe the study setting and their role, and reflect on their role and its impact on the study (Auerbach, 2011). Students should follow and document procedures to ensure ethical research practices. Particular attention should be given to procedures for minimizing bias and conflict of interest when a dual role exists. This recommendation supports evaluation propriety, specifically evaluation standards of human rights and respect and conflicts of interest (Yarbrough et al., 2011), along with evaluator competencies of “Professional Practice” (Stevahn et al., 2005, p. 52).
Recommendation 4: Provide Evidence of Validity and Trustworthiness
Few of the studies included information about how validity or trustworthiness was addressed and fewer than half included study limitations. In addition, only two of the studies reviewed, both of which were doctoral dissertations, included mention of The Program Evaluation Standards (Yarbrough et al., 2011). Another recommendation then is that students conducting an EA thesis or dissertation address the validity and/or trustworthiness of findings and identify limitations of the study. Students should also use The Program Evaluation Standards (Yarbrough et al., 2011) and/or other guidance for conducting and assessing program evaluations, such as AEA’s Guiding Principles for Evaluators (AEA, 2004) and AEA’s Statement on Cultural Competence in Evaluation (AEA, 2011). This recommendation supports evaluation accuracy and propriety (Yarbrough et al., 2011). Evaluation accuracy requires sufficiently valid and reliable information; systematic collection, review, and verification of information; and explicit evaluation reasoning. Evaluation propriety requires transparency and disclosure regarding study limitations. This recommendation further facilitates development of evaluator competencies in the categories of “Professional Practice,” “Systematic Inquiry,” “Situational Analysis,” and “Reflective Practice” (Stevahn et al., 2005, p. 52).
Recommendation 5: Consider Contributions to Professional and Academic Knowledge
All of the theses and dissertations reviewed were characterized as applied evaluation studies, and most demonstrated potential to contribute to professional knowledge in the local context. This aligns with Krathwohl and Smith’s (2005) assertion that most program evaluations are applied studies and is not surprising, given that the focus of this review is on theses and dissertations from graduate professional degree programs. That said, it would be beneficial if students conducting such program evaluation studies included a section in their thesis or dissertation describing specific ways their work was used by the local organization; that is, how it contributed to professional practice.
In addition, an EA thesis or dissertation has the potential to contribute to academic knowledge in the content area of the study if the study is designed and results interpreted within the context of the research literature. It could also contribute to academic knowledge if questions are included that focus on EA theory, practice, and/or methodology. Thus, students in graduate professional degree programs conducting an EA thesis or dissertation should consider potential contributions of the EA to academic knowledge in addition to professional knowledge. This recommendation supports development of evaluator competencies in the categories of “Professional Practice” and “Interpersonal Competence” (Stevahn et al., 2005, p. 52).
Recommendation 6: Evaluation Faculty Reach Out to Faculty in Professional Programs
All of the EA theses and dissertations reviewed for this study were produced as part of graduate professional degree programs. Given the lack of understanding of EA observed in many of the theses and dissertations, it is reasonable to assume that there was little connection to evaluation faculty members for this sample of theses and dissertations. Thus, evaluation faculty on campuses with graduate programs that offer professional degrees, such as those in psychology, social work, and education, should reach out to these programs and offer support to faculty and students pursuing evaluation study, particularly an EA thesis or dissertation. This could be in the form of serving on dissertation or thesis committees, offering an EA or evaluation course, or participating in professional seminars. Given disciplinary boundaries and continued competition for evaluation coursework and other evaluation-oriented instruction on university campuses, close collaboration between and among faculty will be required (Altschuld, 1995). Evaluation faculty will need to learn about and understand the program, the kind of professional produced by the program, and the connection EA could play in the professional life of the student. Trevisan (2002) discussed the role evaluators could play in school counseling programs, and thus, this article could be consulted by evaluation faculty looking to support professional programs on their campus. The benefits of this work will likely be better understanding of EA by students and faculty from graduate professional degree programs and greater programmatic and pedagogical influence by evaluation faculty members on these campuses.
Final Remarks and Conclusion
The purpose of this article was to describe a sample of EA theses and dissertations produced by students in graduate professional degree programs and provide related recommendations for improved use of EA for thesis and dissertation studies. Findings indicate that, overall, students demonstrated some technical skills in program evaluation through their EA thesis or dissertation studies; specifically, they demonstrated skills in understanding the knowledge base of evaluation, evaluation design, qualitative methods, identifying data sources, data collection, and data analysis. However, findings also indicate a lack of understanding of EA and its essential elements, as well as a lack of understanding and application of standards for conducting program evaluation, such as IRB procedures and dealing with potential conflicts of interest, addressing the trustworthiness/validity of study findings, and identifying study limitations. Further, only one of the studies had potential to contribute to academic knowledge. Just like other program evaluations, EA thesis and dissertation studies should address The Program Evaluation Standards (Yarbrough et al., 2011), including standards of utility, feasibility, propriety, accuracy, and evaluation accountability to meaningfully contribute to professional knowledge and practice. Students are encouraged to describe how their study was used locally to contribute to professional knowledge and are further encouraged to examine their work within the broader disciplinary context and consider how it may also contribute to academic knowledge in the content area of the study and/or in program evaluation.
We, like other evaluation faculty who provide dissertation support, must find a way to work within the pedagogical expectations and constraints of the diversity of graduate programs in which students may conduct a program evaluation thesis or dissertation. The continuum of dissertation purposes offered by Willis et al. (2010) that we used for the content review provides a reasonable way to think about this issue and has strong pedagogical implications that we find compelling as faculty who work with students from a wide variety of graduate programs.
In addition, given the misunderstanding of EA evidenced in our review, it is likely that there was little connection to program evaluation faculty for this sample of EA theses and dissertations. Thus, we made a recommendation for evaluation faculty to reach out to graduate professional degree programs, such as social work, education, and psychology, and offer assistance to those using EA (or other approaches to program evaluation). We view this work as critical and somewhat urgent so that evaluation work by these students and programs is done in a manner consonant with principles and standards that the evaluation field holds as central to conducting quality evaluation. There is an enormous opportunity for evaluation faculty and the field as a whole to have a positive impact on the larger evaluation enterprise, as graduate professional degree programs employ program evaluation in the context of dissertation and thesis work for their students.
This study provides a starting point for improving the use of EA in the context of thesis and dissertation work and for building the capacity of nonevaluation professionals who may be responsible for EA work in their organizations. As mentioned previously, although the focus of this article is on the EA thesis and dissertation, the results, key findings, and recommendations may also be useful to those conducting and supervising graduate professional thesis and dissertation studies using other approaches to program evaluation. The literature on the teaching of evaluation could benefit from further work and guidance on conducting program evaluation theses and dissertations in general, and EA theses and dissertations in particular. Such guidance could help bridge the gap between evaluation theory and practice through application of technical and nontechnical skills, thus improving the process as a practical learning experience for students. For example, more specific guidance on how to negotiate common contextual and political factors involved in conducting EA, how to involve stakeholders meaningfully in the EA process, and how to facilitate the use of EA results to make decisions could be beneficial. Such resources are needed not only to improve the quality of EA theses and dissertations and related student learning but also to promote program evaluation approaches such as EA as potential designs for thesis and dissertation work for students in graduate professional degree programs.
Finally, the purpose of this study was to describe a sample of EA theses and dissertations from graduate professional degree programs and provide related recommendations for improved use of EA—to provide a starting point for understanding the current state of EA theses and dissertations. As with program evaluation in general, there is a lack of knowledge about the “content” of EA theses and dissertations as well as a lack of knowledge about the effectiveness of the thesis and dissertation process as a practical training experience. Based on the number of theses and dissertations located for this review, EA is not being widely used for theses and dissertations in graduate professional degree programs. This could be due to faculty and student unfamiliarity with EA and/or related questions about its appropriateness as a design for thesis and dissertation studies. As mentioned, there has been much misunderstanding of EA throughout its evolution (Trevisan, 2007; Trevisan & Walser, 2014). Further research is needed to better understand the value of EA theses and dissertations as practical training for students in graduate professional degree programs as well as the value of the work to the organization for which the EA was conducted. We recommended that students include a section in their thesis or dissertation describing specific ways their work was used by the local organization. Additional research could include follow-up data collection, such as interviews with the thesis and dissertation authors, their advisors, and/or the evaluation client, to determine the value of the EA for practical training and the organization.
Appendix A
Appendix B
Results of Dissertation and Thesis Review by Content Review Form Question.
| Review Question | Response Percentage (Frequency) |
|---|---|
| EA Model | |
| 1. What EA model was used? | Model cited: 76.92% (10): Wholey (5), Rutman (4), Thurston (1). No model cited: 23.08% (3) |
| 2. Did the author adhere to the model? | Yes: 38.46% (5) No: 61.54% (8) |
| 3. Was EA described in the review of the literature? | Yes: 100.00% (13) |
| 4. Were separate types of evaluation or data collection used in conjunction with EA? If so, what were they? | Yes: 30.77% (4): instrument pilot (2), case study research (1), process evaluation (1), action research (1), theory-driven evaluation (1), implementation assessment (1), formative evaluation (1), outcomes evaluation (1), metaevaluation (1). No: 69.23% (9) |
| 5. Was a program theory developed? If so, was a logic model developed? | Yes: 84.62% (11): logic model developed (8), no logic model developed (3). No: 15.38% (2) |
| Purpose | |
| 1. What was the purpose of the thesis/dissertation research according to the Willis, Inman, & Valenti (2010) continuum? | Evaluate the local effectiveness of a professional practice: 100.00% (13) |
| 2. Were the purpose and/or research questions focused on the EA method or answers to be generated by the EA? | Method: 15.38% (2) Answers: 84.62% (11) |
| 3. Were they appropriate for an EA? | Yes: 69.23% (9) No: 30.77% (4) |
| 4. Was there alignment among purpose/questions, procedures, results, and recommendations? | Yes: 61.54% (8) No: 38.46% (5) |
| Procedures | |
| 1. What data collection procedures were used? | Interviews: 100.00% (13) Document review: 92.31% (12) Observation: 30.77% (4) Survey/Questionnaire: 30.77% (4) Focus groups: 30.77% (4) Site visits/Meetings: 7.69% (1) |
| 2. Did the author describe his or her relationship to the study site/sample? If a dual role existed (i.e., a student and an employee or affiliate of the organization for which an EA is being conducted) did the author describe how this was dealt with? | Yes: 61.54% (8): described how dealt with (3), did not describe how dealt with (5) |
| 3. Was IRB approval obtained? | Yes: 46.15% (6) Not reported: 53.85% (7) |
| 4. Was stakeholder involvement sufficiently representative? | Yes: 46.15% (6) No: 53.85% (7) |
| 5. Were data analysis procedures described and appropriate? | Yes: 69.23% (9) No: 30.77% (4) |
| Results | |
| 1. How were results presented—for example, tables and graphs and/or narrative; organized by research question, step in the EA model, logic model? | Narrative and tables: 46.15% (6) Narrative: 46.15% (6) Other: 7.69% (1) |
| 2. Was the focus on the EA process or the EA results? | EA process: 23.08% (3) EA results: 76.92% (10) |
| 3. Were validity or trustworthiness of results addressed and/or supported? | Yes: 23.08% (3) No: 76.92% (10) |
| 4. Were limitations described? | Yes: 38.46% (5) No: 61.54% (8) |
| Use and format | |
| 1. How was the EA used (applied use)? Did it contribute to professional knowledge—for example, solve a real-world problem? | Yes: 92.31% (12) No: 7.69% (1) |
| 2. Was its use consistent with the purpose of EA (applied use)? | Yes: 61.54% (8) No: 38.46% (5) |
| 3. Did the EA contribute to the larger literature base on the topic, EA, and/or evaluation—for example, were results interpreted in the context of other literature; does the work contribute to academic knowledge (academic use)? | Yes: 7.69% (1) No: 92.31% (12) |
| 4. Was the thesis/dissertation theory-based or theory-informed—for example, did the author test or develop a theory or did the author use theory to inform the study? If theory-informed, was the study well-grounded in (clearly tied to) theory/literature on the study topic? | Theory-informed: 100.00% (13) |
| 5. Was the scope (breadth and/or depth) of the thesis/dissertation sufficient for generalizability of findings and/or scholarly publication? | Yes: 7.69% (1) No: 92.31% (12) |
| 6. Did the thesis/dissertation follow a traditional five-chapter format or was another format used? | Traditional: 23.08% (3) Other: 76.92% (10) |
Note. EA = evaluability assessment; IRB = Institutional Review Board.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
