Abstract
Purpose:
This case study is an introspective account of the evaluation process of Functional Family Therapy (FFT) as implemented in Middlesex County, New Jersey, between 2005 and 2011. The study presents challenges and issues in evaluation falling into three main categories.
Methods:
The case study is based on the recollections and documented experiences of the author who was responsible for all major aspects of the evaluation including designing the study, collecting the data, and handling daily evaluation activities.
Results:
The author differentiated among three main categories of challenges. With respect to research design, the relative merits of experimental versus nonexperimental designs and quantitative versus qualitative research methods are discussed. The second set of issues involves developing and exercising the social competence skills necessary to form working partnerships with service providers. The third set encompasses logistical barriers encountered during daily evaluation activities.
Conclusions:
The challenges and lessons learned from conducting the outcome evaluation of FFT are situated within scholarly debates on evaluation research, with the goal of providing further insights into the on-the-ground implementation and process of program evaluations. The experiences, recollections and processes illustrate challenges and solutions applicable to evaluations of other family-based violence prevention interventions.
Evaluations of family-based violence prevention interventions have a unique place in social science research. They have traditionally been supported by various governmental and nongovernmental agencies, and in recent years, evaluation findings have been widely disseminated in peer-reviewed journals and at academic conferences (Butts & Roman, 2018; Fagan & Buchanan, 2016; Petrie & Weisburd, 2006). However, evaluation research has not been recognized as a separate academic discipline or even as a separate scientific method (Levin-Rozalis et al., 2003). Regardless of what constitutes “real” evaluation research, evaluations can illuminate the most effective or promising interventions and thereby help transform research findings into practice and policies. “It is about making a real difference and making it happen now” (Welsh, 2016, p. 611).
The present case study describes challenges encountered while attempting to evaluate a family intervention. Between 2005 and 2011, the author was the principal evaluator (PI) of functional family therapy (FFT), an intervention implemented in Middlesex County, New Jersey. FFT is an evidence-based intervention that addresses the needs of delinquents and youth at risk of delinquency by targeting intra-family dynamics. Prior evidence of FFT’s effectiveness is strong enough to support implementing the program but by no means strong enough to assure its success (Jaycox et al., 2006). As explained by McPherson et al. (2017), knowledge of the impact of routine service delivery of FFT outside of a controlled clinical context is very limited. In addition, more research is still needed on how to conduct an effective evaluation of FFT or any family intervention in various settings. Understanding challenges in evaluating these types of programs can assist future efforts in addressing potential barriers before they occur. This is an important contribution, as the literature on the evaluation processes of various interventions is still extremely limited.
The Case of FFT
FFT is a systematic clinical model developed by Alexander and Parsons in the late 1960s. Its theoretical roots can be traced to early communication theory, cognitive theory, and social constructionism. Sexton and Alexander (2004) describe FFT as a “true” family-based intervention that addresses all of a youth’s experiences (cognitive, emotional, and behavioral) and all major life dimensions (individual, family, and multisystemic). An evidence-based intervention, FFT is demonstrably effective in addressing youth problematic behaviors, delinquency, and substance abuse (Robbins et al., 2016).
FFT is a manualized short-term family intervention that is delivered by FFT-trained therapists. It targets youth between ages 11 and 18 and requires that at least one involved parent or guardian be present during the therapy sessions. The FFT intervention model consists of five distinct components: engagement, motivation, relational assessment, behavioral change, and generalization (Alexander et al., 2013).
Method
This is an introspective case study centered on the evaluation of the FFT intervention as implemented in Middlesex County, New Jersey, between 2005 and 2011. This multiyear evaluation research project relied on a mixed-methods data collection approach. The research project adopted a quasi-experimental design with qualitative interviews and observations. An experimental design was not possible in this particular therapeutic and community setting. Just over 140 youth and their families were enrolled in FFT and recruited to the treatment group. The sample for the comparison group included over 100 youth enrolled in a case management program. The research design was adjusted both to assist with the program’s functioning (a process evaluation component was added to help explain early low enrollment in the program) and to expand the data collection (qualitative interviews with the clients and recidivism data for the comparison group were added).
This case study is based on the recollections and documented experiences of the author who was responsible for all major aspects of the evaluation, which included designing the study, collecting the data, handling daily evaluation activities, and disseminating the results. These experiences, recollections, and processes that are based on the outcome evaluation of FFT illustrate challenges and issues applicable to evaluations of other family-based violence prevention interventions.
Results
Three different sets of challenges were identified: research design and methods, collaboration and collaboration-related skills of the researcher, and the logistical barriers. See Table 1 for the summary of challenges and potential solutions.
Table 1. Summary of Challenges in Evaluating the Functional Family Therapy Intervention and Potential Solutions.
Research Design and Methods
Experimental versus nonexperimental design
Although a randomized experiment would have offered maximal scientific rigor, stakeholders (representatives of funding and major referral sources) ruled it out on ethical grounds because it would have meant denying some youth access to FFT. Instead, stakeholders decided that no adolescent would be refused admission to FFT until the program reached full capacity. Assignment to treatment based on program availability, as opposed to the characteristics of the youth, still permitted a strong quasi-experimental design. That is not to say that the execution of the quasi-experiment went smoothly. The original provider that was selected, trained, and prepared to provide services to the comparison group lost funding and had to withdraw from the study. The selection of a new comparison group, served by a case management organization, resulted in a number of challenges, such as additional training on data collection.
Evaluations based on experimental design are often called “real evaluations” as they permit valid causal inferences that allow for generalization of the results (Holosko, 2010; Lum & Yang, 2005; Shadish et al., 2002; Wilson, 2006). However, it is not always possible to pursue an experimental design for a variety of practical, ethical, and legal reasons (Nagin & Weisburd, 2013). Indeed, many scholars have noted how rarely randomized controlled experiments are used in evaluation research, including in the evaluations of family therapies (Howell & Yemane, 2006; Lum & Yang, 2005; Stratton et al., 2015). For example, Sherman et al. (2002) found that out of 657 evaluations, over 80% used nonexperimental methods. In fact, it seems that most evaluators tend to employ convenience samples (Butts & Roman, 2018; Welsh, 2016).
Some scholars have observed that social science researchers attempt to build on randomized controlled trials as if they were clinical trials in medical or psychological research. The underlying assumption is that the intervention can be easily measured as a binary variable, which does not capture the complex social reality in which interventions are implemented. The image of precisely and rigidly controlled conditions and neatly demarcated variables rarely aligns with social reality (Butts & Roman, 2018; Lum & Yang, 2005).
The practical implications of randomized trials are often limited. For example, complex interventions often preclude a clear understanding of which aspects of the intervention were helpful and which were not (Butts & Roman, 2018; Chemers & Reed, 2005; Rose & Bowen, 2019). According to some scholars, conducting randomized clinical trials among culturally diverse clients is especially challenging (Marsiglia & Booth, 2013; Sampson & Torres, 2015). Not surprisingly, published evaluation studies often lack adequately detailed methodological information (Stratton et al., 2015).
Given the difficulties in implementation and the negative consequences of relying solely on fully experimental designs, others have suggested alternative research designs that could be employed in conducting evaluations. For instance, some scholars suggest quasi-experimental or observational designs (Lipsey, 2006; Nagin & Weisburd, 2013). Both Lauritsen (2006) and Eck (2006) recommended case studies or multiple nonrandomized evaluations with small samples. Finally, Linning et al. (2019) advocated for time-series analysis that focuses on temporal effects and the underlying processes in conducting an intervention.
Quantitative versus qualitative methodology
The initial evaluation of FFT followed solely a quantitative design with two main quantitative outcomes: the changes in the strengths and needs assessment (SNA) and subsequent court appearance (representing recidivism). It was only during the implementation of the evaluation that the PI added a qualitative component to the evaluation. In order to address an inadequate number of referrals to FFT, the evaluator conducted a series of interviews with the therapists who provided the intervention and the probation officers, judges, and social workers who were making referrals to FFT. Later, satisfaction interviews with parents and youth were also added to the evaluation.
The inclusion of qualitative methods brought its own challenges. Early on, support for conducting interviews was tepid because some stakeholders argued that the interviews would not provide the “hard evidence” needed to gauge the effectiveness of the intervention. The PI countered that qualitative results can provide valuable insights and necessary context to quantitative outcomes. In fact, the interviews with families assessed strengths and weaknesses of the program and the families’ feedback facilitated processing their therapeutic experience.
While collecting qualitative data for the evaluation of the FFT program enriched the results and yielded important insights for assessing the intervention, it was also burdensome. The main challenges were the added costs in terms of time spent conducting interviews, transcribing interviews, and analyzing qualitative data, as well as the difficulties in identifying peer-reviewed journals willing to publish the qualitative results.
An ongoing debate concerns whether qualitative methods should be included in evaluations. The prevailing view is that the qualitative results are not generalizable and thus cannot reliably indicate interventions’ effectiveness. By contrast, proponents of qualitative data in evaluation research see these data as an important or necessary addition to quantitative data (Holosko, 2010; Patton, 2015; Plewis & Mason, 2005), providing “human factor dimensions” (Brown, 2006; see also Bloom, 2010).
Furthermore, when evaluating heterogeneous programs, a qualitative design in the form of case studies might be even more useful than randomized trials. Patton (2015) suggested that in-depth interviewing, focus groups, participant observations, document and discourse analysis, and case studies are pillars of “utilization-focused evaluation.” These qualitative methods provide necessary context and richness, and results are viewed as more authentic and comprehensible to practitioners (Brown, 2006; Ong et al., 2020).
Many scholars believe that the future of evaluation research lies in integrating qualitative and quantitative methods (Plewis & Mason, 2005). This could be a particularly welcome development in the family therapy field, given the scarcity of qualitative and mixed-method studies in that context (Christenson & Gutierrez, 2016).
Collaboration and the Skills of a Researcher
Every step in evaluating the FFT intervention required collaboration. Forging and fostering those relationships was vital to each stage of the evaluation process: design, data collection, analysis, and dissemination of the findings. During the early stages of implementing the FFT evaluation, the PI developed relationships with therapists who provided FFT to families, case managers who provided alternative services to the comparison group, and other major stakeholders. Developing positive working relationships with the FFT therapists was very important but also challenging and time-consuming at times. For example, the therapists who had greater familiarity with FFT tended to work better with the researcher than those who did not (see also McPherson et al., 2017). The therapists were preoccupied with various administrative and workload issues. They already viewed documenting and recording work with families as time-consuming and onerous, so adding any evaluation data collection instruments to the therapists’ workload was problematic. Moreover, some therapists, at least early on, seemed wary that the evaluation would be used to assess their own productivity, effectiveness, and the clients’ satisfaction with the therapists.
Exercising teamwork skills was crucial to reducing therapists’ concerns and securing their cooperation. The first strategy was to emphasize the goals that the therapists and the evaluator had in common. The second was to develop mutual respect by working together. To accomplish these objectives, it was necessary to build positive and trusting relationships with the therapists. The PI attended staff meetings and FFT meetings, conducted data collection training sessions, and collaborated with the therapists to develop a short instrument on families’ participation in FFT, the services tracking form (STF). Choosing appropriate outcome measures, namely the STF and the SNA (an assessment that the therapists routinely filled out regardless of the evaluation), also facilitated the researcher–provider relationship (see also Phipps, 2019).
A consensus in the literature supports the importance of collaboration between evaluators and service providers as well as other stakeholders. According to some scholars, good partnerships can influence the quality of the overall evaluation and the research findings (e.g., Bender et al., 2011; Bullock et al., 2012).
Effective collaborations and partnerships are generally based on mutual respect, flexibility, clear understanding of roles, and consideration of others’ interests (Bullock et al., 2012). Researchers should be sensitive to any additional responsibility that evaluation could impose on providers (McPherson et al., 2017). To increase staff buy-in, some scholars have even suggested involving the practitioners in the selection of study outcomes (Bender et al., 2011; Taut & Alkin, 2003). Indeed, the PI incorporated that very strategy in the focal evaluation. The therapists were eager to participate, and the strategy seemingly increased their engagement.
Research suggests that, in terms of cooperation with service providers, the barrier to effective evaluation is not the evaluators’ practical competence but rather their social competence and relationship-building skills (Taut & Alkin, 2003). As Levin-Rozalis et al. (2003) noted, evaluators must have an ability to communicate with others, win others’ trust, and update practitioners on the evaluation process.
Logistical Barriers in Collecting Original Data
Given the recording and data collection tasks required of therapists, the evaluation necessitated extensive front-end tasks, such as completing comprehensive paperwork (e.g., applications to the Institutional Review Board [IRB], the Certificate of Confidentiality, and applications to the New Jersey Supreme Court), and frequent ongoing contact with service providers throughout the evaluation. In addition, the logistics of planning and collecting qualitative data were quite complex. The evaluation-related activities also included training on data collection, various formal and informal meetings, overseeing data entry, and, later, extensive data analysis and work on disseminating the results.
While conducting evaluation research, the most significant concerns related to ensuring privacy and confidentiality of the young participants. The study included both assent and consent forms, consent to record interviews, and the Certificate of Confidentiality. Similarly, special permission from the Supreme Court of New Jersey was required to access the court records.
IRB approval and modification processes were at times strenuous (requiring multiple iterations of consent forms and protocols with many new applications for continuation) and time-consuming (especially given that approvals were required from two educational institutions). Although midstream changes in evaluation protocols are obviously problematic (Jaycox et al., 2006), several revisions in the IRB research protocol were necessary (e.g., when applying for permission to use identifiable data for the comparison group).
In the early stages of evaluation, some therapists were concerned that the evaluation’s goal was to assess their performance. While those concerns were largely minimized with the FFT therapists via formal trainings, formal and informal meetings, and enlisting therapists to support evaluation activities, gaining support from the case managers was a tougher task. It is likely that high staff turnover and a lack of direct communication with the case managers contributed to less communication overall and slower data collection.
One of the most important abilities in conducting this evaluation was the PI’s flexibility and availability. For example, there were over 100 in-person interviews conducted with families in various locations and often arranged on short notice. Some interviews took place on Saturdays because that was the best time for the families to participate. Collecting qualitative data tends to be time-consuming and expensive and requires many logistical tasks.
Discussion
Conducting a multiphase mixed-methods evaluation project with a sample that consisted of a vulnerable group of youth and their guardians was a complex endeavor with many challenges, some of which were unexpected. Participating in every step of the evaluation of FFT for nearly a decade allowed for a deeper understanding of the evaluation process.
In the present case, three different sets of challenges were identified: research design and methods, collaboration and collaboration-related skills of the researcher, and the logistical barriers. After conducting the FFT evaluation and successfully navigating its various challenges (e.g., selection of the comparison group), and after being informed by relevant literature, it was clear that there are other research designs besides experiments that can provide meaningful results, be less burdensome, and be more cost-effective (Farrington, 2006). For instance, quasi-experiments are suitable substitutes for experiments. Their quality is highly uneven, but they are also unobtrusive (Eck, 2017). Moreover, even many small evaluations and case studies could contribute to programs’ improvement and policy development (Farrington, 2006). It is important to note that overvaluing randomized controlled trials in evaluations might lead some researchers either to change their research methodology altogether or to concentrate only on projects that can be evaluated via an experimental design, which would result in many interventions not being studied at all.
Conducting qualitative interviews with families was valuable to understanding the therapeutic intervention. While the qualitative results might seldom inform about the outcomes, they provide important context and give voice to participants (Eck, 2006).
When designing an evaluation of an intervention, researchers should consider outcomes that are relevant to assessing the services that clients receive and that are sensitive to changes in clients’ behavior or attitudes (see Stratton, 2017). This study followed that recommendation: the evaluation employed two outcome measures that met these criteria, the SNA and the STF.
Although more scholars recognize value in different types of evaluation designs and in the use of a mixed-methods approach, funders and editors of peer-reviewed journals continue to be less flexible. Researchers who conduct outcome evaluations in the form of an experimental design have a “high disciplinary legitimacy” (Soydan, 2002), while evaluations that do not follow rigorous research criteria are often viewed as bad research (Levin-Rozalis et al., 2003). This challenge is difficult to overcome unless more evaluations based on various research designs are published and recognized as contributing to their respective social science fields.
The second group of challenges pertains to developing collaborations with others, especially service providers. The evaluators need to be flexible, adapt to new situations without jeopardizing research design, adapt when dealing with organizational obstacles or requirements that are outside of research, be sensitive to the needs of participants and service providers, and be constantly learning about the intervention they are evaluating. The work of an evaluator is often consuming and underappreciated as it does not bring immediate results and can be viewed by some as invasive. The lesson learned is that it takes time to develop working relationships with others and that certain interpersonal skills are imperative. It is important to involve service providers in evaluation, share information about findings, and encourage their feedback.
Finally, the logistics of conducting evaluation of FFT were often time-consuming—requiring adjustments and trade-offs. Any evaluation of the intervention requires exchanging hundreds of emails and phone calls, attending meetings, and completing extensive paperwork. Those challenges tended to dominate daily evaluation activities.
Although outside of the scope of this study, it is worth mentioning other issues that are being debated regarding evaluations. Some discussions center on theoretical foundations of evaluations, and whether evaluations should consider theories that provide background for the interventions (e.g., psychological theories) or the so-called evaluation theories that focus on creating evaluation design (Lipsey, 2006; Rosenfeld, 2006; Sherman, 2006). Another interesting topic concerns objectivity, specifically whether objectivity is possible in evaluation research. Some scholars argue that evaluation is value-laden (Cohn, 2002) and that available funding and grants impact what types of interventions are being evaluated (Fagan & Buchanan, 2016). Similarly, Barton (2002) claims that evaluators have little control over how evaluation findings are interpreted by practitioners and implemented in policies.
Evaluations are necessary, even for a known intervention like FFT, because once the programs are introduced to communities they tend to change or be modified due to lower per-case funding, lower fidelity to the model, negative expectations from participants if they are mandated to participate, negative attitudes from service providers if they are “forced” to learn the new intervention, or when they are implemented with a new population and context (Dodge & Mandel, 2012).
Evaluations do not hold a high status in any social science field, as other types of research (e.g., theoretical) are valued more (Farrington, 2006). The overall sentiment in the United States tends to be against evaluation research of prevention interventions, as the prevailing attitudes seem to privilege continuous preoccupation with individual and local responsibility for any behavior, including crime and delinquency (Rosenfeld, 2006). If evaluations were to be acknowledged as a part of a separate and independent discipline, it would allow evaluators to develop their own criteria, theoretical basis, and appropriate methods (Levin-Rozalis et al., 2003). According to Serbati (2020), evaluations should provide broader knowledge: not only evidence on “what works” but also insight into the processes that inform the implementation of the intervention.
Sherman (2006) claimed that researchers are interested in “better” evaluations, but even more so, in solving problems. Knowledge about programs and interventions is built over time and requires different research approaches and the examination of programs in various contexts. Although one evaluation cannot solve all potential problems in a single study (Harrell, 2006), sharing knowledge on the processes involved in conducting an evaluation is a fundamental part of the discussion needed to move forward in establishing evaluation criteria and the evaluation field.
Footnotes
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
