Abstract
BACKGROUND:
We argue that relevant research must focus on a problem of practice. We demonstrate this approach by developing a new product for practice to help counselors make informed vocational rehabilitation decisions. With only 18.7% of persons with disabilities employed in 2017, an accurate and simple prognostic tool could improve the effectiveness of the Individualized Plan for Employment and thus assist persons with disabilities to live independent lives.
OBJECTIVE:
To demonstrate the validity and practical relevance of a counseling decision support tool that accurately predicts clients’ employment outcomes based on demographic characteristics and vocational rehabilitation factors.
METHODS:
Using a historical sample of 53,629 persons with disabilities who completed vocational rehabilitation in a state agency, we derived our prediction model using logistic regression with 90-day employment as the outcome.
RESULTS:
The final prognostic model was derived from 12 client demographics and 20 vocational rehabilitation factors. The model correctly classified the outcome for 72% of the clients and demonstrated strong calibration and discrimination. The resulting app is available here: www.ablescore.com.
CONCLUSIONS:
AbleScore accurately classifies client probability of employment at closure. The app therefore has immediate application in providing evidence-informed rehabilitation counseling to people with disabilities to improve the odds of employment at closure.
Introduction
Despite the growing and spirited calls for the application of research evidence in the practice of vocational rehabilitation (VR), and the considerable resources devoted to evidence-based practice (EBP) and Knowledge Translation (KT) movements, a consistent conclusion in the literature and from the field is that available VR research evidence is rarely used by VR practitioners (Bennett et al., 2003; Chan et al., 2010; Dubouloz et al., 1999; Graham et al., 2013; Humphris et al., 2000; Leahy, Thielsen, Millington, Austin & Fleming, 2009). Even when research evidence makes it into practice, studies have shown no change in practitioners’ attitudes toward adoption and use of such evidence in practice (Graham et al., 2013; Menon et al., 2009). For example, an important study by Graham et al. (2013) shows that while 84% of practitioners (counselors) reported that they value research for practice, more than 40% found research evidence in academic papers impractical to implement or, frankly, irrelevant and somewhat detached from the real-world challenges they face in their local practices. Evidence from the history of the research-practice gap in healthcare suggests that adoption of research evidence in VR practice may not improve soon. For example, Westfall and colleagues (2007) found that only 14 percent of research in healthcare is translated into practice, and it often takes a long time to complete the research-to-practice cycle. Morris et al. (2011) have shown that in medicine, it has taken over 17 years for discoveries to be translated into practice.
Scholars and policymakers who have studied this issue have spent most of their energy debating the gap and attempting to explain why it exists. Scholars suggest that differences in cultural norms, rules, communication styles, and goals between researchers and practitioners make knowledge exchanges and interactions difficult (Graham et al., 2013; Salbach et al., 2007; Sherman et al., 2014; Winch, Henderson, & Creedy, 2005; O’Donnell, 2004). For example, Graham et al. (2013) pointed to a lack of state agency support and lack of funding for EBP and KT initiatives. Martin and Martin (1989) suggested that research evidence is difficult for counselors to comprehend and mostly irrelevant to their practice. O’Donnell (2004) cited counselors’ lack of time due to excessive caseloads, limited access to research and lack of expertise in evaluating and applying available evidence effectively in practice. Further, Salbach et al. (2007) argued that the stock of rigorous research evidence is limited because VR studies are more descriptive than empirical, making it difficult in determining best evidence. Others lay the blame on counselors’ negative attitudes toward the use of academic evidence in practice (Martin & Martin, 1989; Winch, Henderson, & Creedy, 2005).
Not surprisingly, practitioners have a different take. They point to several problem areas. First and foremost, research evidence is not relevant to the problems they face in their situated practices (Armstrong et al., 2007), underscoring the lack of interest on the part of scholars in doing research that is practice-oriented rather than theory-driven. Many practitioners find the language of research incomprehensible, and difficulties in interpreting technical/academic jargon are often cited (Armstrong et al., 2007; Martin & Martin, 1989). Another commonly mentioned issue is that even when relevant, the evidence is decontextualized, and thus impractical and difficult to implement in a different setting.
Sadly, while this debate is raging, persons with disabilities are not getting the best VR interventions to help them find competitive employment. Regardless of the reasons cited by the debaters, this is unfortunate given the over $38 billion spent over the last decade in the state-federal vocational rehabilitation program (Department of Education: Rehabilitation Services and Disability Research budget allocation), and this despite $142 million specifically allocated in 2017 by the Administration for Community Living (ACL) to fund new research and translate existing research evidence into practice (ACL 2017 budget allocations). Notwithstanding this large investment, the result is the same: in 2008 the employment rate for persons with disabilities was about 39%, while today (ten years later) it is 36% (Erickson, et al., 2017; U.S. Bureau of Labor Statistics, 2017). While progress has been made legislatively (e.g., the Workforce Innovation and Opportunity Act [WIOA]), one might question whether persons with disabilities are reaping the full benefits of our investments in the Rehabilitation Services Administration (RSA) to help them find jobs and live independent lives.
To address the problem of making research more relevant to practice, the state-federal rehabilitation system has promoted knowledge translation in the hope of increasing the uptake of research into practice and improving rehabilitation rates. The U.S. Department of Education, one of the champions of KT, defines KT as “the multidimensional, active process of ensuring that new knowledge gained through the course of research ultimately improves the lives of people with disabilities and furthers their participation in society” (U.S. Department of Education, 2006, p. 8195). The National Institute on Disability, Independent Living, and Rehabilitation Research (NIDILRR), whose key mission is to generate new knowledge and promote its effective use to improve the abilities of people with disabilities to achieve long-term outcomes, emphasized that “the KT process is active, as it not only accumulates information, but it also filters the information for relevance and appropriateness, and recasts that information in language useful and accessible for the intended audience.” The Knowledge-to-Action (KTA) approach, a variant of KT, has also been suggested as an appealing solution to bridge the gap between research and practice in VR (Graham et al., 2006; Leahy et al., 2014; Sudsawad, 2007). Unfortunately, there is limited evidence to support or dispute the efficacy of knowledge translation strategies in rehabilitation. However, examining the KT literature through the lens of practitioner-scholarship elucidates certain obstacles, which we discuss next.
First, implied in the NIDILRR definition of KT is the assumption that research evidence can indeed be filtered, recast, translated and made relevant to problems faced by practitioners in their situated practice. But an important point that is commonly overlooked in the KT literature, yet bears significant implications for applying evidence in practice, is that if the evidence, at its creation, is irrelevant to problems faced by practitioners in the field, relevance cannot simply be reengineered and added later. Research evidence and practice relevance cannot be separated empirically and united practically by filtering, recasting and translating. This is because the knowledge to be translated, at its creation, represents a particular reality, a specific worldview, influenced by the social context in which it was created, which often is distinct from the practitioner’s reality and worldview, and the context in which practice is embedded. Trying to translate scientific knowledge, which sometimes lacks relevance in practice, into practice knowledge is akin to force-fitting a square peg in a round hole. It is difficult, costly, time consuming and often wasteful. We argue that research evidence and practice relevance are mutually constituted, and therefore must be designed and integrated at the onset of a research project, not after evidence has been created. Thus, rather than starting with a research question, this view starts with a problem of practice (an important real-world problem acknowledged by members of a practice community), and through recurrent interaction between researchers, practitioners and other stakeholders, the problem of practice (PoP) is reconstituted as a research problem and a research question to address the PoP using rigorous academic methods.
A second salient concern pointed out by Doane and Varcoe (2008) is the conception of KT elements as dichotomies separating knower (researcher) from doer (practitioner), knowing (evidence) from doing (practice), and research context from practice context, overlooking the critical interrelationship between these elements in the production and application of research evidence in situated practice. In a parallel concern, Poole (2008) points to the oversight of extant KT literature that implies scholars are ‘experts’ translating knowledge without consideration for the grounded experience of the practitioner. Kitson (2008) and Reimer-Kirkham et al. (2007) warned that the KT assumption that knowledge can be decontextualized, repackaged and translated as a neutral, discrete entity is a problematic underpinning of the knowledge-to-action movement. We argue that the researcher and practitioner are co-creators of evidence, grounded in the context of practice. This partnership is achieved through intentional alignment of goals, clarification of roles and responsibilities, collaborative interaction and mutual adjustments of expectations.
A third frequently identified problem with KT is the lack of clarity on the end product (deliverable). A weakness commonly cited by practitioners is that, even after research evidence is translated, many findings are not readily fit for use in practice (Groah et al., 2009). The reason is that the starting point of KT and KTA is knowledge creation, not the problem faced by practitioners in the field. KT’s primary focus is therefore translating evidence into knowledge, not translating evidence into a product that solves a problem. As such, the translated knowledge needs another translation by another knowledge translator to make it into the practice domain. As pointed out by Dijkers (2016), what has emerged is a multimillion-dollar cottage industry debating what constitutes research evidence and which of the never-ending iterations of the KT framework should be adopted. We argue that if the goal of KT (not of science in general) is to create knowledge fit for practice, then the end product must be some practicable artifact. In the case of VR, such an artifact might be a tool counselors can implement to solve a rehabilitation problem in their situated practice, rather than just filtered, recast or translated knowledge.
A fourth handicap is the availability of rigorous VR evidence to translate. A primary assumption underlying adoption of KT in a context such as VR is that there is a robust inventory of validated and accepted VR evidence derived from rigorous scientific research. However, as pointed out by Salbach et al. (2007), Groah et al. (2009) and Tsang et al. (2000), the stock of rigorous research evidence in the VR discipline is limited because VR studies are more descriptive than empirical, making it difficult to determine the best evidence for translation. Even when evidence exists, interpreting and translating such evidence into practice is often impractical. Because researchers strive for objectivity and generalizability (Bacharach, 1989; Sutton & Staw, 1995), while practitioners focus on problem specificity and ease of implementation (Van de Ven & Johnson, 2006), how to achieve alignment around a shared goal is a crucial missing link in the chain of activities defined in KT. Because scholars (e.g., Chan, Wang, Muller, & Fitzgerald, 2011; GAO, 2005, 2007) have suggested variation in state VR agency outcomes that can be attributed to differences at the individual level, program execution, contextual and economic characteristics of the state, and differences in performance of the VR agencies themselves, interpreting and translating generalized evidence to solve specific practice problems has not yielded expected results. Therefore, we argue that right from the onset of a research project (at least one claiming implications for practice), the agenda must be research that is mutually beneficial to the practitioner’s goal of practice specificity and the researcher’s goal of evidence generalizability.
Finally, as stated by Sudsawad (2007), KT in all its variants and abstractions is overwhelmingly complex, and as a multidimensional concept it demands a comprehensive understanding of its frameworks, strategies, and methods, as well as the human behavior and contextual factors influencing its design and implementation. This conceptual and methodological complexity contributes to the difficulty of implementation in the field (McCluskey & Lovarini, 2005). With its numerous constructs, sub-constructs and contingencies, and limited details provided for each, knowing which combination of strategies and methods works for a particular knowledge and practice context is bewilderingly difficult. Additionally, the KT framework does not provide any behavioral change management process, stakeholder alignment tools, or guidelines on how to customize and implement research evidence in different practice environments. We argue that change leadership, stakeholder alignment and use case evaluation must be central to any KT initiative.
Given these identified obstacles, it should be no surprise that the attempt to use KT to unite evidence and practice has yielded poor results (Baumbusch et al., 2008). Because the KT movement in VR is in its formative stage, there is opportunity to reconstruct it to have a greater positive impact on creating and applying research evidence to solve rehabilitation problems to help more individuals with disabilities obtain and retain employment.
In response to this troubled context, the primary objective of this study is to propose a novel approach for creating research evidence relevant to the practice of rehabilitation counseling in a state vocational rehabilitation agency. The new approach, which we call Practice Motivated Research (PMR), starts with a problem of practice and ends with an evidence-informed solution to the problem of practice. As such, we reversed the current order of research-to-practice to a new order of practice-to-research, where a problem of practice is the motivation for research, and research evidence is both a product of and a solution to a PoP. In this paper, we applied the new approach in a situated case study to demonstrate how practice-motivated research can meet the requirements for academic rigor, practice relevance, and clinical usefulness, demonstrating how scholars and practitioners can work together to develop and apply evidence to solve important problems and make academic contributions at the same time. Finally, to enable practitioners to easily apply the evidence, we developed a user-friendly product for practice (PfP)—AbleScore—which applies prognostic/predictive analytics to help practitioners personalize rehabilitation by designing and implementing an optimal Individualized Plan for Employment (IPE) to improve the odds of employment for clients. Our proposed approach sheds light on how research should be motivated by a problem of practice, designed and conducted by a collaborative team of researchers and practitioners to give power to both practice utility and scientific contribution.
Case study: Evidence-informed counseling decision support application (AbleScore)
Purpose and background
In 2016, a state VR agency engaged in a transformation project to improve the employment rate for persons with disabilities. To this end, the agency’s executive leadership team (ELT) chartered and sponsored an integrated, cross-functional team of experienced counselors and managers and engaged the services of a consulting firm led by a practitioner-scholar to assist the agency in the transformation project. The first phase of the project involved a detailed situation analysis to identify and prioritize opportunities for improvement in agency mission, vision, strategy, organization, client satisfaction, processes, decision-making, and policy. Further, the analysis aimed to understand the cultural, social and political contexts in which rehabilitation practice was embedded, and to capture the voice of key stakeholders – clients, counselors, providers, and agency executives and staff. The outcome of the analysis phase included strengths of the agency, problems faced by counselors in their practice setting, organizational capacity to address the problems, and cultural and behavioral barriers that must be overcome. The scope of the analysis included both internal and external perspectives, mixing qualitative (interviews and focus groups) and quantitative (analysis of historical performance data and survey data) methods of analysis to yield a more holistic view of problems faced by counselors and clients (the problem of practice) and the agency’s capacity and readiness for behavior change. From the internal analysis, the assessment included a review of the vision, mission, strategy, performance objectives, culture, organization structure, information flows, policies, and core processes that affect employment rate, and how decisions were made and by whom. From the external analysis, the assessment captured the voices of clients and service providers, as well as the voices of researchers and policymakers by reviewing extant literature and policy briefs.
One of the strategically important issues that emerged from the analysis was the lack of a decision support tool to help counselors make informed decisions when designing and implementing the IPE to maximize employment outcome at closure. In the next section, we discuss the development of an evidence-informed prognostic tool to help counselors make informed rehabilitation decisions.
The problem of practice
Decision-making in VR counseling is multifaceted and complex, often requiring careful consideration of multiple prognostic factors before arriving at a course of action suitable for employment for a person with a disability (client). The problem of individualizing vocational rehabilitation for employment is related to the heterogeneity in disabilities, services, service delivery factors, and demographic characteristics (Gonzalez et al., 2011; Lusk, 2018; Weweiorski & Fabian, 2004). Although there are VR services that have been associated with 90-day employment outcomes (Dutta et al., 2008; Evensen et al., 2017; Glynn & Schaller, 2017; Huang et al., 2013; Rumrill et al., 2017), the prognostic problem faced by counselors and clients when designing the Individualized Plan for Employment (IPE) is knowing which of the fifty-seven or more VR services are most effective for a particular person with one or more of the known 19 disability classes and a complex set of demographic characteristics such as age, gender, race/ethnicity, marital status, educational level, work status, criminal history, alcohol abuse, drug abuse, receipt of disability income, etc. Despite this broad variability in VR service effects and demographics on employment outcomes, prognostic tools to support decision-making during the design and implementation of the IPE are not well established in rehabilitation practice.
Even the most experienced and knowledgeable counselors can’t possibly commit to memory all the knowledge about VR therapies they need to select from for each client situation. Even if they did have access to the massive amounts of knowledge needed to compare and select the best therapies to achieve the right outcomes for all the disabilities they encounter, they would still need time and expertise to analyze that information and integrate it with the client’s own personal characteristics. With increasing demand for VR services and counselors overwhelmed with increasing caseloads, this kind of in-depth prognosis and decision-making is beyond the scope of a counselor’s work. There are no evidence-based decision support tools we know of to assist counselors and clients to design the most effective IPEs based on significant predictors of 90-day employment outcome. In medicine, decision support tools such as predictive analytics or prognostic models are commonly used in predicting infections, determining the probability of disease, assisting a physician with a diagnosis, as well as predicting future wellness (Castaneda et al., 2015).
For the rehabilitation counselor to be effective in helping clients achieve successful employment outcomes, systematic effort must be undertaken to provide evidence-based tools to help counselors make informed decisions when selecting therapies and service delivery factors to be included in the IPE. In the absence of such decision support tools, counselors are forced to make ad hoc decisions about which therapies and service delivery factors work for each individual, risking delays and adverse employment outcomes for those who ultimately complete the program. With such a tool, the counselor can use estimates of prognosis as a guide for ordering additional diagnostic tests and selecting appropriate VR therapies to develop a new Individualized Plan for Employment (IPE) or change an existing one. Using such an effective tool to aid counseling decision-making can result in a better-informed decision today about outcomes tomorrow. It also reduces the inherent bias in ad hoc decision-making.
Unfortunately, this prognostic challenge is longstanding. It has been over 5 decades since Gordon Paul (1967) proposed the challenge as the central research question in vocational rehabilitation: What treatment, by whom, is most effective for this individual with that specific problem, and under which set of circumstances? With only 18.7 percent of persons with disabilities employed in 2017 (U.S. Bureau of Labor Statistics, 2017) compared to 65.5 percent for those without a disability, an accurate decision support tool should improve the odds of employment and allow persons with disabilities to live independent lives. Thus, we summarize the problem of practice—the real-world problem faced by counselors and clients in their context-specific practice setting as follows: What prognostic tool can counselors and clients use to predict employment outcomes for eligible persons with disabilities who receive VR services from a state agency? To the best of our knowledge, this is the first comprehensive investigation of the predictors of employment and the construction of a practical decision support tool for counselors and clients.
Proposed evidence-informed solution: AbleScore
The purpose of this case study was to develop and validate an evidence-informed prognostic application (app) to assist counselors and clients to quantify the probability of employment outcomes across the full spectrum of VR therapies, key service delivery factors and personal characteristics. To this end, we began with the problem of practice and then we reviewed extant literature on prognosis and prediction research in vocational rehabilitation to frame the research question (RQ) and identify known predictors of employment outcomes. Though informed by theory, the RQ is based on the experience of difficulties and opportunities as encountered by VR counselors and their clients in their situated practice setting. Then we built a multivariate logistic regression model to isolate independent factors that significantly predict successful 90-day employment at closure. Next, we validated the model on a randomized test dataset to evaluate its predictive performance in terms of overall prediction accuracy, calibration quality, discriminative ability and practice usefulness. Afterward, we used the validated predictive model to construct an evidence-informed and practice-oriented prognostic tool we call AbleScore. In the final section, we discussed the implications of this tool for rehabilitation decision-making, application with persons with disabilities in other state agencies and future directions of prognostic or prediction research in VR practice.
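To illustrate how a fitted logistic regression model of this kind turns a client profile into a probability of employment, the following minimal Python sketch may help. The intercept, coefficients, and predictor names are hypothetical values chosen for illustration only; they are not the fitted AbleScore parameters, which are not reproduced in this section.

```python
import math

# Hypothetical intercept and coefficients, for illustration only.
# These are NOT the fitted AbleScore parameters.
INTERCEPT = -0.5
COEFFICIENTS = {
    "received_job_placement": 0.9,       # assumed binary VR service indicator
    "receives_disability_income": -0.6,  # assumed binary client characteristic
    "age_decades": -0.1,                 # assumed continuous predictor
}


def employment_probability(client: dict) -> float:
    """Apply the logistic function to the linear predictor for one client."""
    linear = INTERCEPT + sum(
        COEFFICIENTS[name] * client.get(name, 0.0) for name in COEFFICIENTS
    )
    return 1.0 / (1.0 + math.exp(-linear))


# A hypothetical client profile.
client = {
    "received_job_placement": 1,
    "receives_disability_income": 0,
    "age_decades": 4.0,
}
probability = employment_probability(client)
```

Under these assumed values the linear predictor is zero, so the estimated probability is 0.5. In a tool of this kind, a counselor could change one service indicator at a time and compare the resulting probabilities.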
Prognosis and prediction research in vocational rehabilitation
Prediction research, which aims to predict future events or outcomes based on patterns within a set of variables, has become increasingly prevalent in vocational rehabilitation research (Bellini, 1995; Bolton, et al., 2000; Capella-McDonnall, 2005; Chan et al., 2006; Dutta et al., 2008; Evensen et al., 2017; Glynn & Schaller, 2017; Hayward & Schmidt-Davis, 2003; Huang et al., 2013; Lusk, 2018; Rosenthal et al., 2003; Rumrill et al., 2017). These studies have related individual characteristics and certain rehabilitation services to employment outcomes using aggregate data from the RSA-911 database. However, these studies present three challenges, which tend to weaken their utility in rehabilitation counseling practice.
First, they lack evidence of validation and predictive performance, such as measures of calibration, discriminative ability, and predictive accuracy. Validation refers to the performance of the model when tested on clients from a randomized sample of a similar population (internal validation) or on samples drawn from a new population that differs from the one used to develop the model (external validation). Calibration describes how closely the predicted probabilities agree numerically with the observed outcomes. Discrimination refers to the ability of a model to correctly distinguish between the two classes of outcomes: employed or not employed. Predictive accuracy refers to the model’s usefulness in counseling practice. These conceptual, methodological and performance problems continue to impede replication and comparison between studies.
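As a rough illustration of these performance measures, the sketch below computes overall classification accuracy, a concordance-based measure of discrimination (the area under the ROC curve), and a simple calibration check (mean predicted probability minus observed employment rate). The function names and the six example clients are assumptions made for illustration; they are not the study's data or its exact validation procedure.

```python
def accuracy(y_true, y_prob, threshold=0.5):
    """Share of clients whose outcome is correctly classified at the threshold."""
    predictions = [1 if p >= threshold else 0 for p in y_prob]
    return sum(int(a == b) for a, b in zip(y_true, predictions)) / len(y_true)


def auc(y_true, y_prob):
    """Discrimination: probability that a randomly chosen employed client
    receives a higher predicted probability than a randomly chosen
    non-employed client (ties count as one half)."""
    employed = [p for y, p in zip(y_true, y_prob) if y == 1]
    not_employed = [p for y, p in zip(y_true, y_prob) if y == 0]
    score = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in employed
        for n in not_employed
    )
    return score / (len(employed) * len(not_employed))


def calibration_gap(y_true, y_prob):
    """Calibration-in-the-large: mean predicted probability minus observed rate."""
    return sum(y_prob) / len(y_prob) - sum(y_true) / len(y_true)


# Hypothetical outcomes (1 = employed) and predicted probabilities.
y_true = [1, 0, 1, 1, 0, 0]
y_prob = [0.9, 0.2, 0.7, 0.4, 0.6, 0.1]

print(accuracy(y_true, y_prob), auc(y_true, y_prob), calibration_gap(y_true, y_prob))
```

A calibration gap near zero and a concordance well above 0.5 would indicate, respectively, that predicted probabilities match observed rates on average and that the model separates the two outcome classes.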
Second, most of these studies, from which general models are derived, used cross-sectional designs and aggregate RSA-911 data. Because there is evidence (Chan, Wang, Muller, & Fitzgerald, 2011) suggesting significant variation in state VR agency outcomes that can be attributed to differences at the individual level, program execution, contextual and economic characteristics of the state, and differences in performance of the VR agencies themselves, studies using aggregate RSA-911 population data may not adequately account for these differences, making their findings less useful for counseling individuals at the state agency level. These models are, therefore, susceptible to “ecological fallacy” biases, which may occur when an observed relationship between aggregated variables differs from the association at the individual level. For these reasons, the majority of models derived from aggregate national data are less useful in making individual client-centered counseling decisions.
Third, a majority of the studies were specific to a particular disability group: for example, individuals with substance abuse disorder (Lusk, 2018), schizophrenia (Evensen et al., 2017), learning disabilities (Rumrill et al., 2017), attention-deficit/hyperactivity disorder (Glynn & Schaller, 2017), orthopedic disabilities (Fong et al., 2006), the blind and visually impaired (Capella-McDonnall, 2005), and adults with cerebral palsy (Huang et al., 2013). Dutta and colleagues (2008) presented the most comprehensive analysis and summary of factors and characteristics associated with employment outcome, but only in three discrete disability groups. None of these studies has comprehensively examined the effects of predictors on employment across all disability classes in a single state VR agency, where evidence can readily be interpreted and easily applied in practice without the aggregation problem.
Recognizing the dearth of evidence-based decision support tools in vocational rehabilitation counseling, this study addressed five interrelated and practice-oriented research questions to solve the problem of practice: (1) What individual characteristics, VR service provisions, and service delivery factors predict an individual’s employment outcome at closure? (2) How well do the predictors explain employment outcome at closure? (3) How accurate is the prediction? (4) What is the usefulness of the prediction tool in counseling practice? (5) How can counselors use the prediction tool in making decisions in their practice setting?
Development and validation of the prognostic models
Data and statistical analyses
Data for this study were extracted from a large archival database of client case records in a state vocational rehabilitation agency. The data were recorded by counselors at various stages in the rehabilitation process and contain demographic characteristics including racial/ethnic identity, gender, age at entry, marital status, type and severity of disability, receipt of public benefits such as disability income, educational attainment, types of VR services received, service delivery attributes, other pertinent personal history such as drug and alcohol abuse and criminal record, and the 90-day employment outcome information during a 20-year period from fiscal 1996 to 2016.
The state agency data we used grouped people with disabilities broadly into the three major disability groups established by the RSA (Reporting Manual for the Case Service Report (RSA-911)) for analysis and reporting purposes: sensory/communicative (e.g., visual impairment/blindness and hearing impairment/deafness), physical (e.g., arthritis, spinal cord injury), and mental impairments (e.g., depression, schizophrenia, and learning disabilities). This standard grouping has been used in several influential, refereed VR publications (e.g., Dutta et al., 2008) in the Journal of Occupational Rehabilitation. Table 1 summarizes the VR program services we identified as potential predictors of 90-day employment outcome at closure. VR services were carefully defined using RSA and state agency published criteria.
Table 1. Summary of VR program services
Data were entered into the state agency database by certified rehabilitation counselors and were checked for errors by two data analysts and counselors who were knowledgeable about the database. Elements of this large dataset are furnished annually to the RSA by the state agency.
The sample frame included 53,629 clients for all disability groups whose cases were closed as either employed or not. Clients included in the model met the following criteria: (1) were eligible for vocational rehabilitation, (2) received at least one VR service, and (3) closed with or without employment. A major strength of this study is the quality, size and the representativeness of the sample and the comprehensiveness of the VR service provision. Table 2 summarizes the variables.
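The three inclusion criteria can be expressed as a simple record filter. The sketch below is illustrative only: the `CaseRecord` fields and the closure status codes are hypothetical and do not reflect the state agency's actual database schema.

```python
from dataclasses import dataclass


@dataclass
class CaseRecord:
    # Hypothetical fields; the agency's real schema is not described here.
    eligible: bool          # criterion 1: eligible for vocational rehabilitation
    services_received: int  # criterion 2: number of VR services received
    closure_status: str     # criterion 3: "employed", "not_employed", or "open"


def meets_inclusion_criteria(record: CaseRecord) -> bool:
    """Return True when a case satisfies all three inclusion criteria."""
    return (
        record.eligible
        and record.services_received >= 1
        and record.closure_status in {"employed", "not_employed"}
    )


cases = [
    CaseRecord(True, 3, "employed"),
    CaseRecord(True, 0, "not_employed"),  # excluded: received no VR service
    CaseRecord(False, 2, "employed"),     # excluded: not eligible
    CaseRecord(True, 1, "open"),          # excluded: case not yet closed
]
sample = [case for case in cases if meets_inclusion_criteria(case)]
```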
Description of variables
1. The state agency data we used grouped people with disabilities broadly into the three major disability groups established by the RSA (Reporting Manual for the Case Service Report (RSA-911)) for analysis and reporting purposes: sensory/communicative (e.g., visual impairment/blindness and hearing impairment/deafness), physical (e.g., arthritis, spinal cord injury), and mental impairments (e.g., depression, schizophrenia, and learning disabilities). This standard grouping has been used in several influential, refereed VR publications (e.g., Dutta et al., 2008) in the Journal of Occupational Rehabilitation.
2. We dichotomized some variables (e.g., marital status) because including separate values for each category did not improve the predictive performance of the model. Dichotomizing them instead resulted in a more parsimonious model with just as much explanatory value.
The data were randomly divided into two equal samples: a development sample (n = 26,814) used in model development, and a validation sample (n = 26,815) used in model validation. These two samples are nearly identical across all variables. Such an outcome is to be expected with such large sample sizes.
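A random split of this kind can be sketched as follows. This is a minimal illustration only: the source does not report the software, the record identifiers or the random seed used, so `record_ids` and `seed` are assumptions introduced here.

```python
import random


def split_development_validation(record_ids, seed=2017):
    """Randomly split records into two (nearly) equal halves.

    `record_ids` and `seed` are illustrative assumptions, not details
    reported in the study.
    """
    shuffled = list(record_ids)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# With 53,629 records the halves contain 26,814 and 26,815 members.
development, validation = split_development_validation(range(53629))
print(len(development), len(validation))
```

Because the assignment is random, any variable should have a similar distribution in both halves when the samples are this large, which is the property the paragraph above relies on.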
Model development
We derived our prediction model with competitive employment as the outcome. Competitive employment, as defined in the RSA-911 manual, represents employment for at least 90 days in an integrated setting, self-employment, or employment in a state-managed Business Enterprise Program (BEP) for which a person is compensated at or above the minimum wage. The employment outcome was coded “1” for successful employment and “0” for clients who were not working after completing their planned vocational rehabilitation program. Candidate predictors included those that had been reported to be prognostic and predictive, including client demographic characteristics, type and severity of disability, receipt of disability benefits, VR service provision history, and service delivery attributes. The initial set of 37 predictor variables included 12 client characteristics, 23 rehabilitation program services, and 2 service delivery variables. Thirty predictor variables were binary and four were multinomial (education, race, severity and type of disability). The remaining three predictors were continuous: age, length of stay in the program (time from entry to closure) and the number of counselors assigned to the client during rehabilitation. The model was developed using multivariable logistic regression.
Multivariable logistic regression predicts probabilities not causation. We chose to assume linear relationships (implied by logistic regression’s generalized linear model) for the ease of creating a prediction calculator and for ease of interpretation. As one of the key points of this paper was to provide a product for practice to solve a problem of practice, a more precise but more complicated model would likely not be as useful as a simpler although modestly less precise tool. Of note, Euroscore (http://www.euroscore.org/calc.html) and STS Score (http://riskcalc.sts.org/stswebriskcalc/#/calculate) are well-known and highly regarded calculators for cardiology and thoracic surgery which use a similar linear approach and analytic model.
To examine the performance and goodness of fit of the model, we evaluated measures of overall performance, calibration and discrimination. Overall performance was evaluated using predictive accuracy, the Nagelkerke R² and the Brier score. Predictive accuracy assessed how well the model predicted the likelihood of an outcome for an individual client. The Nagelkerke R² quantified the percentage of variance in the outcome variable (90-day employment) explained by the predictors in the model. The Brier score quantified differences between actual outcomes and their predicted probabilities, that is, the mean squared error (Steyerberg et al., 2010). For an outcome with roughly 50% incidence, the Brier score ranges from 0 for a perfect model to 0.25 for a non-informative model, so values close to 0 indicate a useful model and values close to 0.25 a non-informative one (Steyerberg et al., 2010).
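The Brier score described above is simply a mean squared error, which can be sketched in a few lines. The outcomes and predicted probabilities below are made-up illustrative values, not study data.

```python
def brier_score(outcomes, predicted_probs):
    """Mean squared difference between 0/1 outcomes and predicted probabilities."""
    if len(outcomes) != len(predicted_probs):
        raise ValueError("outcomes and predicted_probs must have the same length")
    n = len(outcomes)
    return sum((p - y) ** 2 for y, p in zip(outcomes, predicted_probs)) / n


# Made-up outcomes (1 = employed at closure, 0 = not employed).
outcomes = [1, 0, 1, 1, 0, 0]
coin_flip = [0.5] * len(outcomes)         # non-informative predictions
sharper = [0.9, 0.2, 0.8, 0.7, 0.1, 0.3]  # hypothetical model output

print(brier_score(outcomes, coin_flip))  # 0.25
print(brier_score(outcomes, sharper))    # much closer to 0
```

Predicting 0.5 for every client yields exactly 0.25, which is why that value marks a non-informative model when about half of the clients are employed.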
Calibration (goodness of fit) refers to the agreement between observed outcomes and predictions (Steyerberg et al., 2010). As recommended by Steyerberg et al., we used the calibration plot (Cox, 1958) to graphically assess model goodness of fit. The calibration plot is characterized by an intercept α, which indicates the extent to which predictions are systematically too low or too high (‘calibration-in-the-large’), and a calibration slope β. A well-calibrated model has an intercept α of 0 and a slope β of 1 (Steyerberg et al., 2014; Cox, 1958). The commonly used Hosmer–Lemeshow test indicated a statistically significant lack of fit, which we attribute to the large sample size in our study: the test tends to reject even good models when the sample size is greater than 25,000 (Yu et al., 2017). For this reason, we did not rely on the Hosmer–Lemeshow test of goodness of fit.
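The calibration intercept and slope are commonly obtained by regressing the observed outcome on the logit of the predicted probability. A sketch of that recalibration model, in which $\hat{p}_i$ (the predicted probability for client $i$) and $y_i$ (the observed 0/1 outcome) are symbols introduced here for illustration:

$$\operatorname{logit} P(y_i = 1) = \alpha + \beta \, \operatorname{logit}(\hat{p}_i), \qquad \operatorname{logit}(p) = \ln\frac{p}{1 - p}$$

Under this formulation, $\alpha = 0$ and $\beta = 1$ reproduce the original predictions unchanged, which is why those values indicate good calibration. A slope below 1 would suggest predictions that are too extreme, and a non-zero intercept would suggest predictions that are systematically too high or too low.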
Discrimination refers to the ability of the model to distinguish between clients who were employed and those who were not employed at closure. It was determined from the area under the curve (AUC) of the Receiver Operating Characteristic (ROC) curve (Royston et al., 2009). The ROC curve is a plot of the true positive rate (sensitivity) against the false positive rate (1 − specificity) across all possible cutoff points for the predicted probability. A useless predictive model, such as a coin flip, would generate an AUC of 0.5. When the AUC is 1.0, the model discriminates outcomes perfectly. Therefore, the closer the AUC is to 1.0, the better the discrimination.
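One way to see what the AUC measures is to compute it directly as the proportion of (employed, not employed) client pairs in which the employed client received the higher predicted probability. The sketch below uses made-up values and a simple pairwise loop; it is an illustration of the concept, not the procedure used in the study.

```python
def auc(outcomes, predicted_probs):
    """AUC as the share of (employed, not employed) pairs ranked correctly.

    Ties in predicted probability count as half a correct ranking.
    """
    positives = [p for y, p in zip(outcomes, predicted_probs) if y == 1]
    negatives = [p for y, p in zip(outcomes, predicted_probs) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one client in each outcome group")
    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


outcomes = [1, 0, 1, 1, 0, 0]  # made-up outcomes, 1 = employed at closure
print(auc(outcomes, [0.9, 0.2, 0.8, 0.7, 0.1, 0.3]))  # 1.0, perfect separation
print(auc(outcomes, [0.5] * 6))                        # 0.5, no discrimination
```

The pairwise loop is quadratic in the number of clients, so for samples as large as the one in this study a rank-based calculation would be the more practical choice.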
Evaluation of model’s practice usefulness
Statistics such as sensitivity, specificity and area under the curve, while useful in assessing model performance, do not tell us whether the model would do more good than harm if used in counseling decision making. To address the question of the value of the model for counseling decision making, we used a decision-analytic approach proposed by Vickers and Elkin (2006) to quantify its clinical usefulness based on the following premises: (1) a prognostic tool is only valuable to the extent that it improves clinical decision making, and (2) the value of clinical decision making depends on the benefits and harms of these decisions.
A simple calculation for evaluating the usefulness of prognostic tools in clinical practice is net benefit analysis. We used net benefit analysis to evaluate the potential clinical consequences of using our model in counseling decision making. Net benefit attempts to quantify the potential harms and benefits of classification errors, that is, false-positive and false-negative classifications. This approach to assessing the performance of a prediction model is described by Steyerberg et al. (2010), following Peirce (1884). In calculating error rates, we classified clients as positive when their predicted probability of employment exceeded 0.515 (the optimal cut-off, that is, the value that maximized sensitivity and specificity on the ROC curve) (Lalkhen & McCluskey, 2008) and as negative otherwise. This implies an approximately equal weighting of false-positive and false-negative classifications. The basic interpretation of a decision curve is that the model with the highest net benefit at a particular threshold probability has the highest clinical value. As reported later in the article, we also conducted a Use Case Evaluation involving experienced practitioners to ascertain the potential clinical benefit and harm of the practice product derived from the model.
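In the decision-analytic formulation of Vickers and Elkin (2006), net benefit at a threshold probability $p_t$ is the proportion of true positives minus the proportion of false positives weighted by $p_t / (1 - p_t)$. The sketch below illustrates that calculation on made-up values. It classifies a client as positive when the predicted probability is at or above the threshold, which is an assumption about how boundary cases are handled.

```python
def net_benefit(outcomes, predicted_probs, threshold):
    """Net benefit of classifying clients as positive when p >= threshold."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must be strictly between 0 and 1")
    n = len(outcomes)
    true_pos = sum(1 for y, p in zip(outcomes, predicted_probs)
                   if p >= threshold and y == 1)
    false_pos = sum(1 for y, p in zip(outcomes, predicted_probs)
                    if p >= threshold and y == 0)
    # Harm of one false positive relative to the benefit of one true positive.
    weight = threshold / (1 - threshold)
    return true_pos / n - (false_pos / n) * weight


# Made-up outcomes and predictions, evaluated at the study's 0.515 cut-off.
outcomes = [1, 0, 1, 1, 0, 0]
predicted = [0.9, 0.6, 0.8, 0.4, 0.1, 0.3]
print(net_benefit(outcomes, predicted, threshold=0.515))
```

At a threshold of 0.515 the weight is 0.515 / 0.485, or about 1.06, which is consistent with the approximately equal weighting of false-positive and false-negative classifications noted above.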
Results
The estimates of beta coefficients (β), odds ratios (OR)2
The odds ratios and beta coefficients both estimate the effect size of the predictor on the dichotomous outcome variable, the beta coefficient being the natural logarithm of the odds ratio (Ialongo, 2016).
Multivariable logistic regression allows us to study the simultaneous effect of multiple predictor variables on a dichotomous outcome such as whether a client will be employed or not at closure.
Type and severity of disability affected the chance of employment at closure. For example, in our archival dataset of 53,629 clients, a person with a mental disability was 27% less likely to be employed than a person with sensory-communicative disabilities. Similarly, a person with a physical disability was 30% less likely to be employed than a person with sensory-communicative disabilities. A person with the most severe disability was 35% less likely to be employed than one with a less significant disability. The employment status of the client at the time of application was significantly associated with employment at closure. Clients who had prior work experience before entry were 2.3 times more likely to be employed at closure than those who had not (OR = 2.3, CI = 2.19–2.48).
Employment-related training and services had the largest impact on employment. This result is consistent with the rehabilitation literature that shows that employment-related services have a strong impact on employment outcomes. For example, clients who received occupational vocational training were 2 times more likely to be employed at closure than those without (OR = 2.0; 95% CI = 1.85–2.16), 3.7 times for on-the-job training (OR = 3.69; 95% CI = 2.54–5.36), 3.5 times for miscellaneous training (OR = 3.5; 95% CI = 3.24–3.81), 2.4 times for job placement service (OR = 2.4; 95% CI = 2.22–2.59), 3 times for job supports (OR = 3; 95% CI = 2.79–3.30), 1.5 times for community work adjustment training (OR = 1.5; 95% CI = 1.43–1.61), and 1.3 times for disability-related skills training (OR = 1.3; 95% CI = 1.00–1.53). Unfortunately, very few clients received these employment services that could have improved their chances of employment. For example, as pointed out earlier, only 0.4% of clients received on-the-job training, 1.2% received disability-related skills training, and 10.3% received job supports and job placement.
Higher education services were positive contributors to 90-day employment. Several enabling technologies and services such as rehabilitation technology, vehicle modification, transportation and other services increased the chance of employment for those who received them. Interestingly, we found that several VR services that are commonly provided to clients were associated with significantly reduced chances of employment at closure. Assessment and diagnosis and treatment were negatively related to employment at closure. Clients who received assessment were 13% less likely to be employed (OR = 0.87; CI = 0.83–0.91), and those who received diagnosis and treatment were 17% less likely (OR = 0.83; CI = 0.79–0.86). Further and importantly, receipt of disability benefits reduced the chance of finding employment. Clients who received benefits were 50% less likely to be employed than those who did not (OR = 0.50; CI = 0.48–0.53). Some commonly utilized services were also found to be non-significant (p > 0.05) predictors of employment at closure: facility-based work adjustment training, job readiness training, job search and interpreter services.
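The percentages quoted above correspond to (OR − 1) × 100, that is, the percentage change in the odds of employment associated with each predictor. The short sketch below reproduces that arithmetic for the three odds ratios reported in this paragraph and also shows the matching beta coefficient, β = ln(OR). The function names are illustrative.

```python
import math


def percent_change_in_odds(odds_ratio):
    """Percentage change in the odds of the outcome implied by an odds ratio."""
    return (odds_ratio - 1.0) * 100.0


def beta_from_odds_ratio(odds_ratio):
    """Logistic regression coefficient corresponding to an odds ratio."""
    return math.log(odds_ratio)


reported = [
    ("assessment", 0.87),
    ("diagnosis and treatment", 0.83),
    ("disability benefits", 0.50),
]
for label, odds_ratio in reported:
    change = round(percent_change_in_odds(odds_ratio))
    beta = round(beta_from_odds_ratio(odds_ratio), 3)
    print(f"{label}: {change}% change in odds, beta = {beta}")
```

Strictly, these figures describe changes in odds rather than in probability; the two diverge most when the outcome is common, as it is here.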
Full Model: Variables showing prognostic significance
1. The highly significant (very small) p-values are expected given the large sample size used in the development model.
The predictive accuracy, discriminative ability, and calibration quality of the development model were assessed prior to testing in the validation sample. The overall predictive accuracy of the model was good. The development model correctly classified the employment outcome for 72% of clients, compared to 54% in the null model. The Nagelkerke R² indicated that predictors in the development model explained 29.3% of the variance in 90-day employment outcome, which indicates a strong effect (Cohen, 1988). The discriminative ability of the model was evaluated by the ROC curve, showing an AUC of 0.78 (SE = 0.003), indicating a strong effect size (Rice & Harris, 2005). The model demonstrated good calibration (calibration slope = 1; intercept = 0.00). The performance of the development model is summarized in Table 4.
Development model performance measure
The development model was validated in the randomized validation dataset (n = 26,815) using the bootstrap technique. The results confirmed that the model performed well in its predictive accuracy, calibration and discrimination. Figure 1 presents the ROC curves for the development, validation and full samples.

AUC/ROC Curve for (a) Development, (b) Validation, (c) Full models.
Further, the models demonstrated good calibration, indicating goodness of fit and good agreement between observed and predicted employment outcomes. The calibration plots are presented in Fig. 2.

Calibration plots: (a) Development, (b) Validation and (c) Full models.
The model with the highest net benefit at a particular threshold probability has the highest clinical value (Vickers, 2008). As shown in Table 5, the net benefit values of the models were nearly identical: 0.26 for the development, 0.25 for the validation and 0.26 for the full model.
Model validation and performance comparison
Employed outcome: cutoff ≥ 0.515. Unemployed outcome: cutoff < 0.515. Net benefit cut-off = 0.515.
Productizing – development of product for practice
Product development and use case evaluation were the final steps in applying the research evidence to build a user-friendly prognostic tool to assist counselors in making informed rehabilitation decisions. We used the significant β-coefficients (p < 0.05) estimated in the full population (n = 53,629) to develop a prognostic calculator we call AbleScore. For a given client, AbleScore predicts a probability of employment according to the logistic regression equation with the following formula:
$$\text{Predicted probability (AbleScore)} = \frac{1}{1 + e^{-\alpha}}, \quad \text{where } \alpha = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n$$
In this formula, $x_1$ to $x_n$ are the values of the independent predictors included in the final prognostic model, $\beta_1$ to $\beta_n$ are their corresponding regression coefficients and $\beta_0$ is a constant provided in Table 3. For binary predictors in the model, $x_i = 1$ if the variable is present and 0 if it is absent.
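The formula can be sketched as a small function. The coefficients below are hypothetical placeholders chosen only to make the example run; they are not the Table 3 estimates, and the function name is illustrative rather than part of the AbleScore application.

```python
import math


def predicted_probability(intercept, coefficients, values):
    """Logistic prediction 1 / (1 + exp(-alpha)),
    where alpha = intercept + sum(coefficient_i * value_i)."""
    if len(coefficients) != len(values):
        raise ValueError("coefficients and values must have the same length")
    alpha = intercept + sum(b * x for b, x in zip(coefficients, values))
    return 1.0 / (1.0 + math.exp(-alpha))


# Hypothetical coefficients (NOT the Table 3 estimates).
intercept = -0.4
coefficients = [0.7, -0.7, -0.01]  # binary service, binary benefit, age in years
values = [1, 0, 35]                # service received, no benefit, 35 years old

score = predicted_probability(intercept, coefficients, values)
print(f"Predicted probability of employment: {score:.2f}")
```

Binary predictors enter as 0 or 1, so a service that was not received contributes nothing to α, while continuous predictors such as age contribute in proportion to their value.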
We conducted product design workshops to specify key design elements and online user interface criteria. We then incorporated the design criteria to build the prognostic tool. Finally, as reported in the Use Case Evaluation section, the product was evaluated in practice by expert counselors with many years in vocational rehabilitation. The evaluators (counselors) found the tool useful in augmenting their judgment and an effective device to motivate and engage clients in the rehabilitation process, thus indicating practical validity and potential counseling usefulness. The tool has two versions—a mobile device application and a browser version for use on non-mobile devices. A screenshot of AbleScore is shown in Fig. 3.

AbleScore Display: www.ablescore.com.
The Use Case Evaluation (UCE) method was employed as a valuable means for usability (usefulness and ease of use) evaluation. The UCE method was deployed in a facilitated full-day workshop and involved several key steps: (1) purposive selection of 10 evaluators – best-in-class counselors who were selected by the agency’s executive leadership team as subject matter experts; (2) development of 5 randomized and anonymized use cases derived from recently closed cases not included in the application development and validation samples; (3) explanation of use cases – evaluators received as much information as possible about each case; (4) description of the purpose and demonstration of the functionality and features of the application; (5) counselor self-scoring of the use cases to establish a baseline for evaluation – before evaluating the tool, each counselor, based on experience, predicted the probability of employment for each use case; (6) participant evaluation of the application’s features, functionality and ease of use. The facilitator encouraged the counselors to continuously think out loud while using the application – that is, simply verbalizing their thoughts, concerns and benefits as they move through the application; (7) application-scoring of the use cases; (8) comparative evaluation – the baseline expert self-scores and the application scores were compared to the actual outcome of each use case; and (9) finally, documentation of results and participant feedback – accuracy, usefulness, ease of use, and opportunities for improvement.
The results and feedback comments revealed key benefits and concerns of the application: The application was judged as novel and useful as a supplement to counselor’s expertise in new IPE design and in modifying existing ones. For example, the expert counselors commented (in writing): “I see how this can help to streamline the IPE and make it more effective...but to me, the real value is using the tool to engage my clients in building their IPE and motivate them to be accountable for their own rehabilitation...the motivation will come from knowing the potential outcome of the VR program before receiving services...this tool has potential to be a game changer.” “The fact that it correctly predicted employment outcome in 4 out of 5 cases is truly impressive...it did a better job than us [evaluators] with only 2 of 5 cases predicted correctly.” “The tool focuses action on evidence and away from rule of thumb, folklore and traditional VR approaches that fail to help clients find the jobs they need to live independent lives.” The app has potential to “significantly reduce rehabilitation cost-per-client-served by increasing IPE quality, client ownership, service delivery quality and reducing the time it takes to rehabilitate clients.” The app will “significantly reduce dropout cost (dropout cost is currently $3 million per year) by 35% – $1.05 million in potential savings which can be redeployed to serve more [clients] with disabilities.” All the evaluators commented positively on the ease of use of the application – the app is driven by information that is readily available in the counselor’s office and the click-and-select functionality and portability as a mobile app were highly valued. 
For example: One counselor said: “I find the tool very easy to use… there isn’t a lot of data entry and the data I need are easily available in my case files.” Another counselor commented: “What I like the most is that it generates a single number which one can easily interpret as a “yes” or “no” employment at closure.” However, the evaluators were concerned that the app could be used by inexperienced counselors as the sole tool for selecting VR services. Some of the counselors’ comments include: “I’m concerned that in the hands of inexperienced counselors, the tool may become a panacea...and that may have some negative unintended consequences.” “I hope that the tool will not be used to discriminate clients based on who can and cannot get a job.” “I’m concerned that the tool may replace counselors in the future and I’m concerned about their own careers.”
In response to the above concerns, other counselors pointed out:
“The tool is not a discriminatory tool. [Counselors] are accountable to our clients to provide the best level of service possible ... In no way does this remove the need for counselors.” “The app should not be used as the sole source of information to determine a clients’ employability.” “This is just one more tool in the counselors’ toolkit and never intended to replace clinical experience and judgement.” “The key to avoiding these issues is training… counselors will be trained on the objective of the app as a support tool, how it was developed, how to use it to select services and how to interpret the result correctly.”
These issues were further discussed by the product development team and used to improve the application and guide the development of a training document.
Discussion
In this paper, we responded to the growing call (Leahy et al., 2014) for innovative approaches to supplement existing models to make research more relevant to problems faced by vocational rehabilitation practitioners and their clients (persons with disabilities). We proposed a novel approach that addressed the weaknesses we identified in the KT framework, including the dichotomies between knowing and doing, knower and doer and knowledge and practice tool. We note that unlike KT, where the knowledge to be translated is often irrelevant to the real-world problems faced by counselors and clients, the new approach starts with a situated problem of practice and ends with the construction of an evidence-informed product for practice (PfP) to solve the problem of practice. We achieved this goal in the case study by (1) intentionally aligning the interests and goals of the counselors, clients and practitioner-scholars, (2) balancing practice relevance and research rigor, and (3) applying rigorous research methods to derive the evidence used in productizing and producing the PfP. Importantly, the new research approach reverses the current order of research-to-practice to a new order of practice-to-research, where a problem of practice (PoP) is the motivation for research, and research evidence is both a product of and a solution to a problem of practice.
We applied the new approach, which we call Practice Motivated Research (PMR), in a case study in a state vocational rehabilitation agency to demonstrate how practice-informed research can meet the requirements for academic rigor and practice relevance, revealing the recursive interdependence between practitioners and researchers in the production of a clinically useful prognostic tool we call AbleScore. The practice output generated by using AbleScore is intuitive, empirically derived, practically meaningful, and easily utilized by counselors in their context-specific practice setting. To use the prognostic app, counselors need only collect demographic data for the client and potential VR services and then enter these data into the app on their mobile device or laptop. The computations are automatic, and the outputs are easy to interpret. Used as intended, AbleScore has the potential to augment counseling decisions, improve the effectiveness of IPEs and increase the chances of successful employment outcomes for persons with disabilities. To our knowledge, this is the first study that attempts to develop a prognostic tool in a state vocational rehabilitation agency. As such, the paper should foster additional discussion.
Results from this study support and further inform existing literature regarding the association between employment outcomes among people with disabilities and their demographic characteristics, vocational rehabilitation services and service delivery attributes. Accordingly, most of the factors used in our prognostic tool have been previously identified by other studies as prognostic factors associated with 90-day employment outcome. We comprehensively examined the effects of 37 predictors on employment across all disability classes in a single state VR agency, where evidence can readily be interpreted and easily applied in practice. The evidence used in building the app demonstrated good discrimination, calibration, accuracy and goodness of fit.
The prognostic app has several distinctive strengths compared with prior predictive research in VR. First, it consists of clearly defined, routinely available predictors and does not require any additional data not routinely available in the design of an IPE. Second, the accuracy and applicability of the tool are supported by its derivation and validation using historic data from 53,629 independent clients from the same state population. Third, our study samples are comprehensive and represent all disability groups and several demographic characteristics and service delivery attributes never before included in other studies. Fourth, because we used data from a single state agency, we mitigated the aggregation problem associated with using multi-state, aggregated data from the RSA, which presents a major problem of interpretation at the individual level in state agencies. Finally, AbleScore is a practice-ready tool, not just another intellectual contribution to knowledge that needs translation before it can be applied in counseling practice. AbleScore can be freely accessed at http://ablescore.com/.
Implications
We found that several predictors differed in their strength of association with employment outcome according to vocational rehabilitation services received, suggesting the importance of optimal IPE design. We have developed a methodologically valid, simple, and accurate model that has been shown4
See Application Use Case Evaluation sub-section of the Product Development section.
Perhaps of particular note are the strongest positive and negative predictors of successful rehabilitation: Criminal Record, Receipt of Disability Income, Co-occurrence of Alcohol and Drug abuse, Most Severe Disability ... Vehicle Modification, On-The-Job Training, Graduate University Training, Job Supports, Maintenance, Job Placement, Work Experience at Entry, Rehabilitation Technology and Occupational Vocational Training.
The state-federal vocational rehabilitation program emphasizes working with individuals with the most severe disabilities, i.e., those disabilities that significantly limit one or more life functions (Rehabilitation Services Administration, 1995). However, we provide evidence for careful consideration of other potentially co-occurring client factors, such as criminal record, substance abuse and receipt of disability income. VR practitioners may benefit from emphasizing the positive predictors, and at least being aware of the negative predictors, in order to better serve their clients. Again, careful judgement must be used; predictor variables in logistic models predict probabilities, not causation. Thus, variables with positive or negative effects in our model should not be interpreted as causal, and should therefore be applied with caution and with an open mind to other, possibly unmeasured variables.
As an illustration of direct application, below we apply the AbleScore app to three different client scenarios: difficult, average, and favorable. We then show how to apply some of our research findings to help guide these cases.
In these three scenarios, if we add on-the-job training to the client’s IPE, the AbleScores jump significantly to 45%, 65%, and 94% probability of employment at closure. What this means is that others (within our population of 53,629) with IPEs like these scenarios, but with the addition of just on-the-job training, have had much higher success rates of gaining employment. These illustrative examples show us that, for those in less than favorable scenarios, including employment-enabling training like on-the-job training in the client’s IPE has the potential to make a notable difference in their rehabilitation outcome.
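The size of such a jump follows from how a logistic model combines predictors: adding a binary service multiplies the client's odds of employment by that service's odds ratio. The sketch below applies the on-the-job training odds ratio reported in the Results (OR = 3.69) to three hypothetical baseline probabilities. These baselines are illustrative assumptions and are not the three scenarios described above.

```python
def probability_after_odds_ratio(baseline_probability, odds_ratio):
    """Probability obtained after multiplying the baseline odds by an odds ratio."""
    if not 0 < baseline_probability < 1:
        raise ValueError("baseline_probability must be strictly between 0 and 1")
    baseline_odds = baseline_probability / (1 - baseline_probability)
    new_odds = baseline_odds * odds_ratio
    return new_odds / (1 + new_odds)


ON_THE_JOB_TRAINING_OR = 3.69  # odds ratio reported in the Results section

# Hypothetical baselines for a difficult, an average and a favorable case.
for baseline in (0.20, 0.50, 0.80):
    updated = probability_after_odds_ratio(baseline, ON_THE_JOB_TRAINING_OR)
    print(f"baseline {baseline:.2f} -> with on-the-job training {updated:.2f}")
```

The same odds ratio produces a larger absolute gain for clients who start near the middle of the probability range than for those who already have a high chance of employment.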
Clearly, this tool should not be over-relied upon or used as the sole determinant of guidance decisions and counseling. This application should be only one tool among many to help the counselor make the best decisions for her clients. Assuming that the predictions of the application will be accurate for any specific case is naïve at best and potentially harmful at worst. The value of the application is to indicate the potential difficulty of a particular case, as well as to highlight the best possible uses of resources to lead to positive employment outcomes.
Because our model was developed with data from a single state agency, and validated only with clients from the same population, further prospective validation in independent client cohorts is needed to strengthen the generalizability of the model. To make the findings more generalizable, a sample of clients and counselors from multiple states (ideally all states) and different offices should be used. Additionally, the predictive relevance of the model is limited to the current sample, and should not be generalized to future samples in other states. Nevertheless, a comprehensive dataset such as the one we used for this study, which included all clients for the past 20 years from an entire state, does provide strong evidence of patterns and trends at least for that state.
Future research could also evaluate different ways, or formats, for presenting the model to counselors; their use in counseling practice; and whether ultimately, they have any impact on the management and outcomes of clients in other states. With prediction in mind, agencies may be inclined to collect more data on clients in the future. A more predictive model could be created if more variables were collected. Particularly missing from the current set of variables are local labor market factors and social factors, such as social support from friends and family, regular communication with the VR agent, etc. For our study, these variables, and many others, were not available because they had not been recorded by the state agencies who archived this information.
Limitations
Several limitations apply to our study. Foremost, the analysis with a single state VR agency data represents essentially a case study (albeit with a very large sample size). Although the structure of the dataset may be representative of other state agency datasets, the prognostic tool may not apply in a different population where the client characteristics and service protocols are different from those of the development environment. Therefore, this model should not be implemented outside of this state before its validity has been tested in the local client population.
Next, heterogeneity of client populations between practice locations, between counties, or between rural and urban locations was not taken into account in our analysis. We also did not consider differences between counselors or between offices, although these may be an important consideration in situated counseling practice. This omission may overlook critical contextual variables that could alter the results we found with the set of variables available to us. Nevertheless, extant research indicates there may be no substantial differences between these contextual levels in VR practice (Hayward & Schmidt-Davis, 2005).
Also, we note that the state agency data we used grouped people with disabilities broadly into the three major disability groups established by the RSA (Reporting Manual for the Case Service Report (RSA-911)) for analysis and reporting purposes: sensory/communicative (e.g., visual impairment/blindness and hearing impairment/deafness), physical (e.g., arthritis, spinal cord injury), and mental impairments (e.g., depression, schizophrenia, and learning disabilities). There may be potential loss of information due to the aggregation.
Further, while we used logistic regression, which is common in healthcare prediction studies, other analytic approaches, such as those leveraging machine learning, may provide better predictions but be more difficult to interpret. We encourage comparative studies of other methods that look promising from our analysis. Although we eliminated non-significant factors and performed internal validation and sensitivity analysis to avoid overfitting, testing the performance of the model in an independent population sample will be necessary in the future. We encourage future studies to do so. Of note, using a prognostic model generated with historical data to predict future events can be problematic. Though the model may fit historical data well, future circumstances may not be sufficiently similar to allow for reliable predictions from the model. Even when the internal validation bootstrap resampling technique is applied to correct for overfitting and optimism, the accuracy of our prognostic tool can be substantially lower (or higher) in new clients compared to the accuracy found in the clients in the development population. The differences may arise due to client mix. For example, the development sample may have more or fewer clients of a particular age group or with a particular disability and severity group than future clients. Vergouwe et al. (2002) described the effect as “case mix” differences. Case mix disparity can affect both discrimination and calibration of the predictive model. As such, the calculator must be recalibrated regularly with new data as the client population changes in the future.
We also note that the net-benefits analysis is evaluated at the aggregate level; the usefulness of the application in individual cases may therefore vary widely. The value of a model such as this one, which is trained on the individual circumstances of 53,629 client cases, is that it shows the trends and patterns that we can expect to be useful and informative for the majority of cases. This does not mean that every aspect of the model will be useful for every counseling case, nor is that its intent. The intent of the application is to provide guidance. For example, if a counselor enters her client’s information into the application and the results show that cases like this client’s have resulted in employment only 53% of the time, she may need to devote more guidance and energy to this case than to one that more closely resembles cases resulting in employment 94% of the time. Additionally, this application is only one tool among many that the counselor will use to help her make good guidance decisions for any particular case.
Finally, our prognostic tool, like many in medicine (e.g., Euroscore (http://www.euroscore.org/calc.html) and STS Score (http://riskcalc.sts.org/stswebriskcalc/#/calculate), which are well-known and highly regarded calculators for cardiology and thoracic surgery), is nomothetically derived but idiographically applied by practitioners who understand the particular world and characteristics of the individual. In the present research, we started with cohorts of individuals from a large and defined population and used logistic regression to identify the presence or absence of an employment outcome given a set of predictors in the data, which is a nomothetic perspective. Counselors will then apply the tool idiographically to validate the nomothetic principles at the individual level. However, nomothetic models are probabilistic and usually incomplete when applied to an individual, and must therefore be validated in the field before general application.
Despite these limitations, the application promises to inform clients and counselors of the probability of successful employment outcomes for a group of clients with similar profiles and proposed IPEs. This information is useful and should form part of the basis for informed choice through which the client and counselor can decide on an appropriate rehabilitation path.
Conflict of interest
The authors declare that they have no conflict of interest.
Acknowledgments
This study was part of a consulting engagement involving several counselors, clients, providers, and agency leaders. We thank all 46 individuals who participated in interviews, workshops, and focus groups to provide information, validate models and findings, and test the prognostic tool. The involvement of the agency’s executive leadership team (ELT) in providing direction, sponsorship, project resources, and role modeling behavior change was invaluable. Without the ELT this project would not have been possible.
