Test Review: Barkley Deficits in Executive Functioning Scale (BDEFS)

Test Description

General Description

The Barkley Deficits in Executive Functioning Scale (BDEFS), authored by Russell A. Barkley and published by Guilford in 2011, is an individually administered assessment tool that may be used to evaluate adults ages 18 to 81. The purpose of this measure is to screen for executive functioning (EF) deficits in self-organization, self-restraint, self-motivation, self-regulation of emotion, and self-management to time. Although it is not meant to be a diagnostic instrument, the BDEFS can be administered in a clinical, research, or industrial-organizational setting as a time- and cost-efficient means of identifying those with potential difficulties. The BDEFS is both theoretically and empirically based, and its conceptualization is well grounded in the literature on executive functioning.

The BDEFS consists of three versions that may be used to inform clinical judgment of EF deficits. The self-report version, completed by the patient, asks how frequently the patient has exhibited certain behaviors over the past 6 months. The other-report version contains the same items as the self-report version and should be completed by someone who knows the patient well enough to accurately comment on their daily functioning, such as a spouse or significant other. Although completion of the other-report form is not required, it may be helpful in assessing the validity of the patient’s answers. Items on the self- and other-report versions are rated on a 4-point Likert-type scale: Never or rarely (1), Sometimes (2), Often (3), and Very often (4). If patients cannot complete the rating scale on their own, a clinical interview may be conducted instead. The interview version asks whether certain behaviors have occurred “Often or very often” in the past 6 months and requires a “yes” or “no” answer.

The self- and other-report versions have both long and short forms, whereas the clinical interview follows the short form. The long form is the most comprehensive and contains 89 items, taking the average adult approximately 15 to 20 min to complete. The short form contains 20 items that require just 4 to 5 min to complete. The BDEFS should be administered by a trained professional with a background in psychology, whereas scoring and interpretation require graduate-level training in psychological measurement.

Individual items on the BDEFS are easy to read and comprehend. A rating scale format allows examinees to rate the frequency of specific behaviors over time, as opposed to the momentary assessment often used in cognitive tests that evaluate aspects of EF (such as attention or inhibition). This unique capacity is particularly important because clinicians evaluate patients’ level of daily impairment when contemplating diagnosis and treatment. In addition, the BDEFS may be photocopied for cost-efficient administration. Instructions are standardized for all versions to ensure accurate administration and reduce examiner error or bias.

Specific Description

Subscales

All forms of the BDEFS contain five subscales that may be used to assess EF deficits in specific areas.

Self-management to time: Items pertaining to this subscale relate to procrastination, concentration, forgetfulness, planning, and time management capabilities.

Self-organization/problem solving: This subscale addresses difficulty with order and sequencing, information processing accuracy and speed, learning, and problem-solving abilities.

Self-restraint: These items evaluate aspects of impulsivity, foresight and hindsight, frustration tolerance, and the ability to inhibit responses prior to considering their consequences.

Self-motivation: This area examines one’s ability to work toward long-term goals or rewards, put forth consistent effort, work without supervision, and exercise willpower.

Self-regulation of emotion: This subscale relates to emotional control, self-soothing, tendency toward emotional excitement or overreaction, and the ability to perceive events objectively.

Scoring System

Long form: To score the long form, one must tally the scores in each subscale and add them together to find the total score. In addition, items answered with a 3 or 4 (Often or Very often) may be tallied to obtain an Executive Functioning Symptom Count. Using the appropriate scoring sheet (determined by age and gender), percentile ranks may be computed from the raw scores. Norms and percentile ranks are only available for the self-report version. An ADHD-EF Index may also be computed by aggregating the scores on 11 items that are typically related to adult ADHD; however, this is intended only to determine one’s risk status for the disorder and not for diagnosis.

Short form: Scoring procedures for this form are similar to those of the long form, with the exception of the norms that are available. The short form self-report version contains norms for age only, whereas the other-report version does not contain any norms.

Clinical interview form: An EF Symptom Count may be calculated by counting the number of “Yes” responses circled on this form. Although it has no unique norms, the clinical interview form is highly correlated with the short form of the self-report version (.90+); therefore, an estimate of one’s ranking when compared to others of the same age may be obtained.
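The tallying steps described above can be sketched in a few lines of Python. This is a minimal illustration only, not the publisher's scoring procedure: the function names, the dictionary layout, and the sample ratings are assumptions made for the example, and no item-to-subscale assignments, ADHD-EF Index items, or norm tables from the manual are reproduced. Converting raw scores to percentile ranks still requires the appropriate scoring sheet.

```python
from typing import Dict, List


def score_rating_form(responses: Dict[str, List[int]]) -> dict:
    """Tally a self- or other-report form.

    `responses` maps a subscale name to that subscale's item ratings (1-4).
    The mapping of items to subscales is supplied by the caller; it is not
    taken from the BDEFS manual.
    """
    for name, ratings in responses.items():
        for rating in ratings:
            if rating not in (1, 2, 3, 4):
                raise ValueError(f"{name}: rating {rating!r} is outside 1-4")

    # Raw subscale scores are simple sums of the item ratings.
    subscales = {name: sum(ratings) for name, ratings in responses.items()}
    # The total score is the sum of the subscale scores.
    total = sum(subscales.values())
    # EF Symptom Count: number of items rated Often (3) or Very often (4).
    symptom_count = sum(
        1 for ratings in responses.values() for rating in ratings if rating >= 3
    )
    return {"subscales": subscales, "total": total, "ef_symptom_count": symptom_count}


def interview_symptom_count(answers: List[bool]) -> int:
    """EF Symptom Count for the interview form: the number of "Yes" answers."""
    return sum(1 for answer in answers if answer)


# Made-up ratings for a 20-item form (4 items per subscale).
demo = {
    "Self-management to time": [3, 4, 2, 1],
    "Self-organization/problem solving": [1, 1, 2, 2],
    "Self-restraint": [2, 3, 1, 1],
    "Self-motivation": [4, 4, 3, 2],
    "Self-regulation of emotion": [1, 2, 2, 1],
}
result = score_rating_form(demo)
print(result["subscales"], result["total"], result["ef_symptom_count"])
```

Keeping the subscale sums separate from the total mirrors the way the form is interpreted, since subscale and total scores are each compared against norms.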

Interpretation

Several methods may be used to interpret the BDEFS. First, each subscale may be interpreted alone, with higher percentile rankings within a subscale representing greater EF deficits in that particular area. Higher percentiles generated from the total score suggest that the patient may suffer from general EF deficits. Second, subscale and total scores may be compared to the normative population. Scores in the highest quartile may be assigned descriptive labels, ranging from “marginal clinical significance” (76th to 84th percentile) to “markedly deficient or severe” (99th percentile or greater). Third, the self- and other-report versions may be compared and examined for disparities between them. If a disparity is considered to be clinically significant (as determined by examining Barkley’s Reliable Difference Index in the manual), potential reasons for it should be explored. Finally, a risk analysis may be conducted, in which higher scores are taken to represent greater risk for EF difficulties.

Technical Adequacy

Test Construction and Item Analysis

Item development for the prototype BDEFS (P-BDEFS) was largely informed by EF deficits noted in the literature, observations, and clinical descriptions of patients with ADHD or prefrontal cortex damage. The P-BDEFS contained 91 items and consisted of three versions (self-report, other-report, clinical interview). Items for the self- and other-report versions were formatted similarly to the current BDEFS. Following a principal-components factor analysis of the P-BDEFS, a modified 100-item self-report version was administered to the normative sample. A factor analysis of these results led to the solidification of five subscales of at least 12 items each, all of which had loadings of at least 0.40 and accounted for at least 2.5% of the variance. Eleven items were eliminated for not meeting these criteria, leaving 89 items, arranged by subscale, in the final version of the self-report long form (BDEFS-LF). The short form of the self-report version (BDEFS-SF) was then constructed from the four highest loading items on each subscale, for a total of 20 items.
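The selection rule described above (retain items loading at least 0.40, then take the four highest loading items per subscale) can be sketched as follows. This is an illustrative sketch under stated assumptions: the function name, the placeholder subscale and item labels, and the loadings are invented for the example and are not values from the BDEFS manual.

```python
from typing import Dict, List


def pick_short_form_items(
    loadings: Dict[str, Dict[str, float]],
    per_subscale: int = 4,
    min_loading: float = 0.40,
) -> Dict[str, List[str]]:
    """Return the highest loading items for each subscale.

    `loadings` maps subscale -> {item label: factor loading}.
    Items below `min_loading` are dropped before ranking.
    """
    selected = {}
    for subscale, item_loadings in loadings.items():
        retained = {
            item: value for item, value in item_loadings.items() if value >= min_loading
        }
        # Rank retained items from highest to lowest loading.
        ranked = sorted(retained, key=retained.get, reverse=True)
        selected[subscale] = ranked[:per_subscale]
    return selected


# Hypothetical loadings for two placeholder subscales.
demo_loadings = {
    "Subscale A": {"a1": 0.72, "a2": 0.41, "a3": 0.66, "a4": 0.58, "a5": 0.80, "a6": 0.35},
    "Subscale B": {"b1": 0.45, "b2": 0.61, "b3": 0.52, "b4": 0.77, "b5": 0.49},
}
short_form = pick_short_form_items(demo_loadings)
print(short_form)
```

A rule like this ranks items on loading alone; it takes no account of how frequently each item is endorsed.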

Normative Sample

The normative sample consisted of 1,249 adults, aged 18 through 96 years. To ensure that the sample was nationally representative, demographic characteristics such as sex, educational attainment, regional distribution, racial and ethnic group, household income, marital status, and employment status were comparable to 2000 U.S. Census data.

Reliability

Barkley reported internal consistency and test–retest reliability measures for the BDEFS self-report forms. Interrater reliability between the self- and other-report forms was also reported, but the analysis was based on the P-BDEFS rather than the final BDEFS. Given the differences between the two instruments, the adequacy of interrater reliability could not be evaluated.

Internal consistency

Internal consistency was evaluated using Cronbach’s alpha and Pearson’s product–moment correlation (r). Cronbach’s alpha was high, ranging from .91 to .96 for all subscales on the self-report BDEFS-LF. For the BDEFS-SF, alpha was reported as .92. Internal consistency for the other-report version was not reported in the manual; readers are directed to previous studies of the P-BDEFS. Pearson’s r correlations across subscales ranged from .55 to .80, and correlations between versions were also found to be high.
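For readers less familiar with the statistic, Cronbach’s alpha for k items is k/(k − 1) multiplied by one minus the ratio of the summed item variances to the variance of the total scores. The sketch below computes it from a respondents-by-items table of ratings. The function name and the sample data are assumptions for illustration; this is not a reanalysis of the normative data.

```python
from statistics import variance
from typing import List


def cronbach_alpha(rows: List[List[float]]) -> float:
    """Cronbach's alpha for a table with one row per respondent
    and one column per item."""
    n_items = len(rows[0])
    if n_items < 2 or len(rows) < 2:
        raise ValueError("need at least two items and two respondents")
    if any(len(row) != n_items for row in rows):
        raise ValueError("every respondent must answer every item")

    # Sample variance of each item across respondents.
    item_variances = [variance([row[i] for row in rows]) for i in range(n_items)]
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in rows])
    if total_variance == 0:
        raise ValueError("total scores do not vary; alpha is undefined")

    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)


# Made-up ratings: four respondents, three items on a 1-4 scale.
ratings = [
    [1, 2, 1],
    [2, 2, 3],
    [3, 4, 3],
    [4, 4, 4],
]
alpha = cronbach_alpha(ratings)
print(round(alpha, 3))
```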

Test–retest reliability

Test–retest reliability was measured by randomly selecting 62 participants from the normative sample to complete the self-report form a second time, 2 to 3 weeks later. Scores on the follow-up test were comparable to the original scores, with satisfactory correlations between the two administrations (r = .62 to .80, p < .001, across all indices).
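A test–retest coefficient of this kind is a Pearson correlation between each participant’s first and second scores. A minimal sketch follows; the function name and the paired scores are invented for illustration and do not come from the retest sample.

```python
from math import sqrt
from typing import Sequence


def pearson_r(first: Sequence[float], second: Sequence[float]) -> float:
    """Pearson product-moment correlation between paired scores."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equally long sequences of at least two scores")
    n = len(first)
    mean_first = sum(first) / n
    mean_second = sum(second) / n
    # Sum of cross-products of deviations from each mean.
    cross = sum((a - mean_first) * (b - mean_second) for a, b in zip(first, second))
    ss_first = sum((a - mean_first) ** 2 for a in first)
    ss_second = sum((b - mean_second) ** 2 for b in second)
    if ss_first == 0 or ss_second == 0:
        raise ValueError("scores must vary in both administrations")
    return cross / sqrt(ss_first * ss_second)


# Hypothetical total scores for five participants at time 1 and time 2.
time_1 = [120, 95, 143, 110, 160]
time_2 = [118, 101, 137, 115, 152]
retest_r = pearson_r(time_1, time_2)
print(round(retest_r, 2))
```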

Validity

Barkley explored the construct and criterion validity of the P-BDEFS and BDEFS but did not report on content validity. Overall, analyses indicated satisfactory validity, though replication with the BDEFS is necessary.

Construct validity

Construct validity of the BDEFS was based on Barkley’s unique definition of EF and established using the P-BDEFS. As such, there are challenges in determining evidence for convergent validity. Regardless, subscales of the P-BDEFS were correlated with various EF tests, such as the Conners CPT and the Stroop Color–Word Test. The correlations ranged from .04 to .41 and –.01 to –.31, with the direction of all correlations being theoretically appropriate. Although some correlations were statistically significant, they were in the weak to moderate range. To establish discriminant validity, Barkley examined the ability of the P-BDEFS to differentiate between subjects with and without ADHD. Analyses found that those with ADHD were more likely to score in the clinically significant range than those in the control group. Barkley also examined the overlap of the P-BDEFS with IQ and academic achievement. Only one subscale, Self-Organization/Problem-Solving, was significantly related to IQ (r = –.15, p = .007). Weak but statistically significant correlations were found with academic achievement. Thus, Barkley concluded that the P-BDEFS is not mistakenly measuring alternate constructs.

Criterion validity

Correlations from the normative sample and pilot studies were used to examine how scores on the BDEFS and P-BDEFS relate to concurrent outcomes, such as ADHD severity, arrests, and psychopathology. For example, BDEFS scores were correlated with ADHD severity as measured by an adult ADHD rating scale, and the correlations were statistically significant at p < .001. The results for these and the additional outcomes measured indicate that scores on the P-BDEFS and BDEFS are related to many concurrent outcomes.

Commentary and Recommendations

It is important to note that the manual lacks detail on test construction, specifically on the development of the P-BDEFS items. Further information on their construction would provide greater assurance that the items cover the full spectrum of EF. However, factor analysis results offer sufficient support for the correspondence of items to the subscales. In addition, the Likert-type scale used in most versions, with response options from (1) Never or rarely to (4) Very often, may misrepresent patients’ exact deficits. These response options do not allow respondents to indicate the complete absence or the continuous presence of a symptom. Nevertheless, the scores provided are likely sufficient for a general screening of daily EF deficits.

By strictly using the four highest loading items on each subscale to construct the short form, Barkley diminished the potential for item discrimination and score validity. An analysis of both the highest and lowest loading items would have revealed that three items on the self-report short form have the highest frequencies within their subscale, increasing the probability of score inflation. However, six of the items have the lowest reported frequencies within their subscale, which can help to mitigate issues with item discrimination.

The absence of norms for the other-report and clinical interview versions of the BDEFS limits their practical use as stand-alone measures. It is arguable that the other-report form should always be administered with the self-report form and that inclusion of other-report norms is therefore unnecessary. However, the clinical interview is intended for cases in which the self-report version cannot be administered, and in those cases there is no direct means of making normative comparisons.

Assessment of the interrater reliability of the BDEFS was based solely on the P-BDEFS, as was the internal consistency of the other-report form. Furthermore, the P-BDEFS was used to establish construct validity and to analyze particular populations for criterion validity. As the items included in the P-BDEFS do not entirely match the items within the current BDEFS, a reexamination of the reliability and validity measures is necessary to confirm results and make definitive conclusions.

A detailed evaluation of the BDEFS found the assessment to offer a unique and comprehensive means of identifying EF deficits in adults. The BDEFS is the only instrument to evaluate the type and extent of EF deficits present in daily activities over an extended period of time, as other measures of EF merely provide a momentary assessment of EF deficits. The availability of multiple versions and various lengths allows for effective and efficient administration across settings and from multiple perspectives. Barkley’s efforts to design a simple scoring process may also prove beneficial to examiners. Despite the aforementioned limitations, the BDEFS may be a valuable tool for both examiners and researchers who wish to evaluate the presence of EF deficits in adults.