Test Description
The Behavior Rating Inventory of Executive Function, Second Edition (BRIEF2; Gioia, Isquith, Guy, & Kenworthy, 2015), published by PAR, Inc., is an updated, individually administered rating scale of executive function (EF) for children and youth aged 5 to 18 years. Primarily used in clinical, psychoeducational, and research settings, the BRIEF2 evaluates everyday behaviors associated with EF in home and educational environments.
The BRIEF2 is a Level S measure, meaning that it can be purchased, administered, and interpreted by individuals with a degree or license to practice in medicine, nursing, psychology, social work, occupational therapy, or other allied health professions (and also can be used by individuals who meet Level B or C qualifications; details of these qualifications are available from the test publisher’s website). Administration time is 5 min for the screening form or less than 10 min per comprehensive form; scoring takes approximately 15 min.
The BRIEF2 kit consists of the Professional Manual; Fast Guide (which provides targeted information about the measure’s composition, administration, and scoring); 25 of each of the Parent, Teacher, and Self-Report forms; and 25 of each of the Parent, Teacher, and Self-Report Scoring Summary/Profile forms. The manual is comprehensive and well structured, beginning with an overview of the measure and its history, and describes administration, scoring, and interpretation of the measure. Revision goals are outlined and evidence of the measure’s psychometric properties is provided.
Measure Development
The BRIEF2 was developed to enhance the utility of the original BRIEF with minimal modification. No items were added to the primary scales or indexes; however, some items were modified and others were added to create a validity index called the Infrequency Scale. The remaining items were evaluated to determine content coverage, ease of understanding, consistency across forms, and representativeness of the theoretical and empirical models of EF.
Administration and Scoring
The BRIEF2 consists of Parent, Teacher, and Self-Report forms that can be administered either manually via paper forms or digitally via PARiConnect. The Parent and Teacher forms each consist of 63 items that contribute to nine scales: Inhibit, Self-Monitor, Shift, Emotional Control, Initiate, Working Memory, Plan/Organize, Task-Monitor, and Organization of Materials. The Self-Report form can be completed by youth aged 11 to 18 and consists of 55 items that contribute to seven scales: Inhibit, Self-Monitor, Shift, Emotional Control, Task Completion, Working Memory, and Plan/Organize. There are also three validity scales (Inconsistency, Negativity, and Infrequency), composed of items on each version of the measure, that evaluate respondents’ response patterns. The primary scales combine to form three composite indexes, the Behavior Regulation Index (BRI; composed of the Inhibit and Self-Monitor scales), Emotion Regulation Index (ERI; composed of the Shift and Emotional Control scales), and Cognitive Regulation Index (CRI; composed of the Initiate, Working Memory, Plan/Organize, Task-Monitor, and Organization of Materials scales), as well as a unitary Global Executive Composite (GEC).
The BRIEF2 is a questionnaire on which respondents record their answers in a three-point Likert-type format, N (“Never”), S (“Sometimes”), or O (“Often”), indicating how frequently the child being evaluated displays a given behavior. The measure is scored either by tearing the perforated strips on the paper forms and completing the scoring sheet within the booklet or via PARiConnect, which scores a digital administration and provides a summary of the results. Results are presented in T-score format (M = 50, SD = 10). Confidence intervals are also provided. The measure also provides a screening form for each respondent type to identify individuals who may benefit from more comprehensive assessment.
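The T-score metric mentioned above (M = 50, SD = 10) can be illustrated with a minimal sketch of the generic linear T-score formula. This is not the publisher's scoring procedure, which relies on norm tables in the manual or on PARiConnect; the function name and the normative mean and SD below are hypothetical values chosen only for illustration.

```python
def t_score(raw: float, norm_mean: float, norm_sd: float) -> float:
    """Rescale a raw score to the T metric (mean 50, SD 10).

    Generic linear transformation; the BRIEF2 itself uses published
    norm tables, so treat this only as an illustration of the metric.
    """
    if norm_sd <= 0:
        raise ValueError("norm_sd must be positive")
    z = (raw - norm_mean) / norm_sd  # distance from the norm mean in SD units
    return 50.0 + 10.0 * z


# Hypothetical normative values (not taken from the BRIEF2 manual).
example = t_score(raw=30.0, norm_mean=24.0, norm_sd=4.0)
print(example)  # 65.0, i.e. 1.5 SD above the normative mean
```

A raw score equal to the normative mean maps to a T-score of 50, and each additional standard deviation adds 10 points.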
Technical Adequacy
Test Standardization
The BRIEF2 standardization sample consisted of 3,603 typically developing (TD) children and youth, of whom 1,400 contributed to the Parent form, 1,400 to the Teacher form, and 803 to the Self-Report form (aged 11 to 18 only). The sample was matched to the U.S. population on gender, race/ethnicity, age, parent education, and geographical region. Members of the TD sample were required to have no history of special education, psychotropic medication, or neurological disorders. In addition, teachers were required to have known the student through daily contact for at least 1 month.
A separate clinical sample comprised 5,295 participants (3,007 parents, 1,826 teachers, and 462 self-reporters). Clinical groups included Attention-Deficit/Hyperactivity Disorder Combined Presentation (ADHD-C), ADHD Predominantly Inattentive Presentation (ADHD-I), sluggish cognitive tempo, Autism Spectrum Disorder, learning disability (LD), comorbid ADHD and LD, anxiety, traumatic brain injury, epilepsy, neurofibromatosis, cancer, and diabetes. All diagnoses were based on Diagnostic and Statistical Manual of Mental Disorders (4th ed., text rev.; DSM-IV-TR; American Psychiatric Association, 2000) criteria.
Reliability
Internal consistency
Reliability coefficients assess the degree to which items within a scale measure the same construct. Coefficient alpha was calculated using data from each form’s normative sample. Findings indicate high internal consistency for all index scores in both the standardization and clinical samples. Parent forms revealed coefficients ranging from .76 to .97, with index and composite scores ranging from .90 to .97. Teacher forms revealed coefficients ranging from .88 to .98, with index and composite scores ranging from .94 to .98. Self-report forms revealed coefficients ranging from .71 to .97, with index and composite scores ranging from .84 to .97. Neither means nor standard deviations were reported.
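As an illustration of how coefficient alpha is obtained, the following is a minimal sketch of the standard formula, alpha = k/(k - 1) × (1 - sum of item variances / variance of total scores). The ratings are invented, and coding N/S/O as 1/2/3 is an assumption made for the example rather than a detail taken from the review.

```python
from statistics import variance


def cronbach_alpha(item_scores: list[list[float]]) -> float:
    """Coefficient alpha for a respondents-by-items matrix of ratings."""
    n_items = len(item_scores[0])
    if n_items < 2 or len(item_scores) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item (column) across respondents.
    item_vars = [
        variance([row[i] for row in item_scores]) for i in range(n_items)
    ]
    # Sample variance of each respondent's total (row sum).
    total_var = variance([sum(row) for row in item_scores])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return (n_items / (n_items - 1)) * (1 - sum(item_vars) / total_var)


# Invented ratings: five respondents, three items,
# with N/S/O assumed to be coded 1/2/3.
ratings = [[1, 1, 2], [2, 2, 2], [3, 2, 3], [1, 2, 1], [3, 3, 3]]
alpha = cronbach_alpha(ratings)
print(round(alpha, 2))
```

Alpha rises as items covary more strongly relative to their individual variances, which is why scales with more items, such as the indexes and composite, tend to show higher coefficients than individual scales.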
Test–retest stability
Some participants completed two administrations approximately 3 weeks apart to determine temporal stability. For parents, correlations ranged from .67 to .92 (M = 0.79), and index and composite score correlations were all above .80 (BRI = .83, ERI = .82, CRI = .89, and GEC = .88). Teachers showed correlation coefficients ranging from .76 to .89 (M = 0.82), while index and composite scores were again all above .80 (BRI = .83, ERI = .89, CRI = .89, and GEC = .90). Finally, adolescents showed correlation coefficients ranging from .61 to .85 (M = 0.74), with some index and composite scores falling below .80 (BRI = .75, ERI = .77, CRI = .84, and GEC = .85). Standard deviations were not reported.
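Test–retest stability of this kind is typically summarized with a Pearson correlation between the two administrations. The sketch below shows that calculation on invented T-scores; the review does not state which coefficient the manual used, so Pearson's r is an assumption here, and the data are hypothetical.

```python
from math import sqrt


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson product-moment correlation between two paired score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired and contain at least 2 scores")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations around each mean.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("scores must vary in both administrations")
    return sxy / sqrt(sxx * syy)


# Invented T-scores for five children rated twice, about 3 weeks apart.
time1 = [48.0, 55.0, 62.0, 51.0, 70.0]
time2 = [50.0, 53.0, 65.0, 49.0, 68.0]
r = pearson_r(time1, time2)
print(round(r, 2))
```

The same function applies to the interrater comparisons, where the two lists would hold scores from two different raters for the same children.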
Interrater and inter-interviewer reliability
Interrater reliability assesses whether two raters provided similar scores. Interrater samples consisted of numerous pairings: parent–teacher, parent–self, teacher–self, parent–parent, and teacher–teacher. The manual notes that some of the interrater reliability scores are likely to be influenced by the nature of the pairing (i.e., between a teacher and a student versus a parent and a student). Correlation coefficients for the parent–teacher group had a mean of 0.64 for the TD sample (range = .55 to .72) and 0.34 for the clinical sample (range = .24 to .49). The parent–self pairing had a mean of 0.62 for the TD sample (range = .46 to .74) and 0.30 for the clinical sample (range = .20 to .36). Teacher–self had a mean of 0.49 for the TD sample (range = .37 to .62) and 0.16 for the clinical sample (range = .09 to .27). Parent–parent had a mean of 0.77 for the TD sample (range = .57 to .88) and 0.59 for the clinical sample (range = .44 to .67). Teacher–teacher group had a mean of 0.39 for the TD sample (range = .11 to .47) and 0.56 for the clinical sample (range = .42 to .70). Standard deviations were not reported.
Validity
Internal structure and content
The BRIEF2 characterizes EF as an umbrella construct (i.e., distinct but related abilities/behaviors). Internal structure of the BRIEF2 was evaluated by examining item-total correlations, intercorrelations, and a confirmatory factor analysis. Item-total correlations revealed moderate to strong membership for each scale. Correlation coefficients ranged from .44 to .77 for parents, .50 to .83 for teachers, and .44 to .74 for self-report. Intercorrelations determined whether theoretically related constructs were more correlated than dissimilar constructs (e.g., Working Memory and Plan/Organize as opposed to Emotional Control and Organization of Materials). Results for the TD sample indicated values ranging from .41 to .83 for parents, .46 to .88 for teachers, and .52 to .84 for self-report. Intercorrelations for the clinical sample ranged from .31 to .66 for parents, .24 to .79 for teachers, and .34 to .73 for self-report. Neither means nor standard deviations were reported for item-total correlations or intercorrelations.
Previous studies have ascertained the goodness-of-fit of a three-factor model (BRI, ERI, and CRI) for the original BRIEF via exploratory and confirmatory factor analyses (Donders, DenBraber, & Vos, 2010; Gioia, Isquith, Retzlaff, & Espy, 2002). This three-factor model is reflected in the three index scores provided by both the BRIEF and BRIEF2. Gioia et al. (2002) examined how the eight individual parent/teacher scales would load onto one-, two-, three-, and four-factor models; they reported that comparative fit index (CFI), standardized root mean square residual (SRMR), and root mean square error of approximation (RMSEA) values favored the three-factor model. Furthermore, they posited that the three-factor model most closely aligned with Barkley’s (1997) description of EF, which describes inhibitory control, emotional regulation, and metacognition as representing distinct components. Thus, developers of the BRIEF2 conducted a confirmatory factor analysis examining a three-factor model to determine fit between the individual scales and the three indexes. Findings indicated an acceptable fit with the three-factor model, with CFI values ranging from .95 to .99, SRMR values ranging from .02 to .05, and RMSEA values ranging from .07 to .13. Some RMSEA values were above the desired cutoff (.08; Hu & Bentler, 1998). Based on Gioia et al.’s (2010) previous study (which found the BRIEF to have RMSEA values of .11), this elevation was anticipated. In the Gioia et al. (2010) study, when the error covariance among scales was estimated, RMSEA values decreased to .08. Consequently, the authors of the BRIEF2 determined the RMSEA values to represent acceptable fit.
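The fit evaluation described above can be sketched as a simple comparison of each index against a cutoff. Only the RMSEA cutoff of .08 comes from the text; the CFI (at least .95) and SRMR (at most .08) thresholds are commonly used conventions assumed here for illustration. Pairing the endpoints of the reported ranges into a "best" and "worst" case is also an illustrative simplification, since the review reports ranges rather than per-form values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FitIndices:
    cfi: float
    srmr: float
    rmsea: float


RMSEA_MAX = 0.08  # cutoff cited in the review
CFI_MIN = 0.95    # assumed conventional threshold, not stated in the review
SRMR_MAX = 0.08   # assumed conventional threshold, not stated in the review


def flag_fit(fit: FitIndices) -> dict[str, bool]:
    """Report, index by index, whether a model meets the cutoffs above."""
    return {
        "cfi_ok": fit.cfi >= CFI_MIN,
        "srmr_ok": fit.srmr <= SRMR_MAX,
        "rmsea_ok": fit.rmsea <= RMSEA_MAX,
    }


# Endpoints of the ranges reported for the BRIEF2 (illustrative pairing).
best_case = FitIndices(cfi=0.99, srmr=0.02, rmsea=0.07)
worst_case = FitIndices(cfi=0.95, srmr=0.05, rmsea=0.13)

print(flag_fit(best_case))
print(flag_fit(worst_case))
```

Under these assumed thresholds, the CFI and SRMR ranges pass at both ends, while the upper end of the RMSEA range does not, which matches the pattern the review describes.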
Concurrent validity
Evidence for the validity of the BRIEF2 was gathered by comparing individual scales and summary indexes with other measures. Findings from these comparisons indicated moderate to strong relations between the BRIEF2 and the Child Behavior Checklist (CBCL; Achenbach, 1991), Behavior Assessment System for Children, Second Edition (BASC-2; Reynolds & Kamphaus, 2004), Conners Third Edition (Conners-3; Conners, 2008), and ADHD Rating Scale IV (ADHD-RS-IV; DuPaul, Power, Anastopoulos, & Reid, 1998). The BRIEF2 parent form demonstrated r values ranging from .48 to .67 on the ADHD-RS-IV, .33 to .45 on the CBCL, –.67 (adaptive skills) to .75 on the BASC-2 PRS, and .17 to .50 on the Conners 3-P(S). The BRIEF2 teacher form demonstrated r values ranging from .61 to .71 on the ADHD-RS-IV, .49 to .81 on the CBCL-TRF, –.63 (adaptive skills) to .80 on the BASC-2 TRS, and .36 to .61 on the Conners 3-T(S). Finally, the BRIEF2 self-report form demonstrated r values ranging from .39 to .57 on the CBCL-YSR, and –.41 (personal adjustment) to .75 on the BASC-2 SRP.
Special study groups
Individuals with various diagnoses were matched to TD peers based on age and gender. T-score elevation and mean T-score differences were compared. In most cases, score differences aligned with expected diagnostic differences. For instance, individuals with ADHD-C or ADHD-I received significantly higher T-scores on all scales, indexes, and composite scores (p < .001 to p < .05), with significant mean differences from the TD group ranging from 6.14 to 22.33. With the exception of the self-report form’s Emotional Control scale, these findings held true across all forms. It should be noted that statistically significant differences in T-scores do not necessarily correspond to clinically significant T-scores (i.e., above 60).
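The distinction between statistical and clinical significance can be made concrete with a short worked example. It assumes, purely for illustration, that the matched TD group has a mean T-score of exactly 50; the review does not report the matched groups' actual means.

```python
CLINICAL_CUTOFF = 60.0  # "above 60" per the review


def exceeds_clinical_cutoff(t_score: float) -> bool:
    """True when a T-score falls in the clinically elevated range."""
    return t_score > CLINICAL_CUTOFF


ASSUMED_TD_MEAN = 50.0  # assumption: matched TD group sits at the norm mean

# Smallest and largest significant mean differences reported in the review.
for mean_difference in (6.14, 22.33):
    clinical_group_mean = ASSUMED_TD_MEAN + mean_difference
    elevated = exceeds_clinical_cutoff(clinical_group_mean)
    print(f"{clinical_group_mean:.2f} clinically elevated: {elevated}")
```

Under this assumption, a significant difference of 6.14 points yields a group mean of 56.14, which remains below the cutoff, whereas a difference of 22.33 points yields 72.33, which is well above it.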
Commentary and Recommendations
The BRIEF2 has several features that make it useful in the examination of EF. It is based on a strong previous measure with historical use in both research and clinical settings. Its scales, indexes, and composite scores mirror the umbrella construct of EF. Test standardization was conducted using a large sample and allows for differentiation between TD and clinical populations. It has three forms tailored to the respondent and can be administered either online or with pencil and paper. The screening forms allow quick identification of individuals who may need comprehensive assessment. The BRIEF2 scores have strong internal consistency and test–retest stability, and moderate to strong internal validity and concurrent validity. The full forms are completed in less than 10 min, which may encourage respondents to complete them.
The BRIEF2 also has several limitations. The interrater reliability of its scores is weak to moderate, indicating inherent differences among the raters. Clinicians may struggle to find convergence across forms/raters. As with many rating scales, the BRIEF2 offers a selection of behaviors and asks respondents to rate their frequency. Given the limited number of questions and frequency ratings on the measure, some behaviors may not be captured adequately. Finally, the BRIEF2 is reliant upon other observers rating the individual in question and so is susceptible to responder bias (potentially reflected in the measure’s variable interrater reliability scores).
Overall, the BRIEF2 provides a quick, effective, and efficient way to examine EF. Its construct breakdown, test standardization, and short completion time make it a useful measure for both clinical and research settings. Although the measure does contain several limitations inherent in most rating scales, it remains both useful and recommendable.
