Abstract
The Rey-Osterrieth Complex Figure Test (RCFT) permits quantifying diverse cognitive abilities, including executive function (EF). We evaluated the psychometric properties of a scoring procedure for the RCFT, the Savage Organizational Scoring System (SOSS), that awards points for drawing the largest structural elements of the figures as continuous wholes. This was a two-phase study: first, we conducted a systematic literature search for studies using the SOSS, and aggregated previously published data for healthy controls to create a normative database; second, we observed performances from veterans evaluated for traumatic brain injury (TBI), examining the reliability of their SOSS scores, the SOSS correlations with two EF measures and the participants’ self-reported cognitive functioning, and then compared their mean scores to normative expectations. Across our literature-derived normative database, the aggregated mean SOSS score was 4.12 (SD = 1.72), which was marginally higher than that of our veteran participants evaluated for TBI, 3.72 (SD = 1.79). The SOSS had modest internal consistency (α = .59). Unlike the criterion EF measures, the SOSS was not significantly related to self-reported cognitive functioning. The SOSS shared a small, significant correlation with Trails B and Shipley Abstraction; but RCFT Copy scores were more strongly related to these tests, and the SOSS added no significant incremental predictive value beyond the RCFT Copy score. However, SOSS scores did predict RCFT Recall beyond RCFT Copy scores. We conclude that the SOSS has modest reliability and is predictive of RCFT Recall scores, but it is not strongly correlated with other EF measures, and it is only minimally affected by mild TBI.
Introduction
Executive functions (EFs) are higher-order cognitive functions that operate in novel, conflicting, or complex tasks (Godefroy, 2003). EFs are arguably the most difficult cognitive abilities to assess during neuropsychological evaluations, given “the paradoxical need [for examiners] to structure a situation in which patients can show whether and how well they can make structure for themselves” (Lezak et al., 2012, p. 667). Few performance-based EF measures provide an opportunity for examinees to demonstrate spontaneous and effective planning, organizing, and self-monitoring of behavior during a task. This project examines one such EF measure, the Savage Organizational Scoring System (SOSS; Savage et al., 1999) for the Rey-Osterrieth Complex Figure Test (RCFT; Meyers & Meyers, 1995; Osterrieth, 1944) with two broad objectives: (a) to provide normative SOSS data, and (b) to explore SOSS construct validity among adults evaluated for traumatic brain injury (TBI), a condition for which EF complaints are frequent.
The RCFT is a frequently used measure of visuo-construction and visual memory that enables assessors to observe executive-type behavior by analyzing the way participants approach the task. During this test, participants are presented with a model of a complex geometric figure and instructed to copy the image as accurately as possible (see Figure 1). The RCFT is composed of many elements, some of which provide coherent structure and are integral to appreciating the gestalt of the figure. The accuracy with which examinees reconstruct the figure can indirectly reflect a systematic versus haphazard approach. Some elements provide structural foundations for other components, making them relatively more important than the more ornamental aspects of the figure. Because successful copying involves perceiving a hierarchy within the overall complexity of the figure, an aspect of strategy is introduced.

Figure 1. Illustrated Recreation of Rey-Osterrieth Complex Figure and Savage Organizational Scoring System Elements. Note. Complex figure illustration based on Meyers and Meyers (1995). SOSS element illustrations based on Savage et al. (1999).
Examiners have an opportunity to assess EF skills involved in RCFT performance. In principle, a clearly perceived and organized drawing is more likely to resemble the original figure compared to one that is constructed impulsively or with poor planning. Scores pertaining to accuracy, therefore, provide an indirect measure of EF skills related to organization and strategic thinking, such as planning and task monitoring. Indeed, RCFT scores tend to correlate with EF measures (e.g., Spencer & Johnson-Greene, 2009). Although these accuracy scores generally reflect EFs, it is possible, even perhaps common, for examinees to arrive at fairly accurate drawings despite ill-conceived approaches. For example, drawing from left to right or in a piecemeal fashion irrespective of structure, while laborious, could result in a perfect accuracy score. Assessing the process, and not merely the result, of copying is therefore more directly pertinent to EFs.
Examiners often evaluate copying strategy through qualitative observation. Although of possible clinical relevance, informal observational methods are subjective and lack psychometric grounding. From our review of the literature, at least 17 scoring systems quantify a copying approach to the RCFT, with popular methods including the Boston Qualitative Scoring System (BQSS; Stern et al., 1999), the Complex Figure Organizational Quality Scoring System (OQSS; Hamby et al., 1993), the Bennett-Levy Copying Strategy Scoring System (Bennett-Levy, 1984), the Perceptual Cluster Index (Shorr et al., 1992), and the SOSS (Savage et al., 1999). Comprehensive reviews of the available scoring systems can be found in the works of Strauss et al. (2006), Hubley (2006), and Knight (2003). Although these scoring systems are often characterized as “qualitative” approaches, they have operationalized and quantified variables of interest; thus, the term “process approach” is more accurate (Hubley, 2006). Scoring systems vary in comprehensiveness, clarity of scoring instructions, time required to complete scoring, and availability of normative and other psychometric data. These approaches also vary with respect to operationalizing “organizational strategy,” and common variables include fragmentation, sequencing, asymmetry, perseveration, planning, distortions, and confabulations (Knight, 2003).
The RCFT is often used to measure visual recall by having examinees reproduce the figure from memory following one or more delays. Well-organized approaches to copying tend to aid examinees when redrawing the figure from memory (Bennett-Levy, 1984; Hamby et al., 1993; Shorr et al., 1992; Troyer & Wishart, 1997). Because organizing enhances later recall, it is informative to consider each examinee’s approach when interpreting copy and recall scores.
Although some research has demonstrated that the organizational approach to the RCFT correlates with EF measures, evidence for convergent validity for many of these scoring systems is relatively sparse and inconclusive. For example, Troyer and Wishart (1997) found that only five of ten organizational scoring systems correlated with another test of EF, with the magnitude of significant correlations ranging from .31 to .40. Somerville et al. (2000) found that BQSS sub-scores correlated with performances on several EF measures, with the magnitude of significant correlations ranging from .22 to .49. Moreover, some of the available scoring systems are useful for differentiating healthy individuals from those with conditions like TBI, HIV, Attention Deficit Hyperactivity Disorder (ADHD), and psychiatric disorders (Hamby et al., 1993; Stern et al., 1999).
Knight et al. (2003) conducted a survey of clinicians and researchers (93% self-identifying as psychologists) on their usage of the RCFT. Most respondents indicated assessing accuracy, and fewer reported using a process approach. Despite available scoring methods, 11% reported using neither an accuracy nor a process scoring system, apparently assessing by visual inspection alone. The choice not to quantify accuracy or process variables may reflect the time-intensive or complex nature of many of the available systems, and this may be particularly true of the process approaches. In fact, some respondents to Knight et al.’s survey indicated as much in their additional comments; for example, “[the RCFT] seems difficult to score” (p. 53).
Of the available process approaches, the BQSS is the most comprehensive system and has many favorable characteristics, including normative data from healthy controls. However, this scoring system requires a substantial amount of time to score, with estimates ranging from 5-20 minutes per protocol (Folbrecht et al., 1999; Stern et al., 1999). The SOSS is a briefer and less complicated alternative (Savage et al., 1999) for which administrators assign points to five organizational elements (i.e., the large rectangle, diagonal cross, vertical midline, horizontal midline, and large triangle on the right) based on whether the element is drawn as a whole, versus a fragmented unit. Drawing the large rectangle is worth two points, reflecting its importance to the fundamental organization of the figure, while the other elements receive one point each, with a total score ranging from 0 to 6. “Unfragmented” elements are defined as having been drawn as either (a) a continuous line without interruption, or (b) consecutive lines until element completion. The order in which the elements are drawn does not affect the score.
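The point assignment described above can be illustrated with a minimal Python sketch. The element labels and the function name are hypothetical and are not part of the published scoring materials. The sketch assumes a rater has already judged whether each element was drawn unfragmented, so it only tallies points and does not make that judgment.

```python
# Point values taken from the description above: the large rectangle is worth
# two points and each of the other four elements is worth one point.
# The dictionary keys are hypothetical labels chosen for this illustration.
SOSS_POINTS = {
    "large_rectangle": 2,
    "diagonal_cross": 1,
    "vertical_midline": 1,
    "horizontal_midline": 1,
    "large_triangle": 1,
}


def soss_total(unfragmented):
    """Return the total SOSS score (0 to 6) from a rater's judgments.

    `unfragmented` maps element labels to True (drawn as a whole) or False
    (fragmented). Elements that are not listed count as fragmented. The
    order in which elements were drawn is deliberately ignored.
    """
    unknown = set(unfragmented) - set(SOSS_POINTS)
    if unknown:
        raise ValueError(f"unknown element(s): {sorted(unknown)}")
    return sum(
        points
        for name, points in SOSS_POINTS.items()
        if unfragmented.get(name, False)
    )
```

For instance, a protocol in which only the large rectangle and the vertical midline were drawn as wholes would receive 2 + 1 = 3 points under this tally.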
Research generally indicates that SOSS scores are lower among individuals with clinical conditions compared to healthy adults. SOSS scores effectively differentiate between healthy controls and clinical groups for many conditions associated with EF impairments, including obsessive-compulsive disorder (OCD; Buhlmann et al., 2006; Deckersbach et al., 2000; Kuelz et al., 2006; Mataix-Cols et al., 2003; Moritz et al., 2005; Penadés et al., 2005; Savage et al., 1999, 2000; Segalàs et al., 2011), hoarding (Hartl et al., 2004), eating disorders (Lopez et al., 2008; Sherman et al., 2006), and mood disorders (Behnken et al., 2010; Deckersbach et al., 2004). However, finding SOSS differences between groups is not universal, with several studies having observed no significant differences in SOSS scores between healthy controls and samples of individuals with eating disorders (Aloi et al., 2015; Lindner et al., 2013), OCD, and trichotillomania (Bohne et al., 2005).
Consistent with findings involving other qualitative systems, the organizational approach to the RCFT, as measured by the SOSS, predicts free recall performance, is moderately correlated with EF measures, and is usually higher in undiagnosed participants relative to those with clinical diagnoses. Thus, the SOSS shows promise as an efficient supplementary EF measure, giving the RCFT added value. However, several psychometric characteristics of the SOSS approach remain unknown as they have yet to be reported. Most SOSS research to date pertains to psychiatric samples and not to individuals with neurodegenerative disease or acquired brain injury. Although researchers report excellent inter-rater reliability for SOSS scoring (r = .94; Deckersbach et al., 2000), internal consistency and examinee normative data have yet to be published. Furthermore, only one study in our review (Hartl et al., 2004) reported correlations between this measure and examinees’ self-reported cognitive problems.
Mild traumatic brain injury (mTBI) is often associated with cognitive complaints but not necessarily with cognitive deficits on objective testing (Drag et al., 2012; Spencer et al., 2010). A potential explanation for this disparity is that the tests that are often used to quantify cognitive deficits in mTBI are not sensitive or sufficiently related to the types of EF problems experienced by these individuals. Therefore, it is clinically and theoretically useful to know whether the skills measured by the SOSS are compromised among individuals with mTBI and if SOSS scores correlate with cognitive complaints.
In this paper, we report a two-phase study to clarify the utility of the SOSS, by both aggregating normative data from previously published research and exploring select psychometric properties of the SOSS in a new clinical sample. In Phase 1, we assembled norms for performances by healthy controls by exploring and aggregating data from healthy controls in the SOSS literature. In Phase 2, we evaluated the reliability and convergent validity with criterion EF measures of the SOSS in a sample of United States military veterans who reported cognitive complaints in the context of a self-reported history of TBI. We also examined correlations between these veterans’ SOSS scores and their self-reported cognitive symptoms on an inventory of post-concussive complaints.
Phase 1: Method
Selection of Studies
We conducted a literature search to identify studies that used the SOSS. We searched PsycINFO and PubMed using the terms (a) Rey or Osterrieth or ROCF or RCFT, (b) Savage or Deckersbach, and (c) organization or structure or planning or executive. We also searched PubMed for the terms “Rey” and “Savage.” This search produced 33 articles within the PsycINFO database, and repeating the search in PubMed yielded another 10 articles. Additionally, we used the Institute for Scientific Information Web of Science to find all articles that cited Savage et al. (1999) and contained the term “Rey,” generating 85 articles. We identified 128 peer-reviewed papers across search methods. Next, we screened out duplicates, studies that did not use the SOSS as part of a neuropsychological battery, and studies that examined a predominantly pediatric sample (age <17; 27 retained). Lastly, we excluded studies that did not include demographic data for age, education, and SOSS performance for a healthy control group. One additional study was removed because of nonstandard administration (e.g., examiners offered instructions while examinees copied). After these exclusions, a total of 17 articles were retained for analyses (see Figure 2 for a schematic of the manuscript selection process). This literature search was conducted independently by two of the authors, who met and resolved any discrepancies. We also included unpublished data from two samples of undergraduate students (see Kisser et al., 2012, for a description of the sample). Nine of the studies retained from the literature search included samples from outside of the United States; because of possible cultural variance, these were analyzed both separately and collectively with the U.S. samples.

Figure 2. Schematic for Savage Organizational Scoring System Article Selection. Note. Schematic template obtained from Moher et al. (2009).
Procedure for Combining Sample Statistics
We calculated aggregate means and standard deviations for these participants’ SOSS scores, age, and education using the method described in the Cochrane Handbook for Systematic Reviews of Interventions (Higgins et al., 2021), section 6.5.2.10 (combining groups). This method weights by sample size but produces a standard deviation for the combined group as if it had never been divided into subgroups. Its main advantage is that it does not underestimate the desired SD, a drawback of the usual pooled method, which produces a within-subgroup SD. The total number of participants (N) was calculated by summing the number of participants in each study’s sample (n). The aggregate means were obtained by multiplying each sample mean by the number of participants in the sample (n) and dividing the sum of these products by the total number of participants (N). To calculate the aggregate SDs, we began by temporarily removing Bessel’s correction: we squared each sample SD, multiplied it by the sample’s degrees of freedom (n − 1), and divided the result by the number of participants in the sample (n). Next, each of these values was added to the squared deviation obtained by subtracting the aggregate M from the sample M, and the resulting sum was multiplied by the sample size (n). The aggregate variance was calculated by summing these products across samples and then reapplying Bessel’s correction by dividing the sum by the total degrees of freedom (N − 1). The square root of this value was the aggregate SD.
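The combining procedure can be sketched in a few lines of Python. This is an illustrative implementation under stated assumptions, not the code used in the study: the function name and the `(n, mean, sd)` tuple layout are hypothetical, and each input SD is assumed to have been computed with Bessel’s correction (divisor n − 1).

```python
import math


def combine_groups(samples):
    """Combine per-study summary statistics into one aggregate mean and SD.

    `samples` is a list of (n, mean, sd) tuples, one per study, where each
    sd is the usual sample SD (divisor n - 1). The result is the mean and SD
    that would be obtained if all participants formed a single group.
    """
    total_n = sum(n for n, _, _ in samples)

    # Sample-size-weighted mean across studies.
    agg_mean = sum(n * m for n, m, _ in samples) / total_n

    # For each study: the within-study sum of squares, (n - 1) * sd^2, plus
    # the between-study term, n * (mean - aggregate mean)^2. This equals
    # n * [((n - 1) / n) * sd^2 + (mean - aggregate mean)^2].
    sum_of_squares = sum(
        (n - 1) * sd ** 2 + n * (m - agg_mean) ** 2 for n, m, sd in samples
    )

    # Reapply Bessel's correction using the total degrees of freedom.
    agg_sd = math.sqrt(sum_of_squares / (total_n - 1))
    return total_n, agg_mean, agg_sd
```

As a check on the logic, splitting the values 1 through 7 into the groups {1, 2, 3} and {4, 5, 6, 7} and passing their summary statistics to `combine_groups` recovers the mean and SD of the undivided set. A pooled within-group SD would be smaller because it ignores the difference between the group means.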
Phase 1: Results
Table 1 contains aggregate Ms and SDs for healthy participants’ SOSS scores, age, education, and sex across all studies meeting selection criteria, as well as by geographic location (i.e., U.S. sample or non-U.S. sample). Briefly, the overall average score for the SOSS was 4.12 (SD = 1.72), participants’ average age was 34.7 years (SD = 15.3), and participants’ average years of education was 12.4 (SD = 4.1). Males comprised approximately 34% of the normative sample.
Table 1. Sample and Aggregate Statistics for SOSS Score, Age, Education, and Sex, Stratified by Location (United States and non-United States).
a Casewise deletion used when necessary to calculate a total or region-specific aggregate statistic.
Phase 2: Method
Participants
We examined the performances of 452 veterans who completed outpatient neuropsychological assessments at a Midwestern VA hospital. All participants screened positive for a history of TBI by self-report. Approximately half of these participants were used in previous studies (Drag et al., 2012; Spencer et al., 2010). Participants were excluded if (a) their neuropsychological evaluations were related to financial compensation, (b) they completed fewer than four performance validity tests (PVTs), or (c) they failed two or more PVTs. The final sample included 282 participants (97% male; 89% right-handed; M age = 32.65 years, SD = 11.43, range = 20–78; M education = 12.85 years, SD = 1.47, range = 7–18). Upon interview, 14.9% of participants reported no instance of an event that met criteria for a TBI, 77.3% reported an injury consistent with criteria for mTBI, and 7.4% reported an injury that met criteria for a moderate-to-severe TBI. TBI severity classifications were based on criteria from the American Congress of Rehabilitation Medicine (Kay et al., 1993). This study was approved by the VA hospital Institutional Review Board and was granted an exemption from informed consent requirements, given that it was a retrospective study that included de-identified data obtained during standard clinical care.
Measures
Rey-Osterrieth Complex Figure Test (RCFT)
The RCFT copy and recall trials were administered and scored for accuracy according to Meyers and Meyers (1995); their system does not include a scoring system for organizational approach to the copy. Participants’ approach to the copy was scored using the SOSS. Each participant was instructed to copy the figure and, three minutes later and without warning, to redraw it from memory. Copy and recall drawings were scored by evaluating 18 sub-components of the figure with respect to drawing accuracy and correct placement (1 point each). Total points ranged from 0 to 36 for the copy and recall trials, which will be referred to as “Copy Accuracy” and “Immediate Recall,” respectively. Higher Copy Accuracy scores indicate detailed organization that is congruent in appearance with the target figure, and higher Immediate Recall scores suggest stronger recall of the target figure after the three-minute delay. This scoring system has demonstrated high interrater reliability (r > .90; Meyers & Meyers, 1995). Prior studies have indicated that the SOSS (Savage et al., 1999) also has high interrater reliability (r > .90; Savage et al., 2000; Deckersbach et al., 2000, 2004).
Trail-Making Test – Part B
Although both parts A and B of the Trail-Making Test were administered, we analyzed only part B (Trails B; Reitan, 1955) as a criterion EF measure. Trails B involves drawing lines connecting numbers and letters in order, alternating between the two sequences. Trails B has a significant EF component as it requires both cognitive flexibility and ability to maintain mental set.
Shipley Institute of Living Scales
The Shipley Institute of Living Scales (Shipley, 1940; Zachary, 2000) includes two components, one pertaining to vocabulary and one pertaining to abstract thinking. We used the latter as another criterion EF measure due to its EF demands of attention, working memory, conceptualization and problem-solving. The Abstraction scale involves solving 20 brief letter- or number-based logical puzzles requiring problem solving and the ability to shift strategies between items. The puzzles require planning and working memory, allowing the scale to be conceptualized as an EF measure. Internal consistency for this scale is .89 (Zachary, 2000).
Neurobehavioral Symptom Inventory (NSI)
The NSI is a 22-item self-report measure of cognitive, somatic, and affective symptoms frequently reported following concussion (Cicerone & Kalmar, 1995). On this measure, respondents rate the severity of each symptom on a five-point scale that ranges from zero (none) to four (very severe). For the purposes of this study, we only examined responses to four items addressing cognitive symptoms. These items pertain to poor concentration, forgetfulness, difficulty with decision-making, and problems with organization and completing tasks.
Phase 2: Results
For our clinical sample, descriptive data for demographics, SOSS, and criterion measures are shown in Table 2. SOSS scores ranged from 0 through 6, and percentages for individual scores were as follows: 0 = 1.4%, 1 = 14.2%, 2 = 12.4%, 3 = 17.4%, 4 = 14.9%, 5 = 16.7%, and 6 = 23%. The internal consistency for the SOSS score was Cronbach’s alpha = .59. SOSS scores were not significantly correlated with age (r = .01, p = .89) or years of formal education (r = .07, p = .28). Because we did not have a control sample generated within this study, we compared our participants’ performances on criterion EF measures to those of relevant normative samples for the EF measures (which are cited below), presenting scores as standard scores (SS) with a M of 100 and a SD of 15. For individual cases, SS between 90 and 109 are generally considered to be in the average range (e.g., Wechsler, 2008). Relative to age-matched peers from Meyers and Meyers’ (1995) normative data, our TBI sample performed near the lower threshold of the expected ranges for the RCFT copy score (SS > 85) and recall score (SS = 91). As a group, the M SOSS score for our TBI sample was significantly lower than the aggregated M SOSS from the norms we gathered in Phase 1 of this two-phase study (t(281) = −3.73, p < .001, CI [−.61, −.19]). Extrapolating from the norms we generated in Phase 1, scores of 0, 1, 2, 3, 4, 5, and 6 correspond to standard scores of 64, 73, 82, 90, 99, 108, and 116, respectively. Based on other normative samples, the TBI sample performed within expectations for same-aged peers on Trails B (SS = 95; Mitrushina, 2005) and Shipley Abstraction (SS = 105; Zachary, 2000). King et al. (2012) presented data for the NSI Cognitive subscale for veterans with demographics similar to those in our sample. In terms of the severity of self-reported cognitive symptoms, the scores from our sample were nearly identical (z score = 0.01) to those of individuals with histories of TBI in the sample of King et al. (2012). Participants in our sample reported more cognitive symptoms (z score = 0.69) than the individuals without histories of TBI investigated by King et al. (2012). The NSI Cognitive subscale had a Cronbach’s alpha of 0.89 in our sample.
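The raw-to-standard-score conversion and the group comparison can be reproduced approximately from the summary values reported in this article. The sketch below is illustrative only: the function names are hypothetical, the conversion assumes a simple linear z-score transformation against the Phase 1 aggregate (M = 4.12, SD = 1.72), and the comparison is treated as a one-sample t test against the normative mean, which is consistent with the reported 281 degrees of freedom.

```python
import math

# Phase 1 aggregate normative values reported in this article.
NORM_MEAN = 4.12
NORM_SD = 1.72


def standard_score(raw, norm_mean=NORM_MEAN, norm_sd=NORM_SD):
    """Convert a raw SOSS score to a standard score (M = 100, SD = 15)."""
    z = (raw - norm_mean) / norm_sd
    return round(100 + 15 * z)


def one_sample_t(sample_mean, sample_sd, n, reference_mean):
    """Return (t, df) for a one-sample t test computed from summary statistics."""
    standard_error = sample_sd / math.sqrt(n)
    return (sample_mean - reference_mean) / standard_error, n - 1


# Standard scores for each possible raw SOSS score, 0 through 6.
conversion_table = {raw: standard_score(raw) for raw in range(7)}

# Veteran sample summary values from this article: M = 3.72, SD = 1.79, n = 282.
t_value, df = one_sample_t(3.72, 1.79, 282, NORM_MEAN)
```

With the rounded summary values, `t_value` comes out near −3.75, slightly different from the reported −3.73, presumably because the published statistic was computed from unrounded data.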
Table 2. Means, Standard Deviations, and Ranges for RCFT, SOSS, and Criterion Measure Scores.
Note. RCFT = Rey-Osterrieth Complex Figure Test; SOSS = Savage Organization Scoring System; NSI = Neurobehavioral Symptom Inventory; SS = standard score based on normative comparison.
a Compared to Phase 1 normative group.
b Meyers and Meyers (1995).
Criterion Validity
Table 3 shows the Spearman’s rho correlations between SOSS, traditional RCFT measures (Copy Accuracy and Immediate Recall), and criterion EF measures. The SOSS shared a statistically significant, though small, correlation with the EF tests, whereby higher SOSS scores corresponded with better EF test scores. SOSS scores were not correlated with self-reported cognitive problems on the NSI. Copy Accuracy and Immediate Recall on the RCFT correlated more strongly with EF measures than did the SOSS. Both EF criterion measures and both traditional RCFT measures correlated significantly with subjective cognitive complaints, although these correlations were small. The two performance-based EF measures correlated moderately with each other.
Table 3. Spearman Correlations Between SOSS Scores, RCFT Scores, Criterion EF Measures, and Self-Reported Cognitive Symptoms.
Note. RCFT = Rey-Osterrieth Complex Figure Test; SOSS = Savage Organization Scoring System; NSI = Neurobehavioral Symptom Inventory.
*p <.05. **p <.01.
Incremental Validity
We examined the incremental value of SOSS by using separate hierarchical regressions to predict RCFT Immediate Recall and the performance-based EF criterion variables beyond RCFT Copy Accuracy. We did not examine incremental benefits for predicting subjective complaints because SOSS was not correlated with the NSI Cognitive subscale. As presented in Table 3, SOSS scores were significantly correlated with both Copy Accuracy and Immediate Recall scores, and these correlations were of moderate strength. Therefore, Copy Accuracy and SOSS scores were entered into a hierarchical multiple regression as predictors for Immediate Recall. Copy Accuracy was entered first and explained 17.3% of the variance (R² = .17, F(1, 280) = 58.38, p < .001) in Immediate Recall. Inclusion of SOSS scores in the second step significantly increased explanatory power (R² = .243, F(2, 279) = 44.779, p < .001), accounting for an additional 7% (ΔR² = .07, F(1, 279) = 25.97, p < .001) of the variance in Immediate Recall. SOSS was not incrementally predictive of either EF test beyond Copy Accuracy.
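The relationship between the reported R² values and F statistics can be checked with the standard formulas for an overall regression F test and an F test for the change in R². The sketch below is a generic illustration with hypothetical function names. It works only from the rounded R² values reported above and n = 282, so it approximates the published statistics and does not reproduce them exactly.

```python
def f_overall(r2, n, k):
    """F statistic for a regression with k predictors and n cases.

    Degrees of freedom are (k, n - k - 1).
    """
    return (r2 / k) / ((1 - r2) / (n - k - 1))


def f_change(r2_reduced, r2_full, n, k_full, k_added):
    """F statistic for the R-squared increase when k_added predictors are added.

    Degrees of freedom are (k_added, n - k_full - 1).
    """
    numerator = (r2_full - r2_reduced) / k_added
    denominator = (1 - r2_full) / (n - k_full - 1)
    return numerator / denominator


N_CASES = 282  # size of the final veteran sample

step1_f = f_overall(0.173, N_CASES, k=1)  # Copy Accuracy only
step2_f = f_overall(0.243, N_CASES, k=2)  # Copy Accuracy plus SOSS
change_f = f_change(0.173, 0.243, N_CASES, k_full=2, k_added=1)
```

Using the rounded inputs, `step2_f` is approximately 44.78 and `change_f` is approximately 25.80, both close to the reported values of 44.779 and 25.97. The small gaps are what one would expect when starting from R² values rounded to three decimals.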
Discussion
The SOSS is an adjunctive process score for quantifying examinees’ organizational approach to drawing a complex geometric figure (i.e., the RCFT). SOSS scores are straightforward, quick to calculate, and measure a strategic aspect of EF that is both difficult to quantify and often overlooked during cognitive screening. In principle, this system presents as a helpful supplement to standard RCFT scoring by providing a method for measuring the degree to which examinees adopt a strategic approach to engaging in a complex and novel task. Many users of the RCFT do not quantify metacognitive processes during the copy trial, as some scoring protocols require prohibitive time and effort. As such, this project aimed to provide comparative norms for healthy controls (Phase 1) and thereby improve the psychometric understanding of the SOSS among clinical patients (Phase 2). Our findings were mixed, as the SOSS had limited reliability and modest construct validity as an EF measure in the context of TBI. Therefore, further validation research and perhaps some modification of the SOSS is needed.
The SOSS has several features that lend themselves well to clinical assessment. The samples from prior studies that comprised our aggregated norms contained scores that were remarkably consistent across studies. These scores did not appear to change according to the average ages of the participants, which ranged from 21.3 to 50.4 across studies (aggregate M age = 34.7, SD = 15.3; the actual age range across studies could not be ascertained because some studies reported only the M age of the sample). Demographic effects on SOSS scores were also absent in our large sample of veterans assessed for TBI, suggesting that variables such as age and level of education have little bearing on performance. Across groups, the SOSS had an aggregated M of 4.12 and a SD of 1.72, and in the TBI sample the measure had marginal internal consistency. The clinical sample’s reliability coefficient of 0.59 falls short of standards generally considered acceptable for comprehensive neuropsychological assessments, but it is arguably adequate for cognitive screening (Mitrushina, 2009). In clinical settings, the measure may be best used descriptively rather than as definitive evidence of broad EF skills. When measures are constrained by poor reliability, making fine distinctions between contiguous scores becomes problematic. We therefore recommend that, as a conservative precaution, only scores at the extremes be interpreted, and then only descriptively. For example, in our TBI sample, perfect scores of “6” were obtained by fewer than one-quarter of the participants, suggesting an appropriate description of “well-organized” copying for these participants. Conversely, scores of 0 and 1 were obtained by fewer than one-sixth of these participants, indicating “poor organization” or “poor planning” for this sub-group.
It is both tempting and intuitively appealing to hypothesize further about EF skills, but more validation research is needed before examiners can attach clinical meaning to SOSS scores. Confident descriptions are premature until the construct validity of the SOSS is thoroughly investigated using measures more closely associated with these aspects of executive functioning than the measures in this study.
Results from the TBI participants coincided with some previous research in which SOSS scores were marginally lower in clinical groups. These differences were subtle, however, and indicated that EFs captured by the SOSS were not meaningfully impaired among those with mTBI. Presenting normative data and establishing group differences is a necessary initial step towards validating the SOSS as a useful clinical instrument. However, the clinical utility of the SOSS remains unclear. The slight diminution in the SOSS score in this clinical sample indicates that the SOSS may detect something useful about EFs in TBI, but the difference in group means was probably not clinically meaningful. This lack of a clear divergence raises questions about the underlying construct captured by the SOSS, since the SOSS also failed to correlate strongly with processing speed or abstract problem solving. These results are disappointing, especially in light of the more robust correlations shared between RCFT copying and the EF measures in this sample. Furthermore, several RCFT scoring systems that are generally more elaborate than the SOSS have demonstrated discriminant validity when comparing TBI versus healthy control samples. For example, BQSS performance correctly identified 81% of TBI patients and 82% of healthy controls (Stern et al., 1999). In the present TBI sample, the significant but small correlations between SOSS and EF measures were no longer statistically significant after controlling for RCFT Copy Accuracy. This finding suggests that the SOSS had no incremental value in predicting EF scores when copying scores were already available.
Just as SOSS scores correlated only modestly with performance-based EF tests, they did not correlate with self-reported cognitive problems. Lack of a strong correlation between self-report and objective testing is common in clinical neuropsychological research. It is notable, however, that in this sample there were small, yet significant, correlations between the performance-based EF measures and self-reported cognitive problems.
Regardless of whether the SOSS was related to criteria external to the RCFT, it was predictive of RCFT recall. That is, SOSS predicted RCFT recall, even after statistically controlling for RCFT copying accuracy. Test takers with high SOSS scores perhaps perceived the figure more clearly, viewing it as a collection of large, continuous components, versus those with lower scores who might have been burdened by perceiving many smaller parts. Appreciating the larger structural elements as wholes conferred advantages, likely by simplifying the figure and thereby aiding encoding. This finding converges with prior research (Bennett-Levy, 1984; Hamby et al., 1993; Shorr et al., 1992; Troyer & Wishart, 1997) indicating that organizational approach, here measured by the SOSS, is an important consideration when interpreting RCFT recall scores.
Limitations and Directions for Further Research
Several limitations of our study can be addressed in subsequent research. First, the criterion EF measures in this study pertained to abstract reasoning and processing speed but not necessarily to planning and organization, the EFs assessed by the SOSS. Additional research is needed to examine the degree to which SOSS scores correlate with measures of metacognitive skills, planning, and organization. A similar critique can be made regarding the nonspecific cognitive complaints used in this study versus self-reported difficulties more specific to planning, organization, and impulsivity. For example, the Behavior Rating Inventory of Executive Function – Adult Version (BRIEF-A; Roth et al., 2005) contains subscales more pertinent to the skills that may be associated with higher SOSS scores.
A second limitation of this study relates to the participant sample. Examining sex differences was not possible given the small proportion of female participants, and our sample spanned a truncated age range (which is also a limitation of the SOSS literature in general). Specifically, individuals over the age of 50 were underrepresented in our study and in the SOSS literature. Although we did not observe age effects in our sample of individuals being assessed for TBI, EF skills often decline after the sixth or seventh decade of life, an effect that is apparent on other structured EF tasks (e.g., Heaton et al., 1993, 2004), and it remains to be seen whether SOSS scores decline with advanced age. This should be examined in future research. Relatedly, the SOSS has not yet been applied to other conditions for which EF impairments are a common and primary feature. Research should examine the correspondence between the SOSS and conditions such as ADHD, frontotemporal dementia, and Huntington’s disease. Interpreting comparisons across all groups will be aided by the aggregated normative data we have gathered and/or our demographically matched control groups.
A final critique of the SOSS pertains to its scoring criteria. The scale is designed to be efficient and portable, and so the scoring criteria are not as elaborate as those of other systems (such as the BQSS; Stern et al., 1999). One modification to scoring that is conceptually consistent with its intended purpose, but also maintains the spirit of brevity, is to award points for drawing the five structural elements first. In its current form, the SOSS gives full credit for drawing these elements as whole units; however, it does not emphasize drawing them early in the copying process, which is what allows them to provide examinees with structure in recalling and drawing other details in the overall figure. For example, for the large rectangle to be useful as a structural element, it should be drawn as one of the first elements of the figure. It is possible, although unlikely, that an examinee might draw this element later, or even save it for last, and still be awarded full credit for drawing it with four consecutive lines. This temporal modification would not increase the RCFT administration burden and would yield a score that is conceptually more consistent with EF skills than is the current SOSS.
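One way such a temporal rule could be operationalized is sketched below. This is an illustration only and not a validated scoring procedure: the function name, the element labels other than the large rectangle, the one-point-per-element weighting, and the definition of "early" as the first five drawn units are all assumptions introduced here, and the weighting deliberately does not reproduce the published SOSS point values.

```python
def early_structure_points(draw_order, drawn_whole, structural, window=None):
    """Count structural elements that were drawn whole AND drawn early.

    draw_order  : list of element labels in the order the examinee drew them
    drawn_whole : set of labels the scorer judged to be continuous wholes
    structural  : set of labels treated as structural elements
    window      : number of initial drawn units that count as "early";
                  defaults to the number of structural elements (assumption)
    """
    if window is None:
        window = len(structural)
    early = set(draw_order[:window])
    # One point per qualifying element: a simplifying assumption.
    return sum(1 for e in structural if e in drawn_whole and e in early)


# Hypothetical labels; only "large_rectangle" is named in the text.
structural = {"large_rectangle", "struct_2", "struct_3", "struct_4", "struct_5"}
drawn_whole = set(structural)  # every structural element drawn as a whole

structure_first = ["large_rectangle", "struct_2", "struct_3", "struct_4",
                   "struct_5", "detail_a", "detail_b"]
rectangle_last = ["struct_2", "struct_3", "struct_4", "struct_5",
                  "detail_a", "detail_b", "large_rectangle"]

print(early_structure_points(structure_first, drawn_whole, structural))
print(early_structure_points(rectangle_last, drawn_whole, structural))
```

In this sketch, both hypothetical examinees drew every structural element as a whole, yet the one who saved the large rectangle for last earns one fewer point, which is the distinction the proposed modification is meant to capture.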
Finally, while our project established reliability estimates and preliminary norms, the SOSS still has significant psychometric limitations and only general clinical applicability with aggregated, literature-based norms. Scores are lower among those with some clinical diagnoses but are poorly correlated with other EF measures and with self-reported cognitive complaints, and more research is needed to establish how well the SOSS measures EFs. Furthermore, additional normative research is needed to expand the utility of the SOSS, because literature-based norms are of more limited clinical use than norms based on standardized administration procedures that feature demographic stratification (i.e., norms based on age, sex and gender, education, etc.) of healthy control participants. This scoring system has promise as a measure of organization and planning, but additional validation is needed before it is appropriate for routine clinical use.
Conclusions
To our knowledge, this two-phase study was the first to present aggregated literature-based normative data for the SOSS for the RCFT. In Phase 1, we provided a comprehensive review of the available literature pertaining to the SOSS and aggregated normative data across 19 samples of healthy participants involved in well-conducted neuropsychological assessments and standardized administrations of the RCFT. In Phase 2, we applied the SOSS to examine EFs in a large sample of veterans who were evaluated for TBI. To our knowledge, this is also the first study to (a) investigate the SOSS among individuals with TBI, a condition for which EF complaints are common; and (b) compare their SOSS scores to those from a large normative database aggregated from previously published research. We discovered significant psychometric limitations to the SOSS through this comparison, and those limitations raise cautions for its clinical use. There remains a need for further SOSS research in older populations, in other clinical groups, and with respect to possible SOSS correlates with other aspects of EF not studied in this effort.
Footnotes
Acknowledgments
The authors acknowledge the support of Margaret T. Davis, PhD, Yale School of Medicine Department of Psychiatry, Yale University Department of Psychology.
Declaration of Conflicting Interests
The author(s) declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: The authors do not have any conflicts of interest to disclose. This work was authored as part of the contributor’s official duties as an Employee of the United States Government and is therefore a work of the United States Government. In accordance with 17 U.S.C. 105, no copyright protection is available for such works under U.S. Law.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
