Abstract
Spatial ability has been valued as a talent domain and as an assessment form that reduces cultural, linguistic, and socioeconomic status biases, yet little is known of the spatial ability of students in gifted programs compared with those in general education. Spatial ability is considered an important indicator of potential talent in the domains of science, technology, engineering, and mathematics (STEM). This study explored undergraduate students’ spatial ability, focusing on mental rotation, by investigating the relationships of three variables with performance on a spatial ability test, in terms of test scores and test completion time. A three-way analysis of variance revealed statistically significant main effects of gifted program participation, choice of academic major, and gender, suggesting that students who participated in a gifted program, who majored in a STEM discipline, or who were male outperformed their counterparts on a measure of spatial ability when the other conditions were equivalent. No interaction effects existed among the three variables, indicating that none of them functioned as a moderator of students’ performance on the spatial ability assessment. However, when spatial ability was considered as a mediating variable in a path model, gender had the largest total effect on the probability of students majoring in a STEM area. In addition, the more time students spent on the spatial ability test, the better they tended to perform, which is a finding inconsistent with current literature.
The Study of Mathematically Precocious Youth has reported that spatial ability plays a critical role in predicting successful performance and developing expertise in science, technology, engineering, and mathematics (STEM) disciplines (Shea, Lubinski, & Benbow, 2001; Wai, Lubinski, & Benbow, 2009; Webb, Lubinski, & Benbow, 2007). Historically, researchers in gifted education have recognized spatial ability in various ways, including (a) as one of the talent/intelligence domains grounded in the multiple intelligences theory (Gardner, 1993; Plucker, Callahan, & Tomchin, 1996; Reid, Udall, Romanoff, & Algozzine, 1999); (b) as an indicator of performance in STEM disciplines (Cooper, 2000; Kell, Lubinski, Benbow, & Steiger, 2013); and (c) as an alternative assessment for giftedness identification with fewer cultural, linguistic, and socioeconomic status biases than traditional ones (Chan, 2010; Lohman, 2005; Naglieri & Ford, 2003). However, researchers in gifted education have frequently expressed concerns regarding the underidentification of students with spatial strengths and the limited research on appropriate learning environments necessary for spatially gifted students to excel in school (Gohm, Humphreys, & Yao, 1998; Mann, 2004; Webb et al., 2007).
Spatial Ability and Mental Rotation Ability
Spatial ability as a dimension of intelligence has been studied in a variety of different populations and settings (e.g., Carroll, 1993; Eliot, 1987). Eliot classified efforts to investigate spatial ability into three phases: in the first phase (1904-1938), researchers identified spatial ability as one aspect of intelligence; in the second phase (1938-1961), researchers distinguished different facets of spatial ability; and in the third phase (1961-1982), researchers explored variance in spatial factors, such as gender differences and uses of spatial strategies, and related spatial ability to other abilities within various intelligence models. Since then, researchers using varied spatial tests have continued to find several spatial factors with differing definitions (Burton & Fogarty, 2003; Carroll, 1993).
Among various spatial factors, mental rotation ability involves a cognitive visualization process to mentally rotate two-dimensional (2-D) or three-dimensional (3-D) objects. Figure 1 delineates a hypothetical factor structure model of spatial ability based on Carroll’s (1993) factor analysis and a compilation of substructures of spatial factors identified by Burton and Fogarty’s (2003) visual imagery factor study, Tartre’s (1990) factor model of spatial visualization, and Ho and Eastman’s (2006) model of mental rotation. Here, 3-D mental rotation ability is categorized as one factor of the structure of spatial visualization ability involving more complex spatial tasks than 2-D, emphasizing accuracy without time constraints, rather than simple spatial tasks performed quickly with time constraints (Carroll, 1993).

Schematic illustration of a hypothetical factor structure model of spatial ability.
3-D mental rotation ability has been assessed to examine its associations with students’ STEM performance. Manipulating visual representations of 3-D objects is considered an essential cognitive process in solving problems situated in STEM contexts, such as geometry, geology, chemistry (including cellular/molecular structures modeling), technology, and engineering (including graphic design and objects’ motions; Bodner & Guay, 1997; Contero, Naya, Company, Saorin, & Conesa, 2005; Field, 2007; Sibley, 2005; Titus & Horsman, 2009). Several studies identified the role of spatial ability as a mediator of gender differences in students’ mathematics performance (Burnett, Lane, & Dratt, 1979; Casey, Nuttall, & Pezaris, 1997, 2001; Casey, Nuttall, Pezaris, & Benbow, 1995). For example, using path analyses, Casey et al. (1997) revealed that mental rotation ability of high-ability students with a mean age of 13.8 significantly mediated gender differences in Scholastic Assessment Test-Math (SAT-M), while no direct effects of gender on SAT-M existed.
Differences in the Spatial Ability of Gifted Students
Of the various spatial factors, meta-analyses on spatial ability have shown mental rotation ability to evince the largest gender differences across a wide range of ages (Linn & Petersen, 1985; Maeda & Yoon, 2013; Voyer, Voyer, & Bryden, 1995). Studies with gifted students suggest that spatial ability varies by age, gender, prior exposure to spatial tasks, and types of spatial tasks (Chan, 2007; Gallagher & Johnson, 1992; Stumpf, 1998). Chan found that although gifted students in primary school (Grades 3-6) in Hong Kong demonstrated no gender difference in spatial visualization ability in 2-D mental rotation, a statistically significant gender difference favoring male students existed in secondary school (Grades 7-12), and a statistically significant age difference favoring secondary school students existed in spatial ability.
Particularly, when mental rotation tests were timed, male students performed faster and better than female students (Gallagher & Johnson, 1992; Maeda & Yoon, 2013; Stumpf, 1998). Gallagher and Johnson (1992) reported gender differences for mathematically talented secondary students on a mental rotation test, the Cubes test (Thurstone, 1938). To assess processing speed, students marked the test with a graphite pencil during the timed portion of the test, and then they were allowed unlimited time to complete the test, using a red pencil to mark their answers. Even though the effect size of the gender difference was reduced from 1.50 to 0.36, statistically significant gender differences favoring males occurred on both the timed portion and completed test scores. Notably, male students scored higher in terms of both the number of items attempted and the number of items answered correctly on the timed portion of the test.
Stumpf (1998) also found that gender differences in the spatial ability of academically talented students in seventh and eighth grades varied by spatial tasks. Among the four subtests of the computerized timed version of the Spatial Test Battery (Surface Development, Block Rotation, Visual Memory, and Perspectives; Eliot, Stumpf, & Tissot, 1992), male students scored higher than female students on all subtests except for Visual Memory. While female students scored higher than males on the Visual Memory test, they spent more time on the Block Rotation test and still scored lower than male students. Studies on gifted students’ spatial ability have thus consistently reported that students’ ability differs by age, gender, and types of spatial tasks.
Meta-analytic studies on general students’ spatial ability (Linn & Petersen, 1985; Maeda & Yoon, 2013; Voyer et al., 1995) also found consistent results across studies regarding the differences by gender, age, and speed of spatial problem solving. Linn and Petersen (1985) suggested four possible explanations that hinder female participants’ performance in mental rotation: slow speed of mental rotation, inefficient use of spatial strategies, reliance on analytic strategy, and/or taking time to check answers with caution. Overall, the literature showed that even though the trend of gender differences favoring men in mental rotation ability seems evident, the causes of the observed gender differences in mental rotation ability and speed of spatial problem solving have not yet been clarified (e.g., Masters, 1998; Moè, 2009).
Gifted Students and Their Choice of Academic Major
A few studies have investigated the preferences of gifted students with respect to college majors. Kerr and Colangelo (1988) analyzed the academic plans of 76,951 high school students, organizing them into three groups based on their performance level: 80th, 95th, and 99th percentile of ACT composite scores. Among 196 choices of college majors, engineering was the top preference selected by male students: 32.0% of the 99th percentile group, 30.5% of the 95th percentile group, and 22.1% of the 80th percentile group. Female students preferred majors in health professions, biological sciences, and social sciences. Similarly, based on the 20-year follow-up data from Study of Mathematically Precocious Youth participants, Lubinski and Benbow (2006) reported that female students preferred degrees in humanities, life sciences, and social sciences, while male students sought degrees in STEM areas. Again, engineering was the top preferred degree for male participants. Just as national statistics revealed gender differences in bachelor’s degrees earned over three decades, with STEM-related degrees favoring males (Peter & Horn, 2005; Snyder & Dillow, 2010), these studies show that a similar trend of preference by gender exists among gifted students.
Purpose of the Study
Researchers in gifted education recognize the necessity of assessing spatial ability in identification procedures (Lohman, 2005; Naglieri & Ford, 2003) as a means to reduce biases arising from differences in culture, language, and socioeconomic status. While most studies in the literature have focused on the spatial ability of elementary and secondary gifted students, research is lacking on differences in spatial ability between gifted and general students, the spatial ability of postsecondary gifted students, and the relationship between spatial ability and students’ choice of college major. Therefore, this study explored the spatial ability of undergraduate students at a major Midwestern public university by investigating its association with gender, academic major, and gifted program participation. This study also examined gender and speed of spatial problem solving as factors in students’ performance on a spatial ability test. In particular, spatial ability was considered as a mediating variable of students’ choice of college major.
The following research questions guided this study:
Method
Sample
The target population for this study was undergraduate students, 18 years or older, enrolled in a Midwestern, research-focused university. The university has a culturally rich racial and ethnic representation. Between the Fall 2009 and Spring 2011 semesters, instructors for required courses in liberal arts, education, technology, and engineering as well as introductory courses in psychology and science were contacted via e-mail seeking permission to recruit study participants from their courses. A brief overview of the study was presented to students in the courses where instructor permission was granted. Recruitment efforts yielded 1,123 participants. However, data were excluded from students who had previously taken the spatial instrument used in this study and from students missing key demographic information needed for the exclusion criteria. For example, data were excluded from graduate students, from outliers later identified through descriptive statistics, and from international students, the last because of differences between gifted education programs and policies in the United States and other countries. In total, 936 students were included for data analysis.
Demographics on the participants were collected via a survey consisting of 10 questions, such as students’ current major, gender, age, and race/ethnicity. Students also self-reported when/if they were placed in K-12 gifted programs and/or if they were currently attending an honors program at the university. Recognizing that not all K-12 school systems offer gifted education programs, acceptance in university honors programs was used as a surrogate means of identification. For this study, students’ academic major was categorized into two groups, STEM majors and non-STEM majors. The STEM majors include students enrolled in majors within the Colleges of Science (which includes mathematics), Technology, and Engineering, while non-STEM majors are students enrolled in the Colleges/Schools of Agriculture, Consumer and Family Science, Education, Liberal Arts, Management, Pharmacy, Nursing, and Health Sciences, and the Undergraduate Studies program.
Among the 18 double majors, five participants who sought both degrees in STEM areas were assigned to STEM majors, and nine participants who sought both degrees in non-STEM areas were assigned to non-STEM majors. The remaining four students majored in both areas, so their data were not used in analyses related to majors. If students self-reported participating in K-12 gifted education programs or being enrolled in an honors program at the university, they were considered members of the gifted program group. If students did not report any participation in gifted programs, they belonged to the nongifted program group. Thus, the grouping variables (gender, academic major, and gifted program membership) each had two levels as categorical variables in the statistical analyses.
Measures
Spatial Ability
Students’ spatial abilities were measured by the Revised Purdue Spatial Visualization Tests: Visualization of Rotations (Revised PSVT:R), which showed good internal consistency reliability (Cronbach’s α = .84 with data from approximately 2,400 first-year engineering students), construct validity in the form of a one-dimensional factor structure (Maeda, Yoon, Kim-Kang, & Imbrie, 2013), and measurement invariance across gender (Maeda & Yoon, 2016). Originally, Guay (1976) developed the PSVT:R to measure spatial visualization ability in 3-D mental rotation of individuals aged 13 years or older. Since then, the test has been frequently used in STEM education (Maeda & Yoon, 2013; Yoon, 2011). The Revised PSVT:R has 2 practice items followed by 30 test items, so the maximum score is 30. The 30 items consist of 13 symmetrical and 17 asymmetrical figures of 3-D objects, which are drawn in a 2-D isometric format. All the figures contain shapes of cubes or cylinders with varied truncated slots (e.g., see Figure 2). In each item, respondents need to find, among five choices, the figure that shows the question figure rotated in the same way as demonstrated by a given pair of example figures; the choices are rotated in different directions and shown at different angles.

A sample item from the Revised Purdue Spatial Visualization Test: Visualization of Rotations (Revised PSVT:R).
Speed of Spatial Problem Solving
Originally, Guay (1980) imposed a 20-minute time limit on the PSVT:R with 30 items. However, in the absence of its test manual, the PSVT:R has been inconsistently administered with varied time limits in the literature (Maeda & Yoon, 2013; Yoon, 2011). Lu and Sireci (2007) argued that reliability and validity evidence of a timed test is questionable if the effect of time constraints (speededness) is not considered. If speededness occurs by imposing a time limit on a test, this can cause an overestimation of reliability evidence, changes in factor structure, and contaminated estimation of item parameters in item response theory. So far, studies using the PSVT:R rarely examined how the psychometric properties, such as reliability and validity evidence, of the PSVT:R were affected when a time limit is imposed on the test. Thus, we administered the Revised PSVT:R as a power test (no time limit) to measure an individual’s true spatial ability level on the test. Each participant’s test completion time was measured to explore the association between the accuracy and the speed of spatial problem solving on the Revised PSVT:R.
Procedures
Participants were given a paper-and-pencil version of the Revised PSVT:R. Since the Revised PSVT:R consists of 30 multiple-choice items, Scantron sheets were used to record participants’ responses on the test and facilitate accurate and fast scoring. Directions were read to the participants by the researcher at which time two example questions were reviewed. Participants were informed that they had as much time as they wanted to complete the test. They were told that the study was investigating different properties of the test, one of which was a test completion time. A digital timer was provided on a large display in classroom settings or on a computer screen for individual participants. Participants were asked to record their test completion time when they finished the test before beginning the demographic survey.
Data Analysis
During the descriptive data analysis, the demographic information was used to check sample characteristics and reviewed for outlier identification. Participants who were older than 25 years were considered outliers because they may not be representative of an undergraduate population. In addition, those who finished the Revised PSVT:R in less than 5 minutes were also excluded as outliers (n = 5). Solving all 30 spatial problems successfully is practically impossible in such a short time period, so scores obtained this quickly may not appropriately reflect the hurried participants’ spatial ability. The statistical methods for this study included independent samples t tests and analyses of variance (ANOVAs) to test mean differences between/among groups for Research Questions 1, 2, 3, and 4. To answer Research Question 5, path analyses were used to test a plausible model considering spatial ability as a mediating variable. Path analysis is a statistical technique for investigating a hypothesized model of multiple variables, each indicated by a single measure, so that direct and indirect relationships among the variables can be demonstrated. Path analysis is known as a variation of multiple regression analysis and a member of the structural equation modeling family (Kline, 2011; Stage, Carter, & Nora, 2004).
The initial path model was constructed based on the literature review to explore associations among the observed variables, considering spatial ability as a mediator, as shown in Figure 3. Here, students’ spatial ability and speed of spatial problem solving (as measured by test completion time) are continuous variables, while gender, academic major, and gifted program participation are dichotomous categorical variables. Spatial ability was considered as a mediator that transmits the effect of the other variables to academic major. For the initial path model, spatial ability and academic major were placed in a reciprocal relation (feedback loop) because spatial ability may affect a student’s choice of major, or experiences in STEM disciplines may affect their spatial ability. Direct effects of gender and the speed of spatial problem solving on spatial ability and direct effects of gender and gifted program membership on the academic major were also hypothesized. Here, due to gender differences in the speed of spatial problem solving in the literature, a direct effect of gender on the speed was also hypothesized.

A hypothetical path model to test theoretical relationships among factors that may contribute to spatial ability and choice of major.
For inferential statistics, IBM-SPSS 18.0 for Windows (SPSS, 2010) was utilized. Prior to statistical analyses, assumptions for each statistical method were checked: independent observation, normal distribution, and equal variance. For the correlation between the Revised PSVT:R scores and the other variables, Pearson r correlation coefficients were obtained. However, when dichotomous categorical variables (gender, academic major, and gifted program membership) were involved, phi correlation coefficients were obtained. All statistical results were evaluated with α = .05 and their associated effect sizes reported (Cohen, 1988).
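As a brief illustration of the correlation coefficients mentioned above, the following minimal sketch computes a phi coefficient from a 2 × 2 table and checks that it equals the Pearson r obtained on the same 0/1-coded variables. The function name and the toy data are hypothetical and are not taken from the study.

```python
import numpy as np

def phi_coefficient(x, y):
    """Phi coefficient for two 0/1-coded variables, computed from the 2x2 table."""
    x = np.asarray(x, dtype=int)
    y = np.asarray(y, dtype=int)
    n11 = int(np.sum((x == 1) & (y == 1)))
    n10 = int(np.sum((x == 1) & (y == 0)))
    n01 = int(np.sum((x == 0) & (y == 1)))
    n00 = int(np.sum((x == 0) & (y == 0)))
    # Product of the four marginal totals under the square root
    denom = np.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    return (n11 * n00 - n10 * n01) / denom

# Hypothetical toy data (NOT the study's data): 1 = male, 1 = STEM major
gender = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
stem = [1, 1, 0, 0, 0, 1, 1, 0, 1, 0]

phi = phi_coefficient(gender, stem)
pearson = np.corrcoef(gender, stem)[0, 1]  # Pearson r on the same 0/1 codes
print(f"phi = {phi:.2f}, Pearson r = {pearson:.2f}")
```

Because phi is algebraically the Pearson correlation between two dichotomous variables, the two values agree; the separate label mainly signals how the coefficient should be interpreted.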
For the path analysis model testing, Mplus 7.0 (Muthén & Muthén, 1998-2012) was chosen to employ the robust weighted least squares method, which is a recommended estimator for the analysis of categorical data (Brown, 2006; Kline, 2011). Particularly, robust weighted least squares utilizes probit regressions as a default for the dichotomous categorical outcome that is academic major in the study. Thus, the underlying latent variable of academic major was conceptualized to be a variable with a threshold that allows calculation of the probability of the academic major selected (Muthén & Muthén, 1998-2012). Since categorical variables were involved, raw data were used instead of matrix input, and test completion time was scaled in minutes to avoid an ill scaled condition (where the ratio of the largest to the smallest variance is over 10.0 [Kline, 2011]). Based on the initial path model, several path models were constructed to test feasibility among associated variables by deleting statistically nonsignificant paths after each modification of a model. When a final model with the best fit statistics was constructed, MacKinnon’s mediation test was applied, using bootstrapping to find the empirical standard errors (SEs) and asymmetric confidence intervals (CIs) for the indirect effect (Bollen & Stine, 1990; MacKinnon, Fairchild, & Fritz, 2007; Shrout & Bolger, 2002).
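The bootstrapped test of an indirect effect described above can be sketched as follows. This is a simplified illustration only: it uses ordinary least squares on a continuous outcome and synthetic data, whereas the study used Mplus with a robust weighted least squares estimator and a binary outcome. All names, sample sizes, and coefficients below are assumptions made for the example.

```python
import numpy as np

def ols_slopes(y, predictors):
    """Least-squares slopes of y on the predictors (intercept fitted, then dropped)."""
    design = np.column_stack([np.ones(len(y))] + list(predictors))
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1:]

def indirect_effect(x, m, y):
    """Product-of-coefficients estimate a*b for the path x -> m -> y."""
    a = ols_slopes(m, [x])[0]     # path from x to the mediator m
    b = ols_slopes(y, [m, x])[0]  # path from m to y, controlling for x
    return a * b

def bootstrap_ci(x, m, y, n_boot=500, level=0.95, seed=0):
    """Percentile bootstrap confidence interval (may be asymmetric) for a*b."""
    rng = np.random.default_rng(seed)
    n = len(x)
    estimates = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample cases with replacement
        estimates[i] = indirect_effect(x[idx], m[idx], y[idx])
    tail = (1.0 - level) / 2.0 * 100.0
    lower, upper = np.percentile(estimates, [tail, 100.0 - tail])
    return lower, upper

# Synthetic data (hypothetical): x is a 0/1 predictor, m a mediator, y an outcome
rng = np.random.default_rng(42)
n = 400
x = rng.integers(0, 2, size=n).astype(float)
m = 0.8 * x + rng.normal(size=n)
y = 0.5 * m + 0.2 * x + rng.normal(size=n)

point = indirect_effect(x, m, y)
lower, upper = bootstrap_ci(x, m, y)
print(f"indirect effect = {point:.3f}, 95% CI [{lower:.3f}, {upper:.3f}]")
```

An interval that excludes zero would be read as evidence of mediation. The percentile method is used here because the sampling distribution of a product of coefficients is generally not symmetric.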
Results
Descriptive Statistics
Participants’ demographic profiles are presented in Table 1. Descriptive statistics, such as means and standard deviations on the Revised PSVT:R and the test completion time, are presented by participants’ characteristics, such as age, gender, race/ethnicity, grade level, academic major, and gifted program participation. Among 374 students with gifted program membership, 40 out of 58 students who were placed in an honors program were also nominated as gifted while in K-12. The average age of the 936 participants was 20.55 years with a standard deviation of 1.49. Figure 4 shows students’ performance on the spatial test by college/school.
Undergraduate Students’ Performance by Score and Completion Time on the Revised PSVT:R.
Note. STEM = science, technology, engineering, and mathematics; PSVT:R = Revised Purdue Spatial Visualization Tests: Visualization of Rotations.
Due to unspecified responses, the numbers are inconsistent with the participant numbers in test scores.

A total of 933 undergraduate students’ mean test scores with 95% confidence intervals on the Revised Purdue Spatial Visualization Tests: Visualization of Rotations (Revised PSVT:R) by college/school.
The mean score of the 936 participants on the Revised PSVT:R was 19.05 with a standard deviation of 6.08. The students’ scores ranged from 3 to 30, and 12 students (1.28% of participants) achieved a perfect score. While the score distribution was approximately normal with a skewness index of −.22 and a kurtosis index of −.79, Kolmogorov–Smirnov and Shapiro–Wilk normality tests were statistically significant due to the large sample size (Field, 2009). The reliability coefficient of internal consistency, Cronbach’s α, was .86 on the Revised PSVT:R scores for the 936 participants.
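For readers unfamiliar with the reliability coefficient reported above, the following minimal sketch computes Cronbach’s α from a respondents-by-items score matrix. The function name and the small 0/1 response matrix are hypothetical; they are not the study’s item-level data.

```python
import numpy as np

def cronbach_alpha(item_scores):
    """Cronbach's alpha for a (respondents x items) matrix of item scores."""
    items = np.asarray(item_scores, dtype=float)
    k = items.shape[1]                               # number of items
    item_variances = items.var(axis=0, ddof=1)       # variance of each item
    total_variance = items.sum(axis=1).var(ddof=1)   # variance of total scores
    return (k / (k - 1)) * (1.0 - item_variances.sum() / total_variance)

# Hypothetical responses: 6 respondents x 4 dichotomously scored items
toy_responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
]

alpha = cronbach_alpha(toy_responses)
print(f"Cronbach's alpha = {alpha:.2f}")
```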
The test completion time reported by 928 participants ranged from 5.02 to 35.10 minutes with a mean of 14.97, a standard deviation of 4.49, and a median of 14.28 minutes. A total of 95% of participants completed the Revised PSVT:R in 23.5 minutes or less. The test completion times of 12 perfect scorers ranged between 11.45 and 25.40 minutes with a mean of 17.53 minutes and a standard deviation of 3.95 minutes. The distribution of test completion time was also approximately normal with a Skewness index of .81 and a Kurtosis index of 1.05. Again, Kolmogorov–Smirnov and Shapiro–Wilk normality tests were statistically significant as expected due to the large sample size.
Table 2 shows intercorrelations among the measures: the Revised PSVT:R scores, test completion time, gender (female vs. male), gifted program membership (general vs. gifted program), and academic major (non-STEM vs. STEM). Overall, the correlations among the observed variables ranged from −.06 to .38, and variance inflation factors as collinearity diagnostics ranged from 1.04 to 1.18, indicating no concerns of multicollinearity (Kline, 2011). The correlations between the Revised PSVT:R score and the other variables, except age, were statistically significant at the α = .05 level. Particularly, the correlations of the Revised PSVT:R score with test completion time, academic major, and gender were positive and moderate, with Pearson correlation coefficients of r = .32, .38, and .36, respectively. The phi correlation coefficient between gender and academic major was also of moderate magnitude, φ = .32.
Correlations Among the Revised Purdue Spatial Visualization Tests: Visualization of Rotations (Revised PSVT:R) Scores and the Other Demographic Information.
Note. STEM = science, technology, engineering, and mathematics. Intercorrelations among the Revised PSVT:R scores and the other variables are presented above the diagonal. When dichotomous categorical variables (3, 4, and 5, where gifted membership, STEM major, and male student are coded as 1 and their counterparts are coded as 0) were involved, Phi correlation coefficients were obtained. Missing cases were excluded pairwise.
p < .05.
Program Difference: General Versus Gifted Program Membership
A statistically significant difference, t(934) = 4.78, p < .001, Cohen’s d = 0.32, existed in spatial ability between students who had been placed in gifted programs (M = 20.20, SD = 6.17) and those in general programs (M = 18.28, SD = 5.90), with a mean difference of 1.92. Students who had been placed in gifted programs also spent more time, an average of 1.30 minutes more, to solve all the problems than those in general education programs. Because Levene’s test for equality of variance was statistically significant, the t test that does not assume equal variances was used, and this difference in completion time was statistically significant, t(725.15) = 4.25, p < .001, with a weak effect size (Cohen’s d = 0.29).
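The effect size and t statistic for the score difference can be approximately recovered from the summary statistics alone. The sketch below assumes group sizes of 374 (gifted program) and 562 (general program), which are inferred from the 374 gifted program members among the 936 participants reported in the descriptive statistics; the function names are illustrative.

```python
import math

def pooled_sd(sd1, n1, sd2, n2):
    """Pooled standard deviation of two independent groups."""
    return math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))

def cohens_d(m1, sd1, n1, m2, sd2, n2):
    """Standardized mean difference using the pooled standard deviation."""
    return (m1 - m2) / pooled_sd(sd1, n1, sd2, n2)

def student_t(m1, sd1, n1, m2, sd2, n2):
    """Equal-variance independent samples t statistic from summary statistics."""
    se = pooled_sd(sd1, n1, sd2, n2) * math.sqrt(1.0 / n1 + 1.0 / n2)
    return (m1 - m2) / se

# Reported means and SDs; group sizes are an assumption (374 and 936 - 374)
gifted = (20.20, 6.17, 374)
general = (18.28, 5.90, 562)

d = cohens_d(*gifted, *general)
t = student_t(*gifted, *general)
df = gifted[2] + general[2] - 2
print(f"Cohen's d = {d:.2f}, t({df}) = {t:.2f}")
```

Small discrepancies from the reported values are expected because the inputs are rounded to two decimals.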
Academic Major Difference: Non-STEM Versus STEM Majors
The Levene’s test for equality of variance was statistically significant because the standard deviation in STEM majors (M = 23.22, SD = 4.76) was smaller than the one in non-STEM majors (M = 17.76, SD = 5.83). Thus, the t test that does not assume equal variances was used to test the mean difference between the two different academic major groups. A statistically significant difference, t(446.27) = 14.11, p < .001, existed between the spatial ability of students in the two areas. With a mean score difference of 5.46, students who were studying in STEM areas performed better than students in non-STEM areas. The effect of the difference was large with the effect size of Cohen’s d = 0.98. However, students in STEM majors (M = 15.63, SD = 4.43) spent more time than the ones in non-STEM majors (M = 14.76, SD = 4.50) to solve the spatial problems with the mean difference of .86 minutes (51.88 seconds), which was statistically significant, t(922) = 2.50, p = .013, with a weak effect size (Cohen’s d = 0.19).
Gender Difference
Male students (M = 21.66, SD = 5.56) outperformed female students (M = 17.20, SD = 5.75) on the Revised PSVT:R with a mean difference of 4.46, t(934) = 11.84, p < .001, Cohen’s d = 0.79. On average, female students spent 0.07 minutes (4.07 seconds) more than male students, a difference in test completion time that was not statistically significant, t(926) = −0.23, p = .82, Cohen’s d = 0.02.
Relationship Among Gender, Academic Major, and Gifted Program Membership
Because the correlation matrix (see Table 2) and the t-test statistics showed statistically significant relationships among the three variables (gifted program membership, academic major, and gender), a three-way ANOVA was conducted to examine the unique effect of each variable by partialling out the effect of the other variables (see Table 3). Levene’s test of homogeneity of variance was statistically significant, F(7, 924) = 5.62, p < .001, which may be due to the large sample size (Field, 2009). The analysis revealed statistically significant main effects of gifted program participation, F(1, 924) = 16.84, p < .001, partial η² = .018; academic major, F(1, 924) = 67.83, p < .001, partial η² = .068; and gender, F(1, 924) = 67.08, p < .001, partial η² = .068. Small to moderate effect sizes (r = .13, .26, and .26) were noted for gifted program participation, academic major, and gender, respectively. However, no statistically significant two-way or three-way interaction effects existed among those three variables. In total, the model explained 22.5% of the total variation (adjusted R² = .23).
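The reported partial η² values follow from the F statistics and their degrees of freedom, and the reported r values appear consistent with taking the square root of partial η². The sketch below reproduces both steps; treating r as the square root of partial η² is an assumption about how the effect sizes were derived, and the function name is illustrative.

```python
import math

def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)

# Reported F statistics for the three main effects, each with df = (1, 924)
f_statistics = {
    "gifted program participation": 16.84,
    "academic major": 67.83,
    "gender": 67.08,
}

effect_sizes = {}
for name, f_value in f_statistics.items():
    eta_sq = partial_eta_squared(f_value, df_effect=1, df_error=924)
    r = math.sqrt(eta_sq)  # assumed conversion from partial eta squared to r
    effect_sizes[name] = (eta_sq, r)
    print(f"{name}: partial eta squared = {eta_sq:.3f}, r = {r:.2f}")
```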
Three-Way Analysis of Variance Results on Differences in Undergraduate Students’ Spatial Ability by Student Characteristics.
Note. df = degrees of freedom.
Association Among the Observed Variables
Students’ age was not considered as a variable in the models being tested because age in this sample was not significantly correlated with spatial ability, as documented in Table 2. The initial hypothetical model with the feedback loop (Figure 3) did not converge because it was not identified, so estimates of the model could not be obtained even though the model has more observations than free parameters to be estimated. Therefore, several recursive models followed. Statistically nonsignificant paths were deleted one at a time to improve the model fit indexes. For example, the direct effect of gender on speed was not statistically significant, as expected from the nonsignificant correlation between the two variables in Table 2 and the nonsignificant mean difference in the independent samples t test. Thus, this nonsignificant path was deleted in the following model. The fit of each model was evaluated with model fit indexes, such as chi-square, root mean square error of approximation (RMSEA), comparative fit index (CFI), and Tucker–Lewis index (TLI). Finally, Table 4 shows the model fit indicators and both unstandardized and standardized parameter estimates of the final path model. The fit indices were χ² = 0.663, p = .718, df = 2, CFI = 1.00, TLI = 1.00, RMSEA = 0.00 with 90% CI between 0.00 and 0.05. The fit indices satisfy all the criteria for a good fit in the literature (Brown, 2006; Kline, 2011). Note that a recent study by Kenny, Kaniskan, and McCoach (2015) cautioned about possible false indication of RMSEA fit indices because the RMSEA tends to be too large with small degrees of freedom and small sample sizes in structural equation modeling, which is not the case in this study.
Final Path Model With the Goodness-of-Fit Indices and Standardized Parameter Estimates.
Note. SE = standard errors; CI = confidence interval; CFI = comparative fit index; TLI = Tucker–Lewis index; RMSEA = root mean square error of approximation.
When WLSMV, a robust weighted least squares estimator, is used to estimate parameters for a binary outcome, SEs and p values are not available for standardized parameter estimates.
p < .05.
Figure 5 depicts the final path model diagram with the paths whose estimates were statistically significant at α = .05. Parameter estimates related to spatial ability, a continuous variable, are interpreted in the usual way, whereas estimates centered on academic major are interpreted differently because that outcome is a binary categorical variable. For example, R² was .25 for spatial ability and .02 for test completion time, which are moderate and small effect sizes, respectively; that is, the final path model explained 25.1% of the variance in spatial ability and 2.0% of the variance in test completion time. However, the pseudo R² of .34 refers to the latent variable underlying academic major, which is conceptualized as a continuous and normally distributed trait (Kline, 2011), so it is not informative about the binary outcome itself. In addition, the residual variance, which was .84, is not an estimated free parameter but a remainder computed as 1 minus the explained variance (1 - R²) (Kline, 2011).

The final path model with estimates of direct and indirect effects.
When an outcome variable is categorical, three types of information can be used to interpret the parameter estimates: (a) the signs of the estimated coefficients, (b) their statistical significance, and (c) the probabilities converted from them. Even though estimated coefficients can be converted to probabilities on the categorical variable, the sign and significance of each path are the most important information because they confirm, through testing, the putative associations among the variables (Kline, 2011). Table 5 presents the direct, indirect, and total standardized effects of gender and gifted program participation on the probability of students’ major, with spatial ability and speed in spatial problem solving as mediators. The empirically estimated SEs and 95% CIs from 500 bootstrapped samples showed that all indirect effects were statistically significant, revealing the role of spatial ability as a mediator in the model, as shown in Figure 5.
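The bootstrap logic behind such intervals can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors’ analysis: it uses ordinary least squares with a continuous outcome rather than the probit path model estimated with WLSMV, and the data, the coefficients (1.0, 0.6, 0.3), and the function names are all synthetic or hypothetical. The indirect effect is computed as the product of the path from the predictor to the mediator (a) and the path from the mediator to the outcome controlling for the predictor (b); a percentile 95% CI is then taken over 500 resamples.

```python
import random


def cov(u, v):
    """Sample covariance of two equal-length lists."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)


def indirect_effect(x, m, y):
    """Product-of-coefficients indirect effect a*b for x -> m -> y."""
    sxx, smm, sxm = cov(x, x), cov(m, m), cov(x, m)
    a = sxm / sxx  # slope of m regressed on x
    # Partial slope of m in the regression y ~ m + x (two-predictor OLS).
    b = (cov(m, y) * sxx - cov(x, y) * sxm) / (smm * sxx - sxm ** 2)
    return a * b


def bootstrap_ci(x, m, y, n_boot=500, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the indirect effect."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample cases with replacement
        estimates.append(indirect_effect([x[i] for i in idx],
                                         [m[i] for i in idx],
                                         [y[i] for i in idx]))
    estimates.sort()
    k = int(alpha / 2 * n_boot)  # number of estimates trimmed from each tail
    return estimates[k], estimates[n_boot - 1 - k]


# Synthetic data: a binary predictor, a mediator, and a continuous outcome.
data_rng = random.Random(1)
x = [i % 2 for i in range(300)]
m = [1.0 * xi + data_rng.gauss(0, 1) for xi in x]
y = [0.6 * mi + 0.3 * xi + data_rng.gauss(0, 1) for xi, mi in zip(x, m)]

point = indirect_effect(x, m, y)
lower, upper = bootstrap_ci(x, m, y)
print(f"indirect effect = {point:.3f}, 95% CI [{lower:.3f}, {upper:.3f}]")
```

An indirect effect is conventionally judged statistically significant at α = .05 when its 95% bootstrap CI excludes zero, which is the criterion applied to the intervals reported in Table 5.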
Decomposition of the Effects of Variables on the Probability of the Area of Academic Major (Non-STEM vs. STEM).
Note. Empirically estimated standard errors (SEs) and 95% confidence intervals (CIs) were calculated from 500 bootstrapped samples.
Here, the parameter estimates, like probit regression coefficients, show how gender and gifted program participation relate, directly and indirectly, to the probability of students’ academic major. The path model shows that gender had the largest total effect on the probability of majoring in a STEM area, followed by gifted program membership. The 95% CIs of the total effects indicate that the difference between the effects of gender and gifted program membership is statistically significant. Although the standardized estimates describe the change in the outcome associated with a one-unit change in a predictor, obtaining the probability of a binary outcome from probit regression coefficients is more involved because the probability must be converted from the z-score (standard normal) distribution (Muthén & Muthén, 1998-2012). According to this probability computation, gifted male students had about a 14% higher probability of majoring in a STEM area than gifted female students and about an 8% higher probability than male students who had not participated in a gifted program.
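The conversion from probit coefficients to probabilities can be sketched as follows. In a probit model, the probability of the outcome is the standard normal cumulative distribution function evaluated at the linear predictor minus a threshold. The threshold and coefficient values below are hypothetical placeholders chosen only to show the mechanics; they are not the estimates from Table 4 and do not reproduce the percentages reported above.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def probit_probability(threshold, coefs, values):
    """P(outcome = 1) = Phi(sum of coef * value - threshold)."""
    linear = sum(coefs[name] * values[name] for name in coefs)
    return normal_cdf(linear - threshold)


# Hypothetical unstandardized values, for illustration only.
threshold = 0.50
coefs = {"male": 0.60, "gifted": 0.30}

p_gifted_male = probit_probability(threshold, coefs, {"male": 1, "gifted": 1})
p_gifted_female = probit_probability(threshold, coefs, {"male": 0, "gifted": 1})
p_nongifted_male = probit_probability(threshold, coefs, {"male": 1, "gifted": 0})

print(f"gifted male:    {p_gifted_male:.3f}")
print(f"gifted female:  {p_gifted_female:.3f}")
print(f"nongifted male: {p_nongifted_male:.3f}")
```

Because the normal curve is nonlinear, the same coefficient produces different changes in probability depending on the values of the other predictors, which is why group differences are reported as converted probabilities rather than read directly from the coefficients.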
Discussion
This exploratory study investigated undergraduate students’ spatial ability and speed in spatial problem solving by gifted program membership, academic major, and gender. The three-way ANOVA yielded statistically significant main effects of gifted program participation, academic major, and gender, with no interactions among the three variables. This means that participants who met at least one of the three conditions (gifted program participation, a STEM major, or being male) performed better than their counterparts on the chosen assessment of spatial ability when the other conditions were equal. In addition, the statistically nonsignificant interactions indicate that none of the three variables functioned as a moderator of students’ performance on the spatial ability test.
Based on the literature review, we constructed a hypothesized model, tested it using path analyses, and estimated the direction and magnitude of the associations among the observed variables. The path model in Figure 5 revealed the unique relationship of each variable after controlling for the effects of the others. The model indicates that measured spatial ability functions as a mediator of the effects of gender, gifted program participation, and speed of spatial problem solving (test completion time) on the probability of students’ choice of academic major. Here, we discuss further details of the associations among the variables tested in the final path model, along with the results from the inferential statistics.
Gifted Program Membership, Spatial Ability, and STEM Majors
Students who were placed in gifted programs tended to perform better on the spatial test than those who were in general education programs, but the effect was smaller (ω = .13) than the other effects, gender (ω = .27) and academic major (ω = .27). In the final path model (Figure 5), after controlling for the effect of spatial ability, gifted program membership also had a direct effect on the probability of students’ choice of major, meaning that regardless of spatial ability, gifted students were more likely to choose STEM majors.
On the one hand, although spatial ability seems to play a minor mediating role in the pathway to STEM majors for gifted students compared with its role in the pathway from gender, the statistically significant effect suggests that traditional identification methods for gifted programs that do not include spatial ability may have missed students who are spatially gifted and may excel in STEM areas. Spatially gifted students may therefore be underserved in gifted programs. On the other hand, taking the results a step further, the statistically significant indirect effects through spatial ability on the probability of majoring in STEM areas suggest that spatially gifted students might be drawn into STEM areas regardless of whether they are identified for gifted programs.
In this exploratory study, differences in the types of gifted programs were not accounted for, although program offerings vary considerably; for example, some individuals may have participated in a pullout program that met for a few hours a week, whereas others attended a magnet school for gifted students. For students who were identified as gifted based on participation in this university’s honors program, acceptance is based on a holistic approach similar to many K-12 gifted program identification processes, drawing on GPA, test scores (usually focused on verbal and mathematical reasoning), a personal essay, and so on. As mentioned earlier, honors program participation was chosen as a pseudoidentification tool to find potentially gifted students who, for a variety of reasons, may not have had an opportunity to participate in K-12 gifted programming. Additionally, giftedness is seldom distributed equally across cognitive domains (Andersen, 2014), and programs are commonly structured to accommodate these differences (e.g., separate gifted programs for mathematics and reading), so some students will be identified for one program but not both.
Findings from this study suggest that participation in a gifted program is related to higher performance on spatial ability assessments; however, causality is not indicated. Is it the programming, the identification process, a combination of both, or some other factor that creates this difference? As noted by Webb et al. (2007), omitting spatial ability gives an incomplete picture of intellectual talent. Kell and Lubinski (2013) reported that Scholastic Assessment Test-Mathematics (SAT-M) and Scholastic Assessment Test-Verbal (SAT-V) scores together accounted for 10.5% (p < .01) of the variability in creative productivity among gifted students and that adding a measure of spatial ability accounted for an additional 7.5% (p < .01) of the variance. In her discussion of ability constellations, Andersen (2014) notes that the typical consistent constellation for students who earn degrees across all STEM disciplines “ . . . consists of a high SAT-M, a lower SAT-V and a spatial ability score between the two . . . ” (p. 119). In future studies, collecting data on the type of gifted programming in which an individual participated and the types of measures used for identification may add to our understanding of the relationships between these three cognitive abilities (mathematics, verbal, and spatial) and STEM career choices.
Gender, Spatial Ability, and STEM Majors
According to the path model, gender had a larger total effect (.42) on the probability of majoring in a STEM area than gifted program membership (.20), and the difference was statistically significant (see Table 5). Even after controlling for the effect of spatial ability, gender still had a direct effect (.27) on the probability of majoring in a STEM area. This gender disparity in the choice of academic major has been a persistent phenomenon in the literature. The book “Gender and Occupational Outcomes: Longitudinal Assessment of Individual, Social, and Cultural Influences,” edited by Watt and Eccles (2008), brought together various factors to explore this issue. Based on outcomes assessed with longitudinal data from several countries and on diverse perspectives from various disciplines, the contributors discussed mathematics as a crucial filter that changes students’ career choices.
Yet the gender gap in mathematical performance in the United States is diminishing: females perform better than males in high school mathematics and science courses and are equally likely to take advanced coursework in these disciplines (Yoon & Strobel, 2017). However, SAT-M scores underpredict women’s performance in college mathematics courses (Institute of Medicine, National Academy of Sciences, & National Academy of Engineering, 2007).
Many studies suggest that differences in spatial ability may underlie differential mathematics performance. Some spatial tasks show sex differences favoring girls, others show differences favoring boys, and disagreement exists on the relevance and predictive power of each set of tasks. (Institute of Medicine, National Academy of Sciences, & National Academy of Engineering, 2007, p. 30)
In her work to narrow the “spatial skills” gap, Sorby (2009) cites Linn and Petersen (1985), who found that males outperform females on timed tests of mental rotation because of differences in strategy choice, with males choosing a holistic approach and females choosing an analytic, stepwise strategy. Sorby (2009) developed a spatial skills training course for freshman engineering students who exhibited poor spatial skills. Her longitudinal studies documented improved grades in STEM courses and higher retention rates, especially among women, for those who completed her training compared with those who did not. While this raises the question of whether the gender difference is biological or a result of cultural and social influences, Sorby’s (2009) research suggests the latter. Uttal et al. (2013) also provided evidence that spatial skills are malleable through a meta-analysis of the effects of training on improvement in spatial ability.
Other individual factors, including motivation, values, and self-concepts as psychological variables that shape career aspirations, and environmental factors, such as gender stereotypes and social, cultural, and institutional constraints, often pose barriers to careers in STEM fields for women (see Hill, Corbett, & St. Rose, 2010, for a detailed discussion). An example of efforts to change the environment is the effort of the National Academy of Engineering to change the perception of engineering from a discipline dependent on aptitude and strong interest in mathematics and science to an inherently creative discipline that seeks to improve human welfare, a perspective that is more relevant to the career aspirations of today’s youth (National Academy of Engineering, 2013). Webb et al. (2007) evaluated student profiles using mathematics and science graduate students as their reference population and found that females with high spatial ability “ . . . were more congruent with their same-sex graduate student counterparts than were female adolescents with relatively low levels of spatial ability” and that “ . . . that spatial ability may be more relevant for identifying math science promise in girls relative to boys” (p. 411).
Speed in Spatial Problem Solving
While contradictory results exist in the literature on gender differences in mental rotation processing speed (Bryden, George, & Inch, 1990; Jansen-Osmann & Heil, 2007; Lajoie & Shore, 1986; Scali, Brownlow, & Hicks, 2000), in this study, there was no gender difference in test completion time. These results are inconsistent with the literature on gifted students. Stumpf’s (1998) work with high school juniors found that on average academically talented females take more time than academically talented males on timed block rotation tasks. Gallagher and Johnson’s (1992) study of seventh and eighth grade gifted students found statistically significant gender differences with male students performing better than female students during both timed and untimed test conditions; however, gender differences diminished under untimed conditions, with mean differences in number of items correctly answered decreasing from 7.49 items on the timed portion to 1.62 when time constraints were removed.
Students who spent more time on the spatial test tended to score higher than those who did not, and students who were in gifted programs spent more time on the spatial test than their general education peers. This is consistent with Reams, Chamrad, and Robinson’s (1990) observation that tests that value quick responses may give misleading results with gifted students because these individuals may be “planful and/or cautious, or so perfectionistic that they think twice before responding” (p. 108). Similarly, Yoon and Min (2016) found a moderate positive correlation between spatial test completion time and undergraduate students’ grades in an introductory atmospheric science course and suggested that the sincere attitude students bring to the spatial task may lead to better performance in coursework.
Regarding the association with age, the absence of a statistically significant difference in spatial test scores across age implies no developmental effect within the age range of undergraduate students. The negative correlation between age and test completion time indicates that older students tended to spend less time on spatial problem solving, although the magnitude of the correlation was small.
Future studies that examine the time each individual spends on each test item are warranted. Given the wide range of self-reported test completion times (5.02-35.10 minutes, with M = 14.97 minutes and SD = 4.49 minutes), it is reasonable to infer that not all students attempted all problems and that completion time may not be a true measure of effort. A computer-administered version that can track item exposure time or time spent on each item would add to the fidelity of the data. The decision to exclude quick completers in this study (less than 5 minutes) may pose the risk of missing an extraordinarily spatially gifted student. In future studies, “quick completion” cases should be analyzed individually. For quick completers who score high, follow-up assessments would be needed to determine the role that chance played in their scores.
Combining data collected in this manner with an assessment of problem-solving styles (e.g., planful vs. guess-and-check) may offer further insight into differences in spatial ability as measured by block rotation assessments. Other areas for future research include how gifted students differ from general education students in (a) processing spatial information, (b) speed and accuracy trade-offs by the degree of complexity of spatial tasks, and (c) preferred problem-solving style and traits that may affect test performance.
Limitations and Future Studies
We acknowledge that due to the convenience sampling, there might be a limitation in generalizing the findings of this study to other populations in different educational settings. In addition, grouping variables as well as demographic information used in this study are based on the self-report of participants, so possible response bias might affect the results of this study.
The statistically significant difference in test completion time between the gifted and nongifted groups, but not between genders, was intriguing and is worthy of further investigation. The literature suggests that spatially gifted students are underrepresented in gifted programs. In programs that use spatial assessments as one aspect of identification, does the timed nature of these assessments result in missing students with spatial strengths? Other areas for future studies noted in the discussion include obtaining better fidelity in the types of gifted education experiences of the participants, including an interest survey and an assessment of problem-solving styles, developing a more robust means of minimizing the role of chance in participant responses, and modifying the assessment format to allow evaluation of individual items.
While this study formulated a theoretical basis for the relationship of spatial ability to gender, choice of a STEM major, and giftedness, we acknowledge the possible existence of unaccounted influences on these relationships, such as motivation, mathematics ability, and IQ. Therefore, further exploration incorporating external variables that may affect both spatial ability and the other variables is necessary to understand the role of spatial ability in the identification of gifted students and their talents in STEM. In addition, because this is a correlational study, it identifies relationships among observed variables. Although presumed causal relations (i.e., some kind of effect or relationship) are inferred, this inference does not guarantee actual causality in which one variable (e.g., spatial ability) is responsible for another (e.g., STEM major) (Hayes, 2013).
Conclusion
This study explored the role of spatial ability in undergraduate students’ choices of academic major by gender, gifted program participation, and speed of spatial problem solving. Data collected in this study support the theory that spatially talented students and students who participate in gifted education programs are more likely to pursue careers in STEM. Although questions about the best way to assess spatial ability remain unanswered, this study offers some insight into the relationships between the targeted variables as well as potential ways to improve future assessments. Webb et al. (2007) noted that Terman rejected two future Nobel Laureates in physics from his seminal study of intellectual talent because of an overemphasis on verbal-based assessments in identifying participants. Adding mathematical assessments has broadened the pool, but we may still “ . . . miss more than half of the top 1% in spatial ability” (Webb et al., 2007, p. 398). Adding spatial ability assessments to gifted program identification would complete the intellectual trilogy of verbal, mathematical, and spatial abilities and may widen the pool of potential gifted STEM students (Andersen, 2014; Wai et al., 2009).
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
