Abstract
The present study examined the psychometric equivalence of administering a computer-based version of the Gifted Rating Scale (GRS) compared with the traditional paper-and-pencil GRS-School Form (GRS-S). The GRS-S is a teacher-completed rating scale used in gifted assessment. The GRS-Electronic Form provides an alternative method of administering and completing the 72-item scale, which eases scoring and interpretation of the measure. The GRS-E reduces the potential for human error by implementing automated scoring. An automated form may increase the likelihood that the measure is utilized, supporting wider screening in the schools for gifted identification. Five of the six subscales were completed by teachers in matched pairs on 185 students. Results indicate strong internal consistency across both administration forms and no significant difference in mean scores. This study contributes to the growing body of literature in support of the psychometric equivalence of paper-and-pencil report forms administered in a computer-based environment.
The use of technology in psychological assessment is rapidly developing. Research has demonstrated that computer-based assessments are convenient and efficient (Brock, Barry, Lawrence, Dey, & Rolffs, 2012; Carlbring et al., 2007; Naglieri et al., 2004). Although the merit of computer-based testing is well documented, research regarding its utilization within schools is still in its infancy (Daniel, 2012), and there is no research on its use in gifted assessment.
In order for gifted students to receive services, they must first be identified. Streamlining the identification process increases the likelihood that this population will receive much-needed services. The Gifted Rating Scale–School Form (GRS-S) is a paper-and-pencil teacher report form that assesses students’ talents across different domains. The GRS-S in the traditional paper-and-pencil form takes 10 to 15 min to complete and is intended to be one component of a multi-faceted approach to gifted identification (Pfeiffer, 2013, 2015; Pfeiffer & Blei, 2008; Pfeiffer & Jarosewich, 2003). The purpose of the present study was to investigate the psychometric equivalence of administering the GRS-S through a computer-based system.
The previously established reliability and validity of the GRS (Li et al., 2009; Pfeiffer & Jarosewich, 2007; Pfeiffer & Petscher, 2008; Pfeiffer, Petscher, & Kumtepe, 2008; Rosado, Pfeiffer, & Petscher, 2015; Ward, 2005, 2007) make it an excellent candidate for computer-based administration. The format, administration, and scoring are straightforward. Teachers’ ratings have been found to corroborate and often contribute unique information to the gifted identification process (Pfeiffer, 2012; Pfeiffer et al., 2008).
Computer-Based Assessment
Research has explored the potential for computer-based assessment of intellectual ability (Daniel, 2012), and information is available about computer-based testing, Internet distribution of assessment, and conversion of current paper-and-pencil measures to electronic versions (Brock et al., 2012; Daniel, 2013; Davidov & Depner, 2011; Naglieri et al., 2004). However, no study has yet explored the validity or utility of electronic administration of a gifted teacher report form.
Scoring computer-based tests is less expensive and more accurate than traditional paper-and-pencil administered assessments (Brock et al., 2012). Computer-based assessments are scored automatically as the data are entered. Error in scoring is no longer a threat, and time is no longer spent making calculations or conversions. Research has shown that test users feel computer-based administration is more convenient than traditional administration. This increases teachers’ motivation to be more careful in completing scales (Brock et al., 2012; Naglieri et al., 2004).
Many psychological tests and assessments are being converted from traditional paper-and-pencil forms to computer-based versions. Questionnaires in couples research (Brock et al., 2012), panic and agoraphobia research (Carlbring et al., 2007), locus of control scales (Hewson & Charlton, 2005), personality measures (Meade, Michels, & Lautenschlager, 2007), and intelligence testing (Daniel, 2012; Preckel & Thiemann, 2003) are illustrative of areas in which online versions have been tested. Best practice recommends that test authors validate each converted measure independently to assess quantitative and qualitative equivalence (Brock et al., 2012; Carlbring et al., 2007; Naglieri et al., 2004).
Overview of the Present Study
This research aimed to answer three questions. First, we explored whether the GRS remains reliable when administered on the computer. To examine this, we looked at whether internal consistencies (i.e., Cronbach’s alphas) remained acceptable on the computer-based assessment, and whether test–retest reliabilities were comparable across forms of administration. Second, we investigated whether the GRS maintains quantitative equivalence when administered via a computer. To answer this second research question, we examined the mean differences across written and electronic methodologies through a paired-samples t-test. Third, we examined whether the GRS maintains qualitative equivalence when administered on a computer. We looked at correlations among subscales administered across both written and computer formats. We hypothesized that the paper-and-pencil and computer-administered forms would demonstrate quantitative and qualitative equivalence and maintain adequate reliability.
Method
This study implemented a matched pairs design to examine the consistency of raters’ evaluation of students on the GRS across two forms of administration, paper-and-pencil and computer based. Our methodology intentionally modeled a recently published manuscript that appeared in the journal Assessment; we felt that the methodology reported by Brock et al. (2012) served as a “gold standard” for evaluating equivalence.
Participants
Participants were teachers from a university-based public K-12 charter lab school located in the southeastern United States. The school is affiliated with Florida State University and chartered to support research. The study solicited participation from all teachers of kindergarten through fifth grade. All teachers agreed to participate. The student population is ethnically diverse and representative of the state-wide public school population. As an incentive for participation, teachers were entered into a raffle with the chance to win one of two US$25 gift cards.
This study included 19 teachers (18 female, 1 male); each teacher completed both forms of the GRS on students in their classroom. A total of 246 pairs of forms (a corresponding electronic and paper copy) were distributed, and 185 matched pairs were collected, a response rate of 75.2% for students who returned parent consent and student assent forms. On one of these 185 matched pairs, the Intellectual Ability scale was not completed, resulting in 184 matched pairs for the Intellectual Ability scale and 185 matched pairs for the remaining Academic Ability, Creativity, Leadership, and Motivation scales.
Several factors contributed to a response rate below 100%: several teachers completed either the electronic form or the paper form, but not both; teacher-completed GRS record forms were missing for 25 students; and 12 electronic forms were corrupted or otherwise nonfunctional.
The age range of the sample was 6.3 to 12.3 years, evenly divided by gender (94 male students, 91 female). This age range is consistent with the GRS standardization sample. The student population at the lab school reflects the distribution of demographic characteristics found in Florida’s public school system. The ethnicity of the students at the charter school is 50.09% White, 29.25% African American, 12.14% Hispanic, 3.14% Asian, 0.18% Native American Indian, and 5.21% Multicultural. Approximately 24% of the elementary school students receive either free or reduced lunch. The demographic data collected on the GRS forms included how well and how long the teacher had known the student, in addition to the student’s age, gender, and grade. Although the GRS is designed to assist in gifted identification, it was made clear to participating teachers and parents that the study sought to include all students, regardless of perceived ability level.
Measures
Two measures were used in this study, the GRS-S and the Gifted Rating Scale–Electronic Form (GRS-E).
The GRS-S
The GRS-S is a 72-item teacher rating scale designed to assist in the identification of giftedness. The GRS is intended to be used as part of a comprehensive gifted assessment protocol and not as a “stand-alone” test. It is widely used in the schools both in the United States and internationally (Margulies & Floyd, 2004; Pfeiffer, 2013, 2015; Ward, 2007).
The GRS is an empirically validated measure for children aged 6.0 to 13.11 (Margulies & Floyd, 2004; Pfeiffer, 2015; Pfeiffer & Jarosewich, 2003, 2007; Pfeiffer et al., 2008); the scale rates ability in five gifted domains: Intellectual Ability, Academic Ability, Creativity, Artistic Talent, and Leadership. A sixth scale, Motivation, is part of the GRS but not considered a type of giftedness (Pfeiffer & Jarosewich, 2003, 2007). Each scale includes 12 items for which teachers rate a student on a 9-point Likert-type scale. Detailed information on each scale appears in the user manual (Pfeiffer & Jarosewich, 2003). Raw scores are converted to T-scores (M = 50, SD = 10) and percentile ranks, which reflect the likelihood that a child might be gifted.
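The T-score metric mentioned above (M = 50, SD = 10) can be illustrated with a minimal sketch. This assumes a simple linear transformation from a normative mean and standard deviation; the published GRS conversion relies on the norm tables in the manual, so the function name `linear_t_score` and the norm values below are hypothetical and serve only to show what the metric means.

```python
def linear_t_score(raw, norm_mean, norm_sd):
    """Rescale a raw score to the T metric (mean 50, SD 10).

    Assumption: a plain linear transformation. The actual GRS
    conversion uses norm tables from the test manual.
    """
    if norm_sd <= 0:
        raise ValueError("norm_sd must be positive")
    return 50.0 + 10.0 * (raw - norm_mean) / norm_sd


# Hypothetical norm values, for illustration only.
example_t = linear_t_score(raw=84, norm_mean=60, norm_sd=16)
print(example_t)
```

A raw score 1.5 standard deviations above the normative mean lands at T = 65 under this transformation.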
Standardization of the GRS-S was linked to the Wechsler Intelligence Scale for Children, Fourth Edition (WISC-IV); internal consistency for each scale falls within the excellent range (Cronbach’s alpha ranged from .97 to .99 for all six scales, and standard error of measurement ranged from 1.0 to 1.41). Correlations assessing test–retest reliability for each scale were .88 or above. Strong evidence supporting test content, internal structure, and external relations is also available; readers interested in more detailed information regarding reliability, validity, and standardization are directed to an article that reviews the GRS (Margulies & Floyd, 2004). Additional information regarding convergent and divergent validity, as well as the internal structure of the measure is located within the test manual (Pfeiffer & Jarosewich, 2003).
Consistent with prior experience using the GRS, we did not ask teachers to complete the 12 items on the Artistic Talent scale. The majority of traditional homeroom teachers often do not have adequate opportunity to observe evidence of (or potential for) ability in the arts and as such feel ill-equipped to rate students on this scale (Pfeiffer, 2013, 2015).
The GRS-E
The GRS-E is an interactive electronic form of the GRS-S developed for this study. The GRS-E is identical in visual presentation and item content to the GRS-S; it differs only in that it is distributed over a secure Internet connection, from a secure server, and completed electronically on a computer or other electronic device. The form was developed using a high-resolution scanned image of the GRS-S, and entry fields were applied using Adobe Acrobat X Pro.
Care was taken to ensure that the GRS-E remained easy to use, regardless of the teacher’s proficiency with computers. Response style for the GRS-E was intuitive; there was reinforcement to ensure the intended option was selected. The nine rating options for each item were made into buttons that, when selected, were filled in with a black dot. This allowed the teacher to see which option they had selected and confirm it was accurate. A demonstration of a completed item was included at the beginning of the form. In creating the GRS-E, we followed the latest guidelines for Internet test development and the new standards for educational and psychological testing (AERA, APA, NCME, 2014; Birnbaum, 2004; International Test Commission (ITC) Guidelines, 2005).
Procedure
To manage, distribute, and collect the electronic form, the school’s interactive Blackboard website was used. The site required teachers to log in using their unique username and password, which ensured security and identified the individual who completed the form.
Students were randomly selected to be rated, counterbalancing administration of the two forms. Each teacher was assigned an equal number of GRS-S and GRS-E forms. Teachers were asked to complete the forms in one sitting and return them promptly. The completion of these forms occurred during the spring of 2013.
Results
Among the matched pairs, at the item level, less than 0.1% of the data were missing. A proportional substitution method was employed to account for the 16 missing items. For each subscale and for the entire sample, normality was examined. The sample was selected from whole classrooms; we expected the sample to include students across a wide range of levels of ability. Overall, tests of normality indicated that the distribution was normal (skewness and kurtosis <1).
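The text does not spell out the proportional substitution procedure. One common reading, assumed here, is prorating: the mean of the items a teacher did answer on a subscale stands in for the missing item, so the subscale total is scaled up to the full 12 items. The function name `prorate_subscale` and the ratings are hypothetical.

```python
def prorate_subscale(ratings, n_items=12):
    """Return a prorated subscale total; None marks a missing item.

    Assumption: each missing item is replaced by the mean of the
    answered items on the same subscale (prorating).
    """
    if len(ratings) != n_items:
        raise ValueError("expected one entry per item")
    answered = [r for r in ratings if r is not None]
    if not answered:
        raise ValueError("no answered items to prorate from")
    return sum(answered) / len(answered) * n_items


# Hypothetical ratings on the 9-point scale, with item 5 left blank.
ratings = [7, 8, 6, 7, None, 8, 7, 6, 7, 8, 7, 6]
print(prorate_subscale(ratings))
```

With so few missing responses (16 items in total), the choice of substitution rule is unlikely to change subscale scores materially.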
Completion dates were requested on the forms; the completion dates for the electronic forms could be validated against submission to the online portal. Forms were distributed at two separate intervals; teachers had access to only one version of the assessment for each student at a time. Teachers were asked to complete the assessments promptly upon receipt; however, in a few instances, it was necessary to remind teachers to complete the forms, occasionally resulting in a latency between the two administrations greater than the planned 2 weeks. The minimum interval between the initial and secondary ratings was 3 days (n = 5); the maximum was 80 days (n = 2). The mean interval was 38 days (SD = 15.69).
Does the GRS-S Remain Reliable When Administered on a Computer?
Cronbach’s alpha coefficient was calculated to examine internal consistency for each of the five subscales included in the GRS-E: Intellectual Ability, Academic Ability, Creativity, Leadership, and Motivation. Each subscale obtained alphas in the excellent range (α > .98 for all five subscales). These results are very similar to the initial findings of Pfeiffer et al. (2008) where Intellectual Ability and Motivation were found to have internal consistencies of α = .99, and Academic Ability, Creativity, and Leadership achieved an alpha level of .98. This demonstrates that the measure does remain reliable when administered via a computer-based system. Alpha levels are displayed in Table 1.
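For readers less familiar with the statistic, Cronbach’s alpha can be computed from an item-by-student matrix as in the sketch below. This is a generic textbook calculation, not the authors’ analysis script; the function name `cronbach_alpha` and the toy ratings are hypothetical.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha; rows holds one list of item ratings per student."""
    n_students = len(rows)
    k = len(rows[0])
    if n_students < 2 or k < 2:
        raise ValueError("need at least two students and two items")
    # Sample variance of each item across students.
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    # Sample variance of the summed subscale score.
    total_var = variance([sum(row) for row in rows])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Toy data: 4 students rated on 3 items (a real GRS subscale has 12 items).
toy = [[7, 8, 7], [5, 5, 6], [9, 9, 8], [3, 4, 3]]
print(round(cronbach_alpha(toy), 3))
```

Alpha approaches 1 when items rise and fall together across students, which is the pattern the very high values reported here reflect.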
Table 1. Internal Consistency Statistics (Cronbach’s α).
Intraclass correlation coefficients (ICCs) were obtained across the two alternative versions of the assessment to evaluate the test–retest reliability. The ICCs for Intellectual Ability, Academic Ability, and Motivation were identical across administrations. Results are displayed along the diagonal of Table 2. GRS-S correlations appear above the diagonal; GRS-E correlations appear below it.
Table 2. Test–Retest Reliability and Correlation Matrices of Subscales.
p < .01.
A Fisher’s Z conversion was conducted to assess whether the discrepancy in ratings between administrations on the Creativity and Leadership subscales was significant. The Z-scores obtained ranged from 0 to .74, and none of the scores indicated a significant difference between administrations. This demonstrates that there was little variation between the mean scores achieved on each subscale depending on administration, suggesting strong test–retest reliability.
These findings are consistent with prior research that reports high test–retest reliability for the GRS (Pfeiffer & Jarosewich, 2007). This result is particularly encouraging, considering the wide variation in time elapsed between administrations (from 3 days to nearly 3 months). ICCs for each subscale are provided in shaded, diagonal cells within Table 2.
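The article does not state which form of the ICC was computed. As an illustration only, the sketch below assumes the two-way, single-rating, consistency form, often labelled ICC(3,1), applied to paper and electronic scores on one subscale. The function name `icc_consistency` and the scores are hypothetical.

```python
def icc_consistency(pairs):
    """ICC(3,1): two-way model, single rating, consistency definition.

    pairs: one (paper_score, electronic_score) tuple per student.
    """
    n = len(pairs)
    k = len(pairs[0])
    if n < 2 or k < 2:
        raise ValueError("need at least two students and two forms")
    grand = sum(sum(row) for row in pairs) / (n * k)
    row_means = [sum(row) / k for row in pairs]
    col_means = [sum(row[j] for row in pairs) / n for j in range(k)]
    # Partition the total sum of squares into students, forms, and error.
    ss_total = sum((x - grand) ** 2 for row in pairs for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    denominator = ms_rows + (k - 1) * ms_error
    if denominator == 0:
        raise ValueError("scores have no variance")
    return (ms_rows - ms_error) / denominator


# Hypothetical T-scores for four students on the two forms.
noisy = [(50, 53), (60, 58), (45, 47), (70, 69)]
print(round(icc_consistency(noisy), 3))
```

Under the consistency definition, a constant shift between forms does not lower the coefficient; an absolute-agreement form would penalize such a shift.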
Is the GRS-E Quantitatively Equivalent to the GRS-S?
Table 3 displays mean scores and standard deviations for each of the five subscales in both forms of administration. A paired-samples t-test was conducted to examine the difference in mean scores by administration mode. The results showed no significant differences between means at the .05 significance level. The t statistics are provided in Table 4; mean differences as a result of administration order are provided in Table 5. This finding indicates that quantitative equivalence exists between the computer-based and paper-and-pencil methods of administration for the GRS.
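The paired-samples t statistic can be sketched as follows. This is a generic illustration using only the Python standard library, not the authors’ analysis; the function name `paired_t` and the scores are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(paper, electronic):
    """Return (t, df) for a paired-samples t-test on matched scores."""
    if len(paper) != len(electronic):
        raise ValueError("scores must be matched pair by pair")
    diffs = [p - e for p, e in zip(paper, electronic)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two pairs")
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("differences have zero variance")
    # Mean difference divided by its standard error.
    return mean(diffs) / (sd / sqrt(n)), n - 1


# Hypothetical T-scores for five students on each form.
paper = [52, 61, 47, 58, 66]
electronic = [50, 63, 47, 57, 65]
t_stat, df = paired_t(paper, electronic)
print(round(t_stat, 3), df)
```

With 185 matched pairs the test has 184 degrees of freedom (183 for the 184 Intellectual Ability pairs), and a two-tailed test at the .05 level requires |t| of roughly 1.97 to reach significance.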
Table 3. Mean Scores and Standard Deviations for GRS by Method of Administration.
Note. GRS = Gifted Rating Scales.
Table 4. Difference in Mean Scores and Correlations Between Matched Pairs on the GRS.
Note. GRS = Gifted Rating Scales.
df = 183.
df = 184.
Table 5. Mean Differences as a Result of Administration Order.
p < .01.
Is There Qualitative Equivalence Between the GRS-E and GRS-S?
Correlations between the paper-and-pencil versions of subscales and the electronic versions of subscales were obtained and are displayed in Table 2. Comparisons of the correlations representing the association between a pair of subscales administered in the electronic form versus the paper-and-pencil form were examined using Fisher’s Z conversion. None of the differences were statistically significant, supporting the subscales’ qualitative equivalence. Z-scores obtained ranged from 0 to .81 across subscales.
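A comparison of two correlations via Fisher’s Z can be sketched as below. The sketch uses the familiar independent-samples formula; because both correlations here come from the same students, a test for dependent correlations may be more appropriate, and the article does not say which variant was applied. The function name `fisher_z_difference` and the input values are hypothetical.

```python
from math import atanh, sqrt


def fisher_z_difference(r1, n1, r2, n2):
    """Z statistic for the difference between two correlations.

    Assumption: the independent-samples formula, in which each
    correlation is transformed with z = atanh(r).
    """
    if n1 <= 3 or n2 <= 3:
        raise ValueError("each sample needs more than three cases")
    z1, z2 = atanh(r1), atanh(r2)
    standard_error = sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    return (z1 - z2) / standard_error


# Hypothetical inter-subscale correlations on the two forms, n = 185 each.
z = fisher_z_difference(0.80, 185, 0.78, 185)
print(round(z, 2))
```

An absolute Z below 1.96 indicates no significant difference at the two-tailed .05 level, which is the pattern the reported range of 0 to .81 reflects.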
Paired-sample correlations were obtained for each measure. Strong correlations between methods of administration were found. Academic Ability correlated most strongly between methods of administration (r = .89), followed by Intellectual Ability (r = .88), Motivation (r = .84), Leadership (r = .77), and Creativity (r = .76).
Discussion
The present study explored whether computer-based administration of a teacher rating scale designed to identify gifted students had an impact on the measure’s internal consistency and validity. A total of 185 matched administrations of this measure were completed by 19 teachers on students in kindergarten through fifth grades and of varying ethnic backgrounds and ability levels. We modeled the methodology after a study by Brock et al. (2012) on test equivalence that appears in Assessment. This study demonstrated that it is possible to administer the GRS-S using a computer-based testing environment and obtain results comparable with those obtained through a traditional paper-and-pencil administration. The internal consistency ratings achieved by the GRS-E were excellent. These findings support previous research demonstrating that the GRS has exceptionally high internal consistency (Li et al., 2009; Pfeiffer & Jarosewich, 2007; Pfeiffer & Petscher, 2008; Rosado et al., 2013).
Strong evidence was found supporting the equivalence of the measures across methods of administration. No significant difference was exhibited between the mean scores achieved on the Intellectual Ability, Academic Ability, Creativity, Leadership, or Motivation subscales. The correlation coefficients between each administration of each subscale were strong, ranging from r = .76 through r = .89.
The two subscales that had the least strong correlations with their administrations on alternate forms were Creativity (.76) and Leadership (.77). These correlations are still strong, although less robust than the others. One possible explanation is that teachers may have less opportunity to observe behaviors indicative of creativity and leadership, leading to less precision in the ratings of behaviors on these two scales. Future investigators will want to corroborate whether Creativity and Leadership remain less strong across the alternate form and, if so, examine possible explanations.
Institution of an electronic form of the GRS would make the screening more accessible to many evaluators, easier to track, and more environmentally conscious (Brock et al., 2012; Naglieri et al., 2004). Teachers would easily be able to complete the form and transmit the results to school psychologists, who would be able to view T-scores and item responses in seconds. The electronic delivery makes it easier to request that teachers complete multiple forms, supporting screening for giftedness (Pfeiffer, 2013, 2015). There is the potential for development of standardized interpretive reports, which would be made available as quickly as the online quantitative results. Reports could be distributed to school psychologists and gifted placement teams (Coyne & Bartram, 2006; Naglieri et al., 2004).
Computer-based administration and Internet distribution of a measure also offer advantages for research (Brock et al., 2012; Naglieri et al., 2004). The cost of printing and mailing disappears, and instead small costs of Internet hosting are incurred. Environmental impact is significantly reduced. Distribution of the assessment is no longer limited by region, and as such, it is far more feasible to distribute the measure to diverse ethnic and cultural populations. Local norms can easily and quickly be established, a value-added element in gifted identification (Lohman, 2006; Pfeiffer, 2013, 2015).
Areas for Improvement
One area in which this study could have improved is reducing the variation in latency between administrations of the assessments in the alternative formats. Ideally, the response window would be 2 weeks between administrations, offering enough time that teachers no longer had a clear recollection of their previous responses while remaining short enough that intervening events would not alter the teacher’s ratings of the student’s behaviors.
Future Developments
Our methodology intentionally modeled a recently published study that appeared in the journal Assessment; we felt that this methodology was a “gold standard” for evaluating equivalence (Brock et al., 2012). The present study invites several opportunities for future research and development of the GRS-E. While evidence has been established in support of the reliability and validity of an electronic form of the GRS, additional validation can be sought through confirmatory factor analysis (CFA). Following the work of other scholars (e.g., Brock et al., 2012), we opted not to explore CFA in the present article. In a future manuscript, we plan to further examine equivalence through exploration of factor structure.
Computer-based assessments are scored automatically when the data are entered by the user, which drastically reduces the risk of scoring-related errors. In the specific case of the GRS, total scores for each subscale were automatically tallied, removing the chance of an error in addition. As a result, there are 78 fewer opportunities for scoring errors, and the time required to score the GRS is eliminated. Each total score could also have been automatically converted to a T-score and percentile rank, removing the possibility of 12 additional conversion errors.
With the creation of an electronic form of the GRS, many clinical and research opportunities arise. Future studies could explore the difference in administration time, teacher willingness, and satisfaction in completing an electronic form as opposed to a paper form, the availability of choice of administration method, the development of a secure storage system for scores, summary report integration, and wider use as a screening tool. A number of countries are looking for inexpensive and reliable ways to identify their most talented youth. An electronic platform for administering and scoring a teacher gifted rating scale such as the GRS holds promise for wider access to gifted programs by high-ability students (Dixon, 2013; Pfeiffer, 2013).
Footnotes
Acknowledgements
We appreciate the support of Dr. Lynn Wicker, Director of the Florida State University Schools (FSUS), Eileen Lerner, Gifted Coordinator, FSUS, and the teachers of Florida State University Lab School. We would not have been able to conduct this research without their enthusiastic support.
Declaration of Conflicting Interests
The author(s) declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: The second author of this article is lead author of the GRS. The GRS is published by Pearson Assessment, and the second author receives royalties for sales of the test.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
