Abstract

Hills, J. R. (1983). Interpreting grade-equivalent scores. Educational Measurement: Issues and Practice, 2, 15 & 21.
Many speech-language and academic test manuals provide grade-equivalent (GE) scores in addition to standard scores, percentiles, or stanines. A number of test publishers caution against the use of GE scores but continue to include them in the data. Hills explained why GE scores are problematic and should not be used to describe students’ performance on standardized tests.
Interpreting GE Scores
If a sixth-grade student, Kelly, obtains a GE of 9.2 in reading, what can be said about this student? Kelly is scoring above average for sixth-grade readers, but this score does not indicate that she reads as well as ninth graders in the second month of school. A student can obtain a high GE score without being able to do the work of students at the grade level indicated by the score. Kelly may have gotten the 9.2 score by getting all items that were designed for Grades 4, 5, 6, and 7 correct and may not have done well on items designed for Grades 8 and 9. Furthermore, Kelly’s score of 9.2 does not mean that a group of ninth graders was tested on the ninth-grade reading material and received equivalent scores. Often GE scores are obtained by extrapolation; it is possible that no ninth-grade student was ever tested with the test administered to Kelly. Hence, one cannot conclude that Kelly would be able to participate effectively with ninth graders. The 9.2 GE reading score does not indicate whether Kelly has mastered ninth-grade skills.
Interpreting GE Differences in Subject Matter
If Kelly obtained a GE score of 7.3 in arithmetic on the same test battery, one cannot conclude that her reading is 2 years ahead of her math. The standard deviations of GE scores vary from one subject to another. Kelly’s score of 9.2 on reading and 7.3 on math could be equal scores if one used percentiles or a standard score. The difference in the two GE scores may be due to the fact that students tend to differ less within a grade on math than on reading. In addition, GE scores above a student’s grade do not mean that she has really mastered skills beyond her own grade level. Because standard deviations for different subjects differ, we cannot tell whether 9.2 in reading is relatively better than 7.3 in math.
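The point about unequal standard deviations can be made concrete with a small sketch. Every number below other than Kelly's two GE scores is hypothetical: the mean GE for her grade, the within-grade standard deviations for each subject, and the assumption that GE scores are roughly normal within a grade are illustrative choices, not values from Hills or from any test manual.

```python
from statistics import NormalDist

# All constants are hypothetical and chosen only for illustration.
GRADE_MEAN_GE = 6.0                        # assumed mean GE for Kelly's grade
SD_GE = {"reading": 2.0, "math": 0.8125}   # assumed within-grade SDs in GE units


def z_score(ge, subject):
    """Distance from the grade mean in within-grade standard deviations."""
    return (ge - GRADE_MEAN_GE) / SD_GE[subject]


def percentile(ge, subject):
    """Percentile rank, assuming GE scores are roughly normal within a grade."""
    return 100 * NormalDist().cdf(z_score(ge, subject))


kelly = {"reading": 9.2, "math": 7.3}
for subject, ge in kelly.items():
    print(f"{subject}: GE {ge}, z = {z_score(ge, subject):.2f}, "
          f"percentile = {percentile(ge, subject):.0f}")
```

Under these assumed spreads, the two GE scores sit at the same distance from the grade mean once each is expressed in its own standard deviation units, so they correspond to the same percentile even though they differ by almost 2 grade levels.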
Interpreting Pre–Post GE Differences
If Kelly received a GE score of 9.2 in the fall, but 8.0 in the spring, we cannot conclude that her reading skills declined during the year. When GE scores are extrapolated far above or below a student’s grade level, a single additional correct or incorrect response can change a student’s GE score by more than a year. Kelly may simply have gotten one or two fewer items correct in the spring.
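A minimal sketch of this effect follows. The conversion table is invented for illustration and does not come from any real test; only the 9.2 and 8.0 values echo Kelly's example. The table is built so that GE steps are small near grade level and large in the extrapolated range above it.

```python
# Hypothetical raw-score-to-GE conversion table for a sixth-grade reading test.
# Values are invented; entries well above grade 6 stand in for extrapolated GEs.
RAW_TO_GE = {
    30: 5.8, 31: 6.0, 32: 6.2, 33: 6.4, 34: 6.6,
    35: 6.9, 36: 7.3, 37: 8.0, 38: 9.2, 39: 10.8,
}


def ge_drop_for_one_missed_item(raw):
    """GE lost when a student answers one fewer item correctly."""
    return RAW_TO_GE[raw] - RAW_TO_GE[raw - 1]


near_grade = ge_drop_for_one_missed_item(32)  # student scoring near grade level
far_above = ge_drop_for_one_missed_item(38)   # 9.2 -> 8.0, as in Kelly's case

print(f"One missed item near grade level costs {near_grade:.1f} GE")
print(f"One missed item far above grade level costs {far_above:.1f} GE")
```

With a table shaped like this, the same one-item difference that barely moves a score near grade level shifts an extrapolated score by more than a year.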
Interpreting GE Scores for a Class
A GE score is based on a mean. One should not expect all students to be at or above grade level on GE scores. In a typical class, about half would obtain GE scores below grade level and half above grade level. Hence, if 30% of a fifth-grade class had GE scores below 5.1 at the beginning of the year, one cannot conclude that they need remediation or that the third- and fourth-grade teachers are doing something wrong. In fact, because only 30% of the fifth graders were below grade level, the students in this fifth grade were doing a little better than usual.
Interpreting GE Scores Over Time
Another peculiar characteristic of GE scores is that the standard deviations get larger year by year. Suppose that a person is at the 16th percentile (which is one standard deviation below the mean). This same percentile is translated into a GE that falls farther below grade level each year because the standard deviation gets larger from year to year. This gives the impression that the person is falling farther behind each year. Similarly, if a student is above the mean and stays at the same relative position, she appears to get farther ahead every year in terms of GE scores.
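The widening gap can be sketched numerically. The standard deviations below are hypothetical and serve only to show the pattern; the sketch also assumes that the mean GE at the start of each grade equals the grade number.

```python
# Hypothetical within-grade SDs in GE units, growing from grade to grade.
SD_BY_GRADE = {3: 1.0, 4: 1.2, 5: 1.5, 6: 1.8, 7: 2.1}
Z_16TH_PERCENTILE = -1.0  # one standard deviation below the mean, as in the text


def ge_at(grade, z):
    """GE for a student z SDs from the mean, assuming mean GE equals the grade."""
    return grade + z * SD_BY_GRADE[grade]


gaps = {}
for grade in sorted(SD_BY_GRADE):
    ge = ge_at(grade, Z_16TH_PERCENTILE)
    gaps[grade] = grade - ge  # how far below grade level the GE appears
    print(f"Grade {grade}: GE {ge:.1f}, {gaps[grade]:.1f} below grade level")
```

The student's percentile never changes in this sketch, yet the apparent deficit in GE units grows every year simply because the assumed standard deviation grows.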
Conclusion
Given what we know about how GE scores are determined and how they should be interpreted, I strongly recommend not including them in student evaluation reports. Such scores provide misleading information to parents and teachers. They do not provide evidence regarding students’ language/literacy strengths and weaknesses and, in fact, could mislead educators regarding the intervention materials that can or should be used with students. Percentile ranks provide more accurate information on how students compare with one another. But even percentile ranks on academic assessments do not provide evidence regarding what the students did or did not do to achieve those percentile ranks.
