Abstract
This study investigates the psychometric characteristics of Gordon’s Advanced Measures of Music Audiation (AMMA) in a region with a strong non-Western music tradition. It also examines whether audiation can be measured with modern psychometric theory. The AMMA test was administered to 513 students in the city of Ioannina and a number of villages in the region of Epirus in northwestern Greece. Nonlinear factor analysis based on tetrachoric correlation coefficients confirmed a tone and rhythm structure in AMMA, in accordance with the theory of Gordon. Cronbach’s alpha coefficients for the tone and rhythm factor scores were .70 and .61, respectively. The Kuder and Richardson (KR-20) reliability coefficient for the 30 items was .55. A Rasch measurement model fitted the data well. The analysis of the Rasch residuals showed that the dimensions of AMMA do not distort the estimation of Rasch parameters. Further analysis of the 30 AMMA items showed that they can be ordered in 10 levels of difficulty. The authors present items’ difficulty and persons’ level of audiation on the same interval scale and discuss the usefulness of music ability tests that are based on aural stimuli.
Introduction
This study examines the psychometric characteristics of the Advanced Measures of Music Audiation (AMMA) in Epirus, a mountainous region in northwestern Greece. The AMMA is a test developed by the American music educator Edwin Gordon (1985) to measure the music aptitude of college students, high school students, and junior high students. By exploring its dimensionality in a region with a strong non-Western music tradition we are essentially assessing the degree to which this test conforms to the theory of its author. Two statistical methods have been used in our study: nonlinear factor analysis, a statistical method based on the “true score” measurement theory, and Rasch analysis, a method based on the “latent trait” measurement theory.
According to Schellenberg and Weiss (2013, p. 499) music aptitude “refers to natural music abilities or the innate potential to succeed as a musician”. In Gordon’s theory, a measure of music aptitude is the extent to which a person retains the elements of another psychological construct that Gordon has called “audiation” (Cutietta, 1991; Gordon, 1999; Walters, 2010). With this neologism the American educator refers to the “non-aural sensory perception and processing of musical experiences” (Gordon, 1988, p. 34). Audiation is therefore a sense. It is not a skill like music imagery, for example (see Bailes, 2002), or Zoltán Kodály’s inner hearing (see Houlahan & Tacka, 2015). Audiation consists of rhythm and tonal elements, which are the equivalent of the phonemes and their structure in spoken language. “Sound becomes music”, writes Gordon, “when, as with language, we translate the sounds in our mind to give them context” (Gordon, 1999, p. 423).
If, however, music aptitude is based on a syntactic structure, as Gordon proposes, an interesting question is how validly a test of music aptitude based on aural stimuli functions in a different cultural setting. The question arises because most of the items of AMMA are nearer to a Western musical style (Runfola & Swanwick, 2002) and test takers’ familiarity with Western rhythms and tones can affect their answers, according to Cutietta (1991). The first research question of the current study is therefore whether AMMA conforms to the theory of Gordon in a region with a strong non-Western musical tradition. If the theoretical elements of AMMA can be confirmed in such a region, they can most probably be confirmed in other parts of the world as well. The second research question of the current study is whether audiation can be measured on an interval scale. Audiation is originally measured on an ordinal scale. This scale allows the test takers to be sorted according to their achievement, but it does not express the relative degree of difference between them. If we were able to show that audiation can be measured on an interval scale, a much larger number of research questions could be examined. For example, “how does the distribution of audiation compare with the distribution of items’ difficulty?” And “are the items of AMMA equally useful in measuring audiation in different parts of the world?”
The region in which the current study has been carried out is Epirus in northwestern Greece. The history and the main elements of the music of Epirus have been described in Broughton and Ellingham’s (1999) book on world music and also by Samson (2013). Kaliakatsos-Papakostas, Zacharakis, Tsougras, and Cambouropoulos (2014) have presented a computational analysis of the music of this region. The influence of the traditional music of Epirus on the modern Greek discography can be seen in the songs of a Greek rock band with the name Socrates (Spathas, Tourkogiorgis, & Trantalides, 1976, track 8), the recordings of the Greek jazz band Mode Plagal (Rellos et al., 1998, track 7), and the songs of contemporary Greek artful musicians such as Thanasis Papakonstantinou (1995, track 1). The ethnographical article of Petrusich (2014) in the New York Times and the doctoral thesis of Messoloras (2008) support the view that the traditional music of Epirus is alive in the region even today. Messoloras notes that only lately has “the West” crept into local traditional musicians’ singing and playing, arguing that “Western harmonization has begun to appear in their songs and instrumental accompaniments” (2008, p. 11).
Literature review
Music aptitude tests based on aural stimuli
Music aptitude is an elusive construct. There is no agreement among music educators and psychologists as to what constitutes it or how to measure it. In their chapter on the relation between music and other cognitive abilities Schellenberg and Weiss (2013) write that although the debate about the existence of music talent or aptitude is beyond their scope, for the purposes of their chapter they assume that “music aptitude exists, it varies among individuals, and it is something that tests of music aptitude measure” (2013, pp. 499–500). In her review of the latest related theories, Hallam (2010) discusses the impact of practice and creativity in the development of music aptitude. She also emphasizes the effect of sight-reading and other motor skills that are basic for music performance.
Rather than broadening our discussion to the existence and the nature of music aptitude we shall focus on the tests that have been used to measure it. Such tests, like the ones created by Carl Seashore (1919) in America and Herbert Wing (1941, 1948) in Europe, are the precursors of the AMMA test that we examine in our paper. These 20th-century tests are based on participants’ reactions to aural stimuli and have been influenced by the 19th-century quantitative studies of mental testing (Humphreys, 1998).
In the early years of the 20th century, Carl Seashore, an American psychologist of Swedish origin, created the first test of music aptitude. He also invented the “tonoscope”, a mechanical instrument that produced the necessary aural stimuli through a set of tuning forks (Cary, 1923). Comeau (2009) describes how in Seashore’s (1919) original trials the participants listened to high and low pitches, strong or weak sounds, long and short sounds and also sounds of different quality. A rhythm dimension, added later, consisted of 30 pairs of rhythmic patterns. The meters in these items were 2/4, 3/4, and 4/4.
A few decades later, British psychologist Herbert Wing (1941, 1948) formed his own test of “musical ability and appreciation”, as he initially called it. Wing’s test consisted of pairs of music phrases played by string instruments. The test, later renamed “test of musical aptitude” (Ginsborg, 2014), was available by the 1960s on 10 long-play records (Apel, 1972). With the help of factor analysis, Wing (1941) theorized that a general factor of musical ability explained the variance in all the subsets of his test. The dimensions of musical ability in Wing’s test were “chord analysis”, “paired chords” and “paired melodies” (Ginsborg, 2014).
Gordon followed the tradition of the two aforementioned researchers. By 1965 he had created his own sounds for his germinal work, the Musical Aptitude Profile (MAP) (Gordon, 1965). The original MAP test was divided into three subsets: “tonal imagery”, “rhythm imagery” and “musical sensitivity”. Comeau (2009) describes how tonal imagery was further divided by Gordon into a “melody” and a “harmony” division. Scores in the Musical Aptitude Profile were reported either for the entire test or for each division and subdivision (Walters, 2010). Gordon’s subsequent tests, namely the Primary Measures of Music Audiation (1979), the Intermediate Measures of Music Audiation (1982), and the Advanced Measures of Music Audiation (1985), have been based on his original Musical Aptitude Profile test.
Scoring the Advanced Measures of Music Audiation
In the case of the Advanced Measures of Music Audiation, 30 pairs of short music phrases are played to participants. These pairs, henceforth “items”, comprise an initial music statement and a music answer (see the Appendix, available in Supplementary Materials online, for a transcription of the music phrases). Some of the music answers differ in their duration, meter or tempo. Some others differ in pitch, mode or the tonal center of the melodic line. After hearing each item (pair), the participants are asked to note if there is a tonal or rhythm difference between the phrases. The test yields a “tonal score”, a “rhythm score” and a “composite score”. The tonal score is calculated by giving one point for every correct answer on the 10 items that contain a tonal difference. One point is then subtracted for each wrongly identified tonal difference. A constant of 20 is added to that score so that negative numbers are avoided. The same algorithm is used for the rhythm score. The tonal score and the rhythm score can thus be a number between 0 and 40. Their sum, the so-called “composite score”, can be any number between 0 and 80.
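The arithmetic of this scoring rule can be sketched as follows. This is only an illustration under stated assumptions: the function names and the example counts are hypothetical, and the decision about which responses count as correct or wrong is left to the test manual and is not modeled here.

```python
def adjusted_score(n_correct: int, n_wrong: int, constant: int = 20) -> int:
    """Correct answers minus wrong answers, shifted by a constant
    so that the result cannot become negative."""
    return n_correct - n_wrong + constant


def composite_score(tonal: int, rhythm: int) -> int:
    """The composite score is the sum of the tonal and rhythm scores."""
    return tonal + rhythm


# Hypothetical counts for one participant (not data from the study).
tonal = adjusted_score(n_correct=14, n_wrong=3)   # 14 - 3 + 20 = 31
rhythm = adjusted_score(n_correct=11, n_wrong=5)  # 11 - 5 + 20 = 26
composite = composite_score(tonal, rhythm)        # 31 + 26 = 57
```

The added constant only shifts the scale; it does not change the ordering of participants.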
Both the composite and the partial scores of AMMA are reported either in their raw form or as percentiles that show the relative standing of an individual who has taken the test with regard to the population. The tonal and rhythm scores can be reported separately (see Harrison, 1987a, 1987b), an indication that in the theory of Gordon the tonal and rhythm elements of audiation are supposed to be uncorrelated. These percentiles have proved useful in the past for comparing the “norms” of music aptitude between persons with different exposure to the Western musical tradition (see Özeke & Humphreys, 2007; Stamou, Schmidt, & Humphreys, 2010).
The validity of music aptitude tests
The assessment of the validity of music aptitude tests is an issue that has been reviewed by Schellenberg and Weiss (2013) in the third edition of the Psychology of Music edited by Diana Deutsch. The authors cite a number of studies in which the validity of the tests is examined by assessing the associations between music aptitude scores and teachers’ or parents’ subjective ratings of the participants’ “musicality”, i.e. the ability “to enjoy music aesthetically”, according to Revesz’ definition of the term (1953, cited in Hallam, 2010, p. 309). Schellenberg and Weiss (2013) have reported positive but only small to moderate correlations between musicality and music aptitude test scores. The authors explain that the small sizes of the coefficients are mainly due to the nature of the items used. They stress that “aptitude tests present participants with short auditory sequences that are a far cry from actual musical pieces” (2013, p. 501).
Another method to assess the construct validity of an instrument is through exploring its dimensions (Gessaroli & de Champlain, 2005). In one of his own papers, Gordon (1986) himself used factor analysis to explore the validity of all of his music aptitude tests that were available at that time. Specifically, he calculated Pearson’s correlation coefficients for the final scores of every possible pair of the subsets of his Musical Aptitude Profile test, the Primary Measures of Music Audiation (PMMA), and the Intermediate Measures of Music Audiation (IMMA) (Gordon, 1986). The arithmetic means of the raw scores of each individual on each subset of the aforementioned tests were on interval scales and thus appropriate to be factor analyzed. After extracting the common factors with the Principal Factor algorithm and rotating the axes a couple of times with the Varimax algorithm (see Saunders, 1962, cited in Mulaik, 2009, p. 311), Gordon (1986) was able to confirm his theory of the so-called “stabilized” and “developmental” audiation. He concluded that the factor that loaded more strongly on the items of MAP couldn’t be other than the stabilized music aptitude. Similarly, the factor that loaded more strongly on the subsets of the PMMA and the IMMA couldn’t be other than the developmental music aptitude. In the same paper Gordon (1986) attributed the marginal loadings on the rhythm subset of IMMA to items’ difficulty. He explained that the rhythm items in IMMA differed in meter whereas rhythm items in PMMA did not.
Method
During the first months of the 2009–2010 academic year, we administered AMMA to the students of the gymnasia (junior high schools) and the lycea (higher secondary schools) of Ioannina and the nearby villages (see Table 1). The city of Ioannina is located 450 km northwest of Athens (39.66°N, 20.85°E), at the heart of the Epirus region. In each school that we visited the students were examined for 45 minutes in conditions very similar to their normal final examinations. The 30 pairs of music phrases of the original AMMA compact disc were reproduced in the classrooms through a compact disc player. Ethical issues, such as participants’ informed consent and their right to withdraw, were safeguarded.
Table 1. The schools and the number of participants.
Note. Gymnasia are junior high schools. Lycea are higher secondary schools.
In order to examine the validity of AMMA in the region in which our study took place, we factor analyzed the raw data. A factor analytic model is similar to a multiple linear regression model with the difference that in the former the researchers make decisions concerning the number of the common factors and their loadings. In line with the theoretical framework of Gordon, we sought two factors. With only partial theoretical knowledge as to what the real factor pattern in our study was, a semi-specified factor solution (see Browne, 1972) was preferred to a pure confirmatory factor solution. We thus specified an orthogonal rotation matrix so that each of the 20 AMMA items that measure either the tone or the rhythm element has a near-zero loading on one factor, while its loading on the other factor is allowed to vary freely. Factor analysis answers our first research question. If AMMA conforms to the theory of Gordon in a region with a strong non-Western musical tradition, a “Rhythm” and a “Tone” factor should load on the corresponding rhythm and tone items of this test.
For this initial analysis we used the Factor statistical software (see Lorenzo-Seva & Ferrando, 2006, 2013), which offers the option of Parallel Analysis, a method for factor extraction that Thompson (2004) describes as “among the best methods for deciding how many factors to extract or retain” (p. 34). Since participants’ answers on the items of AMMA can be either correct or incorrect, we used a tetrachoric correlation matrix for our factor analysis (see Bonett & Price, 2005) and not the typical Pearson’s correlation matrix. The tetrachoric correlation is the cosine of the quantity
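One well-known cosine-based form is the classical “cos-pi” approximation, r ≈ cos(π / (1 + √(ad/bc))), where a and d are the concordant cells and b and c the discordant cells of the 2×2 table of two dichotomous items. The sketch below implements only this classical approximation. It is offered as an illustrative assumption and is not necessarily the expression from Bonett and Price (2005) or the estimator used by the Factor software. The function name and the example frequencies are hypothetical.

```python
import math


def tetrachoric_cos_pi(a: int, b: int, c: int, d: int) -> float:
    """Classical cos-pi approximation to the tetrachoric correlation.

    a, d: counts of concordant answers (both correct, both incorrect)
    b, c: counts of discordant answers
    """
    if b * c == 0:
        # No discordant pairs: the approximation tends to +1,
        # unless the concordant product is also zero (undefined).
        return 1.0 if a * d > 0 else float("nan")
    odds_ratio = (a * d) / (b * c)
    return math.cos(math.pi / (1.0 + math.sqrt(odds_ratio)))


# Hypothetical 2x2 table for two items (not data from the study).
r_example = tetrachoric_cos_pi(a=40, b=10, c=10, d=40)
```

When the two items are independent (ad = bc) the argument of the cosine is π/2 and the approximation returns zero, as expected.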
After factor analysis we conducted Rasch analysis with the same data in order to examine how the distribution of persons’ degree of audiation compares with the distribution of items’ difficulty. This analysis thus answered our second research question: “Are the items of AMMA equally useful for measuring audiation?” Rasch analysis follows the same linear logic as factor analysis with the difference that in the former the latent entities are not “tone” and “rhythm” but participants’ ability and the degree of items’ difficulty. It has been shown that when nonlinear factor analysis is employed, different parameterizations of its parameters can be transformed into Rasch model parameters (Kamata & Bauer, 2008).
An important characteristic of any test is its reliability, i.e. the consistency and accuracy of its measurement. In a study similar to ours, Stamou et al. (2010) measured the reliability of Gordon’s Primary Measures of Music Audiation (PMMA) and compared the distributions of total and partial scores of PMMA for different subgroups of Greek students. Following the example of Gordon (1986), Stamou et al. (2010) did not use the typical Cronbach’s alpha reliability coefficient (see Trobia, 2008) but the “split-half” method (see Ziniel, 2008). With this method Stamou et al. (2010) reported coefficients from r = .44 to r = .88. The meaning of these coefficients will be discussed later.
Factor analysis and Rasch measurement differ in the assessment of dimensionality, i.e. the number of dimensions of a construct. A special form of dimensionality is “unidimensionality”, i.e. the condition in which a theoretical concept can be represented by a single number. Under the assumption of unidimensionality, Rasch models are thought to be “superior” to the factor analytic ones (Wright, 1977) because the distribution of persons’ abilities in a Rasch model does not depend on the distribution of items’ difficulties (Engelhard, 2013). In Rasch models, persons’ ability and items’ difficulty can be presented on the same interval scale (Iramaneerat, Smith, & Smith, 2008). This means that under the condition of unidimensionality, the distribution of participants’ ability in our study can be plotted against the different items of AMMA. In factor analysis the investigation of the common dimensions is part of the analysis itself. In Rasch analysis the assessment of dimensionality is more complicated.
Results
Findings from the nonlinear factor analysis
By examining the coefficients in Table 2 we can conclude that on the basis of the nonlinear factor analysis, the Advanced Measures of Music Audiation conform to the theory of Gordon. Specifically, it was found that two uncorrelated factors load on the rhythm and tone items of AMMA. A “Rhythm” factor loads mainly on original rhythm items like 3, 8, 15, and 20. A “Tone” factor loads mainly on original tone items like 1, 5, 12, and 23. Some items, like item number 4, do not fit well in this solution. However, we can’t exclude these items from the analysis because by doing so we would change Gordon’s test. Before the main analysis we checked the diagnostics. Bartlett’s test of sphericity (Dziuban & Shirkey, 1974) implied that the initial correlation matrix was acceptable for analysis, χ²(190) = 637.4, p < .001. The Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy (Kaiser, 1970) was 0.604, over the 0.50 threshold (Hair, Anderson, Tatham, & Black, 1995). The scree test suggested the extraction of two factors with eigenvalues 2.50 and 1.05. The Goodness of Fit index (GFI) of our model was 0.91, slightly over the minimum acceptable level of 0.90 (see Shevlin & Miles, 1998).
Table 2. The two factor solution.
Cronbach’s alpha reliability coefficients for the “Tone” and “Rhythm” factors in our study were α = .70 and α = .61, respectively, indicating that a large percentage of the variability in our measurements can be attributed to the variability of audiation and not to errors in measurement. Kuder and Richardson’s (1937) formula 20 (KR-20) reliability coefficient for the 30 items of the test was .55. For the 10 tone items the KR-20 was .31 and for the 10 rhythm items it was .33. The values of the KR-20 coefficients are influenced by the measurement error, the small number of tone and rhythm items, and the scale of measurement. What matters in this analysis, however, is not the internal consistency of the original test items but the number of the test’s dimensions. We want our test to be unidimensional in order to use the modern measurement theory. A high reliability coefficient does not necessarily imply unidimensionality and lack of unidimensionality does not necessarily imply a low reliability coefficient (Cortina, 1993; Green, Lissitz, & Mulaik, 1977; Sijtsma, 2009). The issue of unidimensionality will be discussed later.
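For readers unfamiliar with the KR-20 coefficient, a minimal sketch of its computation for dichotomous (0/1) items is given below. It uses the standard formula KR-20 = (k / (k − 1)) · (1 − Σ p_j q_j / σ²), where k is the number of items, p_j the proportion of correct answers on item j, q_j = 1 − p_j, and σ² the variance of the total scores. The use of the population variance, the function name, and the tiny response matrix are assumptions made for illustration; they are not the study’s data or software.

```python
from statistics import pvariance


def kr20(responses: list[list[int]]) -> float:
    """KR-20 for a persons-by-items matrix of 0/1 scores.

    Assumes at least two items and non-zero variance of total scores.
    """
    n_persons = len(responses)
    n_items = len(responses[0])
    totals = [sum(person) for person in responses]
    total_variance = pvariance(totals)  # population variance (assumption)
    pq_sum = 0.0
    for j in range(n_items):
        p = sum(person[j] for person in responses) / n_persons
        pq_sum += p * (1.0 - p)
    return (n_items / (n_items - 1)) * (1.0 - pq_sum / total_variance)


# Hypothetical answers of four persons to three items.
example = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
example_kr20 = kr20(example)
```

Because the coefficient grows with the number of items, short subscales such as a 10-item subset tend to yield lower values than the full test, other things being equal.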
Fitting the Rasch model
For the needs of the Rasch analysis we used the Winsteps software (Linacre, 2006). The results are presented in Table 3. In the second column of Table 3 we report the number of correct answers for each of the 30 items of AMMA. On average, each item was answered correctly by 276.3 participants (SE 0.16), ranging from only 122 correct answers for item 24 (most difficult) to 462 correct answers for item 10 (least difficult). The main idea in this Rasch solution is that the probability (pij) of person i answering item j correctly is determined by his or her ability θi (theta) and the item’s difficulty δj (delta). Statisticians use the logit transformation of that probability, and the Rasch model is expressed as
\[ \ln\left(\frac{p_{ij}}{1 - p_{ij}}\right) = \theta_i - \delta_j \]
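In the dichotomous Rasch model the log-odds of a correct answer equal the difference between person ability and item difficulty, θi − δj. Inverting this relation gives the probability of a correct answer as a logistic function of that difference. The short sketch below illustrates the relation; the function names and the numerical values are hypothetical and are not estimates from the study.

```python
import math


def rasch_probability(theta: float, delta: float) -> float:
    """Probability of a correct answer for ability theta and difficulty delta
    (both in logits) under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - delta)))


def logit(p: float) -> float:
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


# A person whose ability equals the item's difficulty has a 50% chance.
p_equal = rasch_probability(theta=0.0, delta=0.0)

# Hypothetical person (1.2 logits) and easy item (-0.3 logits).
p_easy = rasch_probability(theta=1.2, delta=-0.3)
recovered_difference = logit(p_easy)  # approximately 1.5 = theta - delta
```

Only the difference θi − δj matters, which is why persons and items can be placed on one common logit scale.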
Table 3. The Rasch model specification table.
In Table 3 we also present evidence for the fit of the data to the logic of Rasch measurement. Measurements with large squared deviations from their predicted values indicate inconsistent responses. The fit of the Rasch model is assessed by two measures for both items and persons: the “outfit” statistics, which give emphasis to unexpected responses far from an item’s or a person’s measure, and the “infit” statistics, which place more emphasis on unexpected responses near a person’s or item’s measure (Wright, 1984). When these statistics are expressed as mean squares, their expected value is 1.0. Mean square values substantially less than 1.0 (overfit) indicate determinacy or dependency in the data. Values substantially greater than 1.0 (underfit) indicate inconsistent responses or “noise” (Linacre, 2002). Infit and outfit statistics are reported in Table 3 as “mean square statistics” (MNSQ) and “standardized statistics” (ZSTD) (see Wright & Panchapakesan, 1969). In our analysis the infit and outfit mean squares were within the 0.75 to 1.2 interval, suggested for samples between 500 and 1,000 persons by R. Smith, Schumacker, and Bush (1995, cited in Iramaneerat et al., 2008, p. 1940).
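A minimal sketch of how the two mean-square statistics can be computed for one item is given below, assuming the usual definitions: the outfit mean square is the average of the squared standardized residuals, and the infit mean square is the sum of squared residuals divided by the sum of the model variances p(1 − p). The function name and the example values are hypothetical; the figures reported in Table 3 come from Winsteps, not from this code.

```python
def item_fit(responses: list[int], probabilities: list[float]) -> tuple[float, float]:
    """Return (infit, outfit) mean squares for one item.

    responses: observed 0/1 answers of each person to the item
    probabilities: model-expected probabilities of a correct answer
    """
    squared_residuals = [(x - p) ** 2 for x, p in zip(responses, probabilities)]
    variances = [p * (1.0 - p) for p in probabilities]
    # Outfit: unweighted mean of squared standardized residuals,
    # sensitive to surprising answers far from the item's measure.
    outfit = sum(r / v for r, v in zip(squared_residuals, variances)) / len(responses)
    # Infit: information-weighted, dominated by persons near the item's measure.
    infit = sum(squared_residuals) / sum(variances)
    return infit, outfit


# Hypothetical item answered by four persons in a very predictable pattern.
infit_example, outfit_example = item_fit(
    responses=[1, 1, 0, 0],
    probabilities=[0.9, 0.6, 0.4, 0.1],
)
```

In this hypothetical example both values fall below 1.0, which corresponds to the “overfit” case described above, i.e. answers that are more predictable than the model expects.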
The quality of AMMA items
A characteristic of Rasch measurement is that participants’ level of ability and items’ level of difficulty can be presented on the same interval (logarithmic) scale. This is shown in Figure 1, on the left hand side of which we can see persons’ ability ranging from -2.6 logits to +2.8 logits. Each hash mark (#) represents 5 participants and each dot (.) represents 1 to 4 participants. On the right hand side of Figure 1 we can see the items in terms of their difficulty. Items that are frequently answered correctly are on the bottom of Figure 1. As the difficulty increases, the correct answers are increasingly rare.

Figure 1. A Wright map. Persons’ ability and items’ difficulty on the same interval scale.
By examining the variable map in Figure 1 we can see that item number 10 has been answered correctly by essentially all the participants in the sample and therefore it provides little information to the measurement. From a musical perspective the 10th item contains a difference in its rhythmic structure. The meter in the first phrase of item 10 is an asymmetrical 5/8, whereas in the second phrase it is a simple 2/4. This difference can be easily identified by the participants. The next easiest item is item number 20. The melody in the 20th item is in Lydian mode, a very familiar scale in Greece. Were items 10 and 20 replaced by other more difficult items, the gaps between them would be smaller. In contrast, the most difficult items in the battery were items 22 and 24. In item number 22 there is no difference between the two music phrases. The melody, however, is long and modal and has a very narrow range, and as a result many participants answered incorrectly. In item number 24 there is also no difference between the two phrases. However, the fact that the first six notes (in a succession of 12) lack a tonal center makes item number 24 very difficult to answer correctly.
A measure of the quality of AMMA’s items is how well they differentiate between participants. For this purpose we have calculated the separation index G. The G index is a statistic based on the KR-20 reliability coefficient but with the advantage that it is on a ratio scale and can range from 0 to infinity (Iramaneerat et al., 2008). E. Smith (2001) suggests a minimum of two levels of item difficulty in order for the items to be representative of the assessed content. We also calculated the number of strata of items and persons. The number of strata is a refinement of the previously mentioned separation index G with the difference that in this case the very high and very low measures are considered to be additional levels of performance (Linacre, 2013).
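The relation between a reliability coefficient R and the separation index G can be sketched with the commonly cited formulas G = √(R / (1 − R)) and, conversely, R = G² / (1 + G²). A commonly cited formula for the number of strata is H = (4G + 1) / 3. The sketch below applies these textbook formulas as an assumption; the values printed by the Rasch software used in the study may be computed or rounded differently, and the function names are hypothetical.

```python
import math


def separation_from_reliability(reliability: float) -> float:
    """Separation index G for a reliability strictly between 0 and 1."""
    return math.sqrt(reliability / (1.0 - reliability))


def reliability_from_separation(g: float) -> float:
    """Reliability implied by a separation index G."""
    return g * g / (1.0 + g * g)


def strata(g: float) -> float:
    """Number of statistically distinct levels, H = (4G + 1) / 3."""
    return (4.0 * g + 1.0) / 3.0


# A reliability of .80 corresponds to a separation of about 2,
# which in turn corresponds to about 3 strata.
g_example = separation_from_reliability(0.80)
h_example = strata(2.0)

# An item separation index of 8.31 implies a reliability that rounds to .99.
implied_reliability = reliability_from_separation(8.31)
```

Unlike a reliability coefficient, which is bounded by 1, the separation index keeps growing as measurement precision improves, which makes differences between highly reliable instruments easier to see.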
The high item reliability index of .99 for items, the corresponding item separation index G of 8.31 and the number of items’ strata
The unidimensionality hypothesis and the Rasch model
The simple additive logic and the desired characteristics of the Rasch measurement that we discussed in the previous section come with the assumption of unidimensionality, i.e. that the test measures only one theoretical construct. However, as we saw in the factor analysis of the raw scores, the Advanced Measures of Music Audiation measure two dimensions: rhythm and tone. Therefore we need to make sure that this structure does not pose a threat to the validity of the Rasch model. The methods for assessing unidimensionality in a Rasch model can be complicated. For a start, the fit statistics that we presented in Table 3 are unlikely to detect departures from unidimensionality for reasons that Christensen, Engelhard, and Salzberger (2012) explain. To deal with this problem Wright (cited in Linacre, 2015) advises researchers to conduct a Principal Component Analysis of the Rasch model residuals, then split the test into two halves by assigning to one half the items at the top and to the other half the items at the bottom of the first component, then measure the persons’ scores on these two halves and cross-plot these measures. We have proceeded with this kind of analysis.
The total variance (100%) in our Rasch model is presented in Figure 2. The “T”, the “total” variance, equals 36.6 eigenvalue units. The variance explained by our Rasch model (M) is only 6.6 units (18.1% of the total). From that explained variance, 1.6 units (4.3%) are explained by persons’ abilities (P) and 5 units (13.8%) are explained by item difficulties (I). The unexplained variance of the model (U) is 30 units (81.9%). What can this amount of unexplained variance mean? McGill (2009) notes the lack of consensus among statisticians for an acceptable level of unexplained variance in Rasch modeling, i.e. a level that would definitely imply unidimensionality. McGill (2009) lists different figures of acceptable explained variance, such as 50% (Carmines & Zeller, 1979), 40% (Linacre, 2008), and even as low as 20% (Reckase, 1979).

Figure 2. Standardized Rasch residuals and variance scree plot of AMMA items.
The explained variance in our model may be less than 20%, but what would constitute a threat to its unidimensionality would be a systematic structure in its unexplained part. This would be an indicator that another component, other than the Rasch measure, is affecting participants’ answers. We therefore examined the variance in the residuals. The numbers on the right hand side of Figure 2 show a scree test of the standardized residuals in the unexplained variance. In eigenvalue units the variance of the first contrast is 2.3, which is the strength of approximately two items. The second contrast has an eigenvalue of 1.8, which is less than the strength of two items. Since we need at least two items for a contrast to be considered a dimension, we will examine the structure of Rasch residuals for the first contrast.
In Figure 3 we present the unstandardized loadings of specific AMMA items on the first contrast. Rhythm items, like 23, 27, 7, 19, 1, 24, 2, and 25, have the highest positive loadings on the first contrast. Tone items, like 10, 21, 26, 17, 3, 11, 15, and 5, have the highest negative loadings on the first contrast. After conducting separate Rasch analyses for the two clusters of items, we calculated the correlation between them. The Pearson coefficient was positive but small (r = .25, p < .001). The percentage of common variance between the two clusters was around 6% (r² = .0625).
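The logic of this check can be sketched as follows: items are split into two clusters according to the sign of their loading on the first contrast, each person is scored on each cluster, and the two sets of scores are correlated. The sketch is a simplification under stated assumptions: it uses raw sum scores instead of separately estimated Rasch measures, and the function names, loadings, and responses are hypothetical.

```python
import math
from statistics import mean


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson product-moment correlation of two equally long lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def cluster_scores(responses: list[list[int]], loadings: list[float]):
    """Sum scores per person on positively and non-positively loading items."""
    positive = [j for j, load in enumerate(loadings) if load > 0]
    negative = [j for j, load in enumerate(loadings) if load <= 0]
    pos_scores = [sum(person[j] for j in positive) for person in responses]
    neg_scores = [sum(person[j] for j in negative) for person in responses]
    return pos_scores, neg_scores


# Hypothetical loadings of four items and answers of five persons.
loadings = [0.4, 0.3, -0.2, -0.5]
responses = [
    [1, 1, 1, 1],
    [1, 0, 1, 0],
    [0, 0, 0, 1],
    [0, 1, 0, 0],
    [1, 1, 0, 1],
]
pos_scores, neg_scores = cluster_scores(responses, loadings)
r = pearson_r(pos_scores, neg_scores)
shared_variance = r ** 2  # proportion of variance the two clusters share
```

Squaring the coefficient gives the proportion of shared variance, so a correlation of .25 corresponds to .25² = .0625, or about 6%.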

Figure 3. AMMA items’ loadings on their first Rasch contrast.
Does the above finding mean that the use of the Rasch model for analyzing our results is erroneous? A response to this question could be to paraphrase E. Smith (2002) and state that “rather than asking if a test is ‘unidimensional or not’ we should ask ‘at what point on the continuum does multidimensionality threaten the interpretation of item and person estimates’” (p. 206). Reise, Cook, and Moore (2015, p. 13) similarly write that in Rasch modeling “the critical issue is not if a test is ‘unidimensional enough’ but rather the degree to which its dimensionality impacts or distorts the estimation of Rasch parameters”. Does the psychometric structure of AMMA distort the estimation of Rasch parameters? In our opinion this is not the case. The small but positive Pearson coefficient r = .25 is an indication that the two clusters of items measure roughly the same thing. Audiation therefore can be regarded as “unidimensional enough” to be measured on an interval scale. Gordon himself regarded audiation as a whole ability even when he published the scores for the tone and rhythm items separately.
Discussion
The purpose of our study was to explore the dimensions of the Advanced Measures of Music Audiation in a region with a strong non-Western music tradition. Dimensionality assessment is one of the fundamental issues in psychometrics (Gessaroli & de Champlain, 2005) and very important for the development of psychological theories (Debelak & Tran, 2013). The region in which we conducted our study is a mountainous area in northwestern Greece. Although the penetration of the musical tradition of this region in the musical preferences of its residents is not known, we can say with confidence that the people of Epirus are familiar with their pentatonic and polyphonic tradition. These melodies are still sung and danced in many local social gatherings and are also taught in local schools. We have evidence to argue that even in a region with a strong non-Western music tradition, the rhythm and the tone dimensions of AMMA are significant from a statistical point of view. This adds credibility to Gordon’s theory because if the two theoretical elements of his central notion of audiation can be confirmed in a non-Western culture, they can most probably be confirmed in other parts of the world as well.
The second finding of our study has been that audiation can be measured according to the simple additive logic of a Rasch model. The Advanced Measures of Music Audiation have been developed in the traditional “true score” theory, according to which every measurement is the sum of a person’s ability and a random error (Ward, 2010). Although Waugh and Chapman (2005) have warned that inferences from a Rasch analysis of items originally developed in a true score logic are not straightforward, we have evidence to support the idea that the items of AMMA have a good fit to the Rasch model and that AMMA is “unidimensional enough” to be analyzed according to the modern measurement theory. This is the theory behind the international comparisons of student achievement (Wendt, Bos, & Goy, 2011), and audiation could in practice be included among them. This brings us to the third point of our discussion, which has to do with the worthiness of measuring music abilities with tests such as the Advanced Measures of Music Audiation.
Many important music educators, such as Sue Hallam (2010), are critical of the idea that music ability should be measured with 20th-century tests of music abilities. We tend to agree with Hallam’s view. Gordon’s test, however, does not measure music ability directly. His theory is, rather, a theory that draws parallels between music and language, a topic broadly discussed in the literature on music perception (see Jackendoff, 2009; McMullen & Saffran, 2004). In the theory of Gordon the notion of audiation is only an intermediate concept for music ability and it is with the measurement of this particular concept that lines between music and language are drawn. The American music educator developed his theoretical interests from an initial exploration of young persons’ music aptitude profiles to the measurement of the concept that he finally formulated. It is the validity, the reliability, and the level of measurement of this particular psychological construct that the current study has investigated. By examining the psychometric characteristics of audiation in a region with a strong non-Western culture, we hope that we have contributed to a broader and interesting discussion. A transcription of the AMMA items can be seen in the Appendix (available in Supplementary Materials online). All research materials and data files are available to other researchers by directly contacting the authors of the current work.
Supplemental Material
Supplemental material, Appendix_1 for The psychometric characteristics of the Advanced Measures of Music Audiation in a region with strong non-Western music tradition by Athanasios Verdis and Christina Sotiriou in International Journal of Music Education
Supplemental material, Appedix_2 for The psychometric characteristics of the Advanced Measures of Music Audiation in a region with strong non-Western music tradition by Athanasios Verdis and Christina Sotiriou in International Journal of Music Education
Footnotes
Funding
This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.
