Abstract

On December 15, 2010, Dr. Stanley M. Sapon, the second author of the Modern Language Aptitude Test (Paper-and-Pencil Version, henceforth, the MLAT), passed away in New York. With the death of the first author, Dr. John B. Carroll, in 2003, the 53-year-old test has now lost both of its creators. Given that the single original form of the MLAT has remained unchanged since its first release in 1959, it is truly remarkable that the test is still widely used for both practical and research purposes (see below). This review presents the test in historical perspective, bridging its past, present, and future.
Test purpose: The MLAT measures ‘an individual’s ability to learn a foreign language’ (Carroll, Sapon, Reed, & Stansfield, 2010, p. 2). It targets English-speaking adults (over Grade 9) who are literate. The MLAT is a member of the aptitude test family published by the Language Learning and Testing Foundation, Inc. (LLTF). Other members include the British version of the MLAT and the Modern Language Aptitude Test-Elementary (MLAT-E; American and British English versions) for younger users (Grades 3–6).
Availability: The test has only one form, which has not changed since it was first published by the Psychological Corporation in 1959. In 1999, the copyrights of the entire aptitude test family were transferred to Second Language Testing, Inc. (SLTI) and then in 2004 to the Second Language Testing Foundation (SLTF, renamed LLTF in 2011), which has renewed the manual but not the content of the MLAT. In order to maintain test security, the test is open only to users to whom LLTF grants permission. As of 2011, ‘In the US, the largest groups of customers are the Foreign Service Institute, the World Bank, [and] missionary organizations,’ and ‘Outside the US, the main users are foreign governments (e.g. British, Australian, Canadian, and Singapore)’ (Charles Stansfield, President, personal communication, August 29, 2011).
Length and administration: The test can be taken either in short or in long form. The long form consists of Parts I to V, requires a CD player, and takes about 70 minutes to complete. The short form consists of Parts III to V, does not require a CD player, and takes about 30 minutes to complete. Users can score the test using a hand-scoring stencil. Information about the mean scores and percentile rankings of different types of learners is available in the manual.
Publisher: Language Learning and Testing Foundation, Inc., Maryland, http://www.2lti.com/home2.htm; Future, http://www.lltf.net
Price: The Test Kit includes five Practice Exercise Sheets, five Answer Sheets, one Manual, one Test Booklet, one Scoring Stencil, and one MLAT Audio (CD), and costs US$95 (as of 2011).
Description: The MLAT consists of five subparts. Each subpart is intended to measure one or more of Carroll’s four components of foreign language aptitude (see below). Note, however, that ‘inductive language learning ability’ is not measured as strongly as the other three components in any of the subtests (Carroll, 1981). Descriptions of each subtest with examples from publicly available sample items follow:
The MLAT from past to present
Development of the MLAT
John B. Carroll, an educational psychologist, started to develop the test in 1953 with five years of funding by the Carnegie Foundation (Carroll, 1981). Beginning in the 1920s, before Carroll launched this project, there had been several attempts to create a valid and reliable measure of foreign language aptitude, but these had been generally unsuccessful. Such tests were in strong demand by the US Army during and after World War II because the military needed to equip a large number of personnel with a functional command of foreign languages. Compared to such tests, screening individuals by first placing them on a short-term ‘trial course’ was ‘expensive and logistically troublesome’ (Carroll, 1981, p. 88). Thus, Carroll’s project was carried out in response to a governmental need.
During the five-year project, with the help of linguist Stanley M. Sapon (among others), Carroll examined several different types of tests measuring constructs potentially relevant to foreign language learning success in different learning situations (e.g. Carroll, 1962). The results of these investigations (based on correlation and factor analysis) led Carroll (e.g. 1981, p. 105) to propose the following four major components of foreign language aptitude, which became the main subscales of the MLAT:
Phonetic coding ability – the ability to identify distinct sounds, to form associations between those sounds and symbols representing them and to retain these associations;
Grammatical sensitivity – the ability to recognize the grammatical functions of words (or other linguistic entities) in sentence structures;
Rote learning ability for foreign language materials – the ability to learn associations between sounds and meanings rapidly and efficiently and to retain these associations; and
Inductive language learning ability – the ability to infer or induce the rules governing a set of language materials given samples of language materials that permit such inferences.
Carroll claimed that tests designed to measure these four components accounted for significant amounts of variance in participants’ foreign language achievement but that these four components were relatively independent of each other and distinct from general intelligence. These findings fed into the development of the prototype of the MLAT, which was later published by the Psychological Corporation in 1959.
Application of the MLAT to the present
When the MLAT was first created, the predominant theory in psychology in the United States was behaviorism, and the most popular second language (L2) teaching method was the audio-lingual method (e.g. Fries, 1945). From the 1960s, following the ‘cognitive revolution’ (Myers, 2010, p. 5), cognitive psychology, best represented by the information-processing perspective, became the dominant paradigm, and in L2 education, communicative language teaching displaced the audio-lingual method. Yet, throughout these paradigm shifts, the MLAT has managed to maintain relatively high predictive power (r = .4 to .6) not only for situations similar to those in force when the test was originally created but also for those governed by newer theories and methods. That is, the MLAT has been found to predict L2 learning success relatively well for both formal and informal L2 learning across different skills (e.g. reading, speaking), for both content-based and form-focused instruction, for both audio-lingual and communicative instruction, and even for both implicit and explicit learning of an artificial language in a laboratory setting (Sawyer & Ranta, 2001).
Below, I offer three possible reasons for this remarkable longevity. First is the two authors’ foresight. For example, in his graduate days at the University of Minnesota, Carroll discontinued his studies under B. F. Skinner, a major advocate of behaviorism at that time, because he ‘was not interested in his particular approach to language learning’ (Stansfield & Reed, 2004, p. 47). With Skinner’s approval, Carroll instead wrote his PhD dissertation under L. L. Thurstone, a pioneering psychometrician who used factor analysis to identify different types of cognitive abilities (American Psychological Association, 2002). This approach, which was also used for developing the MLAT, is still used in aptitude studies today. The second author, Sapon, was an equally farsighted researcher in that he used an artificial language to measure language learning ability (Carroll, 1962), a method that is also still used to measure language aptitude (e.g. Grigorenko, Sternberg, & Ehrman, 2000). In fact, the cognitive approach adopted by both authors could be seen as a precursor to more recent information processing perspectives on L2 learning.
A second reason for the MLAT’s longevity is the fact that its development was guided by ample empirical data collected from different types of educational settings (high schools, universities, military schools) using different types of teaching methods (audio-lingualism, grammar-translation, artificial language learning, etc.; see Carroll, 1988) across different languages (e.g. French, German, Russian) in various parts of the USA. The fact that over the 53 years of its life the MLAT has maintained high predictive validity for these multiple settings involving a range of languages and teaching methods is evidence that the test is a robust measure of important and fundamental aspects of language aptitude.
Third, the MLAT may well have survived for so long because of the relatively slow development of language aptitude research. Indeed, in 1988, about 30 years after the MLAT’s first release, Carroll stated that the ‘considerable research’ that took place between 1959 and 1988 ‘has not suggested any major change’ in the original four components used for the MLAT. More recently, a larger number of researchers working from information processing perspectives have started to pay greater attention to the relationship between aptitude and foreign language learning processes, and tests based on expansions of the aptitude construct have accordingly been proposed (e.g. Skehan’s 2002 model, which corresponds to different stages of L2 learning). However, these newer tests have tended to complement the MLAT rather than supersede it, and their validation has tended to include the MLAT as ‘the benchmark’ (Grigorenko et al., 2000, p. 397).
The MLAT in contemporary perspective
No language aptitude test to date has been as practical as the MLAT, especially in terms of the time needed to take the test, compared, say, with Grigorenko et al.’s (2000) CANAL-FT, which consists of nine sections. We can thus safely assume that the MLAT is still ‘on active duty.’ I therefore now evaluate the test from a contemporary perspective, drawing on Bachman and Palmer’s (1996) model of ‘test usefulness’ and using Chapelle, Jamieson, and Hegelheimer’s (2003) application of Bachman and Palmer’s model as a guide. Because of space limits, my evaluation of the test will pertain only to the most frequent use of the test, namely for screening. I hypothesize a case and construct my argument based on the results of previous studies that have used the MLAT for that purpose.
The case
The MLAT is used to select who should receive special intensive training in the languages taught by the Canadian Foreign Service Institute.
Table 1 summarizes my evaluation of the MLAT for this case, using Chapelle et al.’s (2003) argumentative procedure and presenting the positive and negative attributes of the test in terms of Bachman and Palmer’s five relevant criteria for test usefulness. I exclude the ‘authenticity’ criterion because it is arguably irrelevant to aptitude tests such as the MLAT, which measure cognitive traits rather than language use.
Table 1. Characteristics of test usefulness for the MLAT used for screening
The MLAT of the future
I conclude this review by suggesting several promising directions the MLAT might take in the future for both practical and research purposes. First, we have seen that using the MLAT for screening meets Bachman and Palmer’s (1996) test usefulness criteria to some extent. For such practical purposes, the MLAT does its job well, though users should be aware that the test has some ‘aging’ problems. However, these problems can be mitigated by new knowledge accumulated over the past 50 years. For example, recent studies in cognitive psychology have revealed that the memory capacity most crucially related to language learning is not the static rote learning ability measured by the MLAT but working memory, which both stores and processes information (see Table 1). Furthermore, Grigorenko et al.’s (2000) new theoretical approach based on recent developments in cognitive psychology suggests that the ‘ability to deal with novelty’ (p. 401) is also a crucial part of foreign language aptitude. Using an expanded battery including (part of) the MLAT and measures of these new constructs may result in a higher predictive power than using the MLAT alone.
In addition to group-level use, all or part of such an expanded battery of aptitude tests could also be used at the individual level for both practical and research purposes. For example, if we can measure different dimensions of aptitude through such a battery, we can draw each learner’s profile of ‘aptitude complexes’ (Robinson, 2002, p. 114), including strengths and weaknesses in each area (e.g. working memory, grammatical sensitivity). We can then choose types of instruction that best match each learner’s profile. Or, as Ortega (2009) explains, different capacities may be more strongly related to different developmental stages (e.g. working memory may have higher predictive power in earlier than in later stages). These new ways of theorizing the construct of foreign language aptitude suggest not only innovative approaches to L2 instruction but also promising potential for future studies, of which the MLAT might continue to be a useful component. Lastly, despite this positive future for the MLAT and for language aptitude research in general, we should also caution that aptitude is only part of what makes L2 learning successful. Students’ motivation and other affective factors may all be equally influential. Here again, we should acknowledge Carroll’s foresight in having considered this issue while the MLAT was still being developed (Carroll, 1962). I thus conclude this review with advice from Carroll himself in his last interview, namely, ‘go back to the 1950s, when language aptitude tests were developed, and try to emulate the good work we did then’ (Stansfield & Reed, 2004, p. 55).
