Abstract

The National Center Test for University Admissions (henceforth, the Center Test) is a unified national test of all high school subjects taken by more than 500,000 students each year in Japan. The Center Test has been implemented since 1990, superseding its predecessor, the Common First-Stage Examination, which was administered from 1979 to 1989. Its English component consisted only of the written section until 2005, when the listening section was first implemented after a series of preparatory research studies (Nishigoori & Kuramoto, 2010). The test is designed and produced by the National Center for University Entrance Examinations (henceforth, the NCUEE, www.dnc.ac.jp/), an independent administrative institution. The content of the Center Test is aligned with the guideline titled the Course of Study for secondary or high schools prescribed by the Ministry of Education, Culture, Sports, Science and Technology in Japan (MEXT). Before going into the details of the test, the context for its use is briefly described below.
In Japan, there are two major types of universities. Of approximately 750 four-year-course universities, 20% are national and local public and 80% are private. In the case of national and local public universities, they all have to follow a two-stage admission process. At the first stage they administer a common test, and at the second stage they carry out their in-house examination on their own campus. Unlike national and local public institutions, private universities have the freedom of employing their own methods, which vary widely. Some universities administer their own paper-and-pencil test on their own campus, while other universities conduct interviews and/or essay exams as well.
The Center Test was initially developed as the first-stage examination of national and local public universities, though recently an increasing number of private universities (500 as of 2012) are also administering the test to make use of the scores as part of their admission decisions. The test covers all six academic school subjects with 28 subdivisions, including Japanese, geography, history, mathematics, science, a foreign language, and so forth. The foreign language section includes French, German, Korean, Chinese and English. The present review deals only with the test of English.
Test purposes
The Center Test is an achievement test, in that it purports to measure the student’s achievement level at the point of finishing the last year of upper secondary education, and the coverage has to be within the content of the MEXT guidelines. It is also a certification test, in that the test score is used to guarantee that the student who has obtained a certain level of scores is judged to possess the knowledge and skills required to enter or take the second-stage examination that each university carries out on its own campus (NCUEE, 2012a, p. 466). Although its primary purpose is to provide information that is to be used as part of selecting students for university admission, as with any other test of this nature the Center Test has a secondary purpose that requires independent validation. That is, it aims to help improve teaching and learning at pre-college-level education by reinforcing the content of the Course of Study, which requires teachers to instruct students so that they can acquire practical communication skills in the four areas of comprehension, production, knowledge of language and culture, and positive attitudes towards communication in English.
Length and administration
The Center Test is held over the weekend in the middle of January of the year in which the admission process begins. Note that in Japan the school year begins on April 1 and finishes at the end of March the following year. The test is administered at more than 700 places including six specifically equipped for students with disabilities, so that all the candidates may sit the test under the same conditions. The supervisors and proctors, consisting of university faculty staff, follow the manual and carry out its provisions under strictly controlled conditions. Those candidates who are unable to take the test on the administration day are given the chance to take the make-up test a week later.
The total testing time is 80 minutes for the written component and 30 minutes for the listening component (excluding 30 minutes for preparation). The listening test is administered by an audio device with a headset, which is distributed individually to each test taker, so that the quality of the recording does not vary with the test taker's seating position. The materials that have been used in the past are available online and the audio device can be borrowed on request (www.dnc.ac.jp/modules/center_exam/content0224.html).
The NCUEE releases all test questions along with the answer key online and in the press immediately after each administration. This is not only for the sake of accountability, but for practical reasons as well. As has already been noted above, national and local public universities follow a two-stage admission process, though the method may differ among institutions. One group of universities sets a cut-off score for the Center Test, so the students who fail to reach the predetermined score are not accepted to sit for the second-stage examination. The other group gives permission to all the candidates to proceed to the second-stage examination irrespective of the Center Test scores. In this case, to make a final admission decision, scores from the Center Test and the second-stage examination are combined. Given the importance of the outcome of the Center Test, then, those students who are going to apply for national or local public universities have to know their scores prior to registering for the second-stage examination of their target universities. This necessitates that the students write answers on their own test books, and bring them back home to check their answers in the light of the answer key released.
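The two uses of the Center Test score described above can be sketched in a few lines of code. This is only an illustration under stated assumptions: the function names, the cut-off value and the equal weighting are hypothetical, since each university sets its own rules and none of these figures come from the NCUEE.

```python
def eligible_for_second_stage(center_score, cutoff=None):
    """Return True if the candidate may sit the second-stage examination.

    cutoff=None models a university that admits all candidates to the
    second stage irrespective of their Center Test scores.
    """
    return cutoff is None or center_score >= cutoff


def combined_score(center_score, second_stage_score, center_weight=0.5):
    """Combine both scores for the final decision (weighting is hypothetical)."""
    return center_weight * center_score + (1 - center_weight) * second_stage_score


# Hypothetical university A sets a cut-off of 130; university B sets none.
print(eligible_for_second_stage(120, cutoff=130))
print(eligible_for_second_stage(120))
print(combined_score(120, 160))
```

Because the decision depends on the candidate's own Center Test score, the sketch also makes clear why students need to self-score against the released answer key before registering for the second stage.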
Author/publisher and contact information
The NCUEE is in charge of producing the Center Test and administers it in cooperation with each university. The test is developed and its items are written by a committee consisting of university faculty staff. Each writer serves for a two-year term, new members being designated every two years. The exam costs between ¥12,000 (approximately US$150) and ¥18,000 (approximately US$230), depending on the number of the subject tests the candidate takes.
Among several publications the NCUEE releases each year, the most important one is the Annual Report (www.dnc.ac.jp/modules/center_exam/content0408.html), which contains the review of the Center Tests of the year and serves as an important source for validating the test, and its details will be given shortly. Other publications include Forum, a collection of academic essays dealing with various issues of educational assessment intended for a general audience, and Research Bulletin, containing academic articles written by professional researchers intended for a professional audience. These publications as well as various others addressed to a range of test users, including test takers, are available online (www.dnc.ac.jp/modules/center_exam/content0011.html). This website also includes past exam papers with sample audio materials and the purpose and the content of the test.
General description
All the items of both the written and listening components are multiple-choice. NCUEE (2012a) requires the test constructors to follow specific guidelines in summary as follows: (1) to assess practical communication skills (specified in the MEXT guideline); (2) to use present-day English; (3) to assess sociolinguistic, discourse and strategic knowledge as well as discrete linguistic knowledge; (4) to cover a wide range of topics; (5) to include items of various levels of difficulty; and (6) to use vocabulary from the coverage of high school textbooks.
The written component (a total of 200 points) is divided into six major sections, while the listening component (a total of 50 points) has four major sections. Tables 1 and 2 show a general description of the 2012 test battery.
Table 1. A description of the 2012 Center Test in a summary form (NCUEE, 2012a): Written Component.
Table 2. A description of the 2012 Center Test in a summary form (NCUEE, 2012a): Listening Component.
As Tables 1 and 2 indicate, and as Guest’s (2008) detailed analyses of previous years’ test papers attest, the Center Test contains various tasks requiring test takers to employ a range of skills and strategies for language use to answer the questions. It certainly suffers from limitations (see also Brown, 2000), because it has to operate under a number of practical constraints. Amongst others, as has been already noted, the most important and possibly unique practice involves releasing all test items and key into the public domain immediately after administration, which in turn necessitates the use of objectively scorable items. As if compensating for the limitation, the NCUEE strongly recommends that each university use open-ended, subjective-type questions at its second stage (NCUEE, 2012b, p. 2). In fact, most of the second-stage exams involve translation of English into Japanese, a summary of an English passage, short-answer questions and other subjectively marked test tasks.
Basic test statistics
In January 2012, a total of 526,311 students sat for the test across all 28 subdivisions of the test subjects, while 665 universities, including private, local public and national, used the test scores. Test scores with basic statistics are to be released within three days of the administration for the benefit of test users. The mean score of the English component of the 2012 examination was 124.15 (out of a total score of 200, N = 519,868, SD = 42.05) for the written test, and 24.55 (out of a total of 50, N = 514,748, SD = 8.03) for the listening test. The number of students and the mean scores have not differed greatly over the past five years: 2011 (N = 519,538, mean = 122.78), 2010 (N = 512,451, mean = 118.14), 2009 (N = 500,297, mean = 115.02) and 2008 (N = 497,101, mean = 125.26). Cronbach’s alpha is around .85 for the entire battery of the English test each year, though the coefficient is not released officially to the public. However, the figure is not very useful because the test does not meet the condition that each item measures the same construct.
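For readers unfamiliar with the coefficient, the following minimal sketch shows how Cronbach’s alpha is computed from an item-level score matrix. The function name and the toy data are my own illustrative assumptions; item-level Center Test data are not publicly available, so nothing here reproduces the actual figure of around .85.

```python
from statistics import pvariance


def cronbach_alpha(item_scores):
    """Return Cronbach's alpha for a test-taker-by-item score matrix.

    item_scores: list of rows, one row per test taker, one column per item.
    """
    k = len(item_scores[0])                    # number of items
    columns = list(zip(*item_scores))          # scores grouped by item
    item_variances = [pvariance(col) for col in columns]
    total_variance = pvariance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Invented toy data: 4 test takers x 3 dichotomous items (1 = correct).
toy_scores = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [0, 0, 0],
]
print(round(cronbach_alpha(toy_scores), 2))  # 0.75
```

The formula compares the sum of the item variances with the variance of the total scores, which is why its interpretation rests on the assumption that all items measure the same construct.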
Test review and validation
If the primary purpose of testing is ‘to collect information for making decisions’ (Bachman & Palmer, 2010, p. 22), then the question is whether the Center Test measures the right constructs, as specified in the MEXT guidelines. However, any large-scale test entails educational consequences, whether intended or unintended, and the Center Test is no exception. On the intended side, its purpose involves communicating in English what is important in teaching and learning at secondary-level education, thereby serving as a model for pedagogical practice as well as instructional materials and resources, and further, if possible, it may motivate students and teachers to that end (NCUEE, 2012a). Each of these requires independent validation, though in fact it is not yet integrated as a routine part of post-test analysis.
An evidence-based argument for an admission test may run as follows (adapted from Herman, 2011). The use of scores in college admissions is justified if the following conditions are met: a test should accurately and fairly measure knowledge and skills that represent college readiness; the scores should result in fair, reliable, and accurate inferences about students’ college readiness; and the scores should predict college success. In the case of the Center Test, the evidence needed for its validation ought to include the information that helps to assess if (1) the test is fair to all students; (2) the test provides a reliable and valid measure proving that candidates are ready to study at university level; (3) test scores prove that students have reached the level of achievement that is required by the MEXT guidelines; (4) test scores predict the candidate’s success at university along with the second-stage examination and other sources of information; and (5) the test helps improve test preparation teaching, learning and materials at pre-college-level education by reinforcing the MEXT guidelines.
The information that provides evidence for the conditions (1) and (2) above in particular comes from the Annual Report (NCUEE, 2012a), as mentioned above. Each year after the administration of the test, NCUEE collects test reviews from three different groups supposedly representing different perspectives. The reviewers examine the test questions in the light of the MEXT guidelines, the previous year’s review and the year’s guidelines prepared by NCUEE (NCUEE, 2012a, p. 455). Most of the reviews are sensible, admitting that the test has to inevitably focus on receptive skills, and positive comments are not uncommon: ‘test tasks epitomize what a test of practical communication skills should look like’ (NCUEE, 2012a, p. 462, my translation). Yet there are often differences in views between the test construction committee and the groups of test reviewers, which turn out to be extremely useful for revising the test for the following year. For example, the reviews of in-service teachers and the research society seem to be concerned about score weight in the light of the difficulty of each task, the quantity of the entire test battery in terms of the total number of words because the students have to process them within a given length of time, and other practical concerns about test takers (NCUEE, 2012a, p. 462).
The third condition concerns the achievement level of the test takers. This is to be examined in the light of candidates’ performance on each item of the test. The 2012 post-test report considers the result of the mean scores being 124.15 with a 42.05 standard deviation to be on the appropriate level overall (NCUEE, 2012a, p. 466). Details of the item statistics are not revealed to the public but are available for internal use only, for the sake of committee members. Nevertheless, each section of the written and listening components is presented and discussed in the report, so anyone who is interested in this type of information may look at it online. This in turn helps to keep the NCUEE and the test development committee accountable to the stakeholders for the quality of the test.
Of all types of evidence, the most difficult to obtain is the fourth one, concerning the predictability of the test scores for the readiness of the candidates for university-level education. This is particularly true because the test of a foreign language comprises only a part of the entire set of information along with other subjects. Besides, as has been noted above, admission decisions may be made by combining the score of the Center Test with that of the second-stage examination. The issue then may require an in-depth analysis of the function of the Center Test at each institution. Likewise, the fifth type of evidence, which relates to the impact of the test on pre-college-level education, has been lacking and therefore research in this area is urgently needed.
Practicality, real world conditions and constraints
Practical constraints, or ‘real world conditions and constraints’ (Bachman & Palmer, 2010, p. 249), are severe, particularly in the case of high-stakes testing. Those particularly unique to the Center Test are as follows. First, all the items need to be released to the public immediately after the administration. This makes it impossible to construct an item bank in the expectation that the items will be recycled. Indeed, recycling of the items is allowed, but it is not common yet. The second constraint is timing. While the Center Test is administered to a large number of students at the same time nationwide on the same day, the scores have to be processed and released without delay, which makes it unfeasible to include subjective items, such as essay or speaking tasks, though such tests are usually employed at the second-stage examination. Third, revision of any part of the test is a time-consuming process, partly because of time constraints and particularly because it often has to be done in parallel with the tests of other subjects. If changes are made to any degree, an announcement has to be issued in advance to all the stakeholders including parents, teachers, university admission officers, as well as test takers (Watanabe, 2010).
Washback and impact
Washback and impact of the Center Test are central concerns of all stakeholders. Despite its importance, however, there has been very little empirical research to date. There are a huge number of test-coaching institutions and a large volume of test-coaching materials available on the market, containing practice exercises created on the model of the previous year’s tests. All this seems to be an indication of the presence of washback at a societal level. But it is not clear exactly what type of coaching is conducted and how much teachers know about the real nature of the test. Indeed, the Center Test as it stands has the potential to cause positive washback (Guest, 2008) albeit limited in scope. For example, successful completion of reading sections of the test would require students to be adept at using a range of reading strategies, including scanning, skimming, understanding the text with the help of nonlinguistic information and other sub-skills, appropriately as required by the task. This suggests that the coaching of students has to involve instruction in these areas. Yet this type of expectation remains a prediction to be proven empirically, as many recent studies on washback attest (e.g. Cheng & Watanabe with Curtis, 2004).
Appraisal
The Center Test makes a unique contribution to the field of educational assessment in Japan. However, there are several challenges that the NCUEE will have to face in the future. Amongst others, one of the most urgent involves empirically validating the use of test scores by establishing backing to support assumptions, such as the types of evidence illustrated in the previous section on validation. Indeed, the test use is currently validated in the light of post-test information from different perspectives. The NCUEE also receives informal reviews from various private institutions and opinions from test users. Nevertheless, they are yet to be systematized into the whole process of test validation to help inform the test committee during the following year’s test construction process. Validation should preferably be incorporated into a routine that the NCUEE carries out every year.
Another challenge involves clarifying the division of labor between written and listening components. One of the recurrent issues is the possibility of eliminating the section of pronunciation in the light of empirical research (e.g. Buck, 1989; Shirahata, 1991). However, the section continues to be part of the test battery, importantly because of the favorable comment from the two bodies of high school teachers (NCUEE, 2012a, pp. 466–467) that it has possible positive washback to school education, in the way in which the task is likely to gear teachers’ and students’ attention to the sounds of English. However, this is an assumption that is yet to be validated empirically.
Finally, it would be very useful for test users and stakeholders if the NCUEE were to release the item statistics, reliability coefficients, standard error of measurement, and so forth, along with explanatory notes written in down-to-earth terms. This type of information will serve as an extremely useful resource for many people involved with the Center Test. For example, it will help inform the faculty of each university as to what type of knowledge and skills are to be tested in their second-stage examination to compensate for what the Center Test could not appropriately measure. For the high school teachers, the information could be used to provide students with informed guidance for their test preparation.
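To illustrate what such a release might tell test users, the sketch below computes the standard error of measurement from a standard deviation and a reliability coefficient, using the classical formula SEM = SD × √(1 − reliability). The function name is hypothetical, and the pairing of inputs is an assumption made purely for illustration: the SD of 42.05 belongs to the 2012 written test, whereas the alpha of around .85 refers to the entire English battery, so the result should not be read as an official figure.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """Classical SEM: sd * sqrt(1 - reliability)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


# Illustrative inputs only (see the caveat in the text above).
sem = standard_error_of_measurement(sd=42.05, reliability=0.85)
print(round(sem, 1))  # 16.3
```

Under these assumed inputs, an observed score would carry an uncertainty of roughly 16 points on the 200-point scale, which is the kind of information that could help universities interpret small score differences near a cut-off.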
Conclusion
The NCUEE has been playing an important role in developing and administering a series of tests under severe practical constraints in the Japanese educational system. It also conducts a number of empirical research studies and publishes them, thereby releasing a huge amount of information online as well as in print to enlighten the general public. Meanwhile, it also hosts a number of international conferences. Indeed, a number of challenges are there for them to face. But it is hoped that the effort the institution has been making will continue to bear fruit and ultimately bring even greater service to the international research community as well as those practitioners who are involved in high-stakes assessment where there are similar types of constraints.
Acknowledgements
I would like to express my sincere thanks to the NCUEE, in particular Ms. Yaeko Ito of the Information Service Office and Dr. Shojima Kojiro of the Division of Research Development.
Funding
Production of this paper has been partly supported by the Grants-in-Aid for Scientific Research by the Japan Society for the Promotion of Science #23520701.
