Abstract
U.S. public schools are required to establish policies ensuring that English language learners have equal access to “meaningful education.” This demands that districts put into place mechanisms to determine student eligibility for specialized English language services. For most states, this federal requirement is fulfilled through the local administration of the WIDA–Access Placement Test (W-APT), arguably the most widely used, yet under-studied, English language assessment in the country. Through intensive participant observation at one urban new student intake center, and detailed qualitative, discursive analysis of test administration and interaction, we demonstrate how the W-APT works as a high-stakes assessment, screener, and sorter, and how test takers and test administrators locally negotiate this test and enact this federal and state policy. Our analysis indicates that the W-APT is problematic in several respects, most importantly because the test does not differentiate adequately across students with widely different literacy skills and formal schooling experiences.
Keywords
Introduction
The 1974 Supreme Court Lau v. Nichols (414 U.S. 563) decision required U.S. public schools to establish policies ensuring that English language learners have access to linguistically appropriate accommodations for them to experience a “meaningful education.” The resulting so-called Lau Remedies require, among other things, that districts determine student eligibility for specialized English language services. Eligibility is established by determining (a) that the student’s first language, and the language most often spoken by the student, is not English, and, subsequently, (b) the student’s linguistic ability in English. A student’s first language generally is determined by asking parents a series of questions about home language use. For the majority of U.S. students new to the country, or with a non-English language spoken at home, English language ability is then determined by taking the World-Class Instructional Design and Assessment (WIDA)–Access Placement Test (aka, the W-APT). 1 The W-APT is often the first testing experience an English learner has upon entering a school district, and his or her test score affects his or her initial placement in a school and in an English language program. This test is perhaps the least known, most widely used English language assessment in the country. Indeed, the W-APT has been in constant and quickly expanding use by states since its development in 2005, yet has never been closely examined.
This article addresses this gap by analyzing the W-APT as a case study of U.S. language policy in practice. Through intensive participant observation at one urban new student intake center, we demonstrate the ways in which the W-APT works as a high-stakes assessment, screener, and sorter. We critically analyze what the W-APT tells us, and what it fails to reveal, about students’ language, literacy, and academic skills. Through profiles of five adolescent test takers and discourse analysis of test administration, we show how the test is shaped by interaction, and in particular, how both test takers and test administrators enact and negotiate the language policy dictated indirectly by federal law and directly by the test’s script. This article thus provides the first detailed analysis of the major English language assessment used with most U.S. K-12 English language students. Furthermore, this case study of a particular assessment provides a window into how federal language and education policy are enacted at the local level. In this sense, the article offers a qualitative and discursive analysis of the enactment of a particular language education policy, as well as a critique of a widely used assessment tool.
Language Policy and Tests
Language testing is a central mechanism of both language policy and education policy (Shohamy, 2001; Shohamy & McNamara, 2009). Researchers have documented how tests serve not only as a powerful means of implementing language and education policy, but often as the policy itself. Menken (2008), for instance, documented how New York City English Learner (EL) high school students experienced high-stakes graduation tests. She found that the graduation test shaped what content was taught in schools, how it was taught, by whom, and in what language, with the graduation test essentially becoming the language policy in New York high schools. In this New York case and in many other contexts worldwide, researchers of language policy have examined how the testing mechanism not only enacts but often becomes the default or de facto language policy and education policy, driving curricular decisions and a wide range of educational practices (Shohamy, 2014; Spotti, 2013). The field of language policy (and its subfield, language education policy) increasingly has focused on how particular policies are taken up or enacted by participants in everyday situations (King & De Fina, 2010); this work often requires researchers to engage in discourse analysis or other qualitative research approaches (see Mortimer & Wortham, 2015, for an overview).
As the field of language policy has taken an ethnographic and discursive turn (McCarty, 2014) and increasingly turned its gaze to how language policies are experienced on the ground, language tests have received greater critical scrutiny (Maryns, 2004). McNamara, Khan, and Frost (2015), for instance, documented how language tests for migrant adults were legitimized as valid gatekeepers in the immigration process in Australia, with test takers viewing their score as a marker of “outsider” status, despite their successful professional and social integration. Within K-12 contexts, researchers have documented how language proficiency tests potentially mislead us about students’ abilities. MacSwan and Rolstad (2006), for instance, demonstrated how widely used measures of ELs’ native language skills (e.g., Language Assessment Scales-Oral, Español) dramatically underestimated children’s language capacities, leading to ELs’ disproportionate representation in special education. Findings from this line of work can be linked to researchers’ calls for greater attention to revising existing approaches to testing ELs, given that current assessment paradigms in research and practice overlook the complex nature of language and bilingualism (e.g., Solano-Flores & Trumbull, 2003; Valdés & Figueroa, 1994). As Hakuta and McLaughlin (1996) asserted more than two decades ago (and as is still often the case today), EL testing practices are often driven by policies rather than theory and research on language development.
Although both EL testing practices and language proficiency assessments have received growing attention in the United States and beyond (Cumming, 2009; Davies, 2009; Jones & Saville, 2009; Leung & Scott, 2009; Menken, 2009), no study to date has examined the most widespread assessment used to identify, and often to place, K-12 students. This is particularly remarkable in the United States given that the Lau decision is perhaps the most important federal language education policy to date, mandating that all U.S. schools take “affirmative steps” to ensure equal access to the curriculum, with identification of non-English-proficient students arguably the most crucial component. As detailed below, for many U.S. school districts, the W-APT serves as both a central mechanism to ensure compliance with the Lau ruling and policy guidelines and, moreover, an important tool for placing students in EL programs.
W-APT
The 2002 No Child Left Behind (NCLB) Act focused new attention on EL students and required U.S. states to annually measure EL growth. In response to state demands for an assessment tool for ELs, WIDA 2 partnered with the Center for Applied Linguistics (CAL) to develop, refine, and expand the annual Assessing Comprehension and Communication in English State-to-State (ACCESS) test. In 2013-2014, 33 states administered the ACCESS test to 1,372,611 EL students (WIDA, n.d.-a). When state agencies join the WIDA Consortium, they gain not only the right to administer ACCESS tests to ELs, but also access to professional development and WIDA curricula and standards. Member states also receive access to the W-APT, often referred to as “the screener,” at no additional cost. 3 Because the W-APT is a pen-and-paper test that is locally administered and scored, no data exist on how many students take this test each year. Given that most newcomers to the United States take this test and most EL transfer students (from a different U.S. district or charter school) also take this test, it is likely administered to hundreds of thousands of students each year. The most recent federal education law, the 2015 Every Student Succeeds Act (ESSA), makes accountability for ELs a priority, for instance, requiring that each state develop and adopt a uniform, standardized system for identifying and exiting ELs, policies that suggest states will continue to partner with WIDA (and to use a WIDA screening tool).
The W-APT was developed concurrently with the ACCESS test in 2005 and has been re-released with updates since then (2007 and 2013). The W-APT format and scoring differ from the ACCESS test, and the W-APT is not articulated with the ACCESS test in such a way that growth across W-APT and ACCESS scores can be measured. (This was never the plan or recommendation by WIDA.) The W-APT, according to WIDA, was designed to be used as one of several criteria to determine eligibility for EL support services; an indicator of a student’s language proficiency level (one through six) on the WIDA continuum; an aid to determining appropriate levels and amounts of instructional services; and a guide for tier placement on the ACCESS test (WIDA, n.d.-b).
The W-APT thus serves as language education policy in multiple ways. First and most importantly, it is the primary mechanism for the identification of eligibility for EL services, thus ensuring compliance and enacting two federal policies: Lau and NCLB (now ESSA). Second, the W-APT works as de facto policy by informing school districts’ placement decisions concerning programs and schools. In some large districts (such as the one studied here), certain high schools are designated as newcomer schools, and students’ W-APT scores determine their high school assignment and trajectory. For instance, if they earn a low W-APT score, their English credits from overseas might not be accepted, they might take more English as a second language (ESL) classes (rather than academic content classes), or they might be placed in lower-level content courses. The ramifications of these practices can be serious for students, undermining high school completion and limiting access to college prep courses (e.g., Callahan, 2013; Kanno & Kangas, 2014). Third, as discussed below, from an ethnography of language policy perspective, the highly scripted nature of the WIDA test works as language policy as it regulates the speech and interactional modes of participants.
Scripted Talk
The W-APT, in contrast to other language tests (e.g., the American Council on the Teaching of Foreign Languages Oral Proficiency Interview; Swender, 2003), is fully scripted; test administrators are to read all instructions and test items exactly as printed in the handbook in English. Only the initial section of the test (assessing students’ speaking skills) allows for flexibility, with test administrators given several options as follow-up questions (e.g., “Tell me more.” or “Can you be more specific?”). The potential limitations of partially scripted or unscripted oral proficiency interviews have been examined in some detail. Brown (2005), for instance, found that International English Language Testing System (IELTS) interviewers were far from neutral, and that “which interviewer an oral interview candidate encounters can significantly affect their chances of achieving a desired score” (p. 243). Moreover, variation in scores across interviewers/raters has been shown to be driven by differences in interviewer discourse, including their elicitation patterns (e.g., complex vs. simple, direct vs. indirect), rapport work (tone, expressions of interest), and speech style (rate, repairs), all of which shaped interaction, including test takers’ performance and ultimately test administrators’ scores. Brown’s research, and that of others (e.g., Brooks, 2009; Delieza, 2012), supports the broader claim that test performance is socially constructed rather than an independent cognitive trait (e.g., McNamara, 1997).
Although the nature of unscripted (or partially scripted) oral proficiency interviews has been widely examined (e.g., Lazaraton, 1992), much less studied is the discourse of scripted interview talk. Nearly all work to date on scripted talk examines how scripts work as curricular materials. For example, scripted curricula (e.g., Direct Instruction, Open Court, Reading Mastery, and Success for All) have been adopted by thousands of elementary schools, generating a growing interest in how scripts shape and constrain classroom interaction and outcomes (e.g., Milner, 2013). In general, scripted curricula tend to focus on explicit, direct, systematic instruction of literacy skills, and are often advertised as a means to improve standardized test scores (Ede, 2006). Findings on effectiveness are mixed, however. For instance, one study of para-educators working with first graders who were identified as “at risk” for reading failure found that use of scripted reading instruction led to higher rates of on-task instructional opportunities for student practice (relative to non-scripted instruction; Cooke, Galloway, Kretlow, & Helf, 2011). Others have expressed concern about rushed pacing and superficial treatment of complex concepts, and the negative ways in which these materials limit teacher creativity, improvisation, and responsiveness to student needs (e.g., Dresser, 2012; Sawyer, 2004). A related line of research has focused on the impact of scripts that are often part of commercial texts in language classrooms. These scripted texts (often audio performances with accompanying transcripts) tend to differ from authentic, unplanned interactions. Generally, unplanned spoken discourse contains more pauses, fillers, and false starts; is less grammatically complex; and has “less careful” phonological characteristics (Wagner, 2014). Experimental work suggests that, at least for intermediate Spanish second language learners, scripted texts were easier to understand than unscripted texts (Wagner & Toth, 2014).
Scripted spoken tests such as the W-APT are designed to constrain test administrators’ speech with the student. Although this potentially strengthens interrater reliability, an important benefit in contexts where little or limited training is provided, little is known about how scripted tests such as the W-APT work in practice. In light of this gap, two research questions drive this study:
Context
This research was conducted in an urban school district serving a large U.S. city in the Upper Midwest. The district serves approximately 35,000 students; 65% of the students are eligible for free or reduced-price lunch, and 33% are ELs, with 96 different languages reportedly spoken in student homes across the district. The region attracts a substantial number of Latino families, mainly from Mexico, Guatemala, and Ecuador, and is home to the largest Somali population in the United States. For some East African students, this is their first experience with formal schooling, while others have attended schools in other U.S. states, European or African countries, or refugee camps prior to arriving in the district. In recognition of the large number of students in the region with limited or interrupted formal schooling, the state recently amended legislation to include a definition of “Students with Limited or Interrupted Formal Education” (SLIFE) as those who usually speak a language other than English or come from a home where the language usually spoken is other than English, who enter school in the United States after Grade 6, who have at least 2 years less schooling than peers, who function at least 2 years below expected grade level in English and Mathematics, and who might be preliterate in their native language(s). SLIFE are a still uncounted but substantial portion of the district’s school population.
All data were collected at the district’s main New Student Center over a 6-month period in 2015. The Center registers students who are new to the district and also handles requests for school re-assignment. All students wishing to enroll in the district must visit a New Student Center, but by far the great majority of families served on any given day are speakers of Spanish or Somali, about half of whom are new to the United States. As part of the intake process, all families complete a home language questionnaire; if a language other than English is spoken at home, the W-APT is administered.
Data Collection, Method and Analysis
Initially, we were hired by the district as consultants to evaluate and improve intake processes for immigrant and refugee high school students. Across a 6-month period, we served as participant observers and engaged in a wide range of activities, including observing parent–student–staff interactions, talking with staff and parents about intake processes, collecting training and testing materials, assisting with translation and interpretation as families moved to adjacent offices (e.g., transportation), accompanying families to the nearby newcomer high school to complete registration processes, and operating the door buzzer and passing out crayons to waiting children. We took detailed notes on all of these activities; we reference the date of these field notes in the sections below. In total, more than 160 hours of observations were logged at the Center between May and October 2015, with late August and early September being the busiest time at the Center. In addition, we conducted more formal, audiotaped interviews with parents (n = 25) and staff (n = 5). Finally, we audio- and video-recorded the administration of the W-APT with 12 students, and kept copies of these students’ written work on the test, their score sheets, and supplementary evaluations of their native language literacy skills. 4
These extensive data informed our understanding of who these students were as well as the training and pressures of the test administrators. However, our analysis here focuses on five focal students, all of whom were entering high school. These students were selected because they illustrate the range of student types and types of interactions observed. The analysis presented here is based on (a) examination of the W-APT materials; (b) discourse analysis of video and audio recordings, and transcriptions of test administration; and (c) review of student written work on the tests. As a case study of local language education policy enactment, we drew from our 6 months of fieldwork at the site, our long-term experiences with similar populations of students, and established qualitative and discourse analysis approaches in multilingual settings (Bigelow, 2010; Bigelow & King, 2015a, 2015b) to examine these data and to present cases of five illustrative students here. We analyzed these texts with close attention to the interaction between social context and language in use, but also with attention to the individual agency of test takers and test administrators (Rymes, 2009). As paid consultants to the district, we simultaneously engaged in coding, description, and analysis of processes and interactions we observed with the twin aims of understanding the interactional patterns described here, and making recommendations for change to the district, suggested here but detailed more fully in our final report.
The W-APT, according to WIDA, serves as both a screener for eligibility for ESL services and as a guide to determine appropriate levels and amount of ESL instructional services. The test assesses students across four language domains (listening, speaking, reading, and writing) and places them on the six-level WIDA Proficiency Scale (1 Entering, 2 Emerging, 3 Developing, 4 Expanding, 5 Bridging, and 6 Reaching). In this district, during the period of the study, most students taking the test received a 1 or 2 (74%). Administration is one-on-one, and takes between 20 and 90 min, with more proficient students taking longer. Test administrators are trained through an online guide, and through peer observation and support. At this site, test administrators had a wide range of backgrounds: Some were recent high school graduates and full-time staff, while others were former ESL teachers, contracted on a daily basis during peak times.
Five Cases
To demonstrate the varied ways in which the W-APT works as language education policy, here, we present mini case profiles of five high school students, all of whom scored a 1 or a 2 on the W-APT. (See Appendix A.) We do not claim that these students were statistically representative of all students tested at the Center (much less the United States); they do, however, illustrate the range of student backgrounds and processes we repeatedly observed. These profiles demonstrate the tensions and limitations of the scripted test, how test administrators grappled with those tensions, and the everyday implementation of this language education policy.
Abdi: Assessing Emergent Literacy
Nineteen-year-old Abdi walked through the doors of the Center 4 days after the 2015-2016 school year had started. Somali is his first language, and although he had been in the United States since June, this would be his first formal school experience. He arrived alone by taxi from his temporary housing, where he is staying with his parents and siblings. He is quiet and nervous as he walks with the test administrator to the back room. The administrator does not speak Somali; there is not much small talk.
They get seated side-by-side at the table, and Jane, an experienced tester and former ESL teacher, begins with the script:
Good morning. We’re going to talk about some pictures in English. When I ask you a question, just answer the best you can. Are you ready? First, we’re going to talk about things high school students do. Here are some pictures of activities high school students do. I’m going to ask you some questions about these pictures. [pointing to a photo of soccer players] What are these students doing?
Abdi is silent. The test administrator goes on to the next questions: “Do you like to play soccer?” “What is he doing?” “Do you like to swim?” Abdi does not respond; Jane turns the page and moves on to the listening section:
Now you are going to do a listening test. I’ll ask you a question and you will point to the correct answer on the page. [points to pictures with her finger]. This says “practice.” Look at the big picture. It will help you understand what I say. The big picture shows people enjoying a day in the park. Now, when I say, “letter A,” find letter A in your test booklet. Letter A. Can you point to the letter A?
Abdi looks down at his test booklet, motionless. Jane moves on to the reading portion of the test. Abdi is not able to complete the practice exercises and does not go on to do the test items for either of these sections. Abdi receives 0 points and a score of 1 for all three sections.
Jane advances to the fourth and final section of the test, which focuses on writing. Rather than follow the script:
Now you are going to take a writing test. It is important that you do the best you can. On this page you will read about a special event that Emily planned. On this page you will write about a special event that you can plan. Look at the directions in the black box at the top of this page . . .
Jane says, “Now let me see if you can write something. Can you write your name?” She hands him the (party planning) answer sheet and points to the first line. He slowly, carefully prints his first name, using capital and lowercase letters appropriately, but with some letters above and others below the line. This takes him a full minute. Jane thanks him and takes the sheet. He is also given a native language assessment (in Somali), which asks him to answer short biographical questions in Somali. He is unable to complete this, writing a large letter M shape as an answer to the first question (his name). Because of his lack of transcripts, overall W-APT score (1), and age (19), Abdi is placed as a ninth grader in the city’s all newcomer high school.
What is most salient in this interaction is how the formal testing apparatus of the W-APT was long and unwieldy given the student’s low English proficiency levels. In response, the experienced tester adapted (and violated) formal guidelines, for instance, by not going through with the listening and reading sections based on practice performance, and by not following the script for the writing component of the exam, instead asking Abdi to write his name. As she explained afterward, she views the invitation writing task as “weird, inappropriate and contrived,” noting that her son with a U.S. MA degree never wrote letters or addressed envelopes, so how was someone who just arrived expected to write out a formal invitation? (Field notes, September 2, 2015). More broadly, this interaction suggests the ways that experienced testers negotiate and adapt the W-APT. Although these practices contradict the testing instructions in the technical manual, in some cases, they result in a more humane and probably equally accurate evaluation.
Bakalcha: Off Script Adjustments
Bakalcha, also 19, recently arrived in the United States. He is a native speaker of Oromo and comes to the front desk of the New Student Center with no appointment, but with a white plastic bag from the United Nations (UN) International Organization for Migration, jammed full of documents. School started in the district about 2 weeks ago. Bakalcha is sharply dressed with new black athletic shoes with still very white stripes and a new-looking black leather jacket (although it is summerlike outside). It takes nearly an hour to complete intake paperwork, in part because he has a complicated medical history. A bilingual woman, a few years older than Bakalcha, accompanies him and helps to answer questions about his background: “Has he lived in a refugee camp?” (yes), “Has he missed more than a month of school?” (yes), “What other languages does he speak?” (Amharic). After more waiting and signatures, Bakalcha (B) is led to the back testing room by a newly hired staffer and test administrator, Mohammad (M), a Somali–English bilingual man in his mid-20s. The administrator has a casual, relaxed tone as they settle in.
Although there are many notable features of this interaction, here, we focus on the ways that Mohammad simultaneously makes the test (and text) both more and less challenging (see Table 1). For instance, the administrator (M), perhaps in an attempt to be less formal, to minimize social distance, and to make test interaction less stressful, repeatedly goes off script. In doing so, however, his language becomes more complex. For instance, he uses colloquial expressions such as “these guys” (turns 11 and 37) and discourse connectors such as “so” (turns 3, 5, 22, 39). He makes questions more indirect by embedding them (e.g., “Can you tell me what else do you see in these pictures?” [31] “Can you tell me what other things you see in these pictures?” [35] “Can you tell me what these things are?” [22]). This lengthens and grammatically complicates his requests. He also makes his language more complex by turning simple information questions from the script (e.g., “What is this?”) into embedded commands (e.g., “And tell me what this is?” [24]).
Testing Excerpt 1 (Bakalcha).
Note. Text in italics corresponds to test script. See Appendix B for transcription guidelines.
Mohammad’s modifications are exacerbated by his very generous interpretation of the scoring system, awarding Bakalcha two “meets” (expectations) and three “?” (a temporary placeholder if the tester is unsure how to score), and only two “approaches” in the initial speaking section. According to W-APT guidelines, “meets” (p. 30) should be awarded when a student meets the expectations of the task in quality (i.e., the type of discourse produced) and quality in terms of vocabulary, usage, and control of the language. In general, “meets” should be awarded if “the student is able to give a performance that leaves little doubt that he or she would be able to understand and attempt a response to the task at the next higher level” (p. 30). “Approaches,” in contrast, corresponds to responses that are characterized by “giving brief answers when more extended responses are expected, groping for vocabulary and structure, [or] excessive hesitating” (p. 30). Given that this student cannot produce more than one-word answers to any of these questions, “approaches” would be a more accurate score. However, Bakalcha continued to move on in the test because he was awarded three “meets” by Mohammad, and a speaking level (raw score) of 2, supposedly characterized by “phrases, short oral sentences; general language related to content area; when using simple discourse, language is generally comprehensible and fluent” (p. 27). There is little correspondence between these scoring criteria descriptions and the student language here.
For the final writing portion of the test, Mohammad hands him the party invite worksheet and says:
All right, here you go. So now, think about some special events in your life or at school. Choose to write about one. This picture shows some examples, so you can write about a school event, a sport event, uh some kind of entertainment event, plan your special event here, so name the event, name your event, so this is your event, you’re going to name your event, the event, name the event, name the occasion. So people to invite, write the people’s names to invite, when is the event, where is it located, and any other information, and the second part, write a note to invite a friend. If you can, make at least four sentences . . . Good? So let me know when you’re done, OK? [silence] Do you have any questions? All right!
Mohammad leaves him alone with the paper and returns 2.5 min later. Bakalcha has written nothing on the paper, and the test is stopped. Mohammad notes on the scoring sheet that the student is “unable to write,” and Bakalcha receives an overall score of 1.8.
As a final step, we (the researchers) ask the student to write a little in Oromo about his family. Bakalcha quickly and fluently writes in English: “Oromo. My father name Budri My mother name Weyio My brother name Abdiso My sister name Bicuu.” After a few minutes, we ask him to write again in Oromo; he writes the same basic phrases immediately below the English. His work indicates that he mastered the basic mechanics of writing, with fluent penmanship, clear word boundaries, and sentence boundaries as indicated by vertical line spacing. The structure and set-up of the writing task in the W-APT did not allow this student to demonstrate his writing competencies. Because of his lack of transcripts, overall W-APT score (1.8), and age (19), Bakalcha is placed as a ninth grader in the city’s newcomer high school.
Paolo: Code-Switching Through a Monolingual Test
Paolo (P), who will be 17 when school starts in September, arrived in the United States in March 2015, crossing the border on foot and then by private car to reunite with his mother. This is his second visit to the New Student Center in August 2015; he has completed all paperwork and is here to take the W-APT. He is a bit nervous as he enters the testing room, and the bilingual test administrator, Silvia (S), chats with him in Spanish to put him more at ease and to clarify his educational history. He reports that he attended school in Mexico since age 4, but changed schools frequently in recent years as he moved from Puebla to Veracruz with his grandparents, and can remember the names of two of the three secondary schools he attended. The last time he attended school was in Veracruz in January 2015. This initial interaction takes about 5 min, and is relaxed and friendly, punctuated by frequent laughter. Silvia’s tone then becomes more serious as she explains the purpose of the test (see Table 2).
Testing Excerpt 2 (Paolo).
Note. Text in italics corresponds to test script; underlined text is translated from Spanish to English. See Appendix B for transcription guidelines.
This excerpt highlights how resourceful and communicative Paolo is as a test taker. This is particularly notable in light of the bilingual environment created by the test administrator (S). After nearly 6 min of chatting in Spanish, she “flips the switch” to English, and follows the monolingual English W-APT script. Paolo, however, continues to operate in a bilingual frame, answering some questions in both English and Spanish and seeking clarification in Spanish, for instance, “Mhm I don’t know. Cómo se dice nadar?” (turn 14), “Viendo TV? Si?” (turn 28), and “Todas?” (turn 30).
This presents a challenge to Silvia, as she clearly understands but is constrained by test protocol. She negotiates a compromise by responding to his Spanish questions in English (e.g., offering the English word for “swim,” and clarifying which picture he should look at through English and gestures [turns 15, 21, 23, 29]). Later in the transcript, however, she uses small amounts of Spanish, mostly for praise and re-direction (e.g., No te preocupes [translation: Don’t worry.]). Although only English-language responses are counted within the W-APT scoring rubric, Paolo’s use of Spanish indicates not only his comprehension of some of the materials but also his strong overall communicative skills.
Also notable here are Paolo’s literacy and general school skills. Although he received the lowest possible score for his writing and reading in the W-APT, there were many indications of his strong literacy skills. Most obvious were his attempts to read the test administrator’s script and questions. In Turn 24 here, and again later in the test, Silvia re-directs his gaze away from her printed script toward the student test materials. By Turn 221 in the exam, Silvia moves her chair and holds the paper at an angle so that he cannot see it.
Paolo struggled for nearly 10 min with the party invitation task, completing it only partially; however, he was able to write quickly and fluently in Spanish about his family, with complete sentences and mostly standard punctuation:
Mi padre se llama Paolo [last name]. Mi madre es Rosa [last name]. Tengo un hermano mas pequeño que yo llamado David tiene 12 años de edad. el esta estudiando La Secondaria y por circuntencias Familiares mis padres no viven Juntos pero si tenemos una buena communication.
He wrote this rapidly and without careful editing. Although there were non-standard features (e.g., uneven punctuation and spelling), he clearly had a strong command of basic literacy skills in Spanish. Yet, because of his lack of transcripts, overall W-APT score (1), and age (17), Paolo is placed as a ninth grader in the city’s newcomer high school.
Hamse: Testing Transfers
Hamse is an 18-year-old who arrived in the United States in January of 2014. He enrolled that same month in one of the city’s international charter schools that serves mostly recent arrivals, and attended that school for a year and a half. He now wishes to transfer to one of the city’s comprehensive high schools, where he says he has more friends. He reports that he attended school in Kenya regularly prior to coming to the United States. Because he is a transfer student with Somali as his home language, he is required to take the W-APT. His test administrator is a highly experienced former ESL teacher who works on a contract basis, Betsy. They chat in English as they walk back to the testing room.
Hamse is able to talk about concrete items (swimming, soccer, boats) with ease, but struggles to identify and explain what is a fact and what is an opinion, and is given a “3” on the speaking portion of the test. They move on to the listening portion; the first section involves matching the test administrator’s spoken text to one of three line drawings. The test administrator reads the script quickly:
Look at the big picture. It shows a fire truck in front of a school building. I am going to say what a teacher might tell you to do for a fire drill. Look at the top of the next page. Number 1. When you hear the alarm, line up by the door so we can go outside together. Which picture shows what you should do?
Hamse must then choose between three line drawings depicting (A) a boy with a backpack walking down a hall, (B) a group of students by a door, and (C) an adult pulling a fire alarm. He correctly chooses B.
Betsy moves on: “Number 2. Now we will line up on the lawn. After you hear the principal say ‘All clear,’ we will return to our classroom. Which picture shows what you should do after you hear ‘all clear?’” Hamse must choose between pictures of (A) five students sitting at a table; (B) three students lining up by what might be a wall, trailer, or bus; (C) three students lining up by a door; and (D) five students standing around a teacher by a classroom door. Hamse, like many of the students we observed, does not choose the correct answer (C). He does, however, earn enough points in this section to move on to Part B, where he misses only one of the three items (a synonym for “gargantuan”). This is sufficient to move on to Part C. Here, he also misses only one of the three (“What is a similar function for flowers and cones?”), and for Part D, he again misses only one of the three (“What is the outcome for Cuba and for Puerto Rico at the end of the Spanish-American War?”). At the highest level of the listening test (Part E: Babysitting), students are asked to solve problems around feeding a baby, including converting gallons to ounces, adding and subtracting fractions, and solving algebraic equations. These questions, like some in the reading portion, demand both English listening and academic math skills. Hamse is unable to answer any of these correctly, and gets eight of the 15 questions correct on this section overall, earning a raw score of 2 for listening.
For the reading portion of the test, Hamse successfully answers questions demanding interpretation of a web address, pie chart, and table of contents. However, he misses questions that ask him to interpret a line graph, to extrapolate from an advertisement, and to calculate travel speeds. Hamse gets seven of the 15 questions correct on this section overall, earning a raw score of 1 for reading. He completes his writing assessment fluently and quickly, writing, “Dear: Risalli, Mali, I have a happy birthday part on friday at 8.30 AM. Have you can come to my part and have a fun with me. Please let me know if you want a ride. Thanks. Sincerely, Risalli, Mali.” This receives a score of 3. His writing on his native language assessment is also fluent, with neat script, clear word boundaries, and mostly regular use of capitalization; he writes additional text below the lines on the form when describing his family. Hamse’s overall adjusted W-APT score is 2.35. Because of his age (18), he is assigned to the newcomer high school that serves older teens.
Hamse’s W-APT performance and score illustrate how challenging the test is for beginning- to intermediate-level students. There are several dimensions here. First, certain items seem problematic. For instance, the “return to our classroom” question measures students’ ability to make inferences from ambiguous line drawings as much as their listening skills. Second, many aspects of the language test actually assess content knowledge in academic areas such as math. Students’ English assessment is thus dependent upon their formal math skills. Moreover, the scoring is set up in such a way that missing even a few questions means students receive a 1 or 2. Because students who get anywhere from one to eight correct all receive a score of 1, there is little to no differentiation at the beginning, entry, or emerging levels. For instance, all five of the students profiled here, including long-term English language learner Cristina (below), earned fewer than nine correct on the reading portion and thus received a raw score of 1 on the reading test, despite having widely variable literacy skills.
Cristina: Long-Term English Language Learner
Cristina is 14 years old and nervous. This is her second time this week at the Center. There were not enough test administrators on-site for her to complete the W-APT on Monday, the first day of school, when her family came for the first time to the Center to register. Cristina moved from Morelos, Mexico to Brooklyn, New York when she was 6. She attended school for 1 year in Mexico, and 8 years in the United States. Because one of her home languages is Spanish, and she is new to the district, she is required to take the W-APT. She is assigned an experienced contract tester and former ESL teacher, Karen, who walks her to the back testing room and chats with her in English along the way.
Cristina is one of the very few students we observed who scored a 6 on the speaking portion of the test. She was able to provide meaningful and fluent answers to questions such as “How do you tell the difference between a fact and an opinion?” Cristina explained, “Um, an opinion is uh, it’s your, like something you believe? Something you say? And a fact is something real life and everybody knows it’s true.” For the listening component, Karen skipped the practice portion of the test and moved directly to the “fire drill” section, which requires students to match spoken text to pictures (Part A), as described above. Like Hamse, Cristina misidentified the “return to our classroom” drawing, but answered enough items correctly to continue to the next sections. In Part D, on the Spanish-American War, she gets two out of three incorrect. Here, students need to match the administrator’s extended spoken text with sentence-long statements and a map. Cristina correctly answers the map question but misses the two questions that require reading, earning a total of 8 points on this section and a raw score of 2 for listening. This sort of large disparity between speaking and listening skills was common, according to test administrators, and is perhaps due to the fact that the listening component of the test demands substantial reading.
In line with this explanation, Cristina found the reading section more challenging. Although she was able to answer all of Part A correctly (e.g., reading and making inferences about a website name, pie chart, and graph), the next section (Part B), analyzing the tables of contents of biographies of Michelangelo, Van Gogh, and Monet, is much harder for her. She gets only one of three correct, giving her a raw score of 1 for reading.
For the final writing section, planning a special event, she wrote, “Dear Kathy: Hey there, my little cousin is having his 1st year birthday party. I would love if you can come to the party on July 15 at the park at 2:00pm. I hope you can make it. Love, Cristina.” She scored a 3, which according to WIDA characterizes writing that “shows an attempt at organization of sentences into paragraphs. The student’s intent is evident and generally successful. Although the sentences may be more complex, they may be unrelated or not cohesive” (p. 33). She wrote out this invitation quickly and fluently, in marked contrast to her writing on her native language literacy assessment, which she struggled to comprehend and complete, and which she answered mostly in English. For instance, for the last question of the native language assessment asking her to write about her family, she printed, “I really don’t know what to say about them. Mi familia es (scratched out). Tengo 2 hermans 1 hermanas.” Because of her age (14) and W-APT score (2.6), she is placed in a mainstream high school as a 10th grader and is eligible for EL services.
Cristina’s overall W-APT score of 2.6 was between “emerging” and “developing” on the WIDA Scale. Her testing experience raises several issues concerning the idiosyncrasies of the test. For instance, some of the materials, and drawings in particular, are low-quality and poorly designed (e.g., hard-to-decipher line-drawn visuals; paper copies in binders, which do not lie flat). These features potentially affect both validity and reliability. Furthermore, the nature of the scoring system collapses a very wide range of performances into Categories 1 and 2. For instance, Cristina, who could understand and explain a range of academic topics, only scored a 2 on listening, in part because the band for Category 1 (0-7 correct) is so wide. This score is also a reflection of the fact that the listening portion of the test demands substantial reading. Moreover, Cristina, who could read a graph, interpret a web address, and answer questions about a pie chart, scored a 1 on reading, the same score as students profiled here who never participated in formal schooling. As we discuss below, this suggests limitations in what the W-APT can (and cannot) tell us about ELs.
Discussion
These five cases illustrate how the W-APT is enacted through routine interactions between test takers and test administrators, and suggest what the W-APT tells us (and does not tell us) about EL students. With respect to how this particular language policy is enacted (Research Question 1), we found that test administrators and test takers engaged with the W-APT in ways that deviated from established WIDA guidelines, but also sometimes made local, immediate sense. Although some administrators adhered closely to the script, many made on-the-spot adjustments depending on perceived needs and capacities of the test takers, as we saw above with testers Jane and Mohammad. In interviews with us, experienced test administrators questioned the ordering of test items, as well as the lack of context for many questions (e.g., students must understand a math problem read to them with no visual support; field notes from June 11, June 23, and August 23, 2015). Furthermore, the black and white line drawings were widely believed by testers to be difficult for students to interpret, especially those new to school and/or print literacy (field notes from June 11 and June 23, 2015), a phenomenon that we also observed widely and that is evident above. Test administrators’ level of familiarity (and frustration) with the W-APT seemed to correspond to the types of modifications they made. Overall, as evident here, experienced testers such as Jane were more likely to minimize test requirements (e.g., skipping practice questions, modifying writing instructions), while more novice testers, such as Mohammad, were more likely to require students to complete the entire test and to generously interpret students’ minimal responses. These decisions not only extended the test length but also potentially inflated students’ scores, as we saw with Mohammad and Bakalcha.
Although language policy is enacted through the W-APT, and in light of the constraints of the script and limitations of the test, administrators developed a range of “work arounds,” with varied interactional effects. It should be noted that some test administrator discretion is encouraged by WIDA. For instance, the tester has discretion to not continue with other parts of the test if the student “struggles” with speaking (WIDA-ACCESS Placement Test W-APT Test Administration Manual, 2014, p. 13). Likewise, the guidelines advise administrators to “speak as slowly and clearly as you would in class or in conversation with a student” (WIDA-ACCESS Placement Test W-APT Test Administration Manual, 2014, p. 13), a directive that might well be variably interpreted. In addition, the test administrators we observed were often sensitive to the fact that students frequently feel nervous or disoriented during the test, which also can result in spontaneous changes in the script. As suggested above, experienced test administrators often made adjustments according to students’ perceived skills. For instance, rather than having students go through the entire party planning example and set-up, experienced testers said things such as “we’re going to see if you can write here” and gave students the paper (e.g., Jane and Abdi). In turn, we saw how less experienced test administrators also went off script, but did so in ways that made the test more difficult. As suggested by past work, authentic, unscripted text can be more difficult for language learners to understand (Wagner & Toth, 2014); here, for instance, we saw how Mohammad’s use of informal and indirect language tended to make questions more rather than less challenging for test takers.
Other pairs of test takers and test administrators explicitly or implicitly bilingually negotiated the imposed English monolingual context of the test. For instance, we saw how Silvia and Paolo started off building rapport in Spanish, and then worked hard to stay in English, with Paolo using all of his resources, including English, Spanish, body language, and humor to engage with the test and with Silvia. From one vantage point, these code-switching practices could be understood as translanguaging, which refers to “the dynamic ways in which bilingual interact with the world translingually, beyond the two language systems that are assumed in traditional definitions of bilingualism” (García & Hesson, 2015, p. 227). However, the W-APT, like most assessments, is built upon the notion that learners’ languages can and should be compartmentalized into two separate, bounded systems. The artificiality of this assumption is evident both in the interaction here, and in the abundant data indicating that both languages are always active for bilinguals (Bialystok, 2009). Nevertheless, W-APT test administrators are confined to English prompts and instructions, and only test takers’ English language utterances are scored. Paolo’s attempts to introduce and draw from all his language resources and skills here were not recognized or ratified in the monolingual testing environment.
With respect to what the W-APT tells us (and does not tell us) about EL students (Research Question 2), our analysis indicates that the W-APT provides school administrators and teachers with some very useful information about their new students. Most importantly, it gives a rough sense of students’ English language proficiency across the domains of reading, writing, speaking, and listening. Because the W-APT is aligned with WIDA standards, this score provides an approximate indication of their academic English strengths and capacities relative to the academic curricula in use in many districts. Furthermore, data here suggest that scores generally do differentiate between those students who have had some previous schooling in the United States, who tend to score a 2 (e.g., Cristina and Hamse), and those who are new to school in the United States, who often score a 1 (e.g., Abdi, Bakalcha, Paolo).
However, the data here also suggest that there is much that the W-APT, in its current version, does not tell us, especially when we consider the wide range of EL students the test purportedly serves. Most obviously, it does not effectively differentiate between students with previous schooling outside the United States and those with no schooling whatsoever. Among students who are still acquiring literacy skills, the test is not sensitive to varied reading and writing skills in any language. Although most observers or test administrators could quickly note (and with an appropriate rubric, score) many aspects of students’ print literacy (e.g., page orienting, turning page, holding pencil, aligning gaze to the tester’s instructions, basic alphabetic awareness), these are not accounted for in the current form of the test. For students who are new to print and for some SLIFE students, the test is insufficiently sensitive.
More broadly, this raises concern about the nature of the EL category, and how this category is constructed through federal educational policy and local language testing policy and practice. For instance, the needs of Cristina, who has strong speaking skills and multiple years of formal schooling in the United States, are in many respects distinct from those of the other students who scored similarly on the test. Cristina generally fits the definition of a “long-term English language learner” (LTEL). 5 Although this term has become more widely used in recent years (Cushing-Leubner & King, 2015), its productivity (and possible use as a deficit label) is contentious (Flores, Kleyn, & Menken, 2015). Nevertheless, for a potential LTEL like Cristina, who has not missed more than 2 years of formal schooling, yet is substantially behind her peers, it is important to note that her W-APT score is no different from that of some SLIFE students. This fact calls into question both the utility and basis of the terms (e.g., LTEL and SLIFE) and the assessments used to identify and place students.
Furthermore, as noted by many of the test administrators, the scoring system is set up so that, in particular for the listening and reading tests, the “buckets” for a Level 1 score are quite large. Students who earn between 0 and 7 points (for listening) and between 0 and 8 points (for reading) all receive a 1 for their raw score. This lack of sensitivity in the test creates a strong floor effect. 6 At our site, of the 184 high school students tested from August to November of 2015, 136 (74%) received a 1 or 2 (17% received a 3; 8% a 4; 1% a 5; and none a 6). Indeed, experienced testers reported that in multiple years of testing hundreds of high school students, they had only seen a handful of 6s (field notes from August 23, 2015). (No national data exist.)
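As a rough illustration of how wide these Level 1 bands are, the short Python sketch below uses only figures reported in this article: the 15-item length of the listening and reading sections, the Level 1 cutoffs (0 to 7 correct for listening, 0 to 8 for reading), and the site counts (136 of 184 students). The function and variable names are hypothetical, and treating every possible number-correct total as equally weighted is a simplifying assumption, not a description of how students' scores are actually distributed or of WIDA's scoring procedure.

```python
# Illustrative sketch only. The cutoffs, section length, and site counts are
# taken from the article; all names here are hypothetical.
SECTION_ITEMS = 15  # items in the listening and reading sections
LEVEL_1_MAX_CORRECT = {"listening": 7, "reading": 8}  # highest total still scored as 1


def level_1_share(section: str) -> float:
    """Fraction of possible number-correct totals (0..15) that map to a raw score of 1."""
    totals_in_level_1 = LEVEL_1_MAX_CORRECT[section] + 1  # totals 0 through the cutoff
    possible_totals = SECTION_ITEMS + 1  # totals 0 through 15
    return totals_in_level_1 / possible_totals


def share_of_students(n_in_group: int, n_tested: int) -> float:
    """Proportion of tested students who fall in a given score group."""
    return n_in_group / n_tested


listening_share = level_1_share("listening")
reading_share = level_1_share("reading")
site_share = share_of_students(136, 184)  # students at this site scoring 1 or 2

print(f"Listening totals scored as 1: {listening_share:.0%}")
print(f"Reading totals scored as 1: {reading_share:.0%}")
print(f"Students at this site scoring 1 or 2: {site_share:.0%}")
```

Under these assumptions, at least half of the possible number-correct totals on each section collapse into a single raw score of 1, which is consistent with the floor effect described above.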
Finally, we noticed large qualitative differences between students who scored, for example, a 1 versus a 1.3; in contrast, the differences between students scoring 2 and 2.5 were much smaller. In other words, the differences among students scoring 1 were dramatically wider compared with those at any other level. The constraints and properties of the test, including its scripted nature, scoring system, and variable administration by staff, together resulted in limited differentiation at these lower levels. It is perhaps in part for these reasons that educators who work with newcomers often tell us that the W-APT scores are not meaningful to them; they often need to move students to different levels, and to double-check their proficiency across modalities. Although the administration and scoring of the W-APT vary for the reasons suggested previously, the cases in this article illustrate why the near blanket scores of “1” and “2” have little practical value for educators.
Conclusion
The W-APT works as language education policy in powerful and unanticipated ways. The W-APT was not developed as a standalone placement device: The guidelines state that it is “one of multiple criteria in determining eligibility for ELL services and program placement” (“WIDA-ACCESS Placement Test W-APT Test Administration Manual,” 2014, p. 6). Furthermore, the development of the W-APT was never a central component of WIDA’s mission, and in contrast to the ACCESS test, which undergoes constant updating, field testing, and rigorous item analysis, the W-APT was developed by a handful of people on a small grant to fill the need for a basic screener for ESL services (E. Cranley, personal communication, September 11, 2015).
In the decade since its development, its use has grown exponentially, largely as the result of NCLB’s requirement that all states measure the annual growth of ELs. A rapidly expanding number of states have joined the consortium, many to utilize the ACCESS test; the W-APT is an additional bonus. Yet, although the W-APT was never designed to be the sole determiner of ESL service eligibility, it has become that in many districts. For instance, in schools we visit, students are referred to by their WIDA scores (e.g., as “ones” or “twos”), and in larger districts such as the one studied here, students’ scores can determine where they might attend high school or how their credits are evaluated. The W-APT, despite its limitations, serves as the main determining factor for school placement and level placement. In this sense, the test has become the policy.
This article has offered an up-close look at how a major assessment, widely used to measure and place students across the country, works in one particular district. It is possible, even likely, that practices vary across sites. For instance, in smaller, rural districts, teachers or EL specialists administer the W-APT. A limitation of this study is its focus on one large, urban site. Nevertheless, our analysis suggests the W-APT is problematic in several respects, most obviously because the test does not differentiate adequately across students with widely divergent literacy skills and formal schooling experiences. As we saw here, students with remarkably varied literacy skills earned nearly identical scores (e.g., Paolo and Abdi). And as noted by others experienced in working with SLIFE, the test tends to underestimate students’ skills and is not sensitive in particular to students who are preliterate or acquiring literacy (Karlson, 2015). As a scripted test, the W-APT constrains both the quantity and quality of talk produced by participants, and shapes interaction in unexpected ways as both test takers and test administrators negotiate these constraints and their prescribed roles in the testing event.
Taken together, these findings point to the need for additional research on the usefulness, validity, and reliability of the W-APT as a screener and placer. This is an urgent task for researchers as its use is likely to grow in the coming years under ESSA requirements. Furthermore, the growing diversity of ELs, in particular with respect to their prior experiences with literacy and formal schooling (e.g., Advocates for Children of New York, 2010), makes the need for a valid and high-quality assessment even more acute. For practitioners and administrators, in turn, these findings suggest the need to invest in training testers carefully and regularly, and most importantly, to take these scores as just one of multiple data points in making important decisions about student needs, strengths, and appropriate placements.
Footnotes
Appendix A
| Student name | W-APT score | NLLA | Tester | Formal schooling | What does this case suggest? |
|---|---|---|---|---|---|
| Abdi: Assessing emergent literacy | 1 | Not able to complete/respond | Jane | None (SLIFE) | Test not sensitive to low literacy and low English level; experienced tester makes adjustments to accommodate |
| Bakalcha: Off-script adjustments | 1.8 | Some Oromo phrases | Mohammad | Limited (SLIFE) | Inexperienced tester inadvertently makes test harder (Excerpt 1) |
| Paolo: Code-switching through a monolingual test | 1 | Fluent although non-standard writing | Silvia | Some secondary schooling in Mexico | Bilingual participants struggle to stay in monolingual frame; strong L1 literacy skills not captured by assessment (Excerpt 2) |
| Hamse: Testing transfer | 2.35 | Fluent writing in Somali | Betsy | Some formal schooling in Kenya; 2 years in the United States | Challenging test, which includes much academic content; wide bands for Level 1 (for listening and reading in particular) |
| Cristina: LTEL | 2.6 | Mostly completes in English with a few words in Spanish | Karen | 1 year in Mexico; 8 years in the United States (LTEL) | Even student with substantial school experience and strong English skills confused by some of the drawings; wide band for Levels 1 and 2 |
Note. W-APT = WIDA–Access Placement Test; SLIFE = Students With Limited or Interrupted Formal Education; LTEL = long-term English language learner; NLLA = Native Language Literacy Assessment (King & Bigelow, 2016).
Appendix B
Transcription conventions
CAPS spoken with emphasis (minimum unit is morpheme)
. falling intonation at the end of words
, rising intonation at the end of words
? rising intonation in clause
-> continuing or flat intonation (as in lists)
! animated tone, not necessarily an exclamation
:: elongated sound
[ ] transcriber’s comment [nods head, eye gaze, etc.]
xxx incomprehensible
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
