Abstract
The focus of this paper is on the design, administration, and scoring of a dynamically administered elicited imitation test of L2 English morphology. Drawing on Vygotskian sociocultural psychology, particularly the concepts of zone of proximal development and dynamic assessment, we argue that support provided during the elicited imitation test both reveals and promotes the continued growth of emerging L2 capacities. Following a discussion of the theoretical and methodological background to the study, we present a single case analysis of one advanced L2 English speaker (L1 Korean). First, we present overall scores, which include three types: an “actual” score, based on first responses only; a “mediated” score, which is weighted to account for those abilities that become possible only with support; and a learning potential score, which may be used as a predictor of readiness to benefit from further instruction. Second, we illustrate how an item analysis can be useful in developing a detailed diagnostic profile of the learner that accounts for changes in the learner’s need for, and responsiveness to, support over the course of the task. In concluding, we consider the implications of our approach to dynamically assessing elicited imitation tasks and directions for further research.
In this paper, we discuss an ongoing research project that explores the evaluation of the grammatical competence of advanced second language (L2) English speakers who are undergraduate students at a research-intensive university in the northeastern United States. By the moniker “advanced,” we mean students who have been admitted to the university as regular degree-seeking undergraduates but for whom English is a second or additional language. The students have self-identified as L2 or multilingual users of English by virtue of their enrollment in an undergraduate course on reading and writing academic English that is designed for multilingual students. 1 Although the students in the course are generally considered to be highly communicatively proficient in English, they nonetheless encounter a number of difficulties related to syntax and morphology. The principal aim of our project is to develop a diagnostic test of L2 English grammatical competence that is sensitive to both fully formed and emerging capacities. Here, we report on an initial phase of the project.
Our specific focus in the present paper is on the design, administration, and scoring of a dynamically administered elicited imitation (EI) test. Drawing on Vygotskian sociocultural psychology (Vygotsky, 1978, 1986; for L2, Lantolf & Thorne, 2006), particularly the concepts of zone of proximal development (ZPD) and dynamic assessment (DA) (see, e.g., Poehner, 2008), we argue that support provided during the EI test both reveals and promotes the continued growth of L2 capacities that are not yet fully under a test-taker’s independent control. We therefore diverge from previous research on EI in L2 testing (e.g., Erlam, 2006, 2009; see below) that focuses primarily on the use of such tasks to tap into the implicit linguistic competence of learners (i.e., procedural, nonconscious linguistic competence). Instead, we emphasize the role of support, or human mediation, in linking conscious knowledge of language (i.e., metalinguistic knowledge) to performance abilities. In other words, because DA fuses assessment and teaching as a single, unified dialectical activity (Poehner & Lantolf, 2010), the present paper has a dual focus on the diagnostic and instructional functions of the dynamically administered EI test.
Following a discussion of the theoretical and methodological background to the study, we present a single case analysis of one advanced L2 English speaker (L1 Korean), Kwanghoon (a pseudonym). First, we present overall scores, which include three types: an “actual” score, based on first responses only (i.e., independent performance); a “mediated” score, which is weighted to account for those abilities that become possible only with support; and a learning potential score (Kozulin & Garb, 2002), which captures a learner’s responsiveness to mediation and may be used as a predictor of a learner’s readiness to benefit from further instruction. Second, we illustrate how an item analysis can be useful in developing a detailed diagnostic profile of the learner that accounts for changes in the learner’s need for, and responsiveness to, support over the course of the task (e.g., learning during the test).
Elicited imitation
EI tests elicit utterances from test-takers by prompting them to repeat strings of words, phrases, or sentences. EI was originally adopted in research on children’s first language acquisition, but it has since been extended to neuropsychological research and L2 research (Bley-Vroman & Chaudron, 1994; Erlam, 2006; Schwartz & Daly, 1976). In studies of L2 development, EI has been described as an instrument that tests learners’ knowledge of specific grammatical features, learners’ interlanguage systems, and the memory system (Bley-Vroman & Chaudron, 1994; Hamayan, Saegert, & Larudee, 1977; Munnich, Flynn, & Martohardjono, 1994).
More recently, EI has been proposed as one method of assessing L2 learners’ implicit grammatical competence by targeting specific morphological and syntactic features that represent different developmental stages (Ellis, 2006; Erlam, 2006, 2009). In this regard, Erlam (2009) notes that “[t]he crucial question is whether imitation requires participants to decode and interpret the stimulus before they reproduce it, or whether they can merely repeat the stimulus verbatim without having comprehended it” (p. 66). In other words, at issue is whether, and to what extent, imitation pushes test-takers to reconstruct the stimulus via their internal L2 grammar (Erlam, 2006; Munnich et al., 1994). Reconstructive EI, according to Erlam, means that repeating the statement is not accomplished by rote memorization, which is subserved by working memory (consciousness), but instead involves procedural competence during language production.
Jessop, Suzuki, and Tomita (2007) argue that during EI, participants engage in three main cognitive processes: (1) processing a stimulus sentence; (2) reconstructing the sentence internally with their own grammar; and (3) reproducing it in speech. Because of their reconstructive nature, EI tests are therefore likely to measure implicit linguistic competence, or at least to minimize the potential for a test-taker to rely on rote memorization and/or explicit/declarative L2 knowledge (Erlam, 2006, 2009). Accordingly, reconstructive EI requires a primary focus on comprehending the meaning of the stimulus before imitation, entails a delay between the stimulus and the imitation, is time-pressured, and assumes that ungrammatical stimuli will be corrected in imitation (Erlam, 2006, p. 472). Additionally, stimuli ought to exceed test-takers’ short-term memory capacity, target features should be embedded in the middle of the stimulus rather than at the beginning or end, and clear instructions should be provided (Tomita, Suzuki, & Jessop, 2009).
In the present study, we follow the procedures outlined by Erlam (2006, 2009; see also Ellis, 2006) (see below for specific methods). Erlam focused learners’ attention on meaning by using a “beliefs questionnaire.” In essence, for each stimulus (a statement), test-takers were prompted to state whether the statement was true, not true, or if they were not sure before being asked to repeat the statement in correct English. This also provided a delay between the presentation of the stimulus and the imitation. The imitation was also time-pressured to reduce the potential for conscious reflection on form. Finally, the EI test included grammatical as well as ungrammatical stimuli in order to test whether or not participants were able to correct ungrammatical forms during reconstruction. Our administration of the EI test differs significantly, however, in that dynamic assessment was used (see below). Briefly put, if the test-taker encountered difficulty in reconstructing the stimulus on the first attempt, the test administrator intervened to provide graduated assistance. We therefore acknowledge that the results gained from our test are not likely to represent a learner’s implicit L2 competence, at least in attempts beyond the first, but that is beside the point. As stated at the outset of the article, our goal in designing the test was to develop an approach to assessing L2 grammatical competence that includes both independent (and potentially implicit) competencies as well as still-maturing (and likely consciously controlled) abilities that are only revealed through collaboration with the test administrator.
Dynamic assessment
General principles and approaches
Dynamic assessment (DA) derives from Vygotsky’s proposal of the zone of proximal development (ZPD). Briefly put, the ZPD encompasses emerging capacities that are in the process of maturing but are not yet under independent control: “The [ZPD] defines those functions that have not yet matured but are in the process of maturation, functions that will mature tomorrow but are currently in an embryonic state” (Vygotsky, 1978, p. 86). In other words, the ZPD exists because certain maturing functions that have begun to develop are present (Chaiklin, 2003), but they still require support from a more competent collaborator.
Vygotsky (1978) therefore argued for a dual evaluation of a learner’s capacities that revealed, on the one hand, the zone of actual development, or ZAD (i.e., capacities that are under independent control), and, on the other, the ZPD (i.e., what becomes possible with support). This type of assessment, which entails the provision of assistance during a task, therefore reveals a much wider range of learner capacities than is typically visible during solo performance. At the same time, because DA involves interventive action (i.e., teaching) on the part of a test administrator, also known as the mediator, there is the potential for a learner’s capacities to further develop during the assessment itself. This is certainly a desired outcome of DA, and it increases the validity of the dynamically administered assessment task because improvement over the course of the task suggests that the learner responded positively to support, or human mediation (Poehner, 2011a). In other words, “learning during the test” validates the assessment because it is evidence that the mediation provided was developmentally appropriate (i.e., the learner was able to benefit from it without being over-assisted), which provides insight into the learner’s current and proximal (emerging) abilities. It is in this way that teaching and testing are dialectically united in DA (Poehner & Lantolf, 2010). We would also like to acknowledge the clear, if partial, relationship between DA as an approach to testing and some current trends in classroom-based formative assessment and assessment-for-learning (e.g., Gardner, 2012; Rea-Dickins, 2008). However, DA is a specific theoretical approach to understanding the relationship between teaching and testing (i.e., dialectics), which may be used not only for formative goals but for diagnostic and summative assessments as well.
For a more elaborated discussion of L2 DA in relation to the wider assessment literature, the reader is referred to Poehner’s (2008) book-length treatment of the issue.
Approaches to conducting DA differ significantly in terms of how and when support is provided. Researchers such as Feuerstein, Rand, and Rynders (1988) have, for example, advocated an individualized, dialogic approach in which mediation is open-ended and negotiated in interaction between the mediator and test-taker (see also Sternberg & Grigorenko, 2002). Others (e.g., Brown & Ferrara, 1985; Budoff, 1987) have argued for more standardized approaches in which mediation is the same for all individuals who take a dynamically administered test. In their proposal to extend DA to the L2 field, Lantolf and Poehner (2004) offered the useful terms interactionist DA and interventionist DA to distinguish between more dialogic, open-ended approaches (interactionist DA) and the standardized delivery of mediation (interventionist DA). While interactionist DA is typically seen as being more sensitive to individual ZPDs because it allows the mediator to pursue any and all relevant issues to arrive at a nuanced understanding of the learner’s current and emerging abilities (see Poehner & van Compernolle, 2011), interventionist DA offers a standardized approach to scoring tests (e.g., by assigning weighted point values to responses on the basis of how much assistance was required).
L2 DA work has predominantly drawn on interactionist approaches to DA (e.g., Ableeva, 2010; Poehner, 2008), although some recent research has explored interventionist DA in a classroom setting (Davin, 2013; Lantolf & Poehner, 2011). One recent project directed by Lantolf and Poehner at the Center for Language Acquisition at the Pennsylvania State University has explored the design and implementation of interventionist computerized-DA (C-DA) tests of reading and listening comprehension in Chinese, French, and Russian (see, e.g., Poehner & Lantolf, 2013). The tests include prescripted standardized prompts and a weighted scoring system that provides various metrics related to independent abilities and responsiveness to mediation (see below). Our dynamically administered EI test follows the format of the C-DA tests, albeit in a face-to-face rather than a computer-mediated context. We discuss the C-DA format for standardization and scoring in greater detail in the following section.
Standardization and scoring
As noted above, interventionist DA involves standardized mediation, which is typically arranged in a prescripted order from least explicit to most explicit. In other words, mediation is graduated (Brown & Ferrara, 1985) in order to ascertain the minimum amount of support that a learner requires in order to succeed on a task. The less explicit and/or frequent the mediation, the closer the learner is to functioning independently.
Poehner and Lantolf’s (2013) C-DA tests of reading and listening comprehension in Chinese, French, and Russian are based on existing standardized tests (e.g., Advanced Placement and NY State Regents exams for foreign languages, the HSK test of Chinese) and involve multiple-choice comprehension questions. 2 Test-takers have four attempts to respond to each question. The first attempt is unassisted, as would be typical on a non-DA test. If the response is incorrect, however, the learner is provided with a low-level prompt (i.e., they are told the answer was incorrect and to try again, this time focusing on a smaller part of a text, highlighted on the reading tests or excerpted and replayed on the listening test). If the test-taker fails to respond correctly on the second attempt, he or she is provided with another, more explicit prompt and a smaller excerpt of the text is then highlighted. This continues (i.e., increasing explicitness of prompts and highlighting smaller parts of the text) until the test-taker either selects the correct answer or exhausts all four attempts, at which point the correct answer is provided and an optional explanation is offered.
The tests are scored on the basis of the number of attempts the test-taker needed for each question. A score of 4 is awarded for questions answered correctly on the first attempt. The point value decreases by 1 for each successive attempt (i.e., the second attempt is worth 3 points, the third attempt 2 points, etc.). Two main scores are then calculated. An “actual score” (Figure 1) reflects independent performance only—that is, only the first attempt to respond to the question (worth 4 points for a correct answer, 0 points for an incorrect answer) is taken into consideration.

Figure 1. Formula for actual score.
A “mediated score” (Figure 2) then includes points awarded for items correctly answered on the second, third, or fourth attempt, in addition to the actual score.

Figure 2. Formula for mediated score.
The comparison of actual and mediated scores provides insight into the degree to which test-takers were able to respond positively to the graduated mediating prompts provided during the test.
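The scoring rules described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the scoring software used in the C-DA project: the function names are hypothetical, and each response is represented by the attempt number on which it was answered correctly (or `None` if it was never answered correctly).

```python
from typing import Optional, Sequence

MAX_POINTS_PER_ITEM = 4  # value of a correct first (unassisted) attempt


def item_points(attempt: Optional[int]) -> int:
    """Points for one item: 4, 3, 2, 1 for attempts 1-4; 0 if never correct."""
    if attempt is None:
        return 0
    if not 1 <= attempt <= MAX_POINTS_PER_ITEM:
        raise ValueError("attempt must be between 1 and 4, or None")
    return MAX_POINTS_PER_ITEM + 1 - attempt


def actual_score(attempts: Sequence[Optional[int]]) -> int:
    """Independent performance: only first-attempt successes earn points."""
    return sum(MAX_POINTS_PER_ITEM for a in attempts if a == 1)


def mediated_score(attempts: Sequence[Optional[int]]) -> int:
    """Actual score plus points earned on the second to fourth attempts."""
    return sum(item_points(a) for a in attempts)


# Hypothetical responses to five questions (not data from any study).
responses = [1, 1, 2, 4, None]
print(actual_score(responses), mediated_score(responses))
```

With these invented responses, two first-attempt successes give an actual score of 8, and the two mediated successes add 3 and 1 points for a mediated score of 12.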
A third metric, proposed originally by Kozulin and Garb (2002) in a study of L2 English reading comprehension, more formally captures responsiveness to mediation as a reflection of learning potential. Thus, a learning potential score (LPS) (Figure 3) is, in essence, a way of quantitatively measuring a learner’s readiness to benefit from instruction because it is based on the learner’s responsiveness to mediation during the test.

Figure 3. Formula for LPS (adapted from Kozulin & Garb, 2002, p. 121, and Poehner & Lantolf, 2013).
Kozulin and Garb reported LPSs ranging from 0.47 to 1.21 (p. 121), with clusters of learners in a high range (LPS ≥ 1.0), a middle range (LPS between 0.79 and 0.88), and a low range (LPS ≤ 0.71). It should be noted that these ranges emerged from the scores produced in Kozulin and Garb’s study, and are not meant to be exact or objective cutoffs defining high, middle, and low ranges. Poehner and Lantolf (2013), for example, report several LPSs that fell between Kozulin and Garb’s middle and high ranges, and one that fell between the low and middle ranges. The principle, however, is that an LPS greater than 1.0 means that the learner responded very well to mediation because his or her score on the test essentially improved by a full standard deviation or more.
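For reference, Kozulin and Garb’s (2002) measure is usually stated as follows, here with their pretest and posttest scores replaced by the actual and mediated scores. The symbols \(S_{\mathrm{act}}\), \(S_{\mathrm{med}}\), and \(S_{\max}\) are labels chosen for this sketch rather than notation taken from the sources.

```latex
\[
\mathrm{LPS}
  = \frac{S_{\mathrm{med}} - S_{\mathrm{act}}}{S_{\max}}
  + \frac{S_{\mathrm{med}}}{S_{\max}}
  = \frac{2\,S_{\mathrm{med}} - S_{\mathrm{act}}}{S_{\max}}
\]
```

Written this way, the LPS is the sum of the proportional gain and the proportional mediated score. A test-taker whose first attempts are all correct therefore obtains exactly 1.0, since the actual, mediated, and maximum scores coincide.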
Poehner and Lantolf (2013) report a number of interesting results. Differences between actual and mediated scores were statistically significant, which the authors argue to be evidence that the scripted mediation was successful (i.e., learners were indeed able to respond to the mediation). The authors also showed that similar actual scores do not necessarily map onto similar mediated scores. For instance, several students who took the Chinese listening test had identical actual scores, and while two had similar mediated scores, one had a dramatically higher mediated score. This learner also produced a much higher LPS than the other two (1.07) and scored better on several transfer items. Poehner and Lantolf interpret this finding as evidence of different ZPDs. In short, despite identical actual scores, one learner was significantly closer to independent performance than the other two, a finding only revealed by comparing the learners’ responsiveness to mediation as reflected in the mediated score and LPS. Poehner and Lantolf performed additional statistical tests (e.g., correlating LPS and performance on transfer items), the results of which suggested that the LPS was a promising predictor of learning during the test. In other words, their results show that learners did not simply get higher scores because they were provided with help, but that the help supported them in learning, which they were able to transfer to new and more difficult tasks (i.e., transfer items).
Implicit and explicit knowledge in DA
Although our study does not aim to address the roles of implicit and explicit linguistic knowledge directly, we would like to comment briefly on our view of these issues as they relate to DA in general and to our dynamically administered EI test in particular. As noted earlier, EI is typically used as a means of assessing learners’ implicit linguistic competence—that is, the aim is to minimize the contribution of learners’ explicit (conscious) metalinguistic knowledge. This is precisely what Erlam (2006, 2009) argues to be the advantage of reconstructive EI wherein test-takers must first respond to a comprehension question before repeating the statement they have heard. The idea is that the time between hearing and repeating the statement surpasses working memory capacities. As a result, the ability to repeat the statement by rote memorization (working memory) is minimized, meaning that learners must rely more heavily on implicit competence (i.e., their own “internal” grammar) than metalinguistic knowledge.
However, because DA introduces support aimed at drawing learners’ conscious attention to the task at hand when they encounter difficulties, their metalinguistic knowledge comes into play. In the test developed for this study, support (or human mediation) prompts learners to consciously attend to the form of the sentence they are expected to repeat in attempts beyond the first. Test-takers’ responsiveness to mediation, measured as the LPS, is one index of their ability to make use of their conscious metalinguistic knowledge when support is provided. Additionally, and as we will see in the case analysis presented below, improvement over the course of the test (i.e., decreasing reliance on support) can be taken as evidence of microgenetic development—that is, increasingly independent functioning. One of the issues that this raises is whether, and to what extent, mediation in DA promotes greater control and speed of access to conscious metalinguistic knowledge during performance, or if it supports the development of a learner’s implicit (procedural, nonconscious) competence. Indeed, there are various theories of the relationship between implicit and explicit knowledge that have been extended to L2 learning, mainly as a way of addressing the potential for an interface whereby explicit knowledge may be converted into implicit competence (see Bowles, 2011).
Although our test does not aim to uncover the contributions of implicit and explicit knowledge or their relationship per se, we do espouse a noninterface position, specifically the position argued by Paradis (2009). Paradis presents evidence that implicit and explicit processes are subserved by different neurological systems that are not directly linked—namely, the procedural memory system and the declarative memory system. According to Paradis, because the two systems are neurophysiologically distinct, there is no direct interface. However, he acknowledges that the two systems may operate in parallel, and that explicit knowledge may indirectly affect implicit competence through language use. Importantly, Paradis also describes a developmental process by which access to metalinguistic knowledge during language use “can be sufficiently speeded up [i.e., accelerated] to be perceived” as automatic (i.e., procedural, implicit) (p. 118). Thus, in DA, it is possible that support provided during a test mediates the development of accelerated access to consciously controlled metalinguistic knowledge, which may lead indirectly to the acquisition of implicit competence in the future. In the present study, for example, support provided during the EI test mediated the learner in accessing and using his conscious knowledge of L2 forms when repeating stimulus sentences. Improvements observed over the course of the test, we argue, were evidence that the learner was gaining greater control over his use of explicit metalinguistic knowledge during performance.
Design, administration, and scoring of the elicited imitation assessment
Target features and test items
As an initial attempt at developing a dynamically administered EI task, we decided to follow the example of Ellis (2006), Erlam (2006, 2009), and Tomita et al. (2009), referenced above, in designing the instrument. We chose three word-final morphological features that represent, in theory at least, three different acquisitional stages: plural –s (early); past tense –ed (intermediate); and third-person singular –s (late). In addition, we included both correct and incorrect items in order to evaluate learners’ abilities not only to reconstruct well-formed sentences, but also to correct ungrammatical sentences. Finally, we decided to include two obligatory contexts in each item (i.e., in two different clauses). Target words that occur nearer to the beginning of a statement are typically easier to reconstruct than words that occur later. We avoided putting a target word in sentence-initial position and, where possible, in sentence-final position because first and last words are typically easier to remember than words in the middle of a sentence (Erlam, 2006). This was not always possible, however, especially in the case of plural –s items. In addition, we attempted to control for item difficulty following the recommendations outlined in Erlam (2009). The statements do not require any specific domain knowledge, are of similar syntactic complexity, and, to the extent possible, are about 20 syllables in length. Examples are given in Table 1. We note here that the items for plural –s in the example sentences are somewhat shorter than those for the other two target features. However, in all other sets, sentence lengths were comparable for all features (i.e., around 20 syllables). Feasibility of reconstruction was confirmed by a small group of English native speakers who were all able to repeat representative example sentences.
Table 1. Example items.
Note: Target words are set in boldface italic font.
Our test includes 36 sentences (= 72 target words) divided into six sets of six sentences. Each set includes one correct and one incorrect sentence for each of the three morphological features illustrated in Table 1. The rationale for this format derives from our interest, discussed above, in tracking learner improvement/learning during the test (i.e., responsiveness to human mediation). The format allows us not only to measure differences between independent and mediated performance on the test as a whole, but also to track improvement, or microgenetic development (i.e., changes in the learner’s functioning from moment to moment) during the test (e.g., decreasing reliance on external support). In essence, each successive set of items represents an opportunity for the transfer of developing capacities that were mediated in preceding sets.
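The test layout just described can be checked by enumerating it. The sketch below is only an illustration of the design: the dictionary keys and variable names are hypothetical, and no actual test sentences are included.

```python
from itertools import product

FEATURES = ("plural -s", "past tense -ed", "third-person singular -s")
GRAMMATICALITY = ("correct", "incorrect")
N_SETS = 6
TARGETS_PER_SENTENCE = 2

# One sentence slot per (set, feature, grammaticality) combination.
design = [
    {"set": s, "feature": f, "stimulus": g, "targets": TARGETS_PER_SENTENCE}
    for s, f, g in product(range(1, N_SETS + 1), FEATURES, GRAMMATICALITY)
]

n_sentences = len(design)
n_targets = sum(slot["targets"] for slot in design)
print(n_sentences, n_targets)  # 36 72
```

Each set contains six slots (three features, each in a correct and an incorrect version), so every set offers four target words per feature on which change across sets can be observed.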
Test administration and scripted mediation
Our EI test is administered in a one-on-one format (i.e., a mediator and a student), using audio-recordings of the test items, described above. In order to attempt to focus test-takers’ attention on the meaning rather than on the form of the sentences, they are instructed to listen to each statement and then to circle on a worksheet whether they think the statement is true or not true, or if they are not sure (see Erlam, 2006, 2009). Then, they are prompted to repeat the statement in correct English under time pressure (5 seconds). A chiming sound is played at the beginning of each item to prime test-takers to listen, and a second chime signals the end of the 5-second period for repetition. A sample script of the audio is provided below:
(chime) (1-sec pause) When Edison invented the light bulb, life changed for everyone. (5-sec pause to allow test-taker to circle “True,” “Not true,” or “Not sure” on worksheet) Now, can you repeat the statement in correct English? (5-sec pause for repetition) (chime)
A 5-minute training session is provided at the beginning of the test in order to familiarize test-takers with this format. The sample items in the training session do not include the targeted features used in the actual test.
Participants have up to four attempts to repeat correctly the statement they have heard. The first attempt represents independent performance and follows the basic procedure outlined above. However, if the learner is not successful in reconstructing the sentence, human mediation is introduced. Table 2 provides a description of the procedures for providing graduated (i.e., from implicit to explicit) support to test-takers. The basic idea is to provide the least explicit form of support required, increasing explicitness only when the test-taker demonstrates a need for more direct or targeted forms of assistance. For our test, this means moving from simply indicating that something is not right and providing a second chance to repeat the statement (step 2), to narrowing the focus to a specific word or words (step 3), to providing a metalinguistic prompt (step 4), to finally providing the correct form and an explanation of the problem (step 5). Each target word is worth a maximum of 4 points for the first attempt (i.e., independent performance), and the point value decreases with each additional attempt.
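The graduated procedure can be pictured as a ladder that the mediator descends only as far as necessary. The following sketch is an illustration under stated assumptions: the step descriptions paraphrase the prose above rather than the scripted prompts, the callback `is_correct_at` is hypothetical, and the value of 0 points for step 5 is inferred from the decreasing 4-3-2-1 scale.

```python
from typing import Callable, Tuple

# (step, paraphrased support, points for a correct repetition at that step)
LADDER = [
    (1, "independent repetition (no support)", 4),
    (2, "indicate a problem; allow another repetition", 3),
    (3, "narrow the focus to specific word(s)", 2),
    (4, "give a metalinguistic prompt", 1),
    (5, "provide the correct form and an explanation", 0),  # assumed 0 points
]


def mediate(is_correct_at: Callable[[int], bool]) -> Tuple[int, int]:
    """Descend the ladder until the target word is produced correctly.

    `is_correct_at(step)` is a hypothetical callback reporting whether the
    learner's repetition at that step was correct. Returns (step, points).
    """
    for step, _description, points in LADDER[:-1]:
        if is_correct_at(step):
            return step, points
    # No successful repetition in four attempts: the mediator supplies the
    # correct form, and no points are awarded.
    last_step, _description, points = LADDER[-1]
    return last_step, points


# A learner who succeeds once the focus is narrowed (step 3) earns 2 points.
print(mediate(lambda step: step >= 3))
```

Stopping at the first successful step is what makes the points an index of the minimum support required for a given target word.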
Table 2. Outline of interventionist DA procedures.
In the case analysis presented below, the mediator (Author 1) was an experienced practitioner of DA (5 years of training and practice in various contexts). In other cases, and as we move forward with the test, other mediators (e.g., Author 2) will also administer the test. Although the scripted mediation standardizes the provision of support during the test, there will inevitably be some degree of variability between mediators. Our research protocol includes video-recordings of DA interactions, so as we continue our work, we will be exploring variability in the delivery of mediating prompts.
Scoring procedures
Three scores are calculated on the basis of learners’ performances on the test (see point values in Table 2, discussed above). A maximum of 96 points is possible for each feature (12 sentences × 2 target words each × 4 points). The entire test is therefore worth a maximum of 288 points. Following the approach outlined in Poehner and Lantolf (2013), actual, mediated, and learning potential scores are calculated (see above). Recall that actual scores reflect correct responses on the first attempt only (each worth 4 points), while mediated scores take into account correct responses on the second, third, and fourth attempts, which are worth 3, 2, and 1 points, respectively, in addition to the actual score. We use Kozulin and Garb’s (2002) formula for calculating the LPS, also following Poehner and Lantolf (2013).
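The arithmetic behind these maxima can be written out as follows. The 12 sentences per feature follow from the six sets, each of which contains one correct and one incorrect sentence for that feature.

```latex
\[
\underbrace{12}_{\text{sentences}} \times
\underbrace{2}_{\text{target words}} \times
\underbrace{4}_{\text{points}} = 96
\qquad\text{and}\qquad
3 \times 96 = 288 .
\]
```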
An illustrative case analysis
We now turn to an illustrative case analysis of Kwanghoon’s (L1 Korean) performance on the dynamically administered EI test. We focus on the ways in which results obtained from the test can be used to create a profile of a learner’s competencies as well as to track microgenetic development during the test through both quantitative and qualitative analyses. We begin with an analysis of actual, mediated, and learning potential scores for the test as a whole. This is followed by an item analysis that allows us to track changes in performance across test items/sets.
Analysis of actual, mediated, and learning potential scores
Table 3 displays Kwanghoon’s actual scores, mediated scores, gains (i.e., difference between mediated and actual scores), and LPSs for each of the targeted morphological features as well as for the test as a whole.
Table 3. Test scores and gains.
Note: The fraction in parentheses in the “Actual score” column reflects the number of items correct over the total number of items.
Kwanghoon’s actual scores show that he had good independent control over plural –s; yet he encountered some difficulty with past tense –ed and struggled considerably with third-person singular –s. As expected, his performance improved with support, as reflected by the mediated scores and gains. He made the most improvement with third-person singular –s (a gain of 29 points), followed by past tense –ed (a gain of 19 points), and then plural –s (a gain of 5 points). Chi-square tests revealed that the gains for past tense –ed and third-person singular –s are significant (p < .05), as is the gain for the total test score (p < .05).
These results suggest that despite struggling with some of the items, especially past tense –ed and third-person singular –s, Kwanghoon had some of the pieces in place to make progress. This is because he was able to respond successfully to mediation. Incidentally, Kwanghoon was able to correctly repeat all of the sentences with support (i.e., he never needed outright correction with an explanation). We argue this to be evidence of the presence of maturing functions (e.g., conscious control over morphology), in other words a ZPD, that were rendered visible through interventive action (i.e., mediation). Our claim is bolstered by the high LPSs for all three targeted features, which reveal that Kwanghoon responded very positively to the mediation provided during the test, and that he therefore ought to be poised to benefit from further instruction. Incidentally, although the gain for plural –s was not statistically significant, Kwanghoon nonetheless produced a high LPS. This is because he does not have very far to go to achieve (nearly) flawless independent performance but would nonetheless benefit from some additional, though probably minimal, instruction, namely additional practice linking his knowledge of word-final morphology to his performance abilities.
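Why a small gain can coexist with a high LPS follows from the way the LPS is computed, namely as (2 × mediated score − actual score) / maximum score. As a purely hypothetical illustration (apart from the 5-point gain reported above, these are not Kwanghoon’s values), suppose an actual score of 88 and a mediated score of 93 out of 96:

```latex
\[
\mathrm{LPS} = \frac{2 \times 93 - 88}{96} = \frac{98}{96} \approx 1.02
\]
```

Because the mediated score enters the numerator twice, a learner who is already close to ceiling obtains an LPS near 1.0 even when the gain itself is small.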
Item analysis
Our second level of analysis focuses on Kwanghoon’s performance on an item-by-item basis. In this way, it is possible to pinpoint specific items of difficulty during the test as well as changes in performance, which may be interpreted as evidence of microgenetic development. Table 4 displays the points awarded for each item on the test. Recall that for each set of sentences, there were two sentences for each of the targeted features (i.e., one correct, and one incorrect), and each sentence contained two target words, which we have labeled as “correct 1,” “correct 2,” “incorrect 1,” and “incorrect 2.” For example, in the sentence “On the weekend, everybody
Points awarded for each item.
The data for set 1 reveal flawless performance for plural –s. However, Kwanghoon required mediation for past tense –ed and third-person singular –s. Specifically, he encountered some difficulty correcting the ungrammatical (i.e., incorrect) sentence “In the 19th century, Germans
Set 2 shows a rather dramatic change in Kwanghoon’s performance. He maintained his flawless performance for plural –s, and he improved his control over past tense –ed (also flawless in set 2). Interestingly, there was a marked improvement in third-person singular –s. Kwanghoon required some support for “correct 1” and “correct 2,” but he was then able to repeat both target words in the incorrect sentence without mediation. This finding is suggestive of microgenetic development in that Kwanghoon appeared to be gaining more independent control over this morphological feature. His performance remained relatively stable in set 3, and in set 4 his performance was flawless for all items, further bolstering the claim that Kwanghoon was responding well to mediation during the test (i.e., microgenesis).
In sets 5 and 6, however, Kwanghoon’s performance began to falter. His scores dropped for all three targeted features, including plural –s, which he had not had any difficulty with up to this point. He was nevertheless still responsive to mediation, and his scores remained relatively high (cf. especially third-person singular –s in set 1 with sets 5 and 6). One potential explanation for this drop is test fatigue. When set 5 began, Kwanghoon had been taking the test for approximately 45 minutes, and it is possible that he was beginning to lose concentration. While this raises questions about the reliability of the results from sets 5 and 6 in some sense, it also provides important information—namely, that when Kwanghoon’s cognitive resources are strained for some reason (e.g., fatigue), his control over English morphology begins to falter. Presumably, this would not have happened if Kwanghoon were reconstructing the sentences nonconsciously through implicit linguistic competence. In other words, it is reasonable to presume that his performance on the test reflects consciously controlled (declarative) language use rather than implicit competence.
In sum, the item analysis provides additional details regarding mediated performance during the dynamically administered EI test. These include evidence of microgenetic development, as revealed by decreasing reliance on human mediation over the course of the test, as well as evidence of the point at which the learner’s performance begins to falter (e.g., from test fatigue). In Kwanghoon’s case, it appears that he was responsive to mediation and therefore made gains in controlled performance (especially with regard to third-person singular –s), but also that his control over English morphology begins to break down as his cognitive resources are strained.
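The kind of set-by-set tracking described in the item analysis can be sketched in a few lines of Python. This is an illustration only: the point values below are invented, the maximum of 4 points per target word is a hypothetical scale, and the function names are ours; none of these are taken from Table 4.

```python
# Hypothetical maximum number of points per target word (an assumption,
# not the scale used in the actual test).
MAX_POINTS = 4

def set_proportions(points_by_set):
    """Return, for each set, the share of available points that was earned."""
    return [sum(items) / (MAX_POINTS * len(items)) for items in points_by_set]

def falling_sets(proportions):
    """Return 1-based set numbers whose proportion is below the preceding set's."""
    return [
        i + 1
        for i in range(1, len(proportions))
        if proportions[i] < proportions[i - 1]
    ]

# Invented points for one feature; each row holds the four target words
# of one set (correct 1, correct 2, incorrect 1, incorrect 2).
example_feature = [
    [3, 3, 2, 2],  # set 1
    [3, 3, 4, 4],  # set 2
    [4, 3, 4, 3],  # set 3
    [4, 4, 4, 4],  # set 4
    [3, 4, 3, 3],  # set 5
    [3, 3, 4, 2],  # set 6
]

proportions = set_proportions(example_feature)
print(proportions)
print(falling_sets(proportions))
```

A rising proportion across sets would be consistent with decreasing reliance on mediation, whereas the sets flagged by `falling_sets` mark points where performance drops relative to the preceding set and may merit closer inspection (e.g., for fatigue).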
Summary
The results of the illustrative case analysis of Kwanghoon’s dynamically administered EI test provide a relatively detailed profile of his current and emerging abilities. At the time of the test, his control over plural –s was rather good, but he struggled to some extent with past tense –ed and especially with third-person singular –s. However, he responded positively to mediation. Not only were his mediated scores higher than his actual scores (as expected), but he produced high LPSs for all three targeted morphological features as well as for the test as a whole. Recall that the LPS reflects responsiveness to mediation and may be used as a predictor of readiness to benefit from further instruction. In other words, the higher the LPS, the more responsive the learner was to the mediation, meaning that he or she has the pieces in place to make more progress through instruction that is similar to the kind of support offered during the test.
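As a reminder of how the three score types relate to one another, the following minimal sketch computes a gain and an LPS. It assumes the formula commonly used in interventionist DA, LPS = (2 × mediated − actual) / maximum score, as proposed by Kozulin and Garb and adopted by Poehner and Lantolf (2013); whether the present test uses exactly this formula is an assumption here, and the numbers in the example are hypothetical rather than Kwanghoon’s scores.

```python
def gain(actual, mediated):
    """Difference between the mediated score and the actual (first-response) score."""
    return mediated - actual

def learning_potential_score(actual, mediated, max_score):
    """LPS under the assumed formula (2 * mediated - actual) / max_score."""
    if max_score <= 0:
        raise ValueError("max_score must be positive")
    return (2 * mediated - actual) / max_score

# Hypothetical scores on a 100-point scale (not data from the study).
example_actual = 60
example_mediated = 85
example_gain = gain(example_actual, example_mediated)
example_lps = learning_potential_score(example_actual, example_mediated, 100)
print(example_gain, round(example_lps, 2))
```

Under this formula, a learner who starts with a high actual score and makes only a small gain can still obtain a high LPS, because the mediated score is counted twice.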
As we have shown in the item analysis, Kwanghoon did indeed improve over the course of the test. The most marked improvement was with his control over third-person singular –s. We argue that this result provides evidence that the mediation provided during the test supported Kwanghoon in linking his conscious knowledge of English morphology to his performance. It is in this sense that we believe there is evidence of microgenetic development (Wertsch, 1985; for interventionist L2 DA, see Poehner & Lantolf, 2013) over the course of the test. As we noted earlier in this article, each successive set of sentences represented a context for the transfer of capacities that were mediated in previous sets. Accordingly, because Kwanghoon’s performance improved from set 1 to set 4, we believe that the mediation he received during the test supported him in moving toward greater independence. We also showed that his performance began to falter starting with set 5, which we suspect resulted from test fatigue. Nonetheless, his mediated scores remained relatively high, and his performance on third-person singular –s did not drop back down to his original scores on set 1.
Conclusion
In this paper, we have described the design, administration, and scoring of an initial attempt at developing an EI test of grammatical competence in L2 English that integrates mediation through dynamic assessment. We have also provided data from one exemplar test in order to illustrate how a learner’s profile may be obtained and interpreted, which includes actual scores based on independent performance, mediated scores based on responsiveness to support, and learning potential scores. In addition, we have shown how an item analysis can be used to track microgenetic development over the course of the test.
The major contribution of our test, we believe, is that it provides information about learners’ ZPDs (i.e., emerging capacities and responsiveness to mediation) that is otherwise unavailable in non-DA tasks. Such information (e.g., actual vs. mediated scores, LPS) may be used to make instructional decisions in the future (e.g., designing supplementary lessons in class) that are more sensitive to the learners’ needs than non-DA diagnoses because DA accounts for emerging as well as completed developmental processes. As noted earlier, a high LPS—based as it is on responsiveness to mediation during the test—suggests that a learner is poised to respond positively to further instruction that is similar to the mediation provided during the test. However, we caution, along with Poehner and Lantolf (2013), against understanding the LPS as an indication of whether, and to what extent, a test-taker can learn at all, as if learning potential were an immutable, fixed property of the test-taker. Instead, we suggest that other types of instruction might be relevant to learners who produce lower LPSs. This view, as Poehner and Lantolf (2013) point out, is more closely aligned with Vygotsky’s preoccupation with access and fairness in education (see also Poehner, 2011b). A low LPS simply indicates that the type of mediation provided during the test was not appropriate for a particular learner at that moment in time; he or she may certainly respond positively to other forms of mediation (e.g., open-ended, dialogically negotiated support) in another context.
Indeed, one of the limitations of our test, and of interventionist approaches to DA more generally, is that standardization is given priority over sensitivity to individual learner needs. Because our test aims to provide a snapshot of learner capacities and ZPD-relevant information in terms of actual, mediated, and learning potential scores, mediators are not free to pursue problems as they arise and offer any and all forms of mediation that may support the test-taker, as is done in more dialogic, interactionist approaches to DA. We would like to emphasize, however, that our test is not meant to be the single form of assessment/pedagogy that learners have access to. Instead, we advocate a multipronged approach to assessment in general. Interventionist DA, such as our test, foregrounds the assessment function of DA, whereas interactionist DA foregrounds the instructional function. Yet, because both approaches to DA are predicated on Vygotsky’s ZPD concept, both entail the unification of teaching and testing, and each may complement the other. For example, our current test may help to provide a preliminary diagnosis of learner capacities (i.e., foregrounding assessment), and such information may be used as the basis for continued support through classroom-based interactionist DA (i.e., foregrounding instruction), which could be followed up by a second interventionist DA test similar to what we have proposed in this article. As we continue our work in this domain, we certainly envision a second iteration of the test that is coordinated with specific instructional goals in the classroom.
Finally, we would like to stress that we are not advocating the specific format of our test as the only way to carry out the dynamic assessment of EI tests. In fact, as we administer the test to a greater number of students, we are learning more about what works, and what does not, in order to modify the format in a future iteration of the test. We have, for example, imagined an alternative or complementary task based on the EI of a narrative rather than a series of isolated and unrelated sentences. Our thinking is that a narrative may help to maintain the test-taker’s attention on the task at hand, especially if there is some intrigue in the story, thereby minimizing the risk of test fatigue, which we believe to have occurred during Kwanghoon’s assessment (discussed above). We also believe that extending the assessment to include more formal transfer items or an additional transfer task would help to bolster our claim that microgenetic development is possible during the test. Although we argue that each successive set of sentences in the current test represents a context for transfer from previous sets, the assessment would certainly be strengthened by including a more complex task at the end or as a delayed posttest in order to document the extent to which a student is able to transcend (Poehner, 2007) the demands of one task and transfer new and emerging abilities to a new context. For instance, although they are more labor intensive, open-ended free language production tasks may be integrated into assessment procedures in order to ascertain the extent to which learners control target features in spontaneous, communicative situations. We would also like to note that we do not intend to limit the scope of our test to morphology. Indeed, syntax, lexis, and pragmatics are also relevant L2 domains that could be dynamically assessed through EI.
To be sure, much remains to be done in this domain, and it is our hope that the test that we have discussed in this article may serve as a point of departure for future work that integrates DA principles into L2 assessment.
Footnotes
Funding
This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.
