Abstract
Multiple-choice (MC) analogy items are often used in cognitive assessment. However, in dynamic testing, where the aim is to provide insight into the potential for learning and the learning process, constructed-response (CR) items may be of benefit. This study investigated whether training with CR or MC items leads to differences in the strategy progression and understanding of analogical reasoning in 5- to 6-year-olds (N = 111). A pretest-training-posttest control group design with randomized blocking was used, in which two experimental groups were trained according to the graduated prompts method. Results show that children in both training conditions improved more during dynamic testing than untrained controls. As expected, children in the CR condition required more prompting during training and showed different strategy-use patterns than the MC group. However, the quality of solution explanations was significantly better for children in the CR condition. It appears that possible performance advantages of training with CR items are most apparent when active processing is required. For future dynamic testing, we advise including items, such as CR or analogy-construction items, that allow for fine-grained analysis of strategy-use, to further discern differences in children’s understanding of analogical reasoning.
Introduction
Dynamic testing, often contrasted with static testing such as traditional IQ assessment, aims to provide a measure of abilities that are not yet fully developed (e.g., Elliott, Grigorenko, & Resing, 2010; Sternberg & Grigorenko, 2002). Whereas static tests measure previously acquired knowledge at one point in time, dynamic tests focus on the potential for acquiring new knowledge across multiple testing occasions. Dynamic testing procedures further differ from static testing in that the examiner provides feedback to facilitate learning during assessment. Dynamic tests often follow a pretest-training-posttest design in which structured feedback is provided during one or more training sessions. The effectiveness of various types of training has been demonstrated in a dynamic testing context (e.g., Day, Engelhardt, Maxwell, & Bolig, 1997; Resing, Tunteler, De Jong, & Bosma, 2009). However, strategy-use, learning, and transfer are influenced not only by feedback type but also by problem format (e.g., Luwel, Foustana, Papadatos, & Verschaffel, 2011). For example, open-ended items are generally considered more difficult to solve (Behuniak, Rogers, & Dirir, 1996; Hohensinn & Kubinger, 2011; In’nami & Koizumi, 2009) but provide more diagnostic information (Birenbaum & Tatsuoka, 1987; Birenbaum, Tatsuoka, & Gutvirtz, 1992; Currie & Chiramanee, 2010). Problem construction or constructed-response (CR) formats may lead to greater learning and transfer than multiple-choice (MC) formats (Harpaz-Itay, Kaniel, & Ben-Amram, 2006; Martinez, 1999). In this study, we aimed to investigate the effects of problem format on learning and strategy-use in a dynamic testing context. We examined whether training with figural analogy CR problems, in which the solution must be constructed from figure features, would lead to greater progression than training with MC problems within a dynamic test of analogical reasoning.
Dynamic testing is often conducted with analogical reasoning tasks (Resing, 2000), such as the item shown in Figure 1. Analogical reasoning is the cognitive process of transferring information from a known source to a new but similar context (Sternberg & Gardner, 1983). Given the essential role analogical reasoning plays in learning (Goswami, 1992), it is important to investigate which factors best promote analogy solving in children.

Figure 1. Example figural analogy item from AnimaLogica in multiple-choice (MC) and constructed-response (CR) item formats. In MC items, the answer options represent the possible outcomes of common strategies, from left to right: non-analogical, correct, duplicate, partial (missing the animal transformation), and partial (missing the orientation transformation). In CR items, learners construct a solution by placing one or more animal cards with the correct animal, color, size, and quantity in the empty box, in the correct orientation (changed by turning the card over) and position (top, middle, or bottom).
The various training formats used in dynamic tests generally show that children improve their skills through instruction and that posttest scores provide a better indication of their potential ability (Fabio, 2005; Sternberg & Grigorenko, 2002). Furthermore, utilizing graduated prompts enables us to determine the amount and type of instruction a child requires to perform at this potential level (e.g., Ferrara, Brown, & Campione, 1986; Resing & Elliott, 2011). In the case of inductive reasoning tasks, graduated prompting was shown to be more effective than practice with regard to both accuracy and strategy development (Ferrara et al., 1986; Resing et al., 2009). Young children’s analogical reasoning improves with self-explanation, feedback (e.g., Cheshire, Ball, & Lewis, 2005; Siegler & Svetina, 2002; Stevenson, Resing, & Froma, 2009; Tunteler, Pronk, & Resing, 2008), and graduated prompting (Stevenson, Heiser, & Resing, 2013; Tunteler & Resing, 2010). Training leads to a decrease in duplication errors, in which one of the analogy terms is copied, and an increase in partial and correct analogical solutions. Although much research has been conducted on the effects of training on analogical reasoning, few studies have investigated the influence of task format on learning to solve analogies.
In the context of dynamic testing, item formats are interesting for two reasons. First, CR items have been found to provide diagnostic advantages in determining where a pupil goes wrong if the solution is incorrect (Birenbaum & Tatsuoka, 1987; Martinez, 1999). This diagnostic information is valuable for process-oriented aims of dynamic testing, such as examining strategy-use and instructional needs (e.g., Resing et al., 2009). In the case of analogies, solution categories such as duplication or partially correct solutions, often referred to as strategies in the literature (e.g., Siegler & Svetina, 2002; Tunteler et al., 2008), can be determined directly rather than inferred from the limited MC options. Furthermore, systematic errors, such as consistently disregarding a specific transformation (e.g., orientation), can be diagnosed more accurately because errors are not limited to the available MC answers.
The second reason is that CR formats may lead to better learning and understanding of analogical reasoning. First, constructing a response may elicit a more effective solution process, because the A:B transformations can be applied one at a time and are thereby less taxing on working memory (Bruttin, 2011), a hypothesized constraint in children’s analogical reasoning (e.g., Richland, Morrison, & Holyoak, 2006). Second, response construction appears to require better understanding of the measured construct than MC solution (Bridgeman, 1992; Martinez, 1999). With an MC item, the solution can be recognized among a small selection of options, whereas in response construction the solution as a whole is not available; it must be constructed and therefore cannot be selected on the basis of recognition alone (see Figure 1).
Performance in solving MC analogies and matrices is related to the number and type of available options (Vigneau, Caissie, & Bors, 2006). For example, Jarosz and Wiley (2012) demonstrated that the more salient the distracters are, the more strongly matrix solving is predicted by working memory capacity, indicating that the type of distracters present plays a crucial role in how strongly working memory is taxed. For adults, the most salient distracters are partial solutions. Young children, in contrast, often rely on perceptual matching and are strongly influenced by the presence of duplicates (Richland et al., 2006; Siegler & Svetina, 2002; Thibaut, French, & Vezneva, 2010). Offering duplicates as MC options may therefore lead to a misdiagnosis of young children’s understanding of analogical reasoning (Birenbaum et al., 1992; Goswami, 1992). For example, a young child may be unable to inhibit a duplication response when a duplicate is present as an MC option, yet construct a different solution when not distracted by this option.
These pitfalls of MC solving, such as recognition and distraction by MC options, can be said to fall under the response elimination solution method, in which each of the options is tested until the best-fitting solution is chosen. Earlier studies indicate that this method is often used by those with weaker analogical reasoning skills, whereas constructive matching, in which the problem is solved before the solution is constructed or selected, appears to be used by more advanced reasoners (Bethel-Fox, Lohman, & Snow, 1984; Vakil, Lifshitz, Tzuriel, Weiss, & Arzuoan, 2010). Although response elimination may be possible in a CR item, it is far more time-consuming: one element undergoing $m$ possible transformations (e.g., color, size), with $n_j$ values for transformation $j$ (e.g., red, yellow, blue; big, small), yields $n_1 \times n_2 \times \cdots \times n_m$ possible elements to select for a solution (e.g., big red, small red, big yellow, small yellow, etc.), which is far greater than the standard four or five answer options in an MC item. Therefore, a less effortful and more effective solution process for CR items would involve solving each of the transformations within the item sequentially, as in the constructive matching method. Perhaps using CR items, in which salient distracters in the form of partial solutions or duplicates are not readily present, is of additional benefit when teaching the preferred constructive matching solution process to children.
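The counting argument above can be made concrete with a small Python sketch. It uses the color and size values given in the text, plus the orientation (left, right) and position (top, middle, bottom) values described for the CR cards in the Method; animal type and quantity are left out, so the counts are illustrative lower bounds, not the actual size of the AnimaLogica answer space. The function names are hypothetical.

```python
from itertools import product
from math import prod

# Values per transformation; animal type and quantity are deliberately omitted.
transformations = {
    "color": ["red", "yellow", "blue"],
    "size": ["big", "small"],
    "orientation": ["left", "right"],
    "position": ["top", "middle", "bottom"],
}

def n_candidate_elements(values_per_transformation):
    """Number of distinct elements: n_1 * n_2 * ... * n_m."""
    return prod(len(v) for v in values_per_transformation.values())

def enumerate_candidates(values_per_transformation):
    """Every element a pure response-elimination strategy would have to check."""
    names = list(values_per_transformation)
    return [dict(zip(names, combo))
            for combo in product(*values_per_transformation.values())]

color_size = {k: transformations[k] for k in ("color", "size")}
n_color_size = n_candidate_elements(color_size)   # 3 * 2 = 6
n_all = n_candidate_elements(transformations)     # 3 * 2 * 2 * 3 = 36
candidates = enumerate_candidates(transformations)
```

Even this reduced set of four dimensions yields 36 candidate elements for a single figure, compared with five options in the MC version, which is why sequential, transformation-by-transformation construction is the more economical route.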
In this study, we investigated the role of item format (CR versus MC) during training on children’s progression in analogical reasoning during dynamic testing. Our first research question concerned the effect of the graduated prompts training. In accordance with the literature, we expected (Hypothesis 1 [H1]) that children receiving the graduated prompts training would show greater improvement in analogical reasoning than the control group (Ferrara et al., 1986; Tunteler & Resing, 2010). Our second research question focused on the effects of item format on performance during training. We expected (Hypothesis 2a [H2a]) the CR items to be more difficult than the MC items (Behuniak et al., 1996; Currie & Chiramanee, 2010; Martinez, 1999) but (Hypothesis 2b [H2b]) training with the CR format to lead to better understanding, revealed by better verbal explanations of the solution, than training with MC items. Finally, we investigated the effects of item format on strategy progression by comparing the strategy-use patterns of the two training conditions. We expected (Hypothesis 3 [H3]) CR-trained children to use more advanced analogical reasoning strategies than the MC group, that is, fewer duplications and more partial and correct solutions, during training and on the posttests (Harpaz-Itay et al., 2006; Tunteler et al., 2008).
Method
Participants
Participants were 111 children (54% girls; mean age = 64 months, SD = 7 months). All children were native Dutch speakers from two elementary schools in the Netherlands, which were selected on the basis of their willingness to participate. Written informed consent was obtained from the parents.
Design
A pretest-training-posttest control group design with randomized blocking was used. Children were assigned by randomized blocking to one of three conditions: (a) training with MC items, (b) training with CR items, or (c) a control group. Blocking was based on classroom, gender, and performance on visual exclusion, a test of inductive reasoning ability (Bleichrodt, Drenth, Zaal, & Resing, 1987). All children solved the 20 pretest items during the first session. In the following two sessions, the trained children received the graduated prompts training with either MC or CR items. They were trained on four items per session (eight items in total), which limited the duration of each session to 20 min. The control group solved maze coloring tasks. During the last two sessions, the posttests, parallel versions of the pretest, were administered. Sessions took place weekly in a quiet location at the child’s school, except for the last session, which took place 2 weeks after the first posttest.
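The blocking procedure is not specified beyond the three blocking variables. One common way to implement randomized blocking is sketched below in Python, assuming that children are first grouped by classroom and gender, ranked on their visual exclusion score within each group, and split into consecutive blocks of three, with the three conditions randomly permuted within each block. The field names (`id`, `classroom`, `gender`, `exclusion`) and the function name are hypothetical.

```python
import random
from collections import defaultdict

CONDITIONS = ("MC", "CR", "control")

def randomized_blocking(children, seed=0):
    """Assign each child to a condition within blocks of similar children (sketch)."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for child in children:
        strata[(child["classroom"], child["gender"])].append(child)

    assignment = {}
    for members in strata.values():
        # Rank children within the stratum on inductive reasoning performance.
        members.sort(key=lambda c: c["exclusion"], reverse=True)
        # Each consecutive triple receives all three conditions in random order;
        # an incomplete final block receives a random subset of conditions.
        for start in range(0, len(members), len(CONDITIONS)):
            block = members[start:start + len(CONDITIONS)]
            labels = list(CONDITIONS)
            rng.shuffle(labels)
            for child, label in zip(block, labels):
                assignment[child["id"]] = label
    return assignment

# Simulated example: 12 children in one classroom, 6 boys and 6 girls.
kids = [
    {"id": i, "classroom": "A", "gender": "girl" if i % 2 else "boy", "exclusion": i}
    for i in range(12)
]
groups = randomized_blocking(kids, seed=1)
```

Because every complete block contains one child per condition, the conditions end up balanced on classroom, gender, and exclusion score by construction, while assignment within a block remains random.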
AnimaLogica: Dynamic Test of Figural Analogical Reasoning
This dynamic test of analogical reasoning consisted of an introduction task, a pretest, training, and posttests. The visual analogies consisted of colored (red, yellow, or blue) animal figures, presented in the classic 2 × 2 matrix format in which animal figures occupied three squares and the lower right or lower left quadrant was empty. Children had to infer the relation between two pictures (horizontally or vertically) and apply this relation to a third picture to solve the analogy (A:B::C:?). Items of varying difficulty were developed with rule-based item generation, in which item difficulty can be predicted from the number of figures and transformation rules applied (e.g., Mulholland, Pellegrino, & Glaser, 1980). Transformations comprised the following dimensions: (a) animal, (b) color, (c) size, (d) position, (e) orientation, and (f) quantity.
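A minimal Python sketch of constructive matching on such items follows. It assumes a simplified single-figure representation on the six dimensions listed above, and it assumes that the horizontal (A:B) and vertical (A:C) changes involve different dimensions, so that the missing term D takes B's value on every dimension that changes from A to B and C's value on all others. The function names are illustrative and not part of AnimaLogica.

```python
DIMENSIONS = ("animal", "color", "size", "position", "orientation", "quantity")

def changed_dimensions(x, y):
    """Dimensions on which two figures differ, mapped to the value in y."""
    return {dim: y[dim] for dim in DIMENSIONS if x[dim] != y[dim]}

def solve_analogy(a, b, c):
    """Constructive matching for A:B::C:?, applying each A->B change to C in turn."""
    d = dict(c)
    for dim, value in changed_dimensions(a, b).items():
        d[dim] = value  # one transformation at a time
    return d

A = {"animal": "lion", "color": "red", "size": "large",
     "position": "top", "orientation": "left", "quantity": 1}
B = {**A, "color": "blue", "orientation": "right"}  # horizontal: color and orientation change
C = {**A, "animal": "dog"}                           # vertical: animal changes
D = solve_analogy(A, B, C)
```

Skipping one iteration of the loop would yield exactly the kind of solution coded as partial in this study (one transformation missing), whereas returning B or C unchanged corresponds to a duplicate.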
Introduction task
Six items consisting of a pair of animals that differed by one of the six transformations were presented (Stevenson et al., 2009). The children were asked to name the animals and explain what changed; mistakes were corrected using a standardized protocol. This task was used to ensure that the children were familiar with the objects and transformations in the tests and training as this is a prerequisite for children’s analogical reasoning (Goswami, 1992).
Pretest and posttests
The 20 analogy problems of the pre- and posttests were solved by choosing a picture from five systematically constructed alternatives at the bottom of the task: (a) the correct answer; (b and c) partial answers, missing one transformation; (d) a duplicate answer, a copy of the term above or next to the empty box; and (e) another non-analogical answer, missing two or more transformations (see Figure 1). The pretest and posttest items were isomorphs, to prevent children from solving the items in later sessions from memory (Freund & Holling, 2011). To this end, the color and animal type of each figure were systematically changed (e.g., all red figures changed to blue; all lions changed to dogs), while each item retained exactly the same transformations and similar difficulty (Stevenson, Hickendorff, Resing, Heiser, & de Boeck, 2013).
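The isomorph construction can be illustrated with a small Python sketch. Only the red-to-blue and lion-to-dog substitutions come from the text; the other mapping entries are hypothetical, chosen so that each mapping is one-to-one. A one-to-one substitution guarantees that two features differ after recoding exactly when they differed before, so every transformation in the item is preserved.

```python
# red -> blue and lion -> dog are from the text; the remaining pairs are hypothetical.
COLOR_MAP = {"red": "blue", "blue": "yellow", "yellow": "red"}
ANIMAL_MAP = {"lion": "dog", "dog": "lion"}

def make_isomorph(item):
    """Recolor and re-animal every figure of an item (a list of figure dicts)."""
    return [
        {**fig,
         "color": COLOR_MAP.get(fig["color"], fig["color"]),
         "animal": ANIMAL_MAP.get(fig["animal"], fig["animal"])}
        for fig in item
    ]

def changed_dims(x, y):
    """Set of features on which two figures differ."""
    return {k for k in x if x[k] != y[k]}

item = [
    {"animal": "lion", "color": "red", "size": "large"},   # A
    {"animal": "lion", "color": "blue", "size": "large"},  # B: color changes
    {"animal": "lion", "color": "red", "size": "small"},   # C: size changes
]
iso = make_isomorph(item)
```

The surface features of every figure change, but the A:B and A:C relations, and hence the solution path, stay the same.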
Training items
The eight training items were presented in either MC or CR format, depending on the training condition. In the MC training, the items were presented in the same format as in the pre- and posttests. In the CR training, the solution was constructed from a set of animal cards with which all six transformations could be expressed (see Figure 1). Each animal was available in all three colors and two sizes (large, small) and was printed on both sides of the card, so that turning the card over changed the animal’s orientation (looking left by default, looking right when turned over). Quantity was specified by selecting one or more animal cards, and position by where the card was placed in the empty square.
Training procedure
During training, graduated prompting, a standardized and adaptive training procedure, was used (e.g., Ferrara et al., 1986; Resing, 2000; Tunteler & Resing, 2010). Each item began with a general instruction. The examiner recorded the child’s answer, and if it was incorrect, a prompt was provided. If another mistake was made, the next, more specific prompt was given. This stepwise approach begins with general, metacognitive prompts, such as focusing attention, followed by cognitive hints emphasizing the transformations in the item, and finally step-by-step scaffolds to solve the problem. The same training procedure and instructions were used in the MC and CR conditions; however, after the scaffolding prompt, MC selection or response construction, respectively, was modeled for the child. Once the child answered correctly, whether after the first or the last prompt, he or she was asked to explain the correct solution. The trainer then provided an explanation of the solution, regardless of the correctness of the child’s explanation. No further prompts were given, and the next item was administered.
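The adaptive logic of graduated prompting can be sketched in Python as follows. The four prompt levels follow the order described above (metacognitive prompts, cognitive hints, step-by-step scaffolds, modeling), but the number and wording of prompts per level in AnimaLogica are not given here, so the list and the function names are hypothetical. The returned count corresponds to the number of prompts required, the measure used in the training analyses.

```python
# Hypothetical prompt hierarchy, from general to specific.
PROMPTS = (
    "metacognitive: focus attention on the task",
    "cognitive hint: point out the transformations from A to B",
    "scaffold: solve the item step by step",
    "modeling: examiner demonstrates selection or construction",
)

def administer_item(answer_attempt, correct_answer):
    """Return the number of prompts given before the child answered correctly.

    answer_attempt(n) stands in for the child's answer after n prompts;
    n = 0 is the answer to the general instruction alone.
    """
    for n_prompts in range(len(PROMPTS) + 1):
        if answer_attempt(n_prompts) == correct_answer:
            return n_prompts
    # All prompts, including modeling, were used.
    return len(PROMPTS)

# Simulated child who duplicates until the second prompt, then answers correctly.
needed = administer_item(lambda n: "correct" if n >= 2 else "duplicate", "correct")
```

Because prompts are only given after errors, the count is low for children who need little help and high for those who need the full hierarchy, which is what makes it informative as a measure of instructional need.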
Scoring
The children’s analogy solutions were scored in two ways. First, scores based on correct/incorrect solutions were obtained as Rasch estimates from item response theory (IRT). IRT models were chosen because they seem to circumvent statistical problems encountered when proportion correct is used as the dependent variable in measuring performance change over time, such as the unreliability of change scores and the fact that the scaling of change is not necessarily the same for persons with different pretest scores (Embretson, 1987; Embretson & Reise, 2000). In the Rasch model, the probability of a correct response depends on the difference between a person’s ability and the item’s difficulty. Rasch estimates were obtained on a joint logistic scale for pretest and posttest performance using Andersen’s (1985) Rasch model for repeated measurements, which allowed us to reliably investigate changes from pretest to posttest, as our research questions required.
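In the usual notation, the Rasch model gives the probability that person p solves item i as P(X_pi = 1) = exp(θ_p − β_i) / (1 + exp(θ_p − β_i)), where θ_p is ability and β_i is item difficulty. The short Python sketch below evaluates this function and illustrates the scaling argument: the same one-logit gain in ability corresponds to very different changes in the probability of a correct answer depending on the starting point, one reason proportion-correct change scores are hard to compare across pretest levels. The function name is illustrative.

```python
import math

def rasch_probability(theta, beta):
    """Probability of a correct response under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - beta)))

beta = 0.0  # an item of average difficulty
# A one-logit ability gain for a low-scoring and for an average child:
gain_low = rasch_probability(-2.0, beta) - rasch_probability(-3.0, beta)  # ≈ 0.072
gain_mid = rasch_probability(1.0, beta) - rasch_probability(0.0, beta)    # ≈ 0.231
```

On the logit scale both children improved by the same amount, whereas on the probability scale the average child appears to have gained roughly three times as much.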
Second, the children’s pretest and posttest solutions were categorized into four strategies based on the literature (e.g., Cheshire et al., 2005; Siegler & Svetina, 2002; Tunteler et al., 2008; Tunteler & Resing, 2002): (a) correct analogical solutions, in which the correct answer was selected or constructed; (b) partial analogical solutions, missing one transformation; (c) duplicate non-analogical solutions, copies of the B or C term; and (d) other non-analogical solutions, missing more than one transformation (see Figure 1). A duplication error was always scored as Category 3, even if the duplicate was missing only one transformation.
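The scoring rule, including the precedence of the duplicate category, can be written as a small Python function. It assumes that each solution has already been coded for whether it copies the B or C term and for how many transformations it misses; the function name is illustrative.

```python
def classify_solution(is_duplicate, n_missing_transformations):
    """Map a coded solution onto the four strategy categories (number, label).

    Duplicates take precedence: a copy of the B or C term is Category 3
    even if it is only one transformation away from the correct answer.
    """
    if is_duplicate:
        return 3, "duplicate"
    if n_missing_transformations == 0:
        return 1, "correct"
    if n_missing_transformations == 1:
        return 2, "partial"
    return 4, "other"
```

Checking the duplicate flag first is what implements the rule that a duplicate missing only one transformation is never scored as partial.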
Two measures were obtained from the graduated prompts training for each item: (a) the number of prompts required and (b) quality of explanation. The categorization of the children’s first solution to each training item was used in analyses of strategy progression. The explanation of the correct solution was quantified by the number of unique transformations the child mentioned with regard to similarities and differences between the A, B, and C element features in the analogy (Stevenson et al., 2009). Given that there were six transformations in each analogy, the maximum score per item was 6. No answer, irrelevant responses about why the analogy was correct (e.g., “I thought really hard.”), or responses based on the features without explicit analogical reasoning (e.g., “This is a blue lion. I like blue.”) were all scored as 0.
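A minimal sketch of the explanation score in Python, assuming each verbal explanation has already been hand-coded into the transformations the child explicitly related across the A, B, and C terms, with irrelevant or purely descriptive remarks receiving a non-transformation code. Only unique transformations count, so the score is bounded at 6. The function name and codes are hypothetical.

```python
TRANSFORMATIONS = {"animal", "color", "size", "position", "orientation", "quantity"}

def explanation_score(coded_mentions):
    """Number of unique transformations mentioned (0 to 6)."""
    return len(set(coded_mentions) & TRANSFORMATIONS)

# "The color changes, and the color is blue now, and it gets smaller."
score = explanation_score(["color", "color", "size"])
```

Repeated mentions of the same transformation add nothing, and codes outside the six transformations contribute 0, matching the rule that irrelevant answers are scored as 0.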
Results
Initial Group Comparisons
The children’s initial level of inductive reasoning, measured with the visual exclusion task, did not differ between the three conditions according to an analysis of variance (ANOVA), F(2, 108) = .21, p = .814. The average age per condition also did not differ, F(2, 108) = .15, p = .860. Initial performance on the figural analogies was related to performance on the exclusion test (r = .37, p < .001) and age (r = .41, p < .001).
Psychometric Properties
Classical test theory
The reliability of the pretest was satisfactory, α = .78. The reliabilities of the first posttest were α = .85, .90, and .81 for the MC, CR, and control conditions, respectively, and those of the second posttest were α = .88, .88, and .85; these reliabilities are considered good. The reliabilities of the training scale (eight items), calculated from the number of required prompts per item, were satisfactory: .83 and .78 for the MC and CR conditions, respectively. The test–retest reliability for the control group 3 weeks after initial testing was r = .83, p < .001 (N = 39), indicating good stability over time. The proportion correct of the pretest items ranged from .11 to .80 (M = .31, SD = .42); on the first and second posttests, it ranged from .23 to .91 (M = .50, SD = .46) and from .23 to .95 (M = .56, SD = .45), respectively.
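For reference, Cronbach's α for a persons × items score matrix is α = k/(k − 1) · (1 − Σ s_i² / s_T²), where k is the number of items, s_i² the item variances, and s_T² the variance of the total scores. A short NumPy sketch follows; for the training scale, the columns would be the number of prompts required on each of the eight training items. The function name and the toy matrices are illustrative.

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha for a persons x items score matrix."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]
    item_variances = scores.var(axis=0, ddof=1)
    total_variance = scores.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1.0 - item_variances.sum() / total_variance)

# Toy data: three children, two items.
alpha_toy = cronbach_alpha([[1, 2], [2, 1], [3, 3]])
# Perfectly redundant items give alpha = 1.
alpha_redundant = cronbach_alpha([[0, 0, 0], [1, 1, 1], [1, 1, 1], [0, 0, 0]])
```

In the toy example both items have variance 1 and the total score has variance 3, so α = 2 · (1 − 2/3) = 2/3.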
Item response theory
The independent Rasch model parameters for the pretest and posttests were estimated with the marginal maximum likelihood (MML) procedure, assuming θ ~ N(0, 1), using the ltm package for R (Rizopoulos, 2006). A parametric bootstrap goodness-of-fit test based on Pearson’s χ² statistic, also from the ltm package, was used to investigate the model fit for each test occasion. The model fit of the first posttest was acceptable (p = .36). For the pretest and second posttest, fit was less satisfactory (both p = .04). However, the item fit statistics for both of these occasions were generally satisfactory (p > .05), and the models were therefore deemed acceptable. The correlation between the item difficulty parameters of the pretest and first posttest was moderate, r = .67, and the correlation between those of the two posttests was strong, r = .82. We therefore considered the application of Andersen’s Rasch model for repeated measurements appropriate. Fit statistics for the Andersen model, estimated using the lme4 package for R (Bates, Maechler, Bolker, & Walker, 2013), were Akaike information criterion = 6,844 and Bayesian information criterion = 7,021 (smaller is better for both), and log likelihood = −3,396.96, with 26 parameters. The ranef (random effects) function of the same package was used to extract the person Rasch-scaled estimates per testing occasion.
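In its general form, Andersen’s (1985) model keeps the Rasch item difficulties β_i fixed across occasions and gives each child one ability per occasion, with the occasion abilities jointly normally distributed; a constraint, such as fixing one occasion mean or the location of the item difficulties, is needed for identification. Here p indexes children, i the items, and t the three occasions (pretest, first posttest, second posttest), and Σ captures the correlations between occasions on which the repeated-measures analyses rely. How the constraints were set in the lme4 fit is not stated, so the following is a sketch of the model family rather than the exact specification used.

$$
P(X_{pit} = 1 \mid \theta_{pt}) = \frac{\exp(\theta_{pt} - \beta_i)}{1 + \exp(\theta_{pt} - \beta_i)},
\qquad
(\theta_{p1}, \theta_{p2}, \theta_{p3})^{\top} \sim \mathcal{N}(\boldsymbol{\mu}, \Sigma)
$$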
General Effect of Training
Our first research question concerned the effect of the graduated prompts training on young children’s analogical reasoning. We expected that trained children’s figural analogical reasoning would improve more than that of the control group. This was investigated with a repeated measures (RM) ANOVA with the Rasch-scaled ability estimates per session as the dependent variable (see Table 1 for basic statistics), Session as the within-subjects variable, and Condition as the between-subjects variable. The analysis revealed a main effect of Session, Wilks’s λ = .38, F(1, 108) = 177.12, p < .001, partial η² = .62, showing that children, on average, progressed in figural analogy solving across sessions. The significant Session × Condition interaction, Wilks’s λ = .92, F(2, 108) = 4.82, p = .010, partial η² = .08, indicates that average progression differed between conditions. Simple contrasts showed that both the CR and MC training groups improved more than the control group, F(1, 73) = 4.31, p = .041, partial η² = .06, and F(1, 73) = 8.92, p = .004, partial η² = .11, respectively, confirming H1. A comparison of the two training groups (MC and CR) showed no difference in progression from pretest to posttest, F(1, 70) = 1.18, p = .282, partial η² = .02.
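The effect sizes in this section follow from the reported F statistics through the standard identity partial η² = (F · df_effect) / (F · df_effect + df_error). When the multivariate test has a single dimension (s = 1), as the numerator degrees of freedom suggest for the Session effects here, Wilks’s λ equals 1 − partial η². The Python sketch below applies both identities to the reported values as a consistency check; the function names are illustrative.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return f_value * df_effect / (f_value * df_effect + df_error)

def wilks_lambda_single_dimension(partial_eta_sq):
    """Wilks's lambda when the multivariate test has s = 1."""
    return 1.0 - partial_eta_sq

reported = {
    "Session": (177.12, 1, 108),
    "Session x Condition": (4.82, 2, 108),
    "CR vs control": (4.31, 1, 73),
    "MC vs control": (8.92, 1, 73),
}
eta = {label: round(partial_eta_squared(*args), 2) for label, args in reported.items()}
lam_session = round(wilks_lambda_single_dimension(partial_eta_squared(177.12, 1, 108)), 2)
```

The recovered values (.62, .08, .06, .11, and λ = .38) match those reported, which is a quick way to check transcribed statistics.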
Table 1. Basic Statistics of Rasch Ability Estimates for Figural Analogies Pretest and Posttests.
Note. MC = multiple-choice; CR = constructed-response.
Comparison of Training Item Format
Our second question pertained to the effect of training item format (MC or CR) on performance during the graduated prompts training. We hypothesized that (H2a) CR items would be more difficult than MC items but that, at the same time, (H2b) CR-trained children would provide more advanced explanations of their answers. See Table 2 for descriptive statistics.
Table 2. Basic Statistics of Percentage Required Prompts and Quality of Explanations During Training per Condition.
Note. CR = constructed-response; MC = multiple-choice.
Prompting
The number of prompts required by the children was analyzed to investigate the difficulty of the training items. An RM ANOVA was conducted with the number of prompts as the dependent variable, one within-subjects factor (Items 1-8), and one between-subjects factor (Condition). There was a main effect of Item, Wilks’s λ = .37, F(7, 64) = 15.40, p < .001, partial η² = .62, showing that children generally required fewer prompts over the course of training (see Figure 2a). The significant Item × Condition interaction, Wilks’s λ = .65, F(7, 64) = 4.92, p < .001, partial η² = .35, indicates a difference between the two conditions in the rate of change in required prompts. Analysis of linear trends revealed a significant Item × Condition interaction, F(1, 70) = 23.85, p < .001, partial η² = .25; as can be seen in Figure 2a, the number of prompts required by children in the CR condition generally decreased over the items, whereas the prompt requirements of children in the MC group were relatively steady. Finally, the significant between-subjects effect of Condition, F(1, 70) = 38.49, p < .001, partial η² = .36, indicates that MC-trained children required fewer prompts on average than those trained with CR items, in accordance with H2a.
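The linear trend analysis can be sketched as a per-child contrast score. For eight equally spaced items, the orthogonal polynomial coefficients for a linear trend are −7, −5, −3, −1, 1, 3, 5, 7; weighting a child’s prompt counts by these coefficients gives a single score that is negative when prompts decrease over the items. Comparing these scores between the two conditions tests the linear Item × Condition interaction. The Python sketch below uses hypothetical prompt counts and an illustrative function name.

```python
import numpy as np

# Linear-trend coefficients for 8 equally spaced levels (they sum to zero).
LINEAR_8 = np.array([-7, -5, -3, -1, 1, 3, 5, 7])

def linear_trend_score(prompts_per_item):
    """Per-child linear contrast; negative means prompts decrease over items."""
    return float(np.dot(LINEAR_8, prompts_per_item))

steady = [2, 2, 2, 2, 2, 2, 2, 2]       # relatively steady, as described for MC
decreasing = [4, 4, 3, 3, 2, 2, 1, 1]   # decreasing, as described for CR
score_steady = linear_trend_score(steady)
score_decreasing = linear_trend_score(decreasing)
```

Because the coefficients sum to zero, a child’s overall level of prompting does not affect the score; only the slope over items does, which is why this contrast isolates the difference in rate of change.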

Figure 2. Progression of children’s mean number of required prompts per training item (a) and of the quality of their verbal explanations per training item (b), with separate lines for each training condition (CR = constructed-response items; MC = multiple-choice items).
Explanation quality
Children’s explanations of the correct solution were also analyzed using an RM ANOVA with explanation quality as the dependent variable, one within-subjects factor (Items 1-8), and one between-subjects factor (Condition). Again, there was a main effect of Item, Wilks’s λ = .09, F(7, 64) = 89.12, p < .001, partial η² = .91, showing that, on the whole, children’s explanations improved across items (see Figure 2b). The Item × Condition interaction, Wilks’s λ = .73, F(7, 64) = 3.34, p = .004, partial η² = .27, indicates that differences between MC and CR depended on the administered item. The significant between-subjects effect, F(1, 70) = 12.25, p = .001, partial η² = .15, also shows that children in the CR condition provided more advanced explanations overall than children in the MC condition, confirming H2b.
As can be seen in Figure 2b, differences between conditions appear to emerge during the second training session (Items 5-8). This was checked with an ANOVA for each session, with one between-subjects factor (Condition) and average explanation quality per session as the dependent variable (p values adjusted for multiple comparisons). The two conditions did not differ in explanation quality during the first session (CR: M = .50, SD = .14; MC: M = .45, SD = .07), F(1, 70) = 3.85, n.s., partial η² = .05, but the CR group outperformed the MC group during the second session (CR: M = .67, SD = .15; MC: M = .53, SD = .12), F(1, 70) = 19.26, p < .01, partial η² = .22, confirming our observation.
Comparison of Training Item Format: Strategy-Use Patterns
Our third research question focused on the effect of training item format (MC or CR) on strategy-use patterns. Here, we compare the strategies of the MC and CR training groups across the dynamic test sessions. Children’s solutions were categorized as correct, partially correct, duplicate, or other. We hypothesized (H3) that training with CR items would lead to more advanced strategy-use than training with MC items. See Table 3 for descriptive statistics.
Table 3. Percentages of Strategy-Use per Session for MC and CR Training Conditions.
Note. MC = multiple-choice; CR = constructed-response.
As can be seen in the depiction of strategy progression in Figure 3, children generally increased their use of correct solutions from pretest to posttests and decreased their use of incorrect strategies. Yet, some differences between the two conditions seem apparent, especially during the training sessions. Changes in strategy-use across sessions, as well as possible differences between the MC and CR training conditions, were analyzed with an RM multivariate analysis of variance (MANOVA; 2 Conditions × 5 Sessions). The three dependent variables were the percentages of use of the “correct,” “partial,” and “duplicate” strategies. The “other” strategy was not included because the four strategy percentages sum to 100%, so any one of them is a linear combination of the other three; including all four would make the dependent variables linearly dependent, which violates an assumption of MANOVA. There was a main effect of Session, Wilks’s λ = .13, F(12, 59) = 34.32, p < .001, partial η² = .88, which implies that strategy-use differed from session to session. A significant Session × Condition interaction was also present, Wilks’s λ = .58, F(12, 59) = 3.62, p = .001, partial η² = .42, indicating that the two conditions differed in their proportions of strategy-use across sessions, confirming H3.
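Why the “other” category has to be dropped can be shown numerically. The Python sketch below simulates strategy counts for 50 hypothetical children on 20 items (random data, not from this study), converts them to percentages, and computes the rank of the mean-centered data matrix. With all four categories the rank is only 3, because every row sums to 100, so the covariance matrix of the four variables is singular; with three categories the rank is still 3 and no information is lost. The helper name and the explicit tolerance are choices made for this illustration.

```python
import numpy as np

# Simulated data only: columns are correct, partial, duplicate, other.
rng = np.random.default_rng(0)
counts = rng.multinomial(20, [0.4, 0.2, 0.25, 0.15], size=50)
percentages = 100 * counts / 20

def centered_rank(x, tol=1e-8):
    """Rank of the mean-centered data matrix (equal to the rank of its covariance)."""
    return int(np.linalg.matrix_rank(x - x.mean(axis=0), tol=tol))

rank_all_four = centered_rank(percentages)       # 'other' = 100 - the other three
rank_three = centered_rank(percentages[:, :3])   # no redundancy left
```

Dropping any one of the four categories would remove the dependency; “other” is the natural choice because it is the least informative about analogical reasoning.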

Figure 3. Strategy-use patterns of children in the two training conditions: (a) multiple-choice item format and (b) constructed-response item format.
Given the significant Session × Condition interaction, further analyses were needed to pinpoint when these differences occurred. To this end, MANOVAs were conducted per session with Condition as the between-subjects factor and the three strategies (correct, partial, and duplicate) as dependent variables. The results of these separate analyses are shown in Table 4 (significance level set at p < .01 to correct for the five additional comparisons). A significant difference in strategy-use was found only in the first training session, Wilks’s λ = .71, F(3, 68) = 3.40, p < .01, partial η² = .29. As can be seen in Figure 3, children trained with CR items were more likely to provide solutions stemming from a partially correct strategy than from a duplication strategy, whereas the opposite was true for children trained with MC items. No significant differences between conditions were found on the pre- and posttests.
Table 4. MANOVA Results per Session With Condition as Between-Subjects Factor and the Three Strategies (Correct, Partial, and Duplicate) as Dependent Variables.
Discussion
The main aim of this study was to investigate the influence of item format on the dynamic testing performance of 5- to 6-year-olds on figural analogical reasoning tasks. The results demonstrate that training with the graduated prompts method in a dynamic testing context leads to greater improvement in analogical reasoning than is found in untrained controls. On the whole, no differences in improvement were found between the MC and CR training conditions. However, item format did lead to differences in performance during the two training sessions. Children trained with CR items provided better quality explanations of analogy solutions than those trained with MC items, despite the greater difficulty they had solving CR items, as evidenced by their need for more attempts and instruction during training. The two training groups also showed different strategy-use patterns. These results are discussed in further detail below.
As in previous research on the training of analogical reasoning in children, we found that training led to greater improvement than in untrained controls (e.g., Siegler & Svetina, 2002; Tunteler et al., 2008). Although we expected that training with CR items would lead to greater progression than training with MC items, the two training conditions did not differ in their improvement after training. On the one hand, one could argue that there is no advantage to CR items and that the advantage in the study of Harpaz-Itay et al. (2006) clearly lies in the construction of the item, indicating that CR may not tap into deeper processing components to the same degree as item construction. On the other hand, any possible advantage of CR may not have been apparent on the MC items of the posttest. For example, Gay (1980) found that when college students were instructed and repeatedly tested on behavioral science knowledge using MC or CR items, no differences were apparent on the MC posttest items, but the advantages of CR training were apparent on the CR posttest items. Including CR items on pre- and posttests in future research could control for this possibility. Furthermore, the items were quite difficult for all participants, and the children in the CR group may have had difficulty transferring their developing skills to a different problem format. Generally, children show knowledge transfer only once they have mastered the correct strategies to solve a task (Siegler, 2006). Nevertheless, although the MC-trained children might have been expected to have an advantage on the posttest because it used the same item format as their training, they did not perform better than the CR-trained children.
Interestingly, when performance during the training sessions was analyzed, differences between the two training groups emerged. CR-trained children provided better quality explanations of why their analogy was correct than MC-trained children during the second training session. Training with CR items in the first session may have led to a better understanding of analogical reasoning. However, further research is needed to support this claim; possibly, including items or questions in the posttest that require more active processing, such as self-explanation or item-construction, would give the children more opportunity to demonstrate the depth of their understanding.
As with previous research, we found that CR items were more difficult than MC items (e.g., Behuniak et al., 1996; In’nami & Kozumi, 2009; Martinez, 1999); children in the CR condition required more prompts to solve these items and applied fewer correct strategies during training compared with the MC condition. Interestingly, the erroneous strategy used most often by the CR group was a partially correct solution, whereas duplication was the most common erroneous strategy in the MC group. Duplication is considered the most common non-analogical strategy used by young children (e.g., Cheshire et al., 2005; Siegler & Svetina, 2002); however, analogy strategy-use is usually assessed with MC items. In the CR training condition, children’s erroneous strategies comprised more partial than duplicate solutions. This most likely demonstrates that these children had a good understanding of the solution process, as a nearly correct solution could not otherwise have been constructed, but that they made a mistake, that is, neglected to process one of the transformations (e.g., an incorrect color or position in the solution). In other research with CR items, partial strategies increase with practice (Stevenson, Touw, & Resing, 2011) and training (Tunteler et al., 2008; Tunteler & Resing, 2010). Perhaps training, especially with CR items, encourages the transition from non-analogical (e.g., duplicate) solutions to analogical (e.g., correct or partially correct) solutions because no duplicates are available as distracters. These partial strategies are most likely due to the high cognitive load of the items in combination with working memory constraints, a well-known bottleneck in young children’s analogical reasoning (Richland et al., 2006; Tunteler & Resing, 2010).
A study addressing the role of working memory in MC versus CR items would be essential to understanding the underlying reasons for this discrepancy in strategy-use; partial credit IRT analyses would be appropriate given a sufficiently large sample (Embretson & Reise, 2000).
In sum, CR items may improve learning and provide a means of obtaining more fine-grained analyses of strategy-use, and they are therefore deemed useful in the dynamic testing context. The possible diagnostic advantages of CR items were not examined in this study, but given their relevance for dynamic testing and individualized instruction, we recommend that future research investigate them. CR items may be very beneficial for process-oriented diagnostics aimed at adapting instruction to individual needs, where the analysis of strategy progression and extent of understanding are of particular interest (e.g., Grigorenko, 2009; Jeltova et al., 2007).
Acknowledgements
The authors thank Carina de Klerk for her assistance with organization, data collection, and coding, and Mellany Croes for her assistance with material development and data collection.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
