Young Learners

Abstract

The Early Literacy Knowledge and Skills (ELKS) instrument was informed by the work of Ferreiro and Teberosky based on the notion that young children could be differentiated according to levels of sophistication in their understanding of the rules of written language. As an initial step to evaluate the instrument for teaching purposes, the present study examines its psychometric properties in terms of internal consistency reliability, model-data fit, item discrimination, match between item difficulty and ability range in the sample, and item difficulty according to Ferreiro and Teberosky’s theory of early literacy development. Overall, the ELKS instrument showed good psychometric properties in these areas, although two items in particular may require further investigation and possible revision according to the model-data fit index. Future directions to investigate the potential classroom application of the instrument are suggested.

Keywords

early literacy, early childhood development, developmental progression, standardized assessment, item response theory, psychometrics

The work of Ferreiro and Teberosky (1979/1982) on preschool children’s early literacy concepts has influenced some of our current understanding of early literacy development (e.g., Clay, 2002; Goodman, Reyes, & McArthur, 2005; Sulzby & Teale, 2003). Adopting a constructivist perspective, Ferreiro and Teberosky conceptualized early literacy as a problem-solving process where young children construct different hypotheses to solve the mystery of written text. According to Ferreiro (1991), children form hypotheses about the rules of written language to understand them and apply them in ways that are widely accepted. Based on their observations of 4- to 6-year-old Argentinian children (N = 69) completing early literacy tasks, Ferreiro and Teberosky suggested different progression levels to describe the way young children develop a more sophisticated understanding of the rules of written language. These developmental levels have been identified in the early literacy development of monolingual and bilingual children in languages such as Japanese (Kato, Ueda, Ozaki, & Mukaigawa, 1998), Spanish, Hebrew (Tolchinsky & Teberosky, 1998), and English and Chinese (Yaden & Tsai, 2012). Deaf children have also been found to display these developmental levels (Ruiz, 1995; Watson, 2009).

Ferreiro and Teberosky (1979/1982) argued that their findings support the idea that students need to be able to demonstrate their own understanding of literacy in the classroom. They asserted that children at different developmental levels may construct an understanding of literacy that is different from that of adults. By allowing children to demonstrate their own understanding of literacy, teachers can identify the misconceptions behind the reasoning of individual children and find developmentally appropriate ways to correct those misconceptions. Rather than asking children to rote learn the conventional conception of literacy, Ferreiro and Teberosky suggested that literacy teachers need to incorporate children’s existing knowledge, using language and concepts that are appropriate to their cognitive developmental stages. Their study findings therefore have specific implications for the teaching and assessment of literacy in the preschool and early primary years.

Drawing upon Ferreiro and Teberosky’s (1979/1982) work, the Early Literacy Knowledge and Skills (ELKS) instrument (Barringer, Brown, Chan, & Care, 2009) contains tasks that provide children the opportunity to explain how they make sense of written language. A scoring scheme was formulated to allow children’s responses to be rated according to the developmental level demonstrated. The instrument was developed as part of the Young Learners’ Project (YLP; www.education.unimelb.edu.au/younglearners/), a 6-year study (2007-2012) that aimed to identify personalized early literacy teaching strategies for children in preschool and the first year of school. Although the instrument was initially designed as a data collection tool for the YLP (Barringer, 2009), the responses that can be elicited using the instrument may also be useful for informing teaching.

The purpose of the present study was to examine the psychometric properties of the ELKS instrument as an initial step to evaluate whether the data drawn from the instrument could be useful for informing early literacy teaching. Item response modeling was used to analyze the data and to compare the difficulty ordering of the ELKS items and item steps with Ferreiro and Teberosky’s (1979/1982) theory of early literacy development. Other indicators from item response and classical test theories were also examined, such as internal consistency reliability, the fit between the items and the partial credit model (PCM), how well the items discriminate between children of different ability levels, and the match between the difficulty of item steps and children’s ability estimates. For the purpose of this study, literacy was defined as the ability to read (decoding and comprehension) and write (spelling and expression of ideas) using conventional media (print and pencil). Consistent with Whitehurst and Lonigan’s (1998) definition of emergent literacy, early literacy was used to refer to the conceptual knowledge and skills that are developmental precursors to formal literacy.

Method

Participants

This study included cross-sectional data collected from 293 children (145 males, 148 females), with two thirds of them (n = 183; 47-66 months old) in preschool and the rest in their first year of school (n = 110; 63-79 months old) at the time of assessment. The average age of the preschool children at the time of assessment was 56.7 months (SD = 4.2 months), and 70.6 months (SD = 3.4 months) for the primary school children.

Material and Administration

The children were assessed using the ELKS instrument in the middle of their preschool or school year. The instrument is designed to be administered on an individual basis and is composed of three main components: an A4-size stimulus booklet, a response recording sheet that includes administration instructions, and a set of scoring criteria. The instrument contains 10 tasks assessing concepts of silent reading behavior, writing, knowledge of print conventions, word reading, syntactic knowledge, and knowledge of the alphabet, letter-sounds, and words. Most of the tasks involve giving children problem scenarios, and asking them to select from a range of responses and explain their reasoning. Some tasks have multiple items to account for possible inconsistencies in responses. The average administration time was around 20 min per child, ranging from 15 to 45 min. Further details regarding the instrument can be found in Barringer (2009) and Chan (2012).

The number of items in each of the ELKS tasks varies. Altogether, there are 30 items in the ELKS instrument, with the number of score levels within each item ranging from two to seven. Consistent with Ferreiro and Teberosky’s (1979/1982) study, responses were coded so that a higher score level within an item indicates a more sophisticated level of knowledge or skills. The interrater reliability of the individual tasks in the ELKS instrument, based on interrater correlation (Tinsley & Weiss, 1975), was .87 or above across tasks (Barringer, 2009).

Procedure

Item scores from the ELKS instrument were analyzed with ConQuest (Wu, Adams, Wilson, & Haldane, 2008) using the PCM (Masters, 1982; Masters & Wright, 1997). As the ELKS instrument is designed to describe the early literacy development of children based on Ferreiro and Teberosky’s (1979/1982) theory, the one-parameter logistic (1PL) model was used instead of the two-parameter logistic (2PL) generalized PCM (Muraki, 1992) as the former model allows a clearer link to be drawn between a child’s raw score and the theoretical developmental progression (Wu, Tam, & Jen, in press).

The PCM allows items that have more than two item steps (i.e., score levels) to be analyzed and assumes the difficulty of the item steps within an item to be of different intervals (Wu, Adams, Wilson, & Haldane, 2007). Mathematically, with $θ$ representing person ability, $δ$ representing item difficulty, and k the item category number, if item i is a partial credit item with item steps $0, 1, 2, \dots, m_{i}$ , the probability of person j scoring x on item i can be expressed as follows:

$$P_{i j x} = \frac{\exp \sum_{k = 0}^{x} (θ_{j} - δ_{i k})}{\sum_{h = 0}^{m_{i}} \exp \sum_{k = 0}^{h} (θ_{j} - δ_{i k})}, \quad x = 0, 1, \dots, m_{i}, \tag{1}$$

where $\sum_{k = 0}^{0} (θ_{j} - δ_{i k}) = 0$ and $\sum_{k = 0}^{h} (θ_{j} - δ_{i k}) \equiv \sum_{k = 1}^{h} (θ_{j} - δ_{i k})$.

As the PCM allows the item step parameters $(δ_{i 1}, δ_{i 2}, \dots, δ_{i m_{i}})$ for each item i to be estimated, substituting these estimates into Equation 1 provides the estimated probabilities of scoring $0, 1, \dots, m_{i}$ on item i for any specified ability $θ$ (Masters & Wright, 1997).
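As an illustration of how Equation 1 turns step parameters into category probabilities, the following is a minimal Python sketch. The function name `pcm_probabilities` and the step difficulties in the example are hypothetical and are not ELKS estimates; the analysis reported here was run in ConQuest, not with this code.

```python
import math

def pcm_probabilities(theta, deltas):
    """Return [P(X = 0), ..., P(X = m)] under the partial credit model.

    theta  : person ability in logits
    deltas : step difficulties [delta_1, ..., delta_m] for one item
    """
    # Cumulative sums of (theta - delta_k); the sum for x = 0 is defined as 0.
    cumulative = [0.0]
    for delta in deltas:
        cumulative.append(cumulative[-1] + (theta - delta))
    numerators = [math.exp(c) for c in cumulative]
    total = sum(numerators)
    return [n / total for n in numerators]

# Hypothetical three-category item (steps at -1 and 1 logits), ability 0 logits.
probs = pcm_probabilities(theta=0.0, deltas=[-1.0, 1.0])
print([round(p, 3) for p in probs])
```

With these symmetric, made-up step values, a child of ability 0 is most likely to score in the middle category and equally likely to score 0 or 2.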

In the current study, year level (preschool or first year of school) was used as an independent variable or predictor for the item responses using latent regression (Adams, Wilson, & Wu, 1997). Assuming that the primary school children would generally find the assessment tasks easier than the preschool children, the regression provided an additional validity check for the ELKS instrument.
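One common way to write a latent regression of this kind is sketched below. It is not necessarily the exact specification used in ConQuest. The 0/1 coding of year level $Y_{j}$ (0 = preschool, 1 = first year of school) and the symbols $β_{0}$, $β_{1}$, $ε_{j}$, and $σ^{2}$ are assumptions introduced here for illustration.

```latex
θ_{j} = β_{0} + β_{1} Y_{j} + ε_{j}, \qquad ε_{j} \sim N(0, σ^{2})
```

Under this assumed coding, $β_{0}$ is the mean ability of the preschool children, $β_{1}$ is the difference in mean ability between the two year levels, and $σ^{2}$ is a residual variance common to both groups.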

The PCM with latent regression converged successfully. An examination of the item statistics found that 24 of the 30 items had at least one score category with a relatively small number of respondents (fewer than 15 children; 5% of the total sample). Such categories may not be very reliable for estimating the item difficulty or child ability. The provision of a large number of score categories (up to seven categories for an item) also makes the administration process less efficient. Following the procedure described by Wu and Adams (2007), the items were recoded and the score categories for items that had more than three levels were collapsed. The recoding process involved balancing the spread of the responses in each score level, the sizes of the point-biserial correlation and average ability estimates across the score levels, and the meaningfulness of the categories in differentiating children of different ability levels.
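The count-based part of this screening can be sketched in Python as follows. This covers only the step of flagging sparse categories and applying a recode. The actual recoding also weighed point-biserial correlations, average ability estimates, and the meaningfulness of categories, none of which are modeled here. The function names, the response data, and the mapping are all hypothetical.

```python
from collections import Counter

def sparse_categories(scores, min_count=15):
    """Return observed score categories with fewer than min_count respondents."""
    counts = Counter(scores)
    return sorted(cat for cat, n in counts.items() if n < min_count)

def recode(scores, mapping):
    """Collapse score categories using an analyst-supplied mapping (old -> new)."""
    return [mapping[s] for s in scores]

# Hypothetical responses of 293 children to one seven-category item (not ELKS data).
scores = [0] * 40 + [1] * 5 + [2] * 60 + [3] * 8 + [4] * 90 + [5] * 10 + [6] * 80
flagged = sparse_categories(scores)

# Illustrative mapping that merges each sparse category into a neighbor.
mapping = {0: 0, 1: 0, 2: 1, 3: 1, 4: 1, 5: 2, 6: 2}
recoded = recode(scores, mapping)
print(flagged, sparse_categories(recoded))
```

In practice the choice of which neighbor to merge into is a substantive judgment, so the mapping is supplied by the analyst and not derived automatically.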

The item response analysis was rerun with the recoded score categories using the PCM with year level as the independent variable. Table 1 provides a list and a brief description of the items and item steps included in the analysis after the recoding.

Table 1.

A List of the ELKS Tasks, Item Code, Item Description, and Item Step Description.

ELKS task	Item code	Item description	Item step description
1. Silent reading behavior	SRB01	Identification of silent reading behavior	Level 1: Identified silent reading behavior as reading/looking Level 2: Described silent reading in detail
2. Writing	WRI02	Picture label spelling	Level 1: Incorrect spelling Level 2: Correct spelling
	WRI03	Direction of the writing referred to in Item WRI02	Level 1: Did not write from left to right Level 2: Wrote from left to right
	WRI04	Reading of the writing referred to in Item WRI02	Level 1: Attempted to read own writing Level 2: Correct reading
	WRI05	Re-reading of the writing referred to in Item WRI02	Level 1: Inconsistent with original response Level 2: Consistent with original response
	WRI06	Own name spelling	Level 1: Incorrect spelling Level 2: Correct spelling
	WRI07	Direction of name writing	Level 1: Did not write from left to right Level 2: Wrote from left to right
	WRI08	Other words spelling	Level 1: Incorrect spelling Level 2: Correct spelling
	WRI09	Direction of the writing referred to in Item WRI08	Level 1: Did not write from left to right Level 2: Wrote from left to right
	WRI10	Reading of the writing referred to in Item WRI08	Level 1: Attempted to read own writing Level 2: Correct reading
	WRI11	Re-reading of the writing referred to in Item WRI08	Level 1: Inconsistent with original response Level 2: Consistent with original response
3. Spacing between words	SBW12	Concept of spacing between words	Level 1: Referred to reading direction or showed knowledge of spacing Level 2: Showed knowledge of spacing with supporting explanations
4. Differentiation between symbols	DBS13	Differentiation between shapes, numbers, and letters	Level 1: Differentiated shapes from numbers and letters Level 2: Differentiated shapes, numbers, and letters
5. Word structure	WDS14	Concept of what constitutes a word	Level 1: Applied variation rule and/or minimum quantity hypothesis Level 2: Applied orthographic knowledge
6. Reading with pictures	RWP15	Reading of the text “kick” accompanied by a picture of a soccer player	Level 1: Used picture for reading Level 2: Used print for reading
	RWP16	Reading of the text “red” accompanied by a picture of a red flag	Level 1: Used picture for reading Level 2: Used print for reading
7. Word identification	WID17	Identification of the word “kitten” out of four cards: k, kt, ki tten, and kitten	Level 1: Used a space to separate the syllables within a word (ki tten) Level 2: Correct word identification
	WID18	Identification of the word “spider” out of four cards: s, sr, spi der, and spider	Level 1: Used a space to separate the syllables within a word (spi der) Level 2: Correct word identification
8. Word reading	WDR19	Single word reading: dog	Level 1: Used grapho-phonic cues Level 2: Correct reading
	WDR20	Single word reading: mum	Level 1: Used grapho-phonic cues Level 2: Correct reading
	WDR21	Single word reading: day	Level 1: Used grapho-phonic cues Level 2: Correct reading
	WDR22	Single word reading: tree	Level 1: Used grapho-phonic cues Level 2: Correct reading
	WDR23	Single word reading: water	Level 1: Used grapho-phonic cues Level 2: Correct reading
	WDR24	Single word reading: house	Level 1: Used grapho-phonic cues Level 2: Correct reading
9. Swapping terms	SWT25	Inference made about a sentence when its subject and object are swapped around to create a new sentence: from “sam tickled mum” to “mum tickled sam”	Level 1: Showed flexibility in reading direction or recognized at least one change between the sentences Level 2: Correctly read or deduced the meaning of the transformed sentence
	SWT26	Inference made about a sentence when its subject and object are swapped around to create a new sentence: from “the girl chased the boy” to “the boy chased the girl”	Level 1: Showed flexibility in reading direction or recognized at least one change between the sentences Level 2: Correctly read or deduced the meaning of the transformed sentence
10. Letter identification	LID27	Labeling of the alphabet	Level 1: Labeled letters as numbers Level 2: Labeled letters as letters/alphabet/ABC
	LID28	Letter name response	Level 1: Gave at least one correct letter name
	LID29	Letter-sound response	Level 1: Gave at least one correct letter-sound
	LID30	Word response	Level 1: Gave at least one correct word beginning with the target letter

Note. All ELKS items contain a Level 0 which generally refers to an absence of a response (e.g., said “I don’t know”) or unclear responses. ELKS = Early Literacy Knowledge and Skills.

Results

Various indices were inspected to examine the psychometric properties of the ELKS instrument, including internal consistency reliability, the model-data fit, item discrimination index, and estimation of item difficulty.

Internal Consistency Reliability

The internal consistency of the ELKS instrument based on classical test theory statistics was examined. The index examines the agreement of scores between the items in a test (Allen & Yen, 1979). The alpha coefficient of the instrument was .95, indicating good reliability. However, as Kieftenbeld, Natesan, and Eddy (2011) pointed out, the coefficient does not reflect the precision of measurement at different levels of the latent trait, and a high value could also indicate item redundancy. The coefficient therefore needs to be interpreted in light of other psychometric properties of the instrument.
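For readers unfamiliar with the alpha coefficient, a minimal Python sketch of the standard formula is given below. The function name `cronbach_alpha` and the small response matrix are hypothetical; the value of .95 reported above comes from the study data, not from this example.

```python
from statistics import variance

def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of per-person lists (one score per item)."""
    k = len(item_scores[0])
    item_vars = [variance([person[i] for person in item_scores]) for i in range(k)]
    total_var = variance([sum(person) for person in item_scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical scores of five children on three items (not ELKS data).
responses = [[0, 1, 1], [1, 1, 2], [2, 2, 2], [0, 0, 1], [1, 2, 2]]
alpha = cronbach_alpha(responses)
print(round(alpha, 3))
```

Alpha rises as the item variances become small relative to the variance of the total score, which is why highly similar items can inflate it.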

Model-Data Fit

Item fit statistics were inspected to examine the dimensionality of the ELKS items and the model-data fit. According to item response theory, items that fit a unidimensional model are generally assumed to measure the same construct or latent trait (Wu & Adams, 2007). Items that do not fit the model well may be measuring a different construct, although the misfitting could also be due to other reasons, such as poor item design, or random errors associated with items or population sampling (Embretson & Reise, 2000; Wilson, 2005).

Model-data fit can be assessed using statistical means by examining the mean square fit statistics for individual assessment items. The unweighted fit mean square statistic (also known as outfit mean square) proposed by Wright and Masters (1982) is defined as follows:

$$\begin{aligned} \text{Unweighted mean square} &= \frac{\sum_{n} z_{n i}^{2}}{N} \\ &= \frac{1}{N} \sum_{n} \frac{(x_{n i} - E (x_{n i}))^{2}}{Var (x_{n i})}, \end{aligned}$$

where $N$ is the total number of respondents, $x_{n i}$ is the observed score for person n on item i, and $z_{n i}$ is the standardized residual. As the unweighted fit mean square statistic is sensitive to unexpected responses made by persons when item i is too easy or too difficult, Wright and Masters (1982) proposed the weighted fit mean square (infit) statistic as an alternative for examining item fit:

$$\begin{aligned} \text{Weighted mean square} &= \frac{\sum_{n} z_{n i}^{2} Var (x_{n i})}{\sum_{n} Var (x_{n i})} \\ &= \frac{\sum_{n} (x_{n i} - E (x_{n i}))^{2}}{\sum_{n} Var (x_{n i})} . \end{aligned}$$
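The two definitions can be compared directly with a short Python sketch. It assumes the model-expected scores and variances for each person are already available. The function name `fit_mean_squares` and the four-person data are hypothetical and chosen only to show the different sensitivity of the two statistics.

```python
def fit_mean_squares(observed, expected, variances):
    """Return (unweighted, weighted) mean square fit for one item.

    observed, expected, variances: per-person lists of x_ni, E(x_ni), Var(x_ni).
    """
    squared_residuals = [(x - e) ** 2 for x, e in zip(observed, expected)]
    n = len(observed)
    # Unweighted (outfit): mean of squared standardized residuals.
    unweighted = sum(r / v for r, v in zip(squared_residuals, variances)) / n
    # Weighted (infit): squared residuals pooled over pooled variances.
    weighted = sum(squared_residuals) / sum(variances)
    return unweighted, weighted

# Hypothetical dichotomous item; the last person succeeds despite an expected
# score of only 0.1, which is the kind of unexpected response that inflates outfit.
observed = [1, 0, 1, 1]
expected = [0.8, 0.3, 0.6, 0.1]
variances = [p * (1 - p) for p in expected]
outfit, infit = fit_mean_squares(observed, expected, variances)
print(round(outfit, 2), round(infit, 2))
```

In this made-up case the single surprising response dominates the unweighted statistic, while the weighted statistic is less affected because that response carries little variance.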

Table 2 presents the unweighted and weighted fit statistics for the 30 ELKS items based on the unidimensional PCM. The values in the table are sorted from low to high according to the weighted t-statistics. Applying the recommendations of Wright and Linacre (1994) and Wu et al. (2007), values that exceed the suggested thresholds for the weighted mean square (i.e., 0.5-1.5) and t-statistics (i.e., ±2) are in bold.

Table 2.

Weighted Mean Square Fit Statistics.

Item	Difficulty estimate	Error	Unweighted mean square	Unweighted t	Weighted mean square	Weighted t
WRI02	0.62	0.08	0.48	−7.9	0.62	−4.7
RWP16	−0.82	0.11	0.66	−4.7	0.71	−3.4
WRI08	0.43	0.08	0.53	−6.9	0.72	−3.2
LID29	0.06	0.11	0.64	−4.9	0.81	−3.2
RWP15	−1.03	0.10	0.74	−3.4	0.79	−2.5
WDR20	0.72	0.08	0.61	−5.4	0.80	−2.1
WRI05	−0.16	0.08	0.80	−2.6	0.87	−1.6
WDR21	2.04	0.09	0.57	−6.3	0.85	−1.5
WRI09	−0.05	0.08	0.62	−5.3	0.89	−1.2
WRI11	−0.09	0.08	0.88	−1.4	0.90	−1.1
WDR22	0.22	0.08	0.65	−4.8	0.89	−1.0
WRI04	2.03	0.09	0.87	−1.7	0.91	−1.0
WDR19	−0.26	0.08	0.76	−3.1	0.92	−0.7
WRI03	1.13	0.09	0.98	−0.2	0.94	−0.7
LID28	2.63	0.10	0.89	−1.4	0.91	−0.6
WDR23	−2.58	0.13	0.49	−7.6	0.94	−0.6
WRI07	−1.94	0.10	0.47	−8.0	0.94	−0.4
LID30	−1.28^a	0.49	0.80	−2.6	0.97	−0.3
DBS13	−1.63	0.09	0.73	−3.5	1.01	0.1
WDS14	0.93	0.09	1.01	0.2	1.01	0.1
WDR24	2.23	0.10	0.96	−0.5	1.03	0.3
WRI10	0.21	0.08	1.32	3.6	1.05	0.6
SWT25	−0.01	0.08	1.19	2.2	1.06	0.7
WRI06	−1.49	0.09	0.78	−2.8	1.08	0.8
SWT26	−0.19	0.08	1.03	0.4	1.19	2.2
SBW12	1.44	0.10	1.18	2.1	1.16	2.3
WID18	−1.04	0.09	1.48	5.1	1.24	2.6
SRB01	−0.82	0.09	1.72	7.2	1.34	3.7
WID17	−0.78	0.08	2.15	10.6	1.53	5.6
LID27	−0.51	0.08	14.93	52.9	1.70	6.9

Note. Values that exceed the suggested thresholds for the weighted mean square (i.e., 0.5-1.5) and t-statistics (i.e., ±2) are in bold.

a. Constrained item parameter.

The weighted t-statistics were first examined to identify mean square values that fell outside the 95% confidence interval (two-tailed test). Weighted mean square values outside the recommended range indicate item misfit. As can be seen from the table, 12 of the 30 ELKS items have t values that are less than −2 or greater than 2. Of these 12 items, 2 (WID17 and LID27) have a weighted mean square value greater than 1.5. The fit statistics suggest that, on the whole, the majority of the ELKS items were measuring the same construct and fit the unidimensional PCM.
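The screening described above can be reproduced from the weighted columns of Table 2 with a short Python sketch. The values are transcribed from the table; the variable names are illustrative, and the thresholds are those stated in the text (±2 for t, 0.5-1.5 for the mean square).

```python
# (weighted mean square, weighted t) per item, transcribed from Table 2.
weighted_fit = {
    "WRI02": (0.62, -4.7), "RWP16": (0.71, -3.4), "WRI08": (0.72, -3.2),
    "LID29": (0.81, -3.2), "RWP15": (0.79, -2.5), "WDR20": (0.80, -2.1),
    "WRI05": (0.87, -1.6), "WDR21": (0.85, -1.5), "WRI09": (0.89, -1.2),
    "WRI11": (0.90, -1.1), "WDR22": (0.89, -1.0), "WRI04": (0.91, -1.0),
    "WDR19": (0.92, -0.7), "WRI03": (0.94, -0.7), "LID28": (0.91, -0.6),
    "WDR23": (0.94, -0.6), "WRI07": (0.94, -0.4), "LID30": (0.97, -0.3),
    "DBS13": (1.01, 0.1), "WDS14": (1.01, 0.1), "WDR24": (1.03, 0.3),
    "WRI10": (1.05, 0.6), "SWT25": (1.06, 0.7), "WRI06": (1.08, 0.8),
    "SWT26": (1.19, 2.2), "SBW12": (1.16, 2.3), "WID18": (1.24, 2.6),
    "SRB01": (1.34, 3.7), "WID17": (1.53, 5.6), "LID27": (1.70, 6.9),
}

# Items whose weighted t falls outside +/-2.
t_flagged = [item for item, (_, t) in weighted_fit.items() if abs(t) > 2]

# Items whose weighted mean square falls outside 0.5-1.5.
mnsq_flagged = [item for item, (mnsq, _) in weighted_fit.items()
                if mnsq < 0.5 or mnsq > 1.5]

print(len(t_flagged), mnsq_flagged)
```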

Item Discrimination Index

Other than item fit statistics, the psychometric properties of individual items can also be examined in terms of the item discrimination index based on classical test theory. An item discrimination index is the correlation between each child’s item score and the child’s total score (Fan, 1998). The index ranges from −1 to 1 and shows how well the items differentiate between children of different ability levels. Wu and Adams (2007) suggested that items with a discrimination index of less than 0.2 are not useful for measurement and recommended the inclusion of items with an index of above 0.4 in tests.
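A minimal Python sketch of this index, computed as the Pearson correlation between item score and total score, is shown below. The function names and the response matrix are hypothetical. Whether the reported indices exclude the item itself from the total is not stated here, so the sketch uses the uncorrected total.

```python
import math

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

def discrimination_indices(item_scores):
    """Item-total correlation for each item (total includes the item itself)."""
    totals = [sum(person) for person in item_scores]
    n_items = len(item_scores[0])
    return [pearson([person[i] for person in item_scores], totals)
            for i in range(n_items)]

# Hypothetical scores of five children on three items (not ELKS data).
responses = [[0, 1, 1], [1, 1, 2], [2, 2, 2], [0, 0, 1], [1, 2, 2]]
indices = discrimination_indices(responses)
print([round(r, 2) for r in indices])
```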

Table 3 shows the discrimination index of the ELKS items. The discrimination of the items was generally good, ranging between 0.40 and 0.87. As expected, the two items that showed relatively poor fit (WID17 and LID27) have a lower discrimination index. LID28, which refers to the ability of children to name at least one letter, had the lowest discrimination index. This could be because the majority of the children in the sample (91.1%) could do so, and so the item did not differentiate well between children of different ability levels.

Table 3.

Item Discrimination Index.

Item	Discrimination index
LID28	0.40
WID17	0.43
LID27	0.44
SRB01	0.46
WRI07	0.47
LID30	0.48
DBS13	0.49
WID18	0.50
SBW12	0.52
WRI06	0.54
WDS14	0.64
WDR23	0.67
SWT26	0.67
WDR24	0.68
LID29	0.69
SWT25	0.71
WDR22	0.72
WRI03	0.73
RWP15	0.73
RWP16	0.75
WDR21	0.75
WRI05	0.76
WRI10	0.76
WRI11	0.76
WRI09	0.77
WDR19	0.78
WRI04	0.79
WDR20	0.82
WRI08	0.85
WRI02	0.87

Item Difficulty

Item response theory conceptualizes the ability level of different people (person ability) and the difficulty level of different items (item difficulty) as being distributed along the same continuum. Figure 1 shows the item step difficulty map based on the item response modeling, where the numbers on the left represent a scale in logits (log-odds units) shared by both child ability and item step difficulty. The Xs in the middle of the map represent the 293 children in this study, and the item steps are listed on the right. The suffixes .1 and .2 represent the second and third item steps of the particular item, respectively. Three of the four letter identification task responses only have two item steps and so only one item step is shown on the map for each of those three items. The item step difficulty map is presented in numerical form in Table 4 by showing the ordering of the items from high to low difficulty.

Figure 1.

Map of child ability and item step difficulty estimates.

Table 4.

Item Step Difficulty.

Item step	Difficulty estimate
SBW12.2	3.55
WDR23.2	3.48
WDS14.2	2.80
WDR24.2	2.63
WDR22.2	2.55
WDR21.2	2.45
RWP16.2	1.87
WDR24.1	1.82
WDR23.1	1.78
WDR21.1	1.63
WDR22.1	1.51
WDR19.2	1.27
RWP15.2	1.20
WRI02.2	1.00
WDR19.1	0.98
WDR20.2	0.89
WRI08.2	0.61
WDR20.1	0.54
SRB01.2	0.51
SWT25.2	0.48
WRI04.2	0.26
WRI08.1	0.25
WRI10.2	0.25
WRI02.1	0.24
WRI04.1	0.18
WRI10.1	0.17
SWT26.2	0.13
LID29.1	0.06
WRI09.2	0.02
WRI11.2	0.00
WRI05.2	−0.02
WRI03.2	−0.03
WRI09.1	−0.12
WRI11.1	−0.19
WID17.2	−0.27
WRI05.1	−0.31
LID27.2	−0.43
WRI03.1	−0.48
WID18.2	−0.49
SWT25.1	−0.49
SWT26.1	−0.52
LID27.1	−0.58
SBW12.1	−0.66
WRI06.2	−0.80
WDS14.1	−0.94
DBS13.2	−1.26
LID30.1	−1.28
WID17.1	−1.29
WID18.1	−1.59
WRI07.2	−1.66
DBS13.1	−1.99
SRB01.1	−2.15
WRI06.1	−2.17
WRI07.1	−2.23
LID28.1	−2.58
RWP15.1	−3.26
RWP16.1	−3.50

Note. Item step difficulty estimates based on Thurstonian thresholds.
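A Thurstonian threshold for score level k is commonly defined as the ability at which the probability of scoring k or higher reaches .5. The Python sketch below finds such thresholds by bisection under the PCM. The function names, the search bounds, and the step difficulties are assumptions for illustration. The estimates in Table 4 were produced by ConQuest, not by this code.

```python
import math

def pcm_probabilities(theta, deltas):
    """Return [P(X = 0), ..., P(X = m)] under the partial credit model."""
    cumulative = [0.0]
    for delta in deltas:
        cumulative.append(cumulative[-1] + (theta - delta))
    numerators = [math.exp(c) for c in cumulative]
    total = sum(numerators)
    return [n / total for n in numerators]

def thurstonian_threshold(deltas, k, lo=-10.0, hi=10.0, tol=1e-6):
    """Ability at which P(X >= k) = 0.5, located by bisection on [lo, hi]."""
    def p_at_least(theta):
        return sum(pcm_probabilities(theta, deltas)[k:])
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if p_at_least(mid) < 0.5:
            lo = mid   # threshold lies above mid
        else:
            hi = mid   # threshold lies at or below mid
    return (lo + hi) / 2

# Hypothetical three-category item with steps at -1 and 1 logits.
deltas = [-1.0, 1.0]
thresholds = [thurstonian_threshold(deltas, k) for k in (1, 2)]
print([round(t, 2) for t in thresholds])
```

For a two-category item the threshold coincides with the single step difficulty, which offers a quick sanity check on the routine.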

In terms of interpreting the item step difficulty map (Figure 1), the children are positioned along the logit scale according to their ability level while the item steps are placed according to their difficulty level. The more able children and more difficult item steps are located further up the scale, whereas the less able children and easier item steps are located lower down the scale. As can be seen from the figure, the ordering of all of the item steps seemed to follow the PCM where the higher item step was more difficult (at a higher position) than the lower item step within an item. This implied that the ordering of the item steps followed the conceptual framework underpinning the score categories.

The average ability estimates of the children who were in preschool and the first year of school were −0.41 logit and 2.30 logits, respectively. The variance in the latent variable for each group of children was estimated to be 1.07. Technically, if the difficulty of the ELKS item steps matches the overall ability level of the children, the distribution of the item steps on the scale should be at the same level as that of the children (Wilson, 2005). This was supported through an inspection of the test information curve (Figure 2), which illustrates the different quality of information that a test provides at different ability levels (Furr & Bacharach, 2008). The figure shows that overall the ELKS items provided the greatest information at the average ability level of $θ$ = 0 and less information at the extreme ability levels. Further visual inspection of the item step difficulty map (Figure 1) suggested that the ELKS instrument seemed to be better able to cover the lower ability range where there are item steps on the lowest portion of the scale (i.e., RWP15.1 and RWP16.1) with no children located at the same ability/difficulty level as these item steps. The instrument also appeared to be too easy or “ceilinged out” for the more able children, where there are no item steps located beyond 3.75 logits on the scale despite more than 10 children covering that part of the scale. The ceiling effect was confirmed through an inspection of the ability estimates¹ output from ConQuest which found 12 children estimated to have an ability level of 3.75 logits or above. All of these children obtained full or almost full score on the ELKS instrument. Out of these 12 children, 10 were attending the first year of school at the time of assessment.
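To show how a test information curve of this kind arises, the Python sketch below uses the property that, for Rasch-family models such as the PCM, the information an item provides at a given ability equals the variance of the item score at that ability. The function names and the three items are hypothetical and do not reproduce Figure 2.

```python
import math

def pcm_probabilities(theta, deltas):
    """Return [P(X = 0), ..., P(X = m)] under the partial credit model."""
    cumulative = [0.0]
    for delta in deltas:
        cumulative.append(cumulative[-1] + (theta - delta))
    numerators = [math.exp(c) for c in cumulative]
    total = sum(numerators)
    return [n / total for n in numerators]

def item_information(theta, deltas):
    """Item information at theta: the variance of the item score."""
    probs = pcm_probabilities(theta, deltas)
    mean = sum(x * p for x, p in enumerate(probs))
    return sum((x - mean) ** 2 * p for x, p in enumerate(probs))

def total_information(theta, items):
    """Test information: sum of item information over all items."""
    return sum(item_information(theta, deltas) for deltas in items)

# Hypothetical test of three items centered near 0 logits (not ELKS estimates).
items = [[-1.0, 1.0], [0.0], [-0.5, 0.5]]
for theta in (-4.0, 0.0, 4.0):
    print(theta, round(total_information(theta, items), 3))
```

Because the made-up steps cluster around 0 logits, the information peaks near the middle of the scale and falls away toward both extremes, which is the same qualitative pattern described for the ELKS items.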

Figure 2.

Test information curve.

The construct validity of the ELKS instrument can also be inspected through the examination of item step difficulty according to Ferreiro and Teberosky’s (1979/1982) theory of early literacy development. Supporting their theory, the difficulty estimates of the item steps demonstrated a progression of early literacy concepts. Children at a lower literacy ability level were found to rely more on contextual information such as pictures when they read (RWP15.1 and RWP16.1). The children were able to identify shapes among different symbols (DBS13.1), but tended to mix up numbers and letters (DBS13.2). At a higher ability level, children tended to apply stringent rules such as the minimum quantity hypothesis or the variation rule (WDS14.1). Children at the highest ability level tended to notice more text features such as spaces between words (SBW12.2), reading direction (SBW12.1), subject-object position (SWT25.2 and SWT26.2), and letter-sound relationships (LID29.1). They were also more likely to utilize their knowledge of these features when attempting literacy tasks.

Discussion

Based on the findings from the statistical analysis, overall the ELKS instrument showed good psychometric properties, although two items in particular may require further investigation or revision.

The results from the item difficulty analysis were generally consistent with Ferreiro and Teberosky’s (1979/1982) conceptualization of early literacy development. The lower end of the progression included item steps that were relatively less sophisticated, such as using the accompanying pictures to derive the meaning of text when reading. Responses in the middle section of the progression generally related to the application of rules such as the minimum quantity hypothesis or the variation rule. The upper end of the progression included responses that demonstrated a more sophisticated understanding of the rules of written language, such as the use of spacing, syntax, reading direction, subject-object position, and letter-sound relationships. At the item level, the difficulty ordering of the items seems reasonable when taking into account the nature of the tasks. For instance, children tended to find correctly writing their own names (WRI06) easier than writing other words (WRI02 and WRI08). Writing words that the children chose themselves (WRI08) tended to be easier than writing words that the assessment administrator prescribed (WRI02). Identifying the correct words from a range of choices (WID17 and WID18) was generally easier than pronouncing a word correctly (WDR19-WDR24).

In terms of model-data fit, the majority of the items (28 out of 30) in the ELKS instrument showed relatively good fit with the unidimensional PCM. All of the items generally discriminated well between children of different ability levels, and the measure as a whole had high internal consistency. Further investigations are needed to understand the reasons for the relatively poor fit shown by the two items (WID17 and LID27). These investigations may include examining the model-data fit in a larger sample to determine the replicability of the findings.

In terms of the match between item difficulty and person ability in the sample, as expected, the children who were in the first year of school generally performed better than the preschool children on the ELKS instrument. Although the ELKS item steps appear to cover a range of ability levels, the instrument may be more suitable for assessing children in preschool than the first year of school as the item steps appear to be too easy for the more able children at the top of the scale.

Regarding the possible applications of the ELKS instrument in the classroom, although the analysis of this study only examined the properties of the instrument at the group rather than at individual child level (Chan, 2013), the summative results from the assessment could help teachers to differentiate their teaching in small groups. For example, children who are at a lower ability level according to the item response modeling could engage in group activities that would encourage them to pay more attention to the textual cues when reading rather than relying on pictures. For children who are at a higher ability level, teachers could challenge their thinking by examining exceptions to rules such as the minimum quantity hypothesis and draw their attention to the letter-sound correspondence in words at the syllabic level. For children who are at the highest ability level, formal literacy tasks that are more difficult or complex than those in the ELKS instrument may be needed to assess their ability. The developmental progression inferred from the item step difficulty map can also be used as a “road map” or a criterion-referenced framework that links curriculum, assessment, and pedagogy, as suggested by Black, Wilson, and Yao (2011).

Rather than focusing on a single correct answer, the ELKS instrument provides an important contribution to early literacy assessment as an assessment tool that more finely differentiates children in terms of different levels of understanding. As stated in the introduction, this study provides the initial step to evaluate psychometrically whether the data drawn from the instrument can be useful for informing early literacy teaching. Further investigation is needed to determine how the responses of individual children to the ELKS instrument may change over time and how they relate to other learning outcomes.

In conclusion, this initial investigation of the psychometric properties of the ELKS instrument suggests the developmental progression generated from the data generally supports Ferreiro and Teberosky’s (1979/1982) theory of early literacy development. This article demonstrates how item response modeling can be useful for examining the psychometric properties of the instrument in relation to a theorized developmental progression. Although more work is needed to refine the ELKS instrument, the analysis helps to highlight areas in the instrument that may require further investigation. Further research is also needed to determine how the instrument can be used to inform early literacy teaching in practice.

Acknowledgements

This article is based on the author’s doctoral thesis titled Standardised Assessment in Early Literacy: Reconciling Different Perspectives and Methods. The author wishes to thank Associate Professors Esther Care and Margaret Brown for their support in this research. The parents, children, and teachers who participated in this research and members of the Young Learners’ Project are gratefully acknowledged. The helpful comments from Professor Margaret Li-min Wu on prior drafts of this article are greatly appreciated.

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was funded by the Australian Research Council in conjunction with its partner organizations the University of Melbourne and the Australian Scholarships Group. The author was a recipient of the Australian Postgraduate Award (Industry) supported under the Australian Research Council’s Linkage Projects funding scheme (Project No. LP0883437).

References

Adams, R. J., Wilson, & Wu (1997). Multilevel item response models: An approach to errors in variables regression. Journal of Educational and Behavioral Statistics, 22, 47-76.

Allen, M. J., & Yen, W. M. (1979). Introduction to measurement theory. Monterey, CA: Brooks/Cole.

Barringer (2009). Early literacy knowledge and skills: The development of a measure of early literacy and an investigation of the skills associated with single word reading (Doctoral thesis, The University of Melbourne). Available from Melbourne University Research Collections (UMER) database. http://hdl.handle.net/11343/35450

Barringer, Brown, P. M., Chan, M. C. E., & Care (2009). Early literacy knowledge and skills instrument. Parkville, Victoria, Australia: The University of Melbourne.

Black, Wilson, & Yao, S.-Y. (2011). Road maps for learning: A guide to the navigation of learning progressions. Measurement: Interdisciplinary Research and Perspectives, 9, 71-123.

Chan, M. C. E. (2012). Standardised assessment in early literacy: Reconciling different perspectives and methods (Unpublished doctoral thesis). The University of Melbourne, Victoria, Australia.

Chan, M. C. E. (2013). Young learners: An exploration of the notion “by different paths to common outcomes” in early literacy assessment. In P. J. Dunston (Eds.), 62nd yearbook of the Literacy Research Association (pp. 75-92). Altamonte Springs, FL: Literacy Research Association.

Clay, M. M. (2002). An observation survey of early literacy achievement (2nd ed.). Auckland, New Zealand: Heinemann.

Embretson, S. E., & Reise, S. P. (2000). Item response theory for psychologists. Mahwah, NJ: Lawrence Erlbaum.

Fan (1998). Item response theory and classical test theory: An empirical comparison of their item/person statistics. Educational and Psychological Measurement, 58, 357-381. doi:10.1177/0013164498058003001

Ferreiro (1991). Literacy acquisition and the representation of language. In Kamii & Manning (Eds.), Early literacy: A constructivist foundation for whole language (pp. 31-54). West Haven, CT: National Education Association.

Ferreiro & Teberosky (1982). Literacy before schooling (K. Goodman Castro, Trans.). Exeter, NH: Heinemann Educational Books. (Original work published 1979)

Furr, R. M., & Bacharach, V. R. (2008). Psychometrics: An introduction. Los Angeles, CA: Sage.

Goodman, Y. M., Reyes, & McArthur (2005). Emilia Ferreiro: Searching for children’s understandings about literacy as a cultural object. Language Arts, 82, 318-323.

Kato, Ueda, Ozaki, & Mukaigawa (1998). Japanese preschoolers’ theories about the “Hiragana” system of writing. Linguistics and Education, 10, 219-232. doi:10.1016/s0898-5898(99)80109-0

Kieftenbeld, Natesan, & Eddy (2011). An item response theory analysis of the mathematics teaching efficacy beliefs instrument. Journal of Psychoeducational Assessment, 29, 443-454. doi:10.1177/0734282910391062

Masters, G. N. (1982). A Rasch model for partial credit scoring. Psychometrika, 47, 149-174. doi:10.1007/BF02296272

Masters, G. N., & Wright, B. D. (1997). The partial credit model. In W. J. van der Linden & R. K. Hambleton (Eds.), Handbook of modern item response theory (pp. 101-121). New York, NY: Springer.

Muraki (1992). A generalized partial credit model: Application of an EM algorithm. Applied Psychological Measurement, 16, 159-176. doi:10.1177/014662169201600206

Ruiz, N. T. (1995). A young deaf child learns to write: Implications for literacy development. The Reading Teacher, 49, 206-217.

Sulzby & Teale, W. H. (2003). The development of the young children and the emergence of literacy. In Flood, Lapp, J. R. Squire, & J. M. Jensen (Eds.), Handbook of research on teaching the English language arts (2nd ed., pp. 300-313). Mahwah, NJ: Lawrence Erlbaum.

Tinsley, H. E., & Weiss, D. J. (1975). Interrater reliability and agreement of subjective judgments. Journal of Counseling Psychology, 22, 358-376. doi:10.1037/h0076640

Tolchinsky & Teberosky (1998). The development of word segmentation and writing in two scripts. Cognitive Development, 13, 1-24. doi:10.1016/s0885-2014(98)90018-1

Watson, L. M. (2009). Early print concepts: Insights from work with young deaf children. Deafness & Education International, 11, 191-209. doi:10.1002/dei.267

Whitehurst, G. J., & Lonigan, C. J. (1998). Child development and emergent literacy. Child Development, 69, 848-872.

Wilson (2005). Constructing measures: An item response modeling approach. Mahwah, NJ: Lawrence Erlbaum.

Wright, B. D., & Linacre, J. M. (1994). Reasonable mean-square fit values. Rasch Measurement Transactions, 8(3), 370. Retrieved from http://www.rasch.org/rmt/rmt83b.htm

Wright, B. D., & Masters, G. N. (1982). Rating scale analysis. Chicago, IL: Mesa Press.

Wu, M. L., & Adams (2007). Applying the Rasch model to psycho-social measurement: A practical approach. Melbourne, Victoria, Australia: Educational Measurement Solutions.

Wu, M. L., Adams, R. J., Wilson, M. R., & Haldane, S. A. (2007). ACER ConQuest version 2.0: Generalised response modelling software manual. Camberwell: Australian Council for Educational Research.

Wu, M. L., Adams, R. J., Wilson, M. R., & Haldane, S. A. (2008). ConQuest: Generalised response modelling software (Version 2) [Computer program]. Camberwell: Australian Council for Educational Research.

Wu, M. L., Tam, H.-P., & Jen, T.-H. (in press). Educational measurement for applied researchers—Theory into practice. Singapore: Springer.

Yaden, D. B., Jr., & Tsai (2012). Learning how to write in English and Chinese: Young bilingual kindergarten and first grade children explore the similarities and differences between writing systems. In E. B. Bauer & Gort (Eds.), Early biliteracy development: Exploring young learners’ use of their linguistic resources (pp. 55-83). New York, NY: Routledge.