Abstract
Investigations of early vocabulary production often employ a single method to measure children’s word use. This study examined expressive vocabulary development in children aged 1;0–2;6 years through a combination of picture naming, caregiver report and language sampling. The participants were predominantly exposed to Maltese at home, with gathered evidence providing novel documentation of early vocabulary development in this specific language-learning context. Expressive vocabulary reported by caregivers was compared to word use elicited through picture naming and sampled naturalistically during play. Analyses revealed commonalities between pairs of measures that pointed towards their validity. Resulting differences underscored the influences that data collection methods exerted on the measures they generated. Taken together, these findings highlight the relevance of multiple methods for ensuring validity and objectivity in the investigation of expressive vocabulary development.
Introduction
Researchers constantly grapple with how best to derive valid measures of early language skills, including vocabulary production. Since the linguistic abilities of young children are notoriously difficult to determine (Fenson et al., 1994), the dependability of data in early language research is a core methodological construct that merits careful consideration. Using multiple measures that converge on a single aspect of children’s language ability helps to ensure that resulting outcomes are sound and objective. Consequently, employing more than one method in the assessment of children’s early language skills is recommended (Dockrell, 2001).
Known as triangulation, the use of two or more methods enhances the validity of research procedures employed (Robson, 2002), ensuring the trustworthiness of research instruments and demonstrating the quality of the methodological design (Mertens, 1998). The comparison of parallel pieces of evidence allows cross-validation because a positive correlation between two measures of the same aspect indicates their concurrent validity. Correspondence between language scores obtained through different strategies validates the individual measures as well as the methods and instruments used. Pooling information from multiple sources also compensates for measurement bias inherent in individual methods. Researchers are alerted to potential methodological problems when measurement of a single language dimension using different strategies gives rise to mismatched findings. Triangulation is therefore a means to enhance the comprehensiveness of data (Pring, 2005).
In recognition of the fact that vocabulary development is central to language acquisition, several studies have addressed young children’s vocabulary skills from various perspectives (Schipke & Kauschke, 2011). However, vocabulary development research is marked by a dearth of multi-method designs despite awareness of the benefits of triangulation approaches. This article reports a study that capitalises on the advantages of employing multiple data collection strategies. Three parallel measures of vocabulary production allowed the validity and objectivity of data to be addressed. The similarities between measures served to establish their soundness and affinity, while discrepancies underscored the influences exerted by the methods themselves.
The measurement of early vocabulary skills
The methods typically employed in the assessment of children’s early word production are parental report, language sampling and standardised testing (Dale, 1996; Law & Roy, 2008). A review and evaluation of each method follows.
Parental report
Parental report draws on the insight that parents can contribute concerning their children’s language and communicative abilities across a variety of daily situations (Feldman et al., 2005). In contrast with direct assessment methods, parental report minimises performance constraints such as the child’s lack of cooperation or reticence. It also drastically reduces the time and human resources required to assess early language skills. It is not surprising, then, that large samples of young children are a hallmark of studies based solely on parental report.
Parent-based assessment of early vocabulary abilities typically makes use of vocabulary checklists. Parents are asked to tick the words understood and/or used by their children and possibly to exemplify others. Early vocabulary measurement is a major focus of the well-established MacArthur-Bates Communicative Development Inventories (CDI) (Fenson et al., 2006) and the Language Development Survey (LDS) (Rescorla, 1989).
Although parental report is often considered the most effective means of obtaining information on early expressive language abilities (Rescorla, 1989), parents may overestimate their children’s vocabulary and emergent grammar skills, particularly at the initial stages of word production when first attempts at using words may require considerable interpretation (Feldman et al., 2000). Conversely, parents may underestimate the early language of children whose speech is difficult to understand. Beyond concerns about the reliability of parental judgements, memory limitations introduce an element of bias in reported data.
Standardised assessment
Standardised language testing requires direct assessment. It allows comparison of a child’s language abilities to normative data derived from a large population sample (Dockrell, 2001). Standardised assessments of early language skills are, however, vulnerable to measurement error due to potentially low attention and compliance levels in young children (Emanuel, Chiat, & Roy, 2007). The value of early language assessments is therefore limited by young children failing to abide by test requirements (Dale, 1996; Rescorla & Alley, 2001). Perhaps this explains why so few norm-referenced assessments cater for children younger than 2;6 years (Law & Roy, 2008). Standardised assessments also tend to be limited in the information they provide to practitioners regarding young children’s language skills, hindering the planning and implementation of intervention procedures (Dockrell, 2001). Particularly when sensitivity and specificity data presented with norm-referenced tests are weak, other sources of information may be prioritised (Spaulding, Plante, & Farinella, 2006).
Standardised testing of early language skills is not possible in the absence of developmental norms. In such contexts, informal structured assessments that require children’s cooperation on a set task provide an alternative to standardised tests. For example, Thal, Jackson-Maldonado, and Acosta (2000) administered an informal object naming task to 1;8- and 2;4-year-old Spanish-speaking Mexican children in order to elicit specific vocabulary targets.
Language sampling
Language sampling is a direct assessment procedure that evaluates children’s naturalistic language use. Empirical evidence indicates that language sample measures vary as a function of the communicative context. Focusing on the sampling of vocabulary skills, Ukrainetz and Blomquist (2002) found that the range of expressive vocabulary used was determined by discourse context, children’s level of grammatical development and talkativeness. Spontaneous language samples may also be affected by observer presence (Bornstein, Painter, & Park, 2002) and children’s readiness to cooperate (Law & Roy, 2008). Furthermore, Hoff (2010) found the vocabulary range employed by young children in conversation to be influenced by the developmental level of the conversational partner. Sampling procedures and analyses must therefore take account of contextual variables. Despite these constraints, the sampling of children’s language use in conversational settings remains popular, particularly for specific languages where no standardised tools are available (e.g. Gutiérrez-Clellen, Restrepo, Bedore, Peña, & Anderson, 2000).
Triangulation of methods
The use of multiple methods in research on vocabulary acquisition is driven by concerns with the validity and objectivity of measures. This review will now focus on empirical literature that attests to the relevance of triangulation in minimising methodological bias and validating vocabulary data.
Identifying methodological bias
The preceding evaluation of the approaches used to assess early language skills exposed the lack of objectivity that may result when each procedure is employed as a stand-alone method. Assessment methods may influence the language measures obtained. An ongoing debate on the issue of methodological objectivity has been primarily concerned with the variability inherent in reported and sampled measures of vocabulary. Parent-based assessment and language sampling do not tap into children’s vocabulary abilities in the same way. Trudeau and Sutton (2011) noted that parental report provides the caregiver’s personal view on the child’s ability to use a closed set of vocabulary items across various contexts. In contrast, language samples yield observational information on the child’s unrestricted vocabulary production in a single setting. In addition, measures of word frequency and the use of flexible or memorised word combinations cannot be obtained through parental vocabulary reports (Bates et al., 1994). In view of these differing measurement properties, Fenson et al. (1994) argued that parental report tools should not be used to the exclusion of other methods. Method triangulation is a means of enhancing objectivity, since collecting data from different sources might minimise the bias associated with individual assessment procedures. Moreover, combining information obtained from language sampling with data generated by parental report is recommended for more comprehensive measures of vocabulary (Bornstein et al., 2002; Pine, Lieven, & Rowland, 1996; Salerni, Assanelli, D’Odorico, & Rossi, 2007).
Validating vocabulary measures
Method triangulation is often employed by researchers seeking to establish the concurrent validity of vocabulary measures. For instance, Law and Roy (2008) reviewed a series of validation studies in which CDI vocabulary checklist scores were compared to vocabulary measures obtained through standardised tests and/or naturalistic sampling. Correlations between CDI scores and standardised and/or naturalistic measures ranged from moderate to high across various samples of children, including groups with delayed language and groups of low socioeconomic status. In Rescorla’s (1989) series of four studies documenting the psychometric properties of the LDS, a sample of children was involved in both parent-based and direct assessment of expressive vocabulary abilities. Multiple measures of children’s expressive vocabularies were then compared and their close correspondence taken as evidence of the instrument’s concurrent validity. Pan, Rowe, Spier, and Tamis-Lemonda (2004) established the concurrent validity of a parent-based vocabulary measure in 2;0-year-old children from low-income families by investigating its associations with sample and standardised measures.
The comparison of vocabulary measures derived from different methods has also been employed in the validation of CDI and LDS vocabulary checklist adaptations to other languages. For example, comparisons between checklist and sample vocabulary measures have been used to establish the concurrent validity of CDI adaptations into Icelandic (Thordardottir & Ellis Weismer, 1996), Irish (O’Toole & Fletcher, 2010), Galician (Pérez-Pereira & Resches, 2011) and Quebec French (Trudeau & Sutton, 2011). The reliance on sampled vocabulary measures in these studies is testimony to the scarcity of well-established criterion measures for languages other than English, which in turn limits the additional data sources to which parent-based measures can be compared (Dale, 1991). Alternative means to derive multiple measures of early vocabulary often need to be sought. For instance, the triangulation of methods employed by Thal et al. (2000) to test the concurrent validity of the Mexican-Spanish CDI vocabulary checklist adaptation involved naturalistic language sampling and object naming alongside parental report. Checklist measures obtained for 1;8- and 2;4-year-old children were correlated with the number of different words used spontaneously by the same children and the number of objects they labelled on a naming task devised purposely for the study.
Unitary sources of evidence
Despite the benefits of diversifying data collection, triangulation remains relatively uncommon in language acquisition research (Behrens, 2008). Studies tend to use unitary sources of evidence rather than combinations of measures. This popularity of single-method designs is disconcerting and warrants some explanation. In the study of vocabulary development, the use of multiple sources of evidence has an impact on available resources, constraining overall sample size. Consequently, multi-method assessment may be limited to a subgroup of the main cohort. For example, Fenson et al.’s (1994) normative data for the CDI: Words and Sentences (CDI: WS) involved a group of 1142 toddlers, whereas evidence of the tool’s concurrent validity was based on samples numbering fewer than 30 children (see Dale, 1991). Similarly, Rescorla (1989) drew on dual-source data from subsamples of 81 and 58 children out of a composite total of 641 when examining the validity of the LDS. A study correlating reported and observed vocabulary measures for 422 children (Rescorla & Alley, 2001) is an uncommon case of parallel measures being obtained for an extensive sample. The laborious nature of direct assessment has been identified as a reason for the use of restricted samples in validating parental report tools (O’Toole & Fletcher, 2010). Researchers tend to trade off triangulation against larger participant groups that enhance statistical power.
Research questions and hypotheses
The literature review has indicated that multiple measures of vocabulary production allow data comparisons that (a) identify methodological biases impinging on measures generated by individual procedures and (b) validate novel assessment tools and the measures they yield. The study reported here exemplifies the use of a triangulation approach in investigating the vocabulary production skills of 44 Maltese children aged between 1;0 and 2;6 years. This research is part of a larger study that provides unprecedented documentation of expressive vocabulary development in young Maltese children. Data yielded by caregiver report, structured testing using a picture naming task and language sampling were compared in order to ascertain the validity of measures. Since two instruments were developed purposely for the study, it was important to demonstrate that the data they generated reflected the expressive vocabulary skills they purported to measure. Having multiple data sources also allowed potential methodological influences to be identified. In designing the study, a triangulation of methods was weighed against cohort size so that equilibrium between both methodological considerations was reached. Having a moderate sample size allowed measures obtained from the three data sources to be examined descriptively and statistically for validity and objectivity purposes. The following research questions were addressed:
RQ1: To what extent do caregiver report, picture naming and language sampling concur and differ in their documentation of expressive vocabulary development for the same group of children?
RQ2: How do measures of vocabulary production derived from caregiver report and picture naming compare when analysed on an item-by-item basis?
Piloting of the three methods on a reduced cohort had indicated the validity of the respective measures. Since subsequent changes to the picture naming task and vocabulary checklist were minimal, it was hypothesised that each tool would continue to document trends in expressive vocabulary development similar to those documented by the other measures.
Method
Participants
The 44 participants were typically developing Maltese children aged 1;0, 1;6, 2;0 and 2;6 years. Each age group consisted of approximately equal numbers of boys and girls. The participants were six boys and five girls aged 1;0 year, five boys and seven girls aged 1;6 years, five boys and six girls aged 2;0 years, as well as five boys and five girls aged 2;6 years. Three of the children, a 2;0-year-old girl and a 2;6-year-old girl and boy, lived with one parent but regularly spent time with the other parent. The main caregivers of all participants were mothers, except for one 2;6-year-old boy who was mostly cared for by his grandmother. All the participants’ parents had a secondary level of education. Seventeen mothers and 16 fathers had pursued their studies to post-secondary level. The remaining parents were in possession of a university degree. One mother and one father also had a post-graduate qualification.
Each child was exposed primarily to Maltese within the home context. Both Maltese and English carry the status of official languages in Malta, with bilingualism being widespread. Camilleri (1995) noted that Maltese families vary in the language use patterns employed in the home. As a result, young Maltese children may be exposed simultaneously to Maltese and English, or predominantly to either Maltese or English, with the latter patterns paving the way for sequential bilingualism upon school entry. In order to minimise variability in the participants’ language exposure patterns, this study chose to focus on children receiving predominantly Maltese input that outweighed exposure to the English language, the latter occurring directly through specific patterns of English mixing typical of Maltese child-directed speech (Borg, 1988) and indirectly through bilingualism and language contact at the societal level. Maltese-dominant input is more typical of the linguistic exposure received by the childhood population at large (see National Statistics Office, 2007). For every participant, this input pattern was established upon initial telephone contact with the child’s family and confirmed by the main caregiver through completion of a language background questionnaire.
Children manifesting features which clearly impaired their language development at the time of data collection were not included in the study. Due to the lack of normative information on language acquisition in young Maltese children, children with evident language impairments were identified on the basis of clinical experience. Since all data were gathered by the first author, a practising speech-language pathologist, the same researcher’s professional judgement was applied across all potential participants. The data of a 2;0-year-old boy found to be on the autism spectrum and a highly unintelligible 2;6-year-old boy were excluded from the study. Although an eventful developmental history placed participants at risk for language impairment, it did not definitely predict language difficulties. For this reason, a 1;0-year-old boy and a 1;6-year-old girl born prematurely at 34 and 32 weeks’ gestation respectively were included in the study. Ear infections were reported for three 1;0-year-olds, two 1;6-year-olds, one 2;0-year-old and two 2;6-year-olds. Since problems with middle ear function are a common occurrence in early childhood, these children were included in the participant group. For two male participants, aged 1;0 and 2;0 years respectively, speech or language difficulties were reported in an older male sibling. No significant medical conditions were reported.
Procedure
Participants and their main caregivers were involved in two data gathering sessions, with the second session taking place one to two weeks after the first encounter. Data were collected in the children’s homes. During the first meeting, caregivers were interviewed about the children’s birth history, general health, physical and language development, siblings, parental education and occupation. At the end of the first session, caregivers were asked to complete a language background questionnaire and an expressive vocabulary checklist. Questionnaire responses allowed the children’s predominantly Maltese-speaking home environment to be ascertained, while checklist completion yielded a caregiver-based measure of their productive vocabularies. Both forms were collected at the second session, when direct assessment of each child’s vocabulary production was carried out through picture naming and language sampling. Procedures employed with each method and the relevant tools are described next.
Caregiver report
Each caregiver was briefed about the purpose of the vocabulary checklist in supporting the assessment of children’s word production and taken through the attached written guidelines. It was emphasised that only words used spontaneously by the child were to be ticked. The reporting tool given to caregivers was an adaptation of the vocabulary checklist found in Fenson et al.’s (1993) first edition of the CDI: WS. Adaptation of the original checklist for use with Maltese-speaking children and validation of the pilot version are described in Gatt (2007). Following changes made to the checklist adaptation in the light of pilot study findings, the tool employed in the present investigation consisted of 916 words organised into 24 semantic categories. The vocabulary checklist incorporated both Maltese and English words, since pilot study results showed children receiving predominantly Maltese exposure to employ a proportion of English lexical items alongside Maltese words, possibly reflecting the input received. Research findings have suggested that input patterns are likely to influence young children’s vocabulary production in terms of language mixing (Gatt, Letts, & Klee, 2008) and translation equivalent use (Montanari, 2010). Maltese words made up 68.45% of the checklist entries. The modifications made to the checklist adaptation following the piloting phase necessitated the validation of the revised word list and the measures it generated. Findings from the current study address this issue.
Picture naming
In this direct assessment procedure, children were encouraged to label 18 pictures presented in book format. Caregivers were instructed to browse through the book with their children, guiding them to focus on one picture at a time. If no naming response ensued, they were to attempt elicitation of the verbal label by asking what item was depicted in the picture. Pictures rather than objects were opted for, so as to provide a context for vocabulary use different from that employed in language sampling. Pictorial representations of a ball, car, cat, baby, pair of shoes, dog, doll, aeroplane, telephone, glass, bicycle, egg, guitar, bird, spoon, hat, flower and comb were presented in this order. Given the lack of normative vocabulary acquisition data for Maltese children, developmental ordering of vocabulary concepts for the pilot version of the picture book had relied on the examination of clinical resources and literature indicating children’s word use at specific ages within the target age span. Vocabulary items that could be represented pictorially were thus identified. Corresponding coloured line drawings were downloaded from www.clipart.com and compiled into a book, with each page displaying a single picture. Although pilot testing of the naming task had demonstrated its validity (Gatt, 2007), it also identified aspects requiring modification. For the main study, pictures were ordered according to the age and frequency of use of the respective labels in the pilot study. Four pictures employed at the piloting stage were eliminated since they did not elicit any response. Pictures of a glass and a bicycle replaced the pilot study pictures depicting cup and tricycle to enhance children’s use of picture labels that corresponded with adult meanings. Confirming that the adjusted version of the task retained its validity is an aspect addressed in this study.
Language sampling
Following the picture naming activity, vocabulary production was observed as each child interacted with the caregiver during free play for approximately 20 minutes. Parental involvement in naturalistic sampling has been shown to encourage the child’s verbal expression (Bornstein et al., 2002; Hoff, 2010), thus making the sample more representative of the child’s expressive potential. Throughout the sampling procedure, the observer adopted an unobtrusive role. A standard set of toys was provided to enhance replication of the sample setting across children. This consisted of a telephone, camera, stacking balls and cups, cars, baby doll and baby care items, kitchen set, farm animals, tool set, insert puzzles and a pop-up cause-and-effect toy. The range of play materials was purposely chosen to provide for the varying levels of cognitive skill expected at the different developmental stages of the young participants, besides taking children’s diverse toy preferences into account. For instance, the pop-up toy was included for its potential to engage the youngest participants, while the baby doll and related items, kitchen set, farm animals and tool set were expected to elicit pretend play in the older children. Caregivers’ inclusion of additional toys in the home which habitually elicited substantial verbal expression on the child’s part was accepted but only took place occasionally. Although this procedure limited control over the number and types of toys available, it enhanced the representativeness of the collected samples, encouraging children to use a wider range of the vocabulary available in their repertoires. The elicitation of richer vocabularies also motivated the presentation of largely different items in the sampling procedure and in the naming task.
Recording, transcription and coding
All direct assessment sessions were audio- and video-recorded to facilitate the accurate transcription of children’s vocabulary use. A Sony MZ-N710 portable Mini Disc recorder with a Sony microphone was used with the 1;0-year-old group and two 1;6-year-olds. This was replaced by an Olympus WS-300M digital voice recorder which allowed more flexibility with transferring the audio data to computer. A free-standing Yoga EM-278 stereo condenser microphone was also used with the digital recorder to enhance the quality of recordings. A Panasonic NV-VZ1 VHS-C movie camera with incorporated microphone was used to video-record the adult–child dyads, allowing the non-verbal and contextual aspects of the communicative interactions to be captured.
Preliminary transcription of children’s vocalisations during free play and picture naming was attempted as the caregiver–child dyads were observed. A full orthographic transcription of spontaneous and imitated utterances was subsequently carried out on the basis of the audio-recordings. Video-recordings helped decipher unintelligible productions captured on the audio-recordings. Inter-transcriber reliability was ascertained by calculating percentage agreement with a second transcriber in the transcription of all intelligible word tokens produced spontaneously and imitatively. Independent transcription yielded a mean percentage agreement of 91.37% on the sample data and 96.70% on the picture naming data.
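The percentage agreement calculation described above can be illustrated with a minimal sketch. The article does not specify how the two transcripts were aligned or scored, so the token-by-token comparison below is an assumption, the function name `percentage_agreement` is hypothetical, and the word tokens are invented for illustration only.

```python
def percentage_agreement(first, second):
    """Percentage of aligned word tokens that two transcribers wrote identically.

    Assumes both transcripts have already been aligned token by token;
    the alignment procedure itself is not described in the article.
    """
    if len(first) != len(second):
        raise ValueError("transcripts must be aligned token by token")
    if not first:
        raise ValueError("transcripts must not be empty")
    matches = sum(a == b for a, b in zip(first, second))
    return 100.0 * matches / len(first)


# Invented tokens for illustration only (not data from the study).
transcriber_1 = ["ball", "car", "baby", "dog"]
transcriber_2 = ["ball", "car", "baby", "doll"]
print(percentage_agreement(transcriber_1, transcriber_2))  # 75.0
```

Under this assumption, a mean figure such as those reported would correspond to averaging this percentage over the transcripts that were independently transcribed.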
For coding purposes, transcribed words were first arranged in alphabetical order so that type and token counts could be derived manually. Spontaneous, imitative and prompted words were coded separately. Findings reported in this study derive solely from measures of spontaneous word use.
Measures
Caregiver report measures
Reported vocabulary production was represented by Total Vocabulary (TV) scores which tallied the total number of Maltese and English words used, thus indexing expressive vocabulary size. TV incorporated recognised and recalled vocabulary items. Recognised words were those vocabulary items employed by children which caregivers identified in the word list provided. Recalled words were additional items which adults contributed to the standard inventory. For the purpose of comparison with picture naming data, recognised words were converted to a proportion score by expressing their sum as a percentage of the 916 checklist entries available.
Picture naming measures
A picture naming raw score was obtained by counting the semantically relevant labels produced by each child while viewing the picture book. Scoring considered the first label produced, whether Maltese or English, for every picture viewed. Proportion scores were derived by expressing picture naming raw scores as a proportion of 18, the maximum number of scored picture names possible.
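The two proportion scores defined above share the same arithmetic: a raw count is expressed as a percentage of the maximum score available for that instrument. The sketch below is illustrative only. The maxima of 916 checklist entries and 18 pictures come from the text, whereas the function name `percentage_score` and the raw scores for the example child are hypothetical.

```python
CHECKLIST_ENTRIES = 916  # checklist size reported in the text
NAMING_PICTURES = 18     # number of pictures in the naming task


def percentage_score(raw_score, maximum):
    """Express a raw score as a percentage of the maximum possible score."""
    if not 0 <= raw_score <= maximum:
        raise ValueError("raw score must lie between 0 and the maximum")
    return 100.0 * raw_score / maximum


# Hypothetical child: 229 recognised checklist words and 9 pictures named.
# Recalled words are left out, since only recognised words entered the
# checklist proportion score.
checklist_pct = percentage_score(229, CHECKLIST_ENTRIES)
naming_pct = percentage_score(9, NAMING_PICTURES)
print(checklist_pct, naming_pct)  # 25.0 50.0
```

Because the naming maximum is far smaller than the checklist maximum, each named picture moves the naming percentage much further than each reported word moves the checklist percentage.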
Language sample measures
The Number of Different Words (NDW) quantified the range of Maltese and English vocabulary items used by participants during the 20-minute sample. Representing expressive vocabulary size, it was the measure analogous to TV derived from the caregiver report. Unlike the measures generated by the caregiver report and picture naming, NDW was not expressed as a proportion since there was no upper limit on the number of words that could be produced during the 20-minute sample.
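As a rough illustration, NDW is a count of word types, as opposed to word tokens, among the spontaneous words in a transcript. The study derived these counts manually. The sketch below is therefore not the authors' procedure, the function name `type_token_counts` is hypothetical, and treating differently capitalised forms as one type is an assumption.

```python
from collections import Counter


def type_token_counts(spontaneous_tokens):
    """Return (number of different words, total tokens) for one sample.

    Assumes the list holds only spontaneously produced words and that
    case differences do not distinguish word types.
    """
    counts = Counter(token.lower() for token in spontaneous_tokens)
    return len(counts), sum(counts.values())


# Invented sample for illustration only (not data from the study).
sample = ["ball", "Ball", "car", "mama", "car"]
ndw, tokens = type_token_counts(sample)
print(ndw, tokens)  # 3 5
```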
Validity and objectivity of measures
Measures yielded by each of the three methods were expected to have different numerical bases. It was therefore necessary to ascertain that the data obtained documented similar trends in expressive vocabulary development, providing evidence for validity of the relevant measures. Demonstrating the concurrent validity of the picture naming tool and vocabulary checklist adaptation was especially important since both instruments were devised purposely for the study. The picture naming task was intended as a supplementary measure of expressive vocabulary that could be employed in the cross-validation of reported, sampled and elicited measures. The identification of commonalities and discrepancies between each pair of scores also flagged up considerations pertaining to the impact of method on vocabulary measures.
Results
The measures generated by picture naming, language sampling and caregiver report are first analysed descriptively. The outcomes of statistical analyses follow, together with an item-by-item comparison of checklist and picture naming scores.
Descriptive analyses
Mean raw scores and standard deviations obtained for picture naming, language sampling and caregiver report are presented in Table 1. Mean raw scores obtained for picture naming consistently ranked lowest among the three measures, while caregiver report yielded the highest counts at all age points. The discrepancies between mean scores increased as older age groups were considered. Considerable dispersion of scores was present across methods and age groups. Standard deviations were smallest for picture naming scores, with restricted naming score ranges reflecting the limited number of task items. In contrast, reported checklist measures were marked by vast discrepancies between minimum and maximum scores, in accordance with the larger number of reporting options available. The youngest children’s minimum scores were at or near floor on all measures. Five children scored at ceiling on the picture naming task.
Table 1. Mean raw scores (M), standard deviations (SD) and ranges derived from picture naming, language sampling and caregiver report.
Figure 1 illustrates the substantial differences in mean vocabulary production scores related to the data source. Both caregiver report and sampling revealed an exponential rise in vocabulary production, with the reported rate of growth being much higher than that sampled. Moreover, word production increased rapidly between 1;6 and 2;6 years according to checklist data, while sample data showed acceleration in expressive vocabulary growth to take place between 2;0 and 2;6. The increase in expressive vocabulary size with age documented by the naming task was protracted compared to data sourced from the other two methods.

Figure 1. Mean picture naming raw scores, Number of Different Words (sample data) and Total Vocabulary (checklist data) relative to age.
Statistical analyses
Table 2 shows the mean percentage scores derived for naming and checklist measures. Picture naming percentage values consistently exceeded those derived for caregiver report, reflecting the computation of proportion scores from a smaller total.
Table 2. Mean percentage scores (M%) and standard deviations (SD) derived from picture naming and caregiver report.
A two-way analysis of variance (ANOVA) revealed that mean percentage scores obtained through picture naming and caregiver report differed significantly by age, F(3,80) = 62.30, p < .001, and method, F(1,80) = 55.65, p < .001. However, these effects were qualified by a significant interaction, F(3,80) = 5.91, p < .05, indicating that proportion scores for checklist and picture naming data increased at different rates. Indeed, mean percentage scores were found to become increasingly different with age. Follow-up pairwise comparisons showed that the mean differences were significant at every age point (ps < .05) except 1;0.
Relationships between checklist, sample and picture naming measures were investigated through partial correlations in which the effects of age were held constant. Significant positive relationships resulted, with all measures being moderately associated: for picture naming and checklist measures, r = .556; for picture naming and language sample measures, r = .540; for checklist and language sample measures, r = .635 (all ps [two-tailed] < .001). These findings point towards a close correspondence between the vocabulary scores derived from elicited, sampled and reported measures. Therefore, despite considerable differences in the mean numbers of words yielded by each method, children’s relative vocabulary levels were identified similarly through picture naming, sampling and caregiver report.
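The partial correlation procedure described above can be illustrated with a minimal sketch. It assumes a first-order partial correlation computed from pairwise Pearson coefficients. The function names and the data values are hypothetical and are not drawn from the study.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def partial_corr(x, y, z):
    """First-order partial correlation of x and y, holding z constant."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical illustration only (NOT the study's data):
# age in months, picture naming raw scores and checklist totals.
age = [12, 12, 18, 18, 24, 24, 30, 30]
naming = [0, 1, 2, 4, 6, 9, 12, 15]
checklist = [3, 10, 40, 95, 180, 310, 420, 600]

# Association between naming and checklist scores with age partialled out.
r_partial = partial_corr(naming, checklist, age)
```

A positive value of `r_partial` would indicate that children scoring higher on one measure also tend to score higher on the other, over and above what age alone accounts for.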
Item-by-item analysis (checklist and picture naming data)
An item-by-item comparison of children’s word production on the vocabulary checklist and picture naming task was carried out to gain further insight into the similarities and differences between vocabulary items recorded by caregivers and those elicited during picture naming. For each child, reported vocabulary items were screened for words matching the semantically relevant picture labels that were scored on the naming task. The range of picture labels that participants could produce was unlimited, so both recognised and recalled items were considered for comparison. For every child, a score of 1 was assigned for each match. For every age group, the total number of matching checklist items was expressed as a percentage of the sum of picture naming scores. Table 3 shows that the picture naming and checklist match raw scores differed in each group, complementing the finding of significant differences between means identified through the ANOVA. Nonetheless, consistently high percentage agreement between elicited and reported vocabulary emerged for each age group, with percentage match scores ranging between 78.05% and 83.93%. The proportion of matching vocabulary items for naming and checklist data was therefore largely similar at each age level and across the four age groups together, confirming the correspondence between both measures identified by the partial correlation procedure.
Table 3. Total picture names and total matching checklist items (raw scores) together with percentage of matching words relative to age.
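The percentage match computation described in this section can be sketched as follows. The sketch assumes that each child's elicited and reported words are available as sets. The function name and the English placeholder words are hypothetical and do not come from the study's materials.

```python
def percentage_match(children):
    """Group-level percentage of named picture labels also reported by caregivers.

    `children` is a list of (named, reported) pairs, each a set of words.
    Every named label found among the reported words counts as one match.
    """
    total_named = sum(len(named) for named, _ in children)
    total_matches = sum(len(named & reported) for named, reported in children)
    return 100 * total_matches / total_named


# Hypothetical age group of two children (placeholder words, NOT study data).
group = [
    ({"ball", "dog", "car"}, {"ball", "dog", "milk", "shoe"}),  # 2 of 3 labels matched
    ({"ball", "cat"}, {"ball", "cat", "dog"}),                  # 2 of 2 labels matched
]

pct = percentage_match(group)  # 4 matches out of 5 named labels -> 80.0
```

Because matches are summed across children before dividing, the result corresponds to a group-level percentage rather than a mean of individual children's percentages.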
Discussion
A triangulation of methods investigated the development of expressive vocabulary in children aged 1;0, 1;6, 2;0 and 2;6 years, exposed primarily to Maltese. Vocabulary production skills were gauged through caregiver report, picture naming and language sampling in a cross-sectional study. While enabling the in-depth documentation of participants’ expressive vocabularies, the methods employed allowed the comparison of different measures. Similarities between reported, elicited and sampled vocabulary scores pointed towards the validity of measures, whereas differences shed light on methodological biases impinging on the vocabulary data. The commonalities and discrepancies resulting among the three measures are discussed first, followed by evaluation of the item-based comparison of caregiver report and picture naming data.
Comparison of reported, sampled and elicited measures of vocabulary production
First, descriptive statistics highlighted the substantial individual differences in vocabulary production evidenced by the three methods. Floor and ceiling effects on the picture naming scores restricted the score ranges. Besides individual differences in word production abilities which, for the same children, would be expected to emerge relatively equally across methods, reported measures may have been influenced by caregivers’ subjective judgement and recall abilities. Indeed, caregiver-based assessment has been claimed to filter information in the light of individual perceptions of language skill (Stiles, 1994). Sampling is also prone to situational constraints that affect the child’s language performance. For example, Bornstein et al. (2002) showed that 2;0-year-olds employed more utterances, more word types and longer productions in sampling situations deemed by mothers to be optimal in enhancing their children’s expressive language skills. Hoff (2010) reported that children aged between 1;5 and 2;2 years employed richer vocabularies and produced more cohesive contributions in book-reading contexts than at mealtimes and during play. It has also been said that spontaneous language samples reveal the vocabulary children prefer to use, while caregiver-based assessment is more indicative of the totality of their expressive vocabulary repertoire (Bates, Bretherton, & Snyder, 1988; Pine et al., 1996). Indeed, it was unlikely that the numbers of words produced by children during 20-minute samples would approximate the vocabulary scores generated by a 916-word checklist. Had a more intensive sampling design been employed in the current investigation, it probably would have still failed to cover the range of words identified by caregivers across various communicative situations. The 18-picture naming task was expected to generate even smaller scores. 
Therefore, measures of vocabulary production were characterised not only by individual differences but also by the variability contributed by the assessment method. These methodological differences are explored next.
As expected, comparison of the mean scores obtained for each method showed numerical dissimilarities across measures. Moreover, graphical representation of scores highlighted the increasing discrepancies between reported, sampled and elicited mean vocabulary scores with age, indicating that each method gauged the rate of vocabulary growth differently. Differences were also noticeable in the age at which expressive vocabulary growth started to accelerate as evidenced by checklist and sample measures. Further, method was found to exert a significant effect on mean percentage reported and naming scores, which became increasingly evident with age. These findings underscored the methodological influences impacting the three measures of vocabulary production.
Importantly, however, all methods concurred on the finding of faster vocabulary growth with age. This outcome was minimally perceptible for picture naming scores, due to the limited number of words the task had the potential to elicit, coupled with the fact that five of the oldest children performed at ceiling on the naming task. Checklist and sample data, however, clearly showed accelerating word production. Moreover, positive and significant correlations among the three measures showed that they documented corresponding vocabulary levels when the effect of age was controlled, providing evidence for their concurrent validity. Thus, the dissimilar scores were validated by the positive associations among them, which implied that lower- and higher-performing children would be identified similarly by each method. This finding corresponds with similar concurrent relations documented in the literature. For example, correlations between vocabulary checklist measures and object naming scores as well as the number of different words sampled have been reported by Thal et al. (2000) for monolingual Spanish-speaking children and by Marchman and Martínez-Sussman (2002) for bilingual Spanish-English children. Ukrainetz and Blomquist (2002) identified significant positive correlations between the number of different words identified in language samples and the scores obtained on four standardised vocabulary tests, two of which addressed expressive vocabulary skills. Vocabulary checklist scores derived from the Quebec-French adaptation of the CDI: WS correlated positively and significantly with the number of different words employed spontaneously by the same children during play (Trudeau & Sutton, 2011). Pan et al. (2004) identified positive and significant correlations among checklist, sample and standardised measures of vocabulary and more general language skills. 
Together, these results show that drawing data from multiple sources when measuring children’s vocabulary production not only helps to pull apart methodological variability from individual differences, but also serves to cross-validate measures.
Item-by-item comparison of caregiver report and picture naming
Focusing on only two of the three methods previously compared, this in-depth analysis provided further insight into the similarities and differences between caregiver report and picture naming, better informing the outcomes of triangulation in this study. Understandably, picture naming percentage scores tended to be higher than checklist percentage scores because they were expressed as a proportion of a smaller total. Despite the smaller range of vocabulary covered by the picture naming task, elicitation tapped into a segment of expressive vocabulary that was not reported by caregivers. Growth in naming raw scores with age was generally accompanied by an increment in the number of matches with checklist data. Yet, discrepancies between the two sets of total scores became increasingly evident with age, complementing the finding of significant differences between the mean percentage scores yielded by caregiver report and picture naming. Elicited vocabulary was not matched for two reasons: picture labels that were represented on the checklist went unrecognised, and picture labels that were not among the checklist entries were not recalled. These findings suggest that the checklist measure was prone to reporting biases, confirming results reported by Pine (1992) and Pine et al. (1996). Understandably, caregiver report appeared to become less accurate as children grew older and their expressive repertoires expanded considerably. Picture naming was also subject to methodological biases as it depended on children’s compliance and ability to sustain attention to symbolic referents. Nonetheless, the high percentage agreement between reported and elicited items suggests an affinity between the two measures that supports the finding of a positive and significant correlation between them. Although the two methods differed in the procedures and tools employed and in the scores generated, they yielded highly comparable measures.
Conclusion
A multi-method strategy investigated vocabulary production in 44 children aged 1;0–2;6 years who were exposed primarily to Maltese. Caregiver report, picture naming and language sampling generated complementary measures of vocabulary production that were subject to methodological biases. None of the methods was found to yield a complete record of all the words used by the participants, indicating that each method generated an index of vocabulary production rather than an exhaustive measure.
Clearly, multi-method approaches have the advantage of allowing the validity of measures to be addressed, while shedding light on the confounding effects of methods on measures and results. The quality of vocabulary development research therefore stands to benefit from triangulation designs. This is not to say that studies of vocabulary development that employ single-method designs necessarily lack methodological rigour. Besides, the feasibility of multi-method designs is often limited because collecting data from different sources is labour-intensive and time-consuming. These considerations raise the question of when triangulation designs should be prioritised over single-method approaches.
The empirical literature attests to the fact that multiple measures are called for when the concurrent validity of novel vocabulary assessment tools is evaluated. Once a tool is validated, there is reassurance that it generates sound measures. Such an outcome might, in turn, eliminate the need for more than one vocabulary measure when the same valid tool is employed again, unless it is used as a criterion measure in the validation of other data sources. Validation of novel assessment tools in underresearched languages or language pairs is usually unable to rely on established criterion measures, prompting consideration of alternative methods for comparison. Triangulation also helps to detect the nature and extent of methodological influences, minimising the risk of erroneous conclusions. It is therefore concerning that single-method studies may incorporate unidentified methodological bias that undermines the dependability of results. This might be taken to suggest a pressing need to include at least two methods in investigations of early vocabulary development. The availability of two or more data sources also allows the derivation of composite vocabulary measures, a strategy that may further minimise the influences of individual methods. Pooling non-overlapping words, documented through different methods, leads to a score that is likely to approximate the totality of children’s vocabulary repertoire more closely. Although not attempted in the current study, this strategy might have enhanced the comprehensiveness of the data. However, there is also the issue of balancing multiple methods with sample size. It is considerably taxing on resources to derive multiple vocabulary measures for a substantial sample of children. Indeed, single-method designs are sometimes inevitable. At best, acknowledgement of suspected methodological bias and potential validity concerns is called for when multiple data sources cannot be employed.
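The composite measure mentioned above, which pools non-overlapping words documented through different methods, was not attempted in the current study. A minimal sketch of one way it could be derived is given here, assuming each method's word list for a child is stored as a set. The function name and placeholder words are hypothetical.

```python
def composite_vocabulary(*sources):
    """Pool words documented by several methods, counting each word once."""
    return set().union(*sources)


# Hypothetical word lists for one child (placeholder words, NOT study data).
reported = {"ball", "dog", "milk"}  # caregiver report
sampled = {"ball", "car"}           # language sample
elicited = {"dog", "cat"}           # picture naming

composite = composite_vocabulary(reported, sampled, elicited)
composite_score = len(composite)  # 5 distinct words across the three methods
```

Taking the union means that a word documented by more than one method contributes only once, so the composite score is never smaller than the largest single-method score.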
In sum, the present study has shown that a triangulation of methods in a study of early vocabulary development enhanced the dependability of measures and results. Undoubtedly, multi-method designs have the potential to improve methodological rigour. However, the extent to which they should be employed in investigations of vocabulary production is still open for debate, pointing towards a need for further research on the use of multiple vocabulary measures. Outcomes would contribute to a better understanding of the contexts, research questions and variables that justify the choice of multi-method designs in the study of early vocabulary development.
Acknowledgements
The authors would like to thank the editor and two anonymous reviewers for their helpful comments on an earlier version of this article.
Funding
This research received no specific grant from any funding agency in the public, commercial or not-for-profit sectors.
