Abstract
This article examines the input to Argentinian Spanish-learning children from low and middle socioeconomic status (SES) households. It aims to determine whether the vocabulary composition (nouns and verbs) of their input varies as a function of SES, the addressee and other contextual variables such as the type of activity and the pragmatic orientation of the utterances. Thirty children (mean age: 14.3 months) and their families were audio-recorded for four hours and the middle two hours were analyzed using Computerized Language Analysis (CLAN). The nouns and verbs in child-directed speech (CDS) and overheard speech (OHS) were identified using MOR, CLAN’s morphosyntactic part-of-speech tagger. Regression analyses showed effects of: (a) SES and addressee on the proportion of noun types and tokens; (b) the type of activity and the pragmatic orientation of the utterances on the proportion of nouns in CDS; (c) SES and type of activity on the proportion of entity and action-oriented utterances. These findings reveal that given the complexity of children’s home environments it is crucial to consider these social and contextual dimensions to account for the distribution of different lexical categories. How they are distributed in the input likely influences the developmental course of vocabulary acquisition.
Introduction
Language acquisition research has shed light on some key characteristics of early experiences that may contribute to individual and sociocultural variations in children’s early vocabulary trajectories. Studies conducted in children’s households and laboratories have mainly analyzed child-directed speech (CDS), that is, the speech that caregivers address to children. Although the primary research focus has been on the degree to which quantity, diversity and pragmatic properties of language input predict later vocabulary, the lexical composition of CDS has also been examined in studies that seek to explain the predominance of either nouns or verbs in children’s early lexicon. Some of the latter work has also highlighted that children’s sociocultural environments (pragmatic styles and types of activities) may influence the proportions of nouns and verbs to which children are exposed in daily interactions (e.g. Altınkamış et al., 2014; Choi & Gopnik, 1995; Goldfield, 1993; Tardif et al., 1999).
Nouns and verbs carry most of the semantic or conceptual information in utterances. However, these grammatical categories differ in regard to content and syntactic function (Croft, 2000; Dixon, 2004; among others). While nouns prototypically refer to people, things or phenomena and function as arguments in a sentence, verbs are commonly associated with actions or states and function as the glue that holds arguments together (Bowerman & Brown, 2008). Nouns and verbs differ with respect to their morphological, lexical and syntactic behavior and this has implications in language processing (Błaszczak & Klimek-Jankowska, 2015). Hence, it is of utmost importance to consider the lexical composition of the language that children have to process to make sense of their environments (Nelson, 2007).
It is noteworthy that CDS comprises only a portion of children’s linguistic exposure, and that the distribution of CDS and the speech that children overhear (OHS) varies between families, cultures and socioeconomic groups (Sperry et al., 2019). Nonetheless, the studies that identified the predominance of certain grammatical categories based their conclusions only on CDS (e.g., Altınkamış et al., 2014; Choi & Gopnik, 1995; Glas et al., 2018; Goldfield, 1993, 2000; Jackson-Maldonado et al., 2011; Tardif, 1996), did not discriminate between CDS and OHS (Stoll et al., 2012) or compared CDS with adult-directed speech (ADS) collected in an experimental situation (Adi-Bensaid et al., 2015). Moreover, despite the mounting evidence that demonstrates the impact of socioeconomic status (SES) on the quantity, diversity and pragmatic characteristics of the input (see Pace et al., 2017; Schwab & Lew-Williams, 2016, for reviews), the influence of SES on the frequency and situational context in which children are exposed to nouns and verbs in daily conversations has not yet been assessed.
Hence, the present article investigates the lexical properties of children’s natural linguistic environments, taking into consideration the role of SES as well as the addressee, the activity context and the pragmatic function of utterances. First, we analyze the entire speech – CDS and OHS – produced in the environments of Argentinian Spanish-learning children during the ebb and flow of everyday activities and explore the relationship between SES and the frequency of occurrence of nouns versus verbs. Next, we assess whether the pragmatic function of the utterances and the ongoing activity influence the frequency of occurrence of nouns and verbs in CDS. In doing so, we link distinct but related findings on how children’s access to nouns and verbs can be shaped by (a) the typological and input properties, (b) the pragmatic functions of the utterances and activity systems that might drive children’s attention to nouns versus verbs and (c) SES-related differences in children’s linguistic environments.
Typological and input properties that shape children’s access to nouns and verbs
Cross-linguistic research has shown that typological features and input frequency can determine specific patterns of content words that favor either nouns or verbs in children’s linguistic environment.
Typological features may influence the salience of certain morphosyntactic information in utterances (Choi, 2006; Gathercole & Min, 1997; Tardif et al., 1997; among others). For example, Tardif et al. (1997) explained that in pro-drop languages, such as Spanish, verbs may be emphasized more than in non-pro-drop languages, such as English or French, because fewer noun phrases are necessary to communicate the same meaning. In addition, word order may influence phonological salience. Words in utterance-final position and one-word utterances are bounded by silence and lengthened in comparison to words elsewhere in the utterance (Longobardi et al., 2015). In Spanish, verb morphology expresses the subject, and thus word order is relatively flexible: verbs can be placed at the beginning or the end of utterances. However, nouns are frequently inserted at the end of utterances too.
The foregoing considerations suggest that the effect of input frequency is not straightforward. However, based on the general finding that the sheer frequency of words in the input influences word acquisition (Goodman et al., 2008), various cross-linguistic studies have assessed the link between the quantity of nouns and verbs in CDS and children’s early lexicons. A quite consistent association has been observed, with a bias towards noun types in English (Choi, 2000; Tardif et al., 1997) and towards verbs in other languages, such as Mandarin, Korean, Cantonese and Tzotzil (De León, 2001; Leung, 1998; Tardif et al., 1997). Yet, studies of Hebrew, Italian, Spanish, Chintang, Japanese, French and Turkish have failed to find conclusive evidence of the proposed relationship (Adi-Bensaid et al., 2015; Altınkamış et al., 2014; Camaioni & Longobardi, 2001; Jackson-Maldonado et al., 2011; Ogura et al., 2006; Stoll et al., 2012).
Contextual properties that shape children’s access to nouns and verbs
The sociocultural affordances that influence language deployment in social interactions have been the focus of investigations that regard language acquisition as part of the socialization process (e.g. De León, 2001). Findings in this stream indicate that caregivers’ speech varies considerably as a function of contextual characteristics typically defined by culture and social environments: e.g. the type and structure of the ongoing activity (Glas et al., 2018; Goldfield, 1993; Söderström & Wittebolle, 2013; Tamis-LeMonda et al., 2019; Weisleder et al., 2019), the variety of objects used (Brown, 2014), and the presence, quantity, age and role of other participants in the interactions (Hoff, 2006). Thus, research has explored whether culturally-defined features of interactional routines and/or the ongoing activities help explain the cross-linguistic differences in the input.
Several studies have indicated that the predominance of certain pragmatic functions in CDS differs among linguistic communities. Naming and other entity-centered utterances, which orient the child’s attention to objects or elicit their names or properties, characterize child–caregiver interactions in English, French, Taiwanese, Italian and Turkish populations. Conversely, Korean and Mandarin caregivers are prone to produce activity-oriented utterances, emphasizing verbs in their commands, descriptions and elicitations about the world (Choi, 2000; Kim et al., 2000; Masur et al., 2013; Ogura et al., 2006). A less clear tendency towards entity or action-oriented utterances was found in Mexican Spanish-speaking caregivers. Although mothers’ utterances aimed mainly to regulate children’s activities and less to describe objects and activities, no significant differences were observed in the quantity of utterances oriented to eliciting nouns and verbs (Jackson-Maldonado et al., 2011).
Additionally, quasi-experimental studies have shown that the relative frequency of nouns and verbs in CDS is highly sensitive to the ongoing activity. The higher frequency of nouns in child input during book-readings than in play events is a widespread phenomenon that has been systematically recorded (Altınkamış et al., 2014; Choi, 2000; Goldfield, 1993; Ogura et al., 2006; Tardif et al., 1999). Although stylistic and cultural differences have been identified in caregivers’ use of teaching strategies during these situations (Luo et al., 2011; Melzi & Caspe, 2005), the higher frequency of nouns during book-readings may be due, in part, to certain features of the activity. During book-readings, pictures, a fundamental component of children’s story-books, are the focus of joint attention motivating the use of referential language. Also, the presence of objects – toys and even different types of toys – is a source of variation in the vocabulary composition of the input: non-toy-play events entail a lesser use of nouns than do toy-play events (Goldfield, 1993; Tardif et al., 1999).
Recently, Glas et al.’s (2018) naturalistic study revealed a greater proportion of verbs in Tunisian than in English and French households. Interestingly, their results indicated that cross-linguistic and sociocultural variability in everyday life is less pronounced during social activities, such as book-reading and play, than in maintenance activities (feeding and hygiene). This suggests that naturalistic studies may unveil a particular portrayal of reality not clearly depicted in quasi-experimental studies (Bergelson et al., 2019; Tamis-LeMonda et al., 2017), and underscores the relevance of analyzing language in children’s natural environments.
SES-related differences in children’s linguistic environments that may shape their access to nouns and verbs
The evidence that input, interactional style and type of activities influence children’s access to nouns and verbs leads us to ask whether SES-related differences have a bearing on the vocabulary composition of children’s linguistic environments.
Evidence, mostly from studies of English-speaking populations, has depicted the ebb and flow of everyday life in different SES backgrounds. In many cases, low-SES households consist of large families (Psaki et al., 2014; Vernon-Feagans et al., 2012). In these households, everyday activities are not frequently centered on children, who usually have less access to toys and other child-specific objects (Bradley & Corwin, 2002). Children share activities with peers and adults and thus input may, to a greater degree, stem from multi-speaker interaction (Sperry et al., 2019). Low-SES children hear less CDS (Casillas et al., 2017), delivered in short utterances characterized by less lexical diversity (Hoff, 2003). A greater proportion of caregivers’ speech is directed at managing the child’s behavior, and their speech includes a smaller proportion of eliciting questions. In contrast, CDS episodes are more commonly found in middle-SES households (e.g., Hart & Risley, 1995; 2012; Hoff, 2013; Pace et al., 2017; Schwab & Lew-Williams, 2016). To explore whether these SES-related differences influence the frequency and interactional context in which children hear certain word categories, it is necessary to study daily natural situations, considering not only the speech directed to the child but also overheard speech.
Why is it relevant to look at OHS as well as CDS? The benefits that OHS might bring for language development among children from low-SES backgrounds are currently under debate in the literature (Golinkoff et al., 2019; Sperry et al., 2019). Although Weisleder and Fernald (2013) have provided evidence that only the quantity of CDS, but not of OHS, predicts later vocabulary comprehension, they did not consider the quality of OHS, which may be relevant to analyze as well. This is especially so because, in line with research suggesting the use of observational learning strategies across multiple domains of development (Rogoff et al., 2011), experimental findings indicate that children are keen observers of third-party interactions and are able to monitor overheard conversations. At least when a thorough control over other stimuli is exerted and attentional demands are minimized, children can learn word meanings from overheard language (Akhtar, 2005) without the need of ostensive cues (Arunachalam et al., 2013). Hence, examining the patterns of nouns and verbs in naturalistic speech – both CDS and OHS – produced at home across socioeconomically diverse groups may reveal whether SES shapes certain characteristics of children’s environments that might be related to word learning trajectories.
The Argentinian context: an understudied population
Argentina has a fragmented social structure, with considerable variation along the dimensions of housing, occupation, education, and family structure and size. This variation determines markedly different living and developmental conditions for children growing up in the most populated cities of the country.
In the metropolitan area of Buenos Aires, there are 1,102 informal and precarious settlements known as villas de emergencia. Official data indicate that 400,900 families (mostly descendants of migrants from the north of Argentina or neighboring countries), i.e., approximately 2,004,500 people, currently reside in these marginalized urban neighborhoods. The urban segregation of the villas de emergencia and poor neighborhoods in general, as well as their clear-cut differences from residential neighborhoods, where middle- and high-income families – mainly of European origin – live, creates extremely dissimilar material living conditions. Among low-SES populations, large families and three-generation households are very common, which raises the index of living density (the number of people per room can reach 5.7) (Dirección General de Estadística y Censos, 2016).
These varying living conditions are compounded by marked differences in the level of education accessed by the population. The differences in education are quantitative: adults in the villas de emergencia may have had little schooling, fewer than 12 and even fewer than 7 years (Abelenda et al., 2016). Differences are also qualitative: the education of disadvantaged populations proceeds through circuits of schooling with unequal human, material and pedagogical resources (Tiramonti, 2004). These circumstances result in a major socioeconomic inequality, accompanied by sociocultural but not linguistic differences, given that Spanish is the native language of the majority of the population.
The current study
This study is motivated by the different streams of research reviewed above: studies demonstrating how the relative prevalence of nouns and verbs in early vocabularies is influenced by (a) input frequency and typological properties, (b) interactional patterns and activity contexts, and (c) SES-related differences in CDS and OHS, all of which shape children’s linguistic environments. Taken together, these findings raise the question of whether SES differences bring along variations in the quantity and distribution of nouns and verbs children are exposed to in daily interactions. To our knowledge, no previous study has examined the impact of SES on the patterning of nouns and verbs in children’s everyday environments. Unlike previous studies that focused on qualitative features of the input, we analyzed not only CDS but also OHS. Additionally, we examined CDS to determine whether the effects of SES on the distribution of nouns and verbs were mediated by other factors at a proximal level, such as the type of activity and pragmatic function of the utterances produced in daily interactions. Finally, we assessed whether SES and type of activity predict the proportion of entity and action-oriented utterances in CDS. Thus, the present study analyzes an understudied Spanish-speaking population from Argentina to answer the following questions:
Do SES and addressee (CDS or OHS) have an effect on the proportion of nouns versus verbs produced in the entire linguistic environment of Argentinian Spanish-speaking children?
Do SES, type of activity and pragmatic function of the utterance have an effect on the proportion of nouns versus verbs in CDS?
Do SES and type of activity have an effect on the proportion of entity and action-oriented utterances in CDS?
Previous work has shown that, compared to low-SES mothers, middle-SES mothers are more likely to engage in book-sharing activities and organized play and to use referential language (Hoff, 2006). Hence, we expect a greater proportion of nouns and a predominance of entity-oriented utterances in middle-SES linguistic environments. Conversely, the high frequency of commands observed in low-SES households from other populations led us to anticipate that utterances in the low-SES households of our sample will tend to emphasize verbs and to be oriented towards actions. Moreover, we predict that the type of the activity will influence the proportion of nouns versus verbs and entity-oriented versus action-oriented utterances in CDS.
Method
Ethics statement
This research was conducted following the ethical regulation 5344/99 by the National Scientific and Technical Research Council of Argentina (CONICET) and was approved and supervised by CONICET’s committee. Parents provided written informed consent for their participation as well as their children’s.
Participants
Thirty children (mean age 14.3 months) and their families, who lived in the metropolitan area of Buenos Aires, participated in the study. They were drawn from a larger longitudinal sample of socioeconomically diverse children (Corpus: Rosemberg et al., 2015–2016).
A sample of children from both ends of Argentinian cities’ SES spectrum was achieved by considering two parameters: education and housing. We categorized a child as middle SES if: (i) the mother or the father had a university degree or similar (e.g., a teaching degree) – which translates into at least four years of education after secondary school – and (ii) the family lived in a residential neighborhood. In contrast, caregivers whose children were categorized as low SES: (i) had at most completed secondary school and (ii) lived either in a villa de emergencia in the city or in an impoverished suburban neighborhood. SES groups were balanced in terms of age and gender. Spanish was the first language of all the families and the one used in everyday interactions (see participants’ characteristics in Table 1).
Table 1. Description of participants. Means/N (SD/%) of selected variables in each SES group.
Procedures
Data collection
During four-hour sessions without the presence of an observer, children wore vests equipped with digital devices in order to audio-record every natural interaction at home or occasionally outside. Families were asked to interact as they normally would. Immediately after, families provided information about the activities carried out and the participants involved in the four-hour session.
Transcription
Each child’s analyzed sample comprised the second and third hours of the four-hour session. Thus, the total analyzed sample amounted to 60 hours (30 children by 2 hours). All the speech produced in these 60 hours was transcribed following the CHAT format (Codes for Human Analysis of Transcripts, MacWhinney, 2000) and segmented into utterances. Utterances in the same interactional turn were segmented whenever they met two of the following three criteria: they were bounded by a pause longer than 2 seconds, they were syntactically complete, and they had a distinctive intonation (Bernstein Ratner & Brundage, 2015). We followed the prosodic patterns described for Spanish (Alarcos Llorach, 1994) and considered whether the final portion of the unit was rising, falling or suspended in order to delimit questions, exclamations or declaratives.
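The segmentation rule described above can be sketched as a simple decision function. This is only an illustration: the `Boundary` record and its field names are hypothetical, the syntactic and intonational judgments were made by trained transcribers rather than computed, and reading "two of three" as "at least two" is our assumption.

```python
from dataclasses import dataclass

PAUSE_THRESHOLD_S = 2.0  # from the text: a pause longer than 2 seconds


@dataclass
class Boundary:
    """Hypothetical record for a candidate utterance boundary within one turn."""
    pause_s: float                # silence after the candidate unit, in seconds
    syntactically_complete: bool  # transcriber's judgment
    distinctive_intonation: bool  # rising, falling or suspended final contour


def is_utterance_boundary(b: Boundary) -> bool:
    """Return True when at least two of the three criteria are met (assumed reading)."""
    criteria = [
        b.pause_s > PAUSE_THRESHOLD_S,
        b.syntactically_complete,
        b.distinctive_intonation,
    ]
    return sum(criteria) >= 2


# A long pause plus syntactic completeness is enough to segment,
# whereas a distinctive contour alone is not.
print(is_utterance_boundary(Boundary(2.5, True, False)))
print(is_utterance_boundary(Boundary(0.3, False, True)))
```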
To guarantee inter-transcriber reliability, research assistants and doctoral students underwent a thorough training on the transcription and utterance segmentation protocols under the supervision of a senior researcher. The trainees practiced transcribing trial samples and, once these samples matched verified master files, they started to transcribe the samples included in the study. Subsequently, a senior researcher checked the accuracy of the orthographic transcription and the segmentation into utterances. A third researcher was consulted whenever discrepancies arose, for instance regarding unclear fragments.
Coding and analyses
Addressee
All the utterances (except those produced by the target child) were coded as CDS or OHS. The latter included adult-to-adult and adult-to-non-target-child(ren) utterances and, also, speech from other children, directed either to an adult or to a non-target child present in the situation. The identification of utterances’ addressee was based on (i) their semantic content, (ii) contextual cues, such as the participant’s proximity to the child (inferred from the loudness of their voice) and (iii) information provided by the families about the participants present. As with transcription, the coding of addressee was checked by a senior researcher. To assess the reliability of the coding procedure of addressee, the first, second and last authors coded 10% of the sample.
Lexical measures
To identify nouns and verbs, all the utterances in the sample – except those produced by the target child – were morphologically parsed with CLAN’s MOR for Spanish (MacWhinney, 2000). The category of noun included common nouns and excluded kinship terms, vocatives and proper names. The category of verb comprised main verbs, referring to either physical or mental actions. Attention-seeking terms (e.g., mirá [look]) were excluded. Periphrastic verbs (e.g., tiene que leer [she has to read], está leyendo [she is reading], fue leído [it was read]) involve two verb forms: a non-finite verb (i.e., an infinitive, gerund or participle) that conveys most of the meaning and an auxiliary that only carries grammatical information (Real Academia Española, 1983). To exclude the latter from the analysis, we developed an algorithm in Python (Garber, 2019).
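The auxiliary-exclusion step can be illustrated with a minimal sketch. It is not the algorithm cited above, whose details are not given here: the `Token` structure, the simplified `"v"` tag, the list of auxiliary lemmas and the list of linking words are all assumptions made for illustration, and real MOR output uses its own tag set.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    word: str
    pos: str      # simplified tag (assumption): "v" for verbs, anything else otherwise
    lemma: str
    finite: bool  # False for infinitives, gerunds and participles


# Assumed, non-exhaustive lists for illustration only.
AUX_LEMMAS = {"tener", "estar", "ser", "haber", "ir"}
LINKERS = {"que", "a", "de"}  # as in "tiene que leer"


def content_verbs(tokens: List[Token]) -> List[Token]:
    """Keep verb tokens, dropping auxiliaries that precede a non-finite verb."""
    kept = []
    for i, tok in enumerate(tokens):
        if tok.pos != "v":
            continue
        # Look at the following token, skipping at most one linking word.
        j = i + 1
        if j < len(tokens) and tokens[j].word.lower() in LINKERS:
            j += 1
        next_is_nonfinite_verb = (
            j < len(tokens) and tokens[j].pos == "v" and not tokens[j].finite
        )
        if tok.lemma in AUX_LEMMAS and next_is_nonfinite_verb:
            continue  # auxiliary of a periphrasis: carries only grammatical information
        kept.append(tok)
    return kept


example = [
    Token("tiene", "v", "tener", True),
    Token("que", "conj", "que", False),
    Token("leer", "v", "leer", False),
]
print([t.word for t in content_verbs(example)])  # ['leer']
```

Restricting the rule to a closed list of auxiliary lemmas keeps sequences such as quiero comer, where both verbs carry lexical meaning, from being collapsed; whether the original algorithm made the same choice is not stated.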
We calculated the number of noun and verb lexemes (types) as well as all the occurrences of nouns and verbs (tokens). Following Stoll et al. (2012) and Altınkamış et al. (2014), we generated two measures: a noun-to-verb types ratio – noun types/(noun types+verb types) – and a noun-to-verb tokens ratio – noun tokens/(noun tokens+verb tokens).
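As a worked illustration of the two measures, the sketch below computes both ratios from a list of (lemma, category) pairs. The function name, the `"n"`/`"v"` category labels and the toy data are hypothetical; only the two formulas come from the text.

```python
from typing import Iterable, Tuple


def noun_verb_ratios(tagged: Iterable[Tuple[str, str]]) -> Tuple[float, float]:
    """Return (types ratio, tokens ratio), each computed as nouns / (nouns + verbs).

    `tagged` holds (lemma, category) pairs with category "n" or "v";
    any other category is ignored.
    """
    noun_tokens = verb_tokens = 0
    noun_types, verb_types = set(), set()
    for lemma, category in tagged:
        if category == "n":
            noun_tokens += 1
            noun_types.add(lemma)
        elif category == "v":
            verb_tokens += 1
            verb_types.add(lemma)
    if noun_tokens + verb_tokens == 0:
        raise ValueError("no nouns or verbs in the sample")
    types_ratio = len(noun_types) / (len(noun_types) + len(verb_types))
    tokens_ratio = noun_tokens / (noun_tokens + verb_tokens)
    return types_ratio, tokens_ratio


sample = [
    ("flan", "n"), ("comer", "v"), ("flan", "n"), ("leer", "v"),
    ("comer", "v"), ("juguete", "n"), ("comer", "v"),
]
# Types: 2 nouns, 2 verbs -> 2/4. Tokens: 3 nouns, 4 verbs -> 3/7.
print(noun_verb_ratios(sample))
```

A value above 0.5 indicates more nouns than verbs and a value below 0.5 the reverse, which is what makes the ratio suitable as a bounded dependent variable.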
Pragmatic function
Adapting coding schemes from related studies (Altınkamış et al., 2014; Choi, 2000; Kim et al., 2000), every intelligible utterance directed to the target child was categorized as: (1) entity-oriented or (2) action-oriented.
Utterances that led children’s attention towards an entity were diverse: (a) declarative forms such as labeling or describing an entity (qué rico flan [what a delicious pudding]); (b) requests asking the child to produce a noun phrase that denoted an entity, such as identification questions and commands (¿qué vas a cocinar Vera? [what are you going to cook Vera?]); (c) directives or commands that led children to focus on an entity (mirá todos los juguetes que hay allá [look at all the toys that are there]).
Utterances were coded as action-oriented when they directed children’s attention to an overt or mental activity: (a) declarative forms such as labeling or describing an action (comiste el flan [you ate the pudding]); (b) identification questions or commands eliciting verb phrases that referred to an action (¿qué querés hacer? [what do you want to do?]); (c) commands that oriented children to focus on an action (no, todavía no metas la mano [no, don’t put your hand there yet]).
Formulaic utterances (muy bien [very good], por favor [please], gracias bombón [thank you sweetie]), songs (yo tengo una manito la hago bailar [I have a little hand, I make it dance]), exclamations (¡uy! [ouch!]), greetings (hola hermosa [hello beautiful]), yes/no responses, and utterances without a clear pragmatic orientation were excluded from the analyses.
Activity
The type of activity was coded in two steps. First, each child-directed utterance was grouped into one of the following categories based on careful scrutiny of the audio-recordings: meals, grooming, book-sharing, exploratory play with objects, organized play, physical play, household chores, outings, adult–child conversations that lasted at least five linguistic turns, conversations between adults and watching TV. Organized play was any playful activity that had an inferable objective (e.g., building a tower) or included dramatic play (e.g., having a tea party). Situations in which the child manipulated an object or a toy and that were not generated or scaffolded by other participants were coded as exploratory play with objects. Running or jumping was categorized as physical play. Reading stories, observing and/or talking about pictures in books and magazines were coded as book-sharing. Having breakfast, lunch, dinner or a snack was categorized as meals. Taking a bath, getting dressed, changing diapers, washing hands and brushing teeth were all classified as grooming. Household chores included cleaning, tidying up the house and cooking. Outings comprised interactions that took place when the child was taken out of the house for a walk, shopping, visiting family, etc. Utterances were labeled as watching TV only when the target child was actively watching a television program. Conversations that developed a topic for a minimum of five successive turns were coded as adult–child conversations or conversations between adults. Whenever two activities occurred simultaneously, we coded the one that organized the participation of the child and her interlocutors to a greater extent. If it was not possible to determine what activity was taking place, the segment was excluded from the analysis.
After applying the exclusion criteria described above for both the pragmatic function and the activities, the analyzed sample contained 5,653 of the initial 11,940 child-directed utterances.
In a second step, the aforementioned types of activity were clustered according to two criteria adapted from categories developed in previous studies (Glas et al., 2018; Weisleder et al., 2019). First, we determined whether the activity was centered on the household life or on the child. The former included household-maintenance activities such as meals, grooming, household chores, outings and conversations between adults. Second, child-centered or ‘child-engaging’ activities were further classified considering whether they were socially structured – book-sharing, organized play and adult–child conversations around a topic – or mainly solitary – exploratory object play, physical play and watching TV. The scheme for coding the type of activity is presented in Table 2.
Table 2. Activity category system.
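The two-step activity scheme can be expressed as a lookup from first-step category to second-step cluster. The sketch below is illustrative: the dictionary keys and the `cluster_for` helper are hypothetical names, the cluster labels follow those used in the regression analysis, and treating "feeding" as the meals category is our assumption.

```python
from typing import Optional

# First-step activity category -> second-step cluster (labels as used in the analyses).
ACTIVITY_CLUSTER = {
    "meals": "household-centered",
    "grooming": "household-centered",
    "household chores": "household-centered",
    "outings": "household-centered",
    "conversations between adults": "household-centered",
    "book-sharing": "child-centered social",
    "organized play": "child-centered social",
    "adult-child conversations": "child-centered social",
    "exploratory play with objects": "child-centered solitary",
    "physical play": "child-centered solitary",
    "watching TV": "child-centered solitary",
}


def cluster_for(activity: str) -> Optional[str]:
    """Return the cluster for an activity, or None when the activity is undetermined.

    Segments whose activity could not be determined were excluded from the
    analysis, which is why no fallback cluster is assigned here.
    """
    return ACTIVITY_CLUSTER.get(activity)


print(cluster_for("book-sharing"))
print(cluster_for("grooming"))
```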
The first, second, third and fourth authors coded 20% of the sample to evaluate inter-rater reliability for pragmatic functions and activities. Fleiss’ kappa for multiple coders indicated strong inter-observer reliability (addressee: κ = 0.86, pragmatic function: κ = 0.92, activity: κ = 0.95).
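For readers unfamiliar with the agreement statistic, the following is a minimal sketch of Fleiss' kappa in its standard formulation. The function name and the toy rating matrices are ours, and the study's own computation is not described beyond the name of the statistic.

```python
from typing import List


def fleiss_kappa(counts: List[List[int]]) -> float:
    """Fleiss' kappa for a fixed number of raters per item.

    counts[i][j] is the number of raters who assigned item i to category j.
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each item needs the same number (>= 2) of ratings")
    n_categories = len(counts[0])

    # Share of all assignments that went to each category.
    p_cat = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    # Observed agreement per item: proportion of agreeing rater pairs.
    p_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(p_item) / n_items
    p_expected = sum(p * p for p in p_cat)
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return (p_observed - p_expected) / (1 - p_expected)


# Four raters, two categories: perfect agreement on every item.
perfect = [[4, 0], [0, 4], [4, 0], [0, 4]]
print(fleiss_kappa(perfect))  # 1.0
```

Kappa corrects raw agreement for the agreement expected by chance, so values such as those reported above are read relative to a ceiling of 1.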
Data analysis
Data processing and statistical analyses were performed using R (R Core Team, 2017). First, we conducted a descriptive examination of the data. Next, we applied beta regression (Cribari-Neto & Zeileis, 2010) and mixed-effects beta regression analyses with ratios as dependent variables. The beta distribution, bounded between 0 and 1, is very flexible: it can accommodate both skewed and symmetric data, and it allows modeling not only shifts in location (mean) but also heteroskedasticity. Therefore, it is appropriate for studying proportions (Smithson & Verkuilen, 2006). Finally, we fitted a set of mixed-effects logistic models with binary response variables.
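To make the distinction between the mean and the precision components concrete, the sketch below writes out the beta log-likelihood in the mean-precision parameterization, where y follows Beta(μφ, (1 − μ)φ) and the variance is μ(1 − μ)/(1 + φ). It is an illustration in Python rather than the R code used in the study; the logit link for the mean and the log link for the precision are assumptions (they are common choices), and all function names are ours.

```python
import math
from typing import Sequence


def inv_logit(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def beta_logpdf(y: float, mu: float, phi: float) -> float:
    """Log-density of Beta(mu * phi, (1 - mu) * phi) at y, for 0 < y < 1."""
    a, b = mu * phi, (1.0 - mu) * phi
    return (
        math.lgamma(phi) - math.lgamma(a) - math.lgamma(b)
        + (a - 1.0) * math.log(y) + (b - 1.0) * math.log(1.0 - y)
    )


def beta_variance(mu: float, phi: float) -> float:
    """Higher precision phi means lower variance around the same mean."""
    return mu * (1.0 - mu) / (1.0 + phi)


def log_likelihood(
    ys: Sequence[float],
    mean_linpred: Sequence[float],       # x_i' beta, on the logit scale (assumed link)
    precision_linpred: Sequence[float],  # z_i' gamma, on the log scale (assumed link)
) -> float:
    """Sum of beta log-densities with observation-specific mean and precision."""
    total = 0.0
    for y, eta, zeta in zip(ys, mean_linpred, precision_linpred):
        total += beta_logpdf(y, inv_logit(eta), math.exp(zeta))
    return total


# Same mean, different precision: the variance shrinks as phi grows.
print(beta_variance(0.45, 20.0), beta_variance(0.45, 40.0))
```

Letting a predictor such as addressee enter the precision part of the model is what allows a result like "no effect on the mean but an effect on the variability" to be expressed.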
Results
Descriptive analyses
To provide an overview of the language environment in the low- and middle-SES households we report the number of speakers, the means of types and tokens in CDS and OHS as well as the noun-to-verb ratio in the entire speech produced (see Table 3). On average, 5 and 10 people interacted with the target child during the situations recorded in middle-SES and low-SES households, respectively. In middle-SES households there were more types in CDS than in OHS, whereas the opposite occurred in low-SES households: there were twice as many word types in OHS as in CDS. In both groups the entire speech produced was characterized by a similar proportion of noun versus verb types and a slightly higher proportion of verb than noun tokens.
Table 3. Characteristics of language environments according to SES. Means, standard deviation, minimum and maximum for selected variables are provided.
Table 4 shows how many child-directed utterances were coded as part of each type of activity and pragmatic function identified. In the analyzed sample household-centered and child-centered activities were equally distributed and the proportion of action-oriented utterances was higher than the proportion of entity-oriented utterances. In particular, the highest proportion of utterances expressed commands oriented to actions, followed by declaratives.
Table 4. Number of child-directed utterances by activities and pragmatic functions. For each socioeconomic group, percentages (in brackets) were calculated over the total numbers of child-directed utterances included in the analyses of type of activity and pragmatic function.
Regression analysis
In order to address our three research questions we built three regression models controlling for age variability. First, we estimated the effects of SES and addressee on the noun-to-verb types ratio and the noun-to-verb tokens ratio. Next, to explore the effects of SES, type of activity and pragmatic function on the noun-to-verb tokens ratio in CDS, we conducted a two-stage mixed-effects beta regression analysis with target child as a random effect. This analysis was done at the utterance level and thus only the tokens ratio was considered. At stage one, SES and type of activity (clustered in child-centered social, child-centered solitary and household-centered) were included as predictors. The pragmatic function of the utterances was entered at stage two. Finally, to determine whether SES and type of activity predict the pragmatic function of the utterance we built six mixed-effects logistic models with each pragmatic function as the dependent variable. A stratification of the data was necessary due to differences in the amount of data in each socioeconomic group.
Do SES and addressee (CDS or OHS) have an effect on the proportion of nouns versus verbs produced in the entire linguistic environment of Argentinian Spanish-speaking children?
Table 5 shows the results of the first beta regression analysis, which gauged the relationship between the noun-to-verb tokens and types ratios and two predictors: SES and addressee (the target child or another participant). The relationships between these variables can be observed in Figures 1 and 2.
Beta regression models predicting the noun-to-verb tokens ratio and noun-to-verb types ratio.
The level of significance is cued as follows: * p < 0.05; ** p < 0.01; *** p < 0.001

Noun-to-verb tokens ratio by SES.

Noun-to-verb types ratio by SES.
As shown in Figure 1, middle-SES environments have a slightly higher ratio – in both CDS and OHS – than low-SES environments. In line with our predictions, we found a main effect of SES on the noun-to-verb tokens ratio (see Table 5). The addressee did not have a significant effect on the mean but had a strong effect on the precision: variability in CDS is twice as large as in OHS, and addressee significantly predicted this variance. In addition, as age increases there is also a small increase in the variability of the noun-to-verb tokens ratio.
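The distinction between an effect on the mean and an effect on the precision can be made concrete with the mean-precision parameterization that is standard in beta regression, where the response follows Beta(μφ, (1 − μ)φ) and its variance is μ(1 − μ) / (1 + φ). The sketch below is illustrative only: it is not the fitting code used in this study, and the linear predictor and precision values are hypothetical, chosen so that two conditions share a mean while one has twice the variance of the other.

```python
import math


def inv_logit(eta):
    """Map a linear predictor on the logit scale to a mean in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))


def beta_logpdf(y, mu, phi):
    """Log-density of Beta(mu * phi, (1 - mu) * phi) at y, for 0 < y < 1."""
    a, b = mu * phi, (1.0 - mu) * phi
    return (math.lgamma(phi) - math.lgamma(a) - math.lgamma(b)
            + (a - 1.0) * math.log(y) + (b - 1.0) * math.log(1.0 - y))


def beta_variance(mu, phi):
    """Variance implied by mean mu and precision phi."""
    return mu * (1.0 - mu) / (1.0 + phi)


# Hypothetical values: same mean, different precision.
mu = inv_logit(-0.2)                                  # shared mean ratio
var_low_precision = beta_variance(mu, phi=10.0)       # more variable condition
var_high_precision = beta_variance(mu, phi=21.0)      # less variable condition
variance_ratio = var_low_precision / var_high_precision
```

Because the variance depends on both μ and φ, a predictor can leave the mean untouched and still change how widely the ratios are spread across households, which is the pattern described for addressee.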
Regarding the noun-to-verb types ratio, the beta regression analysis yielded a significant effect of the interaction between SES and addressee. While the noun-to-verb types ratio of CDS was higher in middle-SES households than in low-SES households, the opposite held for OHS, whose noun-to-verb types ratio was higher in low-SES households than in middle-SES households. These results are shown in Figure 2. Lastly, SES and addressee did not have an effect on the variance of the noun-to-verb types ratio.
Do SES, type of activity and pragmatic function of the utterance have an effect on the proportion of nouns versus verbs in CDS?
Figure 3 shows that in both socioeconomic groups and types of activities there is a greater proportion of verbs than nouns. However, the proportion of nouns is higher in the middle-SES group than in the low-SES one, especially during child-centered activities (both solitary and social).

Noun-to-verb tokens ratio aggregated by SES and activity type.
To test the impact of SES, the type of activity and the pragmatic function of the utterances on the proportion of noun versus verb tokens in CDS we built a two-stage hierarchical model. The results are presented in Table 6. At stage one, the model included SES and type of activity, which were both significant predictors: utterances with a higher proportion of nouns were more likely in child-centered social activities and in middle-SES households. However, the model at this stage explained only 4% of the variance (as shown by the pseudo R²). At stage two, after including the pragmatic function of the utterance as a predictor, the model explained 32% of the variance and all the pragmatic functions considered emerged as significant. In entity-oriented utterances the probability that nouns outnumber verbs increases, while the opposite happens in action-oriented utterances. At this stage, type of activity – unlike SES – was still a significant predictor of the noun-to-verb tokens ratio.
Mixed-effects beta regression model predicting the noun-to-verb tokens ratio in child-directed utterances.
The level of significance is cued as follows: *p < 0.05; **p < 0.01; ***p < 0.001.
Do SES and type of activity have an effect on the proportion of entity and action-oriented utterances in CDS?
Figure 4 shows differences in the occurrence of entity and action-oriented utterances directed to the target child according to SES and type of activity. To test the effects of SES and type of activity, we fitted mixed-effects logistic regression models. The results are presented in Table 7. Household-centered activities and low SES were used as reference levels in all the models. The model with action-oriented commands as the dependent variable showed the best fit (pseudo R² = 0.19). The results revealed that there is a significantly higher probability that the target child is addressed with an action-oriented command in the low-SES group than in the middle-SES group. The probability of being addressed with this type of utterance increases in child-centered solitary activities and decreases in child-centered social activities. The model with entity-oriented requests as the dependent variable, which also showed a good fit (pseudo R² = 0.16), indicated that the probability of a child being addressed with these utterances increases significantly in middle-SES households and decreases significantly in child-centered solitary activities. The probability of being addressed with an action-oriented request increases in middle-SES households and in child-centered social activities. In addition, an interaction effect between SES and type of activity was found in relation to entity-oriented commands: the probability that a child is addressed with this type of utterance is higher in child-centered activities (social and solitary) that happen in middle-SES households.
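How reference levels work in such logistic models can be sketched as follows. With household-centered activities and low SES as the baseline, the intercept gives the log-odds for that baseline, and each coefficient shifts the log-odds for a non-reference level. The coefficient values below are invented for illustration and are not the estimates in Table 7. Only their signs mirror the direction of the effects reported for action-oriented commands. The sketch uses fixed effects only, which corresponds to a child whose random intercept is zero, and all names are hypothetical.

```python
import math


def inv_logit(eta):
    """Convert log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


# Hypothetical log-odds coefficients (NOT the values reported in Table 7).
COEFS = {
    "intercept": -1.0,          # low SES, household-centered activity
    "ses_middle": -0.5,         # shift for middle SES
    "activity_solitary": 0.4,   # shift for child-centered solitary activity
    "activity_social": -0.3,    # shift for child-centered social activity
}


def predicted_probability(ses, activity, coefs=COEFS):
    """Probability that an utterance is an action-oriented command."""
    eta = coefs["intercept"]
    if ses == "middle":
        eta += coefs["ses_middle"]
    if activity == "child_solitary":
        eta += coefs["activity_solitary"]
    elif activity == "child_social":
        eta += coefs["activity_social"]
    return inv_logit(eta)


baseline = predicted_probability("low", "household")
middle_ses = predicted_probability("middle", "household")
solitary = predicted_probability("low", "child_solitary")
social = predicted_probability("low", "child_social")
```

A negative coefficient lowers the predicted probability relative to the baseline and a positive one raises it, so the direction of each reported effect can be read directly from the sign of its coefficient.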

Mean proportion of utterances for each pragmatic function aggregated by SES and activity type.
Mixed-effects logistic models predicting the occurrence of utterances with different pragmatic functions.
Discussion
This study adopted a naturalistic perspective to investigate the vocabulary composition of the at-home linguistic environment of a sample of Argentinian Spanish-speaking children. First, we examined whether previously reported effects of SES on general characteristics of the input (Fernald et al., 2012; Hart & Risley, 1995; Hoff, 2013; Rowe, 2008, 2012) were also found on the quantity and distribution of nouns and verbs in children’s everyday linguistic environments.
In the local context, low-SES children are often part of extended families, unlike their middle-SES counterparts. Their participation in daily routines is mediated by many adults and children. Therefore, low-SES children’s linguistic environments are not only – or mainly – made up of CDS by primary caregivers, but also of OHS and the language stemming from the multi-party interactions in which they participate (Sperry et al., 2019). The vocabulary in OHS may also be a source for the development of linguistic knowledge, as shown by previous experimental studies (Akhtar, 2005; Arunachalam et al., 2013). Thus, we examined whether SES influences the use of nouns versus verbs in CDS and OHS. In addition, we analyzed the vocabulary composition of CDS considering the ongoing activity (child-centered social, child-centered solitary and household-centered) in which children participate, as well as the pragmatic function of the utterances directed to them.
A general overview of the distribution of nouns and verbs indicated that in both SES groups there was a similar proportion of verb and noun types and a slightly higher proportion of verb than noun tokens. Consistent with data from other populations that speak pro-drop languages such as Italian and Hebrew (Adi-Bensaid et al., 2015; Camaioni & Longobardi, 2001), this distribution may be explained by the typological features of the language, in this case Spanish. Fewer noun phrases are necessary to communicate the same meaning because subject nouns may be omitted when they are inferable from the grammatical information in verb morphology.
Although Spanish is shared as the common language, certain linguistic properties of children’s input vary as a consequence of social factors. The results revealed a main effect of SES on the noun-to-verb tokens ratio: the proportion of noun versus verb tokens is slightly higher in middle-SES households than in low-SES households. Nonetheless, SES effects are not homogeneous among households that belong to the same group: as shown by the beta regression analysis there is significant intra-group variance. Also, our results yielded a significant effect of the interaction between SES and addressee on the noun versus verb types ratio. Among low-SES households, the proportion of noun versus verb types is higher in OHS than in CDS, and among middle-SES households, the proportion of noun versus verb types is higher in CDS than in OHS. This suggests that the CDS produced in middle-SES households could make speech more easily understandable for young children as it contains more nouns – compared to the number of verbs – that refer to entities (e.g., concrete objects that can be seen, heard, or touched) which may be referred to in joint attention contexts (Hoff, 2006). The higher proportion of nouns in the OHS of low-SES households may be attributed to the presence of multiple participants in the situations in which the children are involved (as shown in Table 3). A greater number of participants may imply more topics addressed during overlapping conversations, and thus more noun types to introduce these topics. In contrast, the fact that middle-SES families have fewer members than low-SES families might imply fewer surrounding conversations, and fewer nouns to introduce the conversational topics. The latter may explain, at least in part, why OHS contains fewer noun versus verb types than CDS among middle-SES households.
The fact that CDS in middle-SES households – compared to low-SES households – is more frequently deployed in child-centered social activities may also account for the significant effect of the interaction between SES and addressee on the noun-to-verb types ratio. Child-centered social activities (e.g., reading story-books, playing with objects, adult–child conversations focusing on a topic) drive adults’ use of referential language in their interaction with children, and thus may explain the increased number of noun types in the CDS of middle-SES caregivers. In the low-SES group, the lower proportion of noun types in CDS – compared to OHS – could be related to the larger number of commands used to regulate children’s activities (see Table 3).
Nonetheless, the model that included SES and type of activity as predictors of the noun-to-verb tokens ratio explained little variance. Why do our results differ from those that found systematic differences in the ratio of nouns to verbs depending on the type of activity (Altınkamış et al., 2014; Choi & Gopnik, 1995; Tardif et al., 1999)? Discrepancies could be accounted for by methodological differences: the quasi-experimental studies mentioned only elicited child-centered socially structured activities including sets of objects, story-books and specific types of toys in order to measure the occurrence of words in the input. These activities were carried out by child–caregiver dyads in a highly controlled setting. Instead, our naturalistic approach captured activities in everyday life. As found by Bergelson et al. (2019), different methods may lead to different portrayals of children’s language experience. Language input in elicited situations of play tends to be consistently dense, whereas language in naturalistic routines, interspersed with silence, shows fluctuations. Although there is a strong correlation between general averages of types and tokens in the input produced in each context type (Tamis-LeMonda et al., 2017), certain properties of quasi-experimental and natural everyday situations may explain the differences between the results. Quasi-experimental contexts, clearly delimited in time, space and objects, are likely to elicit high levels of shared attention, talk about objects, and specific language forms such as nouns. In turn, activities in everyday life flow naturally and overlap frequently, without precisely defined limits. Therefore, the effect of the type of activity on the lexical measure we analyzed (i.e., the proportion of nouns versus verbs) may not be so strong.
Nevertheless, the model that included the type of activity together with the pragmatic function of child-directed utterances accounted for 32% of the variance in the proportion of nouns versus verbs in CDS. Our results showed that the pragmatic function of child-directed utterances significantly affected the proportion of noun versus verb tokens in CDS. After the pragmatic function was added to the model, SES (unlike activity type) ceased to be significant.
Although CDS produced in both socioeconomic groups contains more commands than declaratives and requests (in accordance with other studies on Spanish-speaking populations, e.g., Jackson-Maldonado et al., 2011), our results showed an effect of SES on the frequency of action-oriented commands in CDS (replicating findings on English-speaking populations, Hoff, 2006; Kuchirko et al., 2020; Rowe, 2008). In particular, the probability that an action-oriented command aimed at regulating the child’s activity (e.g., No corras [Don’t run]) occurs is higher in low-SES households than in middle-SES households. Additionally, our analyses indicated that the type of activity affects the number of action-oriented commands addressed to the target child: in both socioeconomic groups it is more likely that children are addressed with an action-oriented command during child-centered solitary activities and less likely in child-centered social ones. Compared to low-SES children, middle-SES children are more likely to be addressed with an entity-oriented command during child-centered activities (social and solitary).
There are some limitations to this study. First, although the data collection procedure adopted here – naturalistic audio recordings in home settings – preserves the ecological validity of the data, it has some inherent disadvantages. This procedure leaves out visual information about participants’ gestures, stances, facial expressions and gazes, which are helpful to determine the ongoing activity. As a consequence, we were able to code the activity of only half of the utterances transcribed (5653). Had we recorded videos, we would have obtained such visual information. However, video recordings also entail limitations: battery-life capacity does not allow extended recordings and handheld cameras require the presence of an observer, rendering the situation less natural. As noted by Bergelson et al. (2019), compared to long audio recordings, even naturalistic observer-free video recordings may blur or distort children’s everyday linguistic experience by inflating input measures of quantity and diversity and yielding specific words and syntactic constructions in talkers’ interactions.
Second, our four-hour audio recordings capture a good portion of children’s daily experiences, although not their entirety. Moreover, due to the high costs involved in transcribing audio recordings we were able to transcribe and code only the two middle hours (60 hours in total), which required hundreds of hours of work by well-trained and experienced research assistants and researchers.
Self-selection is a further limitation to be considered. Not every family accepts home recordings. Therefore, although our sample included only those families that fulfilled our inclusion/exclusion criteria (caregivers’ level of schooling and place of residence), conclusive generalizations to the entire population in these groups cannot be assured. Furthermore, our findings mainly help to understand the impact of SES on the linguistic environments of children from populations that resemble ours with regard to social fragmentation, household density and quantity and quality of parents’ education. Even though this hinders generalization and could be regarded as a disadvantage, therein lies the value of the present study since it represents a step towards understanding the diversity in children’s linguistic experience across communities and cultural groups.
To sum up, the results of this study offer a picture of the complexity of real language input. They present evidence that SES, the proportion of CDS versus OHS (likely related to the number of people in a child’s household), the type of activity (child-centered social and solitary and household-centered), and the pragmatic style of interaction contribute to the nature of the linguistic environments. These findings help us to understand how context shapes the sources from which children learn language. The contextual situation provides models of communicative interaction and linguistic data (highlighting some data over others: for instance, the relative weight of nouns and verbs) for the language analysis that children perform. How language is deployed in interactions with children likely influences the developmental course of vocabulary acquisition, and in particular, the acquisition of different lexical categories.
