Abstract
Pointing has long been considered influential in language acquisition. Certain pre-linguistic vocal expressions may hold even greater value in addressing the transition to language. The goal of the present study is a longitudinal evaluation of early communicative development, addressing the influence of pre-linguistic gestures and vocal expressions. This multiple case study report analyzes longitudinal development in five children from 9 to 16 months of age, a critical language transition period. We include gestures of pointing and extending the hand, with interactive as well as request functions. Gestures, communicative grunts, words, and multimodal events combining gesture with vocal accompaniment comprise the data. Results demonstrate group trends and stark individual differences in children’s use of vocal and gestural modalities, and the influence of grunt communication onset on overall communicative frequency in single and combined communicative events. We embed this analysis within the broader context of mutually interacting variables in a dynamic system. These results argue for greater attention to vocalization as well as gesture in monitoring children’s approach to language development. Based on the role of communicative grunts demonstrated here, this variable should be further studied in both typical and language-delayed children.
Introduction
Language and communication must be understood as emanating from developmental and evolutionary sources characterizing a dynamic system (Thelen, 1989; Van Geert, 2020). Previous studies of gesture development in children typically omitted attention to vocalization, and language acquisition researchers gave little thought to gesture: now relationships between early gestural and vocal development are under active investigation. Colonnesi et al. (2010), in a powerful meta-analysis, found both concurrent and longitudinal correlations between the pointing gesture and measures of language production and comprehension. Infant pre-linguistic vocal development has received less attention, although recent studies (e.g. Donnellan et al., 2020; McGillion et al., 2017) report that vocalization measures predicted early language production, while pointing did not. McCune and Vihman (2001) reported that productive use of two or more consonants predicted referential word onset. McCune et al. (1996) found that onset of ‘communicative grunts’, that is, laryngeal vocalizations based in the biology of respiratory management, predicted children’s transition to referential language. The role of laryngeal vocalizations in adult communication is also notable (e.g. Esling, 2012; Ward, 2006).
McCune (1995) demonstrated that early language transitions can be predicted from children’s representational development as assessed through symbolic play. McCune (1992, 2008) proposed that a set of variables, considered as a dynamic system, including communicative grunt onset as a variable, could account for children’s transition to verbal communication. Missing from this picture was the potential interactive role of pre-linguistic vocal and gestural communication in this transition.
Children engage in both gestural and vocal communication prior to producing words. Gesture has long been held as a strong contributor to both evolution and development of language, while pre-linguistic vocal communication has been less emphasized. Deacon (1997) wondered ‘how most symbolic communication became dependent on one highly elaborated medium: speech’ (p. 353). He proposed that ‘gesture likely comprised a significant part of early symbolic communication, but that it existed side by side with vocal communication for most of the last 2 million years’ (pp. 355–356). Zlatev (2014) reviews the ‘bodily mimesis hypothesis’, originally proposed by Donald (1991) as a primarily gestural theory of the evolutionary origin of language that may include vocal expression. This theory fits well with the Piagetian (1962) notion of the development of imitation as the route to symbolic expression, and the prediction of language milestones from representational play (McCune, 1995). Mimesis is proposed as a wide-ranging skill that emerged prior to the vocal capacity for communication in hominid species (as bodily imitation does in infant development). This would have potentiated development of a primarily gestural system of communication as a step in the evolution of language. There is no necessity that such communications occurred silently, however. Zlatev provides strong evidence for such a step, and expands the hypothesis by suggesting how the shift to a primarily vocal medium might have come about. Development of grunt communication in contemporary non-human primates (McCune, 1999) would suggest early evolutionary origin of vocal communication, but not necessarily as an independent mode of symbolic communication.
Werner and Kaplan (1963) proposed a theory of symbolic and language development, based equally on gestural and vocal development. Relying on detailed early diary studies, they described gestural communication (in particular, pointing) and pre-linguistic vocal communication as emerging from examining and contemplating objects in social situations with communicative partners. Tomasello (2003), in contrast, proposed that pointing begins as infants recognize adults as intentional agents capable of assisting in meeting infant needs, and it is this connection that influences the relationship between pointing and language. In support, Boundy et al. (2019) found that infants’ extension of objects to adults by 10 to 11 months of age, in an elicitation situation, gave evidence of communicative function, supporting social knowledge prior to the typical 12-month age of the emergence of pointing.
But Caselli et al. (2012) found at least 50% of 492 participants producing both points and extends of objects by 10 months of age, while Colonnesi et al. (2010) reported a range of 7 to 15 months of age for the beginning of communicative pointing across the studies they analyzed. Clearly, there is variability in age of onset for pointing, and Orr (2018) reported 10 months of age as the 50% point for both pointing and extending objects to mother.
Colonnesi et al. (2010) interpret Werner and Kaplan (1963) as proposing infant communicative pointing as a step toward symbolization, related to word learning on that basis. More specifically, Werner and Kaplan emphasized the outward direction of the point and the triadic interaction of self, interlocutor, and world. They proposed pointing as functioning for the infant in directing her own attention (point for self), in examining objects prior to communicative function, as earlier reported by Bates et al. (1975). Carpendale and Carpendale (2010), in a diary study (6–14 months), further support the Werner and Kaplan view. Their participant began exploring surfaces with his extended index finger at 7 months of age and pointed to phenomena at a distance from 9 months, but with no social intention. Their detailed observations allow consideration of the gradual emergence of awareness of the social value of pointing beginning at 11 months and culminating in clear communicative goals at 14 months.
Lennon’s (1984) report based on 30-minute monthly observations found that non-communicative pointing was first observed between 8 and 11 months of age in the five participants (out of nine included in the study) who were observed to point for self. Four first showed both communicative and non-communicative pointing in the same or adjacent sessions, while one pointed non-communicatively 2 months prior to social pointing. Ruff (1982, 1984) reported that object exploration with an extended index finger is prominent in infants beginning as early as 6 months, long before typical communicative use. Delgado et al. (2011) provide evidence that pointing for self may serve cognitive functions in children 2 to 4 years of age.
The Werner and Kaplan (1963) relational view sees early communicative pointing as emerging from exploration in the context of an initial sense of mutuality in the infant/caregiver relationship. Specific communicative strategies arise as the child experiences increasing differentiation from the caregiver and seeks to maintain a sense of psychological connection (Mahler et al., 1975). According to this view, the infant comes to experience a sense of oneness (symbiosis) with the mother. With advances in locomotor skill and consequent distancing, the child seeks other means of maintaining a sense of closeness, including gesture. Greater theoretical clarity regarding the multifaceted nature of gesture is needed to allow a more complete understanding of the developmental role of gesture and its integration with communicative development over time (Zlatev, 2018).
As researchers pursue pre-linguistic gesture as a critical source for predicting language abilities and milestones, studies are often correlational in nature, linking earlier gesture and vocal measures to later language assessments. Some contemporary studies adopt experimental models for teasing out developmental sequences for forms and/or functions of gesture (e.g. Cameron-Faulkner et al., 2015), omitting attention to vocalization entirely, or analyzing vocalizations only when accompanying gestures (e.g. Grunloh & Liszkowski, 2015).
How strong is the evidence for the influence of pointing on language acquisition? Prediction of later language from earlier gestural measures implies that gesture is not only a precursor but a necessary component in the transition to language. The common term ‘prediction’ encourages such suppositions. In this regard, the pointing gesture, especially with finger extended, has been the primary focus as this gesture was assumed to be absent in chimpanzees, our closest relatives from an evolutionary perspective (Povinelli et al., 1997). But a recent review establishes finger-pointing in captive animals and includes strong suggestions that it may also exist in the wild, although perhaps rarely (Krause et al., 2018).
LeBarton et al. (2015) produced experimental results where 16-month-old children who were randomly assigned to gesture training, and who increased their gesture performance thereafter, out-performed non-trained peers in words produced in a play session with mothers 2 weeks later, as well as on a parent report measure. This demonstrates a group effect of gesture performance on language during early language development, perhaps implying, but not demonstrating a causal effect on onset of word production. Random assignment to developmental conditions is obviously impossible.
McGillion et al. (2017), in a longitudinal observational study of 46 infants from 12 to 18 months of age, found that a measure of babbling (stable production of two supraglottal consonants, referred to as Vocal Motor Schemes, or VMSs: McCune & Vihman, 2001) predicted initial word production, while pointing onset predicted language comprehension, but not production. In fact, several participants began word production (defined as four different words in a 30-minute session) before either producing a point in a session or being reported to do so by their parents. By the logic that a single counterexample invalidates a universal proposition (observation of a single black swan falsifies the statement ‘All swans are white’), this study demonstrates that pointing is not a necessary precursor to language production. A particular strength of this study is that the relationships were evaluated longitudinally in individual children, rather than through group analysis. The finding that the median age at the babbling milestone was 10 months while the median age at four-word production was 15 months is of interest, as the 5-month gap suggests the possibility of intervening variables.
Grunt communication in infants is a potential intervening variable (McCune et al., 1996). Non-human primates (e.g. chimpanzees, gorillas, baboons, and vervet monkeys) exhibit a rich repertoire of varied laryngeally based vocalizations (grunts) with specific functions, produced by mechanisms similar to those of humans (McCune, 1999), suggesting a role for this vocalization in the evolution of language. For example, baboons, whose laryngeal anatomy differs from humans’, produce varied vowel sounds acoustically identical to some human vowels across a range of call types, of which grunts are the most frequent (Boë et al., 2017). Dezecache et al. (2019) reported that infant chimpanzees produce grunt vocalizations across varied contexts of interaction, unlike whimpers that are restricted to negative circumstances.
Ward (2006) reported the use of ‘conversational grunts’, sounds such as uh huh, mm, hn, functioning to regulate conversation and facilitate continued coordination of information between speaker and hearer, a phenomenon noted earlier (Schegloff, 1982). Dingemanse et al. (2013), having examined 20 languages, reported that use of the form Huh? (a laryngeal vocalization similar to a grunt) as a conversation repair initiator, occurs across the globe, and hence deserves status as a word. Esling (2012) described the manner in which biomechanical development of laryngeal function in infancy influences the ability to produce a variety of speech sounds across languages. Laryngeal communication in adults could be derived from earlier laryngeal vocalizations in infancy.
The contemporaneous interrelationships between gestural and vocal expressions during the period of initial communicative and language development are of particular interest, but have not been comprehensively studied. In particular, grunt communication is not typically included as a focus of study, although communicative use of grunts is noted in several studies of gesture. Salerni et al. (2007) identified grunts (using the McCune et al., 1996 method) in the vocal repertoire of 6-month-old pre-term and full-term infants. In an elicitation study, Grunloh and Liszkowski (2015) reported that ‘non-speech-like’ vocalizations such as communicative grunts accompanied at least 30% of points across conditions, with greater frequency in request contexts (70%). Iyer and Ertmer (2014) found that grunts occurred under reflexive conditions in the first months of life, but functioned communicatively in their oldest age group, 9 to 18 months. McCune et al. (1996) reviewed anecdotal reporting of grunts in research over the previous several decades (e.g. Ferguson et al., 1973; Roug et al., 1989). Bordenave (2005) found that children with developmental disabilities continued grunting to communicate at age 4 years if their cognitive level was suited to linguistic communication but they were unable to produce language.
McCune et al. (1996) reported that onset of communicative function for grunts predicted the shift to referential word production and heralded a sharp increase in word production within 1 month in infants studied longitudinally who also met VMS criteria (McCune, 1992; McCune & Vihman, 2001), as well as onset of referential comprehension in children who had not. They proposed that infants’ experience of their own grunts influences recognition of sound/meaning correspondence. Addressing the history of grunt communication in the five children’s vocal repertoires, they reported laryngeal vocalizations (grunts) from 9 months of age when observations began. Initially such vocalizations accompanied movement or effort, while they later accompanied acts of focused attention. Finally, the children produced grunts with evidence of communicative intent, and this is the milestone that predicted the shift to reference, a sequence reminiscent of the Iyer and Ertmer (2014) findings of early reflexive and later communicative grunt productions.
The physiology of grunt production in relation to children’s behavioral condition and their implied psychological experience of this vocalization provides insight into the sequence of development: effort, attention, communication, and the underlying developmental process linking this vocalization with referential word learning. The larynx interacts in a complex process with the lungs and intercostal muscles in service of energy management mediated by the vagus, the 10th cranial nerve (England et al., 1985). Ultrasonic grunt vocalizations in rat pups isolated from the nest function to elicit retrieval from dams (Hofer & Shair, 1978), but the vocalization is an acoustic by-product of ‘laryngeal braking’, that is rapid opening and closing of the glottis to enhance oxygenation under physiological stress (Blumberg & Alberts, 1990; Hofer & Shair, 1993). This accounts for the occurrence of the grunt vocalization under conditions of effort (De Troyer et al., 1985; Remmers, 1973). Attention also is metabolically demanding (B. H. Cohen, 1986; Kahneman, 1973; Porges et al., 1994; Richards & Casey, 1992). Ribot (1890) found ‘modification in respiratory rhythm accompanying intense reflection’ (as cited by B. H. Cohen, 1986, p. 25), linking respiratory effects with attention.
The association of communicative grunt onset with referential words suggests that as the children experience their own attention grunts while in a focused state of meaning, the vocalization becomes associated with that internal state of meaning. Communicative grunts were observed in the McCune et al. (1996) participants only following the onset of representational play, indicating a potential combination of representational meaning with vocalization. Speech production begins with activation of the larynx, so grunts share this feature with words. When children, perhaps attending to an object, experience a simultaneous internal meaningful state and a desire to communicate that meaning, in the absence of a suitable word, they may tend to produce a communicative grunt as a result of laryngeal activation and a simultaneous representational state. Babbling vocalizations have not been associated with representational states. Roug-Hellichius (1998), in a case study of a boy learning Swedish (from 9 to 18 months of age), supported the McCune et al. (1996) findings. Both Roug-Hellichius and McCune (2008) reported that communicative grunts tended to use the same vowels as the children’s early words, while effort grunts used the default central vowel, suggesting the potential for shaping grunts toward word production.
The current report is the first study examining gesture and communicative grunt development in relation to the transition to language in the same group of children. The McCune et al. (1996) study did not include gestures. Lennon (1984) examined gestural development in the same longitudinal sample as McCune et al., but at that time, vocalization data were not available. The present study addresses gestural and vocal modes as Lennon’s gestural data have been cross-tabulated with the later-published vocalization findings.
Following description of the methods of investigation, we examine trends for the communicative variables under study, address the potential association of communicative grunt onset with increases in overall frequency of communicative events, and address individual differences. We then evaluate the timing of onset for point versus extend gestures, and the association of gesture form and function. Finally, we examine the role of gesture and communicative grunt development within a dynamic system including other critical developmental variables.
Methods
Participants
The participants in this research (referred to here by pseudonyms) were five English-learning children: three girls, Alice and Nenni (both first born) and Aurie (second born), and two boys, Rick and Danny (both first born), who were studied monthly beginning between 8 and 10 months of age until 24 months of age. Data for the present study are taken from the 9- to 16-month period. Participants were not selected by social class, but parents were middle class on the basis of their education, employment, and area of residence. In all cases, the mothers were the primary caregivers for their children. The five children met typical language milestones by 36 months of age.
Procedure
The participants were video recorded at home monthly during half-hour free-play interactions. Mother and baby were seated on the floor and played with the same standard set of toys at each visit (McCune, 1995). The present report is based on previously published data except for the gesture coding, as described below. Additional details of methodology and reliability are provided in those publications.
Transcription and coding
English orthographic transcriptions of the children’s language were made with accompanying contextual descriptions of the children’s actions, the mothers’ actions, and the mothers’ language. All transcripts were subsequently entered into the CHILDES database (MacWhinney, 2000). Phonetic transcripts of all vocalizations produced were made for the 9-month to 16-month sessions and form the basis for the current analyses.
Defining communicative events
A communicative event might include a gesture, grunt or word alone (all described below), or a combination of a gesture with a grunt, word, or non-word vocalization. (Murillo and Capilla [2016] found that vocalizations of children 9 to 14 months old may have different acoustic properties when they are produced alone vs with a gesture.) Only vocalizations that overlapped in time with a gesture were considered as accompanying that gesture. Non-word vocalizations (except for communicative grunts) occurring in the absence of a gesture were not considered as communicative. To identify communicative events, the third author located all gestures included in the Lennon study by time log on the video and noted the timing of the vocal items (communicative grunt, word, non-word vocalization) that had been identified in the earlier published reports. The first author reviewed these judgments with the third author to ensure that both agreed on each vocal element noted as accompanying a gesture.
Words
Words were identified based on phonetic and contextual parameters. Consensual agreement was required from two investigators. The primary results were published in Vihman and McCune (2008).
VMSs
The VMS measure documents productivity with at least two specific consonants. Each consonant was produced at least 10 times per month for at least 3 months out of 4 (including the month credited and the next 3). Primary results were published in McCune and Vihman (2001).
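The VMS criterion above can be illustrated with a short sketch. This is not the authors' coding procedure (see McCune & Vihman, 2001, for that); the function names, the input layout, and the monthly counts are hypothetical. It also assumes that the credited month must itself reach the 10-production threshold, which the text does not state explicitly.

```python
def consonant_credit_month(counts, first_month=9, threshold=10, window=4, needed=3):
    """Return the first month at which one consonant meets the criterion.

    counts: monthly production counts for one consonant, starting at first_month.
    A month is credited when it and the next 3 months (a 4-month window) contain
    at least `needed` months with `threshold` or more productions.
    Assumption: the credited month itself must reach the threshold.
    Windows that run past the end of the data are not evaluated.
    """
    for start in range(len(counts) - window + 1):
        block = counts[start:start + window]
        months_met = sum(1 for c in block if c >= threshold)
        if block[0] >= threshold and months_met >= needed:
            return first_month + start
    return None


def vms_month(counts_by_consonant, min_consonants=2, **kwargs):
    """Return the month at which `min_consonants` consonants have been credited."""
    credited = sorted(
        m
        for m in (consonant_credit_month(c, **kwargs) for c in counts_by_consonant.values())
        if m is not None
    )
    return credited[min_consonants - 1] if len(credited) >= min_consonants else None


# Invented counts for months 9 to 16 (illustration only, not study data).
example = {
    "t": [2, 12, 11, 4, 15, 20, 18, 22],
    "m": [0, 3, 10, 14, 9, 16, 12, 11],
    "k": [0, 0, 1, 2, 11, 3, 2, 12],
}
print(vms_month(example))
```

In this invented example, [t] is credited at 10 months and [m] at 11 months, while [k] never reaches three qualifying months in any window, so the child would be credited with VMSs at 11 months.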
Grunts
Grunts analyzed here were first coded and the results were published in McCune et al. (1996). Those identified as communicative were directed toward the mother and might be accompanied by looks at her, offering an object, or other interpretable action. Examples include taking a toy from mother, patting her head, and holding a hand to her to be kissed after it was pinched in a toy.
The criteria for communicative grunt onset were two productions in a given month with continuing occurrence in subsequent months. The time window for considering a grunt and look to the mother as evidence for communicative function was temporal overlap between the vocalization and the look.
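The onset criterion can be expressed as a simple rule over monthly counts. The sketch below is illustrative only: the function name and the counts are hypothetical, and "continuing occurrence" is interpreted here, as an assumption, as at least one communicative grunt in every later observed month.

```python
def grunt_onset_month(monthly_counts, min_at_onset=2, min_later=1):
    """Return the first month meeting the onset criterion, or None.

    monthly_counts: dict mapping month of age to the number of communicative
    grunts observed in that session.
    Assumption: "continuing occurrence" means at least `min_later` grunts in
    every later observed month. A final month with no later sessions can
    therefore qualify on its own count.
    """
    months = sorted(monthly_counts)
    for i, month in enumerate(months):
        enough_now = monthly_counts[month] >= min_at_onset
        continues = all(monthly_counts[m] >= min_later for m in months[i + 1:])
        if enough_now and continues:
            return month
    return None


# Invented counts (illustration only, not study data).
example_counts = {9: 0, 10: 0, 11: 1, 12: 2, 13: 0, 14: 3, 15: 5, 16: 4}
print(grunt_onset_month(example_counts))
```

Here month 12 has two grunts but is followed by a month with none, so onset is credited at 14 months, the first month with two or more grunts and continued production afterward.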
Gestures
Gestures were identified and categorized by Lennon (1984) who considered any bodily action with an identified communicative goal as ‘gesture’. Five of the nine children she studied have full vocal data available and, for that reason, form the sample for the current investigation. Her original goal was to identify all communicative gestures, including point, extend, reach, and typical referential gestures (e.g. arms up for ‘pick me up’ and open palms for ‘allgone’), as well as any other gestures the children might produce. Rather than defining additional acceptable forms prior to the study, these were allowed to emerge from the data. She analyzed both form and function of gestures separately, basing her work on Halliday’s (1975) study of his son’s development of functional vocal communication from pre-language to language. Only the gestures point and extend occurred sufficiently to warrant group analysis.
For the current study, we analyzed pointing and extending gestures that signified a wish for joint attention toward an object or event, and those primarily requesting either objects or action from the mother. Bates et al. (1975) described these gesture functions as proto-declarative and proto-imperative. Following Halliday (1975), we refer to them as interactive and request functions. We included more advanced interactional functions relating to imaginative play and dialogic interaction with the mother in the interactive category, as occurrence of these functions was too infrequent between 9 and 16 months to merit separate analysis. In essence, the ‘interactive’ category includes all gestural bids that did not place a demand for an object or action on the mother. Thus, the current study pools over the seven original categories, yielding the two categories: interactive and request.
Reliability for gesture
Two raters, the second author and another graduate student, independently scored the first 10 minutes of video for four children randomly selected from the nine who comprised the original sample at months 9, 14, and 18. Reliability was assessed as the number of agreements divided by the number of agreements plus disagreements, first for identification of gestures and then for assignment of those gestures seen by both raters to one of the seven functional categories analyzed in Lennon’s original study.
The intraclass correlation coefficient (ICC) for identifying gestures was 0.91, considered excellent agreement (Bartko, 2016). Percent agreement identifying gestures was 72% (83/115). Cohen’s (1960) kappa for agreement on assigning gestures to the same one of seven functional categories was .83. McHugh (2012) provides a mathematical rationale for considering a kappa value between .80 and .90 as indicating strong agreement. Percent agreement for assignment to one of seven functions was 83% (69/83).
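The two agreement indices reported here can be illustrated as follows. The percent-agreement calls use the counts given in the text (83 of 115 and 69 of 83). The kappa example uses invented rater labels, because the raters' cross-classification table is not reported; the function names are hypothetical.

```python
from collections import Counter


def percent_agreement(agreements, disagreements):
    """Agreements divided by agreements plus disagreements."""
    return agreements / (agreements + disagreements)


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning the same items to categories."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of the raters' marginal proportions per category.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)


# Counts from the text: 83 agreements out of 115, and 69 out of 83.
print(round(percent_agreement(83, 115 - 83) * 100))
print(round(percent_agreement(69, 83 - 69) * 100))

# Invented labels for ten gestures (illustration only, not study data).
labels_a = ["interactive"] * 6 + ["request"] * 4
labels_b = ["interactive"] * 5 + ["request"] * 5
print(cohens_kappa(labels_a, labels_b))
```

Kappa is lower than raw agreement because it discounts the agreement expected by chance from the raters' marginal category frequencies. In the invented example, the raters agree on 9 of 10 items, chance agreement is .50, and kappa is .80.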
Results
The setting for this study included a large number of available toys and allowed the mothers and children freedom to move about, with broad opportunities for mutual interaction. A total of 1,183 communicative events comprising individual gestures, grunts, and words as well as events combining gesture with a vocal expression are included in the analysis. The results are organized in five sections: (1) Developmental trends, (2) Effect of communicative grunt onset, (3) Individual differences, (4) Forms and functions of gesture, and (5) A dynamic system of developments.
Developmental trends in gesture, communicative grunt, and word production
Children produced gestures, grunts, and words as single communicative bids as well as combining gesture with word, gesture with vocalization, and gesture with grunt. We analyzed productions in these six categories of communicative events to evaluate frequency of use and developmental trends over time for each type of bid. We used Friedman tests (Friedman, 1937) rather than analyses of variance (ANOVA), as the data do not meet ANOVA assumptions. Word alone, gesture with word, grunt alone, and gesture alone (marginally) increased significantly between 9 and 16 months of age, while gesture with grunt and gesture with vocalization did not increase significantly (Table 1). Month-to-month comparisons with Wilcoxon signed-rank tests (Wilcoxon, 1945) yielded no significant results at a Bonferroni-adjusted alpha level of .0018 (.05/28). Figure 1 displays the frequencies of occurrence of the six modes of communication over time. Because of the small sample size, we were unable to evaluate frequency of use of the six categories across the group as a whole.
Table 1. Friedman test results for increases in frequency of six communicative event categories.
Significant findings.

Figure 1. Frequencies of Communicative Event Modalities from 9 to 16 Months of Age. All Modalities Increased Significantly Except for Gesture + Grunt and Gesture + Non-Word Vocalization.
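The arithmetic behind the trend analysis in this section can be sketched briefly. The adjusted alpha follows from the 28 pairwise comparisons among the eight monthly sessions. The Friedman statistic is computed below from within-child ranks, without the correction for ties that statistical packages usually apply. The function names and the count table are hypothetical, and the sketch does not reproduce the values in Table 1.

```python
from itertools import combinations


def average_ranks(values):
    """Rank values from 1 to k, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for idx in order[i:j + 1]:
            ranks[idx] = mean_rank
        i = j + 1
    return ranks


def friedman_statistic(rows):
    """Friedman chi-square for n children (rows) by k months (columns).

    Simplification: no tie correction is applied.
    """
    n, k = len(rows), len(rows[0])
    rank_sums = [0.0] * k
    for row in rows:
        for col, rank in enumerate(average_ranks(row)):
            rank_sums[col] += rank
    return 12 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3 * n * (k + 1)


# Eight monthly sessions (9 to 16 months) give 28 pairwise comparisons.
month_pairs = list(combinations(range(9, 17), 2))
bonferroni_alpha = 0.05 / len(month_pairs)
print(len(month_pairs), round(bonferroni_alpha, 4))

# Invented word-alone counts, five children by eight months (not study data).
word_counts = [
    [0, 1, 1, 2, 4, 9, 14, 20],
    [0, 0, 0, 1, 3, 5, 8, 12],
    [0, 0, 1, 1, 2, 2, 6, 9],
    [0, 0, 0, 0, 1, 1, 2, 3],
    [0, 0, 0, 0, 0, 0, 0, 7],
]
print(friedman_statistic(word_counts))
```

A p value would come from comparing the statistic with a chi-square distribution with k − 1 degrees of freedom. With only five children that approximation is rough, which is one reason to rely on an established implementation such as `scipy.stats.friedmanchisquare` and `scipy.stats.wilcoxon` for actual analyses.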
Association of communicative grunt onset with total communicative events
In a previous report (McCune et al., 1996), onset of grunt communication was established as production of two communicative grunts in a given session with continuous production in subsequent sessions. McCune et al. found that children first showed referential word production and/or comprehension at the same time as, or following, communicative grunt onset. In the current study, with vocal and gestural communicative data available, it is possible to examine the effect of grunt onset as a possible punctuation point associated with an increase in development of communicative capacity across the five participants despite variation in modality.
To evaluate this question, we pooled communicative events from the three single element events (gesture, grunt, word) and the three forms of combined events (gesture plus word, grunt, or non-word vocalization) to obtain the number of total communicative events for each participant at each month. We then noted the onset month for communicative grunts and evaluated the longitudinal data to determine whether the month of grunt onset marked a punctuation point in the expected increase in communicative frequency occurring with development.
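The pooling step can be sketched as follows. The category labels, function names, and counts are hypothetical. Coding the onset month itself as "after onset" is also an assumption made for illustration, not a documented detail of the analysis.

```python
CATEGORIES = (
    "gesture", "grunt", "word",            # single-element events
    "gesture+word", "gesture+grunt",       # combined events
    "gesture+vocalization",
)


def total_events(month_counts):
    """Sum the six event categories for one child in one month."""
    return sum(month_counts.get(category, 0) for category in CATEGORIES)


def pooled_series(child_counts, onset_month):
    """Return (month, total events, after-onset indicator) rows for one child.

    Assumption: the indicator is 1 from the onset month onward and 0 before it.
    """
    return [
        (month, total_events(child_counts[month]), int(month >= onset_month))
        for month in sorted(child_counts)
    ]


# Invented counts for one child over three months (not study data).
example_child = {
    13: {"gesture": 3, "word": 1},
    14: {"gesture": 5, "grunt": 2, "gesture+vocalization": 4},
    15: {"gesture": 6, "grunt": 3, "word": 4, "gesture+word": 2},
}
for row in pooled_series(example_child, onset_month=14):
    print(row)
```

Each row pairs a month with the child's total communicative events and a 0/1 marker for the punctuation point, which is the layout a longitudinal model of total events over time would need.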
A two-level linear model was built to predict the effect of the punctuation point (Communicative Grunt Onset) and Time on total communicative events. Participants are treated as a cluster and the effect of Time is random at the second level. The results of the level 1 model are shown in Table 2. The effect of the punctuation point is statistically significant (p = .039), which means the punctuation point (communicative grunt onset) is associated with the increase in total communicative events across children as a group. Inspection of Figure 2 shows the shift in frequency of communicative events following the onset of grunt communication for each participant, either in the month we record as onset or the following session. We will address the issue of the timing of these changes in the discussion.
Table 2. Effect of punctuation point on communicative event frequency.
SE: standard error.
Significant findings.

Figure 2. Association of Communicative Grunt Onset and Total Communicative Event Frequency in Individual Children. The Punctuation Point for Each Participant Is Marked With an Asterisk.
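For readers who want the structure of the two-level model in symbols, one plausible specification is sketched below. The notation is ours rather than the original analysis specification: Y is total communicative events for child i at month t, and Onset is a 0/1 indicator for sessions at or after communicative grunt onset. The text states only that participants form the clusters and that the effect of Time is random at level 2; the random intercept term is an added assumption.

```latex
% Level 1 (months within child)
Y_{ti} = \pi_{0i} + \pi_{1i}\,\mathrm{Time}_{ti} + \pi_{2}\,\mathrm{Onset}_{ti} + e_{ti},
\qquad e_{ti} \sim N(0, \sigma^{2})

% Level 2 (children)
\pi_{0i} = \beta_{00} + r_{0i} \quad \text{(random intercept: an assumption)}
\pi_{1i} = \beta_{10} + r_{1i} \quad \text{(random effect of Time, as stated in the text)}
```

Under this reading, the reported p = .039 would correspond to the test of the fixed coefficient on the onset indicator.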
Individual differences in use of gestural, vocal, and combined communicative bids
Individual data comparing the children’s development of the six formats of communicative events reveal a complex picture with extreme frequency and stylistic variation across participants including the communicative formats they favored (Figure 3(a)–(e)). Individual histograms are arranged in order of frequency of communication. Vihman and McCune (1994) identified child words as either context-limited or referentially flexible (referential words). Context-limited words do not generalize from their typical context of use, while referential words do (although children’s extension of the words may not match the range typical of adult usage). Words produced in the early months of the study were context-limited (e.g. animal sounds while pointing in a picture book; hi and bye as greetings). Communicative grunts were first observed between 12 and 16 months, and for each child they emerged later than either context-limited words or gestures.

Figure 3. Frequencies of Forms of Communication for Each Participant Across Time: (a) Alice, (b) Aurie, (c) Rick, (d) Nenni, and (e) Danny.
Individual differences are evident in both volubility and frequency of various vocal forms. While grunt communication is more frequent for Alice than the other participants, this appears to be due to her overall high volubility. When grunt communication is considered as a proportion of total communicative events, Alice, Aurie, and Rick are fairly similar, while Danny’s proportion of grunts was higher and Nenni’s was much lower. Across all months, the proportions of total communicative events that included a grunt were as follows: Alice = .183; Aurie = .183; Rick = .177; Nenni = .10; Danny = .283.
Alice accompanied approximately 75% of her gestures with vocal signals. She exhibited words alone, gestures alone, and gestures accompanied by vocalization from the first month of observation, but rarely combined gestures with words until 14 months of age, 1 month following her grunt punctuation point. Words alone or with gestures became her modality of choice. Grunts were typically produced without a gesture.
Aurie produced a few gestures in early sessions, sometimes accompanied by vocalization, beginning words only at 13 months, her grunt punctuation point. She accompanied approximately 75% of her gestures with vocal signals across the study period: 50% of her communicative grunts accompanied a gesture. Her increase in communication following communicative grunt onset included all categories.
Rick produced few gestures before 13 months, when only one of nine (11%) was accompanied by a vocalization. At 14 months, his communicative grunt onset month, he produced 22 gestures of which 19 (86%) were accompanied by a non-word vocalization (17) or a communicative grunt (2), but no words. He produced fewer gestures in the final months (7 at 15 months and 11 at 16 months), of which approximately 50% were accompanied by vocal signals, as words became more prominent.
Nenni is remarkable both for relatively low productivity compared to the other children and for an extreme reliance on gesture. Vocalization accompanied gestures approximately 25% of the time until 14 months (her communicative grunt month), when the percentage rose to 50%, continuing at that level in months 15 and 16. She produced very few words by 16 months, and emphasized silent communication by gesture. Increases following the punctuation point are attributable to gesture alone or with vocalization, and grunt alone.
Danny communicated infrequently (0–7 gestures per month, mean = 3; and no words) until the final session at 16 months. This was his communicative grunt session, in which he produced 25 gestures, 19 of them (76%) accompanied by a word or vocalization, seven words alone, and 21 communicative grunts, none accompanied by gestures.
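The percentages of vocally accompanied gestures quoted above for Rick and Danny can be rechecked from the counts given in the text. The following is a minimal sketch of that arithmetic; the function name, variable names, and period labels are illustrative assumptions, and only the counts themselves come from the descriptions above.

```python
# Counts as reported in the text: (child, period, gestures, gestures with vocal accompaniment).
# The period labels are illustrative strings, not variables from the study.
reported = [
    ("Rick", "before 13 months", 9, 1),   # "one of nine (11%)"
    ("Rick", "14 months", 22, 19),        # "22 gestures of which 19 (86%)"
    ("Danny", "16 months", 25, 19),       # "25 gestures, 19 of them (76%)"
]


def percent_accompanied(total, accompanied):
    """Share of gestures with vocal accompaniment, as a whole-number percentage."""
    return round(100 * accompanied / total)


percentages = {
    (child, period): percent_accompanied(total, accompanied)
    for child, period, total, accompanied in reported
}

for (child, period), pct in percentages.items():
    print(f"{child}, {period}: {pct}%")
```

The rounded values agree with the percentages stated in the text (11%, 86%, and 76%).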
Form and function of gestures
To evaluate changes in frequency of pointing and extending gestures across time we conducted two Friedman tests. The effect of time was not statistically significant for the pointing gesture (pointing:
We next evaluated the association of gesture form with gesture function across children as a group. The Cochran–Mantel–Haenszel test (Cochran, 1954) takes account of data from several individuals while examining the association of two variables. Proportion of interaction versus request in relation to point versus extend does differ significantly taking the five children’s performance into account (
Chi-square tabulation.
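To illustrate how a Cochran–Mantel–Haenszel analysis pools evidence across children, the sketch below computes the standard one-degree-of-freedom statistic from one 2 × 2 table per child (gesture form by gesture function). It is a minimal illustration, not the analysis code used in this study: the function name, the optional continuity correction, and all counts in hypothetical_tables are assumptions made for demonstration and are not the study data.

```python
import math


def cmh_test(tables, correction=False):
    """Cochran-Mantel-Haenszel chi-square (1 df) for a list of 2x2 tables.

    Each table is ((a, b), (c, d)): rows = gesture form (point, extend),
    columns = gesture function (interaction, request), one table per child.
    """
    diff_sum = 0.0  # sum over children of observed minus expected count in cell a
    var_sum = 0.0   # sum over children of the hypergeometric variance of cell a
    for (a, b), (c, d) in tables:
        n = a + b + c + d
        if n < 2:
            continue  # a table with fewer than two events carries no information
        row1, row2 = a + b, c + d
        col1, col2 = a + c, b + d
        diff_sum += a - row1 * col1 / n
        var_sum += row1 * row2 * col1 * col2 / (n * n * (n - 1))
    if var_sum == 0:
        raise ValueError("no table varies in both form and function")
    numerator = abs(diff_sum) - (0.5 if correction else 0.0)
    statistic = max(numerator, 0.0) ** 2 / var_sum
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# HYPOTHETICAL counts for five children, for illustration only (not the study data).
hypothetical_tables = [
    ((12, 3), (30, 4)),
    ((8, 1), (20, 6)),
    ((5, 0), (14, 2)),
    ((6, 2), (18, 1)),
    ((15, 4), (9, 0)),
]

statistic, p_value = cmh_test(hypothetical_tables)
print(f"CMH chi-square = {statistic:.3f}, p = {p_value:.3f}")
```

Because each child contributes a separate table, differences among children in overall gesture frequency do not by themselves create an apparent association between form and function.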
Consideration of form and function of gestures again reveals stark individual differences, including the initial gesture observed: point first for Rick, Danny, and Nenni; extend first for Aurie; both in the same session for Alice. Alice, Aurie, and Nenni produced both point and extend consistently across observation months. For Rick, point was early but sporadic throughout, and extend began only at 12 months, continuing in use through 16 months, while for Danny point was fairly continuous, and there was only one occurrence of extend prior to 16 months, when he produced both gestures frequently. In summary, Alice, Aurie, Rick, and Nenni produced substantially more extend gestures than points, while Danny favored pointing. With respect to function, interactive bids emerged earlier in development than requests for all five participants, and all favored interactive bids over requests, with the percentage of requests ranging from 7% to 23% (Figure 4). Rick and Nenni requested with less than 10% of their gestures, while Alice, Aurie, and Danny requested with approximately 20% of their gestures.

Forms and Functions. Extend Gestures (in Red) Were Favored Over Points (in Blue) for Four of Five Participants. Interactive Functions (Lighter Shades) Were Favored Over Requests (Darker Shade): (a) Alice, (b) Aurie, (c) Rick, (d) Nenni, and (e) Danny.
The participants also differed in the preferred form for the request function. Alice used both forms approximately equally for request. Aurie, Nenni, and Rick used extend gestures for all requests, and Danny, who produced few requests, used only point. Figure 4 is a histogram showing frequencies of forms and associated functions for each child across all months of the study.
A dynamic system of developmental variables
Grunt communication can be considered as an element within a dynamic system of variables that interact in children’s transition to referential language (McCune, 1992, 2008; McCune & Zlatev, 2015). Table 4 displays the age at which the pointing and extending gestures were first identified for each participant along with the month of achievement of other dynamic variables influencing the transition to reference. The current study shows that onset of communicative grunts was followed by a sharp increase in communication for all five participants when both gesture and vocal modalities are included. Considered as a dynamic system, the variables in Table 4 have roles to play in the transition to referential word production and comprehension. When the children began communicative grunts, they also produced representational play (single acts, and for most participants, symbolic combinations), indicating a capacity for mental representation perhaps not previously integrated with communicative goals. The children communicated with gestures of pointing and extending objects for at least 2 months prior to referential word onset, showing an established route to intersubjectivity (McCune & Zlatev, 2015). Prior research (e.g. McCune & Vihman, 2001; McGillion et al., 2017) has demonstrated the importance of VMS (productive consonant use) for lexical development.
Acquisition month for dynamic variables.
The variables are ordered by the most common sequences of development exhibited between them. Parentheses indicate that reference was achieved in comprehension only. Children did not all demonstrate the variables in the same order, so elements in the columns are not perfectly ordered.
Danny and Nenni failed to achieve the VMS milestone and did not produce referential words by the end of the study. This lack should limit their ability to produce words (McCune & Vihman, 2001; McGillion et al., 2017). Both showed lower communicative productivity and tended to favor gestural communication. Interestingly, in the month of achieving communicative grunts, each of these participants developed a stable vocalization (perhaps best termed a protoword) that they used to engage their mothers in interaction (for Danny, ada, and for Nenni, hah). This suggests that, although they had limited phonetic resources for learning the words of English, they recognized the relevance of a stable vocal pattern, and developed a consistent production that their resources would allow. In Figures 2 and 3(d), Nenni’s frequency of communicative events seems to fall off in month 16 after showing the expected spurt at month 15 when she first produced communicative grunts. If her hah vocalization is considered, her frequency of communication at 16 months equals that at 15 months.
Discussion
How does use of gestures and grunts prepare the way for word production and understanding? In developing a theoretical model, we find that casting these developments within a dynamic system of variables (Thelen, 1989; Van Geert, 2020) reveals the varied influence of each on the others and their resultant joint effect on the transition to referential word comprehension and production. Pointing is frequently noted as a contributor to language development (e.g. Colonnesi et al., 2010). The influence of other single variables (e.g. representational play, consonant production) has also been addressed, and communicative grunt onset has been shown to predict the transition to reference (McCune et al., 1996). Our results point to the onset of grunt communication as a marker variable for significant effects on communicative development, regardless of modality. Communicative grunt onset occurred at 13 or 14 months for the three more voluble children and at 14 and 16 months, respectively, for the other two. Grunt onset month thus varied, but it was consistently associated with an increase in communication 1 month later for four participants, and in the same month for one. The influence of multiple underlying variables that need to be integrated, in addition to the observed communicative grunt, can account for the 1-month lag in some children. Given the 1-month spacing of observations, these developmental processes were likely initiated between sessions, so the 1-month marker for various transitions is far from exact.
McCune (1992, 2008; McCune & Zlatev, 2015) found that referential word onset in participants studied longitudinally was associated with onset of grunt communication, suggesting a qualitative difference between the earlier and later months of observation. Communicative grunt production may be an important milestone in children’s communicative development, but it has rarely been studied. The use of communicative grunts does not itself influence frequency of communication or onset of referential word production or comprehension. Rather, use of communicative grunts indicates underlying cognitive and communicative developments.
Although order of development of dynamic variables varies, referential word production is found only following gesture, facility with consonant production (VMS), representational play, and grunt communication (Table 4). The theoretical interpretation of these findings is that gestures demonstrate the interest and capacity for communication, and perhaps the infant’s earliest intentional communication, although parents have been responding to cry and other signals from birth. Representational play shows expression of internal experience of representational meaning. VMS production demonstrates the vocal capacity to shape sound production in relation to words experienced from adults. Communicative grunts demonstrate a linkage between a potentially representational mental state (with meaning perhaps underspecified), communicative goals, and a productive signal. Referential word onset demonstrates communication of a meaning codified in the ambient language, although without full adult specification. Thus, the punctuation point in communicative production associated with communicative grunt onset relies on several underlying variables. Where phonetic skill (VMS) is lacking, the acceleration in communication is expressed in the gestural mode. Further study of these developments in a large set of children is needed to determine the validity of this analysis.
The communicative grunt vocalization is derived from earlier involuntary vocalizations marking effort and later non-communicative vocalization accompanying attention. Unlike gestural communication, where the object of interest is typically present in the environment and the motive seems primarily instrumental or social, in the case of communicative grunts, reference is to an internal state of meaning. This proposal presupposes that the child becomes aware of the grunt/mental state relationship accompanying effort and/or attention. Such awareness might increase recognition of the potential for communication using sound as well as eagerness to engage. In a sense, the children’s experience of their own grunts in the context of internal focus may prompt recognition of the relationship of internal meaning to vocal symbol, in this case, a ‘personal symbol’ with fluctuating meaning (Piaget, 1962; Werner & Kaplan, 1963) and thus encourage attention to consistent pairings of vocal symbols (words) with environmental events in adult talk. Once symbolic potential is recognized, the nature of development would suggest that use of communication and of this symbolic potential would rapidly increase, as our observations demonstrate. The participants increased their rate of communication relying on their own favored modalities.
We found individual differences in volubility across vocal modalities, in onset of extending gestures versus point, in the tendency to accompany gestures with a vocal signal, in favored gestural form in general, and in favored form for interactive bids versus requests. Some of this variability may be attributable to the setting (30-minute play session) in the sense that, with many toys available, requests would be less likely than sharing interaction, and extending toys may be favored over pointing. The setting did function well in promoting high-frequency interaction and allowing time for communication across the varied modalities. With respect to gesture function, Leavens (2012) argues that all communicative efforts are ‘instrumental’, citing Bates et al. (1975) to the effect that proto-declarative gestures solicit adult attention, while proto-imperative gestures seek specific assistance. One might also consider interactive bids as developmentally prior, because they attribute only the desire for joint attention, rather than a more specific effect such as assistance with a toy. Leavens points out that different experimental procedures used for eliciting declarative versus imperative pointing may affect results.
Regarding the importance of pointing for word onset and development, studies evaluating pointing in isolation, as reviewed in the introduction, typically find strong contemporaneous and predictive relationships with language. As Donnellan et al. (2020) argue, studying a variable in isolation may mask the role of underlying correlates. When considered as part of a dynamic system, gestures appear earlier than language milestones, but intervening developmental variables seem to play a contributing role as referential word production and/or comprehension was only observed subsequent to vocal productivity (VMS), representational cognition (play), and communicative grunt production.
Laryngeal vocalization was likely a factor in the evolution of communication, as contemporary non-human primates grunt to communicate. Going beyond primates and even the mammalian line, Bass et al. (2008) proposed that vocal circuitry underlying vocalization in all tetrapods emerged from ancestral structures linking the swim bladder in fish with the bird syrinx and the mammalian larynx. They mapped the neural circuitry for vocalization in larval fish and, using taxonomic analysis, found comparative embryonic neural developments across tetrapod taxa. The route from such historically common structures to structure and function in extant species is unknown. Bass et al. relate their findings to Darwin’s (1871) proposal that ‘purposeless sounds . . . if they proved in any way serviceable, might readily be modified and intensified by the preservation of properly adapted variation’ (Bass et al., 2008, p. 420). This same process may characterize the developmental step from effort grunts to communicative grunts across current primate species, including humans.
Non-human primate vocalization has historically been considered involuntary, expressing emotion rather than information, and relatively immutable. This view is currently undergoing revision. Schel et al. (2013) demonstrated experimentally that chimpanzee food grunts in the wild are directed to certain individuals rather than others. Slocombe and Zuberbühler (2005), noting that chimpanzee grunts to food exhibited vocal variants, demonstrated that animals in hearing range used this information in selecting a location to search for a favored food, suggesting the form of grunt has a referential quality. Yet the basis for initial grunt production in mammals is the physiology of respiration management. As with human infants, the vocalization must undergo a shift in function to serve intentional communicative use (Werner & Kaplan, 1963). McCune (1999), based on extant literature, described the sequence of development in grunt use from effort to communication in chimpanzee (Goodall, 1986; Plooij, 1984) and vervet infants (Seyfarth & Cheney, 1986) in the wild, and chimpanzee infants raised in human environments (e.g. Gardner et al., 1989; Hayes, 1951; Savage-Rumbaugh, 1979) within the same dynamic systems framework proposed here. Species differences in cognitive status differentiate the communicative capacity of chimpanzees and monkeys, but their sequence of grunt use and relation to other calls mirrors that of humans.
Strong cross-species functional and structural homologies suggest that the autonomic phase of grunt vocalization has been active in pre-human primates from the earliest period. Effort is inevitably accompanied by this vocalization on some occasions. Recent studies of infant chimpanzee vocal development dovetail with the human findings available so far. Infant chimpanzee grunts, unlike whimpers, occur across positive, negative, and neutral circumstances (Dezecache et al., 2019), where acoustic variation between categories was demonstrated. Laporte and Zuberbühler (2011) traced the changing circumstances of chimpanzee grunts from infancy to adulthood. They did not have acoustic data available, but the use of grunts in varied circumstances over time suggests the potential of shaping toward the multiple distinct grunt forms and functions in adult chimpanzees. Initial productions (0–6 months) accompanied effort or movement. Beginning at 3 months, grunts occurred in response to others’ presence and vocalizations, but vocalization in this context dropped to near zero at 2 years, becoming more frequent at about 10 years. Grunts in response to food began between 15 and 24 months but were also infrequent until 10 years. The authors suggest that the reduction in frequency between 2 and 10 years is attributable to a learning period as the young chimpanzees developed appropriate call forms for varying circumstances. Variation in age of grunt production in comparison to human infants may be attributable to differences across species in motor development and cognitive potential. It may be that chimpanzee grunts in response to food (beginning at 15–24 months) provide the closest analogy to human communicative grunts.
Conclusion and limitations
Grunt communication is part of the repertoire of human infant communication. In addition to the small sample included in this study, there have been anecdotal reports for many decades, as described herein, as well as recent studies of gesture that included some attention to grunt communication. The interpretation of the data given here is, to a large extent, tentative until confirmed on a larger sample. Naturalistic data currently in hand in various laboratories might serve to test our findings. More frequent observation would clarify the roles of specific dynamic variables. Future studies should include acoustic analysis, to facilitate comparison with individual word production patterns, as well as attention to maternal responses to child vocalizations, both lacking in the current report. Findings thus far strongly suggest that pursuit of this line of research will prove fruitful.
Footnotes
Acknowledgements
We thank Yu Wang of Rutgers University for assistance with statistical analyses.
Author contributions
Ethical approval/Patient consent
The research was approved by the Rutgers University Institutional Review Board. Parents of participants completed an approved consent form on behalf of their children. The data are anonymized.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This study was supported by Grant NSF 4-2 0205 BNS 83-19753 from the National Science Foundation and by Grant PHS4-2-22992 PHS HD 11731 from the National Institute of Child Health and Human Development.
