Abstract
Symbolic representation is a central facet of human development that enables people to depict experiences and communicate meaningful information with others. Participation in social interaction relies on graphical symbols, gestures, and symbolic artifacts to form relationships, acquire language, and represent the world. However, substantial theoretical differences between cognitive and social-constructivist accounts of the development of symbolic representation prevent a unified model from forming. Thus, the task of this work is to introduce a theoretical model for symbolic development in early childhood. Throughout this work, Wittgenstein’s picture theory of language is empirically grounded with nascent, cross-cultural research in iconicity and experimental semiotics in order to propose a picture theory of symbolic development in early childhood. Last, this work critically reformulates current debates regarding the representational facets of symbolic development and offers novel insights that bridge philosophy of language and cognitive anthropology.
Symbolic communication is a central facet of human development that enables people to depict concrete experiences and communicate semantically rich information with others. Participation in social interaction relies on graphical symbols, pictures, gestures, and other symbolic artifacts in order to form relationships, acquire language, and form meaningful models of the world. Symbolic communication is enacted through language (words, letters, grammar), pictures (drawings, photographs, film), gestures (pointing, waving), and other physical objects that stand for reality in a representational manner (maps, toys, models).
Contemporary accounts of symbolic development can be broken down into two schools of thought. First, the cognitive approach (e.g., Perner, Carlson & Zelazo, and Nelson; see Callaghan & Corbit, 2015; Carlson & Zelazo, 2008; Perner, 1991) proposes that symbolic representation stems from procedural phases in cognitive development and can only be considered symbolic or representational behavior when children are able to take an intentional metarepresentational stance. Second, the social-constructivist approach (e.g., Tomasello and Callaghan; see Callaghan, 2013; Callaghan & Corbit, 2015; Tomasello, 2001, 2009) proposes that representation is primarily a social behavior that is used to communicate meaningful content about reality in an intentional manner. On this approach, representation is taken to be a “process of constructing meaning through social action” and the adoption of an “inter-active mind” (Callaghan & Corbit, 2015, p. 253).
While both approaches to symbolic development are supported by empirical research (see Callaghan & Corbit, 2015, for discussion), foundational differences between these accounts prevent a unified theoretical model of symbolic development in early childhood. Thus, this work offers a new theoretical account of symbolic development that attempts to bridge the differences between the two dominant contemporary approaches. By focusing on the development of symbolic expression in young children’s iconic production of drawings, I will support the twofold argument that (1) children’s use and production of symbols expressively depicts reality, regardless of whether a child takes a metarepresentational stance, so that any symbolic act can be considered semantically rich, and (2) symbolic expression is an act of depiction that is shaped by the objects it communicates. This argument will be theoretically informed by a modified form of Wittgenstein’s (1921/2001) picture theory of language (PT) and supported by recent empirical research in symbolic cognition and iconicity.
This work departs from dominant contemporary approaches to symbolic development by arguing that children require neither metarepresentational awareness nor reflexive sociocognitive capacities in order to make meaningful, communicative depictions of reality. Where past work suggests that a child becomes symbolic once they have sociocognitively developed the capacity to reflect on their own representations, I propose the stronger claim that the use and production of pictorial symbols is already semantic in that such symbols depict real states of affairs. Hence, the thesis underscoring this argument is that mature language and pictorial symbolic expression differ in form and sophistication, but not in semantic content or communicative use. Particularly as recent empirical work provides cross-cultural evidence that linguistic utterances are at least partially shaped by the physical objects they represent, providing a salient theoretical framework is vital for progressing nascent debates in experimental semiotics, developmental psychology, cognitive anthropology, and pedagogical science (Long et al., 2021; Morin et al., 2022).
This work takes the following structure: first, I critically examine the dominant accounts of symbolic development; next, I discuss the resolute reading of Wittgenstein and offer a resolute account of the PT; finally, I elaborate a picture theory of symbolic development (PT-Sym) and ground my argumentation in recent empirical findings.
Dominant accounts of symbolic development
The following section will detail dominant contemporary approaches to symbolic development in early childhood. The general facets of both accounts will be outlined in order to exhibit the theoretical gap between them. This will include discussing cognitive and social-constructivist accounts of symbolic development. While there are other accounts of symbolic development, I take it to be the case that these two approaches have been central to the majority of experimental research investigating the phenomenon. Thus, the aim of this section is to critically clarify dominant paradigms related to the origins of symbolic representation in order to show how the PT-Sym stands as a viable alternative.
The theory–theory approach to symbolic development
In general, both contemporary accounts of symbolic representation fall in line with the theory–theory approach to cognitive development (Leslie, 2000). On this account, children are understood to acquire modular theories about the world that enable them to form representations of reality and allow them to take a metarepresentational perspective on their own representations. Knowledge, and the capacity for abstract representational knowledge in particular, is framed as a sequential set of theory-like updates that a child grasps throughout development (Callaghan, 2013; Leslie, 2000; Perner, 1988, 1991). Such accounts are grounded upon the supposition that cognitive development is associated with the acquisition of concepts as “packets of theory-like knowledge” (Leslie, 2000, p. 197). On this account, conceptual knowledge is framed as various modular theories about reality that children acquire in a procedural manner, depending on the stage and sophistication of their development. In effect, the ways in which children come to understand and represent the world are contingent upon stratified courses of development that enable them to update their ability to reason about reality.
Implicit to this account is the assumption that not all knowledge is rooted in sense data. For example, knowledge regarding abstract phenomena like belief, trust, and love is not necessarily based on perceptually available objects. Instead, such conceptual knowledge springs from children’s changing ability to reason about reality as a mind-dependent entity. In other words, theory-like knowledge is itself taken to be a theory of the world, wherein the acquisition of abstract knowledge depends on a child understanding that their knowledge is itself abstract.
A consequence of this account is that it erroneously splits representational knowledge into two forms. First, there are concepts that explicitly correlate to sense data (e.g., the concept DOG corresponds to dogs; the concept HOUSE corresponds to houses). Second, there are concepts that do not explicitly correspond to sense data (e.g., the concepts SPIRIT, LOVE, and TRUST); these have no empirical correlates in the world and thus must be acquired as individual, mind-dependent theoretical modules (Callaghan & Corbit, 2015; Callaghan et al., 2012; Leslie, 2000). I refer to the former category as correlated concepts and the latter as noncorrelated concepts.
In the case of correlated concepts, the capacity to make symbolic representations is essentially an act of empirical tracing. Children have examples of correlated concepts that allow them to make representational depictions. Put simply, a child grasps sets of physical sense data and thereby simultaneously grasps their physical symbolic structure, independent of whether the child actually understands what the sense data are. If a child understands what the object DOG visually corresponds to, and if they are able to make a primary expression (drawing, gesturing, uttering, etc.), then they are capable of representing DOG-like objects.
In the case of noncorrelated concepts, the capacity to make representations is more opaque. If, according to this position, there is an absence of perceptual examples that constitute the structure of noncorrelated concepts, then how do children acquire the ability to represent them? On theory–theory accounts, children develop the social-cognitive ability to take a metarepresentational stance towards theoretical concepts, which are initially represented to the child as a theory. Put differently, the primary representation of noncorrelated concepts is itself theoretical. Hence, symbolic representations of noncorrelated concepts do not necessarily refer to anything but a cognitively acquired, socially conditioned, mind-dependent theory (Leslie, 2000; White et al., 2016).
This account falls short on two fronts: first, it does not account for representational similarities and differences between noncorrelated concepts across culture and age groups; second, it lacks detailed information regarding the source of representations of noncorrelated concepts by attributing representational acts to cognitive developmental milestones. These two shortcomings can be formulated as follows: (a) the similarities and differences between representations of concepts in early childhood across individuals and cultures remain unaccounted for and (b) the source for the representation of abstract concepts is missing.
The result of these dilemmas is that dominant theories are required to rely on post hoc explanations in order to address both points. Additionally, the symbolic representation of noncorrelated concepts is explained as a cognitive–reflective or social ability that children acquire after experience. I will now illustrate how both cognitive and social-constructivist approaches to the development of symbolic representation have attempted to address these shortcomings but remain insufficient on their own.
The cognitive account of symbolic development
The cognitive account of symbolic development assumes that children obtain representational capacities in a procedural manner throughout early childhood in accordance with their neuro-cognitive development. In this sense, it is primarily focused on investigating individual processes of early childhood and particular capabilities for language acquisition. Influential among cognitive accounts of symbolic representation is Perner’s multiple-models system of representational development (Callaghan & Corbit, 2015; Perner, 1991). In this model, there is a threefold process of representational development:
1. Children are born with an innate capacity for perceptual cognition and are able to form basic perceptual reflections of their environment (e.g., infant facial imitation, culturally specific prosodic intonation).
2. Children begin to form rudimentary symbolic representations (e.g., via pretense and play) around 1 to 1.5 years of age, but such representations are not yet taken to be symbolic.
3. Around 4 years of age, children become representational by acquiring the capacity to take a meta-representational stance.
Point 1 constitutes a single-model of perceptual representation, and points 2 and 3 constitute the acquisition of a multiple-model system of symbolic representation. Perner (1991) states, “Fairly early in life—if not at birth—babies entertain a single model of the world. Then, around 1 to 1½ years, they begin constructing multiple models” (p. 43). In the single-model system, infants are able to form mere perceptual representations that are nonsymbolic; such representations are better framed as mimetic capacities (e.g., facial imitations and a rudimentary understanding of object permanence). The single-model system develops in a cumulative manner, depending on the perceptual stimuli available in a child’s environment.
In the multiple-model system, children cognitively switch from using a single-model to a multiple-model of reality, and begin to develop the cognitive capacity to understand complex relationships between objects and internal states of themselves and others. This includes spatial–temporal cognition, a developed understanding of object permanence and location change, theory of mind, and eventually the capacity to take a metarepresentational stance (Perner, 1991). On Perner’s account, the multiple-model system replaces the single-model system; this replacement is attributed to the development of the cognitive complexity required to understand abstract representational relations between entities. This cognitive transition is primarily supported by dated experimental evidence from research exploring children’s ability to understand spatial–temporal change as well as their increased ability to use language in a representational manner. For Perner, the multiple-model system is the developmental foundation for how one comes to understand the mind of another through representing abstract representational relations later in development.
The cognitive account does not address the previously mentioned shortcoming (a); while there is an acknowledgment of sociocultural factors impacting language use and acquisition, this account prioritizes individual cognitive development as the primary source of children’s representational use of language and symbols. Thus, this account does not find it necessary to explain individual or cultural representational similarities and differences beyond chance or mimetic social learning. Accordingly, little weight is attached to shortcoming (b), as the source for representing noncorrelated concepts is framed as a cognitive achievement that occurs universally in child development.
Problems with the multiple-models account of symbolic development
The primary issues with Perner’s (1991) multiple-model account of representational development are evident in the majority of cognitive accounts of symbolic development. The throughline problem of these accounts is that representation takes on a contradictory character between the external world which is represented and a child’s cognitive ability to represent it. There is a distinct separation between the world as experienced through representation and the source of representation itself.
First, cognitive accounts rely on breaking symbolic representation from reality, in that representational acts are taken to be individual cognitive achievements. Central here is the notion that children cannot be representational until they become metarepresentational. However, this metarepresentational contingency only succeeds in splitting off the mental life of humans from the material reality of the world.
In spite of Perner’s attempt to look beyond individual experience apropos of representational metacognition, such accounts rely on an inconsistent interplay between cumulative first-hand experience and distinct cognitive milestones that are assumed to be innate abilities in humans. For example, children utilize first-hand sense data to form perceptual reflections of their surroundings (perspective-taking, introspection, social interaction) and such sense data form the basis of becoming representational in early childhood; on the other hand, children are supposedly unable to become representational through sense data alone. Only after children acquire multiple-models and the accompanying metarepresentational stance are they able to form symbolic representations (including basic linguistic representations) of what they perceive. The problem is that the source (sense data) and the act (symbolic representation) are severed without explanation or empirical justification; thus, the force behind developing representational capacities remains unaccounted for.
This amplifies the contradictory character of cognitive accounts because the source for symbolic representation itself—if it is truly cognitive and thus broken from sense data—has no concrete source beyond individual development. But this cannot be the case, which is the reason that Perner (1991) proposes a multiple-model system in the first place. Yet, it is supposedly this same multiple-model system that enables representation to become “freed from reality” (p. 45). So, there is an unexplained gap on this account between that which becomes represented and the act of representing.
It results from this that the cognitive account of representational development does not account for the previously mentioned shortcoming (b) insofar as it attributes higher level representations of noncorrelated concepts to something internally evoked within an individual child. But surely, noncorrelated concepts have an empirical or perceptual source. Yet the multiple-model systems approach overlooks the object-status of social interaction; put differently, it mistakenly attributes socially acquired notions like happiness and love to cognitive–intellectual development, and thereby removes the objectively perceptual nature of such concepts. For even though happiness is not an object per se, it is nevertheless perceptible as such and is therefore observable by children. In its most basic form, a noncorrelated concept is a perceptually available object, even if there is no physical entity attached to it. As such, it is demonstrative and can be evoked through primary expression (e.g., crying from pain). Thus, noncorrelated concepts can be expressively depicted because they are perceptual. However, the only thing that cognitive accounts of symbolic development attribute to increased representational capacity of noncorrelated concepts is a series of “updates.” To reiterate, such accounts do not illustrate the source of information that the update utilizes. This establishes an impasse in that mere updates cannot explain representational development without explaining where the represented thing comes from. This contradiction can be framed as follows: if I have an empty bucket and fill it with water, the bucket does not simply acquire an update; it is in fact filled with water and undergoes a qualitative change based on external factors interacting with the internal factors of the bucket (i.e., its dimensions). Similarly, if a child becomes able to symbolically represent a past instance, they have necessarily experienced the past that they represent.
The contradictory quality of Perner’s (1991) account becomes exacerbated once the affinity between mind, language (linguistic representation), and pictures (pictorial representation) is negated. Of note, such negation similarly forms the basis of Carlson and Zelazo’s (2008) levels of consciousness model. Perner (1991) states (and Carlson & Zelazo follow) that “language and pictures cannot be used like perception as an uncritical guide for updat[ing] one’s model of reality” (p. 70); but he also states that language can be used to “inform about reality” and to “describe a possibility” (p. 70). The problem here is that Perner denies the fact that linguistic and symbolic representations are always necessarily grounded on real objects. It is perhaps telling that he only uses anecdotal thought experiments from a family member who is “incorrect or misinforming” about a state of affairs to justify this negation. Nevertheless, Perner makes the strong, unsupported claim that, because language and pictures can be misinforming, one requires a multiple-model system to understand if claims are truly representational. Again, the basis of linguistic and symbolic representation is missing here.
It is not that the misrepresentation of situations renders sense and referent as inherently different. Instead, a misrepresentation can be framed as a false proposition about (or depiction of) reality. In any case, it is real and always depicts reality because it correlates to real states of affairs. There can be no pure distinction between sense and referent, especially on the semantic plane. There is no false belief at play here: there can only be incorrect propositions. Instead, what I will argue for below is to frame symbolic communication as being a two-way act, wherein a child is considered to make meaningful expressive depictions that become propositional through being taken up in a given language game.
Moreover, cognitive accounts of representational development try to demonstrate how multiple-models or levels of cognition are necessary by citing people’s ability to symbolically depict nonreal situations as hypotheticals. However, such accounts again overlook the fact that the actual source of representational acts can only be found in perceptually available entities or situations. In other words, people are supposed to be able to create linguistic and symbolic representations of nonreal events purely by thinking them, that is, out of thin air. By overlooking the real perceptual foundation of all symbolic expression, these accounts fail to address both shortcomings (a) and (b) of the theory–theory approach to representational development.
Perner (1991) further states, “Sensory information is about reality. It informs the perceived how the real world is. Unlike language and pictures, which depict objects in past or hypothetical situations, perception specifies reality here and now” (p. 92). But this claim misses the fact that language and pictures are also used to depict present reality. One only needs to augment their definition of perception to understand this. For example, a person experiencing blindness makes use of symbolic representations (e.g., via Braille, sight-assisting walking sticks, etc.) in order to perceive the present. Such symbolic artifacts are immediately informative of a real, present situation. Similarly, I can describe a present situation to someone on the telephone. While my description is technically in the past, it is only past insofar as perception itself occurs through microsecond delays. Perner here egregiously creates conditions for perception that are neither empirically justified nor objectively true.
Instead of emphasizing the distinction between the real and the nonreal, a more rounded account of symbolic development would prioritize truth judgments that stem from depicting acts. In representing, we assert that things are the case, regardless of our ability to be metarepresentational (see Kimhi, 2018; Sellars, 1968). In other words, there can be no prerepresentationality, only a refined ability to understand the flexibility of depicting reality based on the structure of a given language game.
The usage-based model of symbolic development
The usage-based model of language acquisition is a social-constructivist approach to symbolic development. It states that meaningful language develops through an individual’s social interactions once they have gained certain cognitive skills (e.g., analogical reasoning, subject–object differentiation, predictive capacities; Callaghan & Corbit, 2015; Homer & Hayward, 2008; Tomasello, 2001, 2007, 2009). Tomasello, building on the philosophy of Peirce and de Saussure, argues that the semantic structure of language emerges through an ongoing process of shared, intentional, communicative acts between individuals. As in the cognitive approach, symbolic representation is framed as a social–cognitive skill obtained from the ability to reason about the intentional states of others, and is grounded on a socially learned, patterned structure of grammar. On this account, there are three primary developmental processes that occur in order for a child to become symbolically representational, as Callaghan and Corbit (2015) state:
Children apply intention-reading proclivities to the linguistic interactions they have with others, attempting to understand the communicative significance of an utterance.
They also muster a variety of cognitive skills . . . that serve to enable abstraction of the regular and more irregular rules from the use of language.
Children’s acquisition of language is facilitated by a supportive social context. (p. 256)
These processes have been shown in experimental contexts to obtain across cultures. This has enabled social-constructivist approaches to make the strong claim that the semantic status that a language is grounded upon stems from a shared context with others, and that a linguistic expression (symbolic or otherwise) is in itself meaningless (Tomasello, 2007, 2009).
With regard to preliterate linguistic development, the social-constructivist account emphasizes young children’s use of gestures to form joint-attentional frameworks with “knowledgeable others” in order to communicate when mature language is otherwise unavailable. In this sense, even elementary symbolic representations are contingent on joint attention in order to be semantic. For example, if a preverbal child were to point at a glass of water, a knowledgeable caregiver may take that to mean that the child is symbolically representing (in a gestural way) their experience of thirst and desire for satisfying it. The act of gesturing then can only be meaningful insofar as there is another person ready and willing to form a joint-attentional framework with the gesturing child. The usage-based model takes a similar stance on children’s drawings, preverbal utterances and sounds, and the use of objects to communicate.
In sum, symbolic representations of prelinguistic children are only considered to be half-semantic: without a more knowledgeable other engaging in a joint-attentional framework, a child’s symbolic expressions remain meaningless. Once a knowledgeable other recognizes what a child attempts to communicate, the loop can close and the semantic structure can obtain. Prelinguistic utterances and gestures (referred to here as primary expressions) are taken to be composite structures of language that become refined and developed through both cognitive development and socialization. Hence, for the usage-based model of language acquisition, meaning is constructed exclusively through an individual’s active use of language with others.
By prioritizing a social contingency for symbolic development, the usage-based model successfully addresses the previously mentioned shortcoming (a) from the theory–theory account above. For if meaningful linguistic representation is a culturally contingent aspect of child development, then the similarities and differences between representations of noncorrelated concepts in early childhood across individuals and cultures can be easily explained by social learning (imitation, mimicry, emulation) and enculturation (cognitively developing in accordance with the convention of one’s culture). This fits squarely within the PT-Sym account, discussed below. However, the usage-based model has additional shortcomings that must be addressed.
Problems with the usage-based model of symbolic development
While language is unquestionably a socially contingent aspect of human life, I argue that making judgments about the meaningfulness of a child’s primary depictions is problematic. The main shortcoming of the social-constructivist account is the direct inverse of that of the cognitive account: by prioritizing social interaction as the foundation of semantic richness in symbolic representation, it overlooks the importance of physical sense data in constructing meaning. In other words, the usage-based model, in prioritizing the social structure of language above all else, forgoes the immediacy of that which is represented. Or, in Sellars’ (1968) terms, it overlooks the fact that the symbolic expression is at the very least analogous to what is expressed and thus fails to recognize that the kernel of its meaning is already present in the expression.
Using the example above, for a child to gesture toward a glass of water in order to communicate thirst to a more knowledgeable other, the child must first understand that their pointing expresses both thirst and the real object that would satisfy it. The gesture is itself already meaningful: it demonstrates that the child comprehends something semantically essential about both the glass and their gesturing toward it. Whether the more knowledgeable other recognizes or is willing to form a joint-attentional framework with the child is therefore beside the point. While it is true that the joint-attentional framework would provide the meaningful satisfaction of thirst, thereby facilitating similar future uses of communicative gesturing, to state that the gesture is in itself meaningless cannot be correct.
The gestural symbolic expressions demonstrate and depict a meaningful understanding of spatial directionality. The child must already comprehend that the glass of water is located there, which is, in my view, the foundation of its communicative form. It is this directionality, in conjunction with the experience of thirst and the awareness that the caregiver can assist the child, that becomes modeled in the child’s gesture. Accordingly, to model or represent directionality, thirst, and the satisfaction of thirst, is only contingent upon spatial knowledge, proprioception, and the self-awareness of needs and is not fundamentally dependent on creating joint-attentional frameworks with others. To take the example to its extreme point, if the child were alone and experiencing severe dehydration, whether or not another person was around to acknowledge their gesture in a communicative framework, one could not possibly discount the semantic richness of the gestural motions that the child might make as an attempt to satisfy their thirst. Put simply, the principal weakness of the usage-based model is that it reasons from the assumption that there can be no meaning without social interaction.
In requiring joint-attentionality for meaningful language acquisition, social-constructivist accounts bypass the foundation of communicative action; namely, that a child must already possess the capacity to depict objects and inner episodes as pictures, gestures, and/or sounds in order to form a joint-attentional framework. While I agree with Tomasello (2007, 2009) that joint-attention facilitates language acquisition and sophisticated world representation, I disagree with the primacy that his account gives it. Instead, I will argue that the primacy of the communicative act is that it can already depict reality independently of joint-attention. As the usage-based model takes it to be the case that language structure is grounded in language use, I will argue that language structure emerges from depicting real, physical objects, and later becomes refined through the linguistic structuring of social relationships, according to a given language game.
Another shortcoming of the usage-based model is that it does not account for the possibility of representing noncorrelated concepts (e.g., the gesturing done to represent thirst and the satisfaction of thirst) without social interaction. Again, for the usage-based model, there could be no meaning to a representation without joint attention. But how does this work for representing, for example, inner episodes? Tomasello does not clarify this point. So, while the usage-based model offers a robust explanation for pragmatic linguistic development, it remains silent on the symbolic representation of noncorrelated concepts. This creates an impasse for such an account, as it is evident that children use drawings, models, sounds, and gestures to represent emotions, perspectives, judgments, and even concepts about the mental states of others (Coates & Coates, 2006; Long et al., 2021; Longobardi et al., 2015). The dilemma for the usage-based account stems from its reliance on a dichotomous perspective of meaning: either symbolic representations are social and thus meaningful or are simply meaningless behaviors that represent nothing.
The solution to this impasse can be found in removing the social contingency for semantic representation by acknowledging that there is something meaningful and communicative, no matter how underdeveloped or rudimentary it may be, that is inherent to symbolic expression. There would need to be the concession that meaning is built into the structure of representation in order for that representation to become shared with another in an intentional, joint-attentional circumstance. Said differently, the meaning is in the object or concept depicted and becomes communicated through social interactions.
The final weakness of social-constructivist accounts (in general) is that their empirical work focuses on individuals and individual states of mind. So, if meaningful language acquisition is supposed to emerge exclusively through intentional social interactions throughout development but the research making such a claim does not measure real social interactions, then this claim lacks definitive explanations for the development of symbolic representation in early childhood. Thus, social-constructivist theorists have no basis to even claim an “essentially social nature of symbolic representation” because these approaches “do not examine the social activity itself” (Callaghan & Corbit, 2015, p. 258).
Thus, the usage-based model cannot successfully address the previously mentioned shortcoming (b) of the theory–theory account. There is no account of where theoretical concepts originate within culture, and therefore any attribution of only culture to noncorrelated concepts is erroneous. In other words, on the usage-based model, children are supposed to acquire semantic representational ability through social interaction; however, apropos of noncorrelated concepts, there must be an original source for theory, independent of joint-attention. This creates a contradiction for the model: on the one hand, a child is supposed to acquire the ability to meaningfully represent the world through their interaction with others; on the other, there is no explanation for how such meaning originates within culture.
So, the usage-based model offers a partial resolution to the paradox that symbolic expressions must be taken up by others to be communicative, but cannot serve as the ground for all meaningful linguistic representation. Put differently, the usage-based model eschews the real object or concept that is depicted and moves directly to its social use. By neglecting the semantic richness of the expression in its first instance (i.e., without joint attention), the usage-based model mistakenly abandons the capacity for representations to be socially used. I suggest that the evidence supporting the usage-based model should be framed as the experimental demonstration of Wittgensteinian language games: that children use symbolic expressions of real objects in order to develop socially meaningful communication with others.
The picture theory of language
I suggest that the shortcomings in the dominant accounts of symbolic development can be ameliorated by integrating Wittgenstein’s (1921/2001) picture theory of language (PT), put forth in the Tractatus Logico-Philosophicus (TLP). This account of the PT will integrate critical refinements that Wittgenstein (1953/2002) offers in the Philosophical Investigations (PI). This work assumes the resolute reading (elaborated below) of Wittgenstein: that there is continuity and resolution—rather than fissures—in the development of Wittgenstein’s thought and method (Bronzo, 2012; Conant, 2012; Kuusela, 2012).
The PT suggests that language stands for states of affairs as a model or picture of reality. Wittgenstein proposed that it is through the modeling function of logical grammar that atomic facts—the basic component facts about reality—can become represented. While preparing studies for the TLP, Wittgenstein (1921/2001) read about a Paris courtroom wherein a traffic accident was recreated with models in order to depict the events that preceded it (Anscombe, 1959). Serving as a proposition for a real state of affairs, the models symbolically represented a situation that allowed the court to have a clearer sense of the affair. This was a pivotal point for the PT in that it showed how language and its symbolic counterparts can depict real states of affairs (the traffic accident) as well as atomic facts (the particular model cars and streets involved in the accident, arranged in such-and-such a way). Wittgenstein (1914/1979) stated,
In the proposition a world is as it were put together experimentally. (As when in the law-court in Paris a motor-car accident is represented by means of dolls, etc.) We can say straight away: Instead of: this proposition has such and such a sense: this proposition represents such and such a situation. It portrays it logically. Only in this way can the proposition be true or false: It can only agree or disagree with reality by being a picture of a situation. (pp. 7–8, 29 Sept. 1914 & 2 Oct. 1914/1979)
The Paris courtroom led Wittgenstein to consider how propositions depict possible states of affairs, thereby providing the analogical groundwork for the PT (Anscombe, 1959; Morris, 2008; Wittgenstein, 1914/1979).
The PT argues that language about reality is isomorphic with it: the structure of a state of affairs becomes modeled by language and the atomic facts underlying linguistic depictions of the world can be depicted by logical notation; that is, in order for language to represent reality, there must be something in common between linguistic representation and real, represented objects (Magee, 1987/2009; Sellars, 1968; Wittgenstein, 1921/2001). The propositional function of linguistic representation is of central importance: in the TLP, the way of getting at the nature of truth is to build models that show whether what becomes depicted corresponds to real states of affairs. Wittgenstein (1921/2001):
TLP.3.1: In a proposition a thought finds an expression that can be perceived by the senses. (p. 13)
TLP.4.01: The proposition is a picture of reality. The proposition is a model of the reality as we think it is. (p. 23)
That is, if one wants to make a truth-claim about reality, one must be able to depict it. In doing so, the proposition becomes logically verifiable, thus permitting a truth-judgment to be made. For example, if the car accident depicted in the Paris courtroom did not agree with witness accounts or the location of the damage, then the real state of affairs would suggest that the picture was false. But if the situation had never been pictured as such, then verifying it would be untenable. The depiction of a state of affairs allows for the coherence of real components to be judged for veracity.
Consequences of the PT are that language is object-dependent, that its depicting function propositionally pictures things or situations, and that language is only able to propose possible models of reality rather than take part in shaping it (Morris, 2008). Linguistic representation alone cannot shape the logical basis for verifiable truth-judgments: it can only provide the tools through which the truth or falsity of a situation can be judged. For example, statements like “this cube is red” or “it is raining in London” do not make the external situations they depict true or false; instead, such statements are models of states of affairs that can be logically verified or rejected.
In the PI, however, Wittgenstein (1953/2002) posits that the meaningfulness of language occurs in its use within a language-game. A shallow reading of language games suggests that linguistic expressions are empty utterances and that all expressive language is learned. On closer inspection, this is evidently not what Wittgenstein has in mind. Instead, the language game determines the use of the expression, thereby facilitating its shared, joint-attentional sophistication. One who expresses learns how to express in such-and-such a way, but the act of expression is a priori. In other words, learning the use of an expression is to practice a set of behaviors that obey a rule for the (veridical and/or semantical) function of the expression. This is explicit throughout the PI (Wittgenstein, 1953/2002):
PI.180. This is how these words are used. It would be quite misleading, in this last case, for instance, to call the words a “description of a mental state”. —One might rather call them a “signal”; and we judge whether it was rightly employed by what he goes on to do. (p. 62e)
PI.244. Here is one possibility: words are connected with the primitive, the natural, expressions of the sensation and used in their place. A child has hurt himself and he cries; and then adults talk to him and teach him exclamations, and, later, sentences. They teach the child new pain-behaviour. (p. 75e)
That is, the veracity of an expression, whether primary (crying, gesturing, etc.) or sophisticated (propositional statements), is not necessarily determined by its accuracy as a model, but by the ways in which it becomes taken up in everyday language. Morris (2008) states:
Suppose that someone is able to think about the world. On the account of thought which is presented in the Tractatus, this means that [they are] able to form for [themselves] pictures or models of the world. Someone could be in this position without yet having the ability to speak a particular language . . . or to use a certain notation. But [their] ability to form pictures or models for [themselves] can itself be counted as knowledge of a kind of language . . . Our subject’s position is, then, that [they] can operate with a kind of language—which uses what we may call the symbols of thought—while not yet being able to use any familiar, everyday language or notation. (pp. 156–157)
This does not, however, suggest that language no longer depicts. The linguistic expression itself still corresponds to a real object (both correlated and noncorrelated) and evokes a particular sense of it. Whether or not the expression is sensible in this context can only be determined by its use in this context. Wittgenstein functionally expands the PT to account for the fact that people produce inaccurate models of reality and to account for the social nature of meaningful propositional statements. Systematic correspondence to reality still obtains. Sellars (1968) states:
1. IV. 47. . . .the ability to teach a child the colour-shape language game seems to imply the existence of cues which systematically correspond . . . to the colour and shape attribute families, and are also causally connected with combinations of variously coloured and shaped objects in various circumstances of perception. (p. 23)
That is, by engaging in language games with others, propositional statements take the place of primary expressions. So, while a child’s developmentally early symbolic expression may not be propositional, it is still meaningfully expressive: it still depicts something. It is not that children always represent reality in a meaningful way, but that they always meaningfully express reality in a way that can be taken up, taught, and recapitulated through social interaction. The result is that the symbolic act (reaching for a glass of water) is not necessarily representational (of thirst), but is still expressive. What is symbolically expressed is real.
To use the terms of the TLP (Wittgenstein, 1921/2001), what can be shown but not said is a primary expression: it depicts a possible state of affairs but cannot in itself be given a truth-value. However, because it expressively shows, the depiction still has a particular, perhaps ineffable sense, irrespective of the ways it gets taken up in a language game. I suggest that this account demonstrates the relevance of the PT (Wittgenstein, 1953/2002) for symbolic development: precisely because primary expressions can only express, the need for communication with others requires us to form rules for sharing expressions in a semantically rich manner. These rules are taught, learned, and recapitulated. This demonstrates how the propositional statement (I am thirsty) comes to replace the primary expression which depicts (thirst). Hence the interwoven quality of proposition and primary expression. It is just because we cannot identify the veracity of primary expressions (what cannot be said) that we come to use rule-based language games to sensibly describe them in propositional terms. This is an explicitly continuous line of thought throughout Wittgenstein’s writings:
TLP.5.4733: Frege says: Every legitimately constructed proposition must have a sense; and I say: Every possible proposition is legitimately constructed, and if it has no sense this can only be because we have given no meaning to some of its constituent parts. (Wittgenstein, 1921/2001, p. 57)
PI.261. the use of a word stands in need of a justification which everybody understands. (Wittgenstein, 1953/2002, p. 79e)
PI.337.–338. An intention is embedded in its situation, in human customs and institutions . . . After all, one can only say something if one has learned to talk. Therefore, in order to want to say something one must also have mastered a language; and yet it is clear that one can want to speak without speaking. Just as one can want to dance without dancing. And when we think about this, we grasp at the image of dancing, speaking, etc. (Wittgenstein, 1953/2002, p. 92e)
This resolute account of the PT (Wittgenstein, 1921/2001) serves as a model for the development from (primary) symbolic expressions to symbolic propositions (and sophisticated language use). It requires neither metarepresentational capacities nor does it immediately require joint-attention. It only requires, in a fundamental way, something that can be expressed. Joint-attention then completes the expressive act by providing contextual understanding and social use for the expression.
To summarize, the picturing of the world as the first instance of language development is what enables someone to enter into the world of language games. It demonstrates that a person understands possible configurations of states of affairs without having necessarily obtained knowledge of the entire configuration of the picture it represents (Anscombe, 1959; Wittgenstein, 1921/2001). That is, in using a world picture, one understands the components of a picture; whether or not the picture is sensible depends on the language games one plays. So, depicting acts show the possibility of playing a language game. Morris (2008) states, “all one needs in order to understand the meaningfulness of a sentence is to know which items in reality its elements are correlated with” (p. 149). It is the particular language game that one engages in which makes communicating world pictures possible.
The picture theory of symbolic development (PT-Sym)
I propose that symbolic development begins with children’s ability to model the world, regardless of metarepresentational capacity, communicative intention, or joint-attentional framework. On this account, all representational acts (linguistic, pictorial, gestural, and so forth) are models of reality that do not require joint-attention or the ability to reflect on being representational in order to correspond to the world. In this sense, children’s primary expressions (gestures, vocalizations, drawings) precede the meaningful exchange of expressive contents with others and serve as the foundation for participation in a given language game. Framed in this way, representational acts can be understood to have a direct isomorphism with the world, and it is as a result of such isomorphism that more sophisticated language users guide children to learn the rules of meaningful communication. This account argues that humans use sounds, symbols, and gestures to arrange proto-linguistic models of reality and, in doing so, form symbolic tracings of it. Following Wittgenstein, the PT-Sym argues that the world and its semantic structure in language are mind-independent and shape the depictions one constructs of it.
It is necessary to loosen the social and cognitive constraints placed on the development of symbolic representation by dominant theories so that young children’s primary expressions become prioritized as foundational achievements in modeling reality. I suggest that children do not require sets of cognitive updates or repeated exposure to joint-attentional frameworks to be able to form representationally rich expressions of reality. That is, while it is necessary for children to learn the rules of their language game to become successful language users, the meaningfulness of symbolic expression is grounded in the expression itself, rather than in its joint-attentional framework or metarepresentational awareness. Put differently, the primary expressions of young children are always meaningful in their correspondence to the world; only the shared use of an expression determines its role in a language game, which informs the course of ongoing language development. So, this account works from a simplified framework both with regard to symbolic development and the semantic structure of symbolic representation itself. It will be shown that this simplification is empirically justified and is necessary for understanding symbolic development.
Children’s drawings as symbolic models
Children typically begin to regularly make drawings around 3 years of age, before they learn formal written alphabets but after they have a rudimentary verbal command of language (Coates & Coates, 2006). During this time, the majority of children’s drawings consist of exaggerated features, nonsensical shapes or objects, and other visual characteristics that appear to differ greatly from real objects in the world (Coates & Coates, 2006; Longobardi et al., 2015). The obscure content of young children’s drawings is a primary reason why dominant accounts of symbolic representation claim that children cannot be symbolic or representational without a meta-awareness of being so (Callaghan & Corbit, 2015; Homer & Hayward, 2008). Accordingly, such accounts use the refinement and increased accuracy of visual features in older children’s drawings to explain the transition from prerepresentationality to metarepresentationality. This process happens to occur with the development of (written and verbal) language skills. Hence, language is commonly thought to enhance children’s representational drawing ability.
Central to these accounts is a formal distinction between language and drawings, where language is thought to help children make more accurate drawings of the world. I argue that such a distinction is unjustified and prevents a dialectical understanding of different forms of symbolic representation in early childhood. In line with empirical work, I suggest that drawings and formal language are different functions of an identical process of symbolic development, namely, the process of modeling reality.
Past research shows that preliterate children’s drawings are linguistically structured and facilitate the development of verbal description. For example, Coates and Coates (2006) demonstrate that children as early as 3 years of age use spoken narrative while drawing to meaningfully explain what it is they depict, even though the final drawing has minimal resemblance to the narrative that was explained. Similarly, Wright (2007) found that preschool-aged children engage in high-level representational depictions of internal states, real-world occurrences, and physical objects, but that the symbolism of such depictions is found not in the drawings but in the overall act of depiction (spoken analogical narrative, visual observation, identity projection onto the drawing, gesturing while drawing, and using textual features to make meaning). Such research suggests that it is not the final product of a child’s drawing that is meaningful; instead, it is the act of depiction that is. In other words, a preliterate child’s drawing may appear to be nonsensical or noncorrelated with reality, but by attending to the child’s act of depiction (the spoken, embodied, and textual facets that occur during the production process), the meaningful and linguistic structure of a drawing becomes observable. This research supports the PT-Sym argument by demonstrating isomorphism between a preliterate child’s drawing and the depicted world. I argue that the meaningfulness (i.e., correspondence to reality) of such depictions is found in the ensemble of procedural acts that a child produces rather than the final, static drawing. Whether or not the drawing can symbolically enter into a language game occurs after the fact.
If meaning is first located in the depicting act (i.e., is legitimately constructed), then the unexplained rapid development of children’s drawings can be understood as refinement shaped by its possible role in a language game. The sense of the symbolic expression is determined by its being taken up by others and has nothing to do with the possible legitimacy of the expression itself. For example, Coates and Coates (2006) describe an example where a 3-year-old was prompted to draw the researcher who was in the same room; the child’s final drawing (Figure 8 in the original study, p. 237) was a chaotic assemblage of scribbles that bore no resemblance to the person she was prompted to depict. However, in the act of depiction, the child made repeated verbal propositions about the visual characteristics of the researcher: “That’s you, holding a balloon . . . a big head. . . that’s yours” (p. 231).
So, while the drawing in itself had no sense, it is clear that it was explicitly a corresponding model of the child’s reality. Without the drawing, it would not be possible to determine whether or not its sense was legitimate, and without the context of the drawing, it would not be possible to determine its sense. This is how primary expressions precede language games: one can quite naturally imagine agreeing (asserting) with the child’s narration of the drawing while simultaneously disagreeing (negating) with its form. From continued exposure to this type of situation, the primary symbolic expression enters into the realm of propositions (that x is the case) and will be subsequently shaped by the rules that determine propositions.
Wright (2007) describes numerous examples of children making propositions about the world through their acts of depiction. For example, children use narrative and gestures to enact different characters or objects they depict (e.g., barking while drawing a dog, making a grasping gesture while drawing a hand). In doing so, they model the depicted objects: hand-like objects grasp and dog-like objects bark. Wright states that, while drawing, children were “actively within the experience” and that the researcher was only able to understand the representational facets of the drawing by reading the entire experience of its production (p. 42). Longobardi et al. (2015) demonstrated that children as early as 2 years of age use scribbles as onomatopoetic extensions of pointing and grasping gestures; from their findings, they suggest that very young children’s drawings are expressive of real-life processes. This differs greatly from past empirical accounts which state that the scribbles of late infancy are merely the result of nonlinguistic, unrefined motor activity. Taken together, I argue that the acts that accompany children’s primary expressions show the meaningful, legitimate construction of symbolic depictions of reality. That is, young children produce models of reality without needing to acquire metarepresentational capacity or rely on joint-attentional frameworks. Here marks the radical break of the PT-Sym from other accounts of symbolic development: the particular language game which a child learns builds from the primary expression.
Iconicity and symbolic representation
So far, I have focused on young children’s drawings to emphasize the expressive isomorphism of symbolic representation that facilitates primary expressions being taken up in a language game. This has only focused on one half of a two-part structure of the isomorphism between representation and reality. This relation can be expanded to argue that the material structure of physical objects in the world and relations between physical objects greatly impact how people come to make symbolic representations; that is, the contents of a language game itself are partially determined by the objects expressed within it between language users.
Recent large-scale, cross-cultural, and transdisciplinary research exploring iconicity provides robust evidence for the objective and bilateral interplay between objects and their representation. Iconicity is a burgeoning topic in cognitive science and experimental semiotics that seeks to understand if and to what extent objects in the world correspond to the iconic expressions that humans make of them. Iconicity can include the study of written language, grammar, pictorial representations (from drawings to hieroglyphs), sounds and linguistic utterances, music, and artificial language (Morin et al., 2022).
The work of Long et al. (2021) investigates the parallel change of drawing and recognition of visual concepts and provides support for the picture theory of drawing pictures. They tested children’s changing ability to convey and recognize categorically semantic information via drawings. In their two-part experiment, children drew pictures and categorized other children’s drawings. Long and colleagues found that when children miscategorized drawings, they did so in a semantically rich manner. For example, drawings of books were miscategorized as drawings of televisions, and octopi were confused for spiders. They state, “Even when children were unable to convey the basic-level category that they were intending to draw, their drawings still contained rich information about the visual features of that category” (p. 11). Their findings show that children are able to both receive and communicate basic semantic content through sorting other children’s drawings and producing drawings of their own. Rather than attribute this process to the social-cognitive development of a child, Long et al. suggest that it is the categorically similar basic visual features (shape, real-world size, color) that enable young children to form meaningful representations of the world and communicate representational knowledge to others.
Ćwiek et al. (2022), Sidhu et al. (2021), and Sidhu et al. (2020) provide cross-cultural evidence that the linguistic form of language is directly shaped by its referent. They demonstrate an isomorphism between referent and reference by showing patterned similarities in onomatopoeia, nominal labels for shape and color, and vocal tonality across a range of sociohistorical contexts. Such findings corroborate the results of Perniss et al. (2017), Perniss and Vigliocco (2014), and Winter et al. (2022), who argue that iconicity is the basis for all linguistic form. By focusing on language learning in infancy, Perniss et al. (2017) found that caregivers tend to instrumentalize the iconic facets of objects to teach children how to use meaningful language to learn about absent objects. Similarly, Winter et al. (2022) demonstrate cross-cultural similarity of consonant sounds across 179 Indo-European languages throughout history. For example, “words that express roughness [e.g., abrasive, barbed, harsh, scratchy] are likely to feature a trilled /r/” (p. 2) rather than words that express smoothness (e.g., lubricant, oily, silky). Last, Ćwiek et al. (2022) expand Köhler’s finding that the artificial word bouba is associated with round objects whereas the artificial word kiki corresponds to sharp or spiky objects; the authors demonstrate this across 25 languages representing nine language families and 10 writing systems. The empirical evidence supporting the iconicity of linguistic forms demonstrates that there are fundamental qualities of objects in the world that shape humans’ use and description of reality. It is therefore necessary to expand the theory of symbolic development to encompass the bilateral relationship between our use of representational language and real, represented objects.
Concluding remarks regarding the PT-Sym
I argue that the PT-Sym addresses the previously mentioned shortcoming (a) by suggesting that it is the material, iconic qualities (sensation, color, associations, size) of concepts that enable them to be expressed and thereby meaningfully communicated with others in joint-attention. It addresses shortcoming (b) by suggesting that the primary expressions of abstract concepts (e.g., crying from pain) in early childhood precede their abstract formulation in developed language games. That is, a language game facilitates the abstraction of primary expressions, but the expression itself still relies on a material substrate in order to be taken up by others. The variability in linguistic representation across cultures can be explained as a fundamental aspect of the reference–referent relationship due to the isomorphic nature of representation and reality. Such a relationship can be framed as a foundational developmental process that enables humans to learn and participate in language games; this is supported by the above evidence, which shows that humans make use of the material substrate of correlated and noncorrelated concepts and physical objects to linguistically express them. Thus, clear representational patterns of both concepts and objects can be understood to result from a mind-independent reality that informs the ways that humans meaningfully use language.
In sum, I argue that the PT-Sym can serve as the groundwork for the cognitive and social-constructivist accounts of symbolic development. By providing an empirically supported philosophical account of symbolic representation, this account offers an integrated theoretical framework that can be taken up across analytic and experimental paradigms. As discussed above, this is because the main argument of the PT-Sym precedes—and does not contradict—dominant accounts. This work has ultimately sought to demonstrate the need for meaningful language use to be empirically investigated first in terms of the linguistically modeled world in order to understand if and to what extent joint-attentional language games build upon direct, primary, and modeling expressions of reality.
Footnotes
Funding
The author received no financial support for the research, authorship, and/or publication of this article.
