Exemplar-based learning probably requires learning abstractions: A commentary on Ambridge (2020)

Abstract

Exemplar-based learning requires (i) a segmentation procedure for identifying the units of past experiences that a present experience can be compared to, and (ii) a similarity function for comparing these past experiences to the present experience. This article argues that for a learner to learn a language these two mechanisms will require abstractions such as linguistic features. Moreover, because the segmentation procedure will vary depending on the language, a radical exemplar theory is likely to require linguistic abstractions for learning.

Keywords

Grammar, learnability, learning mechanisms, phonotactics, word segmentation

Any exemplar-based approach to learning requires some mechanism to identify the units or elements of past experiences that will serve as exemplars, to which the present experience will be compared. Experience does not come neatly packaged into the linguistic units that Ambridge (2020) discusses such as words and sentences; instead some mechanism is required to identify and extract them out of James’s (1890/1950) ‘great blooming, buzzing confusion’.

In addition, an exemplar-based approach needs a mechanism for identifying which exemplars from prior experience are relevant to the present experience, and precisely how they are relevant. As Heraclitus famously observed, ‘no man ever steps into the same river twice’ (cf. Plato, Cratylus, 402a). No two experiences are ever exactly the same. This means that exact identity cannot be the criterion for matching the present experience to past exemplars, because no past exemplar can ever be exactly the same as the present experience. Instead, some aspects of the present experience must be ignored when matching it to past exemplars, but which?

If exemplars are linguistic units of the kind hypothesized by Ambridge (2020), then both these mechanisms will have to involve linguistic abstractions, such as phonemes, words, syntactic phrases, sentences and the like. But these units are not perceptual primitives, directly available in experience. Moreover, because the properties of these linguistic units vary from language to language, these mechanisms will have to vary depending on the language being learnt. So even a radical exemplar theory of language acquisition still requires linguistic abstractions in order to explain what aspects of past experience can constitute exemplars, and how exemplars are matched with the current experience.

For example, words are a basic building block of human language, and several of Ambridge’s examples assume that the learner can identify words in the perceptual input. But words are not perceptual primitives; co-articulation means that the speech signal does not come pre-segmented into words, so mapping from the speech waveform to a sequence of words is not just a simple segmentation. As anyone listening to an unfamiliar language knows, word segmentation is language-specific; the phonotactics of possible words vary from language to language, so what counts as a word exemplar is something that has to be learned. Thus, words are language-dependent abstractions, and abstract knowledge is required to recognize that ‘this sequence of speech is a possible word’ or ‘this is the location of a possible word boundary’. Without the ability to recognize reusable units like words, an exemplar theory cannot even get off the ground.

An exemplar-based learner also needs to be able to tell whether two word exemplars are instances of the same or different words. This matching process is not mere perceptual similarity; it must ignore perceptually salient features such as speaker identity, and focus on language-specific phonetic properties of the utterance: e.g., aspiration is phonemic in Thai but not in English; tone is phonemic in Chinese but not English. Thus, an exemplar-based learner needs to learn to attend to aspiration in Thai and to tone in Chinese, but to ignore these in English for the purposes of matching and word learning.
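The point about language-specific matching can be made concrete with a small sketch. It assumes, purely for illustration, that each word token is described by a few hypothetical acoustic dimensions and that a language is characterized by how much weight its similarity function gives to each dimension. The dimension names, the weights and the threshold are all invented for this example and are not measurements of any real language.

```python
# Two hypothetical word tokens that differ in aspiration and in speaker,
# but not in voicing or tone. All values are illustrative assumptions.
token_a = {"voicing": 0.0, "aspiration": 1.0, "tone": 0.5, "speaker": 0.9}
token_b = {"voicing": 0.0, "aspiration": 0.0, "tone": 0.5, "speaker": 0.1}

# Hypothetical attention weights: which dimensions count for word identity.
# Speaker identity is ignored in both; aspiration and tone count only in the
# Thai-like setting.
WEIGHTS = {
    "english_like": {"voicing": 1.0, "aspiration": 0.0, "tone": 0.0, "speaker": 0.0},
    "thai_like": {"voicing": 1.0, "aspiration": 1.0, "tone": 1.0, "speaker": 0.0},
}


def distance(a, b, weights):
    """Weighted city-block distance between two tokens."""
    return sum(w * abs(a[dim] - b[dim]) for dim, w in weights.items())


def same_word(a, b, language, threshold=0.5):
    """Treat two tokens as the same word if their weighted distance is small."""
    return distance(a, b, WEIGHTS[language]) < threshold


print(same_word(token_a, token_b, "english_like"))
print(same_word(token_a, token_b, "thai_like"))
```

Under these assumed weights the same pair of tokens matches in the English-like setting and fails to match in the Thai-like setting. The weights themselves are exactly the kind of language-specific abstraction the learner would have to acquire.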

The same kinds of points can be made for virtually every linguistic level. For example, word order conveys different information in different languages; in a strict word order language (e.g., English) it conveys predicate-argument structure, while in languages with freer word order (e.g., German, Sesotho) it may convey discourse structure and topicality. A language learner needs to learn whether the similarity function they use when identifying predicate-argument structure should involve attention to word order or not (i.e., should they consider retrieving past sentences with differing word orders as exemplars when identifying the predicate-argument structure of the current utterance?). Thus, this similarity function involves learning abstractions itself.

Note that while we speak of learning abstractions for both the segmentation mechanism and the similarity function, this learning might be something close to what (i) rationalists claim is involved in language acquisition (i.e., there might be an innately constrained set of abstract primitives defining the space of possible segmentation mechanisms and similarity functions), or it might be (ii) empiricist in nature (i.e., the segmentation mechanisms and similarity functions might arise from powerful learning procedures applied to perceptual primitives), or (iii), it might be something else entirely.

Thus, exemplar-based approaches and generalization-learning approaches are less different than Ambridge claims. This shouldn’t be surprising; theoretical work in machine learning established decades ago that feature-based approaches and exemplar-based approaches to learning are often formally equivalent; given a feature-based approach it is possible to find an exemplar-based approach that learns the same generalizations from the same data, and vice versa (Jäkel et al., 2008). However, even though the outcomes of both approaches may be the same, they may show considerable ‘performance’ differences, e.g., in terms of the amount of computation, memory and time each requires.
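One standard illustration of this kind of equivalence is the perceptron and its kernel (dual) form. The sketch below is a toy example under stated assumptions, not a model from Ambridge (2020) or Jäkel et al. (2008): the feature map `features`, the XOR-style data and all function names are hypothetical. The feature-based learner stores a weight vector over abstract features; the exemplar-based learner stores past examples and compares new inputs to them with a similarity function defined as the inner product of those same features.

```python
def features(x):
    """Hypothetical feature map: both inputs, their product, and a constant."""
    return (x[0], x[1], x[0] * x[1], 1)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def similarity(x, y):
    """Exemplar similarity: inner product of the two feature vectors."""
    return dot(features(x), features(y))


def train_feature_based(data, epochs=10):
    """Perceptron: store only a weight vector over features."""
    w = [0, 0, 0, 0]
    for _ in range(epochs):
        for x, label in data:
            if label * dot(w, features(x)) <= 0:  # mistake: adjust weights
                w = [wi + label * fi for wi, fi in zip(w, features(x))]
    return w


def train_exemplar_based(data, epochs=10):
    """Kernel perceptron: store the examples on which mistakes were made."""
    stored = []
    for _ in range(epochs):
        for x, label in data:
            score = sum(lab * similarity(ex, x) for ex, lab in stored)
            if label * score <= 0:  # mistake: remember this exemplar
                stored.append((x, label))
    return stored


def predict_feature_based(w, x):
    return 1 if dot(w, features(x)) > 0 else -1


def predict_exemplar_based(stored, x):
    return 1 if sum(lab * similarity(ex, x) for ex, lab in stored) > 0 else -1


# XOR-style toy data: label is +1 when the two inputs differ in sign.
data = [((-1, -1), -1), ((-1, 1), 1), ((1, -1), 1), ((1, 1), -1)]
w = train_feature_based(data)
stored = train_exemplar_based(data)
for x, label in data:
    print(x, label, predict_feature_based(w, x), predict_exemplar_based(stored, x))
```

Both learners make the same updates on the same examples and so give the same predictions, but they differ in what they keep: one keeps a fixed-size weight vector, the other a growing list of exemplars that must all be consulted at prediction time. Note that the exemplar-based version still depends on the feature map through its similarity function.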

In sum, exemplar-based learning requires (i) a segmentation procedure for identifying the units of past experiences that a present experience can be compared to, and (ii) a similarity function for comparing these past experiences to the present experience. To truly evaluate the efficacy of a radical exemplar approach to learnability, one would need to compare it against other approaches in terms of language processing. It may turn out that language processing is much faster when the relevant linguistic units of abstraction can be accessed and deployed on-line. Results from such an evaluation may also vary depending on the memory capacity of the learner. It would therefore be necessary to compare the learning/processing mechanisms of, say, 2-year-olds with those of older children, whose memory capacity is greater, and these might in turn differ greatly from those of adults, who have many more exemplars and/or more robust abstractions. Models of such learning would also need to be evaluated against the types of overgeneralizations human learners actually make. Such work would best be carried out collaboratively by proponents of each theory, ensuring an appropriate, fully comparable design. As always, the devil is in the detail.

References

Ambridge, B. (2020). Against stored abstractions: A radical exemplar model of language acquisition. First Language, 40(5-6), 509–559.

Jäkel, F., Schölkopf, B., & Wichmann, F. A. (2008). Generalization and similarity in exemplar models of categorization: Insights from machine learning. Psychonomic Bulletin & Review, 15, 256–271.

James, W. (1950). The principles of psychology. Dover Publications. (Original work published 1890).