Abstract
Grainger, Dufau, Montant, Ziegler, and Fagot (2012a) taught 6 baboons to discriminate words from nonwords in an analogue of the lexical decision task. The baboons more readily identified novel words than novel nonwords as words, and they had difficulty rejecting nonwords that were orthographically similar to learned words. In a subsequent test (Ziegler, Hannagan, et al., 2013), responses from the same animals evinced a transposed-letter effect. These three effects, when seen in skilled human readers, are taken as hallmarks of orthographic processing. We show, by simulation of the unique learning trajectory of each baboon, that the results can be interpreted equally well as an example of simple, familiarity-based discrimination of pixel maps without orthographic processing.
Grainger, Dufau, Montant, Ziegler, and Fagot (2012a) taught baboons to discriminate words from nonwords in an analogue of the lexical decision task. They argued that their baboons learned the discrimination by orthographic processing (i.e., computation of letters’ identities and relative positions). They cited two key pieces of evidence to support their claim: The baboons (a) more readily identified novel words than novel nonwords as words (i.e., learning generalized to unstudied items) and (b) had trouble rejecting nonwords that were orthographically similar to words that they had already learned (a feature of orthographic processing in skilled human readers). Grainger et al. concluded that orthographic processing precedes language and that the primate brain is better prepared to process written language than previously thought (cf. Platt & Adams, 2012).
Bains (2012) demonstrated that the baboons studied by Grainger et al. (2012a) could have discriminated words from nonwords by recognizing single letters (i.e., shapes) in specific serial positions. He concluded, therefore, that the evidence could not force the conclusion that the baboons exhibited orthographic processing. Grainger, Dufau, Montant, Ziegler, and Fagot (2012b), however, argued (a) that Bains’s model was consistent with their claim that the baboons discriminated words from nonwords by recognizing letters in position and (b) that Bains neglected to address the fact that the baboons could discriminate novel words from novel nonwords.
The argument by Grainger et al. (2012a, 2012b) rests on the assumption that their baboons treated the stimuli as horizontal arrays of discrete symbols, rather than whole pictures, but they offered no direct evidence to corroborate that assumption. Furthermore, their experimental design confounds category (i.e., word vs. nonword) with category frequency: Each word in a given baboon’s list of learned words was presented at least 80 times, but many of the nonwords were presented only once.
Ziegler, Hannagan, et al. (2013) built on the efforts of Grainger et al. (2012a) by presenting the same baboons with two other tests. The first test compared responses to transposed-letter (TL) nonwords with responses to double-substitution (DS) nonwords. TL nonwords were created by transposing the two internal letters of learned words (e.g., DONE → DNOE). DS nonwords were created by substituting each of the two internal letters of learned words with letters of the same kind (i.e., vowels or consonants: e.g., DONE → DAGE). A higher false alarm rate in response to TL than DS nonwords is referred to as the TL effect—a phenomenon linked to orthographic processing in humans (Grainger, 2008). The second test compared responding to visually similar (VS) and visually dissimilar (VD) nonwords. VS nonwords were formed by randomly selecting one of the two internal letters in a learned word and replacing it with the most visually similar letter (e.g., DONE → DQNE). VD nonwords were also formed by randomly selecting an internal letter to be replaced, but in this case, the least visually similar letter was substituted (e.g., DONE → DFNE). The comparison of performance with VS and VD nonwords was conducted to control for visual similarity as a basis for responding. If the baboons’ responses evinced a TL effect but no similarity effect, the data would strengthen the position that the baboons responded to the orthography rather than visual similarity of the items. That is precisely what Ziegler, Hannagan, et al. found.
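The two construction rules for TL and DS nonwords can be sketched in Python. This is a minimal illustration under stated assumptions, not the procedure Ziegler, Hannagan, et al. ran: the function names are hypothetical, Y is assumed to count as a consonant, and the DS replacement letters are drawn at random because the description above specifies only that they be of the same kind.

```python
import random

VOWELS = "AEIOU"  # assumption: Y is treated as a consonant
CONSONANTS = "".join(c for c in "ABCDEFGHIJKLMNOPQRSTUVWXYZ" if c not in VOWELS)

def make_tl(word):
    """Transpose the two internal letters of a four-letter word."""
    return word[0] + word[2] + word[1] + word[3]

def make_ds(word, rng):
    """Replace each internal letter with a different letter of the same kind."""
    letters = list(word)
    for i in (1, 2):
        pool = VOWELS if word[i] in VOWELS else CONSONANTS
        letters[i] = rng.choice([c for c in pool if c != word[i]])
    return "".join(letters)

rng = random.Random(0)       # seeded only so the sketch is repeatable
print(make_tl("DONE"))       # DNOE
print(make_ds("DONE", rng))  # D, a vowel other than O, a consonant other than N, E
```

Neither function checks whether its output happens to be a real word; that screening is a separate step.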
A Visual-Familiarity Account
Although the empirical results from Grainger et al. (2012a) and Ziegler, Hannagan, et al. (2013) are consistent with the claim that the baboons engaged in orthographic processing (cf. Frost & Keuleers, 2013; Ziegler, Dufau, et al., 2013), responding on the basis of simple visual familiarity could have yielded a similar pattern of results. For example, it is possible that there was no similarity effect because the method of creating VS and VD nonwords failed to manipulate the particular sources of visual similarity that the baboons used. Furthermore, although the TL effect is thought to reflect orthographic processing, visual familiarity might produce a TL effect as well. In the study reported here, we applied a standard visual-familiarity model previously used to model face recognition (Abdi, Valentin, & Edelman, 1999; Turk & Pentland, 1991; Vokey & Hockley, 2012; Vokey, Rendall, Tangen, Parr, & de Waal, 2004), fingerprint identification (Vokey, Tangen, & Cole, 2009), and artificial-grammar learning (Vokey & Higham, 2004) to evaluate whether the behavior of the baboons used by Grainger et al. demands a conclusion that they engaged in orthographic processing.
Simulating Grainger et al. (2012a)
We applied a principal component analysis, autoassociative neural-network model of memory to the materials and procedures of Grainger et al. (2012a). We represented words and nonwords as pictures by drawing each item into a 28 × 5 black-and-white pixel map of zeros and ones, where each letter appeared as a 7 × 5 dot-matrix character (see Vokey & Higham, 2004). The pixel maps were then converted into a 140-element column vector.
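The picture-to-vector step can be sketched as follows. The glyphs here are random placeholders, since the actual dot-matrix font is not reproduced in this text, and both function names are hypothetical. Stacking the four 7 × 5 glyphs into a 28 × 5 array is one reading of the layout described above; any pixel ordering applied consistently to every item would serve equally well for a similarity computation.

```python
import numpy as np

def random_font(letters, rng):
    """Placeholder 7 x 5 binary glyphs (assumption: stands in for a real dot-matrix font)."""
    return {c: rng.integers(0, 2, size=(7, 5)) for c in letters}

def word_to_vector(word, font):
    """Stack four 7 x 5 glyphs into a 28 x 5 map, then flatten to a 140 x 1 column."""
    pixel_map = np.vstack([font[c] for c in word])  # shape (28, 5)
    return pixel_map.reshape(-1, 1).astype(float)   # shape (140, 1)

rng = np.random.default_rng(0)
font = random_font("ABCDEFGHIJKLMNOPQRSTUVWXYZ", rng)
v = word_to_vector("DONE", font)
print(v.shape)  # (140, 1)
```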
We constructed an autoassociative memory of the n words in a training list for each baboon by (a) forming a 140 × n stimulus matrix,
We computed the familiarity of each test item (i.e., word or nonword),
We used the leave-one-out technique (Abdi et al., 1999) when computing the cosine familiarity of each word (i.e., we removed the word from
We applied the procedure to the particular training and test lists that Grainger et al. (2012a) presented to each of their 6 baboons. Figure 1 shows the model’s discrimination of words from nonwords as a function of animal and the number of eigenvectors used to reconstruct test probes. As shown, the function for each animal reached asymptote at about A′ = .8 (i.e., excellent discrimination) with as few as 10 eigenvectors.
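A generic principal-component autoassociator of the kind described above can be sketched with NumPy. The details are assumptions rather than the exact computation reported here: the stimulus matrix is not mean-centered, familiarity is taken to be the cosine between a probe and its reconstruction from the first m eigenvectors, and random binary columns stand in for the real pixel maps. All function names are hypothetical.

```python
import numpy as np

def build_memory(X, m):
    """First m eigenvectors (left singular vectors) of the 140 x n stimulus matrix X."""
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :m]

def familiarity(probe, U):
    """Cosine between a probe vector and its reconstruction from the retained eigenvectors."""
    echo = U @ (U.T @ probe)
    denom = np.linalg.norm(probe) * np.linalg.norm(echo)
    return float(probe @ echo / denom) if denom > 0 else 0.0

def loo_familiarity(X, j, m):
    """Leave-one-out familiarity of studied item j: rebuild the memory without column j."""
    U = build_memory(np.delete(X, j, axis=1), m)
    return familiarity(X[:, j], U)

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(140, 50)).astype(float)  # placeholder "learned words"
novel = rng.integers(0, 2, size=140).astype(float)    # placeholder novel item
U = build_memory(X, m=10)
print(familiarity(novel, U), loo_familiarity(X, 0, m=10))
```

Without the leave-one-out step, a studied item would help define the eigenvectors used to reconstruct it, which would inflate its familiarity relative to items the memory has never seen.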

Fig. 1. Discrimination of novel words and novel nonwords as a function of the eigenvector range (1:m) used to construct the autoassociative memory for each of the 6 simulated animals. Because each of the animals learned a different number of words (e.g., Dan learned the most, and Dora the fewest), the total number of eigenvectors available to construct the autoassociative memory also necessarily varied. Discrimination is expressed as A′, an unbiased nonparametric measure of discrimination. A′ varies from 0 to 1, and values greater than .5 indicate increasingly good discrimination.
The first two bars in Figure 2 show the mean hit and false alarm rates for novel words and nonwords using only the first 10 eigenvectors. As shown, the model discriminates novel words from nonwords, F(1, 5) = 1,447.20, MSE = 0.0005, p < .0001, r_pb² = .97. The model captures the first result that Grainger et al. (2012a, 2012b) interpreted as evidence for orthographic processing.

Fig. 2. Mean proportion of items the model labeled as words as a function of item type. These proportions were calculated on the basis of the first 10 eigenvectors. Error bars indicate Fisher’s least significant difference (α = .05) for each comparison. TL = transposed letter; DS = double substitution; VS = visually similar; VD = visually dissimilar.
Figure 3 shows the model’s false alarm rate in response to the 7,832 nonwords each animal received as a function of orthographic similarity to the learned words, as measured by orthographic Levenshtein distance (OLD20; Keuleers, 2011; R Development Core Team, 2012; Yarkoni, Balota, & Yap, 2008). As shown, the more orthographically similar each nonword was to words in

Fig. 3. The model’s false alarm rate (calculated on the basis of the first 10 eigenvectors) for nonwords (i.e., identification of nonwords as words) as a function of orthographic Levenshtein distance (OLD20; Keuleers, 2011; R Development Core Team, 2012; Yarkoni, Balota, & Yap, 2008) from the learned words. Results are shown separately for the lists presented to each of the 6 baboons. OLD20 indexes a string’s similarity to a comparison set as the mean edit distance of the nearest 20 exemplars in the comparison set. The greater a string’s OLD20, the more different the string is from the comparison set (in this case, the less wordlike it is).
We also computed the discrimination index, A′, from the hit and false alarm rates of the 6 baboons studied by Grainger et al. (2012a) and their simulated counterparts (values calculated using only the first 10 eigenvectors). The mean A′ values for the actual baboons (.8206) and the simulated baboons (.8190) were very close. A paired t test failed to detect a statistical difference between the two means, t(5) = 0.1850, p = .8604.
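For readers unfamiliar with A′, the sketch below shows one widely used formulation that computes it from a single hit rate and false alarm rate. The text does not state which variant was applied, so the formula and the function name should be read as assumptions.

```python
def a_prime(hit_rate, fa_rate):
    """Nonparametric discrimination index A' from a hit rate and a false alarm rate."""
    h, f = hit_rate, fa_rate
    if h == f:
        return 0.5  # chance-level discrimination; also avoids 0/0 at the extremes
    if h > f:
        return 0.5 + ((h - f) * (1 + h - f)) / (4 * h * (1 - f))
    # Below-chance case: mirror the formula around .5
    return 0.5 - ((f - h) * (1 + f - h)) / (4 * f * (1 - h))

print(a_prime(0.8, 0.2))
```

Under this formulation, equal hit and false alarm rates yield .5, and perfect performance (all hits, no false alarms) yields 1.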
Simulating Ziegler, Hannagan, et al. (2013)
We applied the model to materials from Ziegler, Hannagan, et al. (2013) using the same memory matrices developed in the previous simulations. We constructed TL and DS nonwords using Ziegler, Hannagan, et al.’s instructions. However, for many of the TL nonwords, the transposition operation produced a word (e.g., SANG → SNAG, as determined by the British Dictionary 2.2 for the Excalibur spell-checker; http://excalibur.sourceforge.net); we removed such words from the test set. The same problem arose for many of the DS nonwords, and we solved it by applying Ziegler, Hannagan, et al.’s algorithm iteratively until it produced an acceptable nonword.
The creation of VS and VD nonwords posed a different problem. Ziegler, Hannagan, et al. (2013) used Hausdorff distance as their measure of visual similarity between individual letters, but they provided no detailed information about the appearance of the letters (e.g., font, style, pixel map vs. vector graphic) to which they applied the measure. To resolve the problem, we used a human letter-recognition confusion matrix to manipulate letter similarity (Gilmore, Hersh, Caramazza, & Griffin, 1979). We are not arguing that this matrix represents the letter confusion matrix of Ziegler, Hannagan, et al.’s baboons; rather, it captures the interletter confusions of the species that Ziegler, Hannagan, et al. claimed the baboons emulated (i.e., humans). Using the matrix from Gilmore et al. (1979), we computed the rank ordering of similarity for letters in the alphabet and constructed VS and VD nonwords as described in Ziegler, Hannagan, et al. That is, we randomly chose one of the two internal letters in each word and substituted the most or least similar letter. If the substitution produced a word, we repeated this process using the next most or least similar letter until a nonword was found.
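The substitution loop just described can be sketched as follows. The confusion values and the two-word lexicon are invented placeholders (they are not taken from Gilmore et al., 1979), and the function name is hypothetical; only the control flow is meant to mirror the description.

```python
import random

def make_similarity_nonword(word, confusion, lexicon, most_similar, rng):
    """Replace one internal letter with its most (or least) similar letter,
    moving down the ranking until the result is not in the lexicon."""
    pos = rng.choice((1, 2))  # randomly pick one of the two internal letters
    target = word[pos]
    ranked = sorted((c for c in confusion[target] if c != target),
                    key=lambda c: confusion[target][c],
                    reverse=most_similar)
    for letter in ranked:
        candidate = word[:pos] + letter + word[pos + 1:]
        if candidate not in lexicon:
            return candidate
    return None  # every substitution produced a word

# Hypothetical confusion scores: higher means more often confused with the key letter.
toy_confusion = {
    "O": {"Q": 0.30, "A": 0.10, "F": 0.01},
    "N": {"M": 0.25, "H": 0.12, "X": 0.02},
}
toy_lexicon = {"DONE", "DOME"}

rng = random.Random(0)
print(make_similarity_nonword("DONE", toy_confusion, toy_lexicon, True, rng))   # VS item
print(make_similarity_nonword("DONE", toy_confusion, toy_lexicon, False, rng))  # VD item
```

With these toy values, a VS substitution at the N position skips DOME because it is a word and falls through to the next most similar letter.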
The model’s discrimination of TL, DS, VS, and VD nonwords is shown in Figure 2. The proportions in the figure are mean false alarm rates. As shown, the model matched the critical pattern of discrimination: a large TL effect, F(1, 5) = 48.74, MSE = 0.0017, p = .0009, r_pb² = .91, and no similarity effect, F(1, 5) = 2.01, MSE = 0.0015, p = .2153, r_pb² = .29.
We computed A′ from the hit and false alarm rates for each of the baboons and their simulated counterparts. The model’s fit was good. Mean A′ for the TL effect for the actual baboons (.6639) was not significantly different from that for the simulated baboons (.6430), t(5) = 0.2638, p = .8025. The same was true for the similarity effect: Mean A′ for the actual baboons (.5180) was not significantly different from that for the simulated baboons (.5286), t(5) = 0.1220, p = .9076.
Discussion
A standard, autoassociative model of memory applied to the materials of both Grainger et al. (2012a) and Ziegler, Hannagan, et al. (2013) reproduced the principal results that they cited as evidence of orthographic processing in baboons (i.e., recognition of letter identities in serial positions). Although we cannot rule out the possibility that the baboons performed orthographic processing, our demonstration shows that the principal results can be explained equally well as an example of familiarity-based visual discrimination. Therefore, the results of Grainger et al. and Ziegler, Hannagan, et al. do not force the conclusion that the baboons engaged in orthographic processing.
A criticism of our analysis is that it provides an alternative explanation for results that in and of themselves are more fundamentally problematic than we have discussed. For example, one could object to the original experiments on the grounds that they conflated orthographic and alphabetic processing and argue that additional experiments are needed to disentangle the two. Similarly, the nonword stimuli used by Grainger et al. (2012a) were designed to violate the bigram structure in English words, so it is possible that their results are idiosyncratic to their materials. If the interpretation of experimental results is ambiguous, it could be argued that there is no profit in trying to explain them or debate candidate explanations, as we have. However, we disagree with this view. Although these criticisms merit attention, the possibility that the conclusions might be overturned in the future does not provide rational grounds for rejecting them at this point or for ignoring contrasting explanations for the current empirical data.
Another criticism that could be applied to our analytic approach is that we cannot deconstruct the model’s memorial representation of the training list (i.e.,
Our analysis shows that the behavior of the baboons studied by Grainger et al. (2012a) can be understood as reflecting visual rather than orthographic discrimination of words and nonwords. However, the analysis points to a grander and more interesting possibility: Just as participants in an artificial-grammar experiment can behave as if they learned the grammar when they did not, our analysis suggests that the baboons used by Grainger et al. can behave as if they learned English orthography when they did not. This analogy points to an important conceptual bridge between work on artificial-grammar learning in humans (e.g., Brooks, 1978; Jamieson & Mewhort, 2009, 2011; Reber & Allen, 1978; Vokey & Brooks, 1992; Vokey & Higham, 2004) and alleged orthographic processing in baboons (Grainger et al., 2012a, 2012b; Ziegler, Dufau, et al., 2013; Ziegler, Hannagan, et al., 2013). At a minimum, the analysis by Grainger et al., especially in combination with the analysis we have presented here, poses a thoughtful and positive challenge for researchers to reexamine how they conceptualize orthography and the methods they use to study it.
Footnotes
Declaration of Conflicting Interests
The authors declared that they had no conflicts of interest with respect to their authorship or the publication of this article.
Funding
This work was supported by Natural Sciences and Engineering Research Council of Canada Discovery Grants 14391 (to J. R. Vokey) and 355882-2013 (to R. K. Jamieson).
