Abstract
To examine great apes’ on-line prediction of other individuals’ actions, we used an eye-tracking technique and an experimental paradigm previously used to test human infants. Twenty-two great apes, including bonobos, chimpanzees, and orangutans, were familiarized to movie clips of a human hand reaching to grasp one of two objects. Then the objects’ locations were swapped, and in the test event, the hand made an incomplete reach between the objects. In a control condition, a mechanical claw performed the same actions. The apes predictively looked at the familiarized goal object rather than the familiarized location when viewing the hand action in the test event. However, they made no prediction when viewing the claw action. These results are similar to those reported previously for human infants, and predictive looking did not differ among the three species of great apes. Thus, great apes make on-line goal-based predictions about the actions of other individuals; this skill is not unique to humans but is shared more widely among primates.
People’s eyes move proactively rather than reactively when they perform manual actions, as well as when they observe manual actions performed by others (Falck-Ytter, Gredebäck, & Von Hofsten, 2006; Flanagan & Johansson, 2003; Johansson, Westling, Bäckström, & Flanagan, 2001; Land & Furneaux, 1997). That is, people’s eyes spontaneously seek the goal object that is slightly ahead of their own or another person’s action. Such on-line, spontaneous prediction about other individuals’ actions should have important functions in the dynamic social world. For example, it helps people coordinate smoothly with others and also conveys an advantage in competition with others. Also, attending similarly to one’s own actions and to other people’s actions may make it possible to encode the two kinds of actions similarly, which would enhance understanding of other people’s actions.
Ontogenetically, this proactive, goal-directed eye movement during the observation of action emerges around the age of 6 months in humans (Falck-Ytter et al., 2006; Kanakogi & Itakura, 2011). By the end of the 1st year of life, these predictions are performed in a cognitively sophisticated way. For example, Cannon and Woodward (2012) found that when 11-month-old infants saw a person reach toward two objects, they looked predictively toward the object for which the person had previously reached even after the objects’ locations had changed. In a study by Southgate, Senju, and Csibra (2007), when 2-year-old children saw a person approach two containers, they looked predictively toward the one in which the person had previously seen a toy hidden, even though the toy was no longer there. Thus, during the course of development, humans acquire a sophisticated skill in which they visually predict another person’s actions on the basis of that person’s underlying goals and intentions.
Evidence of such proactive goal-directed eye movements in nonhuman species is relatively sparse (Kano & Tomonaga, 2013; Myowa-Yamakoshi, Scola, & Hirata, 2012). Kano and Tomonaga (2013) examined the eye movements of a chimpanzee while she was performing a manual imitation task with an experimenter in a real-life setting. They found that the chimpanzee looked predictively at the reward when the experimenter was reaching for it, but instead looked reactively at the goal of the experimenter’s reaching during the imitation task itself. Myowa-Yamakoshi et al. (2012) found that chimpanzees, 12-month-old human infants, and adults predictively looked at the goals of various actions presented in movies, whereas 8-month-old human infants looked at the goals reactively. Thus, chimpanzees seem to predict the goals of some actions spontaneously, as humans do. However, it remains unclear whether such eye movements are truly goal based, that is, not simply oriented toward the direction of hand or object movements but oriented toward the object that the actor intends to grasp.
Numerous behavioral studies have shown that nonhuman primates make predictions about another individual’s actions on the basis of perception and knowledge. For example, when a subordinate and a dominant chimpanzee were competing for two foods, the subordinate chimpanzee avoided the food the dominant one could see and instead chose the food the dominant one could not see (Hare, Call, & Tomasello, 2000). In another study, chimpanzees helped a human or conspecific partner by passing an out-of-reach object to the partner (Warneken & Tomasello, 2006), possibly as a result of identifying the partner’s current needs (Yamamoto, Humle, & Tanaka, 2012). In a study using a violation-of-expectation paradigm, when macaque monkeys were habituated to a human experimenter reaching for an object behind a barrier and the barrier was then removed, the monkeys were more surprised to see the same reaching movements, which had become inefficient, than to see more efficient, direct reaching movements (Rochat, Serra, Fadiga, & Gallese, 2008; also see Uller, 2004). For most of these studies with primates, similar results have been obtained in corresponding studies with human infants (Call & Tomasello, 2008). Therefore, like humans, nonhuman primates seem to understand another individual’s actions not only in terms of surface behaviors but also in terms of the underlying goals and intentions (Call & Tomasello, 2008).
However, an important outstanding question is whether nonhuman primates make proactive goal-directed eye movements during the observation of other individuals’ actions and do so on the basis of the other individuals’ underlying goals. Such spontaneous goal-based action prediction seems to be a basic skill for general goal attribution. The aim of the current study was to investigate this issue in three species of great apes (chimpanzees, bonobos, and orangutans) with different degrees of phylogenetic distance to humans. All great-ape species show basic similarities in eye movements and scene scanning (Kano, Call, & Tomonaga, 2012; Kano, Hirata, Call, & Tomonaga, 2011). Because previous studies had found evidence of goal attribution in chimpanzees, we anticipated that chimpanzees would exhibit goal-based action prediction. Much less is known about the other species; one study found positive results for orangutans, but not for bonobos (Buttelmann, Carpenter, Call, & Tomasello, 2008). Therefore, it was unclear whether these other species would spontaneously produce goal-based predictions when observing another individual’s action.
We used the eye-tracking technique and adopted an experimental paradigm previously used with human infants (Cannon & Woodward, 2012). Apes were familiarized to a human agent’s repeated reaching action directed to one of two objects. The locations of the objects were then swapped, and the apes’ predictive looks were examined as the agent made an incomplete reach between the objects (Fig. 1). If the apes made goal-based predictions, then they would look predictively toward the prior goal object, rather than its prior location, as human infants do. We also included a control condition in which a mechanical claw moved in a manner similar to that of the hand. Cannon and Woodward found that infants did not predict the goal of claw actions during the test event, although the grasping action of both claw and hand caught their attention strongly during familiarization. Thus, this control allowed us to rule out the possibility that apes looked at the prior goal object during the test event simply because they had attended to a salient grasping action and thereby formed an association between the hand and object during familiarization.

Fig. 1. Illustration of the procedure. After three familiarization events in which the hand or claw reached for and grasped one of two objects (the target), the locations of the objects were swapped, and then the hand or claw reached straight between the two objects, pausing equidistantly in front of them. For video illustrations of the procedure, see Videos S1 (hand condition) and S2 (claw condition) in the Supplemental Material.
Method
Participants
Twenty-two great apes participated in this study (4 bonobos, Pan paniscus; 12 chimpanzees, Pan troglodytes; 6 orangutans, Pongo abelii; 12 females, 10 males; mean age = 16.0 years, SD = 11.5; for details, see Table S1 in the Supplemental Material available online). Five additional apes were tested but excluded from analysis because they attended insufficiently to the stimuli (excessive off-screen fixations: n = 4; excessive eccentric eye movements: n = 1). All the apes lived with their conspecifics in seminatural indoor and outdoor enclosures at the Wolfgang Köhler Primate Research Center. All were tested in testing rooms at the research center, and their daily participation in this experiment was voluntary. They were given regular feedings, daily enrichment, and water ad lib. Their care complied with the EAZA (European Association of Zoos and Aquaria) Minimum Standards for the Accommodation and Care of Animals in Zoos and Aquaria, and the research protocol complied with the WAZA (World Association of Zoos and Aquariums) Ethical Guidelines for the Conduct of Research on Animals by Zoos and Aquariums.
Apparatus
The eye movements of the apes were noninvasively recorded with an infrared eye tracker (60 Hz; Tobii X120, Tobii Technology AB, Stockholm, Sweden). Stimulus videos were presented using Tobii Studio software on a 22-in. LCD monitor (1,366 × 768 pixels) at a 70-cm viewing distance (1° of gaze angle corresponded to approximately 1.2 cm on the monitor). The apes were unrestrained but separated from the experimenter and eye tracker by a transparent acrylic panel. However, in order to keep their heads relatively still, we attached a nozzle and tube that dripped grape juice to the acrylic panel and let the apes suck the nozzle during recording (see Fig. S1 in the Supplemental Material). The apes did not receive any explicit training for viewing the stimuli.
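The stated correspondence between gaze angle and on-screen distance follows from the viewing geometry. The sketch below is only an illustration, assuming a flat screen viewed head-on with the 1° angle centered on the line of sight; the function name is hypothetical and not part of the original analysis.

```python
import math


def cm_per_degree(viewing_distance_cm: float, angle_deg: float = 1.0) -> float:
    """On-screen extent (cm) subtended by `angle_deg` of visual angle.

    Assumes a flat screen perpendicular to the line of sight, with the
    angle centered on the fixation point (an illustrative assumption).
    """
    half_angle_rad = math.radians(angle_deg / 2.0)
    return 2.0 * viewing_distance_cm * math.tan(half_angle_rad)


# Viewing distance reported in the Apparatus section: 70 cm.
print(round(cm_per_degree(70.0), 2))
```

At 70 cm this gives roughly 1.2 cm per degree, in line with the value given above.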
Calibration
Two-point automated calibration was conducted by presenting a small object on each reference point. A relatively small number of reference points was used in this study because the apes tended to view those reference points only briefly. However, we manually checked accuracy at five points after the initial calibration and repeated the calibration if necessary. With this procedure, a validation session with 19 apes yielded accuracy comparable to that obtained with human participants (the positional error was, on average, 0.5–0.7° on the screen; for details, see Kano et al., 2011). We started the test session once we had confirmed that the error value was less than 1.5° around the center of the screen.
Stimuli and procedure
We created our videos by slightly modifying those used in the previous study with infants (Cannon & Woodward, 2012). Overall, we slightly shortened the length of the videos (speeding up the movements) to match the apes’ relatively rapid shifts of attention (Kano et al., 2011). On each trial, we presented a 13,160-ms video (resolution of 1,280 × 720 pixels) at the center of the monitor. The video showed a scene with a rubber toy duck (yellow) and frog (green; see Fig. 1). The video comprised three familiarization events (2,230 ms each), one swap event (2,550 ms), and one test event (2,920 ms). Events were separated by a 250-ms blank gray scene. During the familiarization events, either a human left hand (hand condition) or a plastic rod with a claw (claw condition) appeared in the center right portion of the scene, reached to one of the objects (the target; reaching phase) in the top or bottom left of the scene, and grasped that object (grasping phase). The hand or claw moved along a curvilinear path from the starting position to the object. Grasping of the frog or duck was accompanied by a croak or quack sound, respectively. During the swap event, two hands (left and right hands) appeared in the left portion of the scene, grasped the two objects simultaneously, and swapped their locations. Finally, during the test event, a hand or a claw appeared, reached straight, and paused equidistantly in front of the two objects.
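The reported video length can be checked against the event durations. The sketch below is a minimal illustration that assumes the blank gray scene appeared only between consecutive events (four blanks) and that the video began with the first familiarization event; the constant and function names are hypothetical.

```python
# Durations reported in the text (ms).
FAMILIARIZATION_MS = 2230
N_FAMILIARIZATION = 3
SWAP_MS = 2550
TEST_MS = 2920
BLANK_MS = 250  # assumed to occur only between consecutive events

EVENTS = [FAMILIARIZATION_MS] * N_FAMILIARIZATION + [SWAP_MS, TEST_MS]


def total_duration_ms(event_durations, blank_ms):
    """Sum of all events plus one blank between each pair of events."""
    n_gaps = len(event_durations) - 1
    return sum(event_durations) + n_gaps * blank_ms


def event_onsets_ms(event_durations, blank_ms):
    """Onset of each event, assuming the video starts with the first event."""
    onsets, t = [], 0
    for duration in event_durations:
        onsets.append(t)
        t += duration + blank_ms
    return onsets


print(total_duration_ms(EVENTS, BLANK_MS))
print(event_onsets_ms(EVENTS, BLANK_MS))
```

Under these assumptions the components sum to the 13,160 ms stated above, with the test event starting 10,240 ms into the video.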
Each ape viewed one video (one trial) per day, for a total of four videos in the hand condition and four videos in the claw condition (i.e., within-subjects design; eight trials over 8 days). A given ape completed either the hand or the claw condition first and then completed the other condition; order of the conditions was counterbalanced across apes. Also, the target’s location (top or bottom of the screen) and identity (duck or frog) during familiarization were counterbalanced across apes. To prevent hand orientation from serving as a cue to predict the hand action during the test event, we used the same hand (left hand) during the familiarization and test events. Therefore, if the participants learned from the familiarization events to respond to hand orientation, their action prediction during the test events (after the locations of the objects were swapped) would be based on location but not object.
The experimenter initiated the presentation of each video when the ape was attending to the monitor. If the ape moved away from the monitor during recording (thereby severely disrupting the eye-tracking signals), we repeated the same video on the next day. This occurred with 4 apes, only once each. All apes completed all trials.
Data analysis
For the familiarization events, the apes’ predictive looking was measured by viewing time during the reaching phase (from the onset of hand or claw movement to the touch to the target). We also examined viewing time for each object during the grasping phase (from the touch to the target to the end of the familiarization event) in order to examine whether the participants attended similarly to the target and the distractor (i.e., the other object) in both the hand and the claw conditions. For the test events, the apes’ predictive looking was measured by viewing time during reaching (from the onset of hand or claw movement to the end of the video). As there was slight variation in phase durations among the videos (familiarization reaching phase: M = 506 ms, SD = 110; familiarization grasping phase: M = 1,023 ms, SD = 107; test reaching phase: M = 1,899 ms, SD = 51), viewing times were rescaled to values proportional to the mean duration of the appropriate phase.
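One plausible reading of this rescaling is that each raw viewing time was multiplied by the ratio of the mean phase duration to the actual duration of that phase in the video shown. The sketch below illustrates that reading only; the exact procedure is not specified in the text, and the phase labels and function name are hypothetical.

```python
# Mean phase durations reported in the text (ms).
MEAN_PHASE_MS = {
    "familiarization_reaching": 506.0,
    "familiarization_grasping": 1023.0,
    "test_reaching": 1899.0,
}


def rescale_viewing_time(raw_ms: float, phase_ms: float, phase: str) -> float:
    """Rescale a raw viewing time to the mean duration of its phase.

    raw_ms:   viewing time measured in one video's phase
    phase_ms: actual duration of that phase in that video
    Assumed formula (not confirmed by the source): raw * mean / actual.
    """
    if phase_ms <= 0:
        raise ValueError("phase duration must be positive")
    return raw_ms * MEAN_PHASE_MS[phase] / phase_ms


# Hypothetical example: 300 ms of looking within a 600-ms reaching phase.
print(rescale_viewing_time(300.0, 600.0, "familiarization_reaching"))
```

With this scheme, an ape that looked at an object for half of a phase receives half of the mean phase duration regardless of which video it saw.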
On average, the apes viewed off-screen areas for 22% (SD = 10.3) of total video duration. When an ape fixated off-screen areas for the entire duration of a given phase, we encoded missing values for that phase (4.5% of all data).
The apes’ predictive looking was additionally measured by the proportion of trials in which they looked at the target (rather than the distractor) first during the test event. We excluded from analysis the trials in which apes viewed neither the target nor the distractor. In addition, as 2 apes in the hand condition and 1 ape in the claw condition viewed neither the target nor the distractor in any of the trials, we excluded those apes from the analysis. Consequently, we analyzed 60% of the trials of the remaining 19 apes for this measure.
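The first-look measure and its exclusion rule can be sketched as follows. This is an illustration under assumed data coding (each trial labeled "target", "distractor", or None when neither object was viewed); the function name and the example data are hypothetical.

```python
from typing import List, Optional


def first_look_proportion(first_looks: List[Optional[str]]) -> Optional[float]:
    """Proportion of valid trials whose first look was to the target.

    Trials coded None (neither object viewed) are dropped. If no valid
    trial remains, None is returned, marking the ape for exclusion.
    """
    valid = [look for look in first_looks if look in ("target", "distractor")]
    if not valid:
        return None
    return sum(look == "target" for look in valid) / len(valid)


# Hypothetical apes, four trials each in one condition.
ape_a = ["target", None, "distractor", "target"]
ape_b = [None, None, None, None]

print(first_look_proportion(ape_a))
print(first_look_proportion(ape_b))
```

In this coding, the first ape contributes a proportion based on three valid trials, and the second ape is excluded from the measure.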
A square-shaped area of interest, 200 × 200 pixels in size, was defined for each object (target and distractor; both were approximately 150 × 150 pixels in size). In addition, an area of interest was defined for the trajectory in which the hand or claw moved (see Fig. S2 in the Supplemental Material). Fixation filtering was conducted using the Tobii fixation filter (Version 3.2.1; Tobii Technology AB, Stockholm, Sweden). All measurements were calculated using the Tobii Studio software and MATLAB (The MathWorks, Natick, MA). Statistical analyses were conducted in SPSS Version 20.
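Viewing time per area of interest amounts to summing the durations of fixations that fall inside each square. The sketch below is a simplified illustration, not the Tobii Studio or MATLAB procedure actually used; the area-of-interest centers and the fixation data are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class SquareAOI:
    """Square area of interest centered on (cx, cy), in screen pixels."""
    name: str
    cx: float
    cy: float
    side: float = 200.0  # side length reported in the text

    def contains(self, x: float, y: float) -> bool:
        half = self.side / 2.0
        return abs(x - self.cx) <= half and abs(y - self.cy) <= half


def viewing_time_ms(fixations: Iterable[Tuple[float, float, float]],
                    aoi: SquareAOI) -> float:
    """Total duration of fixations (x, y, duration_ms) inside the AOI."""
    return sum(d for x, y, d in fixations if aoi.contains(x, y))


# Hypothetical object centers and fixations, for illustration only.
target = SquareAOI("target", cx=300, cy=200)
distractor = SquareAOI("distractor", cx=300, cy=520)
fixations = [(310, 190, 180.0), (305, 530, 120.0),
             (900, 360, 250.0), (250, 260, 90.0)]

print(viewing_time_ms(fixations, target), viewing_time_ms(fixations, distractor))
```

Because the areas of interest are larger than the objects themselves (200 vs. approximately 150 pixels), fixations landing slightly off an object still count toward it.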
Results
Familiarization
The viewing times during the reaching and grasping phases of familiarization (averaged over three events) are shown in Figure 2a. These scores were subjected to repeated measures analyses of variance (ANOVAs) with condition (hand, claw) and object (target, distractor) as within-subjects factors and species (bonobo, chimpanzee, orangutan) as a between-subjects factor.

Fig. 2. Experimental results. The graphs in (a) show viewing time during the reaching and grasping phases of familiarization and during reaching in the test events as a function of condition (hand or claw) and object (target or distractor). The graph in (b) shows the proportion of trials in which the apes looked at the target (rather than the distractor) first during the test events in the hand and claw conditions. Error bars denote 95% confidence intervals. Asterisks indicate significant differences between targets and distractors (*p < .05, **p < .01, ***p < .001).
We found a significant interaction between condition and object when the hand or claw was reaching for the target, F(1, 19) = 8.42, p = .009, η2 = .30. We also found a main effect of species, F(2, 19) = 3.87, p = .039, η2 = .29. The other main effects and interactions were not significant. Post hoc tests revealed that during this phase, the apes viewed the target for a longer time than the distractor in the hand condition, t(21) = 2.58, p = .017, Cohen’s d = 0.57, but not in the claw condition, t(21) = 0.51, p = .61, Cohen’s d = 0.16. Also, there was a significant difference between conditions in viewing time for the target, t(21) = 2.64, p = .015, Cohen’s d = 0.56, but not for the distractor, t(21) = 0.03, p = .97, Cohen’s d = 0.007. Thus, the apes predicted the target of a hand action but not that of a claw action during familiarization.
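The effect sizes reported with the F tests appear to be close to what the standard partial eta-squared formula gives from F and its degrees of freedom. The text does not state which variant of η2 was computed, so the sketch below is only a consistency check under that assumption; the function name is hypothetical.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared recovered from an F statistic.

    Standard identity: (F * df_effect) / (F * df_effect + df_error).
    Whether the article reports this variant is an assumption.
    """
    numerator = f_value * df_effect
    return numerator / (numerator + df_error)


# F values reported above for the reaching phase of familiarization.
print(round(partial_eta_squared(8.42, 1, 19), 3))  # Condition x Object
print(round(partial_eta_squared(3.87, 2, 19), 3))  # Species
```

Small discrepancies from the printed values are expected because the F statistics are themselves rounded to two decimals.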
To examine the presence of predictive saccades to the target, we measured the timing of gaze arrival at the target relative to the start of grasping (the end of reaching) in the first of the three familiarization events of each trial. On average, the apes viewed the target 12 ms (SD = 411) before the start of first grasping in the hand condition and 213 ms (SD = 771) after the start of first grasping in the claw condition. The typical saccadic reaction time in these species is longer than 200 ms (Kano et al., 2011), so we tested these values against 200 ms and found that the apes viewed the hand action proactively, t(21) = 2.41, p = .025, Cohen’s d = 0.51, but the claw action reactively, t(21) = 0.08, p = .93, Cohen’s d = 0.01.
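The comparison against the 200-ms criterion can be approximated from the summary statistics alone. The sketch below assumes that all 22 apes contributed to each mean (consistent with the reported 21 degrees of freedom) and codes gaze arrival before grasp onset as negative; because it works from rounded means and SDs, it only approximates the reported values. The function name is hypothetical.

```python
import math


def one_sample_t(mean: float, sd: float, n: int, reference: float):
    """One-sample t statistic and Cohen's d from summary statistics."""
    standard_error = sd / math.sqrt(n)
    t_value = (mean - reference) / standard_error
    cohens_d = (mean - reference) / sd
    return t_value, cohens_d


N_APES = 22          # assumed from the reported df of 21
CRITERION_MS = 200   # typical saccadic reaction time cited in the text

# Gaze arrival relative to grasp onset (negative = before grasping).
t_hand, d_hand = one_sample_t(-12.0, 411.0, N_APES, CRITERION_MS)
t_claw, d_claw = one_sample_t(213.0, 771.0, N_APES, CRITERION_MS)

print(round(abs(t_hand), 2), round(abs(d_hand), 2))
print(round(abs(t_claw), 2), round(abs(d_claw), 2))
```

The absolute t values come out at about 2.42 for the hand condition and 0.08 for the claw condition, close to those reported above.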
Unlike in the reaching phase, there was no interactive effect of condition and object on viewing time during the grasping phase, F(1, 19) = 1.17, p = .29, η2 = .058. The main effect of condition was also not significant, F(1, 19) = 1.09, p = .30, η2 = .055. However, we found a significant main effect of object, F(1, 19) = 143.01, p < .001, η2 = .88. Thus, the apes attended more to the target than to the distractor, and the magnitude of this difference was similar in the claw and hand conditions. We also found an interaction between object and species, F(2, 19) = 5.39, p = .014, η2 = .36, as well as a main effect of species, F(2, 19) = 6.31, p = .008, η2 = .39.
Test
Viewing times during reaching in the test event are shown in Figure 2a. We found a significant interactive effect of condition and object on viewing time, F(1, 19) = 7.48, p = .013, η2 = .28. We also found a main effect of species, F(2, 19) = 7.93, p = .003, η2 = .45. The other main effects and interactions were not significant. Post hoc tests revealed that during this phase, the apes viewed the target for a longer time than the distractor in the hand condition, t(21) = 2.50, p = .020, Cohen’s d = 0.53, but not in the claw condition, t(21) = 1.46, p = .15, Cohen’s d = 0.27. Also, there was a significant difference between conditions in viewing time for the target, t(21) = 3.75, p = .001, Cohen’s d = 0.80, but not for the distractor, t(21) = 0.68, p = .50, Cohen’s d = 0.14. Thus, these results show that apes predicted the target of hand action on the basis of the object but not the location.
To examine the initial responses to the target and distractor when the hand or claw started reaching, we examined the proportion of trials in which apes looked at the target (rather than the distractor) first (Fig. 2b). We found that first look to the target was more frequent in the hand than in the claw condition, t(18) = 2.19, p = .041. Post hoc tests revealed that first look to the target was more frequent than chance (.5) in the hand condition, t(20) = 3.47, p = .002, but not in the claw condition, t(19) = 0.15, p = .88. (For data and analyses across trials, see Fig. S3 and Supplemental Results in the Supplemental Material. Supplemental Results also reports confirmation of the main results with nonparametric tests.)
Species differences
As noted, the three species differed in overall looking times to the objects during both familiarization and test, but not in predictive looking (i.e., there were no significant interactions of species, condition, and object). In general, orangutans looked at the objects for a longer time (and thus at the trajectory area for less time) than the other two species did (see Fig. S4 in the Supplemental Material).
Discussion
Great apes’ eye movements were proactive when the apes viewed the reaches of a person but reactive when they viewed the reaches of a mechanical claw. These proactive eye movements were goal directed, and not simply oriented toward the direction of movements. That is, after the apes viewed a person reaching to grasp one of the two objects (familiarization event) and then saw that the objects’ locations were swapped (swap event), they predicted that the person’s subsequent reaches would be directed to the prior goal (test event). In contrast, they did not make any prediction when they viewed a claw making the same actions during the test event.
The apes’ goal-based prediction about hand action is unlikely to have resulted from simple learning of action goals across trials because the apes never viewed completed hand actions after the objects’ locations had been swapped. The apes’ prediction is also unlikely to have resulted from a simple association of the grasping action and object during familiarization because the apes similarly and strongly attended to the target being grasped in the hand and the claw conditions. Finally, the absence of goal-based prediction about claw action is unlikely to have resulted from excessive attention to the unfamiliar claw stimulus (i.e., failure to disengage attention from the claw itself) because the apes viewed the objects for similar durations during the test events of the hand and the claw conditions (i.e., no main effect of condition) but viewed the target longer than the distractor in the former but not the latter condition (i.e., significant Object × Condition interaction). Thus, the apes’ goal-based action prediction seems to have depended on the familiarity of the agents’ goal-directed behaviors rather than the agents’ saliency or movement per se.
Our findings are strikingly similar to those reported previously for human infants (Cannon & Woodward, 2012). Also, in this study, although the ape species (bonobos, chimpanzees, and orangutans) differed in their overall viewing patterns (looking times to the objects vs. the agent), the pattern of predictive looking did not differ across the species. Thus, we conclude that on-line goal-based action prediction is not uniquely human, but is shared more widely among hominoids. The only difference between our findings and those of Cannon and Woodward’s (2012) study with human infants is that when viewing the claw actions during the test events, infants made location-based action predictions (i.e., more frequent first looks to the distractor than to the target), whereas apes did not make significant predictions (i.e., chance-level performance). However, in the claw condition, the apes tended to view the distractor longer than the target (Fig. 2a), a pattern consistent with location-based prediction, although this effect was not significant. In addition, in Woodward’s (1998) infant study using a violation-of-expectation paradigm, the infants did not distinguish between congruent and incongruent outcomes of the claw action. The most important result of our study and previous studies is that apes and infants made goal-based predictions only in the hand condition, although the grasping action captured their attention similarly in the claw and hand conditions.
Why do great apes seem to predict the goal of a hand but not a claw action? One possibility is that action familiarity enhances action understanding in great apes, as has been shown in many previous studies with macaque monkeys and humans (Falck-Ytter et al., 2006; Flanagan & Johansson, 2003; Rochat et al., 2008; Sommerville, Hildebrand, & Crane, 2008). Theories have proposed that familiar actions are more efficiently encoded than unfamiliar actions because of the operation of direct matching, in which observed action is mapped onto a motor representation of that action (Gallese, Fadiga, Fogassi, & Rizzolatti, 1996; Rizzolatti, Fogassi, & Gallese, 2001). This mechanism may be widely shared phylogenetically (Bonini & Ferrari, 2011; Hecht et al., 2013). Thus, in our study, this same mechanism may have enhanced the apes’ understanding of the familiar hand action compared with the unfamiliar claw action. Another possibility is that cues for agency enhance action understanding in great apes. For example, providing infants with additional abstract movement cues (e.g., self-propulsion) helped them to understand the goal of a mechanical object (Biro & Leslie, 2007; Luo & Baillargeon, 2005). Also, after seeing or interacting with a human agent operating a mechanical claw, infants understood the goal of the claw (Gerson & Woodward, 2012; Hofer, Hauf, & Aschersleben, 2005). Thus, in our study, the lack of cues indicating the cause of the claw action may have prevented the apes from understanding that action as goal directed. Further studies are necessary to examine these possibilities.
Does the finding of goal-based action prediction in great apes reflect their understanding of the mental or intentional states of other individuals? Recent evidence and theories suggest that the answer is not necessarily yes (e.g., teleological-stance theory; Gergely & Csibra, 2003). That is, they suggest that humans may develop two modes of action interpretation ontogenetically, and that the attribution of goal states to other individuals may precede the attribution of mental states to them. These two modes of action interpretation can be distinguished from one another by examining interpretation of actions based on true belief and actions based on false belief. Thus, it is conceivable that 11-month-old human infants who showed goal attribution in Cannon and Woodward’s (2012) study might fail to pass a nonverbal false-belief task (Southgate et al., 2007). As previous studies have consistently failed to produce positive evidence for nonverbal false-belief attribution in great apes (Call & Tomasello, 2008; Kaminski, Call, & Tomasello, 2008), future studies should examine whether great apes also fail to make on-line predictions about actions based on false beliefs.
In conclusion, by adopting an eye-tracking technique and an experimental task used in a previous study with infants (Cannon & Woodward, 2012), we found evidence that great apes, like human infants, make on-line goal-based predictions about other individuals’ actions. We suggest that humans are not the only hominoids who are sensitive to other individuals’ goals and spontaneously predict their actions.
Footnotes
Acknowledgements
We thank the keepers of the Wolfgang Köhler Primate Research Center for their help in data collection.
Declaration of Conflicting Interests
The authors declared that they had no conflicts of interest with respect to their authorship or the publication of this article.
Funding
This study was funded in part by the Japan Society for the Promotion of Science.
