Abstract
The present study examined the contribution of efficiency reasoning and statistical learning to visual action anticipation in preschool children, adolescents, and adults. To this end, Experiment 1 assessed proactive eye movements of 5-year-old children, 15-year-old adolescents, and adults, who observed an agent stating the intent to reach a goal as quickly as possible. Subsequently, on each of four occasions, the agent could take either a short, hence efficient, or a long, hence inefficient, path to get to the goal. The results showed that in the first trial participants in none of the age groups predicted at above-chance level that the agent would produce the efficient action. Instead, we observed an age-dependent increase in action predictions over the subsequent repeated presentations of the same action. Experiment 2 ruled out that participants’ nonconsideration of the efficient path was due to a lack of understanding of the agent's action goal. Moreover, it demonstrated that 5-year-old children do predict that the agent will act efficiently when verbally reasoning about his future action. Overall, the study supports the view that rapid learning from frequency information guides visual action anticipations.
Research has shown that humans are not merely passive observers of physical events and others’ behaviour, but that their allocation of visual attention is proactive. For example, it has been demonstrated that the mere perception of a grasping cue leads to obligatory attentional shifts to the related action goal (Fischer, Prinz, & Lotz, 2008). Similar anticipatory shifts of visual attention are also observed in other domains of vision indicating that visual perception is active and shaped by the observers’ expectations (e.g., Geyer, Müller, & Krummenacher, 2008; Rao & Ballard, 1999; Rolfs, Jonikaitis, Deubel, & Cavanagh, 2011; Võ & Wolfe, 2012). Such visual anticipations are present already early in human development (e.g., Canfield, Smith, Berzsnyak, & Snow, 1997; McMurray & Aslin, 2004). These and related findings thus suggest that from early on humans form expectations about the future course of events, which modulate their perceptual processing (Atkinson, 2000).
Active perception is not restricted to the processing of physical events. The perception of others’ actions also leads to anticipatory gaze shifts (e.g., Ambrosini, Costantini, & Sinigaglia, 2011; Falck-Ytter, 2012; Flanagan & Johansson, 2003; Schneider, Bayliss, Becker, & Dux, 2012), that is, visual fixations on the future target or future means of another's action before it is reached by the protagonist herself. These gaze shifts are thought to reflect the observers’ expectations about the future course of others’ actions and their possible end states.
Moreover, from a social-cognitive point of view the examination of these anticipatory looking behaviours is highly interesting as they are supposed to reflect implicit social-cognitive processes. That is, in contrast to time-consuming, effortful, and reflective verbal reasoning about others’ behaviour (e.g., Bello et al., 2014), they are supposed to indicate an implicit understanding of others (e.g., Apperly & Butterfill, 2009) and/or an automatic and unconscious processing of social information (e.g., Bargh, 2006; Paulus, 2012a; Wilson, 2002). Indeed, research has shown that verbal responses and anticipatory gaze shifts do not necessarily converge to the same results (e.g., Ambrosini, Pezzulo, & Costantini, 2015; Clements & Perner, 1994). Based on these and related findings, dual process accounts of social cognition have been put forward (e.g., Evans, 2008; Frith & Frith, 2008). For example, Apperly and Butterfill (2009) suggested that humans rely on two types of social-cognitive information processing. One of them, the implicit system—usually assessed by gaze measures—is fast, efficient, but inflexible. The other one, the explicit system, is slow, cognitively demanding, but open to reflection and correction. Thus, whereas verbal responses are proposed to result from reflective processing triggered by the explicitly asked test question (Perner & Roessler, 2012), measures of anticipatory gaze tap into the implicit social-cognitive information processing.
Moreover, these implicit measures have been heavily used in developmental research to assess preverbal children's processing of others’ behaviour (Ambrosini et al., 2013; Kanakogi & Itakura, 2011). Given the fundamental difference between the explicit and effortful verbal reasoning processes on the one hand and the implicit looking behaviour on the other hand, there has been a debate in cognitive as well as developmental science on the psychological mechanisms that subserve these implicit processes (e.g., DeBruin & Newen, 2012; Müller & Giesbrecht, 2008; Ruffman, 2014). We expand on these different views in the next paragraphs.
The present study aims at contributing to this debate by characterizing the psychological mechanisms that lead to anticipatory gaze shifts in greater detail. One mechanism put forward by a number of theories concerns the idea that statistical learning guides action perception (Baldwin, Andersson, Saffran, & Meyer, 2008; Ruffman, 2014; Ruffman, Taumoepeau, & Perkins, 2012). According to this approach, adults and infants employ frequency information about actions to learn about behavioural patterns (e.g., Boseovski, Chiu, & Marcovitch, 2013; Boseovski & Lee, 2006; Lagattuta & Sayfan, 2013; Saffran, Aslin, & Newport, 1996; Smith & Yu, 2008) and to anticipate future actions (Paulus, Hunnius, van Wijngaarden, et al., 2011).
On the other hand, it was suggested that the “principle of rational action” helps us to cut down the complexity of observed behaviour to a small subset of possible future actions: we predict others’ actions based on the expectation that they act rationally and efficiently—that is, they attempt to reach a goal with minimum expense (Gergely & Csibra, 2003). Studies in adults have suggested that the principle of rational action guides action understanding and goal inferences (e.g., Baker, Saxe, & Tenenbaum, 2009; Brass, Schmitt, Spengler, & Gergely, 2007). Yet, there is an intense ongoing debate on the ontogenetic origins of this ability in developmental science (Bíró, 2013; Elsner, Pfeifer, Parker, & Hauf, 2013; Gergely, Nádasdy, Csibra, & Bíró, 1995; Paulus, Hunnius, Vissers, & Bekkering, 2011; Perner, Sprung, & Steinkogler, 2004; Scott & Baillargeon, 2013; Sodian, Schoeppner, & Metz, 2004). Some suggested that even three-month-olds (Skerry, Carey, & Spelke, 2013) expect agents to act efficiently. Moreover, it remains hotly disputed whether also online action prediction is subserved by efficiency reasoning (e.g., Bíró, 2013; Paulus, Hunnius, van Wijngaarden, et al., 2011).
One recent study tried to examine the impact of both statistical learning and efficiency reasoning on children's and adults’ anticipatory eye movements (Paulus, Hunnius, van Wijngaarden, et al., 2011). In this study, adults and nine-month-old infants observed an agent who repeatedly took a long path to reach a goal. A shorter path leading to the same goal was closed. Thus, during this learning phase the agent's action was frequently observed and efficient. In a subsequent trial, the short path was passable. In line with the statistical learning account, adults’ and infants’ gaze behaviour showed that they predicted that the agent would again take the long (now inefficient) path rather than the shorter and more efficient path.
However, because frequency information and the first occurrence of an efficient action were confounded in the first trial, this study did not allow for separately evaluating the influence of the respective mechanism on action prediction in the first trial. It might therefore underestimate people's reliance on efficiency information to predict future actions and overestimate the impact of frequency information. A more appropriate empirical test on the presence of efficiency considerations in human action prediction would be a task in which efficiency considerations are—at least in the first trial—not confounded with frequency information.
To this end, we designed a task in which the influence of statistical learning and efficiency expectations on action prediction could be evaluated separately. We introduced an agent who explicitly stated that he would reach his goal as fast as possible. Even though it has been proposed that infants already possess an a priori assumption that others act efficiently (Gergely & Csibra, 2003), we wanted to exclude all possibilities of doubt that the agent would not be interested in acting efficiently (see Bíró, 2013). Subsequently the agent could either take a short (and thus efficient) or long (hence inefficient) path to reach the goal. The agent took the short path four times and reached the goal every time. Given the study's main aim to examine the processes subserving proactive visual attention during action perception, we analysed participants’ anticipatory eye movements.
Our first aim was to clarify how these two cognitive mechanisms might affect adults’ action anticipation. Moreover, given the interest in the developmental literature (e.g., Ambrosini et al., 2013; Bello et al., 2014; Ruffman, 2014), we also decided to examine preschool children and adolescents. We chose 5-year-olds as the youngest age group because at least by that age children understand that an agent's behaviour can be driven by explicitly stated underlying goals (e.g., Perner & Roessler, 2010). We also examined a group of 15-year-olds, as a number of recent findings indicated an increasing understanding of others’ actions in adolescence (e.g., Güroğlu, van den Bos, & Crone, 2009).
Of major interest was the question of whether participants would anticipate an efficient action already in the first test-trial, before learning about the frequency of the protagonist's actions. If efficiency considerations affect adults’ and even young children's visual action predictions, we should find this effect in all age groups. Alternatively, if these considerations are the product of a more protracted developmental process, we would expect developmental changes up into adulthood. Finally, if efficiency considerations do not play a role in visual action prediction (e.g., as it is a cognitively effortful process that requires explicit reasoning), we would expect no systematic anticipation to the short path in the first trial. Additionally, by repeatedly presenting the protagonist's efficient behaviour, we were able to examine whether and how quickly participants’ action prediction is informed by frequency information—that is, their memory of previous events.
Experiment 1
Method
Participants
Twenty-three 5-year-olds (M = 5.0 years, SD = 0.1 years; 10 females), twenty 15-year-olds (M = 14.9 years, SD = 0.5 years; 13 females), and twenty-two adults (M = 24.6 years, SD = 8.6 years; 17 females) took part in the study. Two additional 5-year-olds, one 15-year-old, and one adult were excluded because they did not reach the inclusion criterion of providing gaze data in more than two out of the four test phases. The 5-year-olds and the 15-year-olds were recruited via birth records. Adults were recruited from a student population. Children's caregivers and adult participants gave informed written consent. All participants received small gifts and/or monetary compensation for their participation. The study was approved by the local ethics committee and was conducted according to the principles set out in the Declaration of Helsinki.
Stimuli
Stimulus material consisted of two introductory movies and one test movie that contained four test trials. Animated movies were prepared using Adobe CS5.5 (Adobe Systems, San Jose, CA). The first introductory movie (21 s) depicted a light-brown path on a green background. On the path there was a turtle and a lettuce in frontal view (Figure 1A). The turtle said “yummy—lettuce—I love lettuce” and ate the lettuce. Afterwards the turtle stated verbally “I want to have more lettuce as fast as possible!” The second introductory movie (17 s) began with a freeze frame, which depicted the top view of a path, four branch points (branching into a short and a long path, which rejoined after a short distance), four transparent occluders that overlay these branch points, and four lettuces on the path behind these branch points (Figure 1B). Simultaneously, the participants heard the turtle saying, “I am so hungry, I have to quickly eat lettuce!” Subsequently the turtle entered the scene on the path from the left, and the scene zoomed in on the first branch point and lettuce.

Figure 1. Paradigm: (A) Introductory movie 1 introduced the participants to the agent and its preference for lettuce. (B) In introductory movie 2, the entire course of the path was presented. The agent stated that its goal was to get more lettuce as soon as possible. (C) Example of one trial: the 2.4 s during which the moving turtle was invisible served as the test phase for the analysis of gaze data (bordered frame). In total, four trials were presented. (D) Rectangles represent the approximate positions of the areas of interest for the initial analysis. (E) Approximate positions of the areas of interest for the post-hoc analysis.
These introductory movies were presented for the following reasons. First, it was demonstrated that the turtle moves on the path and not on the green background. Second, we familiarized the participants with the entire course of the path and objects on the path. Further purposes of these introductory movies were to show that (a) the agent has a preference for lettuce and (b) the agent's goal is to get more lettuce as fast as possible. Previous studies with infants have demonstrated that even for infants, one familiarization of an agent's goal-directed behaviour is sufficient to yield an understanding of another's action goal (e.g., Krogh-Jespersen, Liberman, & Woodward, 2015; Paulus, 2011). Nevertheless, to ensure an understanding of the other's action goal, we additionally added the agent's verbal statements about his goal.
The test movie contained four trials. At the beginning of the first trial, the turtle was standing in front of the first branch point and stated again its preference for lettuce (“yummy—lettuce”). After that the scene again zoomed in so that the lettuce disappeared from the screen (to prevent fixations on the lettuce during the test phase). Subsequently, the occluder gradually turned opaque, so that the branch point of the short and long path was no longer visible. The turtle wiggled, walked along the path towards the occluder, and disappeared under it for 2.4 s (eliciting anticipatory gaze shifts). The turtle then reappeared on the short path on the other side of the occluder. It followed the short path until it reached the lettuce, ate it, continued to walk on the path, and stopped in front of the next branch point. The aperture of the scene followed the turtle's movement, so that the lettuce and the next branch point appeared at the right-hand side of the screen (Figure 1C). The second, third, and fourth trial were identical to the first trial. The total duration of the trial was approximately 25 s. The orientation of the long and short path (on the upper/lower part of the screen) changed from trial to trial and was counterbalanced between participants.
The 2.4 s during which the moving turtle was invisible served as test phase for the analysis of gaze data. Because the participants could not see which path the turtle entered at the branch point, anticipatory fixations on the short or the long path could be employed as a measure of action anticipation.
Apparatus and procedure
Eye movements were recorded with a Tobii T60 eye tracker (60 Hz sampling rate; Tobii Technology, Stockholm, Sweden). Tobii Studio 3.1 software (Tobii Technology) was used to present the stimuli on an integrated 17″ TFT monitor (1280 × 1024 pixels). Participants sat on a chair at a distance of approximately 60 cm from the screen. The eye-tracker was mounted onto a flexible monitor arm, so that its height could be individually determined. A 9-point calibration procedure preceded the stimulus presentation.
Data analysis
The Tobii standard fixation filter with a velocity threshold of 35 pixels/window and a distance threshold of 35 pixels was used to identify fixations from the raw data. We were interested in the location of the first fixation after the turtle disappeared behind the occluder as a measure of action anticipation. Therefore, the areas where the long and short path reappeared from behind the occluder served as same-sized areas of interest (AOI “short path” and AOI “long path”; each covered 4.5% of the scene, see Figure 1D).
To check whether our stimuli sufficiently triggered anticipatory looking, and whether groups would differ in their overall amount of anticipatory fixations, we analysed the frequency of anticipatory first fixations to one of the two paths in the first trial and in the repeated presentation of the trials (fixation to short or long path was coded as 1, fixation elsewhere on the scene as 0).
Following up on previous research (e.g., Paulus, Hunnius, van Wijngaarden, et al., 2011), we calculated the following first fixation score as principal measure: A trial with an anticipatory first fixation to the short path was coded as 1, a trial with an anticipatory first fixation to the long path was given the value −1. If fixations were directed elsewhere on the screen during the test phase but not on one of the AOIs, the trial was coded as 0. Gaze data were missing in 5 out of 260 trials (1.9% of all trials watched by the participants). To allow full statistical treatment of the data, we inserted the group mean for these trials. For statistical analyses we employed IBM SPSS Statistics 22 (SPSS Inc., Chicago, IL, USA). The significance level for all analyses was p ≤ .05. All reported post-hoc comparisons are Bonferroni-corrected.
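The coding scheme and the treatment of missing trials described above can be sketched as follows. This is a minimal illustration, not the authors' analysis script (the analyses were run in SPSS). The AOI labels and function names are hypothetical, and the sketch assumes that "group mean" refers to the mean of the observed scores of one age group in one trial.

```python
from statistics import mean

# Hypothetical AOI labels; the paper only names the AOIs "short path" and "long path".
SHORT, LONG = "short", "long"

def first_fixation_score(aoi):
    """Code one trial: +1 for the short path, -1 for the long path,
    0 for a fixation elsewhere on the screen, None if gaze data are missing."""
    if aoi is None:
        return None
    return {SHORT: 1, LONG: -1}.get(aoi, 0)

def impute_group_mean(scores):
    """Replace missing trials (None) with the mean of the observed scores.
    Assumption: `scores` holds one age group's scores for a single trial."""
    observed = [s for s in scores if s is not None]
    fill = mean(observed)
    return [fill if s is None else s for s in scores]

# Made-up first fixations of five participants in one trial (illustrative only).
raw = ["short", "long", "elsewhere", None, "short"]
coded = [first_fixation_score(a) for a in raw]
print(coded)                     # [1, -1, 0, None, 1]
print(impute_group_mean(coded))  # [1, -1, 0, 0.25, 1]
```

A group mean above zero on this score indicates a bias towards the short path, and a mean of zero corresponds to chance level.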
As a complementary measure, we analysed the durations of all fixations to the short and long paths during the 2.4 s of the anticipatory period by calculating a differential looking score (DLS; e.g., Senju, Southgate, White, & Frith, 2009). The DLS takes all fixations during the anticipatory period into account and is therefore also sensitive to corrective eye movements: for example, a first fixation may have been directed to the long path while subsequent fixations during the anticipatory period landed on the short path. This score was calculated by subtracting the total duration of all fixations on the long path from the total duration of all fixations on the short path and dividing the difference by the sum of the total durations of all fixations on the short and long paths. A DLS of 1 indicates a strong looking bias towards the short path, a DLS of −1 a strong looking bias towards the long path. A DLS of 0 indicates no preference for either path.
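The DLS as defined above can be written as a short function. This is a sketch under stated assumptions: the function and parameter names are hypothetical, durations are taken to be in milliseconds, and returning None when neither path was fixated is our assumption, since the text does not say how such trials were handled.

```python
def differential_looking_score(short_ms, long_ms):
    """DLS = (short - long) / (short + long), using total fixation durations
    on the short and long path during the anticipatory period."""
    total = short_ms + long_ms
    if total == 0:
        # Assumption: the score is undefined if neither path was fixated.
        return None
    return (short_ms - long_ms) / total

# Illustrative values only: a first fixation on the long path (300 ms)
# followed by corrective fixations on the short path (900 ms in total).
print(differential_looking_score(900, 300))  # 0.5
```

In this made-up case the first fixation score would be −1, whereas the DLS is positive, which is the kind of corrective pattern the complementary measure is meant to capture.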
Results
We separately analysed fixations to the short or the long path for the first and the repeated presentation of the trials based on the following considerations: First, we aimed to independently assess the impact of any a priori assumptions and frequency information on anticipatory looking at different ages. Second, to avoid alpha inflation, we grouped the four trials into the smallest number of bins (i.e., two), defined by their key features of interest: the first trial lacks prior frequency information about the agent's action, whereas the second, third, and fourth trials include previous experience with the agent's action (detailed 3 × 4 repeated measures omnibus ANOVAs with the factor age and the within-subjects factor trial are provided in the Supplementary Material; further, descriptive statistics for anticipatory first fixations and the DLS are provided in Supplementary Tables 1 and 2). Additionally, we report the analysis of the overall frequency of anticipatory looking on the paths for any group differences (see Table 1 for descriptive statistics).
Table 1. Frequency of anticipatory first fixations to one of the two paths for each age group and trial, listed for the initial analysis and the post-hoc analysis
Note: First fixations in percentages.
First trial: Frequency of anticipatory looking
A chi-square test with location of fixation as dependent variable (to one of the paths or elsewhere) yielded no significant difference between age groups, χ²(2, N = 65) = 2.81, p = .246, ΦCramer = .21. This analysis shows that the stimulus material elicited a comparable amount of first fixations on the short and long path in the first trial in all age groups.
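The reported effect size can be checked against the test statistic with the standard formula for Cramér's coefficient, V = sqrt(χ² / (N · min(r − 1, c − 1))). The sketch below assumes a 3 × 2 contingency table (three age groups by two fixation locations), which matches the reported 2 degrees of freedom. The function name is hypothetical.

```python
import math

def cramers_v(chi2, n, n_rows, n_cols):
    """Cramér's V for an n_rows x n_cols contingency table with n observations."""
    k = min(n_rows - 1, n_cols - 1)
    return math.sqrt(chi2 / (n * k))

# Values reported in the text: chi-square = 2.81, N = 65.
# Assumed table shape: 3 age groups x 2 fixation locations.
v = cramers_v(2.81, 65, 3, 2)
print(f"{v:.2f}")  # 0.21
```

The result agrees with the reported value of .21.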
First trial: First fixations
Figure 2 displays mean scores of anticipatory first fixations for each age group and trial. Of particular interest was whether (a) age groups differed in their anticipation to the short path in the first trial, and (b) participants systematically predicted that the agent would appear on the short path, indicated by a mean first fixation score above chance level. To this end, we performed a one-way ANOVA with first fixation score as dependent variable and the factor age (5-year-olds, 15-year-olds, adults). This analysis showed no significant difference between the age groups, F(2,62) = 1.25, p = .295,

Figure 2. Mean first fixation scores (±SEM) of anticipatory first fixations for each age group and trial of the initial analysis. Fixations either on the short or the long path during the test phase were the basis for this score. A positive score indicates a looking bias towards the short path, a negative score a bias towards the long path. The horizontal line indicates chance level.
Given that the null result of this one-sample t test across all groups is a critical basis for our conclusions, we examined this result by estimating the Bayes factor in favour of the null hypothesis, adhering to the procedure described by Rouder, Speckman, Sun, Morey, and Iverson (2009). The resulting estimated Bayes factor (null/alternative) suggested that the data are 8.44 times more likely to occur under a model without a looking bias towards the short path in the first trial. In other words, this Bayes factor yields substantial evidence (Wetzels et al., 2011) for the null hypothesis, supporting the classical t test finding.
First trial: Differential looking score
Consistent with the first fixation analysis, the one-way ANOVA with DLS in the first trial as dependent variable and the factor age (5-year-olds, 15-year-olds, adults) revealed no significant difference between age groups, F(2,62) = 2.91, p = .062,
Repeated trials: Frequency of anticipatory looking
To analyse the overall amount of anticipatory looking in the repeated presentation of the trials, we averaged the score for first fixations on either path over the last three trials. A one-way ANOVA with this score as dependent variable and the factor age (5-year-olds, 15-year-olds, adults) indicated no significant group difference, F(2,62) = 0.27, p = .766,
Repeated trials: First fixations
To assess the impact of frequency information on action anticipation, we calculated a mean first fixation score averaged over the last three trials. A one-way ANOVA with this averaged score as dependent variable and the factor age (5-year-olds, 15-year-olds, adults) revealed a significant difference between groups, F(2, 62) = 9.85, p < .001,
Repeated trials: Differential looking score
The one-way ANOVA with mean DLS over the second, third, and fourth trial as dependent variable and the factor age (5-year-olds, 15-year-olds, adults) also showed a differential effect of repeating the trials on the age groups, F(2, 62) = 18.28, p < .001,
Post-hoc analysis
Even though participants were familiarized with the setup, one could argue that our analysis underestimates participants’ anticipations of the agent's action: to get to the lettuce as fast as possible, the turtle could have taken the most direct route across the green background without using one of the paths. To account for this, we re-ran the analysis with an additional AOI covering the area on the right-hand side of the occluder between the two paths (Figure 1E). A trial with an anticipatory first fixation to the efficient path or to the area between the paths was coded as 1, a trial with an anticipatory first fixation to the inefficient path was given the value −1. If fixations were directed elsewhere on the screen during the test phase but not on one of the AOIs, the trial was coded as 0. It is important to note that in this analysis the probability for a fixation to be coded as “efficient” is higher than the probability to be coded as “inefficient”. Thus, this post-hoc analysis is biased to classify predictive eye movements as in line with the principle of rational action. The amount of missing data and their treatment were identical to the initial analysis. Supplementary Tables 1 and 2 provide descriptive statistics for anticipatory first fixations and the DLS of the post-hoc analysis.
First trial: Frequency of anticipatory looking
In the post-hoc analysis, we also computed the overall frequency of anticipatory looking (see Table 1). Again, the age groups did not differ in the frequency of anticipatory looking, χ²(2, N = 65) = 0.89, p = .641, ΦCramer = .12. The age groups thus showed comparable amounts of first fixations in the more liberal analysis as well.
First trial: First fixations
Figure 3 shows mean scores of anticipatory first fixations in the post-hoc analysis for each age group and trial. One aim of this more liberal post-hoc analysis was to check whether (a) our age groups would now differ in first fixations on the short path and (b) these first fixations now differ significantly from chance, indicating efficient action predictions already in the first trial. A one-way ANOVA with first fixation score as dependent variable and the factor age (5-year-olds, 15-year-olds, adults) again revealed no significant difference between the age groups, F(2, 62) = 0.899, p = .412,

Figure 3. Mean first fixation scores (±SEM) of anticipatory first fixations for each age group and trial of the post-hoc analysis. This analysis included an additional area of interest (AOI) between the two paths. Fixations on this AOI were also considered to reflect an anticipation of an efficient action. Thus, this analysis is biased to classify predictive eye movements as in line with the principle of rational action. A positive score indicates a looking bias towards the short path, a negative score a bias towards the long path. The horizontal line indicates chance level.
Although statistically not justified (due to the lack of a group difference in first fixations in the first trial), we computed one-sample t tests separately for each age group to accommodate the important theory-motivated question whether participants in different age groups anticipate efficient actions without prior frequency information. This analysis demonstrated a lack of looking bias towards the short path in 5-year-olds, t(22) = 0.75, p = .459, Cohen's d = 0.16, and in 15-year-olds, t(19) = 1.30, p = .211, Cohen's d = 0.29. However, the adults’ first fixation score was significantly different from zero, t(21) = 2.81, p = .010, Cohen's d = 0.60, indicating that—at least in the more liberal analysis—adults showed a looking bias towards the short path as early as in the first trial. It is important to note that this result has to be interpreted with caution, given the lack of a significant group difference (see also discussion).
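For a one-sample t test, Cohen's d can be recovered from the test statistic as d = t / sqrt(n), with n = df + 1. The following sketch applies this standard identity to the values reported above as a consistency check. The function name is hypothetical, and the group sizes are taken from the Participants section.

```python
import math

def cohens_d_one_sample(t, n):
    """Cohen's d for a one-sample t test against zero: d = t / sqrt(n)."""
    return t / math.sqrt(n)

# (group, reported t value, sample size n = df + 1)
reported = [
    ("5-year-olds", 0.75, 23),
    ("15-year-olds", 1.30, 20),
    ("adults", 2.81, 22),
]

effect_sizes = {group: cohens_d_one_sample(t, n) for group, t, n in reported}
for group, d in effect_sizes.items():
    print(f"{group}: d = {d:.2f}")
# 5-year-olds: d = 0.16
# 15-year-olds: d = 0.29
# adults: d = 0.60
```

The recomputed values match the effect sizes reported for the three age groups.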
First trial: Differential looking score
The DLS for the post-hoc analysis was calculated by subtracting the total duration of all fixations on the long path from the sum of total duration of fixations on the short path and on the area between the paths. This was divided by the sum of total duration of all fixations on the short and the long path, as well as on the area between the paths. The one-way ANOVA with DLS in the first trial as dependent variable and the factor age (5-year-olds, 15-year-olds, adults) showed again no significant difference between age groups, F(2, 62) = 1.62, p = .206,
Repeated trials: Frequency of anticipatory looking
In the post-hoc analysis, a one-way ANOVA with the averaged score for first fixations on either path over the last three trials as dependent variable and the factor age (5-year-olds, 15-year-olds, adults) again revealed no significant difference between the age groups, F(2, 62) = 0.87, p = .423,
Repeated trials: First fixations
We again calculated a mean first fixation score averaged over the last three trials. The one-way ANOVA with this averaged score as dependent variable and the factor age (5-year-olds, 15-year-olds, adults) showed a significant difference between the groups, F(2, 62) = 7.05, p = .002,
Repeated trials: Differential looking score
We again performed a one-way ANOVA with mean DLS over the second, third, and fourth trial as dependent variable and the factor age (5-year-olds, 15-year-olds, adults). Parallel to the first fixation analysis, this ANOVA revealed a significant age group difference, F(2, 62) = 13.73, p < .001,
Discussion
Both gaze measures, anticipatory first fixations and the differential looking score, suggest that 5-year-olds, 15-year-olds, and adults do not systematically visually anticipate an agent's action in a novel situation: Although information about the agent's goal and situational constraints was available, participants at all ages did not use this information to systematically direct predictive saccades towards the short path congruent with the agent's efficient action in the first trial. Instead, it was the repeated presentation of the agent's action (i.e., frequency information) that drove visual anticipations of the efficient action above chance level in 15-year-olds and adults. Interestingly, the 5-year-olds showed no looking bias towards the short path in the repeated presentation of the trials. This age difference in the utilization of frequency information is addressed in the General Discussion.
Importantly, our analysis showed that participants did, indeed, perform anticipatory eye movements already in the first trial. This rules out the concern that participants may simply have kept fixating the occluder because they were unsure whether the agent would reappear. Instead, participants tried to predict the path the agent would take already in the first trial.
It is noteworthy that the AOI placement in the initial analysis could have underestimated participants’ anticipation of the agent's efficient action. To take into account that participants might have anticipated that the agent would reappear in the space between the two paths, and thus predicted that the agent would take the most direct route, we performed a post-hoc analysis including an AOI covering this zone. In this analysis, adults anticipated at above chance level that the agent would take the short path. However, this finding should be interpreted with caution: in this additional analysis, rather ballistic eye movements that merely echo the agent's horizontal trajectory would also have been coded as predictive (i.e., as false positives).
Even though we have good reason to assume that all participants should have easily been able to understand the protagonist's goal given that the turtle verbally stated his goal several times and his goal-directed behaviour was demonstrated in the introductory movies (similar to recent studies with much younger participants; e.g., Krogh-Jespersen et al., 2015; Paulus, 2011), one could have wished for direct evidence that participants did indeed understand the protagonist's goal. To this end, we decided to run a second experiment. Experiment 2 closely followed Experiment 1, with the difference that the test trials were replaced by explicit questions in which children were asked to name the protagonist's desire and goal. Moreover, we also added an additional test question in which we explicitly asked participants which path the protagonist would be going to take. Given findings that verbal responses and anticipatory gaze shifts do not necessarily converge to the same results (e.g., Clements & Perner, 1994), this question allowed us to examine whether participants would be able to reason explicitly about the efficiency of another's future actions.
We decided to examine a group of 5-year-old children, as this was the youngest age group examined in our first experiment. We reasoned that if this age group was able to solve the task, there should be no doubt that the older participants would also be able to do so.
Experiment 2
In Experiment 2 we presented the identical introductory movies and the first trial to 5-year-old children. In the anticipatory period of the first trial (i.e., when the protagonist disappeared under the occluder), we paused the movie and asked the children explicitly about (a) the agent's desire, (b) the agent's goal, and (c) which path the agent would take.
Method
Participants
Twenty-one 5-year-olds (M = 5.3, SD = 0.25, 11 female) took part in Experiment 2. The participants were recruited via birth records and from a local kindergarten and received small gifts for their participation. The caregivers provided informed written consent. The study was approved by the local ethics committee and was conducted according to the principles set out in the Declaration of Helsinki.
Stimuli and procedure
The stimulus material and procedure were identical to Experiment 1, except that after the two introductory movies only the first trial of the test movie was shown (upward or downward orientation of the paths was counterbalanced between participants). The child sat on a chair in front of a desk. Stimuli were presented on a laptop on that desk. The test movie was stopped right after the turtle had completely disappeared under the occluder. Following a standardized protocol, the experimenter asked three questions. The first question was, “What does the turtle want?” The second question was, “Where does the turtle want to go?” The third question was, “Which path will the turtle take?” The experimenter noted the participant's answers. Additionally, the test sessions were videotaped.
Data analysis
For the first question, all answers referring to lettuce were coded as correct (examples of the children's answers are “lettuce”, “eat lettuce”, or “wants to eat the lettuce”). For the second test question, all answers describing the turtle's desired goal state (i.e., getting to the lettuce behind the branch point) were coded as correct (examples: “to the lettuce”, “there, to the lettuce”, “to the new lettuce”). For the third test question, verbal responses referring to the short path (examples: “the short one”, “the shortcut”) and unambiguous pointing to the short path were coded as correct responses. If children did not give a verbal response directly, they were encouraged to point at the screen.
Results and discussion
The first question about the agent's desire was answered correctly by all 5-year-olds. For the second question about the agent's desired goal state, 16 out of 21 children gave a correct answer. In the 5 responses coded as incorrect, the children referred to the path instead of the lettuce (n = 3; e.g., “follows the path”), gave no answer (n = 1), or stated that they did not know (n = 1). For the third question about the agent's upcoming action, 16 out of 21 children correctly referred to the short path (significantly different from chance, p = .027, binomial test, two-tailed; chance level of .5 based on the fact that there were two paths available). Interestingly, one child gave the answer “It will walk straight ahead, on the grass”. This verbal response matches the rationale of the post-hoc analysis of Experiment 1 and was coded as a correct answer predicting the agent's efficient action (taking the most direct route to the lettuce is rational and very efficient). In the 5 wrong answers, the 5-year-olds referred to the long path either by verbally stating “the long path” or by pointing at it. Overall, wrong answers were given unsystematically: the 5 children who answered the third question incorrectly gave a correct answer to the second test question, and vice versa. No child answered both the second and the third question incorrectly.
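The reported significance level for the third question can be checked with a short calculation. The following is a minimal sketch, not the authors' analysis script: it assumes an exact two-tailed binomial test against a chance level of .5, and the function name `binomial_two_tailed` is hypothetical.

```python
from math import comb


def binomial_two_tailed(successes: int, n: int) -> float:
    """Exact two-tailed binomial test against chance = .5.

    Assumption: because the null distribution is symmetric at p = .5,
    the two-tailed p-value is twice the smaller one-sided tail (capped at 1).
    """
    total = 2 ** n
    upper_tail = sum(comb(n, k) for k in range(successes, n + 1)) / total
    lower_tail = sum(comb(n, k) for k in range(0, successes + 1)) / total
    return min(1.0, 2 * min(upper_tail, lower_tail))


# 16 of 21 children referred to the short path (third question).
p_value = binomial_two_tailed(16, 21)
print(f"p = {p_value:.3f}")  # p = 0.027
```

Under these assumptions the calculation reproduces the reported p = .027 for 16 correct answers out of 21.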
Findings from Experiment 2 confirmed our assumption that participants in Experiment 1 were provided with enough information about the agent's desire, goal, and situational constraints. Impressively, all 5-year-olds understood the agent's desire for lettuce. Furthermore, children were able to explicitly refer to the agent's goal and to predict the agent's action. This is in line with prior findings that at around 5 years of age children are able to reason explicitly about others’ goals and to predict their subsequent actions (see e.g., Perner & Roessler, 2012). We discuss further implications of these findings in the General Discussion.
General Discussion
The present work investigated the processes subserving people's anticipatory gaze shifts during action perception. We introduced an agent to the participants who repeatedly stated the intent to reach a goal as fast as possible. Subsequently, to get to the goal, the agent could either take a short (efficient) or a long (inefficient) path. Analyses of participants’ anticipatory eye movements revealed two important findings.
First, contributing to a long-standing dispute about teleological reasoning in human children and adults (e.g., Bíró, 2013; Paulus, 2012b; Skerry et al., 2013), results showed that participants in none of the age groups visually predicted above chance level that the agent would produce the more efficient action in a novel situation. Although the agent expressed the intent to get to the goal as fast as possible, and the goal and situational constraints to achieve it were evident in the scene, participants did not spontaneously visually anticipate that the protagonist would take the shorter of two available paths. Second, the analyses revealed an impact of frequency information on proactive eye movements, supporting claims about the role of statistical learning in action prediction (e.g., Paulus, Hunnius, van Wijngaarden, et al., 2011; Ruffman et al., 2012).
Overall, the current results are highly relevant for recent theoretical debates on the mechanisms subserving (the development of) human action processing and action understanding (e.g., DeBruin & Newen, 2012; Springer et al., 2011; Uithol & Paulus, 2014). Recent evidence suggests that frequency learning plays a role in adults’ and infants’ action predictions (Boseovski & Lee, 2006; Paulus, Hunnius, van Wijngaarden, et al., 2011; Ruffman, 2014; Ruffman et al., 2012). To test the influence of frequency information on visual action prediction, we repeated the agent's efficient action three times. This accumulation of experience with the agent's action over the trials led to an increase in predictions of this action in 15-year-olds and adults. These results suggest that these age groups predicted the agent's action based on frequency information rather than on a principle of rational action. Notably, this is also in line with findings from a recent recurrent connectionist network approach (van Overwalle, 2010), which suggests that efficiency considerations can emerge from learning about trajectories without the need for a priori assumptions about efficiency.
Interestingly, frequency information affected the age groups differently: 5-year-olds showed a significantly weaker looking bias towards the short path than the 15-year-olds and adults, who did not differ from each other in their action predictions across all trials. Furthermore, 5-year-olds showed no above-chance looking bias towards the short path. It seems that the ability to exploit frequency information about observed actions to predict future actions might still develop after 5 years of age. This suggests that rapid statistical learning about contingencies in movements and actions, a mechanism we use from very early on (Kirkham, Slemmer, & Johnson, 2002; Ruffman et al., 2012; Saffran et al., 1996; Smith & Yu, 2008), continues to develop during childhood.
This developmental trend could also partly be explained by the fact that in the second trial the 5-year-old children showed a tendency to anticipate the long path. How can this phenomenon be explained? Note that we counterbalanced the position of the two paths across trials. This means that the 5-year-old children's eye movements went in the direction where the short path (which was taken by the agent) had been placed in the previous trial. When we reversed the position of the paths in the second trial, the younger children tended to anticipate that the agent would go in the same direction rather than take the short path. This might indicate that in preschool children anticipatory eye movements are mainly driven by learning about frequent trajectories of an agent rather than learning about the frequency of particular paths. In contrast, our older participants relied on the frequency of particular paths. If so, our results would be highly informative for recent theoretical debates on how frequency learning contributes to social understanding, and how frequency learning might change in the course of development (Ruffman, 2014).
Notably, results from Experiment 2 ruled out the alternative explanation that participants were not provided with sufficient antecedent information to be able to form an expectation about the agent's action. More concretely, it demonstrated that the majority of children clearly understood the agent's desire and desired goal state.
Interestingly, in the second experiment 5-year-olds not only understood what the agent's goal was, but, when verbally asked to do so, also explicitly predicted that the agent would choose the more efficient path. In other words, preschool children can already predict that others’ future actions follow efficiency considerations, but only when they are verbally asked to reflect on others’ actions. This dissociation between implicit and explicit action prediction relates to research on Theory of Mind (ToM) development that demonstrates the existence of two dissociable systems to predict other people's actions based on their beliefs (e.g., Schuwerk, Vuori, & Sodian, 2014; Senju et al., 2009; Southgate, Senju, & Csibra, 2007). Notably, however, in contrast to these previous findings in the ToM domain that claimed earlier competences in implicit measures (e.g., Southgate et al., 2007), our study revealed greater competence in the explicit, verbal measure.
Our findings thus give rise to the speculation that there are at least two distinct and rather poorly integrated mechanisms for action prediction: on the one hand, explicit action prediction that relies on verbal and cognitively effortful processing and, on the other hand, spontaneous action prediction that relies on associative and rather automatic processes. Our results could thus suggest that action prediction based on efficiency considerations is an effortful and cognitively demanding process that might rely on verbal reasoning, whereas statistical learning could have an impact on both systems. This dissociation is in line with two-system accounts of social cognition and behaviour that stress the differentiation between impulsive and reflective processes (e.g., Strack & Deutsch, 2004). We have to leave it to future research to explore this possibility in greater detail.
Interestingly, our results converge with recent findings by Ambrosini et al. (2015), who assessed anticipatory gaze behaviour and explicit action predictions while participants watched videos of an agent reaching to grasp one of two objects. This study showed that proactive visual attention allocation (a) flexibly exploits various social cues, such as gaze or hand pre-shape, (b) is distinct from explicit action prediction, and (c) is modulated by frequency information. Together with this study, our findings add to the mounting evidence characterizing the mechanisms that underlie implicit and explicit action prediction.
In sum, our study suggests that 5-year-olds, 15-year-olds, and (to some extent) adults do not visually predict others’ actions based on a priori assumptions that the agent will act efficiently to reach the goal. Rather, our findings provide evidence for the impact of statistical learning on visual action prediction.
