Abstract
Findings from previous cross-sectional studies showed that while toddlers around their first birthday imitate selectively, that is, they systematically omit some kinds of target action steps or they copy only the goal, but not the means of the modeled actions, older toddlers imitate more exactly. The aim of the present article is to provide longitudinal evidence for this developmental trend and to investigate how imitation of different kinds of target action steps contributes to inter-individual differences in overall imitation performance. The present analysis of longitudinal deferred imitation data contrasted toddlers’ imitation of functional and relevant (FURE) versus arbitrary and irrelevant (ARIR) target action steps at the ages of 18 and 24 months. The results show that the difference between the imitation rates of these two kinds of target action steps decreased with age, supporting the developmental trend from selective towards more exact imitation. In addition, findings of the present analyses point to the prominent role of toddlers’ imitation of arbitrary and irrelevant target action steps in shaping inter-individual variability of overall deferred imitation performance.
Imitation is one of the central processes of early social-cognitive development, and its origins and mechanisms have been investigated widely in the past four decades. A seminal finding on newborns’ imitation of facial expressions (Meltzoff & Moore, 1977) posed the intriguing question of how visual information is translated into a motor program (the correspondence problem; e.g., Heyes, 2015), which inspired a theoretical debate about whether imitation has inborn origins (Meltzoff, 2005; Meltzoff & Moore, 1997) or whether it is a result of general learning processes (Heyes, 2015; Heyes & Ray, 2000; Oostenbroek et al., 2016). The correspondence problem has less relevance for the imitation of object-directed actions, where children can see both the model’s and their own hands, and the imitated actions are built up of familiar elements applied in novel action–object relations (Csibra, 2008; Heyes, 2015). Imitation of such actions is assumed to rely on different memory and reasoning processes outside the scope of the correspondence problem (Subiaul, Anderson, Brandt, & Elkins, 2012). These memory and reasoning processes have been investigated in two lines of research: Imitation research focusing on memory processes uses deferred imitation tests to assess the number of target actions infants can retain and recall following various delay intervals, and research focusing on reasoning (action interpretation) processes usually uses immediate imitation tests to assess which parts or aspects of the target actions infants imitate under various circumstances. Findings of both of these lines of research show substantial age-related changes in imitation performance in the second year of life.
Developmental changes in toddlers’ imitation
Already at 6 months of age, infants are able to imitate object-directed actions after a delay of 24 hours (e.g., Barr, Dowden, & Hayne, 1996). At this age, the number of imitated target action steps is limited to one or two, but infants’ deferred imitation performance improves rapidly and becomes robust in the second year of life. Due to developing memory capacities, older infants are able to retain and recall target actions following fewer demonstrations (Barr et al., 1996) and longer delay intervals (Barr & Hayne, 2000; Herbert & Hayne, 2000), and to recall more target action steps than are younger infants (Kolling & Knopf, 2015). Parallel to the quantitative improvement of deferred imitation performance, a qualitative change has also been described. Younger toddlers have been repeatedly found to imitate selectively, while older toddlers imitate more exactly. For example, 12-month-olds were found to imitate only functional target actions, i.e., actions that require specific object properties, while 18-month-olds imitated also arbitrary ones, i.e., actions that could be performed with various kinds of objects (Óturai, Kolling, Rubio Hall, & Knopf, 2012). In another study, 12-month-olds only copied the goal of the target action, while 18-month-olds also copied the specific action when it was demonstrated by a model, but not in a “ghost condition” in which the objects seemed to be moved by invisible hands, and 24-month-olds copied the specific action in both conditions (Tennie, Call, & Tomasello, 2006). Bauer and Mandler (1989) also showed that the frequency of exact reproduction of a causally ordered action sequence containing an irrelevant step increases substantially from 19 to 25 months of age. At the age of 2 years, children often imitate all aspects of the model’s actions, regardless of the efficiency of these actions in obtaining the action goal (Call, Carpenter, & Tomasello, 2005; Nagell, Olguin, & Tomasello, 1993). 
Both of these changes, the quantitative increase in the number of imitated target action steps and the qualitative shift towards more exact imitation, i.e., imitating more different kinds of target action steps, lead to higher overall imitation rates with increasing age. Thus, investigation of different degrees of selective versus exact imitation at different ages is also relevant for memory-oriented deferred imitation research (Óturai et al., 2012). Therefore, the aim of the present study is to investigate the developmental trend from selective towards exact imitation in a longitudinal deferred imitation design.
The role of the social context
Various theories propose the emerging importance of the social context as an explanation for the developmental trend from selective towards exact imitation. Several studies have shown that if the model is acting socially, toddlers from the age of 16–18 months start to imitate unnecessary, irrelevant or ineffective action steps (the kinds that 1-year-olds usually omit from their target action reproduction), and 24-month-olds imitate these action steps even if the model is not acting socially (Brugger, Lariviere, Mumme, & Bushnell, 2007; Kotova, Yudina, & Kotov, 2014; Nielsen, 2006). Additionally, with the model’s social cues held constant, 18-month-olds whose gaze patterns indicated higher levels of involvement in the interaction with the model were found to imitate more exactly than their peers who showed lower rates of involvement (Óturai, Kolling, & Knopf, 2013). According to Uzgiris (1981, see also Nielsen, 2006), a shift from cognitive to social motivations accounts for the age-related difference in imitation performance. Gergely (2003) explains the developmental trend with a change from a teleological action interpretation, in which toddlers interpret the action in terms of its goal and situational constraints, to a mentalizing interpretation, in which toddlers interpret the model’s communicative cues as an intention to teach them something relevant. Although they presume different mechanisms, both of these explanations imply that the degree of selective versus exact imitation is constrained by developmental changes. Over and Carpenter (2012) also argue for the role of the social context in imitation. However, their account implies the importance of situational rather than developmentally constrained factors, namely social and learning goals, identification with the model, and social pressure.
Although they do not deny that the role of these factors might also change developmentally, they rely on studies in which these factors were experimentally manipulated, thus pointing to their situation-dependent nature (e.g., different games before the demonstration phase elicited different learning motivations).
Inter-individual differences
Both deferred imitation performance and the degree of selective versus exact imitation show high inter-individual variability in the second year of life. Inter-individual differences in deferred imitation performance were pronounced at the age of 18 months but evened out by the age of 24 months, and they were found to be related to self- and receptive language development (Kolling, Goertz, Frahsek, & Knopf, 2010). Additionally, former studies showed that as a group, 18-month-olds’ imitation is neither completely selective, nor completely exact (Nielsen, 2006; Óturai et al., 2012; Tennie et al., 2006). It has been suggested that such mixed imitation styles within the same age group can stem from substantial individual differences (Yu & Kushnir, 2015). In fact, about half of the 18-month-olds in the study by Óturai, Kolling, and Knopf (2013) imitated selectively (i.e., only functional target actions), while the other half imitated more exactly (i.e., both functional and arbitrary target actions). These two groups did not differ in their imitation rates of functional target actions, which indicates that inter-individual differences in this study resulted from toddlers’ different positions on the selective-to-exact imitation scale rather than from general imitative ability. Contrary to this, 24-month-olds have been reported to imitate exactly (Nagell et al., 1993; Nielsen, 2006; Tennie et al., 2006), which suggests that inter-individual variability of selective versus exact imitation also decreases with age.
The present study
The main goal of the present study is to provide longitudinal evidence for the developmental trend towards exact imitation between the ages of 18 and 24 months by analyzing toddlers’ imitation of functional and relevant versus arbitrary and irrelevant target action steps. Additionally, we investigated how different kinds of target action steps contribute to inter-individual differences in overall deferred imitation performance. The choice of age groups was motivated by previous findings showing that although 18-month-olds have already moved from entirely selective towards more exact imitation, their performance is still different from the exact imitation of 24-month-olds. As these findings stem from studies using only a few, similar test items (e.g., Nielsen, 2006), the question arises whether a more detailed assessment, using multi-item tests, would lead to the same conclusion.
The present study is a secondary analysis of longitudinal deferred imitation data that were collected in the Frankfurt Memory Study (Kolling & Knopf, 2015). In order to analyze selective versus exact imitation, the original imitation data were recoded according to the functionality and goal-relevance, respectively, of target action steps. Former studies have shown that toddlers’ imitation is guided by both the functionality (Óturai et al., 2012, 2013) and the goal-relevance of target action steps (Brugger et al., 2007), and that more exact imitation is characterized by the imitation of both kinds of target action steps, instead of the selective imitation of functional or relevant ones, respectively. Imitating only functional and relevant (FURE) target action steps can be considered selective imitation, while also imitating arbitrary and irrelevant (ARIR) target action steps can be considered more exact imitation. More specifically, we see selective versus exact imitation as the two end points of a dimension, where selective imitation means that only FURE target action steps are imitated, and exact imitation means that both FURE and ARIR target action steps are imitated, and their imitation rates do not differ significantly. In other words, FURE action steps will be imitated regardless of the degree of selective versus exact imitation, but ARIR action steps will be imitated only by children who do not imitate completely selectively. Thus, FURE action steps can differentiate among children only according to memory performance, whereas ARIR action steps can tell us something both about memory performance and about selective versus exact imitation.
Based on previous findings of cross-sectional studies (Bauer & Mandler, 1989; Nielsen, 2006; Tennie et al., 2006), we expected to find a developmental trend towards exact imitation between the ages of 18 and 24 months. Specifically, our hypotheses were that, first, imitation rates of ARIR target action steps would be higher at 24 months than at 18 months of age. Additionally, we expected the difference between the imitation rates of FURE versus ARIR target action steps to decrease with age (developmental trend hypothesis). Second, we expected the developmental trend to be largely consistent among toddlers, underlining the role of developmental changes in shaping different degrees of selective versus exact imitation (consistency hypothesis, see Gergely, 2003; Nielsen, 2006; Uzgiris, 1981). Third, in line with an earlier finding (Óturai et al., 2013), we expected to find higher inter-individual variability according to the imitation of ARIR target action steps than of FURE target action steps (variability hypothesis).
Method
Participants
The data analyzed in this article were collected for the Frankfurt Memory Study, a longitudinal study assessing, among others, the development of declarative memory by age-adapted deferred imitation tests (Kolling & Knopf, 2015). Participants were children from German middle-class families from a metropolitan area. Although some of the children were raised as bilinguals, all of them had German as one of their main languages. Parents of the participating children were informed about the rationale and procedure of the study, and they signed consent forms. The initial sample consisted of N = 89 healthy, typically-developing children who were recruited via radio announcements and advertisements in child care centers and pediatricians’ offices. Data from four children, who did not complete the test at both measurement occasions, were excluded from the present analysis. The final sample thus consisted of N = 85 toddlers (38 girls and 47 boys), with a mean age of M = 18.1 months (SD = .25) at the first measurement occasion, and M = 24 months (SD = .29) at the second measurement occasion (interval between measurement occasions M = 5.9 months, SD = .38, min = 4.8, max = 6.9).
Material and target actions
The Frankfurt Imitation Tests for 18-month-old and 24-month-old children (FIT 18 and FIT 24) were developed in a larger longitudinal study (Frankfurt Memory Study). The FIT 18 consists of six items and a total of 12 object-directed target action steps, and the FIT 24 consists of eight items and a total of 29 object-directed target action steps (Kolling & Knopf, 2015). For the present analyses, target action steps of both deferred imitation tests were divided into two categories: The FURE (functional and relevant) category consisted of functional action steps of the FIT 18 and relevant action steps of the FIT 24. The ARIR (arbitrary and irrelevant) category consisted of arbitrary target action steps of the FIT 18 and irrelevant target action steps of the FIT 24. Target action steps of the FIT 18 are simple and independent, i.e., not constrained by overall goals. These action steps can be described in terms of functionality and divided into functional and arbitrary action steps. Functional action steps are hereby those that require specific object properties, while arbitrary action steps could be performed on a wide range of objects (Óturai et al., 2012). Contrary to this, the FIT 24 consists of longer actions that are often constrained by an overall goal. In this test, some target actions lead to a goal and others do not, and the action steps can be regarded as either relevant or irrelevant in terms of the overall goal (cf. Horner & Whiten, 2005). Relevant action steps hereby are those that are necessary to reach the goal of an action, and irrelevant action steps are either parts of actions that do not lead to a goal, or the unnecessary steps of actions that have an overall goal. Functional action steps differ from arbitrary ones in the specificity of their relations to the objects, while relevant action steps differ from irrelevant ones in their relations to the overall action goal. 
Nevertheless, these pairs of action steps are subject to the same predictions: functional and relevant action steps will be imitated regardless of whether toddlers imitate selectively or exactly, but arbitrary and irrelevant action steps will be imitated only when toddlers imitate exactly. Thus, the analyses will not involve this nuanced distinction, but they will be based on the composite categories FURE and ARIR. Target objects and action steps are presented in Table 1 (FIT 18) and Table 2 (FIT 24).
Table 1. Objects and target actions of the FIT 18.
Note. a This action step was excluded from the present analyses because it is a necessary part of the second action step and thus in itself neither functional nor arbitrary.
Table 2. Objects and target actions of the FIT 24.
Note. a This action step was excluded from the original analysis because the mannequin often fell out of the box when the box was opened (Kolling & Knopf, 2015).
b The items Rabbit and Magnetic plate included two distractor objects each. Thus, not only did the actions not lead to a goal, but the objects to be attached to the rabbit and the plate, respectively, were also chosen without any apparent reason.
Procedure
Toddlers were tested individually in a small room, where they were seated on their caregivers’ lap at a table, opposite the experimenter. In both tests, the experimenter presented the target actions three times in a social-communicative context, making eye contact with toddlers and saying “Look, [name]! I am going to show you something.” Following a delay of 30 minutes, the model handed the target objects to toddlers in the same order as they were shown in the demonstration phase, and she encouraged them to play. Both sessions were videotaped for subsequent coding of toddlers’ target action performance.
Data coding and analysis
Independent observers coded toddlers’ target action performance from the videotapes according to pre-defined operational definitions (yes/no decision for each target action step). Each videotape was coded by two observers, and all pairs of observers reached good inter-rater reliability (smallest κ = .87, Goertz, Kolling, Frahsek, & Knopf, 2008). Toddlers’ target action performance both at 18 and at 24 months of age was significantly above spontaneous target action performance of baseline control groups, thus target action performance in the longitudinal study can be interpreted as deferred imitation performance (Kolling & Knopf, 2015, Figure 1). For the purposes of the present analyses, each target action step was assigned to the FURE (functional and relevant) or the ARIR (arbitrary and irrelevant) category in agreement by the authors, based on theoretical considerations as described above.

Figure 1. Mean imitation rates of arbitrary / irrelevant and functional / relevant target action steps in the FIT 18 and the FIT 24 (optimum: 100%, N = 85). Error bars indicate standard deviations.
The present analyses were based on four main variables: number of imitated FURE action steps in the FIT 18 (5 action steps, Cronbach’s α = .12), number of imitated ARIR action steps in the FIT 18 (6 action steps, Cronbach’s α = .49), number of imitated FURE action steps in the FIT 24 (12 action steps, Cronbach’s α = .62), and number of imitated ARIR action steps in the FIT 24 (16 action steps, Cronbach’s α = .54). As the number of target action steps differed across both action step kinds and tests, percentages instead of raw sum scores were used as dependent variables, whereby the maximum value on each variable was 100%, which corresponded to the number of target action steps of a given kind (FURE or ARIR) modelled in a given test (FIT 18 or FIT 24). For the sake of easier readability, we will refer to this proportional imitation rate as imitation rate throughout the remaining parts of the article.
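As an illustration of the scoring described above, the following minimal Python sketch converts raw sum scores into percentage imitation rates. The step counts per category are taken from the text; the function name `imitation_rate` and the example raw scores are hypothetical and are not part of the original analysis.

```python
# Number of target action steps per test and category, as reported in the text.
N_STEPS = {
    ("FIT18", "FURE"): 5,
    ("FIT18", "ARIR"): 6,
    ("FIT24", "FURE"): 12,
    ("FIT24", "ARIR"): 16,
}


def imitation_rate(raw_score, test, kind):
    """Return a raw sum score as a percentage of the optimum for that category."""
    optimum = N_STEPS[(test, kind)]
    if not 0 <= raw_score <= optimum:
        raise ValueError(f"raw score must lie between 0 and {optimum}")
    return 100.0 * raw_score / optimum


# Hypothetical child: 3 of 6 ARIR steps at 18 months, 8 of 16 at 24 months.
rate_18 = imitation_rate(3, "FIT18", "ARIR")
rate_24 = imitation_rate(8, "FIT24", "ARIR")
print(rate_18, rate_24)  # 50.0 50.0
```

Expressing both tests on the same 0 to 100 scale is what makes the 18-month and 24-month scores comparable despite the different numbers of action steps.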
Because the data were not normally distributed, we used a nonparametric approach throughout the analyses. The effect of gender on the dependent variables was preliminarily analyzed by Mann-Whitney tests. Then, to test the developmental trend hypothesis, ARIR imitation rates as well as FURE minus ARIR difference scores were compared between 18 and 24 months of age by Wilcoxon signed rank tests. The consistency of the developmental trend was assessed based on the difference between the FURE minus ARIR difference score at 18 months and the same difference score at 24 months (a smaller FURE minus ARIR difference score at 24 than at 18 months is consistent with the developmental trend towards exact imitation). Finally, the variability hypothesis was tested by comparing the distances from the mean of the FURE and ARIR imitation scores at the same measurement occasion by Wilcoxon signed rank tests. Missing values were replaced by the item means.
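For readers unfamiliar with the Wilcoxon signed rank test, the paired comparison can be sketched as follows. This is a minimal pure-Python illustration using the normal approximation with a tie correction and dropping zero differences; the software and exact options used in the original analysis are not stated in the text, so the function `wilcoxon_signed_rank_z` and the example data are assumptions for illustration only.

```python
import math


def wilcoxon_signed_rank_z(x, y):
    """Z statistic of the Wilcoxon signed rank test (normal approximation).

    Zero differences are dropped and tied absolute differences receive
    average ranks; the variance includes the usual tie correction.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    tie_term = 0
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied absolute differences.
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean_w = n * (n + 1) / 4
    var_w = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    return (w_plus - mean_w) / math.sqrt(var_w)


# Hypothetical ARIR imitation rates (percent) of five children at 24 and 18 months.
arir_24 = [60.0, 75.0, 50.0, 90.0, 40.0]
arir_18 = [20.0, 50.0, 50.0, 30.0, 45.0]
print(wilcoxon_signed_rank_z(arir_24, arir_18))
```

The sign of Z depends on the order of the two arguments; here a positive value means higher scores in the first sample.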
Results
Preliminary analysis
A series of Mann-Whitney tests showed that toddlers’ gender did not have an effect on their imitation of FURE (functional and relevant) and ARIR (arbitrary and irrelevant) target action steps (smallest p = .179). Thus, gender will not be considered in further analyses. Descriptive statistics of imitation rates of ARIR and FURE target action steps at 18 and 24 months of age are shown in Table 3.
Table 3. Means and standard deviations (percentages), raw means and standard deviations, optimums of raw scores, and relative standard deviations of the imitation rates of arbitrary / irrelevant and functional / relevant target action steps at 18 and 24 months of age.
Note. Raw imitation scores are the sums of target action steps within a given category performed by toddlers in the test phase at a given age (e.g., the number of arbitrary / irrelevant target action steps performed at 18 months). The percentage scores indicate the imitation scores as the percentage of the optimum raw scores (e.g., a percentage score of 50 in case of the arbitrary / irrelevant target action steps at 24 months corresponds to a raw score of 8, that is, 50% of the optimum raw score of 16). N = 85.
Developmental trend
Imitation rates of FURE and ARIR target action steps at the two measurement occasions are shown in Figure 1. Wilcoxon signed rank tests were used to compare the imitation rates of ARIR target action steps, as well as the FURE minus ARIR difference scores at the two measurement occasions. Both differences were significant, showing that toddlers imitated more ARIR target action steps at 24 than at 18 months of age (Z = 6.40, p < .001, r = .69), and that the difference between FURE and ARIR imitation rates was smaller at 24 than at 18 months of age (Z = 6.94, p < .001, r = .75).
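The reported effect sizes are consistent with the common convention r = Z / √N. The formula is not stated in the text, so the following short Python check rests on that assumption; the helper name `effect_size_r` is hypothetical.

```python
import math


def effect_size_r(z, n):
    """Effect size r = |Z| / sqrt(N), an assumed convention (not stated in the text)."""
    return abs(z) / math.sqrt(n)


N = 85  # final sample size reported in the Participants section
print(round(effect_size_r(6.40, N), 2))  # 0.69 (ARIR imitation rate, 24 vs. 18 months)
print(round(effect_size_r(6.94, N), 2))  # 0.75 (FURE minus ARIR difference score)
```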
Consistency of the developmental trend
To analyze the consistency of the developmental trend, a new variable was computed by subtracting the FURE minus ARIR difference score at 24 months from the FURE minus ARIR difference score at 18 months. Positive values of this variable indicate a smaller difference between action step kinds and thus more exact imitation at the age of 24 months than at the age of 18 months, while negative values indicate less exact imitation at 24 months than at 18 months. Descriptive data show that out of the 85 toddlers, 73 had a positive value, 11 had a negative value, and one toddler had zero difference (M = 29.46, SD = 27.01, min = −35, max = 94).
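The computation of this consistency variable can be sketched in a few lines of Python. The function names and the three example children are hypothetical; only the definition of the variable and the counts of 73, 11, and 1 out of 85 toddlers come from the text.

```python
def trend_consistency(fure_18, arir_18, fure_24, arir_24):
    """Per child: (FURE - ARIR at 18 months) minus (FURE - ARIR at 24 months)."""
    return [
        (f18 - a18) - (f24 - a24)
        for f18, a18, f24, a24 in zip(fure_18, arir_18, fure_24, arir_24)
    ]


def tally_signs(values):
    """Count positive, negative, and zero values."""
    return {
        "positive": sum(v > 0 for v in values),
        "negative": sum(v < 0 for v in values),
        "zero": sum(v == 0 for v in values),
    }


# Hypothetical imitation rates (percent) for three children.
scores = trend_consistency(
    fure_18=[80, 100, 60], arir_18=[20, 50, 50],
    fure_24=[90, 90, 70], arir_24=[70, 30, 60],
)
print(scores)               # [40, -10, 0]
print(tally_signs(scores))  # {'positive': 1, 'negative': 1, 'zero': 1}

# Share of toddlers with a positive value in the reported data (73 of 85).
print(round(100 * 73 / 85))  # 86
```

A positive value means that the gap between FURE and ARIR imitation rates shrank between 18 and 24 months for that child.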
Inter-individual variability
Descriptive statistics in Table 3 show that imitation rates of ARIR target action steps at 18 months had by far the largest dispersion, with the relative standard deviation being more than twice as large as the relative standard deviations of imitation rates of FURE target action steps at 18 months or both FURE and ARIR target action steps at 24 months. Wilcoxon signed rank tests on the distances from the mean confirmed that at 18 months, ARIR imitation scores showed a larger variability than FURE imitation scores, Z = 2.33, p = .020, r = .25. At 24 months, the difference was not significant, Z = 1.17, p = .243, r = .13.
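The two dispersion measures used here can be illustrated with a brief Python sketch. It assumes that the relative standard deviation is the sample standard deviation expressed as a percentage of the mean, and that distances from the mean are absolute deviations; neither detail is spelled out in the text, and the example rates are hypothetical.

```python
import statistics


def relative_sd(rates):
    """Relative standard deviation in percent (sample SD divided by the mean)."""
    return 100.0 * statistics.stdev(rates) / statistics.mean(rates)


def distances_from_mean(rates):
    """Absolute distance of each child's imitation rate from the group mean."""
    group_mean = statistics.mean(rates)
    return [abs(rate - group_mean) for rate in rates]


# Hypothetical imitation rates (percent) for three children.
rates = [0, 50, 100]
print(relative_sd(rates))          # 100.0
print(distances_from_mean(rates))  # [50, 0, 50]
```

Each child contributes one distance for FURE and one for ARIR at a given age, and these paired distances are what the Wilcoxon signed rank tests compare.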
Discussion
The present study investigated toddlers’ imitation of FURE and ARIR target action steps at 18 and 24 months of age by analyzing data from a longitudinal deferred imitation study. The main aim of the study was to provide longitudinal evidence for the developmental trend from selective towards exact imitation. We expected to find more exact imitation (i.e., higher imitation rates of ARIR action steps, as well as a smaller FURE minus ARIR difference) at 24 than at 18 months of age (developmental trend hypothesis), and that this difference would be fairly consistent among toddlers (consistency hypothesis). Additionally, we expected the variability of imitation rates of ARIR target action steps to be higher than the variability of imitation rates of FURE target action steps (variability hypothesis). Overall, the results were in line with these hypotheses.
First, the imitation rate of ARIR target action steps increased with age, and the difference between imitation rates of the two kinds of target action steps decreased, supporting the developmental trend hypothesis. This shows that as a group, toddlers imitated more exactly at the age of 24 months than at the age of 18 months, corroborating earlier findings on the developmental trend towards more exact imitation from cross-sectional studies (Bauer & Mandler, 1989; Nielsen, 2006; Tennie et al., 2006). Additionally, this trend was strongly consistent, with 86% of toddlers imitating more exactly at 24 than at 18 months. The strength of the longitudinal design is that it provides more direct evidence on the developmental trend than cross-sectional studies do; especially the high consistency of the finding shows that the majority of toddlers imitate more exactly as they get older. Nevertheless, the fact that the difference between the two imitation rates did not disappear at the second measurement occasion suggests that when presented with different actions on a number of different objects, even 24-month-olds imitate selectively to some extent despite the model’s sociability. Thus, while in studies involving a small variety of target actions selective versus exact imitation might seem like an either-or question, our findings support the idea that it is rather a dimension. “In-between” imitation styles are not only observed on a group level (e.g., 18-month-olds in the study by Nielsen, 2006), but also on an individual level in that 18- and 24-month-olds imitate both kinds, but still more FURE than ARIR target action steps.
Second, at 18 months, imitation rates of ARIR target action steps showed greater variability than imitation rates of FURE target action steps, partially confirming the variability hypothesis. This is in line with a previous finding showing that inter-individual differences at 18 months of age stem from the imitation of arbitrary, but not functional, target actions (Óturai et al., 2013). These findings are consistent with the idea that imitation of FURE action steps is influenced by memory processes, and imitation of ARIR action steps is influenced by both memory processes and the degree of selective versus exact imitation.
To our knowledge, these findings provide the first piece of longitudinal evidence for the developmental trend from selective towards exact imitation in the second half of the second year of life. Additionally, they show that inter-individual variability of overall imitation performance is differently shaped by different kinds of target action steps and toddlers’ selective versus exact imitation. Developing memory capabilities enable toddlers to retain and recall larger numbers of target action steps as they get older. At the same time, a change in their action interpretation schemes (Gergely, 2003) or predominant motivations (Nielsen, 2006; Uzgiris, 1981) results in a qualitative shift in imitation performance. Older toddlers not only imitate more target action steps than younger ones but also imitate the kinds of target action steps that younger toddlers do not, namely action steps that only become meaningful in the social context of the imitation task. Both proposed mechanisms, action interpretations and motivations, predict the same behavioral findings, namely more exact imitation due to the enhanced role of the social context in older than in younger toddlers. Our data do not allow a distinction between these two possibilities, but they strengthen the position that the degree of selective versus exact imitation is constrained by developmental changes rather than by situational factors such as social pressure (Over & Carpenter, 2012). Nevertheless, this conclusion has to be taken with caution, as the longitudinal design of the present study carries the possibility that repeated testing might have influenced toddlers’ behavior. Although the procedure and the model’s communicative cues were identical in the FIT 18 and the FIT 24, we cannot rule out the possibility that children had different understandings of what the model wanted them to do, possibly due to more test experience at 24 than at 18 months.
Future research should compare findings from longitudinal and cross-sectional studies using the same imitation tests to disentangle developmentally constrained changes in imitation behavior from testing effects.
Despite the strengths of a large sample size and the longitudinal design, the present study also has some limitations. Although the FITs are reliable and standardized tests of deferred imitation memory performance (Kolling & Knopf, 2015), the focus of the present article on different kinds of target actions and selective vs. exact imitation is new and was not considered during test development. Thus, the number of FURE and ARIR target action steps was not controlled, which made it necessary to use percentage scores instead of raw imitation scores as dependent variables, leading to some loss of information. In addition, while in the present analyses target action steps were categorized as FURE or ARIR based on theoretical considerations, in future studies it might be worthwhile to assess the inter-rater reliability of this categorization in order to extend the strict psychometric perspective of the FITs as memory tests to the concurrent assessment of selective vs. exact imitation. Additionally, because of the near-ceiling performance on some FURE action steps in the FIT 18, we cannot exclude the possibility that the prominent role of ARIR target action steps in the inter-individual variability of imitation performance relied on characteristics of the test instead of theoretical constructs. However, as ARIR target action steps are supposed to be more difficult according to theories of selective and exact imitation, this finding reinforces the idea that characteristics of target action steps and how they contribute to overall (deferred) imitation performance at different ages should be taken into consideration in future test development.
Toddlers’ developing imitation performance is a result of different underlying changes, such as improving memory capabilities and changes in action interpretations or motivations. Findings of the present study strengthen the position that different degrees of selective versus exact imitation are constrained developmentally rather than by situational factors, and that this developmental trend interacts with memory development in shaping toddlers’ deferred imitation performance. Future research should extend the strong psychometric approach of deferred imitation memory tests to an assessment of selective vs. exact imitation processes in order to further clarify how these different processes contribute to toddlers’ deferred imitation performance.
Acknowledgements
We would like to thank all participating families, especially the children, for making this work possible.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the German Research Foundation [KN 275/3 -1].
