Abstract
The present article describes a new Persian elicited imitation test (EIT) developed for assessing the overall language proficiency of Persian-speaking preschoolers. It reports a study that investigated the power of the EIT to discriminate children’s linguistic abilities by having them repeat sentences of varying length and morphosyntactic complexity. The study also explored the relationship between children’s performance on the EIT and their free speech. A total of 119 three- to six-year-old Iranian monolingual children participated in the study by completing the EIT and an oral narrative task. Results showed that the new EIT can discriminate children with different levels of language abilities. Moreover, positive correlations were found between children’s scores on the EIT and their performance on the oral narrative task. Results suggest that the EIT provides a reliable measure of overall language development, and it can be effectively used to evaluate children’s language proficiency in various contexts.
Introduction
During the last few decades there has been a resurgence of interest in the use of elicited imitation tests (EITs) for a variety of purposes, including the assessment of language development. For example, in developmental and experimental studies, researchers use EITs as instruments for investigating the early stages of children’s language development (Vinther, 2002). In educational settings, language teachers use them to evaluate their students’ language skills. Additionally, in clinical contexts EITs are often used for identifying children with language problems and for determining the nature and severity of language disorders (Dockrell, 2001).
Some researchers believe that EITs are the best type of measure for identifying children with specific language impairments (Klem et al., 2015). Perhaps part of this popularity is due to the inadequacies associated with other language assessment tools. For instance, to evaluate children’s spontaneous language production, researchers or practitioners need to record and transcribe a substantial number of utterances, which is often time-consuming and labor-intensive. In addition, analyzing children’s free speech is extremely difficult and poses various technical problems such as coding decisions, finding coherent criteria for speech segmentation, and identification of utterance boundaries (Devescovi & Caselli, 2007).
The widespread use of EITs is also due to their high sensitivity to residual language processing weaknesses, which may not be detected through other receptive and productive language tasks (Conti-Ramsden, Botting, & Faragher, 2001; Klem et al., 2015). Some researchers argue that verbal imitation tasks can provide accurate accounts of children’s knowledge of language, as they reflect the degree to which children are able to assimilate the target structures into their internalized language ability (e.g. Gallimore & Tharp, 1981; Munnich, Flynn, & Martohardjono, 1994).
The purpose of the present study was to design an EIT to evaluate three- to six-year-old children’s language abilities. The utility of the test was evaluated by focusing on key developments in children’s ability to repeat sentences of different length and morphosyntactic complexity. In addition, children’s performance on the EIT was compared against their performance on an oral narrative task to ascertain the validity of the test. In this study, only quantitative analyses were conducted on children’s language abilities in order to provide a test that can be quickly administered and scored.
Review of the literature
Theoretical background
An EIT presents participants with a number of aural stimuli that they must repeat as exactly as possible. Normally, as the test progresses, the length of the stimuli increases and more complicated morphosyntactic structures are used (Kim, Tracy-Ventura, & Jung, 2016). The theoretical rationale behind EITs, as delineated by Slobin and Welsh (1968/1973), is that they measure the global linguistic performance of test takers, and in order to be able to repeat the aural stimuli, test takers need to “comprehend and decode the sentence, recall and reconstruct it with their own grammar” (Wu & Ortega, 2013, p. 683). In other words, to imitate the test sentences accurately, participants need to receptively process the stimuli for meaning and productively reconstruct them using their phonetic, semantic, and syntactic knowledge of language (Kim et al., 2016). On this basis, the utterance elicited through an EIT can reflect the degree to which a test taker is able to assimilate the stimulus into his or her language competence (Munnich et al., 1994).
Theoretically, the use of EITs as language assessment tools is supported by the regeneration hypothesis (Lombardi & Potter, 1992; Potter & Lombardi, 1990). According to this hypothesis, the process of repeating a sentence starts from a conceptual (meaning-based) representation of the sentence to be recalled and essentially involves all levels of the language production system (Bock & Levelt, 1994). When prompted with a verbal imitation task, children have to draw on multiple language skills to be able to respond correctly. They first need to attend to the three main components of the sentence (phonetics, semantics, and syntax) in order to decode its linguistic form and understand what it means. Then, they have to draw on many component skills, including encoding of morphological segments and syntactic structures, articulatory planning, and speech production, in order to repeat the sentence (Hagoort & Levelt, 2009; Mehrani, 2011; Mehrani & Peterson, 2017; Moll, Hulme, Nag, & Snowling, 2015). This is consistent with various definitions of language proficiency, which include control of phonemic, vocabulary, and grammatical elements (Bachman, 1990). In the present study, language proficiency is considered as an individual’s ability to listen to verbal stimuli of varying lengths, understand their content, and accurately repeat them using the same grammatical structures.
Some researchers argue that the way children repeat a sentence and the changes that they apply to the original model can provide useful clues about how they process the sentence (e.g. Devescovi & Caselli, 2007). For example, a child may modify the grammatical structure of a sentence but keep its message, thus showing that she or he comprehended it (Slobin & Welsh, 1968/1973). Another test taker may leave out particular grammatical structures, displaying certain parsing problems. Others might be able to follow the grammatical word order of a sentence but display certain phonological deficiencies. On this basis, various deficits in sentence repetition are considered a hallmark of specific language impairment (Conti-Ramsden et al., 2001).
Empirical investigations
There is a growing body of literature in support of Slobin and Welsh’s (1968/1973) theoretical account and the view that sentence imitation is reconstructive and dependent on broad language skills (Klem et al., 2015). For instance, Moll et al. (2015) compared the performances of typically developing children and dyslexic children on a sentence repetition task and found that dyslexic children performed more poorly. In another study, Polisenska, Chiat, and Roy (2015) investigated the effects of different types of long-term linguistic knowledge (including morphosyntax, lexical phonology, semantics, and prosody) on English- and Czech-speaking children’s immediate recall. They found significant effects for all linguistic factors in both languages. Similarly, Moll et al. (2015) demonstrated that various aspects of language competence (morphological, grammatical, and phonological processes) are engaged in repeating sentences. Further evidence in support of this claim comes from studies showing that test takers experience greater difficulties when they are asked to repeat unfamiliar words or phrases (e.g. Gathercole & Baddeley, 1993). Ellis (2001) explains that this is because test takers have not had the opportunity to build up long-term memory representations of those words. He adds that this is why EITs serve so well as measures of language competence.
The potential of EITs as measures of language proficiency is also reported in second language acquisition research (see Jessop, Suzuki, & Tomita, 2007; Vinther, 2002). Ortega, Iwashita, Norris, and Rabie (2002) designed and validated versions of a sentence imitation test in four languages (English, German, Japanese, and Spanish) for cross-linguistic studies investigating syntactic complexity measures and their relationship to general language proficiency. Additional versions of the same test were also developed in Mandarin Chinese (Wu & Ortega, 2013), French (Tracy-Ventura, McManus, Norris, & Ortega, 2014), and Korean (Kim et al., 2016). The results suggest that these EITs can be used as reliable and valid tools to measure the overall linguistic proficiency of language learners.
Some researchers have conducted comparative studies to see if children’s performance on EITs is related to their performance on spontaneous speech tests. Particularly, two studies (Gallimore & Tharp, 1981; Hood & Schieffelin, 1978) compared spontaneous and imitative performances of young children and found that children’s EIT scores and spontaneous speech test scores are related to their language behavior in natural settings. Similarly, Corrigan and Di Paul (1982) reported that sentence repetition tests are most useful in eliciting those relations that tended to be infrequent in spontaneous speech. Overall, these studies suggest that there is a high correlation between children’s scores on well-designed EITs and their scores on spontaneous language tests.
Working memory and EIT performance
Concerns about the validity and reliability of EITs were addressed recently in a narrative review and meta-analysis by Yan, Maeda, Lv, and Ginther (2016). These authors investigated the construct validity of EITs used in 76 studies to measure global language proficiency or certain aspects of language proficiency. Their findings showed that EIT scores can discriminate between individuals with different proficiency levels, suggesting that these tests can be effectively used as measures of language proficiency. However, the literature on EITs as a measure of overall language proficiency is replete with references to the role of working memory in repeating test items. This issue raises concerns about rote repetition, which can be a threat to the validity of EITs (Kim et al., 2016). For example, some researchers have speculated that a test taker might repeat a sentence without understanding it. In particular, they have argued that test takers’ performance on EITs might be mediated by their working memory span, and thus, such tests cannot adequately measure individuals’ language proficiency.
To address this issue, some researchers have attempted to come up with a “magical number” (Vinther, 2002, p. 59) that exceeds working memory span and thus ensures that the stimulus is linguistically processed before it can be imitated. Although some researchers have asserted that adults are able to keep five to seven linguistic chunks in their working memory (e.g. Miller, 1956; Simon, 1974), others have recommended that test designers include stimuli of various lengths in EITs to address the rote repetition criticism (Kim et al., 2016). On the other hand, there are researchers who contend that an individual’s working memory span is not determined by a fixed number of items or chunks (Schweickert & Boruff, 1986), but by his or her stored knowledge of the language (Baddeley, Gathercole, & Papagno, 1998). This conclusion is supported by empirical investigations into the relationship between working memory and verbal imitation tasks. For example, Scott (1994) prompted English–Spanish bilingual and English monolingual participants with a Spanish elicited imitation task. He reported that bilingual participants, who were able to use meaning to aid retention, performed significantly better than their monolingual counterparts. Sachs (1967) and Potter and Lombardi (1990) also conducted similar studies and concluded that memory for sentences is meaning-based rather than form-based.
Additional evidence for the minimal role of working memory in sentence repetition tasks comes from studies that included both correct and incorrect sentences in their test design. In these studies (e.g. Erlam, 2006; Hamayan, Saegert, & Larudee, 1977; Markman, Spilka, & Tucker, 1975; Munnich et al., 1994), participants were presented with both grammatical and ungrammatical sentences and were simply asked to repeat them. The results showed that a significant number of participants spontaneously corrected the ungrammatical sentences without being asked to. The findings of these studies suggest that sentences in EITs are not blindly repeated, but rather they are reconstructed and conceptually processed.
Another line of research that has attempted to address the potential role of working memory in test takers’ performance has focused on the insertion of a pause after the stimulus and before the response (Vinther, 2002). For example, Schweickert and Boruff (1986) pointed out that the capacity of short-term memory is determined by the limited time for which the verbal stimulus endures. Cowan et al. (1992) found that adult language users can “remember as many words as they can pronounce in about 1.5 to 2.0 seconds” (p. 15). Similarly, McDade, Simpson, and Lamb (1982) reported that participants could repeat sentences that they did not understand only when imitation was immediate, but they failed to do so after a 3.0-second pause. More recent studies (e.g. Bowden, 2016; Kim et al., 2016; Ortega et al., 2002) also suggest that the insertion of a short pause (i.e. 2.5 seconds) after each stimulus and before the cue is an effective strategy for minimizing the potential role of working memory in sentence repetition.
A further design element that is used in EITs to diminish the potential effect of working memory is the insertion of a pictorial aid after each test item. Devescovi and Caselli (2007) designed a 51-item EIT for assessing two- to four-year-old Italian children. Each item was accompanied by a picture reproducing its global meaning. These researchers concluded that the use of accompanying pictures in EITs maximizes the likelihood that participants focus on the meaning rather than on the form of the sentences that they hear.
To summarize, the literature suggests that well-designed EITs can be effectively used as a measure of overall language proficiency. To repeat a sentence, participants go through cognitive processes including comprehending the stimulus, reconstructing it with their own internalized knowledge of the language and reproducing it. Compelling research has shown that the capacity of working memory is determined by language users’ internalized knowledge of the language. Therefore, it is assumed that working memory does not influence participants’ responses to EIT items. Yet, the literature suggests effective strategies to ensure that performance on verbal repetition tasks is reconstructive and not rote repetition. These strategies include designing stimuli of varying lengths, inserting a short pause after hearing stimuli and before the response, and using accompanying pictures to reduce participants’ attention to form.
Motivation for the current study
The present study was conducted to design an oral elicited imitation measure in order to evaluate the language development of three- to six-year-old Persian-speaking children. The study specifically focused on Persian-speaking children because investigations show that there are psychometric inadequacies in the existing Persian language assessment tools (Farhady & Tavasoli, 2013). For example, some of the existing measures developed by the National Organization of Educational Testing are often considered unreliable and invalid. These tests have never been pretested, and unfortunately, no written report on the psychometric characteristics of these tests is available to independent researchers (Farhady & Hedayati, 2009). Some other existing tests (e.g. Persian Child Language Assessment Batteries) are costly and lengthy and thus are rarely used by clinicians or psycholinguistic researchers (Hasanpoor, Jalilevand, Masumi, Ghorbani, & Kamali, 2015). The addition of a Persian EIT will, therefore, be an asset to various fields of study in Persian, including speech therapy, linguistics, and first- and second-language acquisition. The choice of the participants’ age range was motivated both by empirical investigations that emphasize the importance of early identification of children with language impairment and by theoretical studies suggesting that children’s linguistic competence for producing multiple-word utterances develops between three and six years of age (Owens, 2016).
The following research questions were raised to broadly guide the process of the study.
To what extent do Persian EIT scores discriminate three- to six-year-old children with different language abilities?
What is the relationship between participants’ performance on the Persian EIT and their performance on an oral narrative task?
Method
Participants
A total of 119 Iranian children in four age-groups participated in this study. There were 26 three-year-olds (12 female and 14 male, age range = 36–47 months, M = 42.1 months, SD = 4.1); 32 four-year-olds (16 female and 16 male, age range = 48–60 months, M = 51.1 months, SD = 3.1); 29 five-year-olds (18 female and 11 male, age range = 62–71 months, M = 66.5 months, SD = 2.9); and 32 six-year-olds (20 female and 12 male, age range = 72–83 months, M = 77.4 months, SD = 3.8). The participants were all typically developing monolingual Persian speakers and were recruited from two kindergartens in Neyshabur, Iran. The sample was typical of a middle socioeconomic class, as shown by the parents’ occupations and levels of education. Six other children also participated in this study, but they were excluded from further analysis because they refused to finish the tasks.
Instruments and procedure
Persian EIT
Children’s language abilities across age and the characteristics of test items.
In order to detect any ambiguity in the items and to ensure that children were able to repeat them correctly, the test was used in a small-scale pilot study with 16 preschoolers who were three to six years old. As a result of the pilot study, a few items were modified. Once the precision and clarity of the items had been ensured, the final version of the test was used to collect data (see the Appendix).
Description of each scoring category with participants’ responses to item 32.
To establish scoring reliability, the performance of each participant was independently scored by a second coder and then compared against the researcher’s scores. A two-way random intraclass correlation was run to estimate the reliability of the scoring procedure. Results showed that the reliability between the two raters was 0.85, with a reasonably narrow 95% CI (0.78, 0.89). This shows that 85% of the variance in the mean of these raters’ scores is real and that there is a high level of agreement between the two raters. Cases of disagreement were resolved by discussion.
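As an illustration of the reliability estimate described above, the following minimal sketch computes two-way random intraclass correlations from a participants-by-raters score matrix using the standard mean-square formulas. It assumes the absolute-agreement form and returns both the single-measure and the average-measures coefficients, since the text does not state which variant was used. The function name `icc_two_way_random` and the example ratings are hypothetical and are not taken from the study.

```python
def icc_two_way_random(scores):
    """Two-way random, absolute-agreement ICC for a list of rows,
    one row per participant and one column per rater.
    Returns (single_measure, average_measures)."""
    n = len(scores)
    k = len(scores[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need a rectangular matrix with >= 2 rows and >= 2 columns")

    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Sums of squares for participants (rows), raters (columns), and residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in scores for v in row)
    if ss_total == 0:
        raise ValueError("all scores are identical; ICC is undefined")
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    single = (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )
    average = (ms_rows - ms_error) / (ms_rows + (ms_cols - ms_error) / n)
    return single, average


# Hypothetical item scores from two raters for six participants.
ratings = [[4, 4], [3, 2], [1, 1], [2, 3], [0, 0], [4, 3]]
single, average = icc_two_way_random(ratings)
print(f"single-measure ICC = {single:.2f}, average-measures ICC = {average:.2f}")
```

The average-measures coefficient is the one that describes the reliability of the mean of the raters’ scores, whereas the single-measure coefficient describes the reliability of one rater’s scores.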
Oral narrative task
The second instrument used in this study was an oral narrative task, specifically, a wordless pictorial story consisting of 10 linked illustrations that depicted a simple story of a dog chasing a cat. Children who participated in the EIT were instructed to look at the pictures for two minutes and prepare to describe what they saw. While describing the depicted story, they were allowed to look at the pictures and were asked to provide as many details as they could. Each child was tested individually and their performance was audio recorded. Then, the mean length of utterance (MLU) in morphemes was calculated for each child to evaluate their morphosyntactic abilities. To do so, the first 50 utterances of each child’s performance on the oral narrative task were transcribed. The number of morphemes in each child’s utterances was summed and the total was divided by the total number of utterances (i.e. 50). Thus, the obtained MLU was the ratio of the number of individual morphemes to the number of utterances. A research assistant was then asked to double-check the scoring procedure to assess the reliability of the analysis. A two-way random intraclass correlation yielded a coefficient of .75, with 95% CI (0.74, 0.79), implying an acceptable level of reliability in the coding procedure. Cases of discrepancy were resolved by discussion.
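The MLU calculation described above can be sketched as follows. The sketch assumes that each utterance has already been transcribed and segmented into morphemes by hand; the function name `mean_length_of_utterance`, the fallback of dividing by the actual number of utterances when a child produces fewer than 50, and the English-glossed example are all hypothetical.

```python
def mean_length_of_utterance(utterances, max_utterances=50):
    """Return MLU in morphemes for the first `max_utterances` utterances.

    `utterances` is a list in which each utterance is itself a list of
    morphemes (segmentation is assumed to have been done beforehand).
    """
    sample = utterances[:max_utterances]
    if not sample:
        raise ValueError("at least one utterance is required")
    total_morphemes = sum(len(utterance) for utterance in sample)
    # Assumption: if fewer than max_utterances are available, divide by
    # the number actually produced rather than by max_utterances.
    return total_morphemes / len(sample)


# Hypothetical, English-glossed sample of three utterances.
sample = [
    ["the", "dog", "chase", "-ed", "the", "cat"],
    ["cat", "run", "-s"],
    ["dog", "bark", "-ing"],
]
print(mean_length_of_utterance(sample))  # 12 morphemes / 3 utterances = 4.0
```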
Results
Children’s mean scores and standard deviations on the EIT and oral narrative task at different age levels.
Analysis of children’s performance on the oral narrative task also showed that their MLU increased with age. As shown in Table 3, the mean MLU of the three-year-old children was 3.88, and children’s scores increased with age. To investigate the effect of age on children’s MLU scores, a univariate ANOVA was performed. A significant effect of age on children’s MLU scores was found, F(3, 115) = 80.40, p < .001, η² = .24. Post hoc analysis using Tukey’s procedure revealed that children’s mean MLU scores in each age-group were significantly different from those of each of the other groups, p < .001.
The relationship between children’s performance on the EIT and the oral narrative task was examined to test the concurrent validity of the newly developed EIT. A Pearson correlation coefficient was calculated, and a significant positive correlation between the two measures was found (r = .679, p < .01). Taken together, the findings suggest that the newly developed EIT can measure children’s language proficiency and effectively discriminate among children with various levels of language competence.
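For readers who wish to reproduce this kind of concurrent-validity check on their own data, a minimal sketch of the Pearson correlation between two sets of paired scores is given below. The function name `pearson_r` and the score lists are hypothetical; they do not reproduce the study’s data, and the sketch does not compute a significance test.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (>= 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squared deviations.
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical paired scores: total EIT score and MLU for six children.
eit_scores = [35, 52, 60, 78, 90, 104]
mlu_scores = [3.1, 3.9, 4.4, 4.2, 5.6, 6.0]
print(round(pearson_r(eit_scores, mlu_scores), 3))
```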
Item statistics and measure order.
P.T. Measure: point-biserial and point-measure correlations
Model S.E. is the standard error of the estimate.
Measure: item difficulty.
As shown in Table 4, our analysis of item difficulty also showed that items 10, 21, and 30 had higher logit values than items 12, 22, and 31, respectively. On this basis, the grading of the test items should be modified to ensure that each item is more difficult than the preceding item and easier than the succeeding one.
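The regrading step described above amounts to finding items that are harder than an item placed later in the test and then reordering the items by estimated difficulty. A minimal sketch, assuming item difficulties are available as logits keyed by item number, is shown below. The function names and the logit values are hypothetical and are not the values reported in Table 4.

```python
def misordered_pairs(difficulties):
    """Return (earlier_item, later_item) pairs in which the earlier item
    has a higher difficulty estimate than the later one."""
    items = sorted(difficulties)
    pairs = []
    for i, earlier in enumerate(items):
        for later in items[i + 1:]:
            if difficulties[earlier] > difficulties[later]:
                pairs.append((earlier, later))
    return pairs


def regraded_order(difficulties):
    """Return item numbers sorted from easiest to hardest
    (ties keep the original item order)."""
    return sorted(difficulties, key=lambda item: (difficulties[item], item))


# Hypothetical logit difficulties for four consecutive items.
logits = {10: 0.4, 11: 0.1, 12: 0.2, 13: 0.9}
print(misordered_pairs(logits))  # [(10, 11), (10, 12)]
print(regraded_order(logits))    # [11, 12, 10, 13]
```

In practice, small differences in logits may fall within the standard error of the estimates, so a reordering of this kind would normally be checked against the model standard errors before items are moved.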
DIF-flagged items.
Discussion
Drawing on the previous studies investigating the effectiveness of EITs as valid and reliable measures of children’s overall language proficiency, the goal of this study was to develop an EIT and to evaluate its validity by comparing children’s performance against their performance on an oral narrative task. In general, the findings suggest that the newly developed Persian EIT reported here is an ecologically valid and reliable measure for effectively evaluating three- to six-year-old Persian-speaking children’s language abilities.
First, the analysis of scoring reliability showed a high level of inter-rater reliability (i.e. an intraclass correlation of 0.85). This satisfactory level of reliability seems to stem from the use of a well-designed, objective 5-point scoring rubric (developed by Ortega et al., 2002) and 40 test items covering a wide range of morphosyntactic difficulty. Second, children’s performance demonstrated that they could easily comprehend and follow the test instructions. The findings showed that test items were appropriately graded, matching children’s language abilities in each age-group. That is, at each age level, children’s responses to the test items were mostly plausible repetitions of the target sentences, with few deviations. Younger children generally lacked the linguistic competence to perform well when prompted with the more complicated items that were designed to capture older children’s language abilities. Furthermore, the findings of the study demonstrated the concurrent validity of the instrument through a strong positive correlation between the EIT scores and the oral narrative task scores.
Examining the relationship between children’s scores on the EIT and on the oral narrative test was one of the objectives of the current study. Considering that some researchers have doubted the potential of EITs in effectively assessing individuals’ language knowledge (e.g. Bley-Vroman & Chaudron, 1994; McNamara, 1996), the results of this study lend support for the use of this particular EIT as an effective instrument for evaluating the development of children’s language abilities. Our findings suggest that a well-developed EIT can discriminate children with different language abilities.
Particular features of the design of the EIT used in this study were the insertion of a 3-second pause after each stimulus and before each response and the use of accompanying pictures. In addition, stimuli of varying lengths were used in order to ensure that there were enough prompts for children with various levels of language proficiency. Recent studies show that sentence length is the most important predictor of item difficulty in EITs (Kim et al., 2016); however, the literature does not suggest any specific number of syllables or words for each test item (Vinther, 2002). In the current study, attempts were made to include sentence items ranging from 6 to 31 morphemes. This wide range of item length contributed to the discriminatory power of the test and corroborated Yan et al.’s (2016) conclusion that EITs “with varied sentence length will more likely match the ability of speakers with different proficiency levels” (p. 26).
In designing the test, various measures were taken to minimize the potential effect of children’s working memory span on their responses to test items. As discussed above, compelling research has demonstrated that understanding the meaning of the items, focusing on accompanying pictorial aids, and a short pause between each stimulus and response would make it extremely difficult, if not impossible, for individuals to simultaneously memorize the form of the sentences and then repeat them through rote memorization. In addition, research has shown that the capacity of working memory in repeating verbal stimuli is determined by the language knowledge that is already constructed (Erlam, 2006). Thus, even if one speculates that the participants’ performance on the EIT might have been mediated through rote memorization, it is evident that the participants who “had the ability to memorize stimuli were indeed those who had internalized language, and, therefore, their superior performance on the test was an indication of this” (p. 486). Thus, the findings presented here provide suggestive evidence that the Persian EIT assesses children’s internalized knowledge of language. However, one suggestion for future studies is to directly tackle the issue of working memory by investigating the relationship between participants’ memory span and their performance on EITs.
The analysis of the results at item level demonstrated that the test is suitable and can discriminate well in the age range considered. We found that children’s responses to the items were mostly repetitions of the target sentences, with very few insertions of spontaneous language, such as making comments, asking questions, and spontaneous picture descriptions. However, our findings suggest that some minor modifications should be applied to the test. For instance, we identified two malfunctioning items (i.e. items 2 and 17) that need to be modified. In addition, we found evidence that some of the items (i.e. items 10, 21, and 30) should be rearranged. Accordingly, we plan to modify the test and use a research design similar to the one adopted in this study with a larger population. Although the results of this study seem to suggest that memory span was not related to children’s performance on the test, we plan to control for children’s verbal memory in our future study.
The following limitations to this research also need to be acknowledged. One is that the present exploratory study was exclusively conducted on typically developing children with no particular language impairment. Additional investigations on atypical children are required in order to examine the relationships between their language impairment and performance on EITs. In addition, in the present study, children’s performance was evaluated only by following the 5-point scoring scale developed by Ortega et al. (2002). Future researchers can use other scoring alternatives such as automated scoring or binary scoring. Studies can effectively examine whether different scoring options can contribute to the discriminatory power of the test.
In sum, despite these limitations, the results presented here demonstrate that this particular EIT can be used as a reliable and valid instrument for measuring Persian children’s language proficiency. The findings showed that the test can evaluate a wide range of grammatical structures at once, without being concerned with various interfering issues such as contextual variations that are associated with other verbal ability tests such as open-ended discussions and narratives.
Appendix
Example items from the Persian EIT.
| English translation | Persian wording | Item no. |
|---|---|---|
| Arash is cleaning his clothes. | | 4 |
| Mina put the ball on the table. | | 7 |
| The red cab is bigger than the yellow cab. | | 14 |
| Sasan can’t ride a bicycle yet. | | 19 |
| Ali asked Hamid, what time is it? | | 26 |
| Saeed is too young to drive a car. | | 30 |
| Elham was sick, so she could not participate in her friend’s birthday party. | | 34 |
| Kiyan gave two oranges to Samira, and got three apples instead. | | 37 |
| Tomorrow morning, Amir and his friends are going to buy a flower for their teacher. | | |