An elicited imitation test for measuring preschoolers’ language development

Abstract

The present article describes a new Persian elicited imitation test (EIT) developed for assessing the overall language proficiency of Persian-speaking preschoolers. It reports a study that investigated the power of the EIT in discriminating children’s linguistic abilities by having them repeat sentences of varying lengths and morphosyntactic complexity. The study also explored the relationship between children’s performance on the EIT and their free speech. A total of 119 three- to six-year-old Iranian monolingual children participated in the study by completing the EIT and an oral narrative task. Results showed that the new EIT can discriminate among children with different levels of language abilities. Moreover, positive correlations were found between children’s scores on the EIT and their performance on the oral narrative task. Results suggest that the EIT provides a reliable measure of overall language development, and it can be effectively used to evaluate children’s language proficiency in various contexts.

Keywords

Elicited imitation test, language assessment, morphosyntactic abilities, overall proficiency, Persian

Introduction

During the last few decades there has been a resurgent interest in the use of elicited imitation tests (EITs) for a variety of purposes including the assessment of language development. For example, in developmental and experimental studies, EITs are used by researchers as useful instruments for investigating the early stages of children’s language development (Vinther, 2002). In educational settings, they are used by language teachers to evaluate their students’ language skills. Additionally, in clinical contexts EITs are often used for identifying children with language problems and for determining the nature and severity of language disorders (Dockrell, 2001).

Some researchers believe that EITs are the best type of measure for identifying children with specific language impairments (Klem et al., 2015). Part of this popularity may be due to the inadequacies of other language assessment tools. For instance, to evaluate children’s spontaneous language production, researchers or practitioners need to record and transcribe a substantial number of utterances, which is often time-consuming and labor-intensive. In addition, analyzing children’s free speech is extremely difficult and poses various technical problems such as coding decisions, finding coherent criteria for speech segmentation, and identification of utterance boundaries (Devescovi & Caselli, 2007).

The widespread use of EITs is also because of their high sensitivity to residual language processing weaknesses which may not be detected through other receptive and productive language tasks (Conti-Ramsden, Botting, & Faragher, 2001; Klem et al., 2015). Some researchers argue that verbal imitation tasks can provide accurate accounts of children’s knowledge of language, as they reflect the degree to which children are able to assimilate the target structures into their internalized language ability (e.g. Gallimore & Tharp, 1981; Munnich, Flynn, & Martohardjono, 1994).

The purpose of the present study was to design an EIT in order to evaluate three- to six-year-old children’s language abilities. The utility of the test was evaluated by focusing on key developments in children’s linguistic abilities to repeat sentences of different length and morphosyntactic complexity. In addition, children’s performance on the EIT was compared against their performance on an oral narrative task to ascertain the validity of the test. In this study, only quantitative analyses are conducted on children’s language abilities in order to provide a test that can be quickly administered and scored.

Review of the literature

Theoretical background

An EIT presents participants with a number of aural stimuli that they must repeat as exactly as possible. Normally, as the test progresses, the length of the stimuli increases and more complicated morphosyntactic structures are used (Kim, Tracy-Ventura, & Jung, 2016). The theoretical rationale behind EITs, as delineated by Slobin and Welsh (1968/1973), is that they measure the global linguistic performance of test takers, and in order to be able to repeat the aural stimuli, test takers need to “comprehend and decode the sentence, recall and reconstruct it with their own grammar” (Wu & Ortega, 2013, p. 683). In other words, to imitate the test sentences accurately, participants need to receptively process the stimuli for meaning and productively reconstruct them using their phonetic, semantic, and syntactic knowledge of language (Kim et al., 2016). On this basis, the utterance elicited through an EIT can reflect the degree to which a test taker is able to assimilate the stimulus into his or her language competence (Munnich et al., 1994).

Theoretically, the use of EITs as language assessment tools is supported by the regeneration hypothesis (Lombardi & Potter, 1992; Potter & Lombardi, 1990). According to this hypothesis, the process of repeating a sentence starts from a conceptual (meaning-based) representation of the sentence to be recalled and essentially involves all levels of the language production system (Bock & Levelt, 1994). When prompted with a verbal imitation task, children have to draw on multiple language skills to be able to respond correctly. They first need to attend to the three main components of the sentence (phonetics, semantics, and syntax) in order to decode the sentence from its linguistic form and understand what it means. Then, they have to draw on many component skills, including the encoding of morphological segments and syntactic structures, articulatory planning, and speech production, in order to repeat the sentence (Hagoort & Levelt, 2009; Mehrani, 2011; Mehrani & Peterson, 2017; Moll, Hulme, Nag, & Snowling, 2015). This is consistent with various definitions of language proficiency, which include control of phonemic, vocabulary, and grammatical elements (Bachman, 1990). In the present study, language proficiency is considered as an individual’s ability to listen to verbal stimuli of varying lengths, understand their content, and accurately repeat them using the same grammatical structures.

Some researchers argue that the way children repeat a sentence and the changes that they apply to the original model can provide useful clues about how they process the sentence (e.g. Devescovi & Caselli, 2007). For example, a child may modify the grammatical structure of a sentence but keep its message, thus showing that she or he comprehended it (Slobin & Welsh, 1968/1973). Another test taker may leave out particular grammatical structures, displaying certain parsing problems. Others might be able to follow the grammatical word order of a sentence, but display certain phonological deficiencies. On this basis, various deficits in sentence repetition are considered a hallmark of specific language impairment (Conti-Ramsden et al., 2001).

Empirical investigations

There is a mounting literature in support of Slobin and Welsh’s (1968/1973) theoretical account and the view that sentence imitation is reconstructive and dependent on broad language skills (Klem et al., 2015). For instance, Moll et al. (2015) compared the performances of typically developing children and dyslexic children on a sentence repetition task and found that dyslexic children performed more poorly. More recently, Polisenska, Chiat, and Roy (2015) investigated the effects of different types of long-term linguistic knowledge (including morphosyntax, lexical phonology, semantics, and prosody) on English- and Czech-speaking children’s immediate recall. They found significant effects for all linguistic factors in both languages. Similarly, Moll et al. (2015) demonstrated that various aspects of language competence—morphological, grammatical, and phonological processes—are engaged in repeating sentences. Other empirical investigations that provide further evidence in support of this claim include studies that have shown that test takers experience greater difficulties when they are asked to repeat unfamiliar words or phrases (e.g. Gathercole & Baddeley, 1993). Ellis (2001) explains that this is because test takers have not had the opportunity to build up long-term memory representations of those words. He adds that this is why EITs serve so well as measures of language competence.

The potential of EITs as measures of language proficiency is also reported in second language acquisition research (see Jessop, Suzuki, & Tomita, 2007; Vinther, 2002). Ortega, Iwashita, Norris, and Rabie (2002) designed and validated versions of a sentence imitation test in four languages (English, German, Japanese, and Spanish) for cross-linguistic studies investigating syntactic complexity measures and their relationship to general language proficiency. Additional versions of the same test were also developed in Mandarin Chinese (Wu & Ortega, 2013), French (Tracy-Ventura, McManus, Norris, & Ortega, 2014), and Korean (Kim et al., 2016). The results of these studies suggest that these EITs can be used as reliable and valid tools to measure the overall linguistic proficiency of language learners.

Some researchers have conducted comparative studies to see if children’s performance on EITs is related to their performance on spontaneous speech tests. Particularly, two studies (Gallimore & Tharp, 1981; Hood & Schieffelin, 1978) compared spontaneous and imitative performances of young children and found that children’s EIT scores and spontaneous speech test scores are related to their language behavior in natural settings. Similarly, Corrigan and Di Paul (1982) reported that sentence repetition tests are most useful in eliciting those relations that tended to be infrequent in spontaneous speech. Overall, these studies suggest that there is a high correlation between children’s scores on well-designed EITs and their scores on spontaneous language tests.

Working memory and EIT performance

Concerns about the validity and reliability of EITs were addressed recently in a narrative review and meta-analysis by Yan, Maeda, Lv, and Ginther (2016). These authors investigated the construct validity of EITs used in 76 studies to measure global language proficiency or certain aspects of language proficiency. Their findings showed that EIT scores can discriminate between individuals with different proficiency levels, suggesting that these tests can be effectively used as measures of language proficiency. However, the literature on EITs as a measure of overall language proficiency is replete with references to the role of working memory in repeating test items. This issue raises concerns about rote repetition, which can be a threat to the validity of EITs (Kim et al., 2016). For example, some researchers have speculated that a testee might repeat a sentence without understanding it. In particular, they have argued that test takers’ performance on EITs might be mediated by their working memory span, and thus, such tests cannot adequately measure individuals’ language proficiency.

To address this issue, some researchers have attempted to come up with a “magical number” (Vinther, 2002, p. 59) that exceeds working memory span and thus ensures that the stimulus is linguistically processed before it can be imitated. Although some researchers have asserted that adults are able to keep five to seven linguistic chunks in their working memory (e.g. Miller, 1956; Simon, 1974), others have recommended that test designers include stimuli of various lengths in EITs to address the rote repetition criticism (Kim et al., 2016). On the other hand, there are researchers who contend that an individual’s working memory span is not determined by a fixed number of items or chunks (Schweickert & Boruff, 1986), but by their stored knowledge of the language (Baddeley, Gathercole, & Papagno, 1998). This conclusion is supported by empirical investigations into the relationship between working memory and verbal imitation tasks. For example, Scott (1994) prompted English–Spanish bilingual and English monolingual participants with a Spanish elicited imitation task. He reported that bilingual participants, who were able to use meaning to aid retention, performed significantly better than their monolingual counterparts. Sachs (1967) and Potter and Lombardi (1990) also conducted similar studies and concluded that memory for sentences is meaning-based rather than form-based.

Additional evidence for the minimal role of working memory in sentence repetition tasks comes from studies that included both correct and incorrect sentences in their test design. In these studies (e.g. Erlam, 2006; Hamayan, Saegert, & Larudee, 1977; Markman, Spilka, & Tucker, 1975; Munnich et al., 1994), participants were presented with both grammatical and ungrammatical sentences and were simply asked to repeat them. The results showed that a significant number of participants spontaneously corrected the ungrammatical sentences without being asked to. The findings of these studies suggest that sentences in EITs are not blindly repeated, but rather are reconstructed and conceptually processed.

Another line of research that has attempted to address the potential role of working memory in test takers’ performance has focused on the insertion of a pause after the stimulus and before the response (Vinther, 2002). For example, Schweickert and Boruff (1986) pointed out that the capacity of short-term memory is determined by the limited time for which the verbal stimulus endures. Cowan et al. (1992) found that adult language users can “remember as many words as they can pronounce in about 1.5 to 2.0 seconds” (p. 15). Similarly, McDade, Simpson, and Lamb (1982) reported that participants could repeat sentences that they did not understand only when imitation was immediate, but they failed to do so after a 3.0-second pause. More recent studies (e.g. Bowden, 2016; Kim et al., 2016; Ortega et al., 2002) also suggest that the insertion of a short pause (i.e. 2.5 seconds) after each stimulus and before the cue is an effective strategy for minimizing the potential role of working memory in sentence repetition.

A further design element that is used in EITs to diminish the potential effect of working memory is the insertion of a pictorial aid after each test item. Devescovi and Caselli (2007) designed a 51-item EIT for assessing two- to four-year-old Italian children. Each item was accompanied by a picture reproducing its global meaning. These researchers concluded that the use of accompanying pictures in EITs maximizes the likelihood that participants focus on meaning rather than on the form of the sentences that they hear.

To summarize, the literature suggests that well-designed EITs can be effectively used as a measure of overall language proficiency. To repeat a sentence, participants go through cognitive processes including comprehending the stimulus, reconstructing it with their own internalized knowledge of the language and reproducing it. Compelling research has shown that the capacity of working memory is determined by language users’ internalized knowledge of the language. Therefore, it is assumed that working memory does not influence participants’ responses to EIT items. Yet, the literature suggests effective strategies to ensure that performance on verbal repetition tasks is reconstructive and not rote repetition. These strategies include designing stimuli of varying lengths, inserting a short pause after hearing stimuli and before the response, and using accompanying pictures to reduce participants’ attention to form.

Motivation for the current study

The present study was conducted to design an oral elicited imitation measure in order to evaluate the language development of three- to six-year-old Persian-speaking children. The study specifically focused on Persian-speaking children because investigations show that there are psychometric inadequacies in the existing Persian language assessment tools (Farhady & Tavasoli, 2013). For example, some of the existing measures developed by the National Organization of Educational Testing are often considered unreliable and invalid. These tests have never been pretested, and unfortunately, no written report on the psychometric characteristics of these tests is available to independent researchers (Farhady & Hedayati, 2009). Some other existing tests (e.g. Persian Child Language Assessment Batteries) are costly and lengthy and thus are rarely used by clinicians or psycholinguistic researchers (Hasanpoor, Jalilevand, Masumi, Ghorbani, & Kamali, 2015). The addition of a Persian EIT will, therefore, be an asset to various fields of study in Persian, including speech therapy, linguistics, and first- and second-language acquisition. The choice of the participants’ age range in this study was motivated both by empirical investigations that emphasize the importance of early identification of children with language impairment and by theoretical studies that suggest the development of children’s linguistic competence for producing multiple-word utterances occurs between three and six years (Owens, 2016).

The following research questions were raised to broadly guide the process of the study.

To what extent do Persian EIT scores discriminate three- to six-year-old children with different language abilities?

What is the relationship between participants’ performance on the Persian EIT and their performance on an oral narrative task?

Method

Participants

A total of 119 Iranian children in four age-groups participated in this study. There were 26 three-year-olds (12 female and 14 male, age range = 36–47 months, M = 42.1 months, and SD = 4.1); 32 four-year-olds (16 female and 16 male, age range = 48–60 months, M = 51.1 months, and SD = 3.1); 29 five-year-olds (18 female and 11 male, age range = 62–71 months, M = 66.5 months, and SD = 2.9); and 32 six-year-olds (20 female and 12 male, age range = 72–83 months, M = 77.4 months, SD = 3.8). The participants were all typically developing monolingual Persian speakers, and they were recruited from two kindergartens in Neyshabur, Iran. The sample was typical of a middle socioeconomic class as shown by the parents’ occupations and levels of education. A total of six other children also participated in this study, but they were excluded from further analysis because they refused to finish the tasks.

Instruments and procedure

Persian EIT

The Persian EIT for Iranian preschoolers (three- to six-year-olds) included 40 items that were designed to assess children’s morphological and syntactic ability to imitate verbal stimuli. In designing the test, a comprehensive review of the literature concerning children’s development of language abilities was conducted first in order to draw a propositional framework of complexity. In doing so, four stages of language development (with each stage matching one age level) were assumed, and for each stage the characteristics of children’s morphological and syntactic abilities were charted. Particular attention was paid to sentence length, tense, use of function words, and the number of syllables that children at each stage can normally produce (see Table 1). Then, for each stage, 10 items were designed. Following the design of existing EITs in other languages (e.g. Kim et al., 2016; Wu & Ortega, 2013), the items for each stage were graded from easy to difficult by increasing the morphosyntactic complexity and the number of syllables used in each item. The choice of words in designing the items was mainly motivated by the literature on children’s expressive vocabulary at early stages of language development (Kazemi et al., 2012). In an attempt to maximize the likelihood that children focus on the meaning of the items, a total of 40 pictures were prepared to accompany the test items. For each item, one picture was designed to reproduce its global meaning. In addition, to reduce the possibility of rote repetition, a pause of 3 seconds was inserted after each stimulus and before each response.

Table 1.

Children’s language abilities across age and the characteristics of test items.

Age	General language abilities	Test items: mean morphemes per item	Test items: mean syllables per item
3-Year-olds	Have about 1,000-word expressive vocabulary; produce 3- to 4-word sentences; talk about the present; MLU = 3.16–4.40	4.27	9.34
4-Year-olds	Have about 1,500-word expressive vocabulary; rely on word order for interpretation; produce 4- to 5-word sentences; MLU = 4.41–5.63	5.49	13.5
5-Year-olds	Have about 2,200-word expressive vocabulary; understand before and after, regardless of word order; produce 4- to 6-word sentences; MLU = 5.64–6.79	6.54	18.92
6-Year-olds	Have about 2,600-word expressive vocabulary; speech is understandable to most strangers; produce 5- to 7-word sentences; MLU = 6.80–7.43	7.35	22.45

In order to detect any ambiguity in the items and to ensure that children are able to correctly repeat them, the test was piloted in a small-scale study with 16 three- to six-year-old preschoolers. As a result of the pilot study, a few items were modified. Once the precision and clarity of the items had been ensured, the final version of the test was used to collect data (see the Appendix).

Written consent forms from the child care centers’ administrators and children’s parents were obtained before beginning data collection. A research assistant was asked to spend a few days in the children’s care centers for a rapport-building introduction. Then, she individually invited children into a separate room in the child care centers and explained the test process. In particular, she said: “Would you like to play with me? I am going to say something and you say it after me, ok?” Then, the assistant said the first item and showed the relevant picture to the child. After a pause of 3 seconds she said, “Now, you say it.” She waited a few seconds for the child’s response. If the child did not respond, the assistant moved on to the next item. Children were allowed to see the relevant accompanying picture while answering each test item. All items were presented following the same procedure, and children’s responses were audio recorded. If a child lost interest or cried, the test was postponed to a later session. Overall, six children refused to finish the task in the second round and were therefore excluded from the study. For children who showed interest but could not say anything in response to five consecutive items, the assistant terminated testing; however, their responses were taken into account in the analysis. Participants’ recorded performance was transcribed on data tabulation forms, and the following 5-point scoring rubric (developed by Ortega et al., 2002) was applied to evaluate their language abilities (Table 2).

Table 2.

Description of each scoring category with participants’ responses to item 32.

Score	Description of the performance	Examples of children’s responses (stimulus: “Tomorrow morning, Amir and his friends are going to buy a flower for their teacher.”)
4	Perfect repetition, no discrepancies between the stimulus and the response	Tomorrow morning, Amir and his friends are going to buy a flower for their teacher.
3	Accurate content repetition with some minor changes of form	Tomorrow morning, Amir and his friends are buying a flower for their teacher.
2	Some changes in content or form that affect meaning	Tomorrow morning, Amir and his teacher are going to buy a flower.
1	Repetition of half of the stimulus or less	Tomorrow, Amir and his teacher
0	Silence, only one word repeated, or unintelligible repetition	His teacher

To establish scoring reliability, the performance of each participant was independently scored by a second coder and then compared against the researcher’s scoring. A two-way random intraclass correlation was run to estimate the reliability of the scoring procedure. Results showed that the reliability between the two raters was 0.85, with a fairly narrow 95% CI (0.78, 0.89). This shows that 85% of the variance in the mean of these raters’ scores is real and that there is a high level of agreement between the two raters. Cases of disagreement were resolved by discussion.
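The reliability estimate described above can be illustrated with a minimal sketch of a two-way random-effects intraclass correlation (absolute agreement), computed from the standard ANOVA mean squares. The function name `icc_two_way_random` and the toy ratings are assumptions made for illustration; they are not the study's data, and the software actually used for the analysis is not reported.

```python
def icc_two_way_random(ratings):
    """Two-way random-effects ICC (absolute agreement).

    ratings: list of rows, one row per child, one column per rater.
    Returns (single-measure ICC, average-measure ICC).
    """
    n = len(ratings)         # number of children
    k = len(ratings[0])      # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares for children (rows), raters (columns), and residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in ratings for v in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    single = (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )
    average = (ms_rows - ms_error) / (ms_rows + (ms_cols - ms_error) / n)
    return single, average


# Hypothetical scores from two raters for four children (illustration only).
toy_ratings = [[4, 3], [2, 2], [1, 0], [3, 4]]
single_icc, average_icc = icc_two_way_random(toy_ratings)
```

The average-measure value corresponds to the reliability of the mean of the raters, which is the interpretation given in the text; the single-measure value is the reliability of one rater's scores.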

Oral narrative task

The second instrument used in this study was an oral narrative task, specifically, a wordless pictorial story consisting of 10 linked illustrations that depicted a simple story of a dog chasing a cat. Children who participated in the EIT were instructed to look at the pictures for two minutes and prepare to describe what they saw. While describing the depicted story, they were allowed to look at the pictures and were asked to provide as many details as they could. Each child was tested individually, and their performance was audio recorded. Then, the mean length of utterance (MLU) in morphemes was calculated for each child to evaluate their morphosyntactic abilities. In doing so, each child’s performance on the oral narrative task, specifically the first 50 utterances, was transcribed. The number of morphemes in each child’s utterances was summed, and the total was divided by the number of utterances (i.e. 50). Thus, the obtained MLU was the ratio of the number of individual morphemes to the number of utterances. A research assistant was then asked to double-check the scoring procedure to assess the reliability of the analysis. A two-way random intraclass correlation showed a reliability of .75, with a 95% CI of (0.74, 0.79), implying a good level of reliability in the coding procedure. Cases of discrepancy were resolved by discussion.
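The MLU calculation described above can be sketched as follows. The sketch assumes that morpheme segmentation has already been done by hand and that each utterance is represented simply by its morpheme count; the function name `mean_length_of_utterance` is hypothetical. Dividing by the actual number of utterances when a child produced fewer than 50 is also an assumption, since the text only describes the 50-utterance case.

```python
def mean_length_of_utterance(morpheme_counts, max_utterances=50):
    """MLU in morphemes over the first `max_utterances` utterances.

    morpheme_counts: list with one integer per utterance, in the order
    the utterances were produced.
    """
    sample = morpheme_counts[:max_utterances]
    if not sample:
        raise ValueError("at least one utterance is required")
    # Total number of morphemes divided by the number of utterances.
    return sum(sample) / len(sample)


# Hypothetical morpheme counts for four utterances (illustration only).
counts = [3, 4, 5, 4]
mlu = mean_length_of_utterance(counts)
```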

Results

In order to evaluate the discriminatory power of the EIT, an EIT score was calculated for each child following the scoring procedure described above. As shown in Table 3, children’s mean EIT scores increased with age. To investigate the effect of age on children’s language abilities, the scores were submitted to a one-way analysis of variance (ANOVA) with age (four levels) as the independent variable. The ANOVA revealed a significant effect of age, F(3, 115) = 169.57, p < .001, η² = .31. Post hoc comparisons using Tukey’s procedure confirmed significant differences among all age-groups, p < .001. Thus, the developed EIT seems to be sensitive to the changes that take place across all age-groups.

Table 3.

Children’s mean scores and standard deviations on the EIT and oral narrative task at different age levels.

Age	No.	EIT mean scores	SD	MLU mean scores	SD
3-Year-olds	26	24.92	19.33	3.88	2.46
4-Year-olds	32	39.65	16.57	4.93	1.57
5-Year-olds	29	91.89	23.69	5.67	1.82
6-Year-olds	32	116.18	20.03	6.44	2.61

Analysis of children’s performance on the oral narrative task also showed that their MLU increased with age. As shown in Table 3, the mean MLU of the three-year-old children was 3.88, and children’s scores increased with age. To investigate the effect of age on children’s MLU scores, a univariate ANOVA was performed. A significant effect of age on children’s MLU scores was found, F(3, 115) = 80.40, p < .001, η² = .24. Post hoc analysis using Tukey’s procedure revealed that children’s mean MLU scores in each age-group were significantly different from those of each of the other groups, p < .001.
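As an illustration of how the statistics in a one-way ANOVA relate to one another, the following sketch computes F, its degrees of freedom, and η² from group sizes, means, and standard deviations. The function name `one_way_anova_from_summary` and the means and standard deviations in the example are hypothetical; only the group sizes (26, 32, 29, 32) are taken from the study. Obtaining a p value would additionally require the F distribution from a statistics package, which is not shown here.

```python
def one_way_anova_from_summary(ns, means, sds):
    """One-way ANOVA from per-group n, mean, and sample SD.

    Returns (F, df_between, df_within, eta_squared).
    """
    total_n = sum(ns)
    grand_mean = sum(n * m for n, m in zip(ns, means)) / total_n

    # Between-groups and within-groups sums of squares.
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m in zip(ns, means))
    ss_within = sum((n - 1) * sd ** 2 for n, sd in zip(ns, sds))

    df_between = len(ns) - 1
    df_within = total_n - len(ns)

    f_stat = (ss_between / df_between) / (ss_within / df_within)
    eta_squared = ss_between / (ss_between + ss_within)
    return f_stat, df_between, df_within, eta_squared


# Group sizes from the study; means and SDs are made up for illustration.
group_sizes = [26, 32, 29, 32]
hypothetical_means = [10.0, 20.0, 30.0, 40.0]
hypothetical_sds = [5.0, 5.0, 5.0, 5.0]
f_stat, df_b, df_w, eta_sq = one_way_anova_from_summary(
    group_sizes, hypothetical_means, hypothetical_sds
)
```

With four groups and 119 children, the degrees of freedom are 3 and 115, as in the reported tests. Here η² is the share of the total sum of squares that lies between groups.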

The relationship between children’s performance on the EIT and the oral narrative task was examined to test the concurrent validity of the newly developed EIT. A Pearson correlation coefficient was calculated, and a significant positive correlation between the two measures was found (r = .679, p < .01). Taken together, the findings suggest that the newly developed EIT can measure children’s language proficiency and effectively discriminate among children with various levels of language competence.
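For reference, a Pearson correlation between two sets of paired scores can be computed as in the sketch below. The function name `pearson_r` and the paired values are hypothetical and serve only to show the calculation; they are not the children's actual EIT and MLU scores.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two paired lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with >= 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical paired scores for five children (illustration only).
eit_scores = [20, 45, 80, 95, 120]
mlu_scores = [3.5, 4.8, 5.1, 6.0, 6.3]
r = pearson_r(eit_scores, mlu_scores)
```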

To examine the contribution of each individual item, the analysis was next extended to the item level. As shown in Table 4, the fit of the data to the model was analyzed using the Rasch model. First, the item parameters were examined, and items 2, 14, and 17 were found to misfit the model because their infit mean square values exceeded 1.3. That is, for these items, more unexpected response patterns were observed. All other items fit the model well.

Table 4.

Item statistics and measure order.

Item	Measure	Model S.E.	Infit MNSQ	Infit ZSTD	P.T. measure Corr.	P.T. measure Exp.
40	1.73	.13	.91	−.4	.73	.70
39	1.55	.13	.97	−.1	.74	.72
38	1.53	.13	.83	−1.0	.76	.72
37	1.40	.12	.90	−.6	.77	.74
36	1.40	.12	.77	−1.5	.79	.74
35	1.34	.12	.72	−2.0	.81	.74
34	1.23	.12	.93	−.4	.78	.76
33	1.16	.12	.86	−.9	.80	.76
32	1.04	.11	.82	−1.2	.82	.78
30	1.03	.11	.69	−2.3	.85	.78
31	1.01	.11	.69	−2.3	.85	.78
29	.93	.11	.69	−2.3	.85	.79
28	.89	.11	1.17	1.1	.79	.80
27	.74	.11	.92	−.5	.82	.81
26	.49	.11	.90	−.6	.85	.81
25	.45	.11	.60	−2.8	.87	.81
24	.39	.11	.1	−2.7	.87	.81
23	.31	.11	.63	−2.6	.87	.81
21	.28	.11	.57	−3.1	.87	.81
22	.20	.11	.83	−1.0	.81	.81
20	.06	.11	.75	−1.7	.85	.81
19	.04	.11	.85	−.9	.84	.81
18	−.05	.11	1.27	1.7	1.14	.80
17	.01	.11	1.53	3.0	.73	.81
16	−.18	.11	.89	−.7	.79	.80
15	−.34	.10	.99	0	.80	.79
14	−.32	.10	1.71	4.3	.65	.79
13	−.45	.10	.99	0	.76	.78
10	−.71	.10	1.06	.5	.69	.76
12	−.83	.10	.80	−1.6	.76	.75
11	−1.11	.10	.88	−.9	.71	.71
9	−1.13	.10	.87	−1.0	.70	.71
8	−1.37	.10	.75	−1.9	.67	.68
7	−1.55	.11	.74	−2.	.63	.65
6	−1.63	.11	1.12	−.9	.42	.64
5	−1.74	.11	.72	−.2	.66	.63
4	−1.78	.11	1.14	.9	.39	.62
3	−1.90	.12	.96	−2.1	.59	.61
2	−2.0	.12	1.45	2.5	.24	.59
1	−.20	.12	1.0	.1	.42	.59

P.T. Measure: point-biserial and point-measure correlations

Model S.E. is the standard error of the estimate.

Measure: item difficulty.

The analysis of item difficulty in Table 4 also showed that items 10, 21, and 30 had higher logit values than items 12, 22, and 31, respectively. On this basis, the grading of the test items should be modified so that each item is more difficult than the preceding item and easier than the succeeding one.
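The two item-level checks described above (flagging items whose infit mean square exceeds 1.3, and finding items that are harder than a later item) can be sketched as follows. The values are copied from a subset of the rows of Table 4; the data structure and the function names `misfitting_items` and `out_of_order_pairs` are assumptions made for illustration, not part of the original analysis.

```python
# Subset of Table 4: item number -> (difficulty in logits, infit MNSQ).
TABLE4_SUBSET = {
    2: (-2.0, 1.45),
    10: (-0.71, 1.06),
    12: (-0.83, 0.80),
    14: (-0.32, 1.71),
    17: (0.01, 1.53),
    21: (0.28, 0.57),
    22: (0.20, 0.83),
    30: (1.03, 0.69),
    31: (1.01, 0.69),
}


def misfitting_items(rows, threshold=1.3):
    """Items whose infit mean square exceeds the threshold."""
    return sorted(item for item, (_, mnsq) in rows.items() if mnsq > threshold)


def out_of_order_pairs(rows):
    """(earlier, later) item pairs in which the earlier item is harder."""
    items = sorted(rows)
    return [
        (a, b)
        for i, a in enumerate(items)
        for b in items[i + 1:]
        if rows[a][0] > rows[b][0]
    ]


flagged = misfitting_items(TABLE4_SUBSET)      # [2, 14, 17]
misordered = out_of_order_pairs(TABLE4_SUBSET)  # [(10, 12), (21, 22), (30, 31)]
```

On this subset, the two checks recover the misfitting items and the misordered pairs named in the text. Running them on all 40 items would require the full table.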

In addition, a differential item functioning (DIF) analysis was run to examine whether items functioned differently across the age-groups of participants. As shown in Table 5, results indicated that in response to item 2, three-year-old children showed a better performance than their four-year-old counterparts. Likewise, item 17 was found to be biased in favor of four-year-old children.

Table 5.

DIF-flagged items.

Item no.	Age-group	DIF measure	Age-group	DIF measure	DIF contrast	Rasch–Welch t	df	Prob.
2	3-Year-olds	−2.88	4-Year-olds	−2.15	−.73	−2.29	52	.0231
17	3-Year-olds	.42	4-Year-olds	−.93	1.34	3.04	31	.0048

Discussion

Drawing on the previous studies investigating the effectiveness of EITs as valid and reliable measures of children’s overall language proficiency, the goal of this study was to develop an EIT and to evaluate its validity by comparing children’s performance against their performance on an oral narrative task. In general, the findings suggest that the newly developed Persian EIT reported here is an ecologically valid and reliable measure for effectively evaluating three- to six-year-old Persian-speaking children’s language abilities.

First, the analysis of scoring reliability showed a high level of inter-rater reliability (an intraclass correlation of 0.85). This satisfactory level of reliability seems to stem from the use of a well-designed, objective 5-point scoring rubric (developed by Ortega et al., 2002) and of 40 test items covering a wide range of morphosyntactic difficulty. Second, children’s performance demonstrated that they could easily comprehend and follow the test instructions. The findings showed that test items were generally appropriately graded, matching children’s language abilities in each age-group. That is, at each age level, children’s responses to the test items were mostly plausible repetitions of the target sentences, with few deviations. Younger children generally lacked the linguistic competence to perform well on the more complicated items that were designed to capture older children’s language abilities. Furthermore, the findings demonstrated the concurrent validity of the instrument through a strong positive correlation between the EIT scores and oral narrative task scores.

Examining the relationship between children’s scores on the EIT and on the oral narrative task was one of the objectives of the current study. Considering that some researchers have doubted the potential of EITs to assess individuals’ language knowledge effectively (e.g. Bley-Vroman & Chaudron, 1994; McNamara, 1996), the results of this study lend support to the use of this particular EIT as an effective instrument for evaluating the development of children’s language abilities. Our findings suggest that a well-developed EIT can discriminate between children with different language abilities.
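As an illustration of how the relationship between the two sets of scores can be quantified, the sketch below computes a Pearson correlation coefficient by hand. It is not the analysis reported in this study: the choice of Pearson's r is an assumption, the function name is hypothetical, and the score lists are invented solely to make the example runnable.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squares around the means.
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when one list has no variance")
    return cross / sqrt(ss_x * ss_y)


# Hypothetical scores for five children (not data from the study).
eit_scores = [12, 25, 31, 40, 58]
narrative_scores = [3, 6, 7, 9, 14]
r = pearson_r(eit_scores, narrative_scores)
print(round(r, 3))
```

A coefficient close to 1 would indicate that children who score higher on the EIT also tend to score higher on the narrative task.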

Particular features of the design of the EIT used in this study were the insertion of a 3-second pause after each stimulus and before each response, and the use of accompanying pictures. In addition, stimuli of varying lengths were used to ensure that there were enough prompts for children with various levels of language proficiency. Recent research shows that sentence length is the most important predictor of item difficulty in EITs (Kim et al., 2016); however, the literature does not suggest any specific number of syllables or words for each test item (Vinther, 2002). In the current study, the items ranged from 6 to 31 morphemes in length. This wide range of item length contributed to the discriminatory power of the test and corroborated Yan et al.’s (2016) conclusion that EITs “with varied sentence length will more likely match the ability of speakers with different proficiency levels” (p. 26).

In designing the test, various measures were taken to minimize the potential effect of children’s working memory span on their responses to test items. As discussed above, research has demonstrated that when individuals must understand the meaning of the items, attend to accompanying pictorial aids, and wait through a short pause between each stimulus and response, it becomes extremely difficult, if not impossible, for them to memorize the form of the sentences and repeat them by rote. In addition, research has shown that the capacity of working memory in repeating verbal stimuli is determined by the language knowledge that is already constructed (Erlam, 2006). Thus, even if one speculates that the participants’ performance on the EIT might have been mediated through rote memorization, it can be argued that the participants who “had the ability to memorize stimuli were indeed those who had internalized language, and, therefore, their superior performance on the test was an indication of this” (Erlam, 2006, p. 486). The findings presented here therefore provide suggestive evidence that the Persian EIT assesses children’s internalized knowledge of language. However, one suggestion for future studies is to tackle the issue of working memory directly by investigating the relationship between participants’ memory span and their performance on EITs.

The analysis of the results at item level demonstrated that the test is suitable and discriminates well in the age range considered. We found that children’s responses to the items were mostly repetitions of the target sentences, with very few insertions of spontaneous language, such as comments, questions, and spontaneous picture descriptions. However, our findings suggest that some minor modifications should be made to the test. For instance, we identified two malfunctioning items (i.e. items 2 and 17) that need to be modified. In addition, we found evidence that some of the items (i.e. items 10, 21, and 30) should be rearranged. Accordingly, we plan to modify the test and to use a research design similar to the one adopted in this study with a larger sample. Although the results of this study seem to suggest that memory span was not related to children’s performance on the test, we plan to control for children’s verbal memory in our future study.

The following limitations of this research also need to be acknowledged. One is that the present exploratory study was conducted exclusively on typically developing children with no particular language impairment. Additional investigations with children who have language impairments are required in order to examine the relationship between their impairment and their performance on EITs. In addition, in the present study, children’s performance was evaluated only with the 5-point scoring scale developed by Ortega et al. (2002). Future researchers can use other scoring alternatives such as automated scoring or binary scoring. Such studies could examine whether different scoring options contribute to the discriminatory power of the test.
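To make the contrast between rubric-based and binary scoring concrete, the sketch below rescores a set of item ratings both ways. It is an illustration under stated assumptions, not a scoring procedure used in this study: the 5-point rubric is assumed to run from 0 to 4, the rule that only a top rating counts as correct is one possible binary criterion, and the function name and ratings are hypothetical.

```python
def to_binary(rubric_score: int, threshold: int = 4) -> int:
    """Collapse a 0-4 rubric rating into 1 (correct) or 0 (incorrect).

    Assumption: ratings run from 0 to 4, and a rating at or above
    `threshold` counts as a correct repetition.
    """
    if not 0 <= rubric_score <= 4:
        raise ValueError("rubric score must be between 0 and 4")
    return 1 if rubric_score >= threshold else 0


# Hypothetical ratings for one child on five items (not data from the study).
rubric_ratings = [4, 3, 0, 4, 2]
binary_ratings = [to_binary(score) for score in rubric_ratings]

rubric_total = sum(rubric_ratings)  # partial credit is retained
binary_total = sum(binary_ratings)  # only exact repetitions are counted
print(rubric_total, binary_total)
```

The two totals rank children differently whenever partial repetitions are common, which is why a comparison of scoring options could bear on the discriminatory power of the test.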

In sum, despite these limitations, the results presented here demonstrate that this particular EIT can be used as a reliable and valid instrument for measuring Persian-speaking children’s language proficiency. The findings showed that the test can evaluate a wide range of grammatical structures at once, without the interference of contextual variation that affects other verbal ability tests such as open-ended discussions and narratives.

Appendix

Table 1A.

Example items from the Persian EIT.

English translation	Persian wording	Item no.
Arash is cleaning his clothes.		4
Mina put the ball on the table.		7
The red cab is bigger than the yellow cab.		14
Sasan can’t ride a bicycle yet.		19
Ali asked Hamid, what time is it?		26
Saeed is too young to drive a car.		30
Elham was sick, so she could not participate in her friend’s birthday party.		34
Kiyan gave two oranges to Samira, and got three apples instead.		37

References

Bachman (1990) Fundamental considerations in language testing, Oxford, England: Oxford University Press.

Baddeley, A. D.; Gathercole, S. E.; Papagno (1998) The phonological loop as a language learning device. Psychological Review 105: 158–173. doi: 10.1037/0033-295X.105.1.158.

Bley-Vroman; Chaudron (1994) Elicited imitation as a measure of second-language competence. In: Tarone; Gass; Cohen (eds) Research methodology in second-language acquisition, Mahwah, NJ: Erlbaum, pp. 245–261.

Bock; Levelt, W. J. M. (1994) Language production: Grammatical encoding. In: Gernsbacher, M. A. (ed.) Handbook of psycholinguistics, San Diego, CA: Academic Press, pp. 945–984.

Bowden (2016) Assessing second-language oral proficiency for research: The Spanish elicited imitation task. Studies in Second Language Acquisition 38: 647–675. doi: 10.1017/S0272263115000443.

Conti-Ramsden; Botting; Faragher (2001) Psycholinguistic markers for Specific Language Impairment (SLI). Journal of Child Psychology and Psychiatry 42: 741–748. doi: 10.1111/1469-7610.00770.

Corrigan; Di Paul (1982) Measurement of language production in two-year-olds: A structured laboratory technique. Applied Psycholinguistics 3: 223–242. doi: 10.1017/S0142716400001405.

Cowan; Day; Scott Saults; Keller, T. A.; Johnson; Flores (1992) The role of verbal output time in the effects of word length on immediate memory. Journal of Memory and Language 31: 1–17. doi: 10.1016/0749-596X(92)90002-F.

Devescovi; Caselli (2007) Sentence repetition as a measure of early grammatical development in Italian. International Journal of Language & Communication Disorders 42(2): 187–208. doi: 10.1080/13682820601030686.

Dockrell, J. E. (2001) Assessing language skills in pre-school children. Child Psychology and Psychiatry Review 6(2): 74–85. doi: 10.1017/S1360641701002532.

Ellis, N. (2001) Memory for language. In: Robinson, P. (ed.) Cognition and second language instruction, Cambridge, MA: Cambridge University Press, pp. 33–68.

Erlam (2006) Elicited imitation as a measure of L2 implicit knowledge: An empirical validation study. Applied Linguistics 27: 464–491. doi: 10.1093/applin/aml001.

Farhady; Hedayati (2009) Language assessment policy in Iran. Annual Review of Applied Linguistics 29: 132–141. doi: 10.1017/S0267190509090114.

Farhady, H.; Tavassoli, K. (2013) Assessing Farsi. In: Kunnan, A. J. (ed.) The companion to language assessment (4 Vols., 16: 112), New York: John Wiley & Sons, Inc., pp. 1790–1798.

Gallimore; Tharp (1981) The interpretation of elicited imitation in a standardized context. Language Learning 31: 369–392. doi: 10.1111/j.1467-1770.1981.tb01390.x.

Gathercole, S. E.; Baddeley, A. D. (1993) Working memory and language, Hove, England: LEA.

Hagoort; Levelt, W. J. M. (2009) The speaking brain. Science 326(5951): 372–373. doi: 10.1126/science.1181675.

Hamayan; Saegert; Larudee (1977) Elicited imitation in second language learners. Language and Speech 20: 86–97. doi: 10.1177/002383097702000109.

Hasanpoor; Jalilevand; Masumi; Ghorbani; Kamali (2015) Development of a picture receptive vocabulary test and evaluation of its validity & reliability for normal 36-71 months Persian children. Journal of Paramedical Science and Rehabilitation 4(3): 34–43.

Hood; Schieffelin, B. B. (1978) Elicited imitation in two cultural contexts. Quarterly Newsletter of the Institute for Comparative Human Development 2(1): 4–12.

Jessop; Suzuki; Tomita (2007) Elicited imitation in second language acquisition research. The Canadian Modern Language Review 64: 215–220. doi: 10.3138/cmlr.64.1.215.

Kazemi; Taheri; Kianfar; Shafiei; Eslamifard; Pirmoradian; Nourian (2012) Mean length of utterance (MLU) in typically-developing 2:6-5:6 year-old Farsi-speaking children in Iran. Journal of Research in Rehabilitation Science 8(5): 1–10.

Kim; Tracy-Ventura; Jung (2016) A measure of proficiency or short-term memory? Validation of an elicited imitation test for SLA research. The Modern Language Journal 100(3): 655–673. doi: 10.1111/modl.12346.

Klem; Melby-Lervag; Hagtvet; Lyster, S. H.; Gustafsson; Hulme (2015) Sentence repetition is a measure of children’s language skills rather than working memory limitations. Developmental Science 18(1): 146–154. doi: 10.1111/desc.12202.

Lombardi; Potter, M. C. (1992) The regeneration of syntax in short term memory. Journal of Memory and Language 31(6): 713–733. doi: 10.1016/0749-596X(92)90036-W.

Markman, B. R.; Spilka, I. V.; Tucker, G. R. (1975) The use of elicited imitation in search of an interim French grammar. Language Learning 25: 31–41. doi: 10.1111/j.1467-1770.1975.tb00107.x.

McDade, H. L.; Simpson, M. A.; Lamb, D. E. (1982) The use of elicited imitation as a measure of expressive grammar: A question of validity. Journal of Speech and Hearing Disorders 47: 19–24. doi: 10.1044/jshd.4701.19.

McNamara (1996) Measuring second language performance, London, England: Longman.

Mehrani, M. B. (2011) What is biased: Children’s strategies or the structure of yes/no questions? First Language 31: 214–231. doi: 10.1177/0142723710391886.

Mehrani, M. B.; Peterson, C. (2017) Children's recency tendency: A cross-linguistic study of Persian, Kurdish and English. First Language 37: 350–367. doi: 10.1177/0142723717694055.

Miller, G. A. (1956) The magical number seven, plus or minus two: Some limits on our capacity for processing information. Psychological Review 63(2): 81–97. doi: 10.1037/h0043158.

Moll; Hulme; Nag; Snowling, M. J. (2015) Sentence repetition as a marker of language skills in children with dyslexia. Applied Psycholinguistics 36(2): 203–221. doi: 10.1017/S0142716413000209.

Munnich; Flynn; Martohardjono (1994) Elicited imitation and grammaticality judgment tasks; what they measure and how they relate to each other. In: Tarone, E. E.; Gass; Cohen (eds) Research methodology in second-language acquisition, Mahwah, NJ: Erlbaum, pp. 227–243.

Ortega, L.; Iwashita, N.; Norris, J. M.; Rabie, S. (2002, October) An investigation of elicited imitation tasks in cross-linguistic SLA research. Paper presented at the Second Language Research Forum, Toronto, Canada.

Owens, R. E. (2016) Language development: An introduction, 9th ed. New York, NY: Pearson.

Polisenska; Chiat; Roy (2015) Sentence repetition: What does the task measure? International Journal of Language & Communication Disorders 50(1): 106–118. doi: 10.1111/1460-6984.12126.

Potter; Lombardi (1990) Regeneration in the short term recall of sentences. Journal of Memory and Language 29: 633–654. doi: 10.1016/0749-596X(90)90042-X.

Sachs (1967) Recognition memory for syntactic and semantic aspects of connected discourse. Perception and Psychophysics 2: 437–442. doi: 10.3758/BF03208784.

Schweickert; Boruff (1986) Short-term memory capacity: Magic number or magic spell? Journal of Experimental Psychology: Learning, Memory, and Cognition 12(3): 419–425. doi: 10.1037/0278-7393.12.3.419.

Scott, M. L. (1994) Auditory memory and perception in younger and older adult second language learners. Studies in Second Language Acquisition 16(3): 263–281. doi: 10.1017/S0272263100013085.

Simon, H. A. (1974) How big is a chunk? Science 183: 482–488. doi: 10.1126/science.183.4124.482.

Slobin, D. I.; Welsh, C. A. (1968) Elicited imitation as a research tool in developmental psycholinguistics (Working Paper No 10, pp. 485–497). Berkeley: Language Behavior Research Laboratory, University of California. (Reprinted in: Ferguson, C.; Slobin, D. I. (eds) (1973) Studies of child language development, New York, NY: Holt, Rinehart and Winston, pp. 485–489.)

Tracy-Ventura; McManus; Norris; Ortega (2014) “Repeat as much as you can”: Elicited imitation as a measure of oral proficiency in L2 French. In: Leclercq; Edmonds; Hilton (eds) Measuring L2 proficiency: Perspectives from SLA, Bristol, England: Multilingual Matters, pp. 143–166.

Vinther (2002) Elicited imitation: A brief overview. International Journal of Applied Linguistics 12: 54–73. doi: 10.1111/1473-4192.00024.

Wu, S. L.; Ortega (2013) Measuring global oral proficiency in SLA research: A new elicited imitation test of L2 Chinese. Foreign Language Annals 46: 680–704. doi: 10.1111/flan.12063.

Yan; Maeda; Ginther (2016) Elicited imitation as a measure of second language proficiency: A narrative review and meta-analysis. Language Testing 33(4): 497–528. doi: 10.1177/0265532215594643.