Abstract
In the context of increasing legislative emphasis on universal screening for reading problems, the accurate and equitable assessment of English learners (ELs) remains a pressing concern. This study examines how kindergarten and first-grade students’ performance on early literacy measures in English is affected by their English proficiency. We report on performance on measures of deletion, picture naming, sentence repetition, letter naming fluency, word and nonword reading, and rapid object naming across the school year. Drawing on a diverse and representative sample of 3,064 students across 31 California schools, we addressed two main research questions. First, we compared the performance of English-only (EO) students with that of ELs and of students identified as English-proficient (EP) but speaking another language at home. Findings indicated that ELs consistently scored lower than their EO and EP peers across all assessments. Second, we compared growth patterns. While most measures showed similar growth rates, a significant performance gap remained for ELs (p < .001). Notably, EP students displayed distinct performance patterns, outperforming EO students in most tasks, except for those demanding more vocabulary. Our findings emphasize the importance of tailored assessment approaches and consideration of English proficiency when interpreting ELs’ performance.
Over the last 10 years, nearly all states in the United States have passed legislation either mandating or recommending universal screening for reading problems in Grades K–3, with the goal of identifying children who may be at risk for reading difficulties (Petscher et al., 2019). Universal screening has been found to improve reading outcomes because it allows for the early identification of students who are falling behind benchmark levels of reading development with the goal of intervening before delays become entrenched (Linan-Thompson et al., 2022; McIntosh & Goodman, 2016). This is particularly important for populations who have historically experienced inequities in reading outcomes, such as English Learners (ELs; Goldenberg, 2020; Umansky et al., 2015). However, screening relies on fair and accurate measurement, which is why, in this paper, we investigate performance and growth differences on a suite of English early literacy measures (the Reach Every Reader English reading assessment battery; Petscher & Catts, 2022) tapping into constructs commonly used for reading screening.
We assert that it is important to compare differences in performance between ELs and English-proficient students on these measures, given that screening for reading problems in the United States is primarily conducted in English, with little guidance on how to interpret performance based on English proficiency (Brown et al., 2023; Francis et al., 2020). This is a critical oversight given that ELs make up 10% or more of the student body in most states, all of which have some form of legislation encouraging or mandating universal screening for reading problems (National Center on Improving Literacy, 2023).
Current recommendations from leading national organizations such as the International Dyslexia Association and the National Center on Intensive Intervention emphasize the importance of selecting assessments that have included ELs in the normative samples and that provide evidence of validity and reliability with ELs (Brown et al., 2023; International Dyslexia Association, 2023). This focus on fairness is in line with the Standards for Educational and Psychological Testing (American Educational Research Association [AERA], American Psychological Association [APA], & National Council on Measurement in Education [NCME], 2014), in which fairness is described as a necessary condition in testing to reduce bias and potential harm from discriminatory testing practices (Herman & Cook, 2022).
Situating assessment within the context of an individual’s experience, community, culture, and language is integral to accuracy. The construct of fairness in testing includes evidence of validity and reliability, but also adds the important nuance of the cultural and linguistic match with the population being measured (Sireci & Randall, 2021). Comparing the performance between ELs and English-proficient students on widely used measures of early literacy in English provides an opportunity to explore linguistic fairness in reading assessments.
Early Literacy Assessment
Universal screening is an integral component of multi-tiered systems of support (MTSS) and data-based decision making (McIntosh & Goodman, 2016), and it involves the assessment of all students in the classroom, usually in Grades K–3 (Petscher et al., 2019). It generally involves some combination of tasks that measure phonological awareness, decoding, spelling, listening comprehension, vocabulary, fluency in letter names and sounds, rapid automatized naming, and working memory (Gaab et al., 2020). Because all students are assessed, it is imperative that assessments provide information about students’ abilities that can be used with confidence.
Performance on Early Literacy Measures as a Function of English Proficiency
The reality is that U.S. schools have not adequately supported the reading outcomes of ELs for the last 30 years, as indicated by nearly static scores that are significantly lower than those of White English-speaking peers on the National Assessment of Educational Progress (U.S. Department of Education, n.d.). Since the 1960s, ELs have also tended to be over-identified for learning disabilities in the United States, particularly after the third grade (National Center for Learning Disabilities, 2020; Rodríguez & Rodríguez, 2017). These trends are largely due to a lack of effective instruction that is designed specifically to meet the educational needs of ELs, which includes a lack of attention to English language development (Rojas et al., 2019). Being bilingual in itself is not a risk factor for poor reading outcomes, but in the context of the United States, with primarily English-only instruction and limited language scaffolding in classrooms, students labeled as ELs based on low performance on an English proficiency assessment find themselves in the difficult position of acquiring English while simultaneously being tasked with learning to read in English (Goldenberg, 2020).
It is widely accepted that the identification of reading problems and the delivery of evidence-based reading interventions particularly in kindergarten and first grade are promising approaches to improving reading outcomes (Gaab et al., 2020), and this is also true for ELs if identification is accurate and the instruction is effective (Project ELITE2 et al., 2018). Benchmarking decisions about ELs based on performance on English assessments need to be informed by the student’s level of English proficiency (Mancilla-Martinez, 2020; Project ELITE2 et al., 2018). Students need to develop enough English proficiency to understand the instructions, vocabulary, and content of reading assessments (Baker et al., 2023) and interventions need to attend to English oral language development in addition to explicit reading instruction (Cho et al., 2021; Goldenberg, 2020; Keller-Margulis et al., 2023; Linan-Thompson et al., 2022; Solari et al., 2022).
Given that universal screening requires the testing of all students, it is important to consider the implications for students who are identified as ELs, since they are a large and growing student population (National Center for Education Statistics, 2024). Even though their performance is likely affected by their English proficiency (Mancilla-Martinez, 2020; Project ELITE2 et al., 2018), it is standard practice across the United States to screen for reading problems only in English, with little guidance for teachers about how to interpret ELs’ performance in order to avoid over- and under-identification (EdWeek, 2023; Linan-Thompson et al., 2022; Ortiz et al., 2018).
Students who are designated as ELs by definition have limited English proficiency. Under Title III of the Elementary and Secondary Education Act, all states must identify ELs by administering an English proficiency test to children whose parents indicate that a language other than English is spoken at home (Office of Elementary and Secondary Education, 2020). In California, the English Language Proficiency Assessments for California (ELPAC) is administered to all incoming students whose parents report that they are not monolingual English (EO) speakers, and then repeated annually. The ELPAC provides a measure of English proficiency and is used to determine the student’s English proficiency designation. Students who perform below benchmark levels set by the state are identified as English learners (ELs). Those students who speak a language other than English but demonstrate above-benchmark English proficiency levels based on initial or follow-up testing are (re-)classified as English-proficient (EP). The EP students are also included as a subpopulation of interest in this analysis in addition to the ELs, in contrast with previous research where this group has been excluded or combined with monolingual English-speaking students (Rhinehart & Gotlieb, 2023). Few studies have examined differences in performance among monolingual, EP, and EL students on English reading screening measures. Cummings et al. (2021) explored the role of English proficiency in determining the optimal screening thresholds for students who were identified as EL and EP. They found that the receiver operating characteristic curves showed few differences between ELs and EPs in the overall accuracy of the screeners across risk levels, as the confidence bounds of the curves for ELs mostly contained the curve for EPs.
Another study did not find statistical differences between EL and EP groups in first grade on measures of letter naming and sounds, blending, nonword repetition, following directions, nonword reading, real word reading, and rapid object naming, but significant differences were found on vocabulary, rhyming, and oral comprehension (Rhinehart & Gotlieb, 2023). It is important to note that this study was conducted with a small sample of ELs (n = 24), which likely did not capture the full range of English proficiency found in larger samples. These researchers also explored whether there was a difference in classification rates. Although they did not find a statistically significant difference between the EL and EO risk identification rates, they reported that their screener identified 26% of EO students versus 43% of ELs. This 17 percentage-point difference may not have been statistically significant, but it is meaningful. Given the small sample size, the results from this pioneering study should be interpreted with caution, and more research is needed with larger samples to provide more confidence in the use of English reading screening with children who are designated as ELs.
Other studies have further explored the differences in diagnostic accuracy for reading risk between monolingual English speakers and English learners using various universal screeners. Keller-Margulis and colleagues (2023) examined the effectiveness of two screening assessments at different performance levels on a standardized reading measure for monolingual and Spanish-English bilingual students (N = 96). They observed differences in accuracy between English learners and monolingual students, with one of the screening cut-offs showing the lowest accuracy in identifying reading risk. These findings are consistent with previous studies (e.g., Vanderwood et al., 2008).
Growth in Early Literacy Measures as a Function of English Proficiency
There is also debate about how quickly young children acquire English and how this might be reflected on reading and language assessments. Researchers previously found that ELs vary in their growth on reading versus language measures. Mancilla-Martinez and Lesaux (2011) found that Spanish-speaking ELs of ages 4 to 11 reached national norms in word reading but were still performing below average in English language development. These findings suggest variability in growth between word reading and language development, but these researchers did not disaggregate growth based on initial levels of proficiency. In the current study, we add to these findings by exploring how growth varies by the child’s initial level of English proficiency between language and literacy measures.
There is a convergence of evidence that studying and modeling growth is an important approach in predicting which children are at risk for reading difficulties (Hoff, 2013). However, the evidence is mixed (Petscher et al., 2022): some have found that growth predicts risk beyond status (e.g., Yeo & Park, 2014), while others have not (e.g., Brown Waesche et al., 2011). In EL populations specifically, growth models have been shown to have value above and beyond that of single data points on standardized measures (Rojas & Iglesias, 2013). To date, there is a growing body of longitudinal dual language growth trajectory studies (e.g., Al Otaiba et al., 2009; Baker et al., 2012; Collins, 2014; Davison et al., 2011; Durán et al., 2013; Hammer et al., 2009; Hoff et al., 2014; Hoff & Ribot, 2017; Jackson et al., 2014; Rojas & Iglesias, 2013; Solari et al., 2014). Overall, the corpus of Spanish-English growth studies includes children beginning in the preschool years (Hammer et al., 2009; Hoff & Ribot, 2017; Jackson et al., 2014; Páez et al., 2007), with others focusing on school-age children (Collins, 2014; Rojas & Iglesias, 2013). Across all studies, performance and growth were found to be below age-expected norms in English and Spanish, with some studies’ sample means as low as two standard deviations below the mean in both Spanish and English on both language and literacy measures (Páez et al., 2007).
There are common misconceptions that children learn language “like a sponge” and that they will simply pick up English through passive exposure in the classroom setting (Espinosa, 2013). However, although ELs are growing in language and literacy skills, longitudinal growth studies (Durán & Wackerle-Hollman, 2018) show their growth rates are not robust enough for them to catch up to their monolingual or English-proficient peers. In this study, we will also explore growth as a function of initial proficiency and compare ELs’ growth trajectories on commonly used early literacy measures to those of monolingual English speakers and former ELs, now deemed English-proficient.
Research Questions
Method
Design and Setting
This is a 1-year longitudinal study. Trained assessment administrators collected data throughout California in fall, winter, and spring of the 2021/2022 academic year. Notably, this was a unique period in that it was the first year that students returned to school after participating in distance learning activities due to the COVID-19 pandemic.
Sample
Our sample includes 3,064 students in kindergarten (n = 1,502) and first grade (n = 1,562), from 31 schools in 14 Californian school districts, both rural and urban (see Table 1 for sample demographics). California’s EL population is among the largest in the United States, with more than 1.3 million ELs, making up 19.1% of the state’s total public elementary and secondary school enrollment (California Department of Education, 2023). The schools in our sample primarily provided instruction in English with less than 20% of the sample receiving some reading instruction in Spanish in either 90:10 or 50:50 programs. Programs self-reported their language of instruction model and there was no opportunity to observe the quality of reading instruction or the actual adherence to the reported language of instruction.
Sample Demographics, Separately for Kindergarten (n = 1,502) and First Grade (n = 1,562).
Note. Counts are presented with column-wise percentages in parentheses. “English-proficient” is a combined category including students initially classified as fully English-proficient or reclassified as such.
On average, each school had an enrollment of 412 students, with 32% of students speaking a language other than English at home. School-level SES was measured as the proportion of students in each school eligible for free or reduced-price meals, or those whose parents/guardians have not attained a high school diploma (M = 0.50, SD = 0.30). The sample size varied across semesters because nine schools entered the study in winter. However, the proportions of students with different English proficiency designation labels based on their ELPAC scores (English only [EO]; initially or re-classified as fully English-proficient [EP]; and English learner [EL]) were consistent across these different time points (see Table 1 for sample breakdown).
Measures
We used the Reach Every Reader Assessments, which were the earlier prototypes of the Interstellar Express assessments (Petscher & Catts, 2022), developed by researchers at the Florida Center for Reading Research. Most of the assessments involved computer-adaptive testing (CAT), an assessment approach that estimates each student’s ability level by selecting items from a pre-calibrated item pool based on their responses (Wainer, 2000). No previous studies have explored the use of these assessments with ELs. In the current study, the following reading and literacy tasks were administered (see Table 2 for information on reliability and validity of the measures; see Tables S1 and S2 for correlations between the measures):
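• Deletion
• Picture naming
• Sentence repetition
• Letter naming fluency
• Word reading
• Nonword reading
• Rapid object naming (RON)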
Description, Reliability, and Criterion Validity Evidence for the Pool of Screening Tasks From a Pilot Version of the Reach Every Reader Assessment (Petscher & Catts, 2022).
Note. K = Kindergarten; G1 = Grade 1; EO = English-only; EP = (re-)classified English-proficient; EL = English learner; WPPSI-4 = Wechsler Preschool and Primary Scale of Intelligence (4th ed.); CELF-5 = Clinical Evaluation of Language Fundamentals (5th ed.); CTOPP-2 = Comprehensive Test of Phonological Processing (2nd ed.); KTEA-3 = Kaufman Test of Educational Achievement (3rd ed.).
Procedures
The literacy measures were administered in fall 2021, winter 2022, and spring 2022 by university-employed proctors, who were trained to fidelity and monitored for accuracy by site coordinators. For CAT tasks, each measure began with a fixed set of five items. On average, students received six to nine additional items per task. All tasks were administered using iPads that displayed the task directions and stimuli. A trained proctor used a second iPad, synchronized with the child’s device, to evaluate responses based on a predefined set of acceptable answers. As proctors scored the child’s oral response, the child’s assessment experience automatically advanced to the next item. This process continued until a reliable estimate of the child’s ability was determined or the maximum number of items was reached (ranging from 10 to 12).
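The stopping logic described above can be sketched as follows. This is a minimal illustration, not the assessment's actual implementation: the class name `StoppingRule`, its fields, and in particular the standard-error threshold of 0.30 are hypothetical, since the text does not state how a "reliable estimate" was defined.

```python
from dataclasses import dataclass


@dataclass
class StoppingRule:
    """Hypothetical stopping rule for one CAT task (values are illustrative)."""
    fixed_items: int = 5        # fixed opening items, as described in the text
    max_items: int = 12         # the text reports maxima ranging from 10 to 12
    se_threshold: float = 0.30  # assumed precision target; not from the study

    def should_stop(self, items_given: int, standard_error: float) -> bool:
        # Never stop before the fixed opening block has been administered.
        if items_given < self.fixed_items:
            return False
        # Stop when the item cap is reached, whatever the precision.
        if items_given >= self.max_items:
            return True
        # Otherwise stop once the ability estimate is precise enough.
        return standard_error <= self.se_threshold


rule = StoppingRule()
print(rule.should_stop(items_given=4, standard_error=0.10))   # False
print(rule.should_stop(items_given=8, standard_error=0.28))   # True
print(rule.should_stop(items_given=8, standard_error=0.45))   # False
print(rule.should_stop(items_given=12, standard_error=0.45))  # True
```

Under a rule of this kind, test length varies by child: a student whose responses are consistent reaches the precision target sooner than one whose responses are mixed.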
Data Analysis
All analyses were conducted in R version 4.3.2. The examination of missing data revealed that data was not missing completely at random (MCAR) in either kindergarten (Little’s MCAR test: χ²(177) = 896.6, p < .001) or first grade (Little’s MCAR test: χ²(290) = 1,036, p < .001). However, further examination of the data led to the determination that the missing values were missing at random (MAR). The missing data was primarily attributable to the inclusion of nine new schools during the winter assessment. Consequently, the percentage of missing data ranged from 20% to 39%. After determining that the data was missing at random, full information maximum likelihood (FIML) was used in the estimation of our model to handle the missing data.
RQ1: Performance Analyses
For each of the measures, we provide either theta scores, derived from CAT measures, or z-scores, for the fluency measures, to ensure a consistent scale for comparison across different tasks. We computed means and standard deviations separately for each timepoint (fall, winter, and spring) and English proficiency designation (EO, EP, and EL). In addition, we ran an analysis of variance (ANOVA) to detect performance differences based on English proficiency at each time point. To further investigate the location of potential differences, we performed post hoc analyses using the Bonferroni–Hochberg correction to account for and manage family-wise error rates in the significance tests. To quantify the magnitude of these group differences, we calculated the effect sizes (Cohen’s d).
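As a rough illustration of the effect size computation, the sketch below calculates Cohen’s d for two groups. It assumes the pooled-standard-deviation form of d, which the text does not specify, and the scores are invented for illustration only.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    # statistics.variance is the sample variance (n - 1 denominator).
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


# Made-up theta scores for illustration only; not data from the study.
eo_scores = [0.4, 0.1, 0.6, 0.3, 0.1]
el_scores = [-0.5, -0.2, -0.7, -0.4, -0.2]
print(round(cohens_d(eo_scores, el_scores), 2))
```

A positive value indicates that the first group scored higher on average; swapping the argument order flips the sign but not the magnitude.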
RQ2: Linear Growth Models
We built linear mixed-effects models, separately for kindergarten and first grade, with full information maximum likelihood estimation to analyze and compare growth patterns, using the “lmer” function from the lme4 package in R (Bates et al., 2015). Initial unconditional growth models contained three random effects: district, school, and student. The only fixed effect was time, given that data was collected at three different time points. To ensure that our coefficients reflect the average monthly change, time was treated as a continuous variable. Given the intervals between assessment periods, time was represented as 0 (fall), 3 (winter), and 6 (spring).
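One way to write the unconditional growth model described above is sketched below. The notation is introduced here for illustration, and the sketch assumes random intercepts for district, school, and student plus a random slope for student; the random-effects structure actually retained varied by measure.

```latex
Y_{tijk} = \beta_0 + \beta_1\,\mathrm{Time}_{tijk}
         + v_{0k} + u_{0jk} + r_{0ijk} + r_{1ijk}\,\mathrm{Time}_{tijk}
         + \varepsilon_{tijk},
\qquad \mathrm{Time}_{tijk} \in \{0, 3, 6\}
```

Here Y is the score at occasion t for student i in school j in district k; v, u, and r denote the district, school, and student random effects; and ε is the residual. Because time is coded in months since the fall assessment, β0 is the expected fall score and β1 is the average change per month.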
When building the unconditional growth models, we followed McNeish’s (2014) guidance and retained only random effects that reflected meaningful clustering, removing any random effect with an intraclass correlation coefficient (ICC) below 0.05, because a low ICC indicates that little of the observed variation is due to differences between these groups. The ICC, in this context, quantifies the proportion of variance in student assessments that can be attributed to differences between districts, schools, or students. Moreover, when models were singular (indicating that certain variance components in the random effects structure were estimated to be zero, reflecting potential overparameterization), we removed random slopes for school and district.
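The ICC screening step can be illustrated with a short sketch. It assumes a simple random-intercept decomposition in which each ICC is a variance component divided by the total variance; the function name and the variance values are hypothetical and are not estimates from the study.

```python
def intraclass_correlations(variance_components):
    """Share of total variance attributable to each component."""
    total = sum(variance_components.values())
    return {name: value / total for name, value in variance_components.items()}


# Made-up variance components for illustration only; not estimates from the study.
components = {"district": 0.02, "school": 0.10, "student": 0.48, "residual": 0.40}
iccs = intraclass_correlations(components)

# Keep only clustering levels whose ICC meets the 0.05 cutoff described above.
retained = [level for level in ("district", "school", "student") if iccs[level] >= 0.05]
print(retained)  # ['school', 'student']
```

In this invented example the district level accounts for about 2% of the variance and would be dropped, which mirrors the situation reported for some kindergarten measures.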
In the conditional models, we included English proficiency designation fixed effects, as well as an interaction term for English proficiency designation and time to examine the differences in growth patterns among the different proficiency groups (EO, EP, and EL) over time. In addition, to account for differences in socioeconomic status (SES) among schools, we included a school-level SES fixed effect in our models. This addition aims to control for the influence of variations in SES. A student-level SES variable was not included due to large amounts of missing data at the individual level (see Table 1). Finally, to quantify the change in the amount of variance explained by each random effect through the inclusion of the English proficiency groups and SES, we computed pseudo-R2s, which represent the variance reduction between the unconditional and conditional models due to the addition of a covariate.
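The pseudo-R2 computation follows the formula given in the notes to the conditional-model tables: the reduction in a variance component (τ00) from the unconditional to the conditional model, expressed as a percentage of the unconditional value. A minimal sketch, with a hypothetical function name and invented variances:

```python
def pseudo_r2(tau_unconditional, tau_conditional):
    """Percent reduction in a variance component after adding covariates."""
    return (tau_unconditional - tau_conditional) / tau_unconditional * 100


# Made-up intercept variances for illustration only; not estimates from the study.
print(round(pseudo_r2(tau_unconditional=0.50, tau_conditional=0.40), 1))  # 20.0
```

A value of 20 would mean that adding the covariates reduced that variance component by one fifth. The statistic can be negative when a component is larger in the conditional model, so it is not a true proportion of variance explained.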
Results
RQ1: Performance Analyses
Kindergarten
Table 3 shows the means and standard deviations for each measure given in kindergarten by English proficiency designation, along with the ANOVA results. We found significant overall differences for all measures; picture naming showed the strongest overall differences, followed by sentence repetition and deletion. Post hoc analyses showed that both the EO and EP groups outperformed ELs on all measures. Regarding the comparisons between EP and EO students, the EP group outperformed the EO group in letter naming fluency at the beginning of the school year, but the difference was not significant at the end of the year. Conversely, for picture naming, the EO group outperformed the EP group in winter and spring.
Performance on Reading Measures by Timepoint and English Proficiency Designation for Kindergarten.
Note. Means are presented with standard deviations in parentheses. For post hoc comparisons, we corrected for family-wise error rates using the Bonferroni–Hochberg correction; EO = English-only, EP = (re-)classified English-proficient, EL = English learner.
*p < .05. **p < .01. ***p < .001.
First Grade
Table 4 shows the results for the sample in first grade. The results followed a trend similar to that observed in kindergarten; picture naming still showed the strongest overall differences between ELs and the EO and EP groups, followed by sentence repetition and deletion, although differences in deletion were smaller in winter and spring of first grade. Post hoc analyses showed that both the EO and EP groups outperformed ELs on all measures, with particularly large differences in picture naming and sentence repetition. In word reading, the difference between EP students and ELs was larger than the difference between ELs and their EO peers. Aligned with this finding, the EP group outperformed the EO group on most measures except for sentence repetition and picture naming.
Performance on Reading Measures by Timepoint and English Proficiency Designation for First Grade.
Note. Means are presented with standard deviations in parentheses. For post hoc comparisons, we corrected for family-wise error rates using the Bonferroni–Hochberg correction; EO = English-only, EP = (re-)classified English-proficient, EL = English learner.
*p < .05. **p < .01. ***p < .001.
RQ2: Linear Growth Models
Kindergarten
Initial unconditional growth models can be found in the supplemental materials (see Table S3). Most models contain random intercepts and slopes for students, and random intercepts for school and district, except for deletion, RON, and letter naming, where the district random intercept showed an ICC lower than 0.05. Intraclass correlation coefficients attributable to school ranged from 6% to 16%, while the ICC associated with the district was around 12% for picture naming and sentence repetition. Correlations between the student intercept and slope suggest a negative relationship between initial performance and growth rate within students. In other words, students with a higher initial score tended to exhibit a slower growth rate, and vice versa.
Table 5 shows the conditional growth models, using the EO group as the reference group; Table S4 shows the same models with EP students as the reference group (to obtain EP-EL contrasts). The school-level SES coefficient was significant and negative for all tasks, suggesting that schools with a higher proportion of students eligible for free or reduced-price meals, or whose parents/guardians have lower educational levels, tend to have lower scores; that is, greater school-level socioeconomic disadvantage was associated with lower student performance. Linear growth plots by group are presented in Figure 1 (panel A). All measures showed a positive and significant effect of time, suggesting that the measures were able to capture growth throughout the semesters in the EO group. The EL fixed effect, compared against both the EO and EP groups, was negative and significant for all measures (p < .001), suggesting that ELs scored notably lower on all measures in fall. In contrast, the fixed effect of the EP group compared with EO was only significant and positive for letter naming (0.26, p < .05). This result implies that the EO and EP groups exhibited similar performance levels during the fall assessment, with the exception of letter naming, where the EP group outperformed the EO group.
Conditional Models for Kindergarten Outcomes, Using the English-Only Group as the Reference Group.
Note. SE = standard error; EO = English-only; EP = (re-)classified English-proficient; EL = English learner; RON = Rapid object naming; SES = socio-economic status. Pseudo-R2s were computed using the following formula: (τ00 of the unconditional growth model − τ00 of the conditional growth model) / τ00 of the unconditional growth model × 100.
p < .001.

One-Year Growth on Reading Tasks by English Proficiency Designation for Kindergarten (A) and First Grade (B).
We further examined the interactions between time and English proficiency designation and found that only deletion showed a significant interaction term (0.17, p < .001) with EO as the reference group. The non-significant interaction effects, along with the significant effect of time for picture naming, sentence repetition, letter naming, and RON imply that growth patterns did not significantly differ between the groups. However, in the case of deletion, the positive and significant interaction term indicated that the EL group experienced a greater growth rate compared with the EO group, though this was not significant when compared with the EP group.
First Grade
Final unconditional growth models are shown in Table S5. ICCs ranged from 5% to 12% for school and from 7% to 17% for district. Correlations between student intercept and slope were all close to 0, indicating no linear relationship between the initial status and growth rate within students, with the exception of deletion, which had a negative correlation of −.17.
Conditional models are presented in Table 6, with the EO group serving as the reference group, and in Table S6 with the EP group as reference. Linear growth plots for each group can be found in Figure 1 (panel B). School-level SES was also significant and negative for all tasks. We also found a significant and positive effect of time for all measures. The EL fixed effect, when compared against the EO and EP groups, was negative and significant for all measures (p < .001), indicating once again that the EL group had significantly lower scores than the EO and EP groups in fall. The fixed effect of the EP group, when compared against the EO group, was significant and positive for deletion, word and nonword reading, and RON, indicating that the EP group scored higher than the EO group in fall. However, for picture naming, the EP group’s performance was lower than the EO group’s (−0.16, p < .05), and for sentence repetition, their performance was similar.
Conditional Models for First Grade Outcomes, Using the English-Only Group as the Reference Group.
Note. SE = standard error; EO = English-only; EP = (re-)classified English-proficient; EL = English learner; RON = Rapid object naming; SES = socio-economic status. Pseudo-R2s were computed using the following formula: (τ00 of the unconditional growth model − τ00 of the conditional growth model) / τ00 of the unconditional growth model × 100.
*p < .05. **p < .01. ***p < .001.
Regarding the interactions between time and EL group, only deletion—in line with the results from kindergarten—and nonword reading exhibited a significant and positive interaction when the reference group was EO. These results suggest that the EL group demonstrated a higher growth rate on these tasks compared with the EO group, while exhibiting comparable growth on the remaining tasks. When using the EP group as a reference, a similar trend was observed for deletion: the EL group grew faster than the EP group (0.04, p = .016).
Discussion
ELs are a heterogeneous population with a range of proficiency in English and a range of early literacy skills (López & Foster, 2021; Vargas et al., 2023). They are also more likely to experience poverty and live in neighborhoods with underperforming schools (National Center for Education Statistics, 2022; Quintero & Hansen, 2021). It is well established that the challenges associated with growing up in poverty adversely affect children’s performance on language and literacy measures (Bhattacharya, 2014; Dolean et al., 2019). With ELs, there is a risk of confounding the effects of poverty with emerging English proficiency on performance on literacy measures. However, English proficiency has been found to uniquely affect performance on English measures in previous studies (Baker et al., 2023).
This paper contributes to the emerging evidence on the role of English proficiency in performance on early literacy measures. Universal screening only holds the promise of the early identification of reading problems if the individual measures within it have technical adequacy and can accurately identify students in need of additional instructional support. There is a growing population of students nationally who speak languages other than English at home, and estimating their reading and language abilities must play a central role in developing early literacy assessment solutions that are accurate, fair, and scalable. Even though it is widely accepted that students with emerging English proficiency may not be fairly assessed on English assessments, the practice continues widely across the United States. The purpose of this research was to explore how ELs perform compared with English-proficient students and to provide information about how quickly they might catch up to their peers. Our findings may inform the interpretation of ELs’ performance and guide state policy makers when deciding on measures and on the language of assessment for ELs. These decisions will affect ELs’ performance and potentially identification rates. Our data provide evidence that ELs score significantly lower than their English-proficient peers across all measures in kindergarten and first grade.
In addition, ELs start off significantly lower, and because growth rates did not differ statistically among EL, EP, and EO students on most measures, this parallel development does not result in ELs catching up to their more English-proficient peers. In both the kindergarten and first-grade samples, the greatest differences between EO and EL students were on the picture naming task, with sentence repetition and deletion close behind. The differences found on the measures of expressive language (picture naming and sentence repetition) converge with an abundance of evidence indicating that multilingual children will remain behind their same-age English-proficient peers in English language development (Durán & Wackerle-Hollman, 2018).
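The arithmetic behind this point can be illustrated with a minimal sketch. It assumes simple linear growth, and all starting scores and growth rates are hypothetical values chosen for illustration; they are not estimates from this study, and the function names are our own.

```python
def project_scores(start, growth_per_wave, waves):
    """Scores at each assessment wave under linear growth."""
    return [start + growth_per_wave * w for w in range(waves)]


def waves_to_close(gap, eo_growth, el_growth):
    """Waves needed for ELs to close a starting gap, or None if they never do."""
    if el_growth <= eo_growth:
        return None  # parallel or slower growth never closes the gap
    return gap / (el_growth - eo_growth)


# Hypothetical values for illustration only; not taken from the study data.
eo_scores = project_scores(start=50.0, growth_per_wave=5.0, waves=3)
el_scores = project_scores(start=40.0, growth_per_wave=5.0, waves=3)

# With identical growth rates the gap is the same at every wave.
gaps = [eo - el for eo, el in zip(eo_scores, el_scores)]
print(gaps)

# A modestly higher EL growth rate still needs many waves to close the gap.
print(waves_to_close(gap=10.0, eo_growth=5.0, el_growth=6.0))
```

Under these assumed values, a growth advantage of one point per wave would need ten waves to offset a ten-point starting difference, which is far longer than the three assessment waves simulated here.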
These data support prior evidence that, when possible, we should assess children in both English and their home language to obtain more accurate estimates of their ability levels. It is well documented that bilingual children demonstrate lower performance when each language is compared with monolingual norms than when their total language ability is considered (De Houwer, 2023; Gross et al., 2014). Furthermore, we ought to collect other forms of converging evidence, including information about home language exposure and language of instruction, to make informed decisions about the identification of language and reading risks or delays (Baker et al., 2022; Castilla-Earls et al., 2020; Francis et al., 2020; Mancilla-Martinez et al., 2020). This is particularly important when measuring vocabulary, as multilingual children are known to have vocabulary distributed across their languages, and measurement in only one language may underestimate the overall size of their vocabulary (Anaya et al., 2018; Gross et al., 2014).
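As a minimal sketch of why single-language measurement can underestimate vocabulary, consider a scoring approach that credits a concept if the child can name it in either language (often called conceptual scoring). The word lists below are hypothetical and serve only as illustration; they are not items from the measures used in this study.

```python
# Hypothetical child, for illustration only. Each set holds the concepts
# the child named in that language, recorded by their English gloss.
named_in_english = {"dog", "ball", "car"}
named_in_spanish = {"dog", "milk", "shoe", "house"}

# Single-language scores count only what was produced in that language.
english_score = len(named_in_english)
spanish_score = len(named_in_spanish)

# A conceptual score credits each concept once if named in either language.
conceptual_score = len(named_in_english | named_in_spanish)

print(english_score, spanish_score, conceptual_score)
```

In this assumed example, the English-only score captures half of the concepts the child can name, because most of the child's vocabulary does not overlap across the two languages.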
We also considered EP students, multilinguals (re-)classified as English-proficient, as a separate group. This population has not been sufficiently studied to understand how it may differ from both the EO and EL populations. Our results indicate that, contrary to popular expectations, EP students do exhibit some important performance differences when compared with the EO group. Even if a student is classified as English proficient (EP) using a language proficiency assessment (e.g., ELPAC), this study suggests that students exposed to multiple languages in their environment may still require additional support, particularly in oral language skills such as expressive vocabulary, especially at the start of their schooling. In kindergarten, EP students scored lower on expressive language (picture naming) than their EO peers, but outperformed them on deletion and letter naming. In first grade, they outperformed the EO group on all measures except for sentence repetition and picture naming. Although we cannot provide a definitive explanation for these findings, we offer two possible explanations. First, while we controlled for school-level socio-economic status, EP students may have come from families with higher levels of education, leading to more opportunities for literacy practice at home (Sénéchal & LeFevre, 2014). Second, the pattern may be related to the hypothesis that students with higher proficiency in both languages can draw on linguistic resources from both, thereby supporting metalinguistic skills such as phonological awareness and facilitating the cross-linguistic transfer of knowledge (Bialystok & Barac, 2012). Nonetheless, these trends suggest a need for more nuanced approaches to assessing multilingual students, recognizing that differences in performance are to be expected across measures and across different levels of English proficiency.
Furthermore, there are common misconceptions about how quickly ELs catch up to their English-speaking peers. We therefore explored how likely it was for ELs to catch up to their EO peers within 1 year and whether or not early literacy measures in English are appropriate for use in first grade. We found that the EL-EO performance gap persisted on all reading and language measures even after 1 year of reading instruction in English. ELs’ growth rates were statistically significantly higher than those of their EO peers in only two cases: in kindergarten, in deletion; and in first grade, in deletion and nonword reading. Importantly, however, these differences did not result in a closing of the EL-EO performance difference. Our findings show that it is crucial to develop early literacy measures that reflect the complexity of language variation experienced by multilingual children and that consider and/or assess all languages they are exposed to (Castilla-Earls et al., 2020). To more effectively assess ELs, we must consider three levels of influence on language and reading development: home language exposure, language of instruction, and ability level (Baker et al., 2022; Francis et al., 2020; Mancilla-Martinez, 2021). Over-reliance on English assessments can lead to underestimating ELs’ abilities, perhaps ultimately leading to misinterpretation of limited English proficiency as a reading delay. This could result in over-identification of reading problems, and many reading intervention practices may not effectively address the underlying English language learning needs of these students. Such misidentification, resulting from failure to consider the systematic performance differences reported here, can inadvertently perpetuate deficit-based thinking about ELs and hinder their overall progress (Mancilla-Martinez et al., 2020; Project ELITE² et al., 2018).
Although it is not surprising that students with developing English proficiency would score significantly lower on English early literacy measures, this is rarely taken into consideration when states adopt universal screening procedures that emphasize English assessment with little guidance about how to interpret the performance of ELs. In fact, many states have approved lists of reading screeners that only include English assessments (e.g., Michigan, Kansas; National Center on Improving Literacy, 2023). Research is needed to guide an approach grounded in what we know about bilingual and biliteracy development. Ultimately, the development of approaches that will more fairly and accurately identify ELs at risk for reading delays is needed.
Implications for Practice
By measuring children’s skills in their home language and English, we ensure that language, as the foundation of literacy, is adequately accounted for. In addition, we gain a comprehensive understanding of their abilities across all languages, which, in turn, supports effective interventions for children with reading difficulties, particularly ELs. Language should be considered not only in assessment but also in instruction. Once a language need is identified, teachers should provide instruction that scaffolds and fosters language development in addition to reading instruction.
Based on the findings from this study and aligned with previous research, educators should attend to the English language proficiency of their multilingual students to better understand and interpret their performance on early literacy measures administered in English. Poor performance on these tasks should not automatically be interpreted as indicative of a reading problem; it may instead be an indication of emerging English proficiency. Including strategies to support English language acquisition during reading instruction can better support the reading development of ELs.
Many studies have documented that ELs lag behind their same-age EO peers largely because of the difficulty of learning to read in a language they are still in the process of acquiring (Mancilla-Martinez et al., 2020). Our findings align with this research, showing that while ELs demonstrate growth across all early literacy measures, significant performance gaps remain. This suggests that developing English proficiency interferes with understanding instruction and with comprehending text, even if ELs are able to decode the text (Grimm et al., 2018; Mancilla-Martinez et al., 2020). Therefore, teachers responsible for reading instruction should embed evidence-based strategies that also focus on scaffolding English language development through the use of realia, visual images, opportunities for practice, and role-play (Cardenas-Hagan, 2020; Herrell & Jordan, 2016).
Our findings complement those of a recent meta-analysis (Ludwig et al., 2019), which found that targeted interventions for ELs had strong effects on word reading skills and moderate effects on reading comprehension. This suggests that while ELs may not fully catch up to their peers, especially in oral language tasks, early intervention focusing on phonological awareness, phonics, and vocabulary could accelerate their growth trajectories in foundational reading skills and potentially narrow the performance gaps identified in our study.
Limitations and Future Research
Multilingual students are a heterogeneous population. Therefore, our analyses would have benefited from richer information on children’s home language environments and the variability in English proficiency in the EL population. Future research should address this limitation by incorporating students’ ELPAC scores as a continuous variable, as opposed to the categorical English proficiency designation. More detailed linguistic exposure information can also help us to better understand strengths and weaknesses in both English and the child’s home language(s).
The study would also have benefited from more precise information about the language of reading instruction these children were receiving and the amount of time and focus of instruction. We do have information about what percentage of the sample were in dual language programs, but there is no information on the curriculum used in these schools or the amount of instructional time devoted to English and Spanish instruction. It is likely that language and quality of instruction play an important role in children’s performance on these literacy measures, and future research should control for this effect. It would also be informative to assess children in both English and Spanish and compare performance across measures in both languages.
In addition, while this study employed early literacy measures commonly used in screening batteries, it did not examine their classification accuracy (e.g., AUC, sensitivity, and specificity), particularly across the different groups in our study. Future research should assess the effectiveness of these measures, individually or combined, in accurately identifying students who need additional support, and should include measures in both English and Spanish. This would provide more data to guide instructional decision-making and benchmarking decisions in English and Spanish and more guidance about how to interpret performance across languages.
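To make the classification accuracy indices named above concrete, the following minimal sketch computes sensitivity, specificity, and AUC separately for two groups. It is not an analysis of the study data: the scores, the true risk status, and the cut score of 20 are all hypothetical, and the function names are our own. Lower scores are assumed to indicate greater risk.

```python
def sensitivity_specificity(scores, at_risk, cut):
    """Flag students scoring below `cut`; compare flags with true risk status."""
    flagged = [s < cut for s in scores]
    tp = sum(f and r for f, r in zip(flagged, at_risk))
    fn = sum((not f) and r for f, r in zip(flagged, at_risk))
    tn = sum((not f) and (not r) for f, r in zip(flagged, at_risk))
    fp = sum(f and (not r) for f, r in zip(flagged, at_risk))
    return tp / (tp + fn), tn / (tn + fp)


def auc(scores, at_risk):
    """Probability that an at-risk student scores below a not-at-risk student."""
    pos = [s for s, r in zip(scores, at_risk) if r]
    neg = [s for s, r in zip(scores, at_risk) if not r]
    wins = sum(1.0 if p < n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical data for illustration only; not drawn from this study.
true_risk = [True, True, False, False, False, False]
eo_scores = [12, 15, 22, 25, 28, 30]
el_scores = [8, 10, 14, 16, 18, 24]
CUT = 20  # assumed single cut score applied to both groups

eo_result = sensitivity_specificity(eo_scores, true_risk, CUT)
el_result = sensitivity_specificity(el_scores, true_risk, CUT)
print("EO:", eo_result, auc(eo_scores, true_risk))
print("EL:", el_result, auc(el_scores, true_risk))
```

In this constructed example, the measure rank-orders students equally well in both groups (identical AUC), yet the shared cut score yields much lower specificity for the lower-scoring group. This is the kind of pattern that group-specific classification analyses could detect.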
Furthermore, we continue to follow these students longitudinally. This will (a) shed light on whether ELs eventually “catch up” with their EO and EP peers, as suggested by others (e.g., Mancilla-Martinez & Lesaux, 2011), which could not be assessed over the course of this study’s 1-year period, and (b) address likely cohort effects arising from data collection taking place immediately after COVID-19-related distance education measures were lifted, which likely had profound effects on the learning of this particular cohort of students. Recent reading performance data from the National Assessment of Educational Progress (U.S. Department of Education, n.d.) indicated that ELs were disproportionately negatively affected by the COVID-19 pandemic. Villegas and Garcia (2022) also showed that the pandemic exacerbated gaps between ELs’ and EO students’ performance on early reading screening measures that already existed before the pandemic (Grimm et al., 2018). Distance education during the pandemic was particularly difficult for EL students given language barriers that were even more difficult to overcome online. In addition, distance education required access to a reliable internet connection and devices that could support online learning platforms. Given that EL students are more likely to experience poverty (National Center for Education Statistics, 2022; Quintero & Hansen, 2021), access to these resources is likely also more limited in this population.
Finally, other researchers could add to these findings from California by conducting similar investigations in other states and regions, in order to create a body of research that generalizes across different educational systems, demographics, and policies.
Conclusion
ELs are a heterogeneous population but, by definition, they all have emerging English proficiency, and they speak languages other than English at home. Understanding the performance of ELs on early literacy measures, as well as the interpretation and use of their scores, is particularly important in light of the broad adoption of universal screening for reading problems across the United States, much of which is done using tasks such as the ones described in this study. States tasked with providing guidance on the implementation of universal reading screening ought to pay more careful attention to the language of assessment to ensure ELs are fairly tested and not inaccurately classified.
ELs have been shown to be over-identified with specific learning disabilities, particularly in the area of reading (Institute of Education Sciences, 2021; U.S. Department of Education, 2017). This could further perpetuate inequitable educational outcomes. Instead, we need to acknowledge the limitations of assessing ELs only in English and develop tasks and approaches to assessment that more accurately reflect their bilingual language learning experience and ability.
Supplemental Material
sj-docx-1-ldx-10.1177_00222194251339470 – Supplemental material for Comparing the Performance and Growth of Linguistically Diverse and English-Only Students on Commonly Used Early Literacy Measures
Supplemental material, sj-docx-1-ldx-10.1177_00222194251339470 for Comparing the Performance and Growth of Linguistically Diverse and English-Only Students on Commonly Used Early Literacy Measures by Lillian Durán, Julian M. Siebert, Mónica Zegers, Nuria Gutiérrez, Francesca Pei, Hugh Catts, Yaacov Petscher and Maria Luisa Gorno-Tempini in Journal of Learning Disabilities
sj-docx-2-ldx-10.1177_00222194251339470 – Supplemental material for Comparing the Performance and Growth of Linguistically Diverse and English-Only Students on Commonly Used Early Literacy Measures
sj-docx-3-ldx-10.1177_00222194251339470 – Supplemental material for Comparing the Performance and Growth of Linguistically Diverse and English-Only Students on Commonly Used Early Literacy Measures
sj-docx-4-ldx-10.1177_00222194251339470 – Supplemental material for Comparing the Performance and Growth of Linguistically Diverse and English-Only Students on Commonly Used Early Literacy Measures
sj-docx-5-ldx-10.1177_00222194251339470 – Supplemental material for Comparing the Performance and Growth of Linguistically Diverse and English-Only Students on Commonly Used Early Literacy Measures
sj-docx-6-ldx-10.1177_00222194251339470 – Supplemental material for Comparing the Performance and Growth of Linguistically Diverse and English-Only Students on Commonly Used Early Literacy Measures
sj-txt-7-ldx-10.1177_00222194251339470 – Supplemental material for Comparing the Performance and Growth of Linguistically Diverse and English-Only Students on Commonly Used Early Literacy Measures
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
Drs. Durán, Zegers, Pei, Gorno-Tempini, and Siebert’s time was supported by the State of California through Senate Bills 109 and 129, and Assembly Bill 103 of 2019, 2021, and 2022, respectively, in item 6440-001-0001, support for the University of California San Francisco; the Charles and Helen Schwab Foundation; the National Institute on Deafness and Other Communication Disorders (K24C015544); the National Institute of Neurological Disorders and Stroke (RF1NS050915). Drs. Petscher, Catts, and Gutiérrez’s time was supported, in part, by the Chan Zuckerberg Initiative.
