State Implementation of Language Acquisition Policies and Reading Achievement Among Hispanic Students

Abstract

National Assessment of Educational Progress data were analyzed to assess differences in reading achievement for Hispanic fourth graders across states with varying policies on language acquisition, controlling for student and school characteristics. Results indicated that (a) both Hispanic English language learner (ELL) and non-ELL students in states with stronger bilingual emphasis and more Title III funding have significantly higher reading achievement, (b) more preservice training on ELL issues and more professional development for working with diverse students had a significant effect on reading achievement among non-ELL Hispanics only, and (c) additional time to institutionalize immersion approaches did not narrow the reading achievement gap.

Keywords

English language learners, educational policy, achievement, English-only education, education reform

The history of ill-defined and regularly shifting laws that demand equitable educational opportunities for minority-language students is well documented (Crawford, 2000, 2005; Deschenes, Cuban, & Tyack, 2001; Leibowitz, 1971; Lyons, 1995; Ricento, 1996; Wiley & Wright, 2004). One of the key issues contributing to the failure to address the needs of English language learners (ELLs) is that the implementation of language acquisition models depends heavily on the political and ideological context of individual school systems and on individual educators. Within this context, proponents of sheltered/structured English immersion (SEI), the language acquisition approach that replaced bilingual education in Arizona and California, claim that immersion settings promote English acquisition and achievement more quickly than bilingual education (e.g., Rossell, 2002). There is research, however, favoring bilingual programs (e.g., Slavin & Cheung, 2003). With 22% of Hispanic youths (U.S. Department of Education, 2007) and 59% of Hispanic ELLs (Fry, 2003) dropping out of school, a paramount issue is to evaluate how differential educational reform efforts have influenced achievement for this population.

In this article, we present findings from an analysis of 2005 and 2007 National Assessment of Educational Progress (NAEP) data used to assess the differences in reading achievement for Hispanic students—both ELLs and non-ELLs—across states with varying language acquisition policies. Our study is the first to examine student outcomes across states with the largest Hispanic populations while considering factors related closely both to state policies and language methodology implementation (e.g., teacher training). This study exploits the fact that 2005 is the first year wherein fourth-grade cohorts in all states included in the analyses have been exposed to a single language acquisition policy (i.e., students included in the study did not experience the policy prior to California’s Proposition 227 or Arizona’s Proposition 203, electoral initiatives that marked the beginning of SEI). Analysis of 2007 data allowed for not only replication but also a closer examination of the effects of model institutionalization, comparing changes over time.

Much of the previous research has centered on smaller scale evaluations at the school or school district level. By examining reading achievement scores in the context of both the level and longevity of implementation of language acquisition policies across states, the present study provides more generalizable evidence regarding the effectiveness of efforts aimed at closing the achievement gap for Hispanic ELLs. Its findings can guide future policy toward considering factors necessary for sustained, long-term achievement.

Specifically, our research examined the following questions: (a) Does a stronger bilingual education emphasis at the state level have a significant effect on reading achievement among Hispanic ELLs and Hispanic non-ELLs? (b) What are some of the key variations among states in the ways teachers are prepared to teach ELLs, and to what extent do these variations account for differences in reading achievement? (c) To what extent does professional development in working with students from diverse backgrounds contribute to reading achievement? (d) Is reading achievement positively related to state-level per ELL pupil Title III funding? and (e) To what extent does longer institutionalization of a new model, in this case structured English immersion, improve outcomes?

Review of the Literature

The focus of the present study is on the relationship between state policies on language acquisition and student outcomes. Nevertheless, the distinctions and commonalities among state policies, the quality or fidelity of the implementation of models (or lack thereof) by schools and teachers, and the extant research that has established the rationale undergirding the various models necessitate, at the very least, a brief contextualization. In the sections that follow, we provide a succinct review of federal language acquisition policies, a description of some of the different conceptualizations of language models implemented at the state level, and a summary of the conflicting research on the effectiveness of different language acquisition methods. We then turn to evidence supporting the notion that Hispanic non-ELLs may also be affected by policies aimed at Hispanic ELLs.

Federal Language Acquisition Mandates

Critics of the language models that replaced bilingual education charge that their emphasis on English acquisition is rooted in policies that have been used to exert social control via language to produce subordination or assimilation, policies reminiscent of earlier nationalist periods of our history (see Wiley & Wright, 2004). Many others have contributed to the extant literature on language contact, language and identity, and the political and ideological dimensions of de jure and de facto national and state educational language policies (e.g., González & Melis, 2000; Lyons, 2005; Madrid, 1990; May, 2003, 2005; Wiley & Wright, 2004). Also well documented are the issues related to the implementation of language acquisition policies (e.g., Combs, Evans, Fletcher, Parra, & Jiménez, 2005; Gándara, Rumberger, Maxwell-Jolly, & Callahan, 2003). Space limitations preclude a comprehensive review of the history of language acquisition policies in the United States; however, to contextualize the state-level policies examined here, we provide an overview of the federal mandates addressing the needs of ELLs in the United States.

The first legislation in our recent history that focused on the rights of minority-language students was the Bilingual Education Act of 1968. The law did not prescribe a specific language acquisition program but provided funding for the support of educational resources and teacher training. Participation by states and school districts, however, was voluntary. Language policy consolidated quickly in 1974 with the Supreme Court decision in Lau v. Nichols, which held that the lack of language accommodations for students with limited English constituted a violation of the 1964 Civil Rights Act. Remedies were quickly codified into law that year as the Equal Educational Opportunity Act (EEOA), which prohibited segregation of students based on race and national origin, while at the same time mandating school district action to overcome linguistic barriers. Capping these developments was the amendment of the Bilingual Education Act in 1974 to clarify and strengthen the original legislation in light of legal developments; it was at this time that bilingual education was mentioned explicitly in the law.

Successive reauthorizations of the Bilingual Education Act between 1978 and 1988 have changed the language of the law from mandating exclusively bilingual education strategies to including immersion strategies (see Gándara & Rumberger, 2009, p. 765; Ramirez, Yuen, Ramey, & Pasta, 1991). The Fifth Circuit Court of Appeals interpreted the provisions of EEOA in Castañeda v. Pickard (1981) and required that English acquisition programs be (a) scientifically based and supported by experts in the field, (b) implemented with adequate resources and personnel, and (c) evaluated for effectiveness. The lack of specificity in requirements, however, has resulted in a great deal of heterogeneity in the policies undergirding each state’s language acquisition model as well as in the ways each state has implemented its approach to instruction of ELLs (Combs et al., 2005; Gándara et al., 2003; Ovando, 2003).

State-Level Language Acquisition Mandates

In an apparently accelerating trend across the country, states have been eliminating bilingual education programs (see Crawford, 2007). In 1998, California voters replaced bilingual education with SEI via Proposition 227. Soon after, Proposition 203 (2000) replaced bilingual education with SEI in Arizona. Massachusetts, the first state in the nation to enact bilingual education in 1971 (Transitional Bilingual Education Bill), eliminated bilingual education with the passage of ballot measure Question 2 in 2002. Some researchers claim that the move away from bilingual education has been based on a flawed analysis of academic failure among ELLs that did not take into account the quality of implementation of the bilingual model (e.g., Combs et al., 2005; Wiley & Wright, 2004; Wright, 2005). For example, the initiatives that began in California attributed the poor performance and high drop-out rates among ELLs to bilingual education (Rossell, 2002). Approximately 70% of the students who qualified for linguistic support, however, were not receiving it (Gándara et al., 2003).

The goals of most language acquisition programs are to transition minority-language students into English (Ruiz, 1984). Despite similar goals across numerous language acquisition programs, SEI and bilingual programs have been conceptualized as disparate language models (e.g., Rossell, 2002). In part, this is a result of expansive programmatic labels (i.e., labeling English-only programs “SEI” and programs that incorporate native language “bilingual education”) and a focus on the extremes across the variability of programs.

SEI

Although the language acquisition programs in Arizona and California are both referred to as “SEI” and were both authored by Ron Unz, the ballot initiative in California specified sheltered English immersion as the language model that would replace bilingual education, whereas the ballot measure in Arizona used the term structured English immersion. According to Krashen (1997b), SEI was a term that had not been “in current use in the language education profession but is a confusing combination of terms.” Nevertheless, the guidelines for sheltered English immersion in California and structured English immersion in Arizona are nearly identical, and the two terms are used interchangeably as “SEI” (Parrish et al., 2006). The models outlined in California and Arizona, however, diverge significantly from both the sheltered English (e.g., Krashen, 1985, 1991, 1997a, 1997b; Ramirez et al., 1991) and structured English immersion (Baker, 1998; Baker & de Kanter, 1983; Ramirez et al., 1991) models proposed in the extant literature.

Krashen (1981) was the first to conceptualize sheltered English as a key component of bilingual education. According to Krashen (1985, 1991, 1997a), the best bilingual programs not only focus on primary instruction in students’ native language but also use English as a second language (ESL) instruction (i.e., direct English instruction focusing on grammar and usage) and sheltered English to support students’ English acquisition. Once students have reached an intermediate level of proficiency in English, students are placed in classes that are composed of ELLs with similar English proficiency (i.e., “sheltered”) across content areas, where they receive content-specific instruction in English. Sheltered English is thus the emphasis of English acquisition via content (e.g., science) with native language and direct English support. Moreover, a key component of sheltered English is linguistic support in students’ native language as necessary (Echevarria & Graves, 2010; Ramirez et al., 1991).

Although sheltered English (Krashen, 1981) was introduced before SEI (Baker & de Kanter, 1983), Baker (1999) asserts he and his colleague were the first to recommend the program (Baker & de Kanter, 1983), which they conceptualized based on successful immersion programs in Canada. Nevertheless, the earlier SEI program descriptions (Baker, 1999; Baker & de Kanter, 1983; Ramirez et al., 1991) are indistinguishable from the guidelines of sheltered English (Krashen, 1981). For example, Baker (1998) defines SEI as a program where “English is used and taught at a level appropriate to the class of English learners . . . and teachers are oriented toward maximizing instruction in English and use English for 70% to 90% of instructional time, averaged over the first three years of instruction” (para. 5).

There are clear differences in the conceptualization of SEI (Baker, 1998; Baker & de Kanter, 1983; Ramirez et al., 1991) and the mandates in California and Arizona. One notable difference is the discrepancy between the amount of time and the language in which students in SEI receive support. As previously mentioned, SEI as conceptualized includes native language support for a number of years (Baker, 1998), whereas the SEI model mandated in Arizona and California is not intended to exceed 1 year and limits native language support (see Wright, 2005). Despite Baker’s support for the initiatives that replaced bilingual education in California (Baker, 1998), he states,

California’s Proposition 227 imposes the added constraint of requiring LEP students to be mainstreamed after one year. Although many of the SEI programs described so far mainstream their students in two to three years, compared to the five to eight years called for by a full bilingual education program, the only SEI program I know that can satisfy California’s new law is Seattle’s Newcomers Program. (para. 16)

The Seattle program Baker (1998) refers to is also distinct from the SEI model in Arizona and California in that it involves 0.5 year to 1 year of intense instruction in English (Baker, 1998). Thereafter, students continue to receive assistance via ESL and native language support as well as bilingual teacher aides in the mainstream classroom (Baker, 1998). Thus, when other states list SEI as one of the evidence-based language acquisition models they use to meet the needs of ELLs, it is not the same model defined by the initiatives in California and Arizona, despite the fact that they share the SEI label. Indeed, the programs in California and Arizona are not SEI programs but English-only programs, with no or minimal emphasis on bilingual approaches.

Bilingual education

Although school districts in Colorado, New Mexico, Nevada, Texas, Florida, and Wisconsin use bilingual education models, the laws addressing the linguistic provisions for instruction of ELLs vary substantially across those states. In the Method section, we discuss in more detail the law of each of the states in the present study as well as our rationale for coding the states’ language acquisition policies.

The theoretical framework undergirding bilingual education is based on the Interdependence Hypothesis (IH; Cummins, 1979). The IH asserts that for individuals who have not had formal schooling in their native language, academic instruction in the native language fosters academic proficiency (in the native language). The effective transfer of knowledge or proficiency to the second language is enabled, given sufficient exposure to the second language. Although bilingual education programs tend to share the basic premise of providing academic instruction in the student’s native language as they also acquire English, there is also much variation across programs (Freeman, 1998). Some programs that are labeled transitional early-exit bilingual programs are aimed at achieving English fluency by third grade, whereas transitional late-exit bilingual programs incorporate an increasing amount of academic instruction in English until fifth or sixth grade (see Ramirez et al., 1991). An approach that has accumulated empirical evidence for improved student learning (e.g., Collier & Thomas, 2004) is two-way immersion bilingual programs (also called dual immersion). Dual immersion programs serve both minority-language and majority-language students, with approximately 50% of students from each group. Dual immersion aims to shelter the environment of the second language being learned and encourage an equitable balance of the two languages represented in classrooms (see Lindholm-Leary, 2001). Thus, one challenge to evaluating state-level language acquisition policies is that the two basic models (SEI and bilingual education) are implemented in a variety of forms. We address this through operationalizing the “bilingual emphasis” in the state policy rather than simply categorizing states as either bilingual or SEI.

Previous Attempts to Compare Language Acquisition Models

A focus on the effectiveness of language acquisition methods has been the chief means of addressing the problem of Hispanic ELL achievement. As a result, many researchers have conducted studies in an attempt to find out whether there are differences in the effectiveness of language acquisition methods on achievement outcomes, with conflicting results. For example, Baker and de Kanter (1981) conducted a narrative review of 28 studies and concluded that the effectiveness of bilingual programs was not supported empirically, particularly for teaching nonlanguage subjects. Willig (1985), however, conducted a meta-analysis of the studies included in the Baker and de Kanter report and found that the effects of bilingual education programs were positive. In a separate narrative review, Rossell and Baker (1996) examined 75 studies that compared students in a bilingual program with students receiving instruction primarily in English. Overall, Rossell and Baker found more studies favored all-English programs when compared with bilingual programs. However, Greene (1998) conducted a meta-analysis of these studies, using only those determined to be methodologically sound, and found that the use of native language instruction among ELLs showed moderate benefits when compared with ELLs who were taught only in English.

More recent meta-analyses indicate that in general, research favors bilingual programs over English-only programs (Rolstad, Mahoney, & Glass, 2005; Slavin & Cheung, 2003). Moreover, other studies have found that the effects of bilingual education appear to strengthen with time (Gersten & Woodward, 1995; Salazar, 1998; Thomas & Collier, 1997).

Ethnic Identity and Stereotype Threat

The type of language acquisition model used may influence academic achievement by contributing to school environments that exacerbate or mitigate stereotype threat, which is a risk of fulfilling a negative stereotype about the group to which one belongs (Steele & Aronson, 1995). Although the inability to infer stereotypes among very young children may limit their susceptibility to stereotype threat, it appears that for those who are most susceptible, the ability to infer stereotypes occurs earlier. In a series of experiments, McKown and Weinstein (2003) found that children’s ability to infer stereotypes increases linearly approximately after age 6, peaking at age 10. The researchers found that children from academically stigmatized groups, however, “show earlier and greater awareness of broadly held stereotypes” (p. 511) and attributed their findings to prior research demonstrating that ethnic identity among ethnic minority children is in part shaped by their minority status, making broad stereotypes more salient at an earlier age. McKown and Weinstein explain that children’s awareness of broadly held stereotypes can influence stereotype-laden situations such as standardized testing conditions. The phenomenon of stereotype threat has been confirmed not only for stereotypes based on ethnicity but also for social class (Croizet & Claire, 1998). This makes stereotype threat particularly salient for Hispanics, both ELL and non-ELL.

Summary

Overall, there has been considerable conflict generated in the public realm about school-based language acquisition programs. The dilemma is not whether ELLs should learn English but whether language policies address the needs of ELLs. For this reason, careful attention to scientific evidence, as opposed to heated rhetoric, is clearly needed. Moreover, evaluation of the impact of language acquisition policies should not be limited to their effect on direct users of the service (i.e., ELLs) but extend to students who may be indirectly influenced. Indeed, the social and cultural knowledge that is enhanced by specific language acquisition models and the promotion of a supportive school culture that views Hispanic students’ ethnic identities as additive rather than subtractive (Trueba, Guthrie, & Au, 1981; Valenzuela, 1999) may not be limited to those who have not yet become proficient in English.

The Present Study

In contrast to the highly contentious and politicized public discourse regarding educational reform aimed at the language acquisition of ELLs, the academic debate concerning the best method to address the language acquisition of ELLs is not dichotomous. One of the reasons interpreting the effectiveness of instructional programs as mandated by Castañeda v. Pickard (1981) has been problematic is that policy makers have had to rely on findings focusing on academic achievement that is defined differentially across state jurisdictions. To evaluate the effectiveness of educational policies, it is necessary to examine multiple facets of educational reform efforts and achievement among Hispanic ELLs across a single metric.

The fourth-grade NAEP reading assessments

Fourth-grade NAEP assessments are administered in reading, mathematics, geography, science, civics, and U.S. history. In the present study, we focused only on Hispanic students’ NAEP English reading achievement across different states to examine claims regarding the superiority of SEI versus bilingual education in promoting ELLs’ achievement in English. Although focusing on an assessment in English administered to ELLs may introduce validity issues (see Abedi, Hofstetter, & Lord, 2004), other nationally representative data (the Early Childhood Longitudinal Study–Kindergarten Class of 1998-1999, [ECLS-K]) suggest that English proficiency among all Hispanic students is 99% by the second semester of third grade (Reardon & Galindo, 2009). Thus, by focusing only on the fourth-grade reading assessment, we limited issues with interpretability across states with differing language acquisition models while not excluding a substantial number of ELLs.

We also limited the analyses to fourth grade despite the availability of NAEP reading scores for eighth and twelfth grades. We did this because Propositions 203 and 227 had not been in place long enough to be able to include eighth- and twelfth-grade data for both Arizona and California, as those cohorts would not have been exposed to the new language acquisition approaches from the start of formal schooling, as is the case for fourth graders. In addition, the population of interest is one that is markedly at risk, and including eighth- and twelfth-grade data would likely introduce a selection threat. Namely, despite findings suggesting that ELLs are not performing as well in eighth grade when compared with fourth grade, Fry (2007) found that the widening NAEP ELL achievement gap from fourth to eighth grade is at least in part attributable to the changing ELL population across grades. Higher achieving ELL students are excluded from ELL data as they are often reclassified as non-ELL in earlier grades, whereas new immigrants are added to the later grades (Fry, 2007). Moreover, whereas most language acquisition models include expectations of proficiency by the end of elementary school, students classified as ELL in eighth grade are often new immigrants and are not a comparable group to the fourth-grade ELL cohort (Fry, 2007). Consequently, to limit issues regarding the changes in the ELL population across Grades 4, 8, and 12 (the grades assessed with reading NAEP), we included only fourth-grade students in all analyses for the present study.

Although NAEP assessments are administered roughly every 2 years, the cohort of students is not the same from one administration to another. This precludes our ability to examine the trajectory of Hispanic ELLs across their schooling experiences to assess more precisely the impact of different language acquisition methods across jurisdictions. Other nationally representative databases that provide the necessary details to examine Hispanic achievement longitudinally are available (ECLS-K; see Reardon & Galindo, 2009). ECLS-K data were designed to be a nationally representative sample of students who were tested in math and reading multiple times between 1998 and 2004. ECLS-K (as well as other ECLS data sets), however, were “not designed to support state-level estimates, as the sample is not representative of particular states” (J. Carlivati, personal communication, October 19, 2009). Although NAEP cannot address longitudinal research questions, it is currently the best data available for examining state-level effects on achievement while controlling for student, teacher, and school characteristics.

Other variables to consider in the evaluation of model effectiveness

Despite an abundance of studies examining the effectiveness of bilingual education, there are limitations in attempting to generalize prior findings to all educational contexts. Extraneous variables such as different curricula and ways of measuring achievement across the states are examples of the numerous variables that can influence empirical results. Thus, critical to the evaluation of the language acquisition models implemented by states is a set of common metrics that provide representative data—a key feature of NAEP. In addition, newly implemented reforms may need time to become institutionalized (Desimone, 2002; Tyack & Cuban, 1995), an effect that single-site evaluations at a single time point might miss.

Language acquisition models are not monolithic in character; details about the variation in implementation may well influence ELL student motivation and achievement. For example, although teachers are required to demonstrate knowledge about the diversity of student backgrounds to meet the proficiency guidelines outlined in most states (Gollnick, 2004), there is likely much variation across states on teacher education in the language acquisition model (precertification), professional development (postcertification), and funding levels for implementation of the particular language acquisition model. Moreover, there is a paucity of research on whether providing teachers with specific training to work with students from culturally diverse backgrounds improves student outcomes (Cochran-Smith, Davis, & Fries, 2003; Furman, 2008; Hollins & Guzman, 2005). To address these issues, we include a measure of the level of required preservice teacher preparation on ELL issues for the eight states included in the study. Moreover, NAEP asks participating teachers to characterize the amount of professional development received on diversity issues; we include the mean level of diversity training at each school in our analyses. Finally, another key indicator of the quality of implementation of the language acquisition program (and indeed, virtually any educational reform initiative) is the amount of spending per pupil. The most important federal funding source for school-based language acquisition is Title III (Office of English Language Acquisition, 2009). Title III funding per ELL student reflects not only the resources invested in language acquisition but also the state “effort” invested in language acquisition. Because Title III is a federal program, all states have opportunities to apply for and secure this funding, regardless of how wealthy a state is.
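As a rough illustration of the two implementation measures described above, the following Python sketch averages teacher-reported diversity training within each school and divides a state's Title III allocation by its ELL enrollment. The function names, the numeric training scale, and all values are hypothetical and are not taken from NAEP or from state funding records; this is a minimal sketch of the arithmetic, not the procedure used in the study.

```python
from collections import defaultdict
from statistics import mean


def school_mean_training(teacher_rows):
    """Average teacher-reported diversity training within each school.

    teacher_rows: iterable of (school_id, training_level) pairs. The numeric
    scale is hypothetical; NAEP's actual response coding is not shown here.
    """
    by_school = defaultdict(list)
    for school_id, level in teacher_rows:
        by_school[school_id].append(level)
    return {school_id: mean(levels) for school_id, levels in by_school.items()}


def title_iii_per_ell(total_funding, ell_count):
    """State Title III dollars divided by the number of ELL students."""
    if ell_count <= 0:
        raise ValueError("ell_count must be positive")
    return total_funding / ell_count


# Illustrative, made-up values only.
teachers = [("school_a", 2), ("school_a", 4), ("school_b", 1)]
print(school_mean_training(teachers))
print(title_iii_per_ell(1_000_000, 8_000))
```

Expressing funding per ELL student rather than as a state total keeps states with very different ELL enrollments on a comparable scale.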

Within-group differences

Meece and Kurtz-Costes (2001) assert that one of the most pervasive limitations in research on the achievement among minority populations has been the lack of focus on the within-group differences among minorities. Research has not traditionally focused on the variability within specific ethnic or cultural populations, though many studies have investigated the differences between minorities and their White counterparts. Researchers most often investigate these differences using a deficiency model based on White student performance versus disadvantaged and/or ethnic minority student performance instead of investigating the factors that contribute to variability within a specific population of interest. Thus, findings based on a deficiency model have limited utility.

In line with Meece and Kurtz-Costes’ contentions, Ramirez and Carpenter (2005) analyzed the NELS:88 database and found that differences among Hispanic student academic achievement by family income level were greater than those between Hispanics and White students. Hence, consolidating all minorities when analyzing student achievement data ignores the variability among and within the various minority groups. That variability may provide more information than the current oversimplified comparisons. In the present study, we included only Hispanic students and disaggregated participants by their background as reported to NAEP (Cuban, Puerto Rican, Mexican, or Other).

To summarize our analytical strategy, the states with substantial Hispanic populations included in our analysis were coded according to the respective state law regarding language acquisition for ELLs (see the coding description in the Method section below). Our first task was to determine whether the degree of bilingual education emphasis is positively related to reading achievement. Second, we looked at the effect of variations in the implementation of the program, including state-level mandates for preservice training on ELL issues, reported school-level professional development regarding diversity, and state-level per ELL student Title III spending. Third, we considered whether the effect of bilingual programs varied depending on specific characteristics of the individual Hispanic students in our sample. We conducted this set of analyses first on Hispanic ELLs (the direct recipients of linguistic support services) and then on Hispanic non-ELLs (who may benefit more indirectly from language acquisition models that simultaneously preserve or are perceived to preserve minority culture, thereby mitigating stereotype threat). Finally, we replicated analyses conducted on NAEP 2005 for NAEP 2007 data to assess the effect of institutionalization; that is, we examined whether giving the newer SEI programs (i.e., minimal bilingual emphasis) more time to get established closes the gap (or extends their lead) relative to bilingual education programs.

Based on our review of the literature, we used the aforementioned analytical strategy to investigate the following hypotheses, which correspond in order to the research questions previously introduced:

Hypothesis 1: Reading achievement is higher in states with stronger bilingual education emphasis, controlling for student and school characteristics.

Hypothesis 2: There is a positive relationship between the level of state-mandated training for preservice teachers in ELL issues and reading achievement scores, net of other effects.

Hypothesis 3: There is a positive relationship between the mean amount of professional development in working with students from diverse backgrounds and reading scores, net of other effects.

Hypothesis 4: There is a positive relationship between reading achievement and state-level per ELL pupil Title III funding.

Hypothesis 5: The effect of stronger bilingual education emphasis will decline over time, given the institutionalization of language acquisition approaches that replaced bilingual education.

Method

Our core data source was restricted-license data from the 2005 and 2007 Grade 4 NAEP. The NAEP program, which touts itself as “the Nation’s Report Card,” was first instituted in 1969 and collects extensive data on a national and state-representative sample of schools and teachers on a biennial basis. The focus of the assessment is reading and mathematics achievement, although the program measures achievement in other subjects periodically. As noted previously, data are collected on fourth-, eighth-, and twelfth-grade students (as well as their schools and teachers). Thus, the core of our data consisted of measures at two levels of analysis: the student and his or her school. Although not offering perfect measures of student achievement in reading and other subject areas, Hombo (2003) claims that NAEP’s measures of academic achievement are “as precise and reliable as the current state of research . . . can make them” (p. 62). Jones and Olkin (2004, p. 9) support the view of NAEP as among the most innovative and significant large-scale assessment programs of its kind. As a result, NAEP is widely considered “the gold standard” for assessments of achievement (Hombo, 2003).

As previously mentioned, we augmented NAEP data with state-level details on implementation of language acquisition policies drawn from state profiles reported by the U.S. Department of Education’s National Clearinghouse for English Language Acquisition (NCELA, 2009) and other archival sources as needed (e.g., state departments of education). These state-level measures constitute a conceptual third level of analysis (i.e., student, school, and state); however, with only eight states in our sample, estimating effects in an unbiased and efficient manner with a three-level model is problematic¹. Instead, we chose to view the state-level measures (e.g., language acquisition mandates, per ELL student Title III spending) as characteristic of schools in each state. To examine the effectiveness of education reform efforts across states, we had to ensure that the fourth-grade samples included in the study had the opportunity to receive the method of language acquisition in question for an extended amount of time. The year 2005 was the first year wherein all states in the present study had implemented the respective method of language acquisition for at least 5 years (kindergarten through fourth grade), precluding our examination of NAEP from years prior to 2005. The U.S. Department of Education obtains NAEP data from a complex cross-sectional sampling scheme with multistage sampling at the levels of geographic area, schools, and students. The NAEP sample design is specified to draw samples that are representative not only of the nation but also of individual states. Thus, NAEP includes a large representative sample of Hispanic students, both ELL and non-ELL, in the states included in the analyses, providing sufficient power to conduct the analyses.

Compared with other large-scale studies, NAEP’s measures of reading achievement (as well as achievement in other subject areas) are broad and multifaceted; the approach relies on item response theory (Lord, 1980) in using matrix sampling from approximately 150 items, with each student participant answering only a subset of the achievement items. For reading, students answer about 20 to 25 items, with each item exposed to about one fourth of the sample. Following administration of the test, NAEP uses marginal maximum likelihood and conditioning techniques to generate five “plausible values” that represent an estimate of the student’s reading achievement had he or she answered all of the items and not merely a subset. Although this strategy does give analysts a set of measures that represents a broad view of reading achievement, working with plausible values poses some computational challenges. Not all multilevel modeling software can handle a set of plausible values as the dependent variable. In addition, the conditioning technique used to generate the plausible values in NAEP restricts the choice of independent and control variables to only those that were included in the conditioning process. Technically speaking, no new variables (non-NAEP) should be introduced into analyses of plausible values.

SPSS version 17.0 was used to manage and clean the data. To address the design effects inherent in the complex sampling (Raudenbush & Bryk, 2001; Skinner, Holt, & Smith, 1989; Snijders & Bosker, 1999), we used AM statistical software to calculate descriptive statistics and HLM 6.0 to estimate the two-level models that included additional measures of the quality and implementation of each state’s ELL policy. This version of HLM also permitted us to take the five plausible values of reading achievement as the dependent variable. Weighting at the student and school level was also applied, given the stratified sampling and to adjust for nonresponse (Johnson, 1989; Zwick, 1992).
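To make the handling of plausible values more concrete, the following is a minimal sketch of the combining rules commonly applied to multiply imputed outcomes: the model is estimated once per plausible value, the point estimates are averaged, and the within- and between-estimate variances are pooled. The function name and the input numbers are hypothetical, and the text does not confirm that HLM 6.0 performs exactly these steps internally.

```python
from statistics import mean, variance


def combine_plausible_values(estimates, standard_errors):
    """Pool one coefficient estimated separately on each plausible value.

    estimates: one point estimate per plausible value
    standard_errors: the matching standard errors
    Returns (pooled_estimate, pooled_standard_error).
    """
    m = len(estimates)
    pooled = mean(estimates)
    # Average sampling variance across the m analyses.
    within = mean(se ** 2 for se in standard_errors)
    # Variance of the m point estimates (m - 1 denominator).
    between = variance(estimates)
    total = within + (1 + 1 / m) * between
    return pooled, total ** 0.5


# HYPOTHETICAL coefficients from five separate runs, for illustration only.
est, se = combine_plausible_values(
    [2.6, 2.7, 2.7, 2.8, 2.7],
    [0.55, 0.55, 0.55, 0.55, 0.55],
)
print(f"pooled estimate = {est:.2f}, pooled SE = {se:.3f}")
```

The pooled standard error is slightly larger than any single-run standard error because it also reflects the uncertainty in the achievement measure itself.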

Sample

Our analyses included all Hispanic fourth-grade students who took the 2005 and 2007 NAEP reading assessment in the seven states with the highest proportion of Hispanics in the state population as well as the highest number of Hispanic ELLs in the United States: Arizona, California, Colorado, Nevada, New Mexico, Florida, and Texas (Lazarín, 2006; Pew Hispanic Center, 2007). Although Wisconsin ranks 31st in terms of Hispanics as a percentage of the state population (Pew Hispanic Center, 2007), we also included the state in our analyses because of the explicit law regarding bilingual education (see Independent Variable section below) and to introduce more regional variation. For the Hispanic ELL sample in 2005, there were approximately 5,650 students and 1,030 schools (weighted N) included in the analysis. For the Hispanic non-ELL sample, approximately 10,500 students from 1,630 schools were included. In 2007, we had data for 7,730 ELL students sampled from 980 schools and 11,210 non-ELL students from 1,350 schools.

Reclassification rates of ELLs to English proficient status have been one of the key indicators of the effectiveness of language acquisition methods (Gándara & Merino, 1993). This practice is problematic because the resulting figures “may bear little relationship to the language and academic skills of the students or the effectiveness of the program in teaching those skills” (Gándara & Merino, 1993, p. 333). Regardless of the issues in reclassification rates (particularly across states using differential criteria), proponents of SEI claim that immersion settings promote English acquisition more quickly than bilingual education, pointing to reclassification rates as evidence (e.g., Rossell, 2002). As we are interested in the effect of bilingual education emphasis on English reading achievement among Hispanic ELLs and non-ELLs, students from all Hispanic categories were retained in the analyses. To address the issue of reclassification across numerous states with variable criteria, the ELL analysis included students who were classified as ELLs at the time of testing as well as those who had been classified as ELLs (“formerly ELL”) prior to testing.

Level 1: NAEP Student-Level Variables

Dependent variable: Reading achievement

As described above, this measure consists of the set of five plausible values from NAEP for student reading achievement, as generated through marginal maximum likelihood techniques based on an individual student’s answers to a subset of the total pool of reading items. As part of the assessment, students read and answered questions about a variety of complete texts representing literary and informational texts. The assessment included both multiple-choice questions and constructed-response questions designed to assess three different contexts for reading: reading for literary experience, reading for information, and reading to perform a task. Students were also assessed on four aspects of reading: forming a general understanding, developing interpretation, making reader/text connections, and examining content and structure (National Center for Education Statistics [NCES], 2009a). In 2005 and 2007, NAEP assessments included vocabulary as a target of item development but not the use of vocabulary in reading comprehension (National Assessment Governing Board [NAGB], 2009).

Level 2: School-Level Variables

Independent variables: Bilingual education emphasis of state in which school is located (0-5)

A review of published work indicated that there was no existing ranking of the degree to which states emphasize a bilingual education approach. Therefore, we conducted an emergent content analysis (Neuendorf, 2002) to develop a ranking for schools in the eight states in this study. We reviewed the laws independently to determine the degree to which bilingual education was emphasized, subsequently discussing any discrepancies in our coding, which resulted in the criteria explained below.

States assigned the highest rankings were those mandating instruction using students’ native language for ELLs; Texas and New Mexico were both assigned a ranking of “5.” In New Mexico, the Bilingual Multicultural Education Act (1978) asserts, “The state’s bilingual multicultural education program goals are for all students, including English language learners” (p. 6.32.2, NMAC [New Mexico Administrative Code]). Texas law (Acts of the 67th Texas Legislature, 1981) requires each school district with an enrollment of 20 or more limited English proficient students in the same grade level district-wide to offer a bilingual education program.

Wisconsin law (2002) requires schools within a school district to design a program, prepare a formal plan of services, and staff respective classrooms with licensed bilingual teachers when there are at least 10 ELL students in Grades K-3 or 20 in Grades 4 and higher within a single school building. Wisconsin was ranked a “4,” however, because, unlike in Texas, districts are not obligated to provide services when ELL enrollment reaches the state minimum threshold only after counting students across different schools within the same district.

Florida law (2002) and Colorado law (2002) outline linguistic provisions for ELLs; however, language programs may include ESL, immersion, or bilingual education. Given that Florida and Colorado neither mandate nor outlaw bilingual education, they were assigned a rank of “3.” Although Nevada law (NRS 388.405) also includes linguistic provisions that may include ESL, immersion, or bilingual education, Nevada was categorized as a “2” because “Nevada has very few bilingual programs. The few that exist are located in the Clark County School District” (Nevada State Board of Education, 2002, p. 54, para. i).

California law (1998) replaced bilingual education with SEI; however, the law allows bilingual education in cases where parents of ELLs have signed a waiver. In such cases, schools must provide bilingual education when there are at least 20 ELLs with waivers within a grade level; however, students with waivers in schools where the minimum of 20 students is not met must be allowed to attend other schools that provide bilingual education. Given the allowances made by the law (see Rossell, 2002), California was assigned a rank of “1.”

Bilingual education was also replaced by SEI in Arizona (Arizona Revised Statutes, 2000). To receive a waiver in Arizona, however, students must demonstrate proficiency in English, be at least 10 years old, or be identified as having special needs with confirmation that instruction in English is not the best option for the child. The differences in waiver requirements between California and Arizona resulted in a rank of “0” for Arizona.

To assess reliability of the coding of bilingual education emphasis, two doctoral-level students were also asked to rank the de-identified laws (i.e., the state names were removed) and supplemental notes (e.g., waivers or the extent to which bilingual education is actually used across schools). The scorers were provided the following rules for scoring: “Rank the following laws in terms of the stringency with which bilingual education is not allowed. The law that explicitly prohibits bilingualism the most should be coded a ‘0.’ Similar laws can receive the same code.” The three sets of scores showed high interrater reliability, with 91.7% agreement across all pairs of ratings. A measure of reliability recommended for content analysis and adjustable for ratings at different levels of measurement is Krippendorff’s alpha (Grayson & Rust, 2001; Hayes & Krippendorff, 2007). Our ratings resulted in an alpha of .983 (assuming ratings are ordinal, with alpha for perfect agreement equal to 1.0).
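The pairwise agreement figure can be illustrated with a short sketch. With three sets of ratings for eight states there are 24 rater-pair comparisons, so 22 matching comparisons would yield 91.7%. The first row below follows the rankings described in this section; the other two rows are hypothetical placeholders, because the individual raters' scores are not reported, and the way agreement is counted here is an assumption.

```python
from itertools import combinations

STATES = ["AZ", "CA", "NV", "CO", "FL", "WI", "NM", "TX"]

# "authors" follows the rankings described in the text. The two rater rows
# are HYPOTHETICAL and exist only to illustrate the calculation.
ratings = {
    "authors": [0, 1, 2, 3, 3, 4, 5, 5],
    "rater_a": [0, 1, 2, 3, 3, 4, 5, 5],  # hypothetical
    "rater_b": [0, 1, 3, 3, 3, 4, 5, 5],  # hypothetical: differs on NV
}


def pairwise_percent_agreement(ratings_by_rater):
    """Percentage of (rater pair, state) comparisons with identical codes."""
    matches = 0
    total = 0
    for first, second in combinations(ratings_by_rater, 2):
        for x, y in zip(ratings_by_rater[first], ratings_by_rater[second]):
            matches += int(x == y)
            total += 1
    return 100.0 * matches / total


agreement = pairwise_percent_agreement(ratings)
print(f"{agreement:.1f}% agreement")
```

Simple percent agreement ignores both chance agreement and the ordinal distance between codes, which is one reason a chance-corrected coefficient is reported alongside it.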

Level of state-mandated training for preservice teachers in ELL issues

In smaller observational studies, some have found that teachers spend the same amount of time using English across classrooms with different expectations regarding the use of students’ native language (Dolson & Mayer, 1992; Escamilla, 1992; Strong, 1986), which at times may involve an unawareness on the part of teachers of the degree to which they use students’ native language in instruction (Irby, Tong, Lara-Alecio, Meyer, & Rodríguez, 2007). Issues with program fidelity hold true not only for bilingual education programs but also for SEI (e.g., Combs et al., 2005). In larger studies, however, others have found that, overall, the proportion of English and students’ native language used in instruction is consistent with the language models (Ramirez et al., 1991). To partially address the bias introduced by the absence of the degree to which language programs were adhered to in each state, we included a measure reflecting the stringency of guidelines for addressing the needs of ELLs used by each state for teacher training and certification (see Table 1). Menken and Antunez (2001) analyzed state mandates for teacher certification, scoring states on a scale of 0 (no requirement) to 4 (entire course required) on 12 dimensions related to training of teachers of ELL students in the areas of teaching methods, curriculum, assessment, and practice. In addition, Menken and Antunez rated states on one dimension related to the level of training regarding ELL issues that is required of all certified teachers, not only ELL teachers. This dimension was scored 0 (no requirement) to 3 (specific coursework or certification is required). For this analysis, we summed state scores for all 13 dimensions, leading to a possible range of 0 to 51. Among the states included in our analysis, Arizona and Florida have the most comprehensive ELL training (27 points) and New Mexico and Nevada have the lowest level of mandated training with 7 and 9 points, respectively. 
(See Table 1 for details on the construction of this index.)

Table 1.

Construction of State-Mandated ELL Training (Data Obtained From Menken and Antunez (2001), NCELA, and State Departments of Education)

	AZ	CA	CO	FL	NM	NV	TX	WI
Preservice teacher requirements for teachers of ELLs^a
Methods
Native language literacy	3	2	2	4		1	3
ESL/English language development	4	2	2	4	2		3	3
Content in L1		2	2			1	3	3
Bilingual methods	4		2	4
Curriculum
Materials adaptation	3	2			2	1
Bilingual curriculum	3		2			1
Assessment
Content in L1 or English				4		1
English literacy		2	2	4
L1 literacy			2			1
LEP/language assessment	3	2	2		2	1
Practicum
Culturally/linguistically diverse setting								3
Bilingual education setting	4			4			3	3
Preservice teacher requirements for all teachers^b	3	3	2	3	1	2	0	0
Total score	27	15	18	27	7	9	12	12

Note: ELL = English language learner; ESL = English as a second language; LEP = limited English proficient.

^a Preservice teacher requirements for teachers of ELLs were scored as follows: 0 = not required, 1 = elective, 2 = demonstrated competence, 3 = required course topic, and 4 = required course.

^b Preservice teacher requirements for all teachers were scored as follows: 0 = states where there is no requirement that all teachers have expertise or training in working with ELLs, 1 = states where teacher certification standards for all teachers contain reference to “language” as an example of diversity, 2 = knowledge of second language acquisition and strategies to support ELLs and/or strategies or accommodations for ELLs, and 3 = states with specific coursework or certification requirements for all teachers.
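As a check on how the index is assembled, the sketch below sums the 13 dimension scores for each state and compares the result with the "Total score" row of Table 1. It assumes that blank cells in the table mean a score of 0 and that the column alignment of the blanks has been read correctly from the table; the variable names are illustrative only.

```python
STATES = ["AZ", "CA", "CO", "FL", "NM", "NV", "TX", "WI"]

# Scores transcribed from Table 1; blank cells are ASSUMED to equal 0.
ELL_TEACHER_DIMENSIONS = {
    "Native language literacy": [3, 2, 2, 4, 0, 1, 3, 0],
    "ESL/English language development": [4, 2, 2, 4, 2, 0, 3, 3],
    "Content in L1": [0, 2, 2, 0, 0, 1, 3, 3],
    "Bilingual methods": [4, 0, 2, 4, 0, 0, 0, 0],
    "Materials adaptation": [3, 2, 0, 0, 2, 1, 0, 0],
    "Bilingual curriculum": [3, 0, 2, 0, 0, 1, 0, 0],
    "Content in L1 or English": [0, 0, 0, 4, 0, 1, 0, 0],
    "English literacy": [0, 2, 2, 4, 0, 0, 0, 0],
    "L1 literacy": [0, 0, 2, 0, 0, 1, 0, 0],
    "LEP/language assessment": [3, 2, 2, 0, 2, 1, 0, 0],
    "Culturally/linguistically diverse setting": [0, 0, 0, 0, 0, 0, 0, 3],
    "Bilingual education setting": [4, 0, 0, 4, 0, 0, 3, 3],
}
ALL_TEACHERS = [3, 3, 2, 3, 1, 2, 0, 0]  # single dimension scored 0 to 3

# Twelve dimensions scored 0 to 4 plus one scored 0 to 3.
MAX_SCORE = 4 * len(ELL_TEACHER_DIMENSIONS) + 3

totals = {}
for column, state in enumerate(STATES):
    specialist = sum(row[column] for row in ELL_TEACHER_DIMENSIONS.values())
    totals[state] = specialist + ALL_TEACHERS[column]

print(MAX_SCORE)
print(totals)
```

Under these assumptions the column sums agree with the reported totals, from 27 for Arizona and Florida down to 7 for New Mexico.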

Professional development in diversity issues

A single item on NAEP’s teacher questionnaire in both 2005 and 2007 was used to indicate the school-level emphasis on diversity. Teachers were asked to rate, from 1 (not at all) to 4 (large extent), how much professional development they had received in the past 2 years on “strategies for teaching mathematics to students from diverse backgrounds (including ELLs).” To create a school-level measure, we took the mean on this item for all fourth-grade teachers at the school participating in NAEP. Although the NAEP item asks about this preparation in terms of mathematics pedagogy (no similar item is posed to teachers regarding reading), we see this as an indicator of the school’s concern for diversity issues generally. That is, most fourth-grade teachers teach both math and reading; in-service training on teaching mathematics in the context of diversity will likely influence diversity issues in other aspects of the curriculum and teacher practice, including reading pedagogy.

State-level Title III spending per ELL pupil enrolled in program

Another potentially important aspect of the implementation of the specific language acquisition model involves financial resources. The federal government’s primary means for funding K-12 language acquisition is through Title III grants to the states. State departments of education must apply for such grants in a competitive process, and so Title III funding can be seen as a measure of state effort to bolster school-based language acquisition programs. (See Table 2 for a list by state of Title III funding.)

Table 2.

Average Per Student Title III Spending, National Assessment of Educational Progress 2005 and 2007 Analyses

	State spending per student enrolled in Title III ELL programs
	Average 2000-2005	2006-2007
Arizona	US$107.07	US$105.22
California	US$102.17	US$95.53
Colorado	US$77.43	US$118.84
Florida	US$164.35	US$160.93
Nevada	US$79.06	US$92.76
New Mexico	US$78.28	US$84.01
Texas	US$116.04	US$129.01
Wisconsin	US$121.28	US$194.08

Note: ELL = English language learner. Data retrieved from Education Counts Research Center (2009).

Level 2: School-Level Control Variables

Percent racial/ethnic minority

For each school, NAEP provides a measure of percentage of enrolled students who are African American, Hispanic, and Native American/American Indian.

Percent eligible for free/reduced lunch (1-9)

To control for the contextual impact of socioeconomic status at the school level (Porfeli, Wang, Audette, McColl, & Algozzine, 2009; Wilson, 1987), we included the percentage of enrolled students who are eligible for the federally funded free or reduced school lunch program. NAEP reports this measure based on school records, using the following categories: 1 = 0%, 2 = 1% to 5%, 3 = 6% to 10%, 4 = 11% to 25%, 5 = 26% to 34%, 6 = 35% to 50%, 7 = 51% to 75%, 8 = 76% to 99%, and 9 = 100%.

School enrollment category (1-4)

NAEP measures the size of the school based on four categories: 1 = less than 300 students, 2 = 300 to 499, 3 = 500 to 699, and 4 = 700 or more students.

Mean years of teacher experience (1-4)

Teachers participating in NAEP were asked for the total number of years of teaching experience at either the primary or secondary level, with 1 = 0 to 4 years, 2 = 5 to 9 years, 3 = 10 to 19 years, and 4 = 20 or more years. As a school-level measure, we aggregated the responses of all teachers at a school to determine the school mean of this item².

Level 1: Student-Level Control Variables

Within-Hispanic ethnic background

As noted above, only Hispanic students are included in our analysis. NAEP collects detailed information about ethnicity beyond the “Hispanic” designation, however, and we controlled for this aspect of a student’s ethnic background with a set of dummy variables for Cuban, Puerto Rican, mixed Hispanic, and other Hispanic, with Mexican as the omitted category.

Student socioeconomic status (0-2)

This is based on a student’s eligibility for the federal National School Lunch Program, coded on a scale of 0-2, with 0 = not eligible, 1 = eligible for reduced price lunch, and 2 = eligible for free lunch.³

Home Resources Index (0-5)

Family educational resources have been shown to play a critical part in children’s academic development (Lareau, 2000). Following Lubienski and Lubienski (2006), we constructed this index from a series of five NAEP items regarding educationally beneficial resources in the child’s home, such as regularly receiving a newspaper or magazine subscription and the number of books available in the home. Students who reported having more than 100 books at home received a full point on the index, with fewer books meriting fractions of a point.

Along these lines, it is worth noting that we do not control for parent education in our model, even though many studies show that it is a highly significant predictor of academic achievement. NAEP does not collect this information from its fourth-grade samples due to low reliability of reports about parents’ educational attainment from students of this age. Taken together, however, eligibility status for the school lunch program and the Home Resources Index can serve as reasonable proxies for parental education.

Female (0-1)

A dummy variable indicating whether the student was female was included in the analysis.

Individualized education program (IEP; 0-1)

Another dummy variable was used to control for whether school records showed that the student was in a special education program as evidenced by the filing of an IEP.

Descriptive statistics for all variables for the ELL and non-ELL samples are shown in Appendices A and B.

With the above measures for both individual student and school-level attributes in mind, we estimated intercepts-as-outcomes models in hierarchical linear modeling (HLM), assuming that the Level 1 (student-level) intercept varies across schools (Level 2), whereas the slopes of control variables at Level 1 do not. Moreover, we assumed that all Level 2 effects except for the Bilingual Emphasis Index were fixed (Luke, 2004, pp. 11-13).⁴ The following equation summarizes the full mixed model of reading achievement:

Y_{ij} = \gamma_{00} + \gamma_{01}\,\text{BilEmph} + \sum_{1}^{k} \gamma_{0k}\,\text{OtherSchoolLevel} + \sum_{2}^{m} \gamma_{m0}\,\text{StudentLevel} + u_{0j} + r_{ij}

where Y_ij represents individual student reading achievement, based on the five plausible values, with error terms u_0j showing error associated at the school level in estimating the effect of bilingual emphasis and r_ij the error associated with individual student i at school j.

Results

Hispanic ELL Students

For multilevel models, the fit is measured using a proportional reduction of error approach, comparing each substantive model with the null model (Model 1), which estimates only the intercept. In Table 3, the more comprehensive models including student demographics, school characteristics, and ELL program implementation variables (Models 5 and 6) show modest reductions in error of about 14% at the student level and 20% at the school level. Clearly, there are other factors to consider if we are to understand thoroughly why some Hispanic ELL students score higher on reading achievement than others. The intraclass correlation of the null model, .128, provides statistical support for the decision to use the multilevel modeling approach, suggesting that there is substantial variance to be explained at the school level (Luke, 2004, p. 22) and that failure to model at both student and school levels would bias estimates of coefficient standard errors downward.

Table 3.

HLM Models of National Assessment of Educational Progress Reading Achievement for Hispanic ELL Students, 2005 (5,650 Students, 1,030 Schools)

	Model 1: Null	Model 2: State bilingual program emphasis	Model 3: Bilingual program and student characteristics	Model 4: Bilingual program, student characteristics, school characteristics	Model 5: Bilingual program, student and school characteristics, state ELL implementation	Model 6: Model 5 with variables p < .05
Fixed effects
Intercept: Mean reading achievement	189.86*** (0.802)	191.34*** (0.795)	191.32*** (0.991)	192.23*** (1.05)	191.70*** (1.05)	183.9*** (1.31)
School-level effects
Bilingual Emphasis Index (0-5)		3.16*** (0.470)	3.23*** (0.467)	3.54*** (0.450)	2.69*** (0.559)	2.84*** (0.457)
% African American				−0.132* (0.064)	−0.169* (0.066)	−0.230*** (0.056)
% Latino				−0.114* (0.046)	−0.105* (0.047)	−0.164*** (0.030)
% American Indian				−0.356*** (0.086)	−0.294** (0.090)	−0.366*** (0.084)
% lunch program category (1-9)				−1.56 (0.915)	−1.49 (0.919)	—
School enrollment category (1-4)				1.27 (0.780)	0.376 (0.808)	—
Mean teacher experience (years)				1.44 (1.13)	1.27 (1.10)	—
Level of preservice ELL training in state					−0.112 (0.192)	—
Per ELL student Title III spending in state					0.216*** (0.050)	0.200*** (0.041)
Professional development for diversity					−0.344 (1.08)	—
Student-level effects
Mixed ethnicity			2.14 (3.13)	2.16 (3.15)	1.59 (3.18)	1.57 (3.17)
Cuban			−10.50* (4.67)	−10.08* (4.76)	−12.83* (4.56)	−12.55* (4.56)
Puerto Rican			−9.67** (3.63)	−10.37** (3.52)	−13.23** (3.56)	−13.35*** (3.57)
Other (non-Mexican) Latino ethnicity			4.83** (1.57)	4.41** (1.57)	3.39* (1.58)	3.46* (1.57)
Lunch program eligibility (0-2)			−5.73*** (0.933)	−4.11*** (0.961)	−4.03*** (0.965)	−4.42*** (0.939)
Female			2.61** (1.03)	2.91** (1.02)	2.93** (1.02)	2.93** (1.03)
IEP (0-1)			−26.34*** (2.62)	−26.81*** (2.60)	−27.03*** (2.59)	−26.99*** (2.58)
Home Resources Index (0-5)			2.61*** (0.505)	2.44*** (0.505)	2.38*** (0.505)	2.47*** (0.505)
Random Effects
Intercept (variance between schools)	148.0	109.2	96.1	83.5	74.8	73.7
Level 1 (variance within schools)	1,007.2	1,012.4	922.1	916.5	916.3	917.8
Intraclass correlation (proportion of variance between schools)	.128	.097	.094	.084	.075	.074
Student-level proportional reduction of error (%)	NA	2.9	11.9	13.4	14.2	14.2
School-level proportional reduction of error (%)	NA	6.6	15.5	18.2	19.8	19.8

Note: HLM = hierarchical linear modeling; ELL = English language learner; IEP = individualized education program; NA = not applicable. Restricted maximum likelihood estimates with robust standard errors in parentheses.
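The variance components in Table 3 can be used to recompute the intraclass correlations and the student-level proportional reduction of error. The sketch below assumes that the student-level reduction is the drop in total (between-school plus within-school) variance relative to the null model; this formula is inferred from the reported figures and is not stated in the text. The school-level reduction is not reproduced because it appears to depend on a representative school sample size that is not reported here.

```python
# (between-school variance, within-school variance) for Models 1 to 6,
# copied from the "Random Effects" rows of Table 3.
VARIANCE_COMPONENTS = {
    1: (148.0, 1007.2),
    2: (109.2, 1012.4),
    3: (96.1, 922.1),
    4: (83.5, 916.5),
    5: (74.8, 916.3),
    6: (73.7, 917.8),
}


def intraclass_correlation(between, within):
    """Share of total variance that lies between schools."""
    return between / (between + within)


def student_level_pre(null_components, model_components):
    """Percent reduction in total variance relative to the null model.

    ASSUMED formula; it is inferred from the table, not quoted from it.
    """
    return 100.0 * (1.0 - sum(model_components) / sum(null_components))


null = VARIANCE_COMPONENTS[1]
for model, components in VARIANCE_COMPONENTS.items():
    icc = intraclass_correlation(*components)
    if model == 1:
        print(f"Model {model}: ICC = {icc:.3f}")
    else:
        pre = student_level_pre(null, components)
        print(f"Model {model}: ICC = {icc:.3f}, student-level PRE = {pre:.1f}%")
```

Under this assumption the null-model intraclass correlation is .128 and the student-level reduction for Model 5 is about 14.2%, in line with the table.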

The control variables at the school and student level have the expected effects, for the most part. The contextual effects of higher minority enrollment (percent African American, percent Hispanic, and percent American Indian/Native American) on reading achievement at the school are negative and significant, although the effect of the proportion of students eligible for the federal school lunch program is not significant. The size of the school and mean teacher experience at the school were not significant. At the individual student level, having an IEP is associated with large and significantly negative effects on reading, as is being eligible for the school lunch program. Girls score slightly but statistically significantly higher than boys, and ELL students reporting a variety of educational home resources also score significantly higher on reading achievement. Students’ reported within-Hispanic ethnicity showed some statistically significant differences. Compared with Mexican American students (the omitted category), Cuban and Puerto Rican students have significantly lower reading achievement, whereas students from other, non-Mexican Hispanic backgrounds score somewhat higher than the Mexican American comparison group. ELL students of mixed Hispanic ethnicity performed about the same as their Mexican American counterparts, after controlling for other effects.

Looking at Table 3, we see that the key independent variable, being in a state with a stronger bilingual emphasis, has a significantly positive effect on reading achievement scores across all the models, controlling for other effects. The full model (Model 5) shows that NAEP scores for ELL (and formerly ELL) students were an average of 2.69 points higher for each additional point on the Bilingual Emphasis Index (p < .001). In other words, controlling for other factors, an ELL in a school in Arizona, with the weakest emphasis on bilingual education (0), would on average score about 13.5 points lower in reading achievement than a student in New Mexico or Texas, states with the strongest bilingual education emphasis (5). A 10-point deficit on the NAEP achievement test suggests that a student is approximately 1 year behind in terms of his or her comprehension of content expected of fourth graders, so these differences are not only statistically significant but also substantively significant.⁵
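The Arizona contrast follows directly from the Model 5 coefficient. As a worked illustration (the symbol \Delta\hat{Y} is introduced here only for convenience and is not the authors' notation):

```latex
\Delta\hat{Y} = \hat{\gamma}_{01} \times (5 - 0) = 2.69 \times 5 = 13.45 \approx 13.5
```

Here 5 and 0 are the index values for New Mexico or Texas and for Arizona, respectively, with all other predictors held constant.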

Scanning across the table at the estimates for bilingual emphasis, we see that controlling for student characteristics and school characteristics (i.e., enrollment demographics, school size, and teacher experience; Models 3 and 4) generally improves the model but does not reduce the effect of bilingual emphasis. In contrast, when we include factors measuring salient aspects of the implementation of ELL programs (such as state requirements for preservice training, professional development for working with diverse populations, and Title III funding), the magnitude of the effect of bilingual emphasis is reduced from 3.54 to 2.69 in the full model (Model 5) and to 2.84 in the model retaining only significant effects (Model 6). This reduction suggests that part of the reason that students in states with stronger bilingual education emphasis have higher reading achievement is related to how the program is implemented, in particular, the per ELL student Title III funding. In the full model (Model 5), the effect of this targeted funding was .216 (p < .001). After controlling for other variables, Hispanic ELL students from a high-funding state like Florida (US$164 per student) would on average score about 19 points higher on reading achievement than comparable students in the state with the lowest Title III funding, Colorado (US$77 per student). Other aspects of ELL program implementation were not significant. Neither the level of state-required preservice training in ELL issues (−.112, ns) nor the school-level mean for professional development for working with diverse students (−.344, ns) was significantly related to reading achievement.
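The Florida and Colorado comparison can be reproduced from the Model 5 coefficient and the 2000-2005 spending averages in Table 2. The sketch below is illustrative arithmetic only; the names are hypothetical, and it assumes the 2000-2005 averages are the values that entered the 2005 analysis.

```python
# Average Title III spending per enrolled ELL student, 2000-2005 (Table 2).
TITLE_III_SPENDING = {
    "Arizona": 107.07,
    "California": 102.17,
    "Colorado": 77.43,
    "Florida": 164.35,
    "Nevada": 79.06,
    "New Mexico": 78.28,
    "Texas": 116.04,
    "Wisconsin": 121.28,
}

# Model 5 coefficient for per ELL student Title III spending (Table 3).
TITLE_III_COEFFICIENT = 0.216


def predicted_gap(high_state, low_state):
    """Expected reading-score difference implied by the spending gap alone."""
    difference = TITLE_III_SPENDING[high_state] - TITLE_III_SPENDING[low_state]
    return TITLE_III_COEFFICIENT * difference


lowest = min(TITLE_III_SPENDING, key=TITLE_III_SPENDING.get)
gap = predicted_gap("Florida", lowest)
print(f"Florida vs. {lowest}: {gap:.1f} points")
```

The result, just under 19 points, matches the rounded figure quoted above and holds all other predictors constant.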

Although our hypotheses find mixed support based on the analysis of NAEP data from these Hispanic ELL students, our first hypothesis receives strong support. A strong statewide bilingual education emphasis is associated with higher reading achievement among Hispanic ELL students compared with a very weak bilingual emphasis (i.e., an SEI approach). Hypotheses 2 and 3 were not supported for Hispanic ELL students: more state-mandated preservice training on ELL issues and more professional development for working with diverse students had no significant effect on reading achievement, although it is worth remembering that the measure of diversity professional development comprises a single item pertaining to mathematics. The fourth hypothesis regarding the positive effect of Title III funding on reading achievement garnered clear support, according to our analysis. We address the findings regarding Hypothesis 5 shortly.

Hispanic Non-ELL Students

Our review of the research literature suggested that the type of language acquisition model might generate a more culturally responsive school environment that would have a diffusely positive effect on reading achievement for all Hispanic students, not only those who had received a designation as ELL, by mitigating the stereotype threat. In Table 4, we repeated the analysis, focusing only on these non-ELL Hispanic students. The modeling started with an intraclass correlation of .124, and, in general, the fit of the final models was somewhat better than for the ELL models, with proportional reduction of error at the student level equal to 19.8% and at the school level around 32.4%.

Table 4.

HLM Models of NAEP Reading Achievement for Hispanic Non-ELL Students, 2005 (10,500 students, 1,630 schools)

	Model 1: Null	Model 2: State bilingual program emphasis	Model 3: Bilingual program and student characteristics	Model 4: Bilingual program, student characteristics, school characteristics	Model 5: Bilingual program, student and school characteristics, state ELL implementation	Model 6: Model 5 with variables p < .05
Fixed effects
Intercept: Mean reading achievement	213.78*** (0.613)	213.97*** (0.603)	213.38*** (0.766)	213.04*** (0.774)	212.89*** (0.754)	212.90*** (0.758)
School-level effects
Bilingual Emphasis Index (0-5)		1.50*** (0.307)	1.93*** (0.273)	2.24*** (0.271)	2.48*** (0.362)	2.43*** (0.348)
% African American				−0.119* (0.044)	−0.163** (0.044)	−0.174*** (0.039)
% Latino				0.009 (0.024)	0.008 (0.024)	—
% American Indian				−0.317** (0.099)	−0.252** (0.090)	−0.256** (0.087)
% free or reduced lunch				−1.84*** (0.432)	−2.19** (0.424)	−2.01*** (0.317)
School enrollment category (1-4)				1.18* (0.532)	0.311 (0.535)	—
Mean teacher experience (years)				0.824 (0.599)	0.751 (0.584)	—
Level of preservice ELL training in state					0.350** (0.112)	0.345** (0.111)
Per ELL student Title III spending in state					0.090** (0.027)	0.093** (0.026)
Professional development for diversity					1.90** (0.721)	1.94** (0.712)
Student-level effects
Mixed ethnicity			0.649 (1.91)	0.484 (1.93)	0.010 (1.93)	0.019 (1.94)
Cuban			−3.75 (2.02)	−3.75 (2.05)	−6.27** (2.02)	−6.19** (2.03)
Puerto Rican			−6.61** (1.95)	−6.24** (1.97)	−8.33*** (2.00)	−8.30*** (2.00)
Other (non-Mexican) Latino ethnicity			6.38*** (1.01)	6.05*** (1.04)	5.30*** (1.04)	5.31*** (1.04)
Lunch program eligibility (0-2)			−7.43*** (0.496)	−5.85*** (0.507)	−5.87*** (0.507)	−5.87*** (0.508)
Female			3.68*** (0.724)	3.82*** (0.718)	3.82*** (0.713)	3.82*** (0.713)
IEP (0-1)			−29.14*** (1.72)	−29.35*** (1.72)	−30.19*** (1.72)	−30.18*** (1.71)
Home Resources Index (0-5)			2.99*** (0.365)	2.71*** (0.374)	2.71*** (0.373)	2.72*** (0.371)
Random effects
Intercept (variance between schools)	137.59	127.87	66.53	55.20	42.21	42.46
Level 1 (variance within schools)	975.97	977.22	854.18	849.92	850.44	850.30
Intraclass correlation (proportion of variance between schools)	.124	.116	.072	.061	.047	.048
Student-level proportional reduction of error (%)	NA	0.76	17.3	18.7	19.8	19.8
School-level proportional reduction of error (%)	NA	2.4	26.0	29.1	32.4	32.3
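The intraclass correlations and student-level proportional reductions of error in Table 4 can be recovered from the reported variance components. The Python sketch below is illustrative rather than the authors' procedure. The intraclass correlation is the standard ratio of between-school variance to total variance. The student-level formula (percent reduction in total variance relative to the null model) is an assumption that happens to reproduce the tabled values. The school-level figures are not recomputed because they appear to depend on quantities the table does not report. Function names are hypothetical.

```python
# Variance components reported in Table 4, Models 1 through 6.
BETWEEN_SCHOOLS = [137.59, 127.87, 66.53, 55.20, 42.21, 42.46]
WITHIN_SCHOOLS = [975.97, 977.22, 854.18, 849.92, 850.44, 850.30]


def intraclass_correlation(between, within):
    """Share of total variance that lies between schools."""
    return between / (between + within)


def total_variance_reduction(between, within, null_between, null_within):
    """Assumed student-level PRE: percent drop in total variance versus the null model."""
    return 100 * (1 - (between + within) / (null_between + null_within))


iccs = [round(intraclass_correlation(b, w), 3)
        for b, w in zip(BETWEEN_SCHOOLS, WITHIN_SCHOOLS)]

# Models 2 through 6, each compared with the null model (Model 1).
student_pre = [
    round(total_variance_reduction(b, w, BETWEEN_SCHOOLS[0], WITHIN_SCHOOLS[0]), 1)
    for b, w in zip(BETWEEN_SCHOOLS[1:], WITHIN_SCHOOLS[1:])
]

print(iccs)
print(student_pre)
```

Under these assumptions the computed intraclass correlations match the tabled row (.124 through .048), and the reductions for Models 3 through 6 round to 17.3, 18.7, 19.8, and 19.8; the Model 2 value is about 0.76 before rounding to one decimal place.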

A central finding of the analysis shown in Table 4 is that non-ELL Hispanics in states with stronger bilingual education emphasis have significantly higher reading achievement. In the full model (5), the effect was 2.48 (p < .001), an effect that is stable in the final model (6) that omits nonstatistically significant effects.⁶ This is a substantively significant, and perhaps somewhat surprising, effect rivaling that of the effect for ELLs, with non-ELL Hispanic students in high bilingual education emphasis states (Texas and New Mexico) scoring on average about 11 points higher (about one grade level) than students in Arizona, the SEI state with the weakest incorporation of bilingual strategies. Interestingly, looking across Table 4, we find that the positive effect of bilingual emphasis in this group is not reduced after controlling for implementation factors in Model 5, as was the case for the effect for ELL students. This is not to say that implementation of the language acquisition model does not matter for non-ELL Hispanic students. Title III funding has a smaller effect than what we found in the analysis with ELL students, but the effect was still significant (.090, p < .01).

Teacher training, both preservice training (.350, p < .01) and in-service professional development (1.90, p < .01), had significantly positive effects on reading achievement among non-ELL students, effects that were not significant in the ELL analysis. Net of other effects, Hispanic non-ELL students in states requiring a wide range of preservice ELL training, such as Arizona and Florida (27 on the preservice index), have achievement scores about 7 points higher than students in states like New Mexico (preservice index = 7) with less extensive mandates for preservice training on ELL issues. Similarly, non-ELL Hispanic students had a mean reading achievement about 5.7 points higher in schools where all teachers surveyed reported receiving “a large extent” of training compared with those in schools where all teachers responded “not at all” to the item regarding this type of professional development.
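Both point estimates in this paragraph follow from multiplying a Model 5 coefficient by the distance between two values of the predictor. The following is a brief Python sketch of that arithmetic. It assumes the professional development item runs from 1 (“not at all”) to 4 (“a large extent”), which matches the 1.00 to 4.00 range in Appendix 2 but is not stated explicitly in this section. The function name is hypothetical.

```python
def predicted_difference(coefficient, high, low):
    """Predicted score difference implied by a linear coefficient."""
    return coefficient * (high - low)


# Preservice index: 27 (Arizona, Florida) versus 7 (New Mexico), coefficient .350.
preservice_gap = predicted_difference(0.350, 27, 7)

# Professional development: assumed scale of 1 ("not at all") to 4
# ("a large extent"), coefficient 1.90.
development_gap = predicted_difference(1.90, 4, 1)

print(round(preservice_gap, 1), round(development_gap, 1))
```

Under these assumptions the two gaps come to roughly 7 and 5.7 points, the values reported above.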

In brief, the control variables for the analysis of non-ELL Hispanic students have similar effects as seen for the ELL Hispanic sample, except that the school-level contextual effect of socioeconomic status (percent eligible for free or reduced school lunch) has a significantly negative effect and the percentage of Hispanic students at the school has no significant effect on reading achievement.

To summarize our findings for the non-ELL Hispanic students at this point, we found strong statistical support for Hypotheses 1 through 4 when we consider the 2005 NAEP reading scores for the non-ELL Hispanic sample of fourth graders. A statewide bilingual education emphasis, intensive preservice and in-service training, and funding for language acquisition in the form of per ELL student Title III funds at the state level all have significantly positive effects on non-ELL Hispanic students, effects that are diffuse and perhaps not completely intended or expected by policy makers.

Institutionalization Effects for ELL and Non-ELL Hispanic Students

Finally, we turn our attention to the institutionalization effect proposed in Hypothesis 5. As explained previously, many theorists and empirical researchers have highlighted the difficulties of implementing major school reform, such as the move away from bilingual education toward SEI in states such as Arizona and California. Problems of teacher resistance, scaling up, and reform fidelity may take time to resolve before the full effect of a reform becomes measurable with respect to student achievement. To assess whether the move toward SEI in some states (i.e., a nonexistent or lower emphasis on bilingual education) produced better results with more time to institutionalize, we reran all analyses for both ELL Hispanic students and non-ELL Hispanic fourth-grade students surveyed as part of the 2007 NAEP study. If the newer SEI model benefits from additional time to institutionalize, we should see a reduction over time in the positive effect of the alternative approach, emphasis on bilingual education. Table 5 compares the coefficients from the full HLM model representing the effect of bilingual emphasis for the two different samples in the 2005 and 2007 NAEP. The table shows that the effect of bilingual education emphasis remained stable in 2007 for both Hispanic ELLs and Hispanic non-ELL students. Thus, there was no evidence that additional time to institutionalize allows students in states with the most stringent mandates against bilingual education to begin to close the reading achievement gap with their counterparts in states that have stronger bilingual emphasis. As a result, Hypothesis 5 is not supported for either group. With only two time points separated by 2 years, however, it is possible that more time is necessary to produce measurable results.

Table 5.

Comparing the Effect of State Bilingual Education Model Over Time

	Estimated Bilingual Emphasis Index coefficient (from full model) with robust standard error
	2005	2007
ELLs (including formerly ELL)	2.69 (.56)	3.01 (.71)
Non-ELLs	2.48 (.36)	2.88 (.44)

Note: ELL = English language learner. t tests show no significant differences between coefficients.
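The comparison reported in the note to Table 5 can be approximated from the coefficients and standard errors in the table. The Python sketch below is illustrative only: it assumes the 2005 and 2007 samples are independent and uses a large-sample normal approximation with a two-tailed critical value of 1.96. The source does not specify the exact form of the t test, and the function name is hypothetical.

```python
import math


def coefficient_difference_z(b_first, se_first, b_second, se_second):
    """z statistic for the difference between two independent coefficients."""
    return (b_second - b_first) / math.sqrt(se_first ** 2 + se_second ** 2)


# Bilingual Emphasis Index coefficients and robust standard errors from Table 5.
z_ell = coefficient_difference_z(2.69, 0.56, 3.01, 0.71)
z_non_ell = coefficient_difference_z(2.48, 0.36, 2.88, 0.44)

CRITICAL_VALUE = 1.96  # assumed two-tailed threshold at the .05 level
for label, z in (("ELL", z_ell), ("Non-ELL", z_non_ell)):
    print(label, round(z, 2), "significant" if abs(z) > CRITICAL_VALUE else "ns")
```

Under these assumptions both statistics fall well below the critical value, which is consistent with the note's statement that the coefficients do not differ significantly between years.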

Discussion

Unlike state assessments that reflect individual student achievement based on state standards, NAEP scores reflect aggregated achievement (NCES, 2009b). In the present study, to ensure our findings were not a function of more or less stringent curricula across jurisdictions, we examined both the stringency of state assessments (NCES, 2009b) and whether there were differences in achievement among non-at-risk students in each participating state. According to the latest NAEP mapping results (NCES, 2009b), California, Nevada, New Mexico, and Florida had the most stringent minimum state assessment standards as mapped on to NAEP proficiency levels with NAEP score equivalents between 207 and 210 points. Arizona and Wisconsin had lower proficiency cut scores of 198 and 193. Texas and Colorado had the least stringent state assessments, with 188 and 187 points, respectively. Moreover, an analysis of the mean fourth-grade reading scores among non-at-risk students across the states in the present study revealed that there were negligible to small standardized mean differences (Cohen’s d) across states, suggesting that our results are not likely due to variations in the curriculum.

In research funded by the Public Policy Institute of California, Rossell (2002) presented evidence that the SEI model implemented in California (i.e., English only) was superior to bilingual education by citing research showing that the percentage of students enrolled in bilingual education was positively related to reductions in reading and math achievement. Rossell further asserts that Proposition 227 in California has had a positive effect on students given the “small improvement in redesignation rates” (p. iii).

Have the current educational reform efforts focused on immersion improved educational outcomes for ELLs? Contrary to Rossell’s assertions, our findings indicate that mandates explicitly prohibiting the use of students’ native language have failed to improve the achievement outcomes for Hispanic ELLs. Our study, however, is not the first to find that native language instruction promotes ELL achievement. Prior studies examining explicitly the programmatic differences between bilingual education and immersion have established the general support in favor of programs that incorporate native language support (see review of the literature). Moreover, Padilla and Gonzalez (2001) found that California high school students who received some ESL/bilingual education had higher grades than students who did not receive linguistic support, regardless of their immigrant status (i.e., born outside of the United States, second, or third generation).

Gándara and Rumberger (2009) caution that when language is the only focus in addressing ELLs’ needs, inequities in educational opportunity endure (pp. 774-775). When the focus on acquiring English occurs in an additive environment that views home language and culture as educational assets as opposed to a subtractive environment that views ELLs as deficient (see Stritikus & Garcia, 2003), ELL achievement is more likely (e.g., Gándara & Rumberger, 2009). Indeed, our results suggest that the benefits in additive environments are not restricted to ELLs.

Benefits for Hispanic Non-ELLs

The benefits for students in states employing a stronger bilingual approach are not limited to ELL students. NAEP reading scores for Hispanic fourth graders in the present study who have never been designated ELL were also significantly higher in states that use bilingual programs than in states minimizing the use of bilingual education, controlling for school- and student-level factors. This finding may seem somewhat surprising because these students are not direct users of language acquisition services. Bilingual education, however, supports students’ social and cultural knowledge (Trueba et al., 1981) and may promote a school culture beyond specific classroom services that is supportive of academic achievement for all Hispanic students (Cummins, 1986). Moreover, the notion that non-ELL fourth-grade students would internalize social beliefs that may not appear to have been directly aimed at them (given their English proficiency) is consistent with findings on the development of children’s ability to infer social beliefs about stigmatized groups (McKown & Weinstein, 2003).

The results also suggest that laws prohibiting the use of students’ native language in instruction support the formation of school cultures that are internalized even by students who are not directly targeted (i.e., Hispanic non-ELLs). That is, because the laws in Arizona and California were promulgated on a platform blaming the low performance of Hispanic ELLs on bilingual education, there may be a large-scale stereotype threat (Steele & Aronson, 1995) priming lower performance among Hispanic non-ELLs when compared with Hispanic non-ELLs in states that do not restrict language acquisition methods incorporating native language. In the present case, the scenario was a national standardized test used for the Nation’s Report Card; the stigma for Hispanic students (regardless of ELL status) was belonging to a group that is among those labeled “at risk.” Notably, our findings that stronger emphasis on bilingual approaches is associated with higher reading scores for both non-ELL and ELL Hispanic students are supported by studies that have found positive ethnic identity is related to academic achievement (Altschul, Oyserman, & Bybee, 2006; Chavous et al., 2003; Oyserman, Harrison, & Bybee, 2001; Wong, Eccles, & Sameroff, 2003).

Although both Hispanic ELLs and non-ELLs tend to achieve higher reading scores in the context of state policy emphasizing bilingual education, only the non-ELL students gained when teachers received training in working with diverse learners and when states mandated preservice training and in-service professional development (e.g., Arizona and Florida). Notably, both Arizona and Florida mandate such trainings for all teachers. The effect on non-ELLs in Arizona may reflect training that is sufficient to improve outcomes for Hispanic students who are already proficient in English but insufficient to ameliorate achievement gaps for ELLs.

Time Required to Institutionalize Reform

Advocates of SEI might argue that major educational reform efforts, such as SEI, require time to establish appropriate and effective norms, policies, and procedures in the initial stages of implementation. If this argument regarding the need for time to institutionalize reform were valid, analysis should show a decreasing effect over time of bilingual emphasis on the reading performance of students. After repeating the analysis for the subsequent administration of NAEP in 2007, we found that the effect of an emphasis on bilingual education was stable from 2005 to 2007 for both Hispanic ELLs (2.69 to 3.01 for each additional point on the Bilingual Education Emphasis Index) and Hispanic non-ELLs (2.48 to 2.88 points). Thus, there is no evidence for a positive effect of institutionalizing the relatively recent reforms in California and Arizona, at least over the course of 2 years. English-only approaches may in fact have a diffusely cultural and negative effect on Hispanic students’ reading achievement, a process that appears to be institutionalizing and strengthening over time.

State-Level Variations in Implementation Related to Achievement

Which aspects of the implementation of a language acquisition program significantly affect reading achievement among Hispanic fourth graders? Our analysis found that although money does in fact matter, the effects of statewide mandates for teacher training in ELL and diversity issues were mixed. More demanding requirements were actually associated with slightly (though not significantly) lower achievement among ELL students in 2005, perhaps indicating that more stringent teacher training mandates came in response to low achievement among ELL students rather than being a cause of it. We cannot evaluate this explanation given the limitations of a cross-sectional research design.

Limitations of the Study

Although this study offers a foundational move toward quantifying the bilingual emphasis of state language acquisition policies and its effect on reading achievement, it is quite plausible that these policies are inherently multidimensional, and future research should consider additional ways to measure this construct. Another limitation of the present study involves the lack of measures regarding the implementation of the state language acquisition policy at the school level. However, if the quality/fidelity of implementation is roughly normally distributed across the levels of bilingual education emphasis, with some schools implementing the policy well and others not so well, the findings presented here would probably not be affected substantially. Although there are pervasive limitations when not including the consideration of the curriculum, it is important to keep in mind that the focus of the present study was on the relationship between policies and reading outcomes for fourth-grade Hispanic students. Given controls for teacher certification, it is important for future research to uncover why the relationships found in the present study exist. Thus, although we cannot assert that a bilingual education emphasis causes improved Hispanic student achievement, we have a starting point from which to examine in detail program implementation and content coverage across states that will further reduce bias and lead us closer to an understanding of the relationship between language policies and student achievement.

Finally, we were unable to control for students’ time in the United States or at least their generational status as NAEP does not collect this information. Such controls would be necessary to examine more accurately the relationship between language policies and academic achievement among Hispanic students and thus should be addressed by large-scale data collection.

Conclusion

Language acquisition programs are required to be evaluated for effectiveness (Castañeda v. Pickard, 1981). Despite the huge variation across language acquisition models and their implementation, the findings from the present study suggest that the ways in which policies are promulgated can have deleterious consequences for not only the population they intend to serve (i.e., ELLs) but also other students.

Overall, it appears that the stronger the bilingual education emphasis, the higher the reading achievement for Hispanic students (both ELL and non-ELL). These effects may occur both directly, via the scaffolding provided by native language acquisition methods, and indirectly, via the climate created by additive policies (Stritikus & Garcia, 2003). Moreover, although some additive approaches (e.g., professional development regarding diversity) may ameliorate issues for Hispanic students who have already demonstrated mastery of English, they are insufficient to address the needs of ELLs.

Appendix 1

Descriptive Statistics for 2005 ELL Sample (5,650 Students, 1,030 Schools)

	M	SD	Min	Max
Level 1 variables (student)^a
Reading achievement	187.63	33.91	36.67	315.59
Mixed ethnicity	0.043	0.202	0	1
Cuban	0.024	0.152	0	1
Puerto Rican	0.030	0.170	0	1
Other Hispanic	0.183	0.387	0	1
Lunch program eligibility	1.68	0.663	0	2
Female	0.488	0.500	0	1
IEP	0.077	0.267	0	1
Home Resources Index	2.23	1.23	0	5
Level 2 variables (school)^b
Bilingual program emphasis in state	2.44	1.82	0	5
% Black	11.70	17.26	0	98
% Hispanic	50.5	29.38	0	100
% American Indian	1.52	8.15	0	100
% lunch program category	6.87	1.48	2	9
School enrollment category	2.89	0.938	1	4
Mean teacher experience	2.35	0.774	1	4
Level of required preservice ELL training in state	15.88	5.33	7	27
Per ELL student Title III spending in state	110.37	20.73	77.43	164.33
Mean professional development for diversity	2.18	0.700	1.00	4.00

Note: ELL = English language learner; IEP = individualized education program.

a. Reading achievement was assessed via five plausible values. Means and standard deviations for all student-level variables are robust as calculated by AM software, while minimums and maximums for reading achievement are the lowest and highest recorded across all five plausible values.

b. School-level descriptives were calculated in SPSS using designated school weights.

Appendix 2

Descriptive Statistics for 2005 Non-ELL Sample (10,500 Students, 1,630 Schools)

	M	SD	Min	Max
Level 1 variables (student)
Reading achievement	213.65	33.42	36.12	334.03
Mixed ethnicity	0.067	0.250	0	1
Cuban	0.050	0.218	0	1
Puerto Rican	0.069	0.254	0	1
Other Hispanic (non-Mexican)	0.236	0.425	0	1
Lunch program eligibility	1.07	0.941	0	2
Female	0.509	0.500	0	1
IEP	0.086	0.280	0	1
Home Resources Index	2.82	1.22	0	5
Level 2 variables (school)
Bilingual program emphasis in state	2.71	1.80	0	5
% Black	12.02	19.47	0	98
% Hispanic	36.97	30.86	0	100
% American Indian	1.31	5.80	0	100
% lunch program category	6.21	1.87	2	9
School enrollment category	2.68	0.990	1	4
Mean teacher experience	2.45	0.819	1	4
Level of required preservice training in state	16.22	5.78	7	27
Per ELL student Title III spending in state	112.81	22.95	77.43	164.33
Mean professional development for diversity	2.05	0.729	1.00	4.00

Note: ELL = English language learner; IEP = individualized education program.

Acknowledgements

The authors would like to thank Bob Lowe and Sharon Chubbuck for their thoughtful comments on an earlier draft of the article.

Authors’ Note

This research uses fourth-grade restricted-license National Assessment of Educational Progress Reading data; the manuscript has been cleared for dissemination to nonlicensed persons by authorities at the Institute of Education Sciences and the National Center for Education Statistics.

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was supported in part by grants from the Marquette University Summer Faculty Fellowship and Regular Research Grant.

Bios

Francesca López is an assistant professor in the Educational Policy and Leadership department at Marquette University. Prior to earning her PhD in educational psychology from the University of Arizona, she was a bilingual elementary teacher and at-risk high school counselor. Her research is focused on understanding how school contexts promote achievement for Latino students—particularly those for whom English is a second language.

Elizabeth McEneaney is an assistant professor of education at UMass–Amherst. A former high school mathematics teacher, she earned a PhD in sociology from Stanford University. Her research interests include language acquisition policy, particularly as applied in STEM fields, and the organizational aspects of the charter school movement.

References

Abedi, Hofstetter, & Lord (2004). Assessment accommodations for English language learners: Implications for policy-based empirical research. Review of Educational Research, 74, 1-28.

Acts of the 67th Texas Legislature (1981).

Arizona Revised Statutes §15-751 (2000).

Altschul, Oyserman, & Bybee (2006). Racial-ethnic identity in mid-adolescence: Content and change as predictors of academic achievement. Child Development, 77, 1155-1169.

Baker (1998). Structured English immersion. Phi Delta Kappan, 80, 199-205.

Baker (1999). Basics of structured English immersion for language minority students. Bilingual education: A focus on current research. Retrieved from ERIC database. (ED432928)

Baker, K. A., & de Kanter, A. A. (1981). Effectiveness of bilingual education: A review of the literature. Washington, DC: U.S. Department of Education, Office of Planning, Budget and Evaluation.

Baker & de Kanter (Eds.). (1983). Bilingual education: A reappraisal of federal policy. Lexington, MA: Lexington Books.

Bickel (2007). Multilevel analysis for applied research. New York, NY: Guilford.

Bilingual Education Act, Pub. L. No. 90-247, 81 Stat. 816 (1968).

Bilingual Multicultural Education Act, 22, Article 23, NMSA (1978).

California Education Code §§ 300-340 (1998).

Castañeda v. Pickard, 648 F.2d 989 (5th Cir. 1981).

Chavous, Bernat, Schmeelk-Cone, Caldwell, Kohn-Wood, & Zimmerman (2003). Racial identity and academic attainment among African American adolescents. Child Development, 74, 1076-1090.

Cochran-Smith, Davis, & Fries, M. K. (2003). Multicultural teacher education: Research, practice and policy. In Banks (Ed.), Handbook of research on multicultural education (2nd ed., pp. 931-975). San Francisco, CA: Jossey-Bass.

Collier, V. P., & Thomas, W. P. (2004). The astounding effectiveness of dual language education for all. NABE Journal of Research and Practice, 2, 1-20.

Colorado Revised Statutes Section 1111(b)(1) (2002).

Combs, M. C., Evans, Fletcher, Parra, & Jiménez (2005). Bilingualism for the children: Implementing a dual-language program in an English-only state. Educational Policy, 19, 701-728.

Crawford (2000). Anatomy of the English-only movement. In Crawford (Ed.), At war with diversity: U.S. language policy in an age of anxiety (pp. 4-30). Clevedon, UK: Multilingual Matters.

Crawford (2005). Making sense of census 2000. Retrieved from http://www.languagepolicy.net/excerpts/makingsense.html

Crawford (2007). The decline of bilingual education: How to reverse a troubling trend? International Multilingual Research Journal, 1, 33-37.

Croizet, J. C., & Claire (1998). Extending the concept of stereotype threat to social class: The intellectual underperformance of students from low socioeconomic backgrounds. Personality and Social Psychology Bulletin, 24, 588-594.

Cummins (1979). Cognitive/academic language proficiency, linguistic interdependence, the optimum age question and some other matters. Working Papers on Bilingualism, 19, 121-129.

Cummins (1986). Empowering minority students: A framework for intervention. Harvard Educational Review, 56, 18-36.

Deschenes, Cuban, & Tyack, D. B. (2001). Mismatch: Historical perspectives on schools and students who don’t fit them. Teachers College Record, 102, 525-547.

Desimone (2002). How can comprehensive school reform models be successfully implemented? Review of Educational Research, 72, 433-479.

Dolson, D. P., & Mayer (1992). Longitudinal study of three program models for language-minority students: A critical examination of reported findings. Bilingual Research Journal, 16, 105-157.

Echevarria, J. A., & Graves (2010). Sheltered content instruction: Teaching English language learners with diverse abilities (4th ed.). Boston, MA: Allyn & Bacon.

Education Counts Research Center. (2009). Retrieved from http://www.edcounts.org/createtable/step1.php

Escamilla (1992). Classroom discourse as improvisation: Relationships between academic task structure and social participation structure in lessons. In L. C. Wilkinson (Ed.), Communicating in the classroom (pp. 19-158). New York, NY: MacMillan.

Florida Revised Statutes Section 1003.56 (2002).

Fry (2003). Hispanic youth dropping out of U.S. schools. Washington, DC: Pew Hispanic Center. Retrieved from http://www.pewhispanic.org/files/reports/19.pdf

Fry (2007). How far behind in math and reading are English language learners? (Pew Hispanic Center Report). Retrieved from http://pewhispanic.org/reports/

Furman (2008). Tensions in multicultural teacher education research: Demographics and the need to demonstrate effectiveness. Education and Urban Society, 41, 55-79.

Gándara & Merino (1993). Measuring the outcomes of LEP programs: Test scores, exit rates, and other mythological data. Educational Evaluation and Policy Analysis, 15, 320-338.

Gándara & Rumberger (2009). Immigration, language, and education: How does language policy structure opportunity? Teachers College Record, 111, 750-782.

Gándara, Rumberger, Maxwell-Jolly, & Callahan (2003). English learners in California schools: Unequal resources, unequal outcomes. Education Policy Analysis Archives, 11. Retrieved from http://epaa.asu.edu/epaa/v11n36/

Gersten & Woodward (1995). A longitudinal study of transitional and immersion bilingual education programs in one district. Elementary School Journal, 95, 223-240.

Gollnick, D. M. (2004). National and state initiatives for multicultural education. In J. A. Banks & C. A. M. Banks (Eds.), Handbook of research on multicultural education (pp. 44-64). San Francisco, CA: Jossey-Bass.

González, R. D., & Melis (Eds.). (2000). Language ideologies: Education and the social implications of official language. Mahwah, NJ: Lawrence Erlbaum.

Grayson & Rust (2001). Interrater reliability. Journal of Consumer Psychology, 10(1 & 2), 71-73.

Greene (1998). A meta-analysis of the effectiveness the Rossell & Baker review of bilingual education research. Bilingual Research Journal, 21, 102-122.

Hayes, A. F., & Krippendorff (2007). Answering the call for a standard reliability measure for coding data. Communication Methods and Measures, 1, 77-89.

Hollins, E. R., & Guzman, M. R. (2005). Research on preparing teachers for diverse populations. In Cochran-Smith & K. M. Zeichner (Eds.), Studying teacher education: The report of the AERA panel of research and teacher education (pp. 477-548). Mahwah, NJ: Lawrence Erlbaum.

Hombo, C. M. (2003). NAEP and No Child Left Behind: Technical challenges and practical solutions. Theory into Practice, 42, 59-65.

Irby, Tong, Lara-Alecio, Meyer, & Rodríguez (2007). The critical nature of language of instruction compared to observed practices and high stakes tests in transitional bilingual classroom. Research in the Schools, 14, 27-36.

Johnson, E. G. (1989). Considerations and techniques for the analysis of NAEP data. Journal of Educational Statistics, 14, 303-334.

Jones, L. V., & Olkin (2004). Introduction. In L. V. Jones & Olkin (Eds.), The nation’s report card: Evolution and perspectives (pp. 1-9). Bloomington, IN: Phi Delta Kappa in cooperation with American Educational Research Association.

49.

Krashen

(1981). Second language acquisition and second language learning. New York, NY: Pergamon Press.

50.

Krashen

(1985). Insights and inquiries. Hayward, CA: Alemany Press.

51.

Krashen

S. D.

(1991). Bilingual education: A focus on current research. Retrieved from ERIC database. (ED337034)

52.

Krashen

S. D.

(1997a). Why bilingual education? Retrieved from ERIC database. (ED403101)

53.

Krashen

S. D.

(1997b). A researcher’s view of Unz. Retrieved from http://www.languagepolicy.net/archives/Krashen1.htm

54.

Lareau

(2000). Home advantage: Social class and parental intervention in elementary education. Lanham, MD: Rowman & Littlefield.

55.

Lau v. Nichols, 414 U.S. 563 (1974).

56.

Lazarín (2006). Improving assessment and accountability for English language learners in the No Child Left Behind Act. Washington, DC: National Council of La Raza.

57.

Leibowitz, A. H. (1971). Educational policy and political acceptance: The imposition of English as the language of instruction in American schools. Washington, DC: Center for Applied Linguistics. (ERIC Document Reproduction Service No. 047321)

58.

Lindholm-Leary, K. J. (2001). Dual language education. Tonawanda, NY: Multilingual Matters.

59.

Lord (1980). Applications of item response theory to practical testing problems. Hillsdale, NJ: Lawrence Erlbaum.

60.

Lubienski, S. T. (2006). Charter, private, public schools and academic achievement: New evidence from NAEP mathematics data. New York, NY: National Center for the Study of Privatization in Education.

61.

Luke, D. A. (2004). Multilevel modeling. Thousand Oaks, CA: SAGE.

62.

Lyons (1995). The past and future directions of federal bilingual education policy. In Baker (Ed.), Policy and practice in bilingual education: Extending the foundations (pp. 1-15). Clevedon, UK: Multilingual Matters.

63.

Maas & Hox (2004). Robustness issues in multilevel regression analysis. Statistica Neerlandica, 58, 127-137.

64.

Maas & Hox (2005). Sufficient sample size for multilevel modeling. Methodology, 1, 86-92.

65.

Madrid (1990). Official English: A false policy issue. Annals of the American Academy of Political and Social Science, 508, 62-65.

66.

May (2003). Misconceiving minority language rights: Implications for liberal political theory. In Kymlicka & Patten (Eds.), Language rights and political theory (pp. 123-151). New York, NY: Oxford University Press.

67.

May (2005). Language rights: Moving the debate forward. Journal of Sociolinguistics, 9, 319-347.

68.

McKown & Weinstein, R. S. (2003). The development and consequences of stereotype consciousness in middle childhood. Child Development, 74, 498-515.

69.

Meece, J. L., & Kurtz-Costes (2001). Introduction: The schooling of ethnic minority children and youth. Educational Psychologist, 36, 1-7.

70.

Menken & Antunez (2001). An overview of the preparation and certification of teachers working with limited English proficient (LEP) students (Publication No. ED-99-CO-0007). Retrieved from http://www.usc.edu/dept/education/CMMR/FullText/teacherprep.pdf

71.

National Assessment Governing Board. (2009). Reading framework for the 2009 National Assessment of Educational Progress (U.S. Department of Education ED-02-R-0007). Retrieved from http://www.nagb.org/publications/frameworks/reading09.pdf

72.

National Center for Education Statistics. (2009a). NAEP reading. Retrieved from http://nationsreportcard.gov/reading_2009/

73.

National Center for Education Statistics. (2009b). Mapping state proficiency standards onto NAEP (U.S. Department of Education NCES 2010-456). Retrieved from http://nces.ed.gov/nationsreportcard/pdf/studies/2010456.pdf

74.

National Clearinghouse for English Language Acquisition. (2009). NCELA State Title III information system. Retrieved from http://www.ncela.gwu.edu/t3sis/

75.

Neuendorf (2002). The content analysis guidebook. Thousand Oaks, CA: SAGE.

76.

Nevada State Board of Education. (2002). All children can succeed: Consolidated plan for the implementation of the No Child Left Behind Act. Retrieved from http://nde.doe.nv.gov/Accountability/NCLB/NCLBplan.pdf

77.

NRS 388.405. (2005). Nevada code. System of public instruction: Program to teach language to certain pupils. Establishment; regulations; submission of certain evaluations required by federal law.

78.

Office of English Language Acquisition, Language Enhancement, and Academic Achievement for Limited English Proficient Students: Homepage. (2009). Retrieved from http://www.ed.gov/about/offices/list/oela/index.html

79.

Ovando, C. J. (2003). Bilingual education in the United States: Historical development and current issues. Tempe: Arizona State University.

80.

Oyserman, Harrison, & Bybee (2001). Can racial identity be promotive of academic efficacy? International Journal of Behavioral Development, 25, 379-385.

81.

Padilla, A. M., & Gonzalez (2001). Academic performance of immigrant and U.S.-born Mexican heritage students: Effects of schooling in Mexico and bilingual/English language instruction. American Educational Research Journal, 38, 727-742.

82.

Parrish, T. B., Merickel, Perez, Linquanti, Socias, Spain, & Delancey (2006). Effects of the implementation of Proposition 227 on the education of English learners, K-12: Findings from a five-year evaluation (Final Report for AB 56 and AB 116). Retrieved from http://www.wested.org/cs/we/view/rs/804

83.

Pew Hispanic Center. (2007). State and county databases, 2007. Retrieved from http://pewhispanic.org/states

84.

Porfeli, Wang, Audette, McColl, & Algozzine (2009). Influence of social and community capital on student achievement in a large urban school district. Education and Urban Society, 42, 72-95.

85.

Ramirez & Carpenter (2005, April). Challenging assumptions about the achievement gap. Phi Delta Kappan, 86, 599-603.

86.

Ramirez, J. D., Yuen, S. D., Ramey, D. R., & Pasta, D. J. (1991). Final report: Longitudinal study of structured English immersion strategy, early-exit, and late-exit transitional bilingual education programs for language minority children. Retrieved from ERIC database. (ED330216)

87.

Raudenbush, S. W., & Bryk, A. S. (2001). Hierarchical linear models: Applications and data analysis methods. Thousand Oaks, CA: SAGE.

88.

Reardon, S. F., & Galindo (2009). The Hispanic-White achievement gap in math and reading in the elementary grades. American Educational Research Journal, 46, 853-891.

89.

Ricento (1996). The courts, the legislature and society: The shaping of federal language policy in the United States. In Kibbee (Ed.), Language legislation and linguistic rights (pp. 123-141). Philadelphia, PA: John Benjamins.

90.

Rolstad, Mahoney, & Glass (2005). The big picture: A meta-analysis of program effectiveness research on English language learners. Educational Policy, 19, 572-594.

91.

Rossell (2002). Dismantling bilingual education, implementing English immersion: The California initiative. San Francisco: Public Policy Institute of California.

92.

Rossell & Baker (1996). The educational effectiveness of bilingual education. Research in the Teaching of English, 30, 7-69.

93.

Ruiz (1984). Orientations in language planning. NABE Journal, 8, 15-34.

94.

Salazar, J. J. (1998). A longitudinal model for interpreting thirty years of bilingual education research. Bilingual Research Journal, 22, 1-12.

95.

Skinner, C. J., Holt, & Smith, T. M. F. (1989). Analysis of complex surveys. Chichester, UK: Wiley.

96.

Slavin, R. E., & Cheung (2003). A synthesis of research on language of reading: Instruction for English language learners. Review of Educational Research, 15, 247-284.

97.

Snijders, T. A. B., & Bosker, R. J. (1999). Multilevel analysis: An introduction to basic and advanced multilevel modeling. London, UK: SAGE.

98.

Steele, C. M., & Aronson (1995). Stereotype threat and the intellectual test performance of African Americans. Journal of Personality and Social Psychology, 69, 797-811.

99.

Stritikus, T. T., & Garcia (2003). The role of theory and policy in the educational treatment of language minority students: Competitive structures in California. Education Policy Analysis Archives, 11. Retrieved from http://epaa.asu.edu/epaa/v11n26/

100.

Strong (1986). Teacher language to limited English speakers in bilingual and submersion classrooms. In R. R. Day (Ed.), Talking to learn: Conversation in second language acquisition (pp. 53-63). Rowley, MA: Newbury House.

101.

Thomas, W. P., & Collier (1997). School effectiveness for language minority students. Washington, DC: National Clearinghouse for Bilingual Education.

102.

Trueba, H. T., Guthrie, G. P., & Au, K. H. (1981). Culture and the bilingual classroom: Studies in classroom ethnography. Rowley, MA: Newbury House.

103.

Tyack & Cuban (1995). Tinkering toward Utopia: A century of public school reform. Cambridge, MA: Harvard University Press.

104.

U.S. Department of Education. (2007). The condition of education 2007 (Publication No. NCES 2007-064). Retrieved from http://nces.ed.gov/pubs2007/2007064.pdf

105.

Valenzuela (1999). Subtractive schooling: U.S. Mexican youth and the politics of caring. Albany: SUNY Press.

106.

Wiley, T. G., & Wright, W. E. (2004). Against the undertow: Language-minority education policy and politics in the “age of accountability.” Educational Policy, 18, 142-168.

107.

Willig, A. C. (1985). A meta-analysis of selected studies on the effectiveness of bilingual education. Review of Educational Research, 55, 269-317.

108.

Wilson, W. J. (1987). The truly disadvantaged. Chicago, IL: University of Chicago Press.

109.

Wisconsin Revised Statutes, Ch. PI 13 (2002).

110.

Wong, C. A., Eccles, J. S., & Sameroff (2003). The influence of ethnic discrimination and ethnic identification on African American adolescents’ school and socioemotional adjustment. Journal of Personality, 71, 1197-1232.

111.

Wright (2005). The political spectacle of Arizona’s Proposition 203. Educational Policy, 19, 662-700.

112.

Zwick (1992). Chapter 7: Statistical and psychometric issues in the measurement of educational achievement trends: Examples from the National Assessment of Educational Progress. Journal of Educational and Behavioral Statistics, 17, 205-218.