Abstract
We document results of validity generalization meta-analyses of the teacher, student, and parent forms of the BASC-3 Behavioral and Emotional Screening System (BESS). Effect sizes (reported here as absolute values, based on 16 studies and 428 correlation coefficients) ranged from 0.040 to 0.350 for constructs related to academics, 0.399 to 0.690 for executive functioning, 0.350 to 0.627 for externalizing problems, 0.287 to 0.815 for internalizing problems, 0.344 to 0.660 for prosocial functioning, and 0.478 to 0.875 for global risk indicators. Extracted coefficients were of the expected direction and magnitude with theoretically aligned constructs, although scores on the BESS scales showed relatively weak relationships with academic variables and results from the parent form were less consistent. Results indicate the broad utility of the BESS in identifying students for further assessment as part of the screening process and support the use of the broadband BESS Behavior and Emotional Risk Index (BERI) particularly for this purpose.
School-based, universal mental health screening refers to a systematic, routine assessment process in which all students are screened to identify those most at risk for significant mental health problems. The identification of students at significant risk informs follow-up assessment to gain more information about their risk status and also guides systematic prevention and intervention efforts. A key benefit of screening is that a sizeable proportion of students are screened, reducing the number of students that are overlooked. In addition, screening provides a baseline for future monitoring and assessment and contributes to data-driven approaches to intervention. Screening also leads to earlier intervention for emerging problems, which is less intensive and expensive than treatment received after significant symptoms of distress are exhibited (Dvorsky et al., 2014). Unsurprisingly, screening has been proposed as a key step in service delivery reform, with the potential to place increased emphasis on prevention, early intervention, and the promotion of social-emotional health (U.S. Preventive Services Task Force, 2009). It is also an important component of efforts to increase access to services and is generally perceived positively by parents, students, and school staff based on the limited research on acceptability (Palmer et al., 2025).
A critical part of the screening process is the selection of screening tools to identify students in need of services. A variety of screeners are regularly used in schools (see Benson et al., 2019). Some are broadband screeners that yield a single, overall score of general risk based on items representing multiple domains (e.g., internalizing problems, externalizing problems, school problems, and adaptive behavior). Others are narrowband screeners that yield a single score or separate scale scores based on items that all measure the same domain (e.g., screeners that only yield scores for internalizing problems). Some screeners yield both a total score and subscale scores, providing results for both broad and narrow domains. Notably, the distinction between narrowband and broadband screeners is important, as each type of screener has been associated with differing levels of reliability and validity and should be considered different approaches to screening (Allen et al., 2019). Given the purpose and potential outcomes of the screening process (e.g., further assessment, potential mental health disorder diagnosis, and intervention planning), it is important to ensure that screening tools are contextually appropriate, psychometrically sound, practical to administer, and well-validated and useful for follow-up services (Glover & Albers, 2007; Villarreal & Peterson, 2025). If screeners are inaccurate or if the interpretation of screening results is based on limited validity evidence, the potential benefits of screening are weakened. For example, students’ risk status may be misidentified, and students may miss opportunities for intervention.
Although few in number, meta-analyses of validity evidence for some specific screeners provide support for their use, as well as implications for further research in this area. For example, Kilgus et al. (2018) completed a validity generalization meta-analysis of the Student Risk Screening Scale (SRSS)—a narrowband screener focused on externalizing behavior problems—in which they identified validity correlation coefficients across 17 studies. This accumulated, large body of research allowed for quantitative synthesis of the SRSS evidence specifically, with results indicating that SRSS scores are reliable and valid indicators of student behavior. Furthermore, this study highlights methods that could be applied to other regularly used screeners that have also accumulated a large body of evidence. Sullivan et al. (2021) employed similar methods to evaluate validity evidence for the Social Skills Improvement System Screening/Progress Monitoring Scales (SSIS PMS), which is a brief teacher rating scale designed for universal screening. Across the 10 studies included in their meta-analysis, Sullivan et al. found that SSIS PMS items were generally correlated with academic and behavioral outcomes in the schools in theoretically expected ways. These results suggest that brief and even single-item screeners can provide meaningful correlations with relevant variables (e.g., grades, attendance, office discipline referrals, and measures of emotional/behavioral functioning) and therefore can be useful in identifying students in need of support.
The Behavior Assessment System for Children, Third Edition, Behavioral and Emotional Screening System (BESS; Kamphaus & Reynolds, 2015) is another brief, standardized screener that has been identified as the screening tool most widely used in schools in the United States (Benson et al., 2019). The BESS is intended to quickly (i.e., in about 5 min per student) assess behavioral and emotional risk via various indexes. The Behavioral and Emotional Risk Index (BERI) is a broadband index that is common across the teacher, parent, and student forms and represents an indicator of overall behavioral and emotional risk. In addition, the teacher and parent forms include the following narrowband scales: Externalizing Risk Index, Internalizing Risk Index, and Adaptive Skills Risk Index. The student form includes the following narrowband scales: Internalizing Risk Index, Self-Regulation Risk Index, and Personal Adjustment Risk Index. To complete the BESS, informants (teachers, parents, and/or students) rate the frequency with which students exhibit various behaviors using a four-point Likert-type scale (0 = never, 1 = sometimes, 2 = often, and 3 = almost always).
The BESS authors report adequate reliability. The BESS teacher and BESS parent index reliability coefficients were mostly in the 0.80s to 0.90s; the BESS student index reliability coefficients ranged from the middle 0.70s to upper 0.80s, with the BERI score in the 0.90s (Kamphaus & Reynolds, 2015). The authors provide evidence of validity of the BESS using a few measures of behavior and emotional problems, as well as scales of adaptive functioning (e.g., Achenbach System of Empirically Based Assessment, Conners 3rd Edition, Autism Spectrum Rating Scales, Revised Children’s Manifest Anxiety Scale, and Children’s Depression Inventory). In general, the authors report correlations for corresponding composite scales in the 0.40s to 0.70s range (Kamphaus & Reynolds, 2015). While the BESS authors report promising validity evidence, sample sizes for these correlations are relatively small, with an average sample size of 87 (SD = 40). Additionally, all validity correlations reported by the authors are based on other rating scales, and the authors do not provide validity evidence for other domains (e.g., academic outcomes). The authors did not intend the BESS to be used for these domains, but, as indicated in our analyses, researchers nonetheless have used it for them.
Several studies have examined the BESS’s psychometric defensibility. These studies have generally supported the developer-proposed structure of the BESS (i.e., the structure accounting for both global risk and underlying sources of risk as indicated by its indexes) (Basting et al., 2022; Dever & Gaier, 2021; Eklund et al., 2022), but one study did suggest that a model that groups attention problems into one factor, in addition to the internalizing, externalizing, and adaptive skills factors, provides a better fit (Dowdy et al., 2019). Studies have also supported measurement equivalence by race/ethnicity (Basting et al., 2022; Dever & Gaier, 2021), gender (Basting et al., 2022), and age (Eklund et al., 2022). Additionally, the BESS has been used as an outcome indicator in studies examining a variety of issues, including general screening for mental health concerns in youth (DeBoer & Long, 2024), the effects of behavior support coaching to address disruptive behavior (Reddy et al., 2022), the effect of cognitive behavioral treatment programs (Sanders et al., 2019), and changes in behavioral health during the pandemic (Hanno et al., 2022), to name a few. This research has established important findings about the psychometric properties of the BESS and has demonstrated its broad use. However, no comprehensive synthesis of the validity of the BESS has been conducted.
Given its widespread use and the importance of screening results, it seems appropriate to evaluate the accumulated research regarding the BESS validity evidence and how well a brief screener that includes broadband and narrowband indicators of emotional and behavioral risk predicts functioning in distinct domains. Such an analysis would contribute to ongoing research regarding general screening practices and add to the limited quantitative synthesis studies for the BESS specifically. As detailed by Sitarenios (2022), it is especially critical to systematically evaluate the psychometric properties of brief or short-form scales to ensure that these measures can provide reliable and valid data despite having far fewer items than comprehensive measures of emotional and behavioral functioning.
As previously noted, one way of completing a useful review of related studies is by conducting a meta-analysis of the available correlational validity evidence. Referred to as validity generalization, this approach quantifies validity coefficients reported for scores across multiple samples and calculates an average validity coefficient while correcting for sampling error (Shultz & Whitney, 2005). Validity generalization studies provide stronger and more realistic estimates of the average observed validity coefficients across studies than is possible in any single study. Thus, the purpose of the current study was to conduct validity generalization meta-analyses of available evidence for the BESS to determine the estimated predictive validity between the BESS and other measures by using reported correlation coefficients from previous research studies. Resulting validity estimates are considered to represent the extent to which BESS scores predict alternative measures and outcomes such as student academic performance and social/behavioral functioning. It is important to acknowledge that there are several additional sources of evidence used to establish validity for using test scores to make decisions, including the appropriateness of item content, internal structure of the items, and classification accuracy indices such as sensitivity and specificity (American Educational Research Association [AERA] et al., 2014). Thus, the current analysis focuses on just one aspect of the larger validity concept.
Method
Search Strategy
We searched multiple electronic databases (i.e., PsycINFO, Psychology Database, PsycArticles, ERIC, SocINDEX, Psychology and Behavioral Sciences Collection, PubMed, MEDLINE, and ScienceDirect) covering the period from 2015 (the year of the publication of the BASC-3 BESS) through May 2025. Search terms included “Behavioral and Emotional Screening System” and “Behavioral Emotional Screening System,” and the search location was set to “anywhere” (i.e., the search was conducted in the title, abstract, and full text). An article was included in the study if it met the following criteria: (a) it was published in English in a peer-reviewed journal, was an unpublished dissertation, or was an unpublished convention presentation; (b) BASC-3 BESS form(s) were administered; and (c) it reported validity evidence (i.e., correlations) representative of the relationship between any BESS index score(s) and related variables. We retrieved 448 articles in the initial search. Of these, 174 were excluded because they were duplicates (i.e., identified in multiple databases), 14 were excluded because they were not journal articles, unpublished dissertations, or unpublished convention presentations, 214 were excluded because they did not administer the BASC-3 BESS, 29 were excluded because they indicated that the BASC-3 BESS was administered but no correlation information was provided, and 2 were excluded because they did not provide correlation information that could be categorized into relevant criterion variable categories (described in the next section). Thus, 15 articles (including 13 journal articles and two doctoral dissertations) met all inclusion criteria and were included in our analysis. Additionally, we included the validity evidence provided by the BESS authors in the BESS manual, resulting in a total of 16 studies included in our analyses. See the references list for all included studies.
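The screening counts reported above can be checked with simple arithmetic. The following Python sketch tallies the exclusions as stated in this section; the function name and the dictionary labels are illustrative shorthand introduced here, not part of the search protocol.

```python
def remaining_after_exclusions(retrieved, exclusions):
    """Subtract all exclusion counts from the number of retrieved records."""
    remaining = retrieved - sum(exclusions.values())
    if remaining < 0:
        raise ValueError("exclusions exceed the number of retrieved records")
    return remaining


# Counts as reported in the Search Strategy section; labels are illustrative.
exclusions = {
    "duplicates": 174,
    "ineligible publication type": 14,
    "BASC-3 BESS not administered": 214,
    "no correlation information": 29,
    "correlations not categorizable": 2,
}

articles = remaining_after_exclusions(448, exclusions)
total_studies = articles + 1  # the BESS manual is added as one further study
print(articles, total_studies)  # prints: 15 16
```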
Data Extraction
Summary of Studies Included in the Meta-Analysis
Note. ASEBA = Achenbach System of Empirically Based Assessment; ASRI = Adaptive Skills Risk Index; ASRS = Autism Spectrum Rating Scales; BASC-3 = Behavior Assessment System for Children, Third Edition; BAY-I = Beck Anxiety Inventory for Youth; BERI = Behavioral and Emotional Risk Index; BESS-2 = BASC-2 Behavioral and Emotional Screening System; BESS-3 = BASC-3 Behavioral and Emotional Screening System; BOSS = Behavioral Observation of Students in Schools; BRIEF = Behavior Rating Inventory of Executive Function; CBCL = Child Behavior Checklist; CBQ = Childhood Behavior Questionnaire; CDI-2 = Children’s Depression Inventory, Second Edition; CLS = Conditions for Learning Survey; ECBQ = Early Childhood Behavior Questionnaire; ERI = Externalizing Risk Index; GPA = grade point average; HRQOL = Health-Related Quality of Life; IRI = Internalizing Risk Index; KSEP = Kindergarten Student Entrance Profile; MAP = Measures of Academic Progress; NIH TCM = National Institutes of Health Toolbox Cognition Module; NR = not reported; PARI = Personal Adjustment Risk Index; PedsQL = Pediatric Quality of Life Inventory Sickle Cell Disease Module; PEFB = Preschool Executive Function Battery; PSC-17 = Pediatric Symptom Checklist-17; RCMAS-2 = Revised Children’s Manifest Anxiety Scale, Second Edition; SDQ-IS = Strengths and Difficulties Questionnaire-Impact Supplement; SRRI = Self-Regulation Risk Index; STAR = Renaissance STAR Early Literacy; YEPS = Youth Externalizing Problems Screener; YIEPS = Youth Internalizing and Externalizing Problems Screener; YIPS = Youth Internalizing Problems Screener.
ᵃ Hood (2018) included two samples. For participant demographics, we have reported data for both samples.
ᵇ Kamphaus and Reynolds (2015) included 12 samples in their study. For participant demographics, we have reported the range for the sample data.
Criterion Variables
We analyzed the coefficients using an aggregated approach that included analyses of all coefficients associated with (a) academics, (b) executive functioning, (c) externalizing problems, (d) internalizing problems, (e) prosocial functioning, and (f) global risk for behavior and emotional difficulties (as described in the next sections). We used this aggregate approach based in part on categories defined by Kilgus et al. (2018) and Sullivan et al. (2021). Additionally, based on typical practice for similar systematic reviews (e.g., Goerdt et al., 2025), we extracted and subsequently categorized and evaluated all relevant coefficients. This includes correlations based on matched informants and methods (e.g., correlations based on ratings from the teacher BESS and another teacher scale) and unmatched informants or methods (e.g., correlations between ratings from the teacher BESS and student ratings, or between ratings from the teacher BESS and number of office disciplinary referrals).
Academics
This category refers to test scores or grades indicative of academic proficiency and to student behaviors with relevance to academic proficiency. Examples from the articles included in our analyses include grade point average, math achievement, reading achievement, and academic engagement.
Executive Functioning
This category refers to indicators of planning, organization, and symptoms associated with attention-deficit/hyperactivity disorder (ADHD). Examples from the articles included in our analyses include ADHD index ratings, scores on tests of executive functioning, test items indicative of ADHD-related symptoms, and performance in an executive functioning delay task.
Externalizing Problems
This category refers to indicators of challenging behavior typically directed toward others and the environment. Examples from the articles included in our analyses include scores on rating scales of externalizing problems, teacher ratings of student disruptive behaviors, office discipline referrals, and school suspensions.
Internalizing Problems
This category refers to indicators of mood problems or emotional distress. Examples from the articles included in our analyses include scores on rating scales of internalizing problems, scores on scales indicating low self-esteem, and scores on scales of negative affect.
Prosocial Functioning
This category refers to indicators of a student’s capacity to exhibit prosocial skills, as well as indicators of overall positive well-being. Examples from the articles included in our analyses include scores on scales of social skills, teacher-reported prosocial behavior in the classroom, reported subjective well-being, and overall positive affect.
Global Risk
This category refers to indicators of a student’s general behavior and emotional risk. The most common example from the articles in our analyses is scores on rating scales that combine internalizing and externalizing problems. The global risk category is distinguished from the other categories in that it spans multiple domains.
Analysis
Validity generalization meta-analyses are based on an aggregation of correlation coefficients, which are employed as indicators of effect size. We analyzed all relevant correlation coefficients together (e.g., we treated Spearman’s rho and Pearson product-moment correlations as equivalent). Although different types of correlations reflect different data properties, we interpreted all correlations as representing the same underlying conceptual relationship. Because meta-analytic methods assume normality (Beretvas & Pastor, 2003), we transformed each correlation coefficient in our analyses to a z score using Fisher’s r-to-z transformation and used the converted estimates for the meta-analyses. Sampling variance was also computed in the transformation. For ease of interpretation, we transformed the pooled estimates back into the original correlation coefficient metric and reported them as such in the results section. We conducted random-effects analyses using restricted maximum likelihood estimation (REML) and inverse variance weights to estimate parameters in our models and performed the meta-analyses using the statistical environment R, version 4.5.0 (R Core Team, 2024) and the metafor R package, version 4.8-0 (Viechtbauer, 2010). We used a random-effects model instead of a fixed-effects model because the included studies were not assumed to share a single common population effect (Raudenbush, 2009). Instead, the effect size indicators were expected to be heterogeneous in nature given the various methods, contexts, and samples of the studies we analyzed.
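As a rough illustration of the pipeline described above (Fisher's r-to-z transformation, inverse-variance weighting, random-effects pooling, and back-transformation), the following Python sketch pools a set of correlations. It is not the analysis code used in this study, which relied on the metafor package in R. In particular, it estimates between-study variance with the closed-form DerSimonian-Laird estimator as a simpler stand-in for REML, so its estimates can differ somewhat from REML results. The function name, the returned keys, and the example inputs are hypothetical.

```python
import math


def pool_correlations(correlations, sample_sizes):
    """Random-effects pooling of correlations on Fisher's z scale.

    Between-study variance (tau^2) uses the DerSimonian-Laird estimator,
    a closed-form stand-in for the REML estimator named in the text.
    """
    k = len(correlations)
    if k != len(sample_sizes):
        raise ValueError("correlations and sample_sizes differ in length")
    if k < 2:
        raise ValueError("at least two coefficients are required")
    if any(abs(r) >= 1 for r in correlations):
        raise ValueError("correlations must lie strictly between -1 and 1")
    if any(n <= 3 for n in sample_sizes):
        raise ValueError("each sample size must exceed 3")

    # Fisher r-to-z transformation and its sampling variance, 1 / (n - 3).
    z = [math.atanh(r) for r in correlations]
    v = [1.0 / (n - 3) for n in sample_sizes]

    # Fixed-effect inverse-variance mean, needed for the Q statistic.
    w = [1.0 / vi for vi in v]
    sum_w = sum(w)
    z_fixed = sum(wi * zi for wi, zi in zip(w, z)) / sum_w
    q = sum(wi * (zi - z_fixed) ** 2 for wi, zi in zip(w, z))

    # DerSimonian-Laird estimate of tau^2, truncated at zero.
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c)

    # Random-effects weights, pooled mean, and its standard error.
    w_re = [1.0 / (vi + tau2) for vi in v]
    z_mean = sum(wi * zi for wi, zi in zip(w_re, z)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))

    # Back-transform the pooled estimate and 95% interval to the r metric.
    return {
        "r": math.tanh(z_mean),
        "ci_low": math.tanh(z_mean - 1.96 * se),
        "ci_high": math.tanh(z_mean + 1.96 * se),
        "tau2": tau2,
        "z_stat": z_mean / se,
    }


# Hypothetical inputs, not values extracted from the included studies.
result = pool_correlations([0.42, 0.55, 0.31], [87, 240, 1200])
print(result)
```

Because the weights depend on sample size through 1 / (n - 3), a study with a very large sample receives a much larger weight than a small one, although the random-effects weights are more even than fixed-effect weights whenever tau^2 is greater than zero.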
We conducted the analyses separately for each of the 12 BESS index scales (four each from the parent, teacher, and student report forms) and each of the aggregate categories previously described, provided that at least two coefficients were available per category. Although we consequently report results of multiple meta-analyses, each resulting estimate represents a planned, unique predictor-criterion pairing rather than repeated testing of a single hypothesis. Thus, concerns about inflated Type I error rates are not directly applicable to our validity generalization analyses. In every study we analyzed, multiple coefficients were reported from the same sample. Given the purpose of our study and our separate analysis of each BESS scale for each informant, we treated the coefficients as independent. We used Cohen’s (1988) criteria in interpreting the effect sizes of the BESS and criterion variables (i.e., small [r = .2], moderate [r = .5], and large [r = .8]). Notably, for the BESS Parent Adaptive Skills Risk Index, Teacher Adaptive Skills Risk Index, and Student Personal Adjustment Risk Index, unlike the other indexes, higher scores represent increased (i.e., more positive) behavioral skills while lower scores indicate increased levels of behavior problems.
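The organization of these analyses can be sketched as a grouping step: coefficients are collected by BESS form, index, and criterion category, and only cells with at least two coefficients are carried forward to a meta-analysis. The Python sketch below illustrates that rule under assumed record fields (`form`, `index`, `category`, `r`, `n`); the field names and the example records are hypothetical and are not drawn from the extracted data.

```python
from collections import defaultdict


def group_coefficients(records, min_k=2):
    """Group (r, n) pairs by form, index, and criterion category.

    Cells with fewer than `min_k` coefficients are dropped, mirroring the
    requirement of at least two coefficients per category.
    """
    cells = defaultdict(list)
    for rec in records:
        key = (rec["form"], rec["index"], rec["category"])
        cells[key].append((rec["r"], rec["n"]))
    return {key: pairs for key, pairs in cells.items() if len(pairs) >= min_k}


# Hypothetical records for illustration only.
records = [
    {"form": "teacher", "index": "BERI", "category": "academics", "r": 0.21, "n": 150},
    {"form": "teacher", "index": "BERI", "category": "academics", "r": 0.30, "n": 420},
    {"form": "parent", "index": "BERI", "category": "academics", "r": 0.12, "n": 95},
]

eligible = group_coefficients(records)
for key, pairs in eligible.items():
    print(key, len(pairs))
```

In this example the parent cell holds a single coefficient and is therefore excluded, whereas the teacher cell would proceed to pooling.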
Results
The validity generalization meta-analysis included 16 studies reporting correlations between BESS indexes and a variety of student-focused criterion variables. The studies represented a wide range of sample sizes, from 27 participants in the smallest study to 2,880 participants in the largest study. The studies also represented a wide range of participant grade levels. Across studies, participants were in preschool through 12th grade, including one study in which the average participant age was 14 years but which did include participants up to 23 years. The percentage of white participants in each study was also highly variable, from 0% (in two studies whose population of interest was youth with sickle cell disease) to 87.2% in one study. There was a wide range in gender, reported as the percentage of boys and girls in each study, but the majority of studies reported generally equal proportions in samples. Finally, the majority of studies did not report what proportion of their samples were receiving special education services or were known to have a mental health diagnosis. See Table 1 for a list of sample characteristics for all studies.
Associations Between BESS-3 Teacher Index Scores and Validity Criteria
Note. For the Teacher Adaptive Skills Risk Index, higher scores represent increased (i.e., more positive) behavioral skills while lower scores indicate increased levels of behavior problems.
*p < .05. **p < .01.
Associations Between BESS-3 Self-Report Index Scores and Validity Criteria
Note. For the Student Personal Adjustment Risk Index, higher scores represent increased (i.e., more positive) behavioral skills while lower scores indicate increased levels of behavior problems.
*p < .05. **p < .01.
Associations Between BESS-3 Parent Index Scores and Validity Criteria
Note. For the Parent Adaptive Skills Risk Index, higher scores represent increased (i.e., more positive) behavioral skills while lower scores indicate increased levels of behavior problems.
*p < .05. **p < .01.
Meta-Analysis Results
Teacher BESS
Results of the validity generalization meta-analyses for the teacher BESS are presented in Table 2. The coefficient of the intercept is considered to be the overall effect size of the relationship between the BESS and outcomes. The reported intercepts for most of the aggregated outcomes were significantly different from zero (p < .05 or p < .01), with the following exceptions: Teacher Externalizing Risk Index correlated with Academics, Teacher Internalizing Risk Index correlated with Academics, and Teacher Adaptive Skills Risk Index correlated with Prosocial Functioning and Global Risk. Generally speaking, the teacher BESS scores were mostly moderately correlated with the aggregated outcome measures when compared with Cohen’s (1988) criteria. Notably, the Teacher BERI score was strongly correlated with the Global Risk variable, which is consistent with the intent of the BERI as an indicator of overall behavioral and emotional risk.
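To make the significance flags concrete, the sketch below shows one way an intercept could be tested against zero and marked with the asterisks used in the table notes. It assumes a Wald-type z test of the pooled Fisher z estimate divided by its standard error; the text does not state which test was applied, so this is an assumption, and the function names are hypothetical.

```python
import math


def two_sided_p(estimate, standard_error):
    """Two-sided p value for a Wald-type z test of estimate / standard_error."""
    z = estimate / standard_error
    # For a standard normal variable, P(|Z| > |z|) equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2.0))


def significance_marker(p):
    """Return the asterisk notation used in the table notes."""
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""


# Hypothetical pooled estimate and standard error on the Fisher z scale.
p_value = two_sided_p(0.45, 0.20)
print(round(p_value, 4), significance_marker(p_value))
```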
Student BESS
Results of the validity generalization meta-analyses for the student BESS are presented in Table 3. The reported intercepts for most of the aggregated outcomes were significantly different from zero (p < .05 or p < .01), with the following exceptions: Student BERI correlated with Externalizing Problems, Student Internalizing Risk Index correlated with Academics and Externalizing Problems, Student Self-Regulation Risk Index correlated with Executive Functioning, and Student Personal Adjustment Risk Index correlated with Academics and Externalizing Problems. These results suggest that scores on the BESS self-report scales were not as effective at predicting externalizing problems when compared to the BESS teacher- and parent-report scales. On the other hand, all four of the student BESS index scores were significantly correlated with the Global Risk variable, suggesting the utility of these scales in predicting overall risk.
Parent BESS
Results of the validity generalization meta-analyses for the parent BESS are presented in Table 4. These results are more mixed than what was observed for the teacher and student BESS scores, with 8 out of 23 correlation coefficients failing to reach statistical significance. Notably, none of the parent BESS index scores were significantly correlated with scores on the aggregated Academic variable, and only the Parent Externalizing Risk Index was significantly correlated with the Executive Functioning variable. On a positive note, all four of the parent BESS index scores were significantly correlated with the Global Risk aggregate variable, suggesting that the parent form is most appropriately used as an indicator of overall risk.
Discussion
The purpose of this study was to use validity generalization meta-analyses to assess the extent to which BESS scores, indicative of level of emotional and behavioral risk as indicated by a broadband index (i.e., BERI) and narrowband indexes (i.e., Externalizing Risk Index, Internalizing Risk Index, Adaptive Skills Risk Index, Self-Regulation Risk Index, and Personal Adjustment Risk Index), predicted alternative measures and outcomes such as student academic performance, social and behavioral functioning, and ratings from other measures. Given the substantial contribution of psychometrically sound and efficient measures to the screening process, and the BESS’s status as the most widely used screening tool in schools (Benson et al., 2019), results have important implications for contextually appropriate use of the BESS, decisions made based on BESS scores, and providing evidence of convergent validity of BESS scores. As we describe below, our findings support the general validity of the BESS, but they must be understood within the limits of the available data. Specifically, the effect sizes we present (see Tables 2–4) are generally based on a small number of studies and samples. Consequently, we are unable to make nuanced statements about potential moderators, and some studies with very large sample sizes (see Table 1) have had an outsized influence on the generalized correlation coefficients. Thus, we present a broad discussion of our findings.
The 16 studies included in our analyses reported correlation coefficients between BESS scores and a wide range of criterion variables, including scores on other rating scales of emotional/behavior functioning in addition to other relevant outcomes such as office discipline referrals, suspensions, attendance, and grade point average. In general, the resulting coefficients were in the expected direction given the constructs measured by the BESS scales (e.g., the BESS Externalizing and Internalizing Risk Indexes were positively and significantly correlated with the aggregate Externalizing Problems and Internalizing Problems variables, while the Teacher and Parent Adaptive Skills Risk Indexes were negatively associated with problem behaviors). This general finding is expected, as screeners are more likely to be closely aligned with measures of similar constructs than they are with measures of distinct constructs (e.g., Allen et al., 2019; Goerdt et al., 2025). Perhaps the most notable finding is that across all three forms of the BESS (i.e., teacher, student, and parent), the BERI, representing a broadband index, consistently had among the highest correlations for different criterion areas. The results indicate that the BESS can be confidently used as a screening instrument for early detection and follow-up planning, particularly if decision-making is based on BERI scores (as opposed to the narrowband indexes that the BESS also provides) as the first step in the screening process. Additionally, across all three forms of the BESS (i.e., teacher, student, and parent), the lowest correlations were between the BESS scores and academic variables. This pattern suggests that the BESS should not be used as an academic screener; instead, measures focusing on constructs other than behavioral and emotional risk (e.g., grades on assignments and performance on curriculum-based assessments) are more appropriate when screening for academic problems or risk of academic delays. Similar findings have been reported in single-sample studies (Dowdy et al., 2016) and are confirmed in our meta-analyses and other analyses using a variety of screeners (Allen et al., 2019; Kilgus et al., 2018).
With regard to the narrowband index results, our findings suggest that practitioners might find these useful for specific purposes. In particular, scores on the BESS Internalizing Risk Indexes were significantly correlated with the Internalizing Problems variables across teacher, parent, and self-report forms. Similarly, scores on the BESS Externalizing Risk Indexes (which appear only in the teacher and parent forms) were significantly correlated with behaviors in the Externalizing Problems category. Thus, although the results of this study provide the strongest support for using the BERI as an overall risk indicator, results also support using the narrowband indexes when specifically screening for internalizing and externalizing problems. Since the BESS Internalizing and Externalizing Risk Indexes were also consistently correlated with the Global Risk variable, these scales might be useful in screening for global distress as well.
Looking at the big picture, correlation coefficients were fairly similar across the three different forms of the BESS, but fewer statistically significant effect sizes were found for the parent BESS form as compared to the teacher and student BESS forms. Because the parent BESS form provided more mixed evidence, as well as the fewest correlation coefficients available for analysis, there is room for further research on the implications of screening results based on parent informants. Consequently, at this time we recommend that schools prioritize administration of the teacher and student forms of the BESS. If having teachers complete the BESS for all students in their class(es) is not possible because of a lack of time or perhaps a lack of knowledge regarding a particular student, having students complete the BESS seems to be a viable alternative given the statistically significant correlation coefficients between their ratings and criterion variables (with a few notable exceptions). That said, although the format of the BESS and other multi-informant measures might suggest that results could generalize across informants, it would be inappropriate to evaluate the results of screeners in this way, especially if the results of narrowband indexes (as opposed to broadband indexes such as the BERI) will be used in decision-making. Rather, it is important to carefully consider which informant is most appropriate for the specific screening purposes of a study or school-based implementation of screening, as each informant will provide a unique and possibly discrepant perspective which may be more relevant and informative depending on the context, constructs, and variables under investigation (Graybill et al., 2025; Zakszeski et al., 2025).
Limitations
These results must be interpreted within the context of several limitations. As previously noted, the current analyses were based on a relatively small number of studies, which yielded a limited number of validity coefficients. The small number of studies also shaped our analyses in that we derived estimates of the validity of the BESS in predicting aggregate outcomes rather than more narrowly defined outcomes that could have provided more specific implications for practice. Furthermore, the sparse participant data reported in most studies restricted our ability to meaningfully evaluate moderator variables and thus to determine whether particular sample features influenced the results. This represents a constraint on generality, as in this study we did not consider individual participant characteristics such as sex/gender, race, or ethnicity.
These limitations reflect the characteristics of the available literature on the BESS. The potential for future meta-analyses to include a larger number of studies, analyze more narrowly defined criterion variables, and evaluate the influence of relevant moderators will ultimately depend on the time the field needs to accumulate a larger body of studies from which to draw evidence. Based on our review, it is also particularly important for future studies of behavioral and emotional risk screeners to include parent report data. Having these data will allow researchers to more meaningfully evaluate the utility of parent screeners, and it will provide information that is particularly helpful to professionals using screeners outside of school settings, where teacher screening data are much less likely to be available. It will also be important to continue to use measures with large bodies of accumulated literature, such as the BESS and SRSS (Kilgus et al., 2018), to examine general screening issues, such as the utility of broadband versus narrowband screeners for both the identification of students at risk and the subsequent implementation of interventions (Peterson & Villarreal, 2024).
Given the focus on validity coefficients in the current study, it will also be useful for future research to examine other important sources of validity evidence such as classification accuracy statistics (i.e., sensitivity, specificity, positive predictive value, and negative predictive value; AERA et al., 2014). This will allow for a more complete understanding of how the BESS (and similar screeners) can inform screening decisions. Lastly, we wish to acknowledge that the BASC-4 BESS is scheduled for publication in 2026, which may seem to limit the utility of these results based on the BASC-3 BESS. However, as described by Cronje et al. (2022), revised versions of tests are frequently compared to (and often correlated with) the previous version in order to understand similarities and differences from one edition to the next. Thus, knowledge of the validity of the BASC-3 BESS may help researchers and practitioners interpret correlations between BASC-3 BESS scores and BASC-4 BESS scores. Our findings will also inform any research based on the BASC-3 BESS that is yet to be conducted or published, which we expect to see for many years even after the BASC-4 BESS is available. Further, by aggregating predictive validity coefficients for the BASC-3 BESS, the current study provides a point of comparison for validity evidence that accumulates on the BASC-4 BESS. Of course, this is only one source of support for using the BESS, and once the BASC-4 BESS is published it will need to be comprehensively evaluated based on changes in norms, items, and constructs measured (Butcher, 2000).
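As an illustration of the classification accuracy statistics named above, the following minimal sketch shows how sensitivity, specificity, positive predictive value, and negative predictive value are computed from a 2 x 2 table that crosses screener decisions with criterion status. The class name, function name, and all counts are hypothetical and chosen only for illustration; they are not drawn from the BESS literature or from the present analyses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScreeningTable:
    """Counts from crossing screener decisions with criterion status."""
    true_pos: int   # flagged by the screener, at risk on the criterion
    false_pos: int  # flagged by the screener, not at risk on the criterion
    false_neg: int  # not flagged, at risk on the criterion
    true_neg: int   # not flagged, not at risk on the criterion


def classification_accuracy(table: ScreeningTable) -> dict[str, float]:
    """Return the four classification accuracy statistics as proportions."""
    return {
        # Proportion of truly at-risk students the screener flags.
        "sensitivity": table.true_pos / (table.true_pos + table.false_neg),
        # Proportion of not-at-risk students the screener leaves unflagged.
        "specificity": table.true_neg / (table.true_neg + table.false_pos),
        # Proportion of flagged students who are truly at risk.
        "ppv": table.true_pos / (table.true_pos + table.false_pos),
        # Proportion of unflagged students who are truly not at risk.
        "npv": table.true_neg / (table.true_neg + table.false_neg),
    }


# Hypothetical counts for 200 screened students (illustrative only).
example = ScreeningTable(true_pos=40, false_pos=30, false_neg=10, true_neg=120)
stats = classification_accuracy(example)

for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```

Unlike sensitivity and specificity, the two predictive values depend on the base rate of risk in the screened sample, which is one reason these statistics complement rather than replace validity coefficients.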
Footnotes
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
