Correlational Validity Evidence for the BASC-3 Behavioral and Emotional Screening System (BESS): A Meta-Analysis

Abstract

We document results of validity generalization meta-analyses of the teacher, student, and parent forms of the BASC-3 Behavioral and Emotional Screening System (BESS). Effect sizes (reported here as absolute values, based on 16 studies and 428 correlation coefficients) ranged from 0.040 to 0.350 for constructs related to academics, 0.399 to 0.690 for executive functioning, 0.350 to 0.627 for externalizing problems, 0.287 to 0.815 for internalizing problems, 0.344 to 0.660 for prosocial functioning, and 0.478 to 0.875 for global risk indicators. Extracted coefficients were of the expected direction and magnitude with theoretically aligned constructs, although scores on the BESS scales showed relatively weak relationships with academic variables and results from the parent form were less consistent. Results indicate the broad utility of the BESS in identifying students for further assessment as part of the screening process and support the use of the broadband BESS Behavior and Emotional Risk Index (BERI) particularly for this purpose.

Keywords

validity generalization, mental health screening, BASC, BESS

School-based, universal mental health screening refers to a systematic, routine assessment process in which all students are screened to identify those most at risk for significant mental health problems. The identification of students at significant risk informs follow-up assessment to gain more information about their risk status and also guides systematic prevention and intervention efforts. A key benefit of screening is that a sizeable proportion of students are screened, reducing the number of students that are overlooked. In addition, screening provides a baseline for future monitoring and assessment and contributes to data-driven approaches to intervention. Screening also leads to earlier intervention for emerging problems, which is less intensive and expensive than treatment received after significant symptoms of distress are exhibited (Dvorsky et al., 2014). Unsurprisingly, screening has been proposed as a key step in service delivery reform, with the potential to place increased emphasis on prevention, early intervention, and the promotion of social-emotional health (U.S. Preventive Services Task Force, 2009). It is also an important component of efforts to increase access to services and is generally perceived positively by parents, students, and school staff based on the limited research on acceptability (Palmer et al., 2025).

A critical part of the screening process is the selection of screening tools to identify students in need of services. A variety of screeners are regularly used in schools (see Benson et al., 2019). Some are broadband screeners that yield a single, overall score of general risk based on items representing multiple domains (e.g., internalizing problems, externalizing problems, school problems, and adaptive behavior). Others are narrowband screeners that yield a single score or separate scale scores based on items that all measure the same domain (e.g., screeners that only yield scores for internalizing problems). Some screeners yield both a total score and subscale scores, providing results for both broad and narrow domains. Notably, the distinction between narrowband and broadband screeners is important, as each type of screener has been associated with differing levels of reliability and validity and should be considered different approaches to screening (Allen et al., 2019). Given the purpose and potential outcomes of the screening process (e.g., further assessment, potential mental health disorder diagnosis, and intervention planning), it is important to ensure that screening tools are contextually appropriate, psychometrically sound, practical to administer, and well-validated and useful for follow-up services (Glover & Albers, 2007; Villarreal & Peterson, 2025). If screeners are inaccurate or if the interpretation of screening results is based on limited validity evidence, the potential benefits of screening are weakened. For example, students’ risk status may be misidentified, and students may miss opportunities for intervention.

Although few in number, meta-analyses of validity evidence for some specific screeners provide support for their use, as well as implications for further research in this area. For example, Kilgus et al. (2018) completed a validity generalization meta-analysis of the Student Risk Screening Scale (SRSS)—a narrowband screener focused on externalizing behavior problems—in which they identified validity correlation coefficients across 17 studies. This accumulated, large body of research allowed for quantitative synthesis of the SRSS evidence specifically, with results indicating that SRSS scores are reliable and valid indicators of student behavior. Furthermore, this study highlights methods that could be applied to other regularly used screeners that have also accumulated a large body of evidence. Sullivan et al. (2021) employed similar methods to evaluate validity evidence for the Social Skills Improvement System Screening/Progress Monitoring Scales (SSIS PMS), which is a brief teacher rating scale designed for universal screening. Across the 10 studies included in their meta-analysis, Sullivan et al. found that SSIS PMS items were generally correlated with academic and behavioral outcomes in the schools in theoretically expected ways. These results suggest that brief and even single-item screeners can provide meaningful correlations with relevant variables (e.g., grades, attendance, office discipline referrals, and measures of emotional/behavioral functioning) and therefore can be useful in identifying students in need of support.

The Behavior Assessment System for Children, Third Edition, Behavioral and Emotional Screening System (BESS; Kamphaus & Reynolds, 2015) is another brief, standardized screener that has been identified as the screening tool most widely used in schools in the United States (Benson et al., 2019). The BESS is intended to quickly (i.e., in about 5 min per student) assess behavioral and emotional risk via various indexes. The Behavioral and Emotional Risk Index (BERI) is a broadband index that is common across the teacher, parent, and student forms and represents an indicator of overall behavioral and emotional risk. In addition, the teacher and parent forms include the following narrowband scales: Externalizing Risk Index, Internalizing Risk Index, and Adaptive Skills Risk Index. The student form includes the following narrowband scales: Internalizing Risk Index, Self-Regulation Risk Index, and Personal Adjustment Risk Index. To complete the BESS, informants (teachers, parents, and/or students) rate the frequency with which students exhibit various behaviors using a four-point Likert-type scale (0 = never, 1 = sometimes, 2 = often, and 3 = almost always).

The BESS authors report adequate reliability. The BESS teacher and BESS parent index reliability coefficients were mostly in the 0.80s to 0.90s, respectively; the BESS student index reliability coefficients ranged from the middle 0.70s to upper 0.80s, with the BERI score in the 0.90s (Kamphaus & Reynolds, 2015). The authors provide evidence of validity of the BESS using a few measures of behavior and emotional problems, as well as scales of adaptive functioning (e.g., Achenbach System of Empirically Based Assessment, Conners 3rd Edition, Autism Spectrum Rating Scale, Revised Children’s Manifest Anxiety Scale, and Children’s Depression Inventory). In general, the authors report correlations for corresponding composite scales in the 0.40s to 0.70s range (Kamphaus & Reynolds, 2015). While the BESS authors report promising validity evidence, sample sizes for these correlations are relatively small, with an average sample size of 87 (SD = 40). Additionally, all validity correlations reported by the authors are based on other rating scales, and the authors do not provide validity evidence for other domains (e.g., academic outcomes) for which the authors did not intend the BESS to be used but which, as indicated in our analyses, researchers nonetheless have used.

Several studies have examined the BESS’s psychometric defensibility. These studies have generally supported the developer-proposed structure of the BESS (i.e., the structure accounting for both global risk and underlying sources of risk as indicated by its indexes) (Basting et al., 2022; Dever & Gaier, 2021; Eklund et al., 2022), but one study did suggest that a model that groups attention problems into one factor, in addition to the internalizing, externalizing, and adaptive skills factors, provides a better fit (Dowdy et al., 2019). Studies have also supported measurement equivalence by race/ethnicity (Basting et al., 2022; Dever & Gaier, 2021), gender (Basting et al., 2022), and age (Eklund et al., 2022). Additionally, the BESS has been used as an outcome indicator in studies examining a variety of issues, including general screening for mental health concerns in youth (DeBoer & Long, 2024), the effects of behavior support coaching to address disruptive behavior (Reddy et al., 2022), the effect of cognitive behavioral treatment programs (Sanders et al., 2019), and changes in behavioral health during the pandemic (Hanno et al., 2022), to name a few. This research has established important findings about the psychometric properties of the BESS and has demonstrated its broad use. However, no comprehensive synthesis of the validity of the BESS has been conducted.

Given its widespread use and the importance of screening results, it seems appropriate to evaluate the accumulated research regarding the BESS validity evidence and how well a brief screener that includes broadband and narrowband indicators of emotional and behavioral risk predicts functioning in distinct domains. Such an analysis would contribute to ongoing research regarding general screening practices and add to the limited quantitative synthesis studies for the BESS specifically. As detailed by Sitarenios (2022), it is especially critical to systematically evaluate the psychometric properties of brief or short-form scales to ensure that these measures can provide reliable and valid data despite having far fewer items than comprehensive measures of emotional and behavioral functioning.

As previously noted, one way of completing a useful review of related studies is by conducting a meta-analysis of the available correlational validity evidence. Referred to as a validity generalization, the purpose is to quantify validity coefficients reported for scores across multiple samples and to calculate an average validity coefficient while correcting for sampling error (Shultz & Whitney, 2005). Validity generalization studies provide stronger and more realistic estimates of the average observed validity coefficients across studies than is possible in any single study. Thus, the purpose of the current study was to conduct validity generalization meta-analyses of available evidence for the BESS to determine the estimated predictive validity between the BESS and other measures by using reported correlation coefficients from previous research studies. Resulting validity estimates are considered to represent the extent to which BESS scores predict alternative measures and outcomes such as student academic performance and social/behavioral functioning. It is important to acknowledge that there are several additional sources of evidence used to establish validity for using test scores to make decisions, including the appropriateness of item content, internal structure of the items, and classification accuracy indices such as sensitivity and specificity (American Educational Research Association [AERA] et al., 2014). Thus, the current analysis focuses on just one aspect of the larger validity concept.

Method

Search Strategy

We searched multiple electronic databases (i.e., PsycINFO, Psychology Database, PsycArticles, ERIC, SocINDEX, Psychology and Behavioral Sciences Collection, PubMed, MEDLINE, and ScienceDirect) covering the period from 2015 (the year of the publication of the BASC-3 BESS) through May 2025. Search terms included “Behavioral and Emotional Screening System” and “Behavioral Emotional Screening System,” and the search location was set to “anywhere” (i.e., the search was conducted in the title, abstract, and full text). An article was included in the study if it met the following criteria: (a) it was published in English in a peer-reviewed journal, was an unpublished dissertation, or was a published convention presentation; (b) BASC-3 BESS form(s) were administered; and (c) it reported validity evidence (i.e., correlations) representative of the relationship between any BESS index score(s) and related variables. We retrieved 448 articles in the initial search. Of these, 174 were excluded because they were duplicates (i.e., identified in multiple databases), 14 were excluded because they were not journal articles or unpublished dissertations or unpublished convention presentations, 214 were excluded because they did not administer the BASC-3 BESS, 29 were excluded because they indicated that the BASC-3 BESS was administered but no correlation information was provided, and 2 were excluded because they did not provide correlation information that could be categorized into relevant criterion variable categories (described in the next section). Thus, 15 articles (including 13 journal articles and two doctoral dissertations) met all inclusion criteria and were subject to our analysis. Additionally, we included the validity evidence provided by the BESS authors in the BESS manual, resulting in a total of 16 studies included in our analyses. See the references list for all included studies.
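The screening counts reported above can be checked with a short tally. This is only an arithmetic sketch: the variable names and the shortened exclusion labels are ours, and the numbers are those stated in the search description.

```python
# Counts taken from the search description; labels are shortened paraphrases.
retrieved = 448
exclusions = {
    "duplicates across databases": 174,
    "ineligible publication type": 14,
    "BASC-3 BESS not administered": 214,
    "no correlation information": 29,
    "correlations not categorizable": 2,
}

# Articles remaining after all exclusions are applied.
included_articles = retrieved - sum(exclusions.values())

# The BESS manual is added as one further source of validity evidence.
total_studies = included_articles + 1

print(included_articles, total_studies)
```

The tally reproduces the 15 included articles and the 16 studies analyzed.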

Data Extraction

Two researchers independently coded the articles that met inclusion criteria in regard to (1) BESS informant and scale(s) used, (2) student sample size, (3) student participant demographics, (4) criterion with which BESS scores were correlated, and (5) correlation coefficients between criterion and BESS scores. Data were coded and analyzed separately for each relevant correlation coefficient within each article. At the data extraction stage, we reviewed all included studies to examine whether multiple articles evaluated the same data sets. We determined that none of the articles meeting inclusion criteria evaluated the same data sets; therefore, we did not exclude any data. See Table 1 for relevant information from each article. Initial percent agreement for coding was 90.8%. For all disagreements, the researchers conferred to determine which codes were appropriate, resulting in the correction of coded data.

Table 1.

Summary of Studies Included in the Meta-Analysis

Study	Sample size	Grade level(s)	% Male	% White	% Special education/diagnosed	BESS scale(s)	Criterion variable(s)
Alperin et al. (2023)	349	K through 5th	77	NR	NR	Teacher BERI	BOSS active engagement, passive engagement, total engagement, inappropriate physical behavior, inappropriate verbal behavior, noncompliance, disruptive academic behavior
DeBoer and Long (2024)	409	6th through 12th	45	67	NR	Teacher ERI	SDQ-IS; YEPS; YIEPS; YIPS
Dever and Gaier (2021)	210	3rd through 5th	45	9	NR	Student BERI, Student IRI, Student SRRI, Student PARI	Math achievement; reading achievement; office discipline referrals; school suspensions
Distefano et al. (2020)	1,007	Pre-school	50	43	NR	Teacher ERI, Teacher IRI, Teacher ASRI, Parent ERI, Parent IRI, Parent ASRI	BESS-3 parent ASRI, parent ERI, parent IRI, teacher ASRI, teacher ERI, teacher IRI
Dowdy et al. (2019)	459	Pre-school	49	51	7	Parent ERI, Parent IRI, Parent ASRI	BESS-2 attention problem items, parent ASRI, parent ERI, parent IRI; BESS-3 attention problem items, parent ASRI, parent ERI, parent IRI
Edmunds et al. (2023)	90	Pre-school	71	64	47.7	Parent ERI, Parent IRI	ECBQ frustration; CBQ frustration; PEFB inhibition (executive function); delay task (executive function)
Eklund et al. (2022)	1,472	K through 5th	52	69	NR	Teacher BERI, Teacher ERI, Teacher IRI, Teacher ASRI	BESS-3 teacher ASRI, teacher BERI, teacher ERI, teacher IRI
Fletcher-Janzen and Harrington (2020)	808	9th	48	28	NR	Student BERI, Student IRI, Student SRRI, Student PARI	BASC-3 student emotional symptoms index, student inattention/hyperactivity, student internalizing problems, student personal adjustment, student school problems; BESS-3 student BERI, student IRI, student PARI, student SRRI
Hanno et al. (2022)	2,880	Pre-school through 2nd	51	73	NR	Parent ERI, Parent IRI, Parent ASRI	BESS-3 parent ASRI, parent ERI, parent IRI; BRIEF dysregulated behaviors
Hood (2018) ^a	27, 34	K through 12th+	33, 50	0, 0	NR	Teacher ERI	NIH TCM non-executive abilities, executive abilities; Conners 3 self-report attention; BRIEF-2 self-report executive functioning; HRQOL self-report
Hood et al. (2019)	51	K through 12	47	0	NR	Parent BERI	BRIEF-2; Conners 3; PedsQL health-related quality of life
Ijaz et al. (2024)	34	9th through 12th	61	0	NR	Student BERI, Student IRI, Student SRRI, Student PARI	BAI-Y; BESS-3 student IRI, student PARI, student SRRI, student BERI
Kamphaus and Reynolds (2015) ^b	39–173	Pre-school through 12th	35–62	42–87	NR	Teacher BERI, Teacher ERI, Teacher IRI, Teacher ASRI, Parent BERI, Parent ERI, Parent IRI, Parent ASRI, Student BERI, Student IRI, Student SRRI, Student PARI	ASEBA CBCL student internalizing problems, student total problems, teacher externalizing problems, teacher internalizing problems, teacher total problems, parent externalizing problems, parent internalizing problems, parent total problems; Conners 3 parent ADHD index, parent global index, student ADHD index, teacher ADHD index, teacher global index; ASRS parent total score, teacher total score; CDI-2 student report; RCMAS-2 student report
Moore et al. (2021)	535	Pre-school	44	12	NR	Parent BERI, Parent ERI, Parent IRI, Teacher BERI, Teacher ERI, Teacher IRI	BESS-3 parent BERI, parent ERI, parent IRI, teacher BERI, teacher ERI, teacher IRI; KSEP academic readiness, KSEP social-emotional readiness; PSC-17 parent externalizing, parent internalizing, parent total, teacher externalizing, teacher internalizing, teacher total; STAR
Naser and Dever (2019)	73	4th	59	40	NR	Teacher BERI, Teacher ERI, Teacher IRI, Teacher ASRI, Student IRI, Student SRRI, Student PARI, Student BERI	MAP math, reading; CLS school climate safety, school climate social emotional learning, school climate academic rigor, school climate student support; BESS-3 teacher ERI, teacher IRI, student IRI, student PARI, teacher BERI
Sutton (2023)	318	5th	47	72	12	Student BERI, Student IRI, Student SRRI, Student PARI	School absences; BESS-3 student IRI, student PARI, student SRRI, student BERI; reading achievement; math achievement; GPA

Note. ASEBA = Achenbach System of Empirically Based Assessment; ASRI = Adaptive Skills Risk Index; ASRS = Autism Spectrum Rating Scales; BAI-Y = Beck Anxiety Inventory for Youth; BASC-3 = Behavior Assessment System for Children, Third Edition; BERI = Behavioral and Emotional Risk Index; BESS-2 = BASC-2 Behavioral and Emotional Screening System; BESS-3 = BASC-3 Behavioral and Emotional Screening System; BOSS = Behavioral Observation of Students in Schools; BRIEF = Behavior Rating Inventory of Executive Function; CBCL = Child Behavior Checklist; CBQ = Childhood Behavior Questionnaire; CDI-2 = Children’s Depression Inventory, Second Edition; CLS = Conditions for Learning Survey; ECBQ = Early Childhood Behavior Questionnaire; ERI = Externalizing Risk Index; GPA = grade point average; HRQOL = Health-Related Quality of Life; IRI = Internalizing Risk Index; KSEP = Kindergarten Student Entrance Profile; MAP = Measures of Academic Progress; NIH TCM = National Institutes of Health Toolbox Cognition Module; NR = not reported; PARI = Personal Adjustment Risk Index; PedsQL = Pediatric Quality of Life Inventory Sickle Cell Disease Module; PEFB = Preschool Executive Function Battery; PSC-17 = Pediatric Symptom Checklist-17; RCMAS-2 = Revised Children’s Manifest Anxiety Scale, Second Edition; SDQ-IS = Strengths and Difficulties Questionnaire-Impact Supplement; SRRI = Self-Regulation Risk Index; STAR = Renaissance STAR Early Literacy; YEPS = Youth Externalizing Problems Screener; YIEPS = Youth Internalizing and Externalizing Problems Screener; YIPS = Youth Internalizing Problems Screener.

^aHood (2018) included two samples. For participant demographics, we have reported data for both samples.

^bKamphaus and Reynolds (2015) included 12 samples in their study. For participant demographics, we have reported the range for the sample data.

Criterion Variables

We analyzed the coefficients using an aggregated approach that included analyses of all coefficients associated with (a) academics, (b) executive functioning, (c) externalizing problems, (d) internalizing problems, (e) prosocial functioning, and (f) global risk for behavior and emotional difficulties (as described in the next sections). We used this aggregate approach based in part on categories defined by Kilgus et al. (2018) and Sullivan et al. (2021). Additionally, based on typical practice for similar systematic reviews (e.g., Goerdt et al., 2025), we extracted and subsequently categorized and evaluated all relevant coefficients. This includes correlations based on matched informants and methods (e.g., correlations based on ratings from the teacher BESS and another teacher scale) and un-matched informants or methods (e.g., correlations between ratings from the teacher BESS and student ratings, or between ratings from the teacher BESS and number of office disciplinary referrals).

Academics

This category refers to test scores or grades indicative of academic proficiency and to student behaviors with relevance to academic proficiency. Examples from the articles included in our analyses include grade point average, math achievement, reading achievement, and academic engagement.

Executive Functioning

This category refers to indicators of planning, organization, and symptoms associated with attention-deficit/hyperactivity disorder (ADHD). Examples from the articles included in our analyses include ADHD index ratings, scores on tests of executive functioning, test items indicative of ADHD-related symptoms, and performance in an executive functioning delay task.

Externalizing Problems

This category refers to indicators of challenging behavior typically directed toward others and the environment. Examples from the articles included in our analyses include scores on rating scales of externalizing problems, teacher ratings of student disruptive behaviors, office discipline referrals, and school suspensions.

Internalizing Problems

This category refers to indicators of mood problems or emotional distress. Examples from the articles included in our analyses include scores on rating scales of internalizing problems, scores on scales indicating low self-esteem, and scores on scales of negative affect.

Prosocial Functioning

This category refers to indicators of a student’s capacity to exhibit prosocial skills, as well as indicators of overall positive well-being. Examples from the articles included in our analyses include scores on scales of social skills, teacher-reported prosocial behavior in the classroom, reported subjective well-being, and overall positive affect.

Global Risk

This category refers to indicators of a student’s general behavior and emotional risk. The most common example from the articles in our analyses is scores on rating scales that combine internalizing and externalizing problems. The global risk category is distinguished from other categories in that it encompasses multiple categories.

Analysis

Validity generalization meta-analyses are based on an aggregation of correlation coefficients, which are employed as indicators of effect size. We analyzed all relevant correlation coefficients together (e.g., we treated Spearman’s rho and Pearson product-moment correlations as equivalent). Although different types of correlations reflect different data properties, we interpreted all correlations as representing the same underlying conceptual relationship. Because meta-analytic methods assume normality (Beretvas & Pastor, 2003), we transformed each correlation coefficient in our analyses to z scores using Fisher’s r-to-z transformation and used the converted estimates for the meta-analyses. Sampling variance was also computed in the transformation. For better interpretation, we transformed the correlation coefficients back into the original correlation coefficient metric and reported them as such in the results section. We conducted random-effects analyses using restricted maximum likelihood estimation (REML) and inverse variance weights to estimate parameters in our models and performed the meta-analyses using the statistical environment R, version 4.5.0 (R Core Team, 2024) and the metafor R package, version 4.8-0 (Viechtbauer, 2010). We used a random-effects model instead of a fixed-effects model because the included studies were not assumed to share a single common population effect (Raudenbush, 2009). Instead, the effect size indicators were expected to be heterogeneous in nature given the various methods, context, and samples of the studies we are analyzing.
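The transformation and pooling steps described above can be illustrated with a minimal sketch. It is not the authors' analysis: the study used REML estimation in metafor, whereas this sketch swaps in the simpler DerSimonian-Laird moment estimator for the between-study variance so that it stays short and dependency-free. The function names and the example correlations and sample sizes are hypothetical. The sampling variance 1/(n − 3) is the standard large-sample variance of Fisher's z.

```python
import math

def fisher_z(r):
    """Fisher r-to-z transformation (inverse hyperbolic tangent)."""
    return math.atanh(r)

def z_to_r(z):
    """Back-transform a Fisher z value to the correlation metric."""
    return math.tanh(z)

def pool_random_effects(rs, ns):
    """Pool correlations with inverse-variance weights on the z scale.

    Uses the DerSimonian-Laird estimator of tau^2 (an assumption made
    for brevity; the article itself reports REML estimates).
    """
    zs = [fisher_z(r) for r in rs]
    vs = [1.0 / (n - 3) for n in ns]          # sampling variance of each z
    w = [1.0 / v for v in vs]                 # fixed-effect weights
    sw = sum(w)
    fixed = sum(wi * zi for wi, zi in zip(w, zs)) / sw
    q = sum(wi * (zi - fixed) ** 2 for wi, zi in zip(w, zs))  # Cochran's Q
    c = sw - sum(wi ** 2 for wi in w) / sw
    k = len(zs)
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    w_re = [1.0 / (v + tau2) for v in vs]     # random-effects weights
    mean_z = sum(wi * zi for wi, zi in zip(w_re, zs)) / sum(w_re)
    se_z = math.sqrt(1.0 / sum(w_re))
    return {"r": z_to_r(mean_z), "z": mean_z, "se_z": se_z, "tau2": tau2}

# Hypothetical coefficients and sample sizes, for illustration only.
result = pool_random_effects([0.45, 0.62, 0.30], [120, 349, 73])
print(result)
```

The pooled estimate and its standard error are computed on the z scale, and only the point estimate is converted back to r for reporting, mirroring the order of operations described in the text.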

We conducted the analyses separately for each of the 12 BESS index scales (four each from the parent, teacher, and student report forms) and each of the aggregate categories previously described, dependent on having at least two coefficients per category. Although we consequently report results of multiple meta-analyses, each resulting estimate represents a planned, unique predictor-criterion pairing rather than repeated testing of a single hypothesis. Thus, concerns about inflated Type I error rates are not directly applicable to our validity generalization analyses. In every study we analyzed, multiple coefficients were reported from the same sample. Given the purpose of our study and our separate analysis of each BESS scale for each informant, we treated the coefficients as independent. We used Cohen’s criteria (1988) in interpreting the effect sizes of the BESS and criterion variables (i.e., small [r = .2], moderate [r = .5], and large [r = .8]). Notably, for the BESS Parent Adaptive Skills Risk Index, Teacher Adaptive Skills Risk Index, and Student Personal Adjustment Risk Index, unlike the other indexes, higher scores represent increased (i.e., more positive) behavioral skills while lower scores indicate increased levels of behavior problems.
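The pairing rule described above (one analysis per BESS index and criterion category, run only when at least two coefficients are available) can be sketched as a simple grouping step. The record layout, function name, and example values below are hypothetical and serve only to illustrate the rule.

```python
from collections import defaultdict

MIN_COEFFICIENTS = 2  # minimum number of coefficients stated in the text

def group_for_analysis(records):
    """Group coefficients by (BESS index, criterion category) and keep
    only the pairings that have enough coefficients to be analyzed."""
    groups = defaultdict(list)
    for rec in records:
        groups[(rec["scale"], rec["category"])].append(rec)
    return {key: recs for key, recs in groups.items()
            if len(recs) >= MIN_COEFFICIENTS}

# Made-up records, for illustration only.
records = [
    {"scale": "Teacher BERI", "category": "Academics", "r": -0.30, "n": 73},
    {"scale": "Teacher BERI", "category": "Academics", "r": -0.25, "n": 349},
    {"scale": "Parent ASRI", "category": "Academics", "r": 0.10, "n": 535},
]

eligible = group_for_analysis(records)
print(sorted(eligible))
```

In this example only the Teacher BERI and Academics pairing is retained, because the other pairing has a single coefficient.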

Results

The validity generalization meta-analysis included 16 studies reporting correlations between BESS indexes and a variety of student-focused criterion variables. The studies represented a wide range of sample sizes, from 27 participants in the smallest study to 2,880 participants in the largest study. The studies also represented a wide range of participant grade levels. Across studies, participants were in preschool through 12th grade, including one study in which the average participant age was 14 years but which did include participants up to 23 years. The percentage of white participants in each study was also highly variable, from 0% (in two studies whose population of interest was youth with sickle cell disease) to one study with 87.2% of participants identified as white. There was a wide range in gender, reported as the percentage of boys and girls in each study, but the majority of studies reported generally equal proportions in samples. Finally, the majority of studies did not report what proportion of their samples were receiving special education services or were known to have a mental health diagnosis. See Table 1 for a list of sample characteristics for all studies.

The 16 studies yielded 428 correlation coefficients. The majority of the correlations were based on associations between the BESS indexes and other scale-based ratings as opposed to measures such as classroom observations and grades, both of which represent different assessment methods. This finding—that most correlations are between rating scales—has been found in similar reviews (e.g., Allen et al., 2019; Goerdt et al., 2025; Kilgus et al., 2018). Our individual meta-analyses (i.e., unique analyses for each BESS index for each criterion category) represented a wide range of coefficients and samples, ranging from 2 coefficients to 17 coefficients and 1 sample to 6 samples. The specific number of correlation coefficients and samples used in each analysis is reported in Table 2 (teacher), Table 3 (student), and Table 4 (parent). Notably, although the BESS is the most widely used screening instrument in schools and has been used in studies covering a wide variety of topics, we identified a relatively small number of coefficients for each BESS index for each criterion category that we used, resulting in fewer than the minimum recommended number of studies and coefficients needed to conduct moderator analyses (Deeks et al., 2022). Thus, our results are based on the primary findings from our general validity generalization meta-analyses.

Table 2.

Associations Between BESS-3 Teacher Index Scores and Validity Criteria

Scale	Parameter	Academics	Executive functioning	Externalizing problems	Internalizing problems	Prosocial functioning	Global risk
Teacher BERI	Intercept	−.312**	—	.627**	.448*	−.660*	.618**
	(SE)	.062	—	.217	.146	.222	.142
	# of coefficients	8	—	12	6	4	8
	# of samples	3	—	4	3	3	3
	Mean sample size (SD)	292 (197)	—	438 (371)	621 (449)	531 (666)	256 (232)
Teacher externalizing risk index	Intercept	−.310	—	.575**	.287**	−.530**	.688**
	(SE)	.175	—	.119	.082	.118	.160
	# of coefficients	5	—	9	8	6	8
	# of samples	2	—	4	4	4	3
	Mean sample size (SD)	258 (253)	—	335 (335)	602 (727)	810 (910)	488 (453)
Teacher internalizing risk index	Intercept	−.139	—	.424**	.418*	−.453**	.569**
	(SE)	.079	—	.085	.145	.097	.133
	# of coefficients	3	—	10	7	6	8
	# of samples	2	—	4	4	4	3
	Mean sample size (SD)	535 (0)	—	687 (754)	375 (336)	810 (910)	360 (222)
Teacher adaptive skills risk index	Intercept	.350*	—	−.481**	−.339*	.395	−.714
	(SE)	.069	—	.073	.087	.192	.322
	# of coefficients	3	—	6	4	3	4
	# of samples	1	—	3	3	2	2
	Mean sample size (SD)	73 (0)	—	623 (614)	900 (568)	375 (548)	441 (687)

Note. For the Teacher Adaptive Skills Risk Index, higher scores represent increased (i.e., more positive) behavioral skills while lower scores indicate increased levels of behavior problems.

*p < .05. **p < .01.

Table 3.

Associations Between BESS-3 Self-Report Index Scores and Validity Criteria

Scale	Parameter	Academics	Executive functioning	Externalizing problems	Internalizing problems	Prosocial functioning	Global risk
Student BERI	Intercept	−.343*	.591**	.566	.815**	—	.659**
	(SE)	.126	.098	.279	.113	—	.176
	# of coefficients	13	17	6	13	—	9
	# of samples	4	5	3	5	—	4
	Mean sample size (SD)	234 (209)	143 (260)	218 (111)	353 (339)	—	127 (161)
Student internalizing risk index	Intercept	−.247	.566**	.350	.618**	—	.875**
	(SE)	.112	.059	.190	.089	—	.104
	# of coefficients	9	5	6	9	—	5
	# of samples	3	4	3	5	—	4
	Mean sample size (SD)	324 (189)	413 (376)	162 (74)	254 (326)	—	406 (384)
Student self-regulation risk index	Intercept	−.283*	.655	.395*	.477**	—	.638**
	(SE)	.112	.223	.142	.057	—	.100
	# of coefficients	9	2	7	12	—	5
	# of samples	3	2	3	4	—	4
	Mean sample size (SD)	324 (189)	452 (503)	149 (76)	348 (354)	—	406 (384)
Student personal adjustment risk index	Intercept	.245	−.399**	−.409	−.589**	—	−.787**
	(SE)	.124	.053	.424	.047	—	.083
	# of coefficients	9	5	5	8	—	5
	# of samples	3	4	2	4	—	4
	Mean sample size (SD)	324 (189)	413 (376)	210 (67)	276 (341)	—	406 (384)

Note. For the Student Personal Adjustment Risk Index, higher scores represent increased (i.e., more positive) behavioral skills while lower scores indicate increased levels of behavior problems.

*p < .05. **p < .01.

Table 4.

Associations Between BESS-3 Parent Index Scores and Validity Criteria

Scale	Parameter	Academics	Executive functioning	Externalizing problems	Internalizing problems	Prosocial functioning	Global risk
Parent BERI	Intercept	−.155	.690	.581*	.442*	−.405	.559**
	(SE)	.031	.275	.165	.126	.242	.149
	# of coefficients	2	3	6	6	3	7
	# of samples	1	2	2	2	2	3
	Mean sample size (SD)	535 (0)	92 (70)	390 (226)	390 (226)	222 (271)	290 (233)
Parent externalizing risk index	Intercept	−.065	.548*	.520*	.560**	−.426**	.585**
	(SE)	.031	.188	.186	.152	.084	.109
	# of coefficients	2	6	6	14	8	7
	# of samples	1	4	3	6	5	2
	Mean sample size (SD)	535 (0)	692 (1,085)	469 (340)	601 (727)	810 (910)	360 (222)
Parent internalizing risk index	Intercept	−.040	.437	.561**	.357*	−.344*	.478**
	(SE)	.031	.208	.145	.111	.101	.119
	# of coefficients	2	6	12	8	8	7
	# of samples	1	4	6	4	5	2
	Mean sample size (SD)	535 (0)	692 (1,085)	687 (754)	374 (336)	810 (910)	360 (222)
Parent adaptive skills risk index	Intercept	—	−.583	−.573**	−.479**	.484	−.531*
	(SE)	—	.224	.115	.088	.287	.095
	# of coefficients	—	4	7	7	3	3
	# of samples	—	3	4	4	2	1
	Mean sample size (SD)	—	992 (1,265)	859 (966)	859 (966)	379 (544)	126 (63)

Note. For the Parent Adaptive Skills Risk Index, higher scores represent increased (i.e., more positive) behavioral skills while lower scores indicate increased levels of behavior problems.

*p < .05. **p < .01.

Meta-Analysis Results

Teacher BESS

Results of the validity generalization meta-analyses for the teacher BESS are presented in Table 2. The coefficient of the intercept is considered to be the overall effect size of the relationship between the BESS and outcomes. The reported intercepts for most of the aggregated outcomes were significantly different from zero (p < .05 or p < .01), with the following exceptions: Teacher Externalizing Risk Index correlated with Academics, Teacher Internalizing Risk Index correlated with Academics, and Teacher Adaptive Skills Risk Index correlated with Prosocial Functioning and Global Risk. In general, the teacher BESS scores were moderately correlated with the aggregated outcome measures when compared with Cohen’s (1988) criteria. Notably, the Teacher BERI score was strongly correlated with the Global Risk variable, which is consistent with the intent of the BERI as an indicator of overall behavioral and emotional risk.
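As a rough illustration of the comparison with Cohen's (1988) benchmarks, the following Python sketch labels the intercepts from the Teacher Internalizing Risk Index row of Table 2. The cutoffs (.10, .30, .50) are Cohen's conventional values for correlations. The function name, the "negligible" label for values below .10, and the assumption that Table 2 uses the same column order as Tables 3 and 4 are illustrative choices and are not part of the original analyses.

```python
# Cohen's (1988) conventional benchmarks for correlation coefficients:
# .10 = small, .30 = medium, .50 = large.
def cohen_label(r: float) -> str:
    """Label the absolute value of r using Cohen's benchmarks."""
    magnitude = abs(r)
    if magnitude >= 0.50:
        return "large"
    if magnitude >= 0.30:
        return "medium"
    if magnitude >= 0.10:
        return "small"
    return "negligible"  # illustrative label; Cohen does not name this range


# Intercepts from the Teacher Internalizing Risk Index row of Table 2.
# Executive functioning is omitted because no estimate was reported.
# Column order is assumed to match Tables 3 and 4.
teacher_internalizing = {
    "Academics": -0.139,
    "Externalizing problems": 0.424,
    "Internalizing problems": 0.418,
    "Prosocial functioning": -0.453,
    "Global risk": 0.569,
}

labels = {name: cohen_label(r) for name, r in teacher_internalizing.items()}
for name, label in labels.items():
    print(f"{name}: {label}")
```

Under these cutoffs, most of the intercepts in this row fall in the medium range, which is in line with the description of the teacher form as mostly moderately correlated with the aggregated outcomes.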

Student BESS

Results of the validity generalization meta-analyses for the student BESS are presented in Table 3. The reported intercepts for most of the aggregated outcomes were significantly different from zero (p < .05 or p < .01), with the following exceptions: Student BERI correlated with Externalizing Problems, Student Internalizing Risk Index correlated with Academics and Externalizing Problems, Student Self-Regulation Risk Index correlated with Executive Functioning, and Student Personal Adjustment Risk Index correlated with Academics and Externalizing Problems. These results suggest that scores on the BESS self-report scales were not as effective at predicting externalizing problems when compared to the BESS teacher- and parent-report scales. On the other hand, all four of the student BESS index scores were significantly correlated with the Global Risk variable, suggesting the utility of these scales in predicting overall risk.

Parent BESS

Results of the validity generalization meta-analyses for the parent BESS are presented in Table 4. These results are more mixed than what was observed for the teacher and student BESS scores, with 8 out of 23 correlation coefficients failing to reach statistical significance. Notably, none of the parent BESS index scores were significantly correlated with scores on the aggregated Academic variable, and only the Parent Externalizing Risk Index was significantly correlated with the Executive Functioning variable. On a positive note, all four of the parent BESS index scores were significantly correlated with the Global Risk aggregate variable, suggesting that the parent form is most appropriately used as an indicator of overall risk.
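The count of nonsignificant coefficients can be checked directly against Table 4. The short Python sketch below transcribes the significance markers from the table (an asterisk is treated as significant) and tallies them; the variable names and the use of `None` for the cell with no reported estimate are illustrative choices.

```python
# Significance flags transcribed from Table 4.
# True  = intercept flagged with * or ** (p < .05 or p < .01)
# False = intercept reported without a flag
# None  = no estimate reported for that cell
# Column order: Academics, Executive functioning, Externalizing problems,
# Internalizing problems, Prosocial functioning, Global risk.
table4 = {
    "Parent BERI":            [False, False, True, True, False, True],
    "Parent externalizing":   [False, True, True, True, True, True],
    "Parent internalizing":   [False, False, True, True, True, True],
    "Parent adaptive skills": [None, False, True, True, False, True],
}

# Keep only the cells that have a reported estimate.
reported = [flag for row in table4.values() for flag in row if flag is not None]
n_reported = len(reported)
n_nonsignificant = sum(1 for flag in reported if not flag)

# Global risk is the last column of every row.
global_risk_all_significant = all(row[-1] for row in table4.values())

print(f"{n_nonsignificant} of {n_reported} intercepts were not significant")
print(f"All Global Risk intercepts significant: {global_risk_all_significant}")
```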

Discussion

The purpose of this study was to use validity generalization meta-analyses to assess the extent to which BESS scores, which reflect the level of emotional and behavioral risk captured by a broadband index (i.e., BERI) and narrowband indexes (i.e., Externalizing Risk Index, Internalizing Risk Index, Adaptive Skills Risk Index, Self-Regulation Risk Index, and Personal Adjustment Risk Index), predicted alternative measures and outcomes such as student academic performance, social and behavioral functioning, and ratings from other measures. Given the substantial contribution of psychometrically sound and efficient measures to the screening process, and the BESS’s status as the most widely used screening tool in schools (Benson et al., 2019), the results have important implications for contextually appropriate use of the BESS, for decisions based on BESS scores, and for the evidence of convergent validity of BESS scores. As we describe below, our findings support the general validity of the BESS, but they must be understood within the limits of the available data. Specifically, the effect sizes we present (see Tables 2–4) are generally based on a small number of studies and samples. Consequently, we cannot make nuanced statements about potential moderators, and some studies with very large sample sizes (see Table 1) have had an outsized influence on the generalized correlation coefficients. Thus, we present a broad discussion of our findings.
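To illustrate why a few very large samples can dominate a pooled correlation, the following Python sketch applies simple inverse-variance weighting to Fisher z-transformed correlations. This is a fixed-effect simplification used only for illustration, not the model used in the present analyses, and the three (r, n) pairs are hypothetical values rather than data from the included studies.

```python
import math


def pooled_correlation(samples):
    """Pool correlations with inverse-variance weights on Fisher's z scale.

    samples: list of (r, n) pairs. The sampling variance of Fisher's z is
    approximately 1 / (n - 3), so each sample's weight is n - 3.
    Returns the pooled r and each sample's share of the total weight.
    """
    weights = [n - 3 for _, n in samples]
    z_values = [math.atanh(r) for r, _ in samples]
    total_weight = sum(weights)
    z_bar = sum(w * z for w, z in zip(weights, z_values)) / total_weight
    shares = [w / total_weight for w in weights]
    return math.tanh(z_bar), shares


# Hypothetical samples: two small studies and one very large study.
samples = [(0.20, 73), (0.30, 150), (0.60, 3000)]
pooled_r, shares = pooled_correlation(samples)
unweighted_mean_r = sum(r for r, _ in samples) / len(samples)

print(f"Pooled r: {pooled_r:.3f}")
print(f"Unweighted mean r: {unweighted_mean_r:.3f}")
print(f"Weight share of largest sample: {shares[-1]:.1%}")
```

In this hypothetical case the largest sample carries most of the total weight, so the pooled estimate sits much closer to its correlation than to the unweighted mean of the three coefficients.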

The 16 studies included in our analyses reported correlation coefficients between BESS scores and a wide range of criterion variables, including scores on other rating scales of emotional/behavioral functioning in addition to other relevant outcomes such as office discipline referrals, suspensions, attendance, and grade point average. In general, the resulting coefficients were in the expected direction given the constructs measured by the BESS scales (e.g., the BESS Externalizing and Internalizing Risk Indexes were positively and significantly correlated with the aggregate Externalizing Problems and Internalizing Problems variables, while the Teacher and Parent Adaptive Skills Risk Indexes were negatively associated with problem behaviors). This general finding is expected, as screeners are more likely to be closely aligned with measures of similar constructs than they are with measures of distinct constructs (e.g., Allen et al., 2019; Goerdt et al., 2025). Perhaps the most notable finding is that across all three forms of the BESS (i.e., teacher, student, and parent), the BERI, the broadband index, consistently had among the highest correlations across criterion areas. The results indicate that the BESS can be confidently used as a screening instrument for early detection and follow-up planning, particularly if decision-making at the first step of the screening process is based on BERI scores rather than on the narrowband indexes that the BESS also provides. Additionally, across all three forms of the BESS, the lowest correlations were between the BESS scores and academic variables. This pattern suggests that the BESS should not be used as an academic screener; instead, measures focusing on constructs other than behavioral and emotional risk (e.g., grades on assignments and performance on curriculum-based assessments) are more appropriate when screening for academic problems or risk of academic delays. Similar findings have been reported in single-sample studies (Dowdy et al., 2016) and are confirmed in our meta-analyses and other analyses using a variety of screeners (Allen et al., 2019; Kilgus et al., 2018).

With regard to the narrowband index results, our findings suggest that practitioners might find these useful for specific purposes. In particular, scores on the BESS Internalizing Risk Indexes were significantly correlated with the Internalizing Problems variables across teacher, parent, and self-report forms. Similarly, scores on the BESS Externalizing Risk Indexes (which appear only in the teacher and parent forms) were significantly correlated with behaviors in the Externalizing Problems category. Thus, although the results of this study provide the strongest support for using the BERI as an overall risk indicator, results also support using the narrowband indexes when specifically screening for internalizing and externalizing problems. Since the BESS Internalizing and Externalizing Risk Indexes were also consistently correlated with the Global Risk variable, these scales might be useful in screening for global distress as well.

Looking at the big picture, correlation coefficients were fairly similar across the three forms of the BESS, but fewer statistically significant effect sizes were found for the parent form than for the teacher and student forms. Because the parent form provided more mixed evidence, as well as the fewest correlation coefficients available for analysis, there is room for further research on the implications of screening results based on parent informants. Consequently, at this time we recommend that schools prioritize administration of the teacher and student forms of the BESS. If having teachers complete the BESS for all students in their class(es) is not possible because of a lack of time or a lack of knowledge regarding a particular student, having students complete the BESS seems to be a viable alternative given the statistically significant correlation coefficients between their ratings and criterion variables (with a few notable exceptions). That said, although the format of the BESS and other multi-informant measures might suggest that results could generalize across informants, it would be inappropriate to evaluate the results of screeners in this way, especially if the results of narrowband indexes (as opposed to broadband indexes such as the BERI) will be used in decision-making. Rather, it is important to carefully consider which informant is most appropriate for the specific screening purposes of a study or school-based implementation of screening, as each informant will provide a unique and possibly discrepant perspective, which may be more relevant and informative depending on the context, constructs, and variables under investigation (Graybill et al., 2025; Zakszeski et al., 2025).

Limitations

These results must be interpreted within the context of several limitations. As previously noted, the current analyses were based on a relatively small number of studies, which yielded a limited number of validity coefficients. The small number of studies also meant that we estimated the validity of the BESS in predicting aggregate outcomes rather than more narrowly defined outcomes that could have provided more specific implications for practice. Furthermore, the limited participant data reported in most studies restricted our ability to meaningfully evaluate moderator variables and determine whether particular sample features influenced the results. This represents a constraint on generality, as we did not consider individual participant characteristics such as sex/gender, race, or ethnicity.

These limitations reflect the characteristics of the available literature on the BESS. The potential for future meta-analyses to include a larger number of studies, analyze more precisely or narrowly defined criterion variables, and evaluate the influence of relevant moderators will ultimately depend on the time necessary for the field to develop a larger body of studies from which to draw evidence. Based on our review, it is also particularly important for future studies of behavioral and emotional risk screeners to include parent-report data. Having these data will allow researchers to more meaningfully evaluate the utility of parent screeners, and will provide information that is particularly helpful to professionals using screeners outside of school settings, where teacher screening data are much less likely to be available. It will also be important to continue to use measures with large bodies of accumulated literature, such as the BESS and SRSS (Kilgus et al., 2018), to examine general screening issues, such as the utility of broadband vs. narrowband screeners for both the identification of students at risk and the subsequent implementation of interventions (Peterson & Villarreal, 2024).

Given the focus on validity coefficients in the current study, it will also be useful for future research to examine other important sources of validity evidence such as classification accuracy statistics (i.e., sensitivity, specificity, positive predictive value, and negative predictive value; AERA et al., 2014). This will allow for a more complete understanding of how the BESS (and similar screeners) can inform screening decisions. Lastly, we wish to acknowledge that the BASC-4 BESS is scheduled for publication in 2026, which may seem to limit the utility of these results based on the BASC-3 BESS. However, as described by Cronje et al. (2022), revised versions of tests are frequently compared to (and often correlated with) the previous version, in order to understand similarities/differences from one edition to the next. Thus, knowledge of the BASC-3 BESS validity may help researchers and practitioners interpret correlations between BASC-3 BESS scores and BASC-4 BESS scores. Our findings will also inform any research that is yet to be conducted or published based on the BESS-3, which we expect to see for many years even after the BESS-4 is available. Further, if we wanted to compare validity evidence of the BASC-3 BESS with validity evidence of the BASC-4 BESS, the current study would be helpful in that regard by aggregating predictive validity coefficients using the BASC-3 BESS. Of course, this is only one source of support for using the BESS, and once the BESS-4 is published it will need to be comprehensively evaluated based on changes in norms, items, and constructs measured (Butcher, 2000).
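For readers less familiar with the classification accuracy statistics named in this section, the following Python sketch shows how sensitivity, specificity, positive predictive value, and negative predictive value are computed from a 2 × 2 table of screening decisions against a criterion. The counts are hypothetical and are not drawn from any study of the BESS; the function and argument names are illustrative.

```python
def classification_accuracy(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute classification accuracy statistics from a 2 x 2 table.

    tp: screened at risk and at risk on the criterion (true positives)
    fp: screened at risk but not at risk on the criterion (false positives)
    fn: screened not at risk but at risk on the criterion (false negatives)
    tn: screened not at risk and not at risk on the criterion (true negatives)
    """
    return {
        "sensitivity": tp / (tp + fn),  # share of truly at-risk students flagged
        "specificity": tn / (tn + fp),  # share of not-at-risk students cleared
        "ppv": tp / (tp + fp),          # share of flagged students truly at risk
        "npv": tn / (tn + fn),          # share of cleared students truly not at risk
    }


# Hypothetical screening results for 500 students.
stats = classification_accuracy(tp=40, fp=60, fn=10, tn=390)
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```

With these hypothetical counts, sensitivity and specificity are both fairly high while the positive predictive value is much lower, which illustrates why a correlation coefficient alone does not describe how well a screener supports individual screening decisions.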

Footnotes

ORCID iDs

Victor Villarreal

Laura M. Peña

Funding

The authors received no financial support for the research, authorship, and/or publication of this article.

Declaration of Conflicting Interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

References

*Indicates studies included in the meta-analyses.

Allen, Kilgus, S. P., Burns, M. K., & Hodgson (2019). Surveillance of internalizing behaviors: A reliability and validity generalization study of universal screening evidence. School Mental Health, 11(2), 194–209. https://doi.org/10.1007/s12310-018-9290-3

*Alperin, Dudek, C. M., Reddy, L. A., Glover, T. A., Wiggs, N. B., & Bronstein (2023). Convergent validity of the behavior observation of students in schools for elementary school students with disruptive behaviors. Psychology in the Schools, 60(10), 4039–4060. https://doi.org/10.1002/pits.22983

American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (2014). Standards for educational and psychological testing. American Educational Research Association.

Basting, E. J., Naser, & Goncy, E. A. (2022). Assessing the factor structure and measurement invariance of the BASC-3 behavioral and emotional screening system student form across race/ethnicity and gender. Assessment for Effective Intervention, 48(1), 43–51. https://doi.org/10.1177/15345084221095440

Benson, N. F., Floyd, R. G., Kranzler, J. H., Eckert, T. L., Fefer, S. A., & Morgan, G. B. (2019). Test use and assessment practices of school psychologists in the United States: Findings from the 2017 National Survey. Journal of School Psychology, 72, 29–48. https://doi.org/10.1016/j.jsp.2018.12.004

Beretvas, S. N., & Pastor, D. A. (2003). Using mixed-effects models in reliability generalization studies. Educational & Psychological Measurement, 63(1), 75–95. https://doi.org/10.1177/0013164402239318

Butcher, J. N. (2000). Revising psychological tests: Lessons learned from the revision of the MMPI. Psychological Assessment, 12(3), 263–271. https://doi.org/10.1037/1040-3590.12.3.263

Cohen (1988). Statistical power analysis for the behavioral sciences. Routledge.

Cronje, J. H., Watson, M. B., & Stroud, L. A. (2022). Guidelines for the revision and use of revised psychological tests: A systematic review study. Europe’s Journal of Psychology, 18(3), 293–301. https://doi.org/10.5964/ejop.2901

*DeBoer, J. L., & Long, A. C. J. (2024). A comparison of self-report measures to screen for mental health concerns in youth. School Mental Health, 16(1), 25–40. https://doi.org/10.1007/s12310-023-09615-9

Deeks, J. J., Higgins, J. P. T., & Altman, D. G. (2022). Chapter 10: Analysing data and undertaking meta-analyses. In Higgins, J. P. T., Thomas, Chandler, Cumpston, Page, M. J., & Welch, V. A. (Eds.), Cochrane handbook for systematic reviews of interventions, version 6.3. Cochrane. (Updated February 2022). www.training.cochrane.org/handbook

*Dever, B. V., & Gaier (2021). Psychometric properties of the behavior assessment system for children-3 behavioral and emotional screening system student-report form among a predominantly Latinx elementary school sample. Journal of Psychoeducational Assessment, 39(1), 128–133. https://doi.org/10.1177/0734282920951065

*DiStefano, Greer, Shi, & Dowdy (2020). Examination of method effects with a social-emotional screening instrument across parent and teacher raters. Psychological Test and Assessment Modeling, 62(3), 359–374.

Dowdy, Dever, B. V., Raines, T. C., & Moffa (2016). A preliminary investigation into the added values of multiple gates and informants in universal screening for behavioral and emotional risk. Journal of Applied School Psychology, 32(2), 178–198. https://doi.org/10.1080/15377903.2016.1165327

*Dowdy, DiStefano, Greer, Moore, & Pompey (2019). Examining the latent structure of the BASC-3 BESS parent preschool form. Journal of Psychoeducational Assessment, 37(2), 181–193. https://doi.org/10.1177/0734282917739109

Dvorsky, Girio-Herrera, & Owens, J. S. (2014). School-based screening for mental health in early childhood. In Weist, M. D. (Ed.), Handbook of school mental health (pp. 297–310). Springer. https://doi.org/10.1007/978-1-4614-7624-5

*Edmunds, S. R., Jones, Braverman, Fogler, Rowland, & Faja, S. K. (2023). Irritability as a transdiagnostic risk factor for functional impairment in autistic and non-autistic toddlers and preschoolers. Research on Child and Adolescent Psychopathology, 52(4), 551–565. https://doi.org/10.1007/s10802-023-01150-0

*Eklund, Kilgus, S. P., Willenbrink, J. B., Collins, Gill, Weist, M. M., Porter, Lewis, T. J., Mitchell, & Wills (2022). Evidence of the internal structure and measurement invariance of the BASC-3 behavioral and emotional screening system teacher form. Journal of Psychoeducational Assessment, 40(8), 936–949. https://doi.org/10.1177/07342829221116807

*Fletcher-Janzen, & Harrington (2020). Translating ACE research into multi-tiered systems of supports for at-risk high-school students. Journal of Pediatric Neuropsychology, 7(3), 89–101. https://doi.org/10.1007/s40817-020-00093-4

Glover, T. A., & Albers, C. A. (2007). Considerations for evaluating universal screening assessments. Journal of School Psychology, 45(2), 117–135. https://doi.org/10.1016/j.jsp.2006.05.005

Goerdt, Miller, F. G., Dupuis, & Rausch (2025). Validity evidence of the social, academic, and emotional behavior risk screener-teacher rating scale: A systematic review & quantitative synthesis. Journal of School Psychology, 110, 1–19. https://doi.org/10.1016/j.jsp.2025.101449

Graybill, Lewis, Anghel, Awan, Barger, & Salmon (2025). Universal behavior screening and early warning system indicators in middle schools. Psychology in the Schools, 62(9), 2955–2968. https://doi.org/10.1002/pits.23515

*Hanno, E. C., Cuartas, Miratrix, L. W., Jones, S. M., & Lesaux, N. K. (2022). Changes in children’s behavioral health and family well-being during the COVID-19 pandemic. Journal of Developmental & Behavioral Pediatrics, 43(3), 168–175. https://doi.org/10.1097/dbp.0000000000001010

*Hood (2018). Biological interventions to improve cognition in children with sickle cell disease [Doctoral dissertation, Washington University in St. Louis]. UCL discovery. https://discovery.ucl.ac.uk/id/eprint/10122971/

*Hood, A. M., Reife, King, A. A., & White, D. A. (2019). Brief screening measures identify risk for psychological difficulties among children with sickle cell disease. Journal of Clinical Psychology in Medical Settings, 27(4), 651–661. https://doi.org/10.1007/s10880-019-09654-y

*Ijaz, Rohail, & Irfan (2024). School-based intervention for anxiety using group cognitive behavior therapy in Pakistan: A feasibility randomized controlled trial. Psicologia Reflexão E Crítica, 37(1), 1–10. https://doi.org/10.1186/s41155-024-00311-4

*Kamphaus, R. W., & Reynolds, C. R. (2015). Behavior Assessment System for Children—Third Edition (BASC-3): Behavioral and Emotional Screening System (BESS). Pearson.

Kilgus, S. P., Eklund, Maggin, D. M., Taylor, C. N., & Allen, A. N. (2018). The student risk screening scale: A reliability and validity generalization meta-analysis. Journal of Emotional and Behavioral Disorders, 26(3), 143–155. https://doi.org/10.1177/1063426617710207

*Moore, S. A., Dowdy, Fleury, DiStefano, & Greer, F. W. (2021). Comparing informants for mental health screening at the preschool level. School Psychology Review, 51(5), 589–608. https://doi.org/10.1080/2372966x.2020.1841546

*Naser, S. C., & Dever, B. V. (2019). A preliminary investigation of the reliability and validity of the BESS-3 teacher and student forms. Journal of Psychoeducational Assessment, 38(2), 263–269. https://doi.org/10.1177/0734282919837825

Palmer, Kane, Patterson, & Tuomainen (2025). Universal mental health screening in schools: How acceptable is this to key stakeholders? A systematic review. Journal of Child and Family Studies, 34(2), 366–380. https://doi.org/10.1007/s10826-025-03007-0

Peterson, L. S., & Villarreal (2024). Ethical considerations in school-based mental health screening and service provision - A commentary. Journal of School Health, 94(12), 1196–1199. https://doi.org/10.1111/josh.13520

Raudenbush, S. W. (2009). Analyzing effect sizes: Random-effects models. In Cooper, Hedges, L. V., & Valentine, J. C. (Eds.), The handbook of research synthesis and meta-analysis (pp. 295–315). Russell Sage Foundation.

R Core Team. (2024). R: A language and environment for statistical computing. R Foundation for Statistical Computing. https://www.R-project.org/

Reddy, L. A., Glover, T. A., Dudek, C. M., Alperin, Wiggs, N. B., & Bronstein (2022). A randomized trial examining the effects of paraprofessional behavior support coaching for elementary students with disruptive behavior disorders: Paraprofessional and student outcomes. Journal of School Psychology, 92, 227–245. https://doi.org/10.1016/j.jsp.2022.04.002

Sanders, Lane, J. J., Losinski, Nelson, Asiri, Holloway, S. M., & Rogers (2019). An implementation of a computerized cognitive behavioral treatment program to address student mental health needs: A pilot study in an after-school program. Professional School Counseling, 22(1), 1–9. https://doi.org/10.1177/2156759X19838462

Shultz, K. S., & Whitney, D. J. (2005). Measurement theory in action: Case studies and exercises. Sage.

Sitarenios (2022). Short versions of tests: Best practices and potential pitfalls. Journal of Pediatric Neuropsychology, 8(3), 101–115. https://doi.org/10.1007/s40817-022-00126-0

Sullivan, J. R., Villarreal, Flores, Gomez, & Warren (2021). SSIS performance screening guide as an indicator of behavior and academics: A meta-analysis. Assessment for Effective Intervention, 46(3), 228–237. https://doi.org/10.1177/1534508420926584

*Sutton (2023). Social, emotional, and behavioral functioning among 5th grade students: An evaluation of a pilot screening assessment protocol [Legacy theses & dissertations, Doctoral dissertation, University at Albany]. https://doi.org/10.54014/ma7q-67dh

U.S. Preventive Services Task Force. (2009). Screening and treatment for major depressive disorder in children and adolescents: US preventive services task force recommendation statement. Pediatrics, 123(4), 1223–1228. https://doi.org/10.1542/peds.2008-2381

Viechtbauer (2010). Conducting meta-analyses in R with the metafor package. Journal of Statistical Software, 36(3), 1–48. https://doi.org/10.18637/jss.v036.i03

Villarreal, & Peterson, L. S. (2025). Mental health screening: Recommendations from an integrated literature review. Contemporary School Psychology, 29(1), 250–260. https://doi.org/10.1007/s40688-024-00501-y

Zakszeski, B. N., Ormiston, H. E., Nygaard, M. A., & Carlock (2025). Informant discrepancies in universal screening as a function of student and teacher characteristics. School Psychology Review, 54(1), 128–142. https://doi.org/10.1080/2372966X.2023.2262362