Invariance and Construct Validity of HiTOP Dimensions Across Race and Ethnicity in the Adolescent Brain and Cognitive Development (ABCD) Study

Abstract

The Hierarchical Taxonomy of Psychopathology (HiTOP) has gained significant traction in clinical psychological science. However, HiTOP has not been extensively validated across diverse populations. This study tested measurement invariance (the degree to which latent constructs are measured equivalently across groups) in HiTOP across racial and ethnic groups using the Child Behavior Checklist (CBCL) in the Adolescent Brain Cognitive Development (ABCD) Study. These models were followed by rigorous tests of construct validity (i.e., convergent, discriminant, and concurrent) on the latent factors using a Multitrait-Multimethod (MTMM) framework. Comparing across non-Hispanic White (n = 6,021), Hispanic (n = 2,015), and non-Hispanic Black (n = 1,420) youths, the five-factor model comprising Externalizing, Neurodevelopmental, Internalizing, Somatoform, and Detachment factors demonstrated configural, metric, scalar, and strict measurement invariance. While each of the five factors demonstrated good evidence of concurrent and convergent validity, evidence for their discriminant validity was not as robust. Establishing measurement invariance and construct validity of the HiTOP model has critical scientific and clinical implications, particularly if dimensions are to be used in addressing mental health disparities in minoritized populations.

Keywords

HiTOP, race and ethnicity, measurement invariance, validity, children

The U.S. population is becoming increasingly racially and ethnically diverse. According to the 2020 U.S. Census, Hispanic or Latino and Black populations rose by 23% and 5.6% since 2010, now making up 18.7% and 12.4% of the total U.S. population, respectively (U.S. Census, 2020). Concurrently, rates of mental disorders continue to rise (Bitsko et al., 2018), disproportionately among racial and ethnic minority youths (Whitney & Peterson, 2019). These populations face unique challenges, including higher rates of co-occurring mental health conditions (Ahmed & Conway, 2020; Weller et al., 2018) and lower utilization of mental health services compared to non-Hispanic White populations (Merikangas et al., 2011; Wang et al., 2005). These issues are compounded by a relative scarcity of research that focuses on racial and ethnic minority mental health more generally (see review by Buchanan et al., 2021). For example, Rodriguez-Seijas, Li, and colleagues (2023) reviewed 543 articles published between 2013 and 2020 in a prominent clinical psychological science journal and found that only 23 (4.2%) focused on issues related to race or ethnicity. Research on the intersection of mental health and race and ethnicity is paramount considering the growing mental health needs among rapidly changing racial and ethnic demographics in the United States.

The Hierarchical Taxonomy of Psychopathology (HiTOP; Kotov et al., 2017, 2022) is an empirically driven framework that has gained significant traction in the clinical psychological sciences. It addresses many of the limitations associated with the Diagnostic and Statistical Manual of Mental Disorders (DSM; American Psychiatric Association, 2013), including that the diagnostic criteria often rely on arbitrary symptom thresholds, resulting in clinical samples with heterogeneous clinical presentations and excessively high rates of co-occurrence among disorders. Unlike the DSM, HiTOP empirically quantifies the covariation among symptoms, whereby higher-order dimensions subsume correlated lower-order dimensions (Kotov et al., 2017). Despite its promising utility for clinical science and practice (Kotov et al., 2022), there are some major shortcomings of the current model. First, there is a relative dearth of HiTOP research focused on child populations when compared to the amount of research conducted in primarily adult populations (Forbes et al., 2024; Michelini et al., 2024; Ringwald et al., 2023). This knowledge gap is further reflected in the fact that the new measure of HiTOP under development (Simms et al., 2022) is not designed to be applicable to youths (Tackett & Hallquist, 2022). Second, and following from the first shortcoming, whether HiTOP is generalizable across racially and ethnically diverse groups is also understudied, as most of the existing HiTOP research relies on samples comprised of predominantly White populations (see review by Rodriguez-Seijas, Li, et al., 2023). These growing scientific gaps are concerning because they clearly affect the equitable dissemination and implementation of HiTOP research into practice (Cicero et al., 2024).

Given the lack of a HiTOP-specific instrument to assess dimensions in youths, researchers often use the Child Behavior Checklist (CBCL; Achenbach & Rescorla, 2004) as a proxy measure due to its strong validity, high degree of reliability, and breadth of coverage with respect to assessing behavior, cognition, and temperament (Conway et al., 2023; Forbes et al., 2024; Moore et al., 2020). Indeed, prior research suggests that the factor structure of the CBCL maps onto most of the dimensions of the HiTOP model. Michelini and colleagues (2019) used baseline (9- to 10-year-olds, N = 9,987) data from the Adolescent Brain Cognitive Development (ABCD) Study® to show that items from the CBCL mapped onto five factors that were comparable to the HiTOP dimensions: Internalizing, Externalizing, Neurodevelopment, Somatoform, and Detachment. Specifically, the Internalizing dimension consisted of symptoms such as feeling sad, feeling fearful or anxious, and being easily embarrassed. The Externalizing dimension consisted of symptoms such as impulsive actions, destroying things, and being disobedient. The Neurodevelopment dimension consisted of symptoms such as strange behaviors, concentration problems, and failing to finish tasks. The Somatoform dimension consisted of symptoms such as being overtired without good reason, feeling dizzy or lightheaded, and feeling pain or aches. The Detachment dimension consisted of symptoms such as being socially withdrawn, underactive, and secretive. Their study is noteworthy because it was among the first to use a large and well-characterized child study population to identify the HiTOP structure. However, although ABCD is a racially and ethnically diverse sample, the authors did not assess whether the HiTOP dimensions they identified exhibited measurement invariance across racial and ethnic subgroups.

Measurement invariance is a fundamental aspect of establishing validity (Cicero & Ruggero, 2021; He & Li, 2021). Testing for measurement invariance typically involves identifying differences in the factor structure, item loadings, item intercepts/thresholds, and error variances across groups. When measurement invariance is present, the measure can be assumed to reflect the same underlying construct or phenomenon across these groups. Failure to establish measurement invariance may lead to systematic measurement errors or biases by confounding observed group differences in the mean level of dimensions and their associations with other constructs (Borsboom, 2006; Fischer & Karl, 2019; Hillemeier et al., 2007). One example of this issue comes from a study examining the structure of the widely used Personality Inventory for DSM-5 (PID-5). Bagby et al. (2022) conducted tests of measurement invariance across White and Black Americans enrolled in college. While a five-factor solution was identified in the White American group, only a single-factor solution was identified in the Black American group, leading the authors to conclude that the PID-5 is not factorially invariant across Black and White Americans. Unless invariance of measurement scales or models is established, observed group differences across race may be confounded by measurement issues (Kim & Yoon, 2011).

The literature on measurement invariance in HiTOP with respect to youths and across racial and ethnic subgroups is still nascent, but there is emerging evidence that the hierarchical structure of psychopathology exhibits invariance across race and ethnicity. However, these studies all feature some noteworthy limitations that warrant cautious interpretation. Hoffmann et al. (2021) tested 11 factor models from the CBCL in a cohort of youths and young adults (N = 7,011, ages 5–22). They found invariance of all structural models in their diverse sample. However, the authors combined all individuals who identified as Asian, Black, Mixed, Native American, or other into a single “non-White” category, thereby assuming homogeneity among individuals of non-White racial and ethnic backgrounds. Another study conducted confirmatory factor analysis (CFA) of items across the eight-syndrome scales of the CBCL in 30 different societies (i.e., countries and regions; N = 61,703) and found evidence of configural invariance of an eight-factor model across all societies (Ivanova et al., 2019). However, the researchers did not provide robust support for scalar invariance across societies (i.e., invariance of item loadings and thresholds), which may not be surprising when considering that some of the societies in their study were considerably more racially and ethnically heterogeneous (e.g., the United States) than others (e.g., Japan). He and Li (2021) tested for measurement invariance in the latent factor structure of 15 mental disorders ascertained by a computerized, semi-structured interview in a large sample of White (n = 5,147) and Black/African American (n = 3,088) youths and young adults aged 8–21 years (N = 8,235) in the Philadelphia Neurodevelopmental Cohort. They found evidence for configural invariance when comparing the bifactor model across White and Black/African American groups. However, this study utilized DSM-informed diagnoses as observed indicators in its models, which is a crucial limitation given the aforementioned shortcomings of the DSM (e.g., arbitrary symptom thresholds, heterogeneity of clinical presentations, and excessive comorbidity among disorders).

Finally, Stewart and colleagues (2024) conducted tests of measurement invariance in non-Hispanic White, non-Hispanic Black/African American, non-Hispanic biracial White and Black/African American, and Hispanic White subgroups using items from the CBCL in the same ABCD Study® 9- to 10-year-old cohort as reported in Michelini et al. (2019). They found that the five-factor model that was identified in Michelini et al. (2019), comprising Externalizing, Detachment, Neurodevelopment, Internalizing, and Somatoform latent dimensions, demonstrated full measurement invariance (i.e., metric, scalar, residual, and latent variances) across the four racial and ethnic subgroups they studied. Furthermore, their study reported several latent mean differences between racial and ethnic subgroups (e.g., non-Hispanic White and Hispanic White participants had lower latent mean scores on Externalizing than Black/African American and biracial participants). However, an important limitation was that the latent CBCL dimensions for each racial and ethnic group were not validated (e.g., through tests of convergent, discriminant, and/or concurrent validity), leaving open the important question of whether HiTOP dimensions reflect conceptually meaningful indicators of risk across the racial and ethnic groups.

Crucially, testing for measurement invariance is only one step, albeit an important one, in validating constructs for cross-group comparisons (Cicero & Ruggero, 2021; Meredith, 1993; Vandenberg & Lance, 2000). Establishing convergent/discriminant validity and concurrent validity ensures that the construct not only operates similarly across groups but also meaningfully relates to theoretically relevant variables and remains distinct from unrelated constructs, thereby supporting its utility in clinical science research (Campbell & Fiske, 1959; Cronbach & Meehl, 1955). Thus, the overarching aims of our study are to rigorously (a) test for measurement invariance (i.e., configural, metric, scalar, and strict invariance) and (b) validate the resultant HiTOP models and dimensions using multiple methods in ABCD Study® data across three of the largest racial and ethnic subgroups in the data set: Hispanic, non-Hispanic White, and non-Hispanic Black participants. With respect to the first aim, we expect to replicate the hierarchical structure of HiTOP as previously identified from the CBCL (Michelini et al., 2019). We also expect to replicate the finding of full measurement invariance across the racial and ethnic groups (Stewart et al., 2024). Because of the dearth of literature on HiTOP measurement invariance across race and ethnicity in youths, we focused our analysis on the baseline wave of data, when participants were 9 to 10 years old.

For our second aim, we tested the convergent/discriminant and concurrent validity of our resultant HiTOP models across racial and ethnic groups using the Multitrait-Multimethod (MTMM) framework (Campbell & Fiske, 1959; Cronbach & Meehl, 1955). Convergent validity is supported when a construct correlates with other validated measures of the same construct, including measurements assessed by different informants (Campbell & Fiske, 1959; De Los Reyes et al., 2015). Discriminant validity refers to the degree to which a construct is not strongly related to measures of different or putatively unrelated constructs (Campbell & Fiske, 1959). Our MTMM matrix comprises three methods (and different informants) to assess the convergent and discriminant validity of our CBCL latent factors across race and ethnicity: caregiver ratings of eight psychiatric disorders from a semi-structured diagnostic interview (i.e., the Kiddie-Schedule for Affective Disorders and Schizophrenia for Diagnostic and Statistical Manual Disorder [K-SADS]), and teacher and youth self-report ratings for Internalizing, Externalizing, and Attention Problems factors from the Brief Problem Monitor (BPM; Achenbach et al., 2011). In line with expectations of convergent validity, we expect to observe robust correlations for the same/similar trait across different methods (i.e., “monotrait-heteromethod,” e.g., CBCL Externalizing vs. BPM Teacher and Youth Self Reports Externalizing and K-SADS oppositional defiant and conduct disorders). In line with discriminant validity, we expect correlations between different traits measured with the same method (“heterotrait-monomethod,” e.g., correlation between CBCL Externalizing vs. Internalizing) to be lower than correlations observed between the same/similar trait measured with different methods (i.e., “monotrait-heteromethod”). In addition, we expect that correlations between different traits and different methods will be generally low (i.e., “heterotrait-heteromethod,” e.g., CBCL Externalizing vs. BPM Teacher and Youth Self Reports Internalizing and K-SADS major depressive disorder).
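
The MTMM expectations described above can be illustrated with a minimal sketch. It is not the analysis code used in this study: the function name, the input layout, and the correlation values in the usage note are hypothetical, and the rule applied (every convergent correlation must exceed every heterotrait correlation in absolute value) is a deliberately strict simplification of the Campbell and Fiske (1959) criteria.

```python
def campbell_fiske_check(monotrait_heteromethod,
                         heterotrait_monomethod,
                         heterotrait_heteromethod):
    """Compare blocks of an MTMM matrix (each argument is a list of correlations).

    Assumption: a strict rule in which the weakest convergent
    (monotrait-heteromethod) correlation must exceed every heterotrait
    correlation in absolute value.
    """
    weakest_convergent = min(abs(r) for r in monotrait_heteromethod)
    return {
        # Same trait across methods should beat different traits, same method.
        "exceeds_heterotrait_monomethod": all(
            abs(r) < weakest_convergent for r in heterotrait_monomethod
        ),
        # Same trait across methods should beat different traits, different methods.
        "exceeds_heterotrait_heteromethod": all(
            abs(r) < weakest_convergent for r in heterotrait_heteromethod
        ),
    }
```

For example, with hypothetical convergent correlations of .55 and .48, heterotrait-monomethod correlations of .40 and .35, and heterotrait-heteromethod correlations of .20 and .10, both checks pass; a heterotrait-monomethod correlation of .60 against a convergent correlation of .30 would fail the first check.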

Finally, concurrent validity is supported when a construct correlates with theoretically related constructs from measures administered at the same time as the focal construct (Cronbach & Meehl, 1955). To test the concurrent validity of our latent constructs, we examined the correlations of CBCL latent factors with concurrently measured neurocognitive outcomes via the NIH Toolbox. There is well-established evidence of dysfunction across various aspects of neurocognition (e.g., intelligence quotient [IQ]) across each HiTOP dimension (Kotov et al., 2020; Krueger et al., 2021; Watson, Levin-Aspenson, et al., 2022). Some researchers have even posited that neurocognitive dysfunction, despite its clear cross-loadings across the HiTOP spectra, may itself be a unique construct within the HiTOP model (see meta-analysis by Ringwald et al., 2024). As such, we expect to observe negative correlations of the CBCL latent factors with Crystallized and Fluid Cognition scores from the NIH Toolbox across racial and ethnic groups.

Method

Sample

We report how we determined our sample size, all data exclusions, all manipulations, and all measures in the study. The ABCD Study® is a prospective longitudinal study of 9- to 10-year-old youths from 21 study sites across the United States. Participants were recruited from schools, selected based on age, gender, race and ethnicity, sociodemographic status, and urbanicity. The sample was recruited to closely reflect the socio-demographics of the United States population. An extensive array of data was collected, including brain imaging, genetics, neurocognitive outcomes, mental health, and family, community, and environment. The current release of the study contains the full data from baseline, 6-month, 12-month, 18-month, 2-year, 30-month, and 3-year follow-up as well as interim data from 42-month visits and 4-year follow-up. The present study utilized data from 9,456 youths collected at baseline. Based on parent report, the assigned sex was female for 47.59%, 47.30%, and 49.30% among the Hispanic, non-Hispanic White, and non-Hispanic Black groups, respectively (see Classification of Race and Ethnicity in ABCD for more details). ABCD Study® data are publicly available to researchers upon signing a research use agreement. Thus, this study was exempt from requiring approval from the Institutional Review Board of the University of Wisconsin-Madison.

Classification of Race and Ethnicity in ABCD

The ABCD Study® team categorizes individuals into one of five groups based on the parent and/or caregiver report of the youth’s race and ethnicity during the baseline visit: Hispanic (i.e., Hispanic, Latino, Latina), White, Black, Asian, and Other. Respondents could only select among White, Black, and Asian if Hispanic was not also selected. The Other category consisted of individuals who (a) selected Other and did not select Hispanic or (b) selected more than one race (i.e., White, Black, Asian) and did not select Hispanic. Using this scoring algorithm, the analytic sample in the current study consisted of 2,015 Hispanic youths, 6,021 non-Hispanic White youths, and 1,420 non-Hispanic Black youths (see Table 1 and Supplemental Table S1 for descriptives on these subsamples). Individuals in the Asian group were not included in the analysis due to their comparatively smaller sample size and corresponding concerns about statistical power and analytic convergence of the models (Asian n = 252). In addition, individuals in the Other group (n = 1,247) were also not included in the study due to their ambiguous racial and ethnic composition, heterogeneity, and consequently, unclear interpretation from a results standpoint. We acknowledge that race and ethnicity are complex social constructs, sometimes even considered as proxies for sociodemographic differences between groups (Brown et al., 2013; Cénat et al., 2024). Race and ethnicity also intersect with other social identities and processes such as socioeconomic status and gender, which together form cultural identities that are multidimensional and dynamic (Brown et al., 2013; Causadias & Cicchetti, 2018). Thus, we acknowledge that this approach to classification may not capture the full complexity of this construct.

Table 1.

Descriptives.

Variable	Hispanic	Non-Hispanic White	Non-Hispanic Black
Sample size	2,015	6,021	1,420
Female %	47.59%	47.30%	49.30%
Age in months (SE)	118.48 (7.56)	119.19 (7.54)	118.99 (7.27)
Parent mean age (SE)	38.52 (6.77)	41.33 (6.04)	37.15 (7.57)
Parent gender female %	90.52%	88.04%	92.75%
Parent with a partner %	79.76%	89.48%	54.76%
Parental Education (SE)	5.12 (2.19)	6.83 (1.44)	5.13 (1.91)
Parental Income (SE)	4.51 (2.45)	5.27 (2.81)	4.21 (2.30)
Family Combined Income (SE)	6.26 (2.39)	8.21 (1.64)	5.15 (2.65)
Family Financial Hardship (SE)	0.54 (1.10)	0.24 (0.82)	0.99 (1.47)
Medical Service Utilization %	54.79%	58.98%	55.70%

Note. Please see Supplemental Table S1 for more details on the operationalization of parental education, parental income, family combined income, family financial hardship, and medical service utilization.

Measures

CBCL

The CBCL is a standardized assessment measure developed as part of the Achenbach System of Empirically Based Assessment (ASEBA) (Achenbach, 1999). It assesses behavioral and emotional traits in children, including affective problems, anxiety, somatic problems, attention-deficit/hyperactivity, oppositional defiance, and conduct problems. Parents and/or caregivers are asked to rate the frequency of specific behaviors or problems exhibited by the child, using 113 three-point Likert-type scale items. Example items include “acts too young for his or her age” and “complains of loneliness,” to which parents or caregivers may respond with “not true” (coded as 0), “somewhat or sometimes true” (coded as 1), or “very true or often true” (coded as 2) (please also see the Supplemental Materials for additional details).

BPM—Teacher Report Form

The BPM—Teacher Report Form is a 19-item questionnaire for monitoring children’s mental health and functioning as rated by the teacher (Achenbach et al., 2011). It assesses symptoms and behaviors related to Internalizing, Externalizing, and Attention Problems using items worded similarly to the CBCL for Ages 6–18 years, the Teacher’s Report Form, and Youth Self Report (Achenbach et al., 2011). We used the BPM—Teacher Report Form at baseline (n = 3,634) in the validation analyses. Internal consistency for the Internalizing, Externalizing, and Attention Problems dimensions was excellent (Cronbach’s alphas = .84, .89, and .89, respectively).

BPM—Youth Self Report

The BPM—Youth Self Report was administered to youth participants at the 6-month follow-up. Youths reported on the same set of 19 items as in the BPM—Teacher Report Form. The BPM—Youth Self Report assesses the same three composite domains as the BPM—Teacher Report Form (Internalizing, Externalizing, and Attention Problems), and we used these three scales in the validation analyses. Internal consistency for the Internalizing, Externalizing, and Attention Problems dimensions was acceptable (Cronbach’s alphas = .72, .67, and .69, respectively).

K-SADS

Parents completed the semi-structured and computerized K-SADS at baseline and every annual follow-up (Kaufman et al., 1997; Townsend et al., 2020). The K-SADS assessed mental disorders including major depressive disorder (MDD), generalized anxiety disorder (GAD), social anxiety disorder (SOC), separation anxiety disorder (SEP), specific phobias (PHB), panic disorder (PAN), eating disorders, obsessive-compulsive disorder (OCD), post-traumatic stress disorder (PTSD), attention-deficit/hyperactivity disorder (ADHD), oppositional defiant disorder (ODD), and conduct disorder (CD). We focused on the screening items for these disorders because they were administered to all parents (Townsend et al., 2020). Non-screener items for each disorder were only administered when parents endorsed at least one of the screening items. Including non-screener items would have significantly restricted the non-missing K-SADS item pool and sample sizes for our validation analyses.

On the K-SADS, parents rated the presence (yes/no) of a symptom at the present time (typically within a 2-week or 1-month period of the assessment date), as well as in the past (typically in the 6–12 months prior to the assessment date). Composite scores for each disorder were calculated by summing the number of endorsed symptoms. Due to the limited number of items administered for the individual anxiety disorders (i.e., GAD, SOC, SEP, PHB, and PAN), symptoms across these disorders were summed into a single anxiety disorders composite. We focused on K-SADS data from baseline (n = 10,866) and computed 16 composite scores reflecting present or past symptom counts for each of the eight aforementioned mental disorders. Internal consistencies were excellent (i.e., Cronbach’s alphas >.80) for both present and past symptom counts of MDD, anxiety disorders, PTSD, ADHD, and ODD, present symptom counts of OCD, and past symptom counts of eating disorders. Internal consistencies were adequate (i.e., Cronbach’s alphas >.70 but <.80) for present and past symptom counts of CD, present symptom counts of eating disorders, and past symptom counts of OCD.
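
As an illustration of the scoring described above, the following minimal sketch computes a symptom count and Cronbach’s alpha for dichotomous (0/1) screening items. The function names and the toy data are hypothetical, complete data are assumed, and the sketch is not the scoring code used in this study.

```python
from statistics import pvariance


def symptom_count(responses):
    """Sum endorsed screening items (1 = yes, 0 = no) for one respondent."""
    return sum(responses)


def cronbach_alpha(item_rows):
    """Cronbach's alpha for a list of per-respondent item-score lists.

    Assumes complete data and at least two items. The variance ratio is the
    same whether population or sample variances are used, as long as the
    choice is consistent.
    """
    k = len(item_rows[0])
    item_vars = [pvariance([row[j] for row in item_rows]) for j in range(k)]
    total_var = pvariance([sum(row) for row in item_rows])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical responses from four parents on three screening items.
toy = [[1, 1, 1], [0, 0, 0], [1, 1, 0], [0, 0, 0]]
counts = [symptom_count(row) for row in toy]
alpha = cronbach_alpha(toy)
```

With dichotomous items, this coefficient is equivalent to the Kuder-Richardson Formula 20.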

NIH Toolbox

The NIH Toolbox is a computerized measure of neurocognitive function, including executive function, visual-spatial processing, language, and memory. The following seven test instruments were administered to eligible children in English at baseline: Picture Vocabulary Task, Oral Reading Recognition Task, Pattern Comparison Processing Speed Test, List Sorting Working Memory Test, Picture Sequence Memory Test, Flanker Task, and Dimensional Change Card Sort Task. Composite scores, including fluid and crystallized cognition, were computed based on individuals’ performances in each subtest (n = 9,991). Fluid cognition reflects abilities to solve problems, spontaneously respond to external stimuli, and encode episodic memories; it is argued to be influenced more by biological processes than learning and environmental exposure. Crystallized cognition, in contrast, reflects the accumulation of knowledge and skills and is argued to be heavily influenced by experiences such as education and parenting (Akshoomoff et al., 2013). Internal consistency is not reported because the raw items comprising each composite score are not currently available to researchers.

Statistical Analyses

Exploratory Factor Analyses

We first replicated a previously published study on the hierarchical structure of youth psychopathology based on CBCL items in the full ABCD sample (Michelini et al., 2019). Following the procedures of that study, we ensured data quality in latent modeling by removing items with excessively low frequency (i.e., >98% of responses being “0”). Items that were highly correlated (polychoric r >.75) were aggregated into composite scores (see Supplemental Methods for details). A bass-ackwards method was used for exploratory factor analyses (EFAs), specifying one to five factors. We did not perform separate item selection, quality controls, or EFA for each racial and ethnic group because there is no theoretical or empirical evidence that different items should be examined for different racial groups. Furthermore, doing so would have precluded the investigation of measurement invariance across groups given that each group would have featured a different item pool.
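
The low-frequency screen described above can be sketched as follows. This is a minimal illustration rather than the code used in this study: the function name and item labels are hypothetical, missing responses are assumed to be coded as `None`, and the aggregation of highly correlated items is omitted because it requires estimating polychoric correlations.

```python
def low_frequency_items(data, threshold=0.98):
    """Return the names of items for which more than `threshold` of the
    observed responses are 0.

    data: dict mapping item name -> list of ratings (0, 1, 2, or None).
    """
    flagged = []
    for name, ratings in data.items():
        observed = [r for r in ratings if r is not None]
        if not observed:
            continue  # nothing to evaluate for this item
        share_zero = sum(1 for r in observed if r == 0) / len(observed)
        if share_zero > threshold:
            flagged.append(name)
    return flagged


# Hypothetical items: "item_a" is endorsed by 1% of respondents, "item_b" by 10%.
toy = {
    "item_a": [0] * 99 + [1],
    "item_b": [0] * 90 + [2] * 10,
}
to_drop = low_frequency_items(toy)
```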

CFAs

Factor loading patterns in the CFA were derived from the bass-ackwards five-factor model (Supplemental Table S2). Items were removed if their loadings on all factors were lower than .35. This cut-off was chosen to ensure that our model contained the same set of indicators as Michelini et al. (2019) so that we could replicate their model before testing its invariance. For items that had loadings higher than .35 on more than one factor, we specified the item to load onto the factor with the strongest loading. A total of 71 items (out of 88) exhibited factor loadings larger than .35 without cross-loadings and were kept in the CFA model. In addition, Item 91 (“talks about killing self”) exhibited similar loadings (~.40) onto both Externalizing and Internalizing; we decided to specify the item to load onto Internalizing to avoid cross-loading in the model. Thus, 72 items (out of 88) were retained in the CFA model. We treated all items as ordinal and used the weighted least squares mean and variance adjusted (WLSMV) estimator to fit the models (Flora & Curran, 2004). To scale and identify the models, we used theta parameterization, which fixes the residual variances of the latent response variables to 1 in the reference group and allowed us to estimate residual variances of indicators in the remaining groups. We used the comparative fit index (CFI), Tucker–Lewis index (TLI), root mean square error of approximation (RMSEA), and standardized root mean square residual (SRMR) to assess model fit. Based on previously established conventions, CFI and TLI values greater than .95, RMSEA values less than .06, and SRMR values less than .08 generally indicate “good” fit (Hu & Bentler, 1999).
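
The item-retention rule and the fit conventions described above can be expressed as a short sketch. The function names, factor labels, and loading values are hypothetical, and the sketch applies the rule to raw (signed) loadings as stated in the text; it is not the code used in this study.

```python
def assign_items(loadings, cutoff=0.35):
    """Map each retained item to its strongest-loading factor.

    loadings: dict mapping item -> dict mapping factor -> EFA loading.
    Items whose loadings are below `cutoff` on every factor are dropped.
    """
    assignment = {}
    for item, by_factor in loadings.items():
        factor, value = max(by_factor.items(), key=lambda kv: kv[1])
        if value >= cutoff:
            assignment[item] = factor
    return assignment


def good_fit(cfi, tli, rmsea, srmr):
    """Hu and Bentler (1999) conventions as summarized in the text."""
    return cfi > 0.95 and tli > 0.95 and rmsea < 0.06 and srmr < 0.08


# Hypothetical loadings: item_1 cross-loads, item_2 loads weakly everywhere.
toy = {
    "item_1": {"Externalizing": 0.60, "Internalizing": 0.40},
    "item_2": {"Externalizing": 0.10, "Internalizing": 0.20},
}
kept = assign_items(toy)
```

Note that a purely mechanical rule of this kind would not reproduce judgment calls such as the assignment of Item 91, which was specified by the authors rather than derived from the size of its loadings alone.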

Measurement Invariance

We conducted tests of measurement invariance in five steps. First, we fit a baseline model in each of the three racial and ethnic groups separately. If a model fit well in all three groups, we fit a configural model in which factor loadings, item thresholds, and intercepts were freely estimated for each group in the same model. Good fit of this model indicates that the hypothesized factor structure has configural invariance. If the configural model fit well, we then fit a metric invariance model, constraining factor loadings and item thresholds to equivalence across groups. A CFI difference of less than .01 between the metric and the configural model indicates invariant factor loadings and/or item thresholds (Cheung & Rensvold, 2002). Metric invariance suggests that the relationships between items and their corresponding latent factors are equivalent across groups. Given metric invariance, we then tested scalar invariance, in which item intercepts were constrained to equivalence across groups. Evidence of scalar invariance indicates that observed mean differences in the latent factors are due to differences in levels of the latent factors rather than measurement errors. We then tested the final level of measurement invariance, strict invariance, which constrains item residual variances to equivalence across groups. All invariance analyses were conducted using Mplus 8.10. Survey propensity weights and research site identification number were used to adjust the standard errors of parameter estimates according to the complex survey design of the ABCD Study.
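
The sequential logic of these nested comparisons can be sketched as follows. This is an illustrative sketch only: the function name and CFI values are hypothetical, the configural model is assumed to have already shown acceptable fit, and the same .01 criterion is assumed to apply at every step.

```python
def highest_invariance_level(cfi_by_level, max_drop=0.01):
    """Return the most restrictive invariance level supported by the CFI rule.

    cfi_by_level: dict with CFI values for the "configural", "metric",
    "scalar", and "strict" models. A level is supported when CFI decreases
    by less than `max_drop` relative to the previous, less restrictive model.
    """
    levels = ["configural", "metric", "scalar", "strict"]
    supported = levels[0]  # assumes the configural model fit acceptably
    for previous, current in zip(levels, levels[1:]):
        drop = cfi_by_level[previous] - cfi_by_level[current]
        if drop >= max_drop:
            break  # stop at the first step that degrades fit too much
        supported = current
    return supported


# Hypothetical CFI values for the four nested models.
toy = {"configural": 0.960, "metric": 0.958, "scalar": 0.955, "strict": 0.951}
level = highest_invariance_level(toy)
```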

Scholars have challenged the sole reliance on fit indices when evaluating latent factor models and their invariance (Ferrando & Lorenzo-Seva, 2018; Greene et al., 2019, 2022; Waldman et al., 2023). We therefore conducted and reported likelihood ratio tests (LRTs) in the current study, because this aligns with current convention in measurement invariance research and because the chi-square statistic is the only index of the absolute fit of the model, which may provide useful information for future meta-analytic work (Fischer & Karl, 2019; Putnick & Bornstein, 2016; Svetina et al., 2019). However, we note that chi-square statistics tend to be sensitive to large sample sizes such that minimal group discrepancies in model parameters can lead to significant chi-square differences (Cheung & Rensvold, 2002; Kyriazos, 2018). Therefore, we did not rely on LRT results when assessing measurement invariance. In addition to fit indices, we also examined item factor loadings and item thresholds from the configural model and conducted visual and statistical comparisons across groups. Specifically, we tested differences in loadings (and their standard errors) for each factor across groups using general linear models and computed bootstrapped confidence intervals around the standard deviation of loadings for each group. Standard errors are useful indicators of the statistical precision with which loadings are estimated within each group. Standard deviations, in contrast, are useful in assessing the variability of loadings within each group. Partial eta-squared values were reported as a measure of effect size. We also computed Tucker’s Congruence Coefficient (TCC) (Lorenzo-Seva & ten Berge, 2006; Lovik et al., 2020) as a statistical measure of factor similarities across groups. The TCC was computed based on factor loadings in a common factor model, and a value larger than .95 indicates that the common factor can be considered equivalent across two groups.
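
For reference, Tucker’s Congruence Coefficient for two loading vectors x and y on the same items is the sum of their cross-products divided by the square root of the product of their sums of squares. A minimal sketch follows; the function name and loading values are hypothetical, and the sketch is not the code used in this study.

```python
import math


def tucker_congruence(x, y):
    """Tucker's Congruence Coefficient between two loading vectors
    defined on the same items in the same order."""
    numerator = sum(a * b for a, b in zip(x, y))
    denominator = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return numerator / denominator


# Hypothetical loadings of four items on one factor in two groups.
group_1 = [0.70, 0.65, 0.55, 0.60]
group_2 = [0.72, 0.60, 0.58, 0.61]
tcc = tucker_congruence(group_1, group_2)
```

Because the coefficient depends only on the pattern of loadings and not on their overall scale, proportional loading vectors yield a value of 1.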

Validation Analyses

To validate the latent factor models for each racial and ethnic group, we examined bivariate correlations among the latent factor scores and validators, including the Internalizing, Externalizing, and Attention Problems scores from the BPM—Teacher Report Form and from the BPM—Youth Self Report, Crystallized and Fluid Cognition scores from the NIH Toolbox, and past and present symptom counts for eight mental disorders from the K-SADS. We used structural equation modeling and added the validators into the model as exogenous variables, estimating their correlations with the CBCL latent factors. To adjust for the complex survey design, the correlation coefficients were estimated with research sites as a cluster variable and with propensity weights. The Benjamini and Hochberg method was used to adjust for multiple testing by controlling for the false discovery rate (Benjamini & Hochberg, 1995). To facilitate group comparisons in the correlations, we followed the recommended practice by transforming the correlation coefficients to Fisher’s z and computing Cohen’s q as the difference in Fisher’s z between any pair of two groups. Cohen’s q values of .10, .30, and .50 were interpreted as small, medium, and large effect sizes, respectively (Cohen, 1988; Zager Kocjan et al., 2021). Cohen’s q of less than .10 reflects minimal differences in correlation coefficients between any pair of groups.
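
The two computations described above, Cohen’s q and the Benjamini and Hochberg procedure, can be sketched as follows. This is a minimal illustration and not the code used in this study: the function names and input values are hypothetical, q is taken as an absolute difference, and the false discovery rate level of .05 is an assumption because the level is not stated in the text.

```python
import math


def cohens_q(r1, r2):
    """Absolute difference between two correlations after Fisher's z transform."""
    return abs(math.atanh(r1) - math.atanh(r2))


def interpret_q(q):
    """Label q using the .10 / .30 / .50 benchmarks given in the text."""
    if q < 0.10:
        return "minimal"
    if q < 0.30:
        return "small"
    if q < 0.50:
        return "medium"
    return "large"


def benjamini_hochberg(pvalues, fdr=0.05):
    """Return a list of booleans marking which tests are rejected.

    `fdr` = .05 is an assumed level, not one reported in the text.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank / m * fdr:
            cutoff_rank = rank  # largest rank meeting the step-up criterion
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[i] = True
    return rejected


# Hypothetical correlations of one latent factor with one validator in two groups.
q = cohens_q(0.50, 0.30)
label = interpret_q(q)
```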

Results

Descriptive Statistics

As shown in Table 1, Hispanic, non-Hispanic White, and non-Hispanic Black youths did not differ in terms of their age (in months) (M = 118.48, 119.19, 118.99; SE = 7.56, 7.54, 7.27, respectively), sex assigned at birth (female = 47.59%, 47.30%, 49.30%, respectively), parental age (in years) (M = 38.52, 41.33, 37.15; SE = 6.77, 6.04, 7.57, respectively), and parental gender (female = 90.52%, 88.04%, 92.75%, respectively). However, compared to parents of Hispanic and non-Hispanic Black youths, parents of non-Hispanic White youths reported a higher likelihood of having a partner (partnered = 89.48% vs. 79.76% and 54.76%, respectively), a higher level of education (M = 6.83, 5.12, 5.13; SE = 1.44, 2.19, 1.91, respectively), higher parental income (M = 5.27, 4.51, 4.21; SE = 2.81, 2.45, 2.30, respectively), higher family combined income (M = 8.21, 6.26, 5.15; SE = 1.64, 2.39, 2.65, respectively), and a higher likelihood of utilizing medical services (utilization = 58.98%, 54.79%, 55.70%, respectively). Please also see Supplemental Table S1 for more details about the sample characteristics, including operationalization of parental education, parental income, and family combined income.

EFA

We followed the procedures described in Michelini et al. (2019), where 18 items were removed due to low frequency (i.e., more than 98% of responses were “0”). A total of 88 CBCL items remained after quality control. We then conducted an EFA using the bass-ackwards method, specifying one to five factors. Consistent with the previous literature (Michelini et al., 2019) and based on a visual inspection of the scree plot and parallel analysis (Supplemental Figure S1), the five-factor (i.e., Externalizing, Neurodevelopmental, Internalizing, Somatoform, and Detachment) model emerged as the best-fitting model. We selected items that exhibited at least one factor loading larger than .35 and specified cross-loaded items to load onto the stronger-loading factor. In total, 72 items (out of 88) were retained in the CFA (Supplemental Table S2).

CFAs

The five-factor model (Figure 1) fit well in non-Hispanic Black youths (Table 2; CFI = .96, TLI = .96, RMSEA = .01, SRMR = .07), non-Hispanic White youths (CFI = .97, TLI = .97, RMSEA = .01, SRMR = .07), and Hispanic youths (CFI = .98, TLI = .98, RMSEA = .01, SRMR = .07), indicating that the five-factor model explained the covariation among the 72 CBCL items consistently well. The baseline model’s satisfactory performance in all three groups permitted us to test its invariance across race and ethnicity.

Figure 1.

Five-Factor Latent Model.

Table 2.

Model Fit Indices.

Model	CFI	TLI	RMSEA	SRMR	Chi-squared	df	Chi-squared Diff	p(>chisq)
Baseline-NH-w	.970	.969	.009	.065	2,951.72	2,474	NA	NA
Baseline-NH-B	.957	.956	.010	.065	4,240.09	2,474	NA	NA
Baseline-HI	.977	.976	.009	.068	2,837.35	2,474	NA	NA
Configural	.966	.965	.010	.066	10,023.45	7,422	NA	NA
Metric	.966	.965	.010	.066	10,169.25	7,556	388.86	<.001
Scalar	.967	.967	.010	.066	10,259.65	7,690	232.37	<.001
Strict	.966	.967	.010	.069	10,337.01	7,762	235.70	<.001

Note. NH-w = Non-Hispanic White; NH-B = Non-Hispanic Black; HI = Hispanic. CFI = Comparative Fit Index; TLI = Tucker–Lewis Index; RMSEA = root mean squared error of approximation; SRMR = standardized root mean squared residual; Chi-squared Diff = difference in chi-square statistics in the LRT analysis; chisq = chi-square statistic.

Tests of Measurement Invariance

Configural Invariance

The configural model performed well (CFI = .97, TLI = .97, RMSEA = .01, SRMR = .07, Table 2), suggesting that the same five-factor structure fit the data equally well in all three groups. We did not find significant loading differences across Hispanic, non-Hispanic White, and non-Hispanic Black youths, F(2, 213) = 1.25, p = .29, partial eta-squared = .01, for Externalizing (M = .71, .71, .74; SE = .02, .02, .02, respectively), Neurodevelopmental (M = .66, .66, .67; SE = .02, .02, .03, respectively), Internalizing (M = .69, .72, .72; SE = .03, .02, .03, respectively), Somatoform (M = .60, .58, .65; SE = .04, .03, .04, respectively), and Detachment (M = .72, .72, .73; SE = .02, .02, .02, respectively) (also see Supplemental Table S3). We also examined the statistical precision with which loadings were estimated in each group using the standard errors of factor loadings. Loadings were estimated precisely in all three groups, indicated by the small average standard errors of .02, .01, and .03 in Hispanic, non-Hispanic White, and non-Hispanic Black youths, respectively. Loadings were estimated more precisely for non-Hispanic White than for Hispanic and non-Hispanic Black youths, F(2, 213) = 13.06, p < .001, partial eta-squared = .11; however, this is likely due to the larger sample size of the non-Hispanic White group.

Factor loadings varied within each group but not between groups, illustrated by the overlapping ranges and similar medians for each factor across groups in the boxplot of item loadings (Figure 2). In addition, the standard deviation of loadings for each group had overlapping bootstrapped confidence intervals (95% bootstrapped confidence interval [CI] = [.10, .13], [.10, .14], and [.08, .12] in Hispanic, non-Hispanic White, and non-Hispanic Black youths, respectively), suggesting that the variability in factor loadings was similar across groups. The highest and lowest loaded items were generally the same in the three racial and ethnic groups (Supplemental Table S3). For instance, “sudden change in mood” ranked among the top loading items in the Externalizing factor for all three groups, and “speech problem” was the lowest loading item in the Neurodevelopmental factor in all three groups. TCCs were computed for each pair of group comparisons (i.e., Hispanic and non-Hispanic White, Hispanic and non-Hispanic Black, non-Hispanic Black and non-Hispanic White), and all TCCs were above .99 (Supplemental Table S4). This indicated that each of the factors did not differ significantly across the three groups based on item loadings.
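
A bootstrapped confidence interval around the standard deviation of loadings could be obtained along the following lines. This is a sketch assuming a simple percentile bootstrap over items; the text does not specify which bootstrap variant was used, and the function name, the number of resamples, and the loadings are hypothetical.

```python
import random
import statistics


def bootstrap_sd_ci(values, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the sample standard deviation of `values`."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(values)
    boot_sds = sorted(
        statistics.stdev(rng.choices(values, k=n))  # resample items with replacement
        for _ in range(n_boot)
    )
    lower = boot_sds[int((alpha / 2) * n_boot)]
    upper = boot_sds[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Hypothetical item loadings for one group (not the study's estimates).
example_loadings = [0.55, 0.62, 0.66, 0.68, 0.70, 0.71,
                    0.73, 0.75, 0.78, 0.80, 0.84, 0.88]

ci = bootstrap_sd_ci(example_loadings)
print(f"95% bootstrap CI for SD of loadings: [{ci[0]:.2f}, {ci[1]:.2f}]")
```

Intervals computed this way for each group can then be compared for overlap, as in the paragraph above.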

Figure 2.

Item Loading Boxplot.

Item thresholds exhibited between-group similarities as well. No differences were observed among the Hispanic, non-Hispanic White, and non-Hispanic Black youths for threshold 1 (the level of latent trait required to endorse a “1” vs. “0”), M = 1.06, 1.09, 1.08; SE = .06, .04, .06, respectively; F(2, 213) = 0.09, p = .92, partial eta-squared < .01, and for threshold 2, M = 2.18, 2.20, 2.14; SE = .09, .06, .09, respectively; F(2, 213) = 0.51, p = .60, partial eta-squared < .01, across items. Visual inspection of the boxplot of the item thresholds (Figure 3), the overlapping ranges, and the statistical non-significance of item threshold group comparisons provided evidence for within-group variations and between-group similarities in item thresholds.

Figure 3.

Item Threshold Boxplot.

Metric Invariance

The metric model fit the data very well (CFI = .97, TLI = .97, RMSEA = .01, SRMR = .07, Table 2). Importantly, the metric model exhibited a minimal difference in fit relative to the configural model (ΔCFI < .001, ΔTLI < .001, ΔRMSEA < .001, ΔSRMR < .001), indicating that item loadings and thresholds were statistically equivalent across groups. This is consistent with the high TCC values of each factor, the overlapping standard deviations of loadings across groups, and the non-significant differences in loadings and their standard errors across groups as reported in the Configural Invariance section above. We also performed LRTs to compare the absolute fit of the models at the four levels of invariance (Table 2). The metric invariance model fit worse than the configural model (Δχ² = 388.86, Δdf = 134). As the chi-square statistic is an absolute fit index (Alavi et al., 2020; Hu & Bentler, 1999), significant LRT results could stem from model parameters that were not identical but likely to be interpretatively similar across the racial and ethnic groups. Indeed, Figures 2 and 3 provided evidence that item loadings and item thresholds were highly similar but not statistically identical across groups. For this reason, we chose not to discuss results from the LRTs for the rest of the model comparisons; all results were still reported in Table 2 for completeness of reporting.

Scalar Invariance

The scalar invariance model fit the data well (CFI = .97, TLI = .97, RMSEA = .01, SRMR = .07, Table 2), exhibiting a minimal change in fit relative to the metric model (ΔCFI = .001, ΔTLI = .002, ΔRMSEA < .001, ΔSRMR < .001).

We generated standardized latent means and latent factor correlations from the scalar model to aid the assessment and interpretation of the model. For model identification purposes, latent means were fixed at 0 for a reference group, which was the Hispanic group in our analyses. Latent means for the other two groups were estimated in comparison to the Hispanic youth group (Table 3). Non-Hispanic White youths exhibited a higher latent mean for Internalizing (M = .20, SE = .08). Non-Hispanic Black youths, in contrast, exhibited lower latent means for Externalizing, Internalizing, Somatoform, and Detachment (M = −.27, −.32, −.44, and −.26; SE = .09, .09, .11, and .12, respectively).

Table 3.

CBCL Latent Factor Means and Correlation Coefficients.

Factor 1	Factor 2	Factor 1 latent means (SE)			Correlation coefficient (SE)
Factor 1	Factor 2	Hispanic	NH-w	NH-B	Hispanic	NH-w	NH-B
Externalizing	Neurodevelopmental	NA	.02 (.08)	-.27 (.09)	.84 (.01)^a	.82 (.01)^a	.85 (.01)^a
	Internalizing				.77 (.01)^a	.69 (.01)^a	.77 (.02)^a
	Somatoform				.63 (.02)^a	.56 (.02)^a	.58 (.04)^a
	Detachment				.73 (.02)^a	.69 (.01)	.68 (.02)^a
Neurodevelopmental	Internalizing	NA	.10 (.11)	.17 (.12)	.77 (.02)^a	.68 (.01)^a	.81 (.01)^a
	Somatoform				.65 (.03)^a	.60 (.02)^a	.60 (.03)^a
	Detachment				.72 (.02)^a	.70 (.02)^a	.75 (.02)^a
Internalizing	Somatoform	NA	.20 (.08)	-.32 (.09)	.70 (.02)^a	.67 (.01)^a	.72 (.03)^a
Internalizing	Detachment				.83 (.02)^a	.74 (.01)^a	.80 (.02)^a
Somatoform	Detachment	NA	.25 (.14)	-.44 (.11)	.70 (.03)^a	.58 (.03)^a	.66 (.03)^a
Detachment	−	NA	-.06 (.10)	-.26 (.12)	-	-	-

Note. NH-w = Non-Hispanic White; NH-B = Non-Hispanic Black. Superscripts indicate levels of statistical significance: ^a p < .001; ^b p < .01; ^c p < .05.

The five factors were strongly correlated with each other in Hispanic, non-Hispanic White, and non-Hispanic Black youths (Table 3), exhibiting average correlations (SE) of .73 (.02), .67 (.01), and .72 (.02), respectively. The strongest correlation was observed between Externalizing and Neurodevelopmental in the three groups (r = .84, .82, .85; SE = .01, .01, .01, respectively), whereas the weakest correlations were observed between Somatoform and Externalizing in the three groups (r = .63, .56, .58; SE = .02, .02, .04, respectively).
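
The summary values in this paragraph can be recomputed from the rounded correlations in Table 3. The sketch below only re-derives the group averages and the strongest and weakest factor pairs from those published values; the variable names are illustrative, and small discrepancies from unrounded estimates are possible.

```python
from statistics import fmean

# Factor pairs in the row order of Table 3.
PAIRS = [
    ("Externalizing", "Neurodevelopmental"), ("Externalizing", "Internalizing"),
    ("Externalizing", "Somatoform"), ("Externalizing", "Detachment"),
    ("Neurodevelopmental", "Internalizing"), ("Neurodevelopmental", "Somatoform"),
    ("Neurodevelopmental", "Detachment"), ("Internalizing", "Somatoform"),
    ("Internalizing", "Detachment"), ("Somatoform", "Detachment"),
]

# Rounded latent factor correlations from Table 3.
CORRELATIONS = {
    "Hispanic": [0.84, 0.77, 0.63, 0.73, 0.77, 0.65, 0.72, 0.70, 0.83, 0.70],
    "NH-w": [0.82, 0.69, 0.56, 0.69, 0.68, 0.60, 0.70, 0.67, 0.74, 0.58],
    "NH-B": [0.85, 0.77, 0.58, 0.68, 0.81, 0.60, 0.75, 0.72, 0.80, 0.66],
}

summary = {}
for group, values in CORRELATIONS.items():
    summary[group] = {
        "mean": round(fmean(values), 2),
        "strongest": PAIRS[values.index(max(values))],
        "weakest": PAIRS[values.index(min(values))],
    }
    print(group, summary[group])
```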

Strict Invariance

Strict invariance was supported by the model’s strong fit (CFI = .97, TLI = .97, RMSEA = .01, SRMR = .07, Table 2) and minimal difference relative to the scalar model (ΔCFI < .001, ΔTLI < .001, ΔRMSEA < .001, ΔSRMR = .003).
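
The successive fit comparisons across the four invariance levels can be tabulated from Table 2 as sketched below. The inputs are the rounded table values, so the resulting changes may differ slightly from the deltas reported in the text, which presumably rest on unrounded estimates. The .01 cutoff for a drop in CFI is a commonly cited heuristic and is an assumption here, not a criterion stated in this section.

```python
# Rounded fit indices for the nested invariance models (Table 2).
FIT = {
    "Configural": {"CFI": 0.966, "TLI": 0.965, "RMSEA": 0.010, "SRMR": 0.066},
    "Metric": {"CFI": 0.966, "TLI": 0.965, "RMSEA": 0.010, "SRMR": 0.066},
    "Scalar": {"CFI": 0.967, "TLI": 0.967, "RMSEA": 0.010, "SRMR": 0.066},
    "Strict": {"CFI": 0.966, "TLI": 0.967, "RMSEA": 0.010, "SRMR": 0.069},
}
ORDER = ["Configural", "Metric", "Scalar", "Strict"]

CFI_DROP_CUTOFF = 0.01  # assumed heuristic, not taken from this section


def fit_changes(fit, order):
    """Change in each index when moving to the next, more constrained model."""
    changes = {}
    for less_constrained, more_constrained in zip(order, order[1:]):
        changes[more_constrained] = {
            index: round(fit[more_constrained][index] - fit[less_constrained][index], 3)
            for index in fit[more_constrained]
        }
    return changes


changes = fit_changes(FIT, ORDER)
# A level is flagged as supported when CFI does not drop by more than the cutoff.
supported = {model: -delta["CFI"] <= CFI_DROP_CUTOFF for model, delta in changes.items()}

for model in ORDER[1:]:
    print(model, changes[model], "supported:", supported[model])
```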

Validation Analyses

Convergent and Discriminant Validation

We conducted all validation analyses via structural equation modeling, with each validator construct added as an exogenous variable in the five-factor CBCL model. With respect to convergent validity, we expected to observe robust correlations for the same/similar trait across different methods (i.e., “monotrait-heteromethod”). Convergent validity was consistently supported in our models. The CBCL Internalizing, Externalizing, and Neurodevelopmental latent factors were robustly correlated with Internalizing, Externalizing, and Attention Problems scores from the Teacher Report Form and Youth Self Report of the BPM, respectively, across all three racial and ethnic groups (see gray-shaded cells in Table 4). These correlations did not significantly differ when compared across the three racial and ethnic groups, as all Cohen’s q absolute values were smaller than .1 (Supplemental Tables S4 and S5). Convergent validity was further supported through the robust correlations observed between the CBCL latent factors and past and present symptom counts assessed from K-SADS (see gray-shaded cells in Table 5). CBCL Externalizing was robustly correlated with both past and present symptom counts for ODD and CD, which are characterized as externalizing disorders. Similarly, CBCL Internalizing was robustly correlated with past and present symptom counts for eating disorders, MDD, anxiety disorders, OCD, and PTSD, which are all characterized as forms of internalizing disorders. Finally, CBCL Neurodevelopmental was robustly correlated with ADHD, which is considered a neurodevelopmental disorder. Racial and ethnic differences in the correlations among the five latent factors and past and present symptom counts across eight mental disorders from the K-SADS were minimal to small, with most Cohen’s q values smaller than .1, and only three values larger than .2 (Supplemental Tables S4 and S6).

Table 4.

Correlations Between CBCL Latent Factors, BPM Teacher Report Form, BPM Youth Self Report, and NIH Toolbox Scores.

Timepoint	Validators	EXT			Neuro			INT
Timepoint	Validators	Hispanic	NH-w	NH-B	Hispanic	NH-w	NH-B	Hispanic	NH-w	NH-B
Baseline	EXT BPM Teacher	.33^a	.36^a	.37^a	.31^a	.30^a	.23^a	.11^b	.17^a	.11^b
6-month	EXT BPM Youth	.29^a	.35^a	.36^a	.25^a	.29^a	.29^a	.19^a	.17^a	.21^a
Baseline	ATT BPM Teacher	.31^a	.34^a	.36^a	.39^a	.40^a	.40^a	.11^c	.15^a	.09^b
6-month	ATT BPM Youth	.21^a	.29^a	.24^a	.29^a	.37^a	.31^a	.16^a	.17^a	.22^a
Baseline	INT BPM Teacher	.19^a	.25^a	.21^a	.25^a	.31^a	.24^a	.27^a	.29^a	.27^a
6-month	INT BPM Youth	.17^a	.18^a	.22^a	.20^a	.24^a	.24^a	.21^a	.26^a	.22^a
Baseline	NIH TB Fluid	-.11^a	-.12^a	-.09^a	-.16^a	-.20^a	-.13^a	-.03	-.09^a	-.02
Baseline	NIH TB Cryst	-.09^a	-.09^a	-.07^c	-.11^a	-.14^a	-.09^a	-.07^b	.02	.06^c
Timepoint		Validators			Soma			Detach
Timepoint		Validators			Hispanic	NH-w	NH-B	Hispanic	NH-w	NH-B
Baseline		EXT BPM Teacher			.00	.11^a	.08^b	.12^b	.19^a	.19^a
6-month		EXT BPM Youth			.03	.09^a	-.02	.07	.19^a	.07
Baseline		ATT BPM Teacher			.11^c	.15^a	.04	.26^a	.27^a	.22^a
6-month		ATT BPM Youth			.11^a	.14^a	.11^a	.20^a	.20^a	.22^a
Baseline		INT BPM Teacher			.13^a	.12^a	.09^a	.17^a	.17^a	.19^a
6-month		INT BPM Youth			.11^b	.16^a	.12^a	.19^a	.17^a	.19^a
Baseline		NIH TB Fluid			-.03	-.07^a	.01	-.04	-.10^a	-.07
Baseline		NIH TB Cryst			-.04	.00	.05	-.08^a	.01	.00

Note. The Timepoint column indicates when the validators were measured in ABCD. NH-w = Non-Hispanic White; NH-B = Non-Hispanic Black; EXT = Externalizing; Neuro = Neurodevelopmental; INT = Internalizing; Soma = Somatoform; Detach = Detachment; EXT BPM Teacher = Externalizing score from Brief Problem Monitor, Teacher Report Form; ATT BPM Teacher = Attention Problems score from Brief Problem Monitor, Teacher Report Form; INT BPM Teacher = Internalizing score from Brief Problem Monitor, Teacher Report Form; EXT BPM Youth = Externalizing score from Brief Problem Monitor, Youth Self Report; ATT BPM Youth = Attention Problems score from Brief Problem Monitor, Youth Self Report; INT BPM Youth = Internalizing score from Brief Problem Monitor, Youth Self Report; NIH TB Fluid = NIH Toolbox Fluid Cognition score; NIH TB Cryst = NIH Toolbox Crystallized Cognition score.
Superscripts indicate levels of statistical significance: ^a p <.001; ^b p <.01; ^c p <.05; correlation coefficients without superscripts were not statistically significant; gray-shaded cells represented hypothesized convergent validity, such that correlations between similar traits (assessed with different methods) should have higher correlations than different traits (assessed with different methods), as represented by the non-shaded cells.

Table 5.

Correlations Between CBCL Latent Factors and K-SADS Present and Past Symptom Counts.

	EXT			Neuro			INT
Variable	Hispanic	NH-w	NH-B	Hispanic	NH-w	NH-B	Hispanic	NH-w	NH-B
Eat K-SADS Present	.24^a	.22^a	.28^a	.27^a	.24^a	.31^a	.29^a	.20^a	.38^a
Eat K-SADS Past	.29^a	.32^a	.27^a	.27^a	.25^a	.32^a	.36^a	.36^a	.43^a
MDD K-SADS Present	.56^a	.68^a	.53^a	.48^a	.58^a	.47^a	.49^a	.56^a	.45^a
MDD K-SADS Past	.48^a	.52^a	.47^a	.45^a	.44^a	.47^a	.47^a	.50^a	.51^a
Anxiety K-SADS Present	.27^a	.32^a	.32^a	.33^a	.35^a	.36^a	.41^a	.42^a	.48^a
Anxiety K-SADS Past	.35^a	.33^a	.38^a	.41^a	.36^a	.41^a	.48^a	.48^a	.57^a
OCD K-SADS Present	.42^a	.41^a	.38^a	.50^a	.55^a	.49^a	.46^a	.49^a	.48^a
OCD K-SADS Past	.36^a	.33^a	.32^a	.47^a	.51^a	.42^a	.49^a	.44^a	.47^a
PTSD K-SADS Present	.41^a	.48^a	.35^a	.42^a	.44^a	.33^a	.50^a	.43^a	.43^a
PTSD K-SADS Past	.37^a	.36^a	.32^a	.35^a	.35^a	.31^a	.41^a	.37^a	.41^a
ADHD K-SADS Present	.59^a	.61^a	.63^a	.77^a	.83^a	.74^a	.46^a	.41^a	.52^a
ADHD K-SADS Past	.58^a	.58^a	.61^a	.72^a	.71^a	.69^a	.40^a	.33^a	.45^a
ODD K-SADS Present	.70^a	.71^a	.70^a	.54^a	.52^a	.53^a	.49^a	.37^a	.45^a
ODD K-SADS Past	.67^a	.68^a	.68^a	.51^a	.48^a	.52^a	.48^a	.40^a	.45^a
CD K-SADS Present	.56^a	.64^a	.65^a	.51^a	.45^a	.52^a	.31^a	.28^a	.42^a
CD K-SADS Past	.49^a	.63^a	.58^a	.37^a	.50^a	.48^a	.24^a	.33^a	.34^a
				Soma			Detach
Variable				Hispanic	NH-w	NH-B	Hispanic	NH-w	NH-B
Eat K-SADS Present				.36^a	.17^a	.36^a	.26^a	.22^a	.40^a
Eat K-SADS Past				.27^a	.35^a	.34^a	.31^a	.32^a	.34^a
MDD K-SADS Present				.40^a	.44^a	.40^a	.48^a	.63^a	.49^a
MDD K-SADS Past				.48^a	.40^a	.42^a	.53^a	.50^a	.48^a
Anxiety K-SADS Present				.26^a	.32^a	.38^a	.32^a	.33^a	.40^a
Anxiety K-SADS Past				.37^a	.37^a	.39^a	.38^a	.39^a	.41^a
OCD K-SADS Present				.35^a	.38^a	.33^a	.48^a	.41^a	.47^a
OCD K-SADS Past				.38^a	.27^a	.35^a	.38^a	.38^a	.43^a
PTSD K-SADS Present				.44^a	.36^a	.29^a	.42^a	.41^a	.33^a
PTSD K-SADS Past				.32^a	.34^a	.35^a	.24^a	.30^a	.35^a
ADHD K-SADS Present				.33^a	.31^a	.38^a	.32^a	.40^a	.44^a
ADHD K-SADS Past				.28^a	.25^a	.27^a	.25^a	.34^a	.36^a
ODD K-SADS Present				.38^a	.27^a	.35^a	.42^a	.40^a	.40^a
ODD K-SADS Past				.36^a	.30^a	.33^a	.42^a	.42^a	.37^a
CD K-SADS Present				.22^a	.24^a	.26^a	.37^a	.40^a	.42^a
CD K-SADS Past				.18^a	.32^a	.29^a	.26^a	.37^a	.33^a

Note. All K-SADS symptoms were assessed at baseline. NH-w = Non-Hispanic white; NH-B = Non-Hispanic Black; EXT = Externalizing; Neuro = Neurodevelopmental; INT = Internalizing; Soma = Somatoform; Detach = Detachment; Eat K-SADS = Eating disorder symptoms; MDD K-SADS = Major depressive disorder symptoms; Anxiety K-SADS = Anxiety disorder symptoms; OCD K-SADS = Obsessive compulsive disorder symptoms; PTSD K-SADS = Post-traumatic stress disorder symptoms; ADHD K-SADS = Attention-deficit/hyperactivity disorder symptoms; ODD K-SADS = Oppositional defiant disorder symptoms; CD K-SADS = Conduct disorder symptoms. Superscripts indicate the levels of statistical significance: ^a = p < .001; ^b = p < .01; ^c = p < .05; correlation coefficients without superscripts were not statistically significant; gray-shaded cells represented hypothesized convergent validity, such that correlations between similar traits (assessed with different methods) should have higher correlations than different traits, as represented by the non-shaded cells.

Support for discriminant validity of the CBCL latent factors, however, was mixed. Once again, for discriminant validity, we expected correlations between different traits measured with the same method (“heterotrait-monomethod”) to be lower than correlations observed between the same/similar trait measured with different methods (i.e., “monotrait-heteromethod”). In addition, we expected that correlations between different traits and different methods would be generally low (i.e., “heterotrait-heteromethod”). Some of the CBCL latent factors did not exhibit consistently robust evidence for discriminant validity with respect to the heterotrait comparisons we made using the BPM—Teacher Report Form and Youth Self Report. For instance, the CBCL Neurodevelopmental latent factor did not appear to be more correlated (by virtue of stronger effect sizes and reaching more stringent levels of statistical significance) with the Attention Problems subscale (r’s all ranging from .29 to .40, p’s < .001) than with the Internalizing subscale (r’s all ranging from .20 to .31, p’s < .001) of the BPM—Teacher Report Form and Youth Self Report (Table 4). In contrast, the CBCL Internalizing latent factor appeared to feature a consistently lower set of correlations with the Externalizing subscale of the BPM—Teacher Report Form and Youth Self Report (r’s all ranging from .09 to .22, p’s between .001 and .05) than the Internalizing subscale (r’s all ranging from .21 to .29, p’s < .001).
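
The Neurodevelopmental comparison above can be laid out directly from the rounded correlations in Table 4. The sketch below collects the correlations of the CBCL Neurodevelopmental factor with the BPM Attention Problems and Internalizing scores (teacher and youth reports) and checks whether the two ranges overlap. The overlap check is an illustrative device of this sketch, not a formal test used in the study.

```python
# CBCL Neurodevelopmental correlations from Table 4, as (teacher, youth) pairs.
NEURO_WITH_ATTENTION = {
    "Hispanic": (0.39, 0.29),
    "NH-w": (0.40, 0.37),
    "NH-B": (0.40, 0.31),
}
NEURO_WITH_INTERNALIZING = {
    "Hispanic": (0.25, 0.20),
    "NH-w": (0.31, 0.24),
    "NH-B": (0.24, 0.24),
}


def value_range(table):
    """Smallest and largest correlation across groups and informants."""
    values = [r for pair in table.values() for r in pair]
    return min(values), max(values)


attention_range = value_range(NEURO_WITH_ATTENTION)
internalizing_range = value_range(NEURO_WITH_INTERNALIZING)

# The ranges overlap when the lowest same-trait correlation does not exceed
# the highest different-trait correlation.
ranges_overlap = attention_range[0] <= internalizing_range[1]

print("Attention Problems range:", attention_range)
print("Internalizing range:", internalizing_range)
print("Ranges overlap:", ranges_overlap)
```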

A mixed set of results also emerged with respect to our discriminant validity analyses using past and present symptom counts for eight mental disorders measured with the K-SADS (see Table 5). For example, while the CBCL Externalizing latent factor was robustly correlated with past and present symptom counts for ODD and CD as expected (r’s ranging from .49 to .71, p’s < .001), we also observed robust correlations between this latent factor and all non-externalizing disorder symptom counts (i.e., eating disorders, MDD, anxiety disorders, OCD, PTSD), with r’s as high as .68 for MDD and all p’s < .001. Similarly, the CBCL Internalizing latent factor did not appear to be consistently more correlated with other internalizing mental disorders than with either ADHD (a neurodevelopmental disorder) or ODD (an externalizing disorder).

Concurrent Validity

Concurrent validity is supported when a construct correlates with theoretically related constructs from measures administered at the same time as the focal construct. With a few exceptions as noted herein, there was generally consistent evidence for the concurrent validity of each of the CBCL latent factors in that Crystallized and Fluid Cognition scores from the NIH Toolbox were significantly and negatively correlated with Externalizing and Neurodevelopmental latent factors across the three racial and ethnic groups (all p’s < .001) (see Table 4). Fluid Cognition was negatively correlated with the CBCL Internalizing latent factor in non-Hispanic White youths (p < .001), but not in Hispanic and non-Hispanic Black youths, and negatively correlated with the Somatoform and Detachment factors in non-Hispanic White youths only (p < .001). The Crystallized Cognition score was significantly and negatively correlated with the CBCL Internalizing latent factor in Hispanic youths (p < .01) and in non-Hispanic Black youths (p < .05), and negatively correlated with the Detachment factor in Hispanic youths only (p < .001). Racial and ethnic differences in the correlations among the five latent factors and Crystallized and Fluid Cognition scores from the NIH Toolbox were minimal to small, with only one Cohen’s q absolute value larger than .1 (Supplemental Tables S4 and S5).

Discussion

Replicating and extending prior findings (Michelini et al., 2019; Stewart et al., 2024), we first identified full measurement invariance of the five-factor model across Hispanic, non-Hispanic White, and non-Hispanic Black youths, in which the items loaded strongly onto the following factors: Externalizing, Neurodevelopmental, Internalizing, Somatoform, and Detachment. However, evidence for the overall construct validity of these models was mixed. The five CBCL latent factors demonstrated consistent evidence of convergent and concurrent validity across each racial and ethnic group but mixed evidence of discriminant validity. This raises the broader question of whether our HiTOP latent factors are conceptually distinct enough from one another.

Findings from this study support HiTOP as a promising taxonomic framework of psychopathology across various demographics, groups, and methods, including racial and ethnic groups, age, gender, informants, and measurement tools (Eaton et al., 2012, 2013; He & Li, 2021; Hoffmann et al., 2022; Ivanova et al., 2019; Lahey et al., 2017; Ringwald et al., 2023; Stewart et al., 2024). Our study adds to a growing literature suggesting that the HiTOP model is robust across race and ethnicity among youths and adults. Critically, establishing measurement invariance provides HiTOP researchers greater confidence to interpret differences in etiological or psychosocial associations between groups and across the dimensions. In addition, the prevailing approach to assessing measurement invariance of psychopathology models has emphasized fit indices and their differences between models (e.g., Bieda et al., 2017; Eaton et al., 2013; He & Li, 2021; Zager Kocjan et al., 2021). Our study rigorously established measurement invariance by testing for configural, metric, scalar, and strict invariance and utilizing multiple indices to evaluate invariance including alternative fit indices, the TCC, and statistical tests of loading differences across groups. Relying solely on fit indices has several limitations, including their sensitivity to sample size and model complexity, the use of arbitrary cut-point values, inconsistent conclusions about model performance, and a bias toward bifactor models (Ferrando & Lorenzo-Seva, 2018; Fischer & Karl, 2019; Greene et al., 2019, 2022; Waldman et al., 2023). Furthermore, chi-square statistics are computed within a null hypothesis testing framework, providing information on the statistical significance of hypotheses rather than their magnitude or practical significance (Kirk, 1996; Peugh & Feldon, 2020). 
In our study, for instance, statistically significant LRT results likely reflect group variations in model parameters that are small and not practically meaningful. Therefore, in addition to reporting fit indices, we incorporated alternative criteria such as the TCC, effect sizes of group differences in item loadings and their standard errors and standard deviations, as well as group differences in item thresholds to assess the quality and equivalence of our latent model across groups. Examining and reporting these model characteristics are crucial for interpreting the validity and generalizability of a model.

Another significant contribution of our study was that we leveraged the classical Multitrait-Multimethod framework by Campbell and Fiske (1959) to rigorously test for convergent, discriminant, and concurrent forms of validity with respect to our CBCL latent factors across racial and ethnic groups. Our validators included different measures and informants (i.e., Teacher and Youth Self Reports from the BPM, caregiver ratings from the K-SADS, and youth scores from the NIH Toolbox). While each of the CBCL latent factors exhibited consistent evidence of convergent and concurrent validity, the discriminant validity of the CBCL latent factors was not nearly as robust. We highlighted the example of the CBCL Neurodevelopmental latent factor, which did not appear to be any more correlated with the Attention Problems subscale (as would be expected) than with the Internalizing subscale as measured in either the BPM—Teacher Report Form or the Youth Self Report. The CBCL Internalizing latent factor did not appear to be any more correlated with K-SADS internalizing disorders (e.g., MDD, anxiety disorders) than externalizing ones (e.g., ODD, CD), as correlations were uniformly robust across each pairwise trait and across racial and ethnic groups. This may indicate the lack of conceptual distinction between these putatively distinct traits (i.e., neurodevelopmental and internalizing disorders), which may not be inherently surprising considering the omnipresence of the p-factor in HiTOP studies (Michelini et al., 2019; Watts et al., 2022). The p-factor may have significantly attenuated the uniqueness of any single latent factor in our models, resulting in the relatively weak evidence of discriminant validity for each of the CBCL latent factors across racial and ethnic groups. Supporting this conclusion is that our cross-factor bivariate correlations consistently show between-factor correlations above .50 across racial and ethnic subgroups. 
Given that our study aimed to replicate the factor structure from Michelini et al., it may be worth testing alternative factor models beyond the five-factor correlated factor model, as increasing the number of latent factors might yield more orthogonality between the latent dimensions.

The p-factor notwithstanding, the weak evidence of discriminant validity of the CBCL latent factor models raises the broader question of whether HiTOP latent factors are distinct enough from one another to have meaningful utility for clinical science research. As it stands, the relatively scant HiTOP-aligned studies that have rigorously tested the validity of their measures and models have produced rather uninspiring conclusions with respect to this aspect of validity. Funkhouser and colleagues (2021) raised concerns about the weak criterion and discriminant validity and the reliability of HiTOP models when estimated using a bifactor model in a community sample (N = 504). Zimmermann and colleagues (2022) developed preliminary scales for the HiTOP Detachment spectrum using an MTurk and university-based sample and found “lower, but still substantial associations” with putatively unassociated traits from a different questionnaire (i.e., heterotrait-heteromethod), leading them to conclude that their scale only showed “some evidence” of convergent and discriminant validity. Contending with weak discriminant validity was also prominent in the development of the HiTOP Internalizing measure (Watson, Forbes, et al., 2022). We note that discriminant validation is complicated by the fact that there is no scientific consensus as to how discriminant validity ought to be established, as some studies consider it as a matter of varying degrees (e.g., degree to which the absolute value of the correlation between two constructs differs from one), whereas others define it as a dichotomous attribute (e.g., complete absence of a significant correlation between two constructs) (see excellent review by Rönkkö & Cho, 2020). We found Campbell’s (1960) classical conceptualization of discriminant validity to be instructive for its simplicity and practicality: “tests [should] not correlate too highly with measures from which it is supposed to differ” (Campbell, 1960, p. 548). 
Through this lens, few HiTOP-aligned studies, including ours, have provided convincing evidence of discriminant validity. In addition, the p-factor does not entirely explain the weak discriminant validity in HiTOP measures and constructs. Measurement artifacts, such as common methods variance, excessive redundancy and overlapping items, and unmeasured third variable effects (e.g., sociodemographic factors, cognitive abilities), may also be at play.

It is important to note that the current study did not aim to test and explain group differences in latent factors and their correlations with relevant constructs. As noted by several others (Bernard et al., 2021; Causadias & Cicchetti, 2018; Coll et al., 1996; Rodriguez-Seijas, Li, et al., 2023; Schwartz & Meyer, 2010), careful follow-up research is needed to fully and rigorously contextualize these types of findings by systematically investigating plausible causal mechanisms underlying HiTOP dimensions within and between racial and ethnic groups. Establishing that the higher-order latent dimensions are invariant and valid across race and ethnicity is critical for establishing the clinical utility of HiTOP (Cicero & Ruggero, 2021).

Several limitations of the current study should be mentioned. First, we did not test for individual item-level invariance of the CBCL across groups. Such tests would entail extensive differential item functioning analyses and were beyond the scope of the current study, which aimed to examine invariance at the level of latent factors (i.e., factorial invariance) (D’Urso et al., 2022; Kim & Yoon, 2011). Items that exhibit non-invariance may introduce systematic measurement error, particularly when only a small number of items are used to measure an underlying construct. Future studies can build on the current study and examine item-level invariance of measures of psychopathology. Second, contextual (e.g., sociodemographic) information was not accounted for in our models. For example, ABCD families of non-Hispanic White youths had higher incomes and greater educational attainment compared to families of Hispanic and non-Hispanic Black youths. Crucially, socioeconomic status may impact not only the experience and expression of youth mental health traits and conditions but also parents’ reporting of their offspring’s mental health (Crijnen et al., 1999). Follow-up studies focusing on measurement invariance should account for contextual factors, such as socioeconomic status, which may affect item responses and subsequent model fit across groups. Third, our analyses focused only on the three most prevalent racial and ethnic groups in the dataset. Youths from other racial and ethnic backgrounds could not be included in the analyses because their sample sizes were very limited relative to a rather complex model. Thus, the question of whether the HiTOP model is invariant across all racial and ethnic groups in the United States remains largely unanswered. Fourth, we focused our analysis only on the baseline wave of data collection, when the participants were aged 9–10 years. It is possible that the model we identified here may not be invariant at follow-up assessments, thus necessitating replication of this work at later follow-ups.

In summary, our study conducted extensive tests to validate the HiTOP model across several (but not all) racial and ethnic youth groups in the United States. We provided strong evidence for racial and ethnic measurement invariance of the five-factor model, as well as its convergent and concurrent validity, but found weak evidence for discriminant validity. In line with recent calls for a more equitable clinical science (Gordon, 2020; Rodriguez-Seijas, McClendon, et al., 2023), our study highlights the importance of testing the construct validity of mental health measures across diverse populations, especially if they are to be used and interpreted for minoritized populations.

Supplemental Material

Supplemental material, sj-docx-1-asm-10.1177_10731911251391567 for Invariance and Construct Validity of HiTOP Dimensions Across Race and Ethnicity in the Adolescent Brain and Cognitive Development (ABCD) Study by James J. Li, Quanfa He, Irwin D. Waldman and Craig Rodriguez-Seijas in Assessment

Footnotes

Authors’ Note

A listing of participating sites and a complete listing of the study investigators can be found at . ABCD consortium investigators designed and implemented the study and/or provided data but did not necessarily participate in the analysis or writing of this report. This manuscript reflects the views of the authors and may not reflect the opinions or views of the National Institutes of Health (NIH) or ABCD consortium investigators.

Declaration of Conflicting Interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The ABCD Study® is supported by the National Institutes of Health and additional federal partners under award numbers U01DA041048, U01DA050989, U01DA051016, U01DA041022, U01DA051018, U01DA051037, U01DA050987, U01DA041174, U01DA041106, U01DA041117, U01DA041028, U01DA041134, U01DA050988, U01DA051039, U01DA041156, U01DA041025, U01DA041120, U01DA051038, U01DA041148, U01DA041093, U01DA041089, U24DA041123, and U24DA041147. A full list of supporters is available at . James J. Li was supported by grants from the Wisconsin Alumni Research Foundation and the National Institute of Mental Health (R01 MH134039, R01 MH128371), and in part by a core grant to the Waisman Center from the Eunice Kennedy Shriver National Institute of Child Health and Human Development (P50 HD105353).

Data Availability Statement

Data used in the preparation of this article were obtained from the Adolescent Brain Cognitive Development (ABCD) Study (https://abcdstudy.org), held in the NIMH Data Archive (NDA). This is a multisite, longitudinal study designed to recruit more than 10,000 children age 9–10 years and follow them over 10 years into early adulthood. The ABCD data repository grows and changes over time. The ABCD data used in this report came from https://nda.nih.gov/study.html?id=901. DOIs can be found at .

References

Achenbach, T. M. (1999). The Child Behavior Checklist and related instruments. In M. E. Maruish (Ed.), The use of psychological testing for treatment planning and outcomes assessment (2nd ed., pp. 429–466). Lawrence Erlbaum.

Achenbach, T. M., McConaughy, S. H., Ivanova, M. Y., & Rescorla, L. A. (2011). Manual for the ASEBA Brief Problem Monitor™ for Ages 6–18. ASEBA.

Achenbach, T. M., & Rescorla, L. A. (2004). The Achenbach System of Empirically Based Assessment (ASEBA) for ages 1.5 to 18 years. In The use of psychological testing for treatment planning and outcomes assessment: Instruments for children and adolescents (Vol. 2, 3rd ed., pp. 179–213). Lawrence Erlbaum.

Ahmed, & Conway, C. A. (2020). Medical and mental health comorbidities among minority racial/ethnic groups in the United States. Journal of Social, Behavioral, and Health Sciences, 14(1), 153–168. https://doi.org/10.5590/JSBHS.2020.14.1.11

Akshoomoff, Beaumont, J. L., Bauer, P. J., Dikmen, Gershon, Mungas, Slotkin, Tulsky, Weintraub, Zelazzo, & Heaton, R. K. (2013). NIH Toolbox Cognitive Function Battery (CFB): Composite scores of crystallized, fluid, and overall cognition. Monographs of the Society for Research in Child Development, 78(4), 119–132. https://doi.org/10.1111/mono.12038

Alavi, Visentin, D. C., Thapa, D. K., Hunt, G. E., Watson, & Cleary (2020). Chi-square for model fit in confirmatory factor analysis. Journal of Advanced Nursing, 76(9), 2209–2211. https://doi.org/10.1111/jan.14399

American Psychiatric Association. (2013). Diagnostic and statistical manual of mental disorders (5th ed.). https://www.psychiatry.org/psychiatrists/practice/dsm

Bagby, R. M., Keeley, J. W., Williams, C. C., Mortezaei, Ryder, A. G., & Sellbom (2022). Evaluating the measurement invariance of the Personality Inventory for DSM-5 (PID-5) in Black Americans and White Americans. Psychological Assessment, 34(1), 82–90. https://doi.org/10.1037/pas0001085

Benjamini, & Hochberg (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society. Series B (Methodological), 57(1), 289–300.

Bernard, D. L., Calhoun, C. D., Banks, D. E., Halliday, C. A., Hughes-Halbert, & Danielson, C. K. (2021). Making the “C-ACE” for a culturally-informed adverse childhood experiences framework to understand the pervasive mental health impact of racism on Black youth. Journal of Child & Adolescent Trauma, 14, 233–247. https://doi.org/10.1007/s40653-020-00319-9

Bieda, Hirschfeld, Schönfeld, Brailovskaia, Zhang, X. C., & Margraf (2017). Universal happiness? Cross-cultural measurement invariance of scales assessing positive mental health. Psychological Assessment, 29(4), 408–421. https://doi.org/10.1037/pas0000353

Bitsko, R. H., Holbrook, J. R., Ghandour, R. M., Blumberg, S. J., Visser, S. N., Perou, & Walkup, J. T. (2018). Epidemiology and impact of health care provider-diagnosed anxiety and depression among US children. Journal of Developmental and Behavioral Pediatrics: JDBP, 39(5), 395–403. https://doi.org/10.1097/DBP.0000000000000571

Borsboom (2006). When does measurement invariance matter? Medical Care, 44(11), S176–S181.

Brown, T. N., Sellers, S. L., Brown, K. T., & Jackson, J. S. (2013). Race, ethnicity, and culture in the sociology of mental health. In C. S. Aneshensel & J. C. Phelan (Eds.), Handbook of the sociology of mental health (pp. 167–182). Springer. https://doi.org/10.1007/0-387-36223-1_9

Buchanan, N. T., Perez, Prinstein, M. J., & Thurston, I. B. (2021). Upending racism in psychological science: Strategies to change how science is conducted, reported, reviewed, and disseminated. The American Psychologist, 76(7), 1097–1112. https://doi.org/10.1037/amp0000905

Campbell, D. T. (1960). Recommendations for APA test standards regarding construct, trait, or discriminant validity. American Psychologist, 15(8), 546–553.

Campbell, D. T., & Fiske, D. W. (1959). Convergent and discriminant validation by the multitrait-multimethod matrix. Psychological Bulletin, 56(2), 81–104.

Causadias, J. M., & Cicchetti (2018). Cultural development and psychopathology. Development and Psychopathology, 30(5), 1549–1555. https://doi.org/10.1017/S0954579418001220

Cénat, J. M., Broussard, Jacob, Kogan, Corace, Ukwu, Onesi, Furyk, S. E., Bekarkhanechi, F. M., Williams, Chomienne, M.-H., Grenier, & Labelle, P. R. (2024). Antiracist training programs for mental health professionals: A scoping review. Clinical Psychology Review, 108, 102373. https://doi.org/10.1016/j.cpr.2023.102373

Cheung, G. W., & Rensvold, R. B. (2002). Evaluating goodness-of-fit indexes for testing measurement invariance. Structural Equation Modeling: A Multidisciplinary Journal, 9(2), 233–255. https://doi.org/10.1207/S15328007SEM0902_5

Cicero, D. C., & Ruggero, C. J. (2021). Commentary—Opening a can of worms: The importance of testing the measurement invariance of hierarchical models of psychopathology—a commentary on He and Li (2020). Journal of Child Psychology and Psychiatry, 62(3), 299–302. https://doi.org/10.1111/jcpp.13353

Cicero, D. C., Ruggero, C. J., Balling, C. E., Bottera, A. R., Cheli, Elkrief, Forbush, K. T., Hopwood, C. J., Jonas, K. G., Jutras-Aswad, Kotov, Levin-Aspenson, H. F., Mullins-Sweatt, S. N., Johnson-Munguia, Narrow, W. E., Negi, Patrick, C. J., Rodriguez-Seijas, Sheth, ... Thomeczek, M. L. (2024). State of the science: The hierarchical taxonomy of psychopathology (HiTOP). Behavior Therapy, 55(6), 1114–1129. https://doi.org/10.1016/j.beth.2024.05.001

Cohen (1988). Statistical power analysis for the behavioral sciences. Routledge. https://doi.org/10.4324/9780203771587

Coll, C. G., Crnic, Lamberty, Wasik, B. H., Jenkins, García, H. V., & McAdoo, H. P. (1996). An integrative model for the study of developmental competencies in minority children. Child Development, 67(5), 1891–1914. https://doi.org/10.1111/j.1467-8624.1996.tb01834.x

Conway, C. C., Kotov, Krueger, R. F., & Caspi (2023). Translating the hierarchical taxonomy of psychopathology (HiTOP) from potential to practice: Ten research questions. American Psychologist, 78(7), 873–885. https://doi.org/10.1037/amp0001046

Crijnen, A. A. M., Achenbach, T. M., & Verhulst, F. C. (1999). Problems reported by parents of children in multiple cultures: The Child Behavior Checklist Syndrome constructs. American Journal of Psychiatry, 156(4), 569–574. https://doi.org/10.1176/ajp.156.4.569

Cronbach, L. J., & Meehl, P. E. (1955). Construct validity in psychological tests. Psychological Bulletin, 52, 281–302.

De Los Reyes, Augenstein, T. M., Wang, Thomas, S. A., Drabick, D. A. G., Burgers, D. E., & Rabinowitz (2015). The validity of the multi-informant approach to assessing child and adolescent mental health. Psychological Bulletin, 141(4), 858–900. https://doi.org/10.1037/a0038498

D’Urso, E. D., De Roover, Vermunt, J. K., & Tijmstra (2022). Scale length does matter: Recommendations for measurement invariance testing with categorical factor analysis and item response theory approaches. Behavior Research Methods, 54(5), 2114–2145. https://doi.org/10.3758/s13428-021-01690-7

Eaton, N. R., Keyes, K. M., Krueger, R. F., Balsis, Skodol, A. E., Markon, K. E., Grant, B. F., & Hasin, D. S. (2012). An invariant dimensional liability model of gender differences in mental disorder prevalence: Evidence from a national sample. Journal of Abnormal Psychology, 121(1), 282–288. https://doi.org/10.1037/a0024780

Eaton, N. R., Keyes, K. M., Krueger, R. F., Noordhof, Skodol, A. E., Markon, K. E., Grant, B. F., & Hasin, D. S. (2013). Ethnicity and psychiatric comorbidity in a national sample: Evidence for latent comorbidity factor invariance and connections with disorder prevalence. Social Psychiatry and Psychiatric Epidemiology, 48(5), 701–710. https://doi.org/10.1007/s00127-012-0595-5

Ferrando, P. J., & Lorenzo-Seva (2018). Assessing the quality and appropriateness of factor solutions and factor score estimates in exploratory item factor analysis. Educational and Psychological Measurement, 78(5), 762–780. https://doi.org/10.1177/0013164417719308

Fischer, & Karl, J. A. (2019). A primer to (cross-cultural) multi-group invariance testing possibilities in R. Frontiers in Psychology, 10, 1507. https://doi.org/10.3389/fpsyg.2019.01507

Flora, D. B., & Curran, P. J. (2004). An empirical evaluation of alternative methods of estimation for confirmatory factor analysis with ordinal data. Psychological Methods, 9(4), 466–491. https://doi.org/10.1037/1082-989X.9.4.466

Forbes, M. K., Watts, A. L., Twose, Barrett, Hudson, J. L., Lyneham, H. J., McLellan, Newton, N. C., Sicouri, Chapman, McKinnon, Rapee, R. M., Slade, Teesson, Markon, & Sunderland (2024). A hierarchical model of the symptom-level structure of psychopathology in youth. Clinical Psychological Science, 13(2), 278–300. https://doi.org/10.1177/21677026241257852

Funkhouser, C. J., Correa, K. A., Letkiewicz, A. M., Cozza, E. M., Estabrook, & Shankman, S. A. (2021). Evaluating the criterion validity of hierarchical psychopathology dimensions across models: Familial aggregation and associations with research domain criteria (sub)constructs. Journal of Abnormal Psychology, 130(6), 575–586. https://doi.org/10.1037/abn0000687

Gordon (2020, June 19). Racism and mental health research: Steps toward equity. National Institute of Mental Health (NIMH). https://www.nimh.nih.gov/about/director/messages/2020/racism-and-mental-health-research-steps-toward-equity

Greene, A. L., Eaton, N. R., Forbes, M. K., Fried, E. I., Watts, A. L., Kotov, & Krueger, R. F. (2022). Model fit is a fallible indicator of model quality in quantitative psychopathology research: A reply to Bader and Moshagen. Journal of Psychopathology and Clinical Science, 131, 696–703. https://doi.org/10.1037/abn0000770

Greene, A. L., Eaton, N. R., Forbes, M. K., Krueger, R. F., Markon, K. E., Waldman, I. D., Cicero, D. C., Conway, C. C., Docherty, A. R., Fried, E. I., Ivanova, M. Y., Jonas, K. G., Latzman, R. D., Patrick, C. J., Reininghaus, Tackett, J. L., Wright, A. G. C., & Kotov (2019). Are fit indices used to test psychopathology structure biased? A simulation study. Journal of Abnormal Psychology, 128(7), 740–764. https://doi.org/10.1037/abn0000434

He, Q., & Li, J. J. (2021). Factorial invariance in hierarchical factor models of mental disorders in African American and European American youths. Journal of Child Psychology and Psychiatry, and Allied Disciplines, 62(3), 289–298. https://doi.org/10.1111/jcpp.13243

Hillemeier, M. M., Foster, E. M., Heinrichs, & Heier (2007). Racial differences in parental reports of attention-deficit/hyperactivity disorder behaviors. Journal of Developmental and Behavioral Pediatrics, 28(5), 353–361. https://doi.org/10.1097/DBP.0b013e31811ff8b8

Hoffmann, M. S., Moore, T. M., Axelrud, L. K., Tottenham, Zuo, X.-N., Rohde, L. A., Milham, M. P., Satterthwaite, T. D., & Salum, G. A. (2021). Reliability and validity of bifactor models of dimensional psychopathology in youth from three continents. medRxiv. https://doi.org/10.1101/2021.06.27.21259601

Hoffmann, M. S., Moore, T. M., Kvitko Axelrud, Tottenham, Zuo, X.-N., Rohde, L. A., Milham, M. P., Satterthwaite, T. D., & Salum, G. A. (2022). Reliability and validity of bifactor models of dimensional psychopathology in youth. Journal of Psychopathology and Clinical Science, 131(4), 407–421. https://doi.org/10.1037/abn0000749

Hu, L., & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling: A Multidisciplinary Journal, 6(1), 1–55. https://doi.org/10.1080/10705519909540118

Ivanova, M. Y., Achenbach, T. M., Rescorla, L. A., Guo, Althoff, R. R., Kan, K.-J., Almqvist, Begovac, Broberg, A. G., Chahed, da Rocha, M. M., Dobrean, Döepfner, Erol, Fombonne, Fonseca, A. C., Forns, Frigerio, Grietens, ... Verhulst, F. C. (2019). Testing syndromes of psychopathology in parent and youth ratings across societies. Journal of Clinical Child & Adolescent Psychology, 48(4), 596–609. https://doi.org/10.1080/15374416.2017.1405352

Kaufman, J., Birmaher, B., Brent, D., Rao, U., Flynn, C., Moreci, P., Williamson, D., & Ryan, N. (1997). Schedule for Affective Disorders and Schizophrenia for School-Age Children-Present and Lifetime Version (K-SADS-PL): Initial reliability and validity data. Journal of the American Academy of Child & Adolescent Psychiatry, 36(7), 980–988.

Kim, E. S., & Yoon (2011). Testing measurement invariance: A comparison of multiple-group categorical CFA and IRT. Structural Equation Modeling: A Multidisciplinary Journal, 18(2), 212–228. https://doi.org/10.1080/10705511.2011.557337

Kirk, R. E. (1996). Practical significance: A concept whose time has come. Educational and Psychological Measurement, 56(5), 746–759. https://doi.org/10.1177/0013164496056005002

Kotov, Cicero, D. C., Conway, C. C., DeYoung, C. G., Dombrovski, Eaton, N. R., First, M. B., Forbes, M. K., Hyman, S. E., Jonas, K. G., Krueger, R. F., Latzman, R. D., Li, J. J., Nelson, B. D., Regier, D. A., Rodriguez-Seijas, Ruggero, C. J., Simms, L. J., Skodol, A. E., ... Wright, A. G. C. (2022). The Hierarchical Taxonomy of Psychopathology (HiTOP) in psychiatric practice and research. Psychological Medicine, 52(9), 1666–1678. https://doi.org/10.1017/S0033291722001301

Kotov, Jonas, K. G., Carpenter, W. T., Dretsch, M. N., Eaton, N. R., Forbes, M. K., Forbush, K. T., Hobbs, Reininghaus, Slade, South, S. C., Sunderland, Waszczuk, M. A., Widiger, T. A., Wright, A. G. C., Zald, D. H., Krueger, R. F., Watson, & HiTOP Utility Workgroup. (2020). Validity and utility of Hierarchical Taxonomy of Psychopathology (HiTOP): I. Psychosis superspectrum. World Psychiatry, 19(2), 151–172. https://doi.org/10.1002/wps.20730

Kotov, Krueger, R. F., Watson, Achenbach, T. M., Althoff, R. R., Bagby, R. M., Brown, T. A., Carpenter, W. T., Caspi, Clark, L. A., Eaton, N. R., Forbes, M. K., Forbush, K. T., Goldberg, Hasin, Hyman, S. E., Ivanova, M. Y., Lynam, D. R., Markon, ... Zimmerman (2017). The Hierarchical Taxonomy of Psychopathology (HiTOP): A dimensional alternative to traditional nosologies. Journal of Abnormal Psychology, 126(4), 454–477. https://doi.org/10.1037/abn0000258

Krueger, R. F., Hobbs, K. A., Conway, C. C., Dick, D. M., Dretsch, M. N., Eaton, N. R., Forbes, M. K., Forbush, K. T., Keyes, K. M., Latzman, R. D., Michelini, Patrick, C. J., Sellbom, Slade, South, S. C., Sunderland, Tackett, Waldman, Waszczuk, M. A., ... HiTOP Utility Workgroup. (2021). Validity and utility of Hierarchical Taxonomy of Psychopathology (HiTOP): II. Externalizing superspectrum. World Psychiatry, 20(2), 171–193. https://doi.org/10.1002/wps.20844

Kyriazos, T. A. (2018). Applied psychometrics: Sample size and sample power considerations in factor analysis (EFA, CFA) and SEM in general. Psychology, 9(8), Article 8. https://doi.org/10.4236/psych.2018.98126

Lahey, B. B., Krueger, R. F., Rathouz, P. J., Waldman, I. D., & Zald, D. H. (2017). A hierarchical causal taxonomy of psychopathology across the life span. Psychological Bulletin, 143(2), 142–186. https://doi.org/10.1037/bul0000069

Lorenzo-Seva, & ten Berge, J. M. F. (2006). Tucker’s congruence coefficient as a meaningful index of factor similarity. Methodology: European Journal of Research Methods for the Behavioral and Social Sciences, 2(2), 57–64. https://doi.org/10.1027/1614-2241.2.2.57

Lovik, Nassiri, Verbeke, & Molenberghs (2020). A modified Tucker’s congruence coefficient for factor matching. Methodology, 16(1), 59–74. https://doi.org/10.5964/meth.2813

Meredith (1993). Measurement invariance, factor analysis and factorial invariance. Psychometrika, 58(4), 525–543. https://doi.org/10.1007/BF02294825

Merikangas, K. R., Burstein, Swendsen, Avenevoli, Case, Georgiades, Heaton, Swanson, & Olfson (2011). Service utilization for lifetime mental disorders in U.S. adolescents: Results of the National Comorbidity Survey–Adolescent Supplement (NCS-A). Journal of the American Academy of Child & Adolescent Psychiatry, 50(1), 32–45. https://doi.org/10.1016/j.jaac.2010.10.006

Michelini, Barch, D. M., Tian, Watson, Klein, D. N., & Kotov (2019). Delineating and validating higher-order dimensions of psychopathology in the Adolescent Brain Cognitive Development (ABCD) study. Translational Psychiatry, 9(1), 1–15. https://doi.org/10.1038/s41398-019-0593-4

Michelini, Carlisi, C. O., Eaton, N. R., Elison, J. T., Haltigan, J. D., Kotov, Krueger, R. F., Latzman, R. D., Li, J. J., Levin-Aspenson, H. F., Salum, G. A., South, S. C., Stanton, Waldman, I. D., & Wilson (2024). Where do neurodevelopmental conditions fit in transdiagnostic psychiatric frameworks? Incorporating a new neurodevelopmental spectrum. World Psychiatry, 23(3), 333–357. https://doi.org/10.1002/wps.21225

Moore, T. M., Kaczkurkin, A. N., Durham, E. L., Jeong, H. J., McDowell, M. G., Dupont, R. M., Applegate, Tackett, J. L., Cardenas-Iniguez, Kardan, Akcelik, G. N., Stier, A. J., Rosenberg, M. D., Hedeker, Berman, M. G., & Lahey, B. B. (2020). Criterion validity and relationships between alternative hierarchical dimensional models of general and specific psychopathology. Journal of Abnormal Psychology, 129(7), 677–688. https://doi.org/10.1037/abn0000601

Peugh, & Feldon, D. F. (2020). “How well does your structural equation model fit your data?”: Is Marcoulides and Yuan’s equivalence test the answer? CBE—Life Sciences Education, 19(3), es5. https://doi.org/10.1187/cbe.20-01-0016

Putnick, D. L., & Bornstein, M. H. (2016). Measurement invariance conventions and reporting: The state of the art and future directions for psychological research. Developmental Review: DR, 41, 71–90. https://doi.org/10.1016/j.dr.2016.06.004

Ringwald, W. R., Abramovitch, Agelink Van Rentergem, & Kotov (2024). Do cognitive functions belong in the Hierarchical Taxonomy of Psychopathology (HiTOP) Model? A meta-analysis. PsyArXiv. https://doi.org/10.31234/osf.io/8r2vq

Ringwald, W. R., Forbes, M. K., & Wright, A. G. C. (2023). Meta-analysis of structural evidence for the Hierarchical Taxonomy of Psychopathology (HiTOP) model. Psychological Medicine, 53(2), 533–546. https://doi.org/10.1017/S0033291721001902

Rodriguez-Seijas, C. A., Li, J. J., Balling, Brandes, Bernat, Boness, C. L., Forbes, M. K., Forbush, K. T., Joyner, K. J., Krueger, R. F., Levin-Aspenson, H. F., Michelini, Rutter, Stanton, Tackett, J. L., Waszczuk, & Eaton, N. R. (2023). Diversity and the hierarchical taxonomy of psychopathology (HiTOP). Nature Reviews Psychology, 2(8), Article 8. https://doi.org/10.1038/s44159-023-00200-0

Rodriguez-Seijas, C. A., McClendon, Wendt, D. C., Novacek, D. M., Ebalu, Hallion, L. S., Hassan, N. Y., Huson, Spielmans, G. I., Folk, Khazem, Neblett, Cunningham, Hampton-Anderson, Steinman, Hamilton, J. L., & Mekawi (2023). The next generation of clinical psychological science: Moving toward antiracism. PsyArXiv. https://doi.org/10.31234/osf.io/mhdx8

Rönkkö, & Cho (2020). An updated guideline for assessing discriminant validity. Organizational Research Methods, 25(1), 6–14. https://doi.org/10.1177/1094428120968614

Schwartz, & Meyer, I. H. (2010). Mental health disparities research: The impact of within and between group analyses on tests of social stress hypotheses. Social Science & Medicine, 70(8), 1111–1118. https://doi.org/10.1016/j.socscimed.2009.11.032

Simms, L. J., Wright, A. G. C., Cicero, Kotov, Mullins-Sweatt, S. N., Sellbom, Watson, Widiger, T. A., & Zimmermann (2022). Development of Measures for the Hierarchical Taxonomy of Psychopathology (HiTOP): A collaborative scale development project. Assessment, 29(1), 3–16. https://doi.org/10.1177/10731911211015309

Stewart, L. C., Asadi, Rodriguez-Seijas, Wilson, Michelini, Kotov, Cicero, D. C., & Olino, T. M. (2024). Measurement invariance of the Child Behavior Checklist (CBCL) across race/ethnicity and sex in the Adolescent Brain and Cognitive Development (ABCD) study. Psychological Assessment, 36(8), 441–451. https://doi.org/10.1037/pas0001319

Svetina, & Rutkowski (2019). Multiple-group invariance with categorical outcomes using updated guidelines: An illustration using Mplus and the lavaan/semTools packages. Structural Equation Modeling: A Multidisciplinary Journal, 27, 1–20. https://doi.org/10.1080/10705511.2019.1602776

Tackett, J. L., & Hallquist (2022). The need to grow: Developmental considerations and challenges for modern psychiatric taxonomies. Journal of Psychopathology and Clinical Science, 131(6), 660–663. https://doi.org/10.1037/abn0000751

Townsend, Kobak, Kearney, Milham, Andreotti, Escalera, Alexander, Gill, M. K., Birmaher, Sylvester, Rice, Deep, & Kaufman (2020). Development of three web-based computerized versions of the Kiddie schedule for affective disorders and schizophrenia child psychiatric diagnostic interview: Preliminary validity data. Journal of the American Academy of Child & Adolescent Psychiatry, 59(2), 309–325. https://doi.org/10.1016/j.jaac.2019.05.009

U.S. Census. (2020). 2020 Census illuminates racial and ethnic composition of the country. Census.Gov. https://www.census.gov/library/stories/2021/08/improved-race-ethnicity-measures-reveal-united-states-population-much-more-multiracial.html

Vandenberg, R. J., & Lance, C. E. (2000). A review and synthesis of the measurement invariance literature: Suggestions, practices, and recommendations for organizational research. Organizational Research Methods, 3(1), 4–70. https://doi.org/10.1177/109442810031002

Waldman, I. D., King, C. D., Poore, H. E., Luningham, J. M., Zinbarg, R. M., Krueger, R. F., Markon, K. E., Bornovalova, Chmielewski, Conway, Dretsch, Eaton, N. R., Forbes, M. K., Forbush, Naragon-Gainey, Greene, A. L., Haltigan, J. D., Ivanova, Joyner, ... Zald (2023). Recommendations for adjudicating among alternative structural models of psychopathology. Clinical Psychological Science, 11(4), 616–640. https://doi.org/10.1177/21677026221144256

Wang, P. S., Lane, Olfson, Pincus, H. A., Wells, K. B., & Kessler, R. C. (2005). Twelve-month use of mental health services in the United States: Results from the National Comorbidity Survey Replication. Archives of General Psychiatry, 62(6), 629–640. https://doi.org/10.1001/archpsyc.62.6.629

Watson, Forbes, M. K., Levin-Aspenson, H. F., Ruggero, C. J., Kotelnikova, Khoo, Bagby, R. M., Sunderland, Patalay, & Kotov (2022). The development of preliminary HiTOP internalizing spectrum scales. Assessment, 29(1), 17–33.

Watson, Levin-Aspenson, H. F., Waszczuk, M. A., Conway, C. C., Dalgleish, Dretsch, M. N., Eaton, N. R., Forbes, M. K., Forbush, K. T., Hobbs, K. A., Michelini, Nelson, B. D., Sellbom, Slade, South, S. C., Sunderland, Waldman, Witthöft, Wright, A. G. C., & HiTOP Utility Workgroup. (2022). Validity and utility of Hierarchical Taxonomy of Psychopathology (HiTOP): III. Emotional dysfunction superspectrum. World Psychiatry, 21(1), 26–54. https://doi.org/10.1002/wps.20943

Watts, A. L., Makol, B. A., Palumbo, I. M., De Los Reyes, Olino, T. M., Latzman, R. D., DeYoung, C. G., Wood, P. K., & Sher, K. J. (2022). How robust is the p factor? Using multitrait-multimethod modeling to inform the meaning of general factors of youth psychopathology. Clinical Psychological Science, 10(4), 640–661. https://doi.org/10.1177/2167702621105517

Weller, B. E., Blanford, K. L., & Butler, A. M. (2018). Estimated prevalence of psychiatric comorbidities in U.S. adolescents with depression by race/ethnicity, 2011–2012. Journal of Adolescent Health, 62(6), 716–721. https://doi.org/10.1016/j.jadohealth.2017.12.020

Whitney, D. G., & Peterson, M. D. (2019). US national and state-level prevalence of mental health disorders and disparities of mental health care use in children. JAMA Pediatrics, 173(4), 389–391. https://doi.org/10.1001/jamapediatrics.2018.5399

Zager Kocjan, Jose, Socan, & Avsec (2021). Measurement invariance of the subjective happiness scale across countries, gender, age, and time. Assessment, 29(4), 826–841. https://doi.org/10.1177/1073191121993558

Zimmermann, Widiger, T. A., Oeltjen, Conway, C. C., & Morey, L. C. (2022). Developing preliminary scales for assessing the HiTOP detachment spectrum. Assessment, 29(1), 75–87. https://doi.org/10.1177/10731911211015313
