Abstract
The parent-report Affective Reactivity Index (ARI-P) is the most studied brief scale specifically developed to assess irritability, but relatively little is known about its performance in early childhood (i.e., ≤8 years). Psychometric support in this age range is particularly important given developmental shifts in what constitutes normative irritability across childhood. We examined the performance of the ARI-P in a diverse, treatment-seeking sample of children ages 3 to 8 years (N = 115; mean age = 5.56 years; 58.4% from ethnic/racial minority backgrounds). In this sample, confirmatory factor analysis supported the single-factor structure of the ARI-P previously identified with older youth. ARI-P scores showed large associations with another irritability index, as well as small-to-large associations with aggression, anxiety, depression, and attention problems, supporting the convergent and concurrent validity of the ARI-P when used with children in this younger age range. Findings support the ARI-P as a promising parent-report tool for assessing irritability in early childhood, particularly in clinical samples.
Irritability is typically characterized by a low threshold for tolerating frustration, a proneness to frustration-based temper outbursts and tantrums, and underlying negative affect between outbursts (Leibenluft, 2017; Roy & Comer, 2020). To varying degrees, irritability is a normative part of child development, although in more chronic and severe forms, irritability presents as a highly impairing and transdiagnostic clinical concern (Althoff et al., 2010; Brotman et al., 2006; Copeland et al., 2015; Leibenluft, 2017; Roy & Comer, 2020; Stringaris & Goodman, 2009; Vidal-Ribas et al., 2016; Wakschlag et al., 2015). In the Diagnostic and Statistical Manual of Mental Disorders and the International Classification of Diseases, irritability is explicitly included as a symptom in the definitions of a wide range of mental disorders (e.g., generalized anxiety disorder, oppositional defiant disorder, major depression in children and adolescents, bipolar disorder, and disruptive mood dysregulation disorder) and has been linked with several other disorders, including the full spectrum of anxiety disorders (Cornacchio et al., 2016; Shimshoni et al., 2020), attention-deficit/hyperactivity disorder (Nigg et al., 2020), conduct disorders (Stringaris et al., 2009; Stringaris & Goodman, 2009), and autism spectrum disorder (Kalvin et al., 2020).
Irritability has been prospectively associated with increased psychopathology into adulthood, greater functional impairment, higher psychiatric service use, and elevated suicide risk (Dougherty et al., 2013; Dougherty et al., 2015; Hawes et al., 2019; Jha et al., 2020; Leibenluft & Stoddard, 2013; Pickles et al., 2010; Wiggins et al., 2014), underscoring the critical need for accurate assessment in childhood. Assessing irritability in early childhood (i.e., ages 8 years and below) is both especially important and especially challenging: severe irritability in early childhood is associated with particularly unfavorable trajectories over time (Dougherty et al., 2015), and normative thresholds for detecting severe irritability shift across development (Copeland et al., 2015; Kaat et al., 2019; Martin et al., 2017; Wakschlag et al., 2012; Wiggins et al., 2020). Importantly, levels of irritability considered normal in early childhood would be considered abnormal in later childhood.
Focused empirical investigation into the identification, nature, and treatment of pediatric irritability is still relatively new (see Roy & Comer, 2020), and as such, progress in the evidence-based assessment of irritability has somewhat lagged. That said, the past 15 years have witnessed considerable advances in irritability assessment. In the literature, pediatric irritability has typically been assessed via youth-report, parent-report, and/or structured or semistructured interview (Cornacchio et al., 2016; Dougherty et al., 2013; Dougherty et al., 2020; Haller et al., 2020; Stringaris et al., 2012; Wakschlag et al., 2015; Wakschlag et al., 2020; Wiggins et al., 2018). Across these modes of assessment, parent-report questionnaires are typically favored for several reasons. First, parent-reports of child irritability are more reliable than youths’ own self-reports and are also more predictive of clinical outcomes in adulthood (Dougherty et al., 2020; Stringaris et al., 2009). In addition, structured and semistructured clinical interviews for assessing irritability may be less reliable than parent-report questionnaires and may not be sufficiently sensitive to symptoms at the lower end of the irritability spectrum (Dougherty et al., 2020). Interview-based strategies are also less feasible than questionnaire-based approaches in resource-limited clinical practice or research settings.
The parent-report Affective Reactivity Index (ARI-P) is the most studied brief (i.e., <10 items) scale developed specifically to assess irritability in youth (Stringaris et al., 2012). Across seven items, the ARI-P measures the frequency, severity, duration, and impairment of irritable symptoms, with a parallel youth self-report directly corresponding to each of the seven parent-report items. Across community and clinical samples, the ARI-P has demonstrated strong psychometric properties, including excellent internal consistency, acceptable test–retest reliability, strong convergent and predictive validity, and a robust single-factor structure assessing a unidimensional irritability construct (DeSousa et al., 2013; Dougherty et al., 2020; Evans et al., 2020; Stringaris et al., 2012; Tseng et al., 2017). The ARI-P has been translated into over 15 non-English languages and has demonstrated consistent psychometric properties in several different countries (e.g., DeSousa et al., 2013; Mulraney et al., 2014; Tseng et al., 2017).
Despite advances in the psychometric evaluation of the ARI-P, relatively little has been learned about its performance in younger child populations. The majority of psychometric work on the ARI-P has been conducted in samples of youth in later childhood and adolescence. Existing psychometric studies that have included children ages 8 years and below have conducted analyses on pooled samples that include children as old as 17 or 18 years (e.g., Evans et al., 2020; Stringaris et al., 2012), limiting the extent to which findings can speak specifically to early childhood. Dougherty et al.’s (2020) recent study of irritability assessments offers preliminary support for the use of the ARI-P in younger child populations, but their sample included children as old as 10 years, did not include preschoolers ages 3 or 4 years, and focused on community youth rather than a clinically selected sample. Accordingly, psychometric evaluations of the ARI-P have yet to adequately evaluate the measure’s utility with early child clinical populations.
Given the extent to which early child irritability in clinical populations is predictive of worsening psychopathology and impairment across time, the lack of documented support for the ARI-P in clinical populations ages 8 years and below is a critical gap. Other supported parent-report questionnaires of irritability—such as the Temper Loss scale of the Multidimensional Assessment Profile of Disruptive Behavior (MAP-DB; Wakschlag et al., 2015)—are not as concise as the ARI-P, and their supporting psychometric research has focused predominantly on community youth. Moreover, the MAP-DB has not been evaluated for use with older youth, potentially limiting its use in studies aiming to use a consistent measurement strategy to longitudinally follow children across time or to cross-sectionally compare youth at different ages.
The present study examined the psychometric performance of the ARI-P in a diverse clinical sample of children ages 3 to 8 years seeking treatment for a range of internalizing and/or externalizing problems. We applied confirmatory factor analysis (CFA) to examine the extent to which the single-factor structure supported in samples of older youth (Evans et al., 2020; Stringaris et al., 2012) fit the data in the present clinical sample of children ages 8 years and below. We further examined the internal consistency of the ARI-P in this sample and evaluated its convergent validity by testing the extent to which ARI-P scores were associated with scores on a three-item irritability scale from the Child Behavior Checklist (CBCL; Achenbach & Rescorla, 2000) that has appeared as an ad hoc index of irritability in the literature (Cornacchio et al., 2016; Roberson-Nay et al., 2015; Wiggins et al., 2014). Finally, to evaluate the concurrent validity of the ARI-P in this clinical sample of children ages 8 years and below, we examined the extent to which ARI-P scores were associated with CBCL scores reflecting aggression, anxiety problems, depressive problems, and attention problems, each of which has been associated with irritability in previous research.
Method
Participants
Participants (N = 115) were primary caregivers of a pooled sample of children ages 3 to 8 years who presented with diverse clinical issues for youth mental health services at an interdisciplinary children’s mental health center located in a metropolitan region of the Southeastern United States. Families were drawn from participants across four clinical trials focused on a range of internalizing and/or externalizing problems (n = 43 from Comer et al., 2021; n = 27 from Comer et al., 2017; n = 31 from Cornacchio et al., 2019; n = 3 from Cooper-Vince et al., 2017) or from a general child anxiety clinic at the same institution (n = 11). Table 1 presents baseline demographic characteristics of the sample. Roughly half of the children were female, and 58.4% were from ethnic and/or racial minority backgrounds. The majority of children from ethnic and/or racial minority backgrounds self-identified as Hispanic/Latinx (80.3%). Families came from a range of socioeconomic backgrounds, with 73.9% of families earning an annual household income of less than $100,000 (M = $140,754, SD = $420,151; Mdn = $70,000). The child age distribution across the sample was relatively balanced: 28.7% (n = 33) of the sample was ages 3 to 4 years, 38.3% (n = 44) was ages 5 to 6 years, and 33.0% (n = 38) was ages 7 to 8 years. Clinical presentations varied across the sample, and comorbidity was common.
Specifically, 37% of the children were diagnosed with selective mutism and/or social anxiety disorder, 31% were diagnosed with generalized anxiety disorder, 27% were diagnosed with oppositional defiant disorder, 24% were diagnosed with attention-deficit/hyperactivity disorder, 24% were diagnosed with separation anxiety disorder, 10% were diagnosed with a specific phobia, 6% were diagnosed with conduct disorder or disruptive behavior disorder not otherwise specified, and 4% were diagnosed with obsessive–compulsive disorder.
Table 1. Demographic Characteristics of Sample.
Note. ᵃData provided by N = 115 families (100% of sample). ᵇData provided by N = 109 families (94.8% of sample). ᶜData provided by N = 112 families (97.4% of sample). ᵈMedian annual household income = $70,000. ᵉData provided by N = 106 families (92.2% of sample).
Measures
Affective Reactivity Index
The seven-item ARI-P (Stringaris et al., 2012) is a commonly used and well-supported measure of pediatric irritability in samples of youth in later-middle childhood or adolescence. The ARI-P asks parents to consider their child’s behavior and feelings over the past 6 months relative to other same-aged children. Parents indicate the extent to which they agree with each of six irritability items (e.g., “loses temper frequently,” “is easily annoyed by other people”), responding 0 = “not true,” 1 = “somewhat true,” or 2 = “certainly true.” The ARI-P total score is the sum of the first six items (range: 0–12); a seventh item assesses impairment due to the child’s irritability (range: 0–2). In samples of youth older than those in the present sample, multiple factor analyses have supported a single-factor structure of the first six ARI-P items (e.g., Stringaris et al., 2012), and the measure has demonstrated excellent internal consistency, good construct and discriminant validity, excellent test–retest reliability, and incremental utility over other irritability indices (Dougherty et al., 2020; Stoddard et al., 2014; Stringaris et al., 2012; Tseng et al., 2017).
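The scoring rule described above can be written as a short function. This is a minimal sketch for illustration only: the function name `score_ari_p` and its input format (seven integer responses in administration order) are hypothetical, and the sketch does not handle missing items, since no prorating rule is described here.

```python
from collections.abc import Sequence

# Response options as described above: 0 = "not true", 1 = "somewhat true", 2 = "certainly true"
VALID_RESPONSES = {0, 1, 2}


def score_ari_p(responses: Sequence[int]) -> tuple[int, int]:
    """Return (total, impairment) for one ARI-P form.

    `responses` holds the seven item responses in administration order (hypothetical input format).
    The total is the sum of items 1-6 (range 0-12); item 7, the impairment item (range 0-2),
    is returned separately and is not added to the total.
    """
    if len(responses) != 7:
        raise ValueError("expected 7 ARI-P item responses")
    if any(r not in VALID_RESPONSES for r in responses):
        raise ValueError("each response must be 0, 1, or 2")
    total = sum(responses[:6])  # items 1-6 only
    impairment = responses[6]   # item 7, kept separate
    return total, impairment


# Invented responses, for illustration only
print(score_ari_p([2, 1, 1, 0, 2, 1, 1]))  # (7, 1)
```

Keeping the impairment item out of the total mirrors the distinction drawn above between the six-item irritability score and the separate impairment rating.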
Child Behavior Checklist
The CBCL (Achenbach & Dumenci, 2001; Achenbach & Rescorla, 2000) was used to collect caregiver reports of child emotional and behavioral problems. Caregivers completed the CBCL 1.5-5 (Achenbach & Rescorla, 2000) for children younger than 6 years and the CBCL 6-18 (Achenbach & Dumenci, 2001) for children 6 years and older. The CBCL 1.5-5 and the CBCL 6-18 have both demonstrated strong psychometric properties in previous studies, including good internal consistency (Achenbach et al., 2003; CBCL 1.5-5 α = .95 and CBCL 6-18 α = .91 in the present sample) and strong convergent and divergent validity of their scales (Nakamura et al., 2009). Both CBCL versions yield several broadband problem scores, as well as several clinically oriented subscales, and each scale yields T-scores normed by age and sex. The present analysis included the CBCL anxiety problems scale, the CBCL anxious/depressed scale, the CBCL aggressive behavior scale, and the CBCL attention problems scale. The present study also included the three-item CBCL irritability scale that has increasingly appeared in the literature as an ad hoc measure of pediatric irritability (“temper tantrums or hot temper,” “stubborn, sullen or irritable,” and “sudden changes in mood or feelings”; Cornacchio et al., 2016; Roberson-Nay et al., 2015; Wiggins et al., 2014).
Data Analysis
First, descriptive statistics were computed to characterize ARI-P scores in the sample and the extent to which scores correlated with child age. Second, Cronbach’s α and interitem correlations were computed to examine the internal consistency of the ARI-P in this sample. Higher α coefficients reflect greater internal consistency, with coefficients ≥.70 considered acceptable. That said, α coefficients that are too high (i.e., >.90) may indicate redundancy across items, suggesting that the measure could be shortened (Tavakol & Dennick, 2011).
Third, CFA was used to examine the factor structure of the ARI-P in this sample, using the weighted least squares means and variance adjusted (WLSMV) estimator in Mplus 8. Specifically, we examined a one-factor structure of the first six ARI-P items, based on multiple previous factor analyses of the ARI-P in older youth that have supported a single-factor solution (e.g., Stringaris et al., 2012). Model fit was evaluated using the following indices: the chi-squared statistic (χ²; smaller values indicate better fit), the comparative fit index (CFI; values ≥.95 are considered good), the root mean square error of approximation (RMSEA; values ≤.06 are considered good), and the weighted root mean square residual (WRMR; values ≤1.0 are considered good). Fit indices were interpreted according to cutoff guidelines (Hu & Bentler, 1999; Yu, 2002): “excellent overall fit” indicates that the CFI, RMSEA, and WRMR all fall in the “good” range, “acceptable overall fit” indicates that two of the three indices fall in the “good” range, and “poor fit” indicates that only one or none of the indices fall in the “good” range.
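The decision rule above can be made explicit in code. This sketch is illustrative and is not the authors’ analysis code (the CFA itself was estimated in Mplus); the class and function names are hypothetical, and the fit values in the usage lines are invented. The sketch also counts the nominal degrees of freedom of a one-factor model with p indicators: p(p + 1)/2 unique variances and covariances minus p loadings and p residual variances (factor variance fixed to 1), which equals p(p − 3)/2, or 9 for the six ARI-P items. Counting instead over p(p − 1)/2 polychoric correlations minus p loadings, as is typical for ordinal items, yields the same value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FitIndices:
    """Hypothetical container for the three indices used in the decision rule."""
    cfi: float
    rmsea: float
    wrmr: float


def count_good(fit: FitIndices) -> int:
    """Count indices in the 'good' range: CFI >= .95, RMSEA <= .06, WRMR <= 1.0."""
    return sum([fit.cfi >= 0.95, fit.rmsea <= 0.06, fit.wrmr <= 1.0])


def classify_fit(fit: FitIndices) -> str:
    """Map the number of 'good' indices onto the overall fit labels."""
    good = count_good(fit)
    if good == 3:
        return "excellent"
    if good == 2:
        return "acceptable"
    return "poor"  # one or none of the indices in the 'good' range


def one_factor_df(p: int) -> int:
    """Nominal df of a one-factor model with p indicators and factor variance fixed to 1:
    p(p + 1)/2 unique (co)variances - p loadings - p residual variances = p(p - 3)/2."""
    return p * (p + 1) // 2 - 2 * p


# Usage with invented values (not the study's results)
print(classify_fit(FitIndices(cfi=0.97, rmsea=0.08, wrmr=0.70)))  # acceptable
print(one_factor_df(6))  # 9
```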
After evaluating the structure of the first six ARI-P items, we evaluated the convergent validity of the ARI-P in this sample by computing partial correlations (controlling for child age) between ARI-P total scores and the CBCL irritability scale. Next, to evaluate concurrent validity, partial correlations (controlling for age) were computed to assess associations between the ARI-P total score and the CBCL anxiety problems scale, the CBCL anxious/depressed scale, the CBCL aggressive behavior scale, and the CBCL attention problems scale. Finally, to examine the unique variance in ARI-P total scores accounted for by each of these CBCL scales, a regression model was run with CBCL anxiety problems, CBCL anxious/depressed, CBCL aggressive behavior, and CBCL attention problems scores, along with child age, as simultaneous predictors of ARI-P total scores.
Results
Descriptive Findings
Table 2 presents descriptive statistics for the ARI-P total score and each individual ARI-P item. ARI-P scores were negatively correlated with age (r = −.259, p = .005), indicating that irritability scores in this cross-sectional sample were lower among older children. Irritability-related impairment (Item 7) was also lower among older children (r = −.232, p = .013). The first six items of the ARI-P exhibited acceptable internal consistency in the present sample (α = .87; mean interitem r = .541), with no indication of item redundancy (i.e., α did not exceed .90) or need to shorten the scale.
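The significance of a Pearson correlation is typically tested with t = r√(n − 2)/√(1 − r²) on n − 2 degrees of freedom. The sketch below (hypothetical function name) applies this standard formula to the age correlation reported above, assuming all 115 cases contributed to it.

```python
from math import sqrt


def r_to_t(r: float, n: int) -> tuple[float, int]:
    """t statistic and degrees of freedom for testing H0: rho = 0 given a Pearson r from n cases."""
    df = n - 2
    t = r * sqrt(df) / sqrt(1 - r**2)
    return t, df


t, df = r_to_t(-0.259, 115)
print(f"t({df}) = {t:.2f}")  # t(113) = -2.85
```

A two-tailed p value can then be obtained from the t distribution, for example with `2 * scipy.stats.t.sf(abs(t), df)` if SciPy is available.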
Table 2. ARI-P Descriptive Statistics and Factor Loadings of Individual Items.
Note. ARI-P = Affective Reactivity Index–Parent Version.
Item 7 was not included in factor analyses or the total score.
Factor Structure
Table 2 also presents factor loadings for each of the first six ARI-P items. Table 3 presents the fit indices from the CFA examining a one-factor solution. The CFA indicated that the traditional one-factor solution of the ARI-P provided an acceptable fit to the data in this sample, with two of the three fit indices (CFI and WRMR) falling in the “good” range.
Table 3. Fit Indices From Confirmatory Factor Analysis of the First Six ARI-P Items as a Single Factor.
Note. ARI-P = Affective Reactivity Index–Parent Version; CFI = comparative fit index; RMSEA = root mean square error of approximation; WRMR = weighted root mean square residual.
Convergent and Concurrent Validity
Table 4 presents partial correlations (controlling for age) between ARI-P scores and the included CBCL scales. Evidence of strong convergent validity is captured by the “large” (i.e., ≥.5) partial correlation between ARI-P total scores and the three-item CBCL irritability scale. Furthermore, significant partial correlations between the ARI-P total score and the included CBCL clinical scales provide evidence of concurrent validity. The ARI-P impairment item (Item 7) was also significantly associated with the CBCL clinical scales. A regression analysis examined the unique variance in ARI-P total scores accounted for by child age and CBCL anxiety problems, anxious/depressed, attention problems, and aggressive behavior scores. These predictors collectively accounted for a significant amount of the variance in ARI-P total scores, F(5, 112) = 17.46, p < .001, R² = .45, adjusted R² = .42. CBCL aggressive behavior scores uniquely predicted ARI-P total scores, b = .67, t(112) = 7.55, p < .001, whereas child age and CBCL anxiety problems, anxious/depressed, and attention problems scores did not.
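For reference, the overall F test and adjusted R² of an ordinary least squares model follow directly from R², the number of predictors k, and the number of cases n entering the model: F = (R²/k)/[(1 − R²)/(n − k − 1)] on (k, n − k − 1) degrees of freedom, and adjusted R² = 1 − (1 − R²)(n − 1)/(n − k − 1). The function below is a hypothetical helper illustrating these standard formulas with invented inputs; it is not drawn from the study’s analysis.

```python
def regression_summary(r2: float, n: int, k: int) -> dict[str, float]:
    """Overall F test and adjusted R^2 for an OLS model with an intercept, k predictors, and n cases."""
    df1, df2 = k, n - k - 1  # numerator and denominator degrees of freedom
    f = (r2 / df1) / ((1 - r2) / df2)
    adj_r2 = 1 - (1 - r2) * (n - 1) / df2
    return {"F": f, "df1": df1, "df2": df2, "adj_R2": adj_r2}


# Invented inputs, for illustration only
s = regression_summary(0.30, 100, 4)
print(round(s["F"], 2), round(s["adj_R2"], 2))  # 10.18 0.27
```

Because R² is usually reported rounded to two decimals, statistics recomputed this way will only approximate the reported values.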
Table 4. Convergent and Concurrent Validity of the ARI-P.
Note. ARI-P = Affective Reactivity Index–Parent Version; CBCL = Child Behavior Checklist. Partial correlations in table control for child age.
*p < .05. **p < .01. ***p < .001.
Discussion
The present study offers the first psychometric evaluation of the ARI-P specifically in a clinical sample of children 8 years and younger. Building on ARI-P psychometric studies with older and/or community youth (DeSousa et al., 2013; Evans et al., 2020; Stringaris et al., 2012), findings from our clinical sample provide further support for the internal consistency of the ARI-P and its single-factor structure in early childhood. Although prior ARI-P psychometric evaluations have included younger children (e.g., Dougherty et al., 2020; Evans et al., 2020; Stringaris et al., 2012), these analyses were conducted on community samples and/or pooled samples that included much older youth. Accordingly, the present analysis offers the first findings that can speak specifically to the performance of the ARI-P in clinical populations ages 8 years and below. The significant association between the ARI-P and the CBCL irritability scale was large in magnitude and provides strong evidence of convergent validity. Moreover, evidence of concurrent validity is provided by significant, small-to-large sized associations between ARI-P scores and several CBCL clinical scales, which is consistent with previous research (e.g., Stoddard et al., 2014) finding irritability to be associated with a range of internalizing and externalizing clinical problems.
Consistent with previous longitudinal research examining trajectories of preschool irritability (e.g., Pagliaccio et al., 2018), ARI-P scores in the present cross-sectional sample declined with age. This is also consistent with research on child development showing that, normatively, tantrums are most common in children between 18 and 60 months, with the highest frequency occurring between 3 and 5 years of age (Potegal & Davidson, 2003). Although the present study speaks to the factor structure, internal consistency, and convergent and concurrent validity of the ARI-P, future work is needed that integrates both clinical and community samples of children to determine the extent to which ARI-P scores can distinguish clinical from nonclinical youth in the context of normative child development. ARI-P analyses using item response theory in this age group, similar to what has been done with the MAP-DB (Wakschlag et al., 2012), might help to distinguish normative irritable behaviors assessed by the ARI-P from more atypical ones that may be more indicative of severe irritability.
Although anxiety, depression, aggression, and attention problems were all positively associated with ARI-P scores in this sample, most of the unique variance in ARI-P scores was accounted for by externalizing symptoms. This pattern of findings runs counter to findings in older samples showing that internalizing symptoms and negative affect uniquely predict a considerable proportion of variance in irritability, particularly among girls (Humphreys et al., 2019). This further highlights the importance of studying irritability within focused developmental periods and underscores the need for caution when extending findings from older youth downward to early childhood populations. Moreover, the associations between ARI-P scores and parent-reports of child aggression were rather large. Although more research is needed, it is possible that treatment-seeking parents of children in this age range have difficulty distinguishing irritability from aggressive child behavior.
Despite several strengths of this study, including our recruitment of a diverse clinical sample of children between the ages of 3 and 8 years, several limitations warrant comment. First, the present clinical sample was recruited from treatment-seeking families, and thus findings may not generalize to clinical populations who do not access services. Relatedly, given the nature of the sample, we are unable to draw conclusions about differences in ARI-P performance between clinical and nonclinical youth, or about the measure’s ability to distinguish clinical from community youth. Second, this study was not powered to examine differences between diagnostic groups, particularly in light of the frequency of comorbidity in the sample. Third, data were collected at a single time point (i.e., during pretreatment intake), and therefore the present study cannot speak to test–retest reliability or the instrument’s sensitivity to treatment. Fourth, the median annual household income in the sample was $70,000, and a sizable minority of participating families earned over $100,000 per year, which may reflect that this was a treatment-seeking sample. Future work would do well to recruit families that are more financially representative of the U.S. population. Fifth, although the present sample was more diverse than is typical in pediatric irritability research (e.g., Cornacchio et al., 2016), roughly half of the present participants were Latinx and/or Hispanic. ARI-P evaluations are needed in Black/African American and Asian American samples, among other minority groups, to evaluate ARI-P performance across racial and cultural groups. Sixth, our early childhood sample did not include children between the ages of 0 and 2 years, and thus the present findings do not indicate whether the ARI-P is an appropriate measure for assessing irritability or fussiness in infants and toddlers. Last, primary caregivers were the only informants of irritability in this study.
Although child-reports of irritability, particularly in younger samples, may not provide much meaningful information (Cardinale et al., 2019; Dougherty et al., 2020; Kircanski, White et al., 2018), future research would do well to incorporate other informants (e.g., teachers) and/or modes of assessment (e.g., structured observation) to capture additional perspectives.
Despite these limitations, the present study offers a rare statistical portrait of the psychometric properties of the ARI-P in children ages 8 years and below presenting for mental health treatment. Findings support the ARI-P as a promising parent-report tool for assessing irritability in early childhood, particularly in clinical samples. With continued psychometric support in early childhood, the concise ARI-P can facilitate prospective research designs that examine temporal relationships between irritability and psychopathology across the lifespan. Moreover, in light of promising advances unfolding in the treatment of pediatric irritability (e.g., Hawks et al., 2020; Kircanski, Clayton et al., 2018; Linke et al., 2020; Roy & Comer, 2020), the ARI-P may play an increasing role in informing the treatment of young children in clinical settings.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the National Institute of Mental Health (K23 MH090247, PI: Comer; F31 112296, PI: Cornacchio) and by a grant from the Andrew Kukes Foundation for Social Anxiety (PI: Comer).
