Abstract
Given the high prevalence of depressive disorder and generalized anxiety disorder (GAD) globally and their comorbidity, it is imperative to have reliable and valid measures of these constructs. The International Depression Questionnaire (IDQ) and International Anxiety Questionnaire (IAQ) are recently developed measures to assess for depressive disorder and GAD, respectively, based on ICD-11 criteria. The current study examined the reliability and validity of the IDQ and IAQ in English- and Spanish-language translations in four samples from Chile, Mexico, and the United States. The Spanish IDQ and IAQ demonstrated good to excellent internal consistency (IDQ omega = .87–.91 and IAQ omega = .86–.93) and strong convergent validity. Overall prevalence for ICD-11 depressive disorder was 15.3%, and the GAD prevalence was 22.4% across samples. Confirmatory factor analyses (CFA) found good to excellent fit across the four samples, supporting the unidimensionality of the IDQ and IAQ. Results indicate the Spanish translations of the IDQ and IAQ are psychometrically sound and appear appropriate for use in North and South American samples of Spanish speakers. Results also supported the psychometric properties of the English IDQ and IAQ in the United States.
Introduction
Depressive disorder and generalized anxiety disorder (GAD) are common psychiatric conditions; however, until recently, there were no specific self-report measures aligned with ICD-11 criteria for these diagnoses, making assessment and screening inconsistent. Depressive disorder and GAD are highly comorbid and associated with functional impairment and disability (Lecrubier, 2001). For example, a recent meta-analysis found the prevalence of depression to be increasing globally over time (Moreno-Agostino et al., 2021). International epidemiological studies have found high rates of depressive disorder, with an estimated 10.8% lifetime prevalence rate in community samples in a meta-analysis across 30 countries (Lim et al., 2018). In this meta-analysis, South America as a continent had 20.6% prevalence of depression, the highest rate across six continents (Lim et al., 2018). Other studies have found high rates of depressive disorders in South American or Hispanic/Latin samples. For example, in a nationally representative Chilean sample, the lifetime MDD prevalence was 11.1% overall, with higher rates in women (16.8%) than men (5.1%; Ministerio de Salud, 2018). In a Chilean primary care sample, 36.8% of women and 11.2% of men met criteria for a current major depressive episode (Gater et al., 1998). In a nationally representative sample from Mexico, 11.2% of adults met criteria for major depressive disorder at some time in their lifetime, and of those, 45.7% had a comorbid anxiety disorder (Kessler et al., 2015). In a U.S. sample, the lifetime prevalence of MDD was 22.2% in Puerto Rican participants, 17.4% in Cuban Americans, and 14.5% in Mexican Americans (Gonzalez et al., 2010).
Similarly, GAD is associated with social, occupational, and functional impairment (as reviewed in Wittchen, 2002), even when symptoms are subthreshold and do not meet full diagnostic criteria (Haller et al., 2014). An epidemiological survey of 26 countries found the lifetime prevalence of GAD to be 3.7% (Ruscio et al., 2017). GAD has been found to have a higher prevalence in higher-income countries (Ruscio et al., 2017), and this study also found the lifetime prevalence of GAD in a nationally representative sample in Mexico was 1.1%. In Chile, in a representative national sample, the prevalence of GAD was 2.6% overall, with higher prevalence in women (4.1%) than men (0.9%; Vicente et al., 2006). However, in Chilean primary care, the prevalence of GAD was 21.8% in women and 11.0% in men (Gater et al., 1998). Among Hispanic adults in the United States, approximately 5.8% endorsed GAD criteria (Asnaani et al., 2010).
Given the prevalence of depressive disorder and GAD globally, it is important to have reliable and valid measures of these disorders that are available for research and clinical practice. Extensive research relies on the Patient Health Questionnaire-9 (PHQ-9; Kroenke et al., 2001) and the Generalized Anxiety Disorder 7-item scale (GAD-7; Spitzer et al., 2006); however, these measures are based on the fourth edition of the Diagnostic and Statistical Manual of Mental Disorders (DSM-IV-TR; American Psychiatric Association [APA], 2000), which differs from the criteria outlined in the 11th edition of the International Classification of Diseases (ICD-11; World Health Organization, 2019). For instance, for threshold expression, the DSM-5-TR uses fixed symptom counts (e.g., five out of nine symptoms for depressive disorder), whereas ICD-11 emphasizes clinical judgment and functional impairment (First et al., 2021). There is strong overlap between ICD-11 and DSM-5-TR criteria; however, there are some small symptom-level differences, such as the content of the item assessing hopelessness in DSM-5-TR major depressive disorder compared with ICD-11 depressive disorder. To address this gap and to anchor ICD-11 criteria into self-report measures, Shevlin et al. (2023) developed the International Depression Questionnaire (IDQ) and the International Anxiety Questionnaire (IAQ) to assess depressive disorder and GAD, respectively. The original IDQ and IAQ were developed and validated in English in a representative sample from the United Kingdom. The IDQ and the IAQ demonstrated strong convergent validity with the PHQ-9 (r = .90) and the GAD-7 (r = .89), respectively (Shevlin et al., 2023). Both measures had good item-total correlations, with IDQ values ranging from .77 to .88 and IAQ values ranging from .75 to .90.
Item response theory indicated good model fit, and using the diagnostic scoring procedure, 7.4% of participants screened positive for depressive disorder and 7.1% screened positive for GAD in the original development article.
Hyland et al. (2023) completed an initial replication of the reliability and validity of the IDQ and IAQ in two samples of bereaved adults in the United Kingdom and Ireland. Both the IDQ and IAQ demonstrated good reliability (ω = .95–.96) and factor structure, and both scales met criteria for configural and metric invariance. One limitation of Hyland et al. (2023) is that they examined the factor structure of the two scales only in a combined model with one depression factor and one anxiety factor. Although the model had a good fit, it would be beneficial to examine the factor structure of the IDQ and the IAQ independently. The IDQ and IAQ have also been used in the United Kingdom to examine the longitudinal impacts of COVID-19 on mental health outcomes; this study found that the IDQ clustered with the PHQ-9 in the network, whereas the IAQ clustered with the GAD-7, indicating convergent validity (McElroy et al., 2024). Notably, the factor structure and psychometric properties of the IDQ and IAQ have not been established in a U.S. sample.
Initial translation and replication efforts have begun to increase the utility of the IDQ and IAQ. The measures have been translated into Turkish (Alpay et al., 2023), Ukrainian (Martsenkovskyi et al., 2024), and Persian (Yousefi & Mayeli, 2023). These initial replications have found good support for the IDQ and IAQ. The translated measures had high internal consistency in the Turkish translated version, with both the IDQ and IAQ having a Cronbach’s α = .89 (Hyland et al., 2025). In the Ukrainian sample, the IAQ and IDQ were strongly correlated, with r = .90 (Martsenkovskyi et al., 2024). Multiple studies have found women tend to score higher on the IAQ compared with men (Alpay et al., 2023; Martsenkovskyi et al., 2024), and though this was found in the initial development, it was noted to be a small effect size (Shevlin et al., 2023). Similarly, the IDQ was found to be higher in women in some samples (Alpay et al., 2023), with a small effect size (Shevlin et al., 2023). These initial findings are promising; however, given the international reach of the ICD-11 criteria, further replication and validation in geographically and culturally diverse samples is needed to support the use of the measures more broadly.
The current study aimed to replicate and extend these findings by examining the psychometric properties of the IDQ and the IAQ in a U.S. English-speaking sample and three Spanish-speaking samples from Chile, Mexico, and the United States. It is estimated that there are approximately 519 million native Spanish speakers in the world (Anuario del Instituto Cervantes, 2025). In the United States, Spanish is the second most common language spoken in the home after English (U.S. Census Bureau, 2022), and an estimated 18.7% of the U.S. population identified as Hispanic or Latino in the 2020 Census (U.S. Census Bureau, 2024). Thus, having a valid and culturally appropriate Spanish translation of the IDQ and the IAQ would be important for research and clinical practice. The present investigation aimed to create and validate a Spanish-language translation of the IDQ and the IAQ; to increase generalizability, samples from three countries (Chile, Mexico, and the United States) were examined. We intended to replicate previous research on the prevalence of depressive disorder and GAD, the descriptives and reliability of the IDQ and IAQ, and the underlying factor structure of both measures. In addition to examining the Spanish IDQ and IAQ in the United States, we also examined the English-language versions in an independent U.S. sample. To our knowledge, this is the first paper to examine the IDQ and IAQ in an English-speaking U.S. sample, which also extends previous psychometric investigations of the original English-language version.
Method
Participants
The total sample consisted of 1,043 college students from Chile (n = 252), Mexico (n = 211), and the United States (n = 580).
Chilean Sample
Participants were recruited through direct outreach on the university campus. After signing the informed consent form, participants were given a QR code to access the questionnaires on the QuestionPro digital platform using their mobile phone or laptop. A research assistant was available to answer questions and provide guidance as needed by participants. Given the in-person data collection procedures, attention checks were not implemented. Participants’ ages ranged from 18 to 27 years (M = 20.29, SD = 1.96). Of the total, 65.5% identified as women, 30.0% as men, 0.8% as non-binary, 4.0% as other not included in the list, and 2.8% preferred not to disclose their gender. Regarding sexual orientation, 72.2% identified as heterosexual, 16.7% as bisexual, 3.6% as pansexual, 2.0% as gay, 1.6% as lesbian, 1.2% as asexual, and 2.8% as other not included in the list. Finally, in terms of relationship status, 65.5% were single, 34.1% were dating someone, and 0.4% were married.
Mexican Sample
Participants were recruited from classrooms after seeking permission from course instructors. Participation was voluntary with no incentives. Consenting participants were given instructions and asked to complete as much of the survey as possible. Students who did not consent to participate were given other research-based activities. Participants took the survey on the Qualtrics platform through their personal electronic devices. Attention checks were not implemented. Participants’ ages ranged from 17 to 29 years (M = 21.17, SD = 3.93). Most participants identified as women (73.7%), followed by men (22.1%) and other gender identities not listed in the questionnaire (4.3%). Regarding sexual orientation, 70.1% identified as heterosexual, 19.0% as bisexual, 3.3% as gay, 2.8% as pansexual, 1.9% as lesbian, and 2.9% as another orientation not included in the questionnaire. In terms of relationship status, 56.9% were single, 39.3% were dating someone, 2.4% were married, and 1.4% were divorced.
U.S. Sample
The Spanish-speaking sample from the United States initially consisted of 155 participants; 34 individuals were removed due to missing data on all questionnaires, resulting in 121 participants. All participants received attention checks and passed them to be retained in the final sample. Participants were recruited via SONA systems, an online study management platform, from students enrolled in a psychology course. Students received course credit for completion of the online survey. Participants’ ages ranged from 18 to 24 years (M = 19.37, SD = 1.30). The majority of the participants identified as Hispanic or Latino/Latinx (95.0%, n = 115), with smaller proportions identifying as Native American or Alaskan Native (1.7%, n = 2), Other (1.7%, n = 2), and White, not of Hispanic origin (0.8%, n = 1). Most participants identified as women (74.8%), followed by men (18.2%), non-binary individuals (0.8%), and other gender identities not included in the list (5.9%). Regarding sexual orientation, 80.7% identified as heterosexual, 10.9% as bisexual, 1.7% as pansexual, 0.8% as gay, 0.8% as lesbian, and 5.0% as another orientation not included in the list. In terms of relationship status, 64.7% were single, and the remaining 35.3% were dating someone or in a stable relationship.
The English-speaking sample from the United States initially consisted of 700 participants; after removing participants with missing data on all questionnaires, the sample size was 462 participants. Of those who received attention checks, three participants answered at least one attention check incorrectly and were removed from the final sample, leaving 459 participants. As above, participants were recruited from psychology courses to complete the online survey for course credit. Participants’ ages ranged from 18 to 47 years (M = 20.53, SD = 4.02). Of the participants, 53.5% identified as women, 29.1% as men, and 17.3% as queer or other gender identity not listed in the questionnaire. The majority identified as Hispanic or Latino/Latinx (89.0%, n = 404), followed by White and not of Hispanic origin (7.3%, n = 33), Black or African American (1.1%, n = 5), Asian (1.1%, n = 5), Native American or Alaskan Native/American Indian (0.9%, n = 4), and Biracial/Multiracial (0.7%, n = 3). Regarding sexual orientation, 78.5% identified as heterosexual, 16.3% as bisexual, 3.6% as gay, and 1.6% as lesbian. Finally, 48.5% were single, 13.2% were dating, 34.8% were committed to a relationship, 2.9% were married, 0.4% were divorced or widowed, and 0.2% preferred not to disclose their relationship status.
Measurements
ICD-11 Depressive Disorder was measured using the International Depression Questionnaire (IDQ; Shevlin et al., 2023), which consists of nine self-report items. Participants were asked how often they had experienced each of the symptoms over the last 2 weeks on a 5-point Likert-type scale ranging from 0 (Never) to 4 (Every day). Possible scores range from 0 to 36, with higher scores indicating greater symptomatology. To meet diagnostic criteria, participants must endorse five or more items, including at least one core symptom (Items 1 or 2). An additional item assesses functional impairment; endorsement of this item (Yes) is necessary to meet diagnostic criteria.
For prevalence estimation and confirmatory factor analysis using binary indicators, items were dichotomized such that responses equal to 3 or higher (i.e., “nearly every day” or “every day”) were coded as 1 (symptom present), and responses from 0 to 2 were coded as 0 (symptom absent). This criterion reflects a conservative threshold capturing high-frequency symptom endorsement (Alpay et al., 2023). Internal consistency for the scale was α = .87 and ω = .87 for the Chilean sample; α = .91 and ω = .91 for the Mexican sample; α = .89 and ω = .89 for the U.S. Spanish-speaking sample; and α = .94 and ω = .94 for the United States English-speaking sample.
ICD-11 Generalized Anxiety Disorder was measured using the International Anxiety Questionnaire (IAQ; Shevlin et al., 2023). It consists of eight self-report items assessing symptoms of generalized anxiety disorder based on ICD-11 criteria. Participants reported how often they had experienced each symptom during the past several months on a 5-point Likert-type scale ranging from 0 (Never) to 4 (Every day). Possible total scores range from 0 to 32, with higher scores indicating greater anxiety severity. To meet diagnostic criteria, participants must endorse at least four symptoms, including at least one core symptom (Items 1 or 2), as well as the additional supplementary item that evaluates associated functional impairment. For consistency with the depression measure, the same dichotomization criterion was applied for prevalence estimation and CFA with binary indicators, with responses of 3 or higher coded as 1 (symptom present; Alpay et al., 2023). Internal consistency for the scale was α = .87 and ω = .87 for the Chilean sample; α = .86 and ω = .86 for the Mexican sample; α = .92 and ω = .93 for the U.S. Spanish-speaking sample; and α = .93 and ω = .94 for the U.S. English-speaking sample.
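The dichotomization and diagnostic scoring rules described for the IDQ and IAQ can be illustrated with a minimal Python sketch. It is not the scoring code used in this study; the function names, the list-based input format, and the Boolean impairment flag are assumptions made for illustration, and only the thresholds (responses of 3 or higher, five of nine IDQ symptoms, four of eight IAQ symptoms, at least one of Items 1 or 2, and endorsement of the impairment item) are taken from the text.

```python
from typing import Sequence

# Responses of 3 or 4 on the 0-4 scale are coded as "symptom present".
SYMPTOM_PRESENT_THRESHOLD = 3


def dichotomize(responses: Sequence[int]) -> list[int]:
    """Recode 0-4 Likert-type responses into 0/1 symptom indicators."""
    return [1 if r >= SYMPTOM_PRESENT_THRESHOLD else 0 for r in responses]


def meets_criteria(responses: Sequence[int], n_items: int,
                   min_symptoms: int, impairment: bool) -> bool:
    """Apply the symptom-count, core-symptom, and impairment rules."""
    if len(responses) != n_items:
        raise ValueError(f"expected {n_items} item responses")
    binary = dichotomize(responses)
    has_core = binary[0] == 1 or binary[1] == 1  # Items 1 or 2
    return bool(sum(binary) >= min_symptoms and has_core and impairment)


def idq_positive(responses: Sequence[int], impairment: bool) -> bool:
    """Nine IDQ items; at least five symptoms required."""
    return meets_criteria(responses, n_items=9, min_symptoms=5,
                          impairment=impairment)


def iaq_positive(responses: Sequence[int], impairment: bool) -> bool:
    """Eight IAQ items; at least four symptoms required."""
    return meets_criteria(responses, n_items=8, min_symptoms=4,
                          impairment=impairment)


# Hypothetical respondent: five IDQ symptoms present, including Item 1.
print(idq_positive([3, 0, 4, 3, 3, 3, 0, 0, 0], impairment=True))
```

The continuous score is simply the sum of the raw 0 to 4 responses, so the same response vector can feed both scoring methods.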
DSM-5-TR Major Depressive Disorder (MDD) was measured using the Patient Health Questionnaire-9 (PHQ-9; Kroenke et al., 2001). It is a nine-item self-report instrument that assesses symptoms of major depressive disorder as defined in the DSM-IV-TR (4th ed.; APA, 2000) and is consistent with the more recent DSM-5-TR (APA, 2022). Participants indicated how often they have been bothered by each symptom over the past 2 weeks using a 4-point Likert-type scale ranging from 0 (Not at all) to 3 (Nearly every day). Total scores range from 0 to 27, with higher scores reflecting greater symptom severity. A cut-off score of ≥10 is used to identify probable MDD cases (Moriarty et al., 2015). Internal consistency for the scale was α = .88 and ω = .88 for the Chilean sample; α = .86 and ω = .86 for the Mexican sample; α = .90 and ω = .90 for the U.S. Spanish-speaking sample; and α = .91 and ω = .92 for the U.S. English-speaking sample.
DSM-5-TR Generalized Anxiety Disorder (GAD) was measured using the Generalized Anxiety Disorder-7 (GAD-7; Spitzer et al., 2006). It is a seven-item self-report measure assessing generalized anxiety symptoms according to DSM-IV-TR criteria (APA, 2000) and consistent with DSM-5-TR criteria. Participants rate how often they have experienced each symptom during the past 2 weeks on a 4-point Likert-type scale ranging from 0 (Not at all) to 3 (Nearly every day). Total scores range from 0 to 21, with higher scores indicating greater anxiety severity. A cut-off score of ≥10 is applied to identify probable GAD cases (Spitzer et al., 2006). Internal consistency for the scale was α = .88 and ω = .88 for the Chilean sample; α = .89 and ω = .89 for the Mexican sample; α = .88 and ω = .87 for the U.S. Spanish-speaking sample; and α = .92 and ω = .93 for the U.S. English-speaking sample.
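For readers less familiar with the reliability coefficients reported for each scale, the following Python sketch shows how Cronbach's α can be computed from an item-response matrix. It is an illustration only: the function name and the row-per-respondent input format are assumptions, the example data are hypothetical, and McDonald's ω is not computed here because it requires factor loadings from a fitted measurement model.

```python
from statistics import variance


def cronbach_alpha(rows: list[list[float]]) -> float:
    """Cronbach's alpha for a respondents-by-items matrix.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total score))
    """
    if len(rows) < 2:
        raise ValueError("need at least two respondents")
    k = len(rows[0])
    if k < 2:
        raise ValueError("need at least two items")
    # Sample variance of each item (column) across respondents.
    item_vars = [variance([row[i] for row in rows]) for i in range(k)]
    # Sample variance of the summed total score.
    total_var = variance([sum(row) for row in rows])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical responses from three respondents on two items.
example = [[1, 2], [2, 1], [3, 3]]
print(round(cronbach_alpha(example), 3))
```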
Procedure
We report how we determined our sample size, all data exclusions, all manipulations, and all measures in the study. First, the IAQ and IDQ were translated from English into Spanish by one bilingual translator (Spanish-English; ICG) with doctoral training in clinical psychology. The translator was a native Spanish speaker (from Spain) with advanced proficiency in English. This initial Spanish version was then reviewed by two additional translators from Mexico (RRT) and Chile (AF) to identify linguistic equivalences and potential discrepancies across regional versions of Spanish. Based on their feedback, suggestions were made to ensure that the wording was clear and culturally appropriate for participants in each country. The revised versions were then returned to the original translator, who integrated the proposed changes into a unified version. This new version was subsequently reviewed by all translators from the three countries to reach a final consensus. The agreed-upon Spanish version was back-translated into English by an independent bilingual translator who was blinded to the original instruments (VG). Finally, a bilingual native English speaker who was blinded to the process compared the original English version and the back-translated version of the instruments (AyF).
The final Spanish versions of the questionnaires (see Supplemental Material Tables 4S and 5S) were administered online using the Qualtrics platform in all three countries. Data collection was conducted during scheduled class hours across participating universities in Chile (June 2024 to July 2024) and Mexico (February 2023 to December 2024). In the United States, data were collected from February 2023 to April 2025. Students took approximately 35 minutes to complete the full set of questionnaires. All study procedures were reviewed and approved by the institutional ethics committee in Chile, Mexico, and the United States.
Data Analysis
Item-level descriptive statistics were first computed to summarize sample scores. Specifically, the distribution of item responses and mean scores was examined for each item across the four samples. Then, bivariate Pearson correlations were conducted to assess convergent validity among the IAQ, IDQ, GAD-7, and PHQ-9 scales across the four samples.
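As an illustration of the convergent validity analysis, the sketch below computes a bivariate Pearson correlation between two sets of total scores. The function name is an assumption and the score vectors are hypothetical; the study's own analyses were run in statistical software, not with this code.

```python
from math import sqrt


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson product-moment correlation between two score vectors."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant scores")
    return sxy / sqrt(sxx * syy)


# Hypothetical total scores on two scales for four respondents.
scale_a_totals = [1, 2, 3, 4]
scale_b_totals = [1, 3, 2, 4]
print(round(pearson_r(scale_a_totals, scale_b_totals), 2))
```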
Next, a series of factor analytic models was tested to evaluate the unidimensionality of the IAQ and IDQ scales. The IDQ and IAQ can be scored to reflect levels of severity based on the sum of the Likert-type-scored items; this is the ‘continuous scoring’ method. In addition, the scales can be used to test whether diagnostic requirements are met based on a binary version of the items. Therefore, six models were tested in each sample, for a total of 24 models. For Model 1, all binary IDQ items were specified to load onto a single latent depression factor; for Model 2, all continuous IDQ items were specified to load onto a single latent depression factor. Model 3 included all binary IAQ items loading onto a single latent anxiety factor, and Model 4 included all continuous IAQ items loading onto a single latent anxiety factor. Model 5 included all binary IAQ and IDQ items, allowing the two latent variables (anxiety and depression) to covary, and Model 6 included all continuous IAQ and IDQ items, allowing the two latent variables (anxiety and depression) to covary. Models 5 and 6 were tested to evaluate the degree of shared variance between anxiety and depression. These steps were important because anxiety and depression are theoretically related (often comorbid), and assessing their covariance helps verify discriminant validity.
All models were estimated in Mplus Version 9 (Muthén & Muthén, 2017) using the robust weighted least squares estimator (WLSMV). Model fit was evaluated according to standard criteria: a non-significant chi-square (χ²) statistic indicated good fit; comparative fit index (CFI) and Tucker–Lewis index (TLI) values ≥ 0.90 and ≥ 0.95 indicated acceptable and excellent fit, respectively; and root mean square error of approximation (RMSEA) and standardized root mean square residual (SRMR) values ≤ 0.08 indicated good fit (Hu & Bentler, 1999). Models 2, 4, and 6 were also tested using robust maximum likelihood (MLR) estimation to compare model fit when treating the individual items as ordinal (WLSMV) and as continuous (MLR). Furthermore, multi-group measurement invariance was tested for the IDQ and IAQ to examine whether measurement differed across the four samples, using the one-factor model for each assessment and the continuous items.
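The fit criteria listed above amount to a small set of threshold rules, which the following Python sketch encodes. The class and function names are assumptions made for illustration; the example values are those reported in the Results for the configural IDQ model (with the TLI rounded to three decimals), and the p value passed in is a stand-in for the reported p < .001.

```python
from dataclasses import dataclass


@dataclass
class FitIndices:
    chi2_p: float  # p value of the model chi-square test
    cfi: float
    tli: float
    rmsea: float
    srmr: float


def grade_incremental(value: float) -> str:
    """CFI/TLI: >= .95 excellent, >= .90 acceptable, otherwise poor."""
    if value >= 0.95:
        return "excellent"
    if value >= 0.90:
        return "acceptable"
    return "poor"


def grade_absolute(value: float) -> str:
    """RMSEA/SRMR: <= .08 indicates good fit."""
    return "good" if value <= 0.08 else "above cut-off"


def summarize_fit(fit: FitIndices) -> dict[str, str]:
    """Label each index according to the criteria stated in the text."""
    return {
        "chi2": "non-significant" if fit.chi2_p >= 0.05 else "significant",
        "cfi": grade_incremental(fit.cfi),
        "tli": grade_incremental(fit.tli),
        "rmsea": grade_absolute(fit.rmsea),
        "srmr": grade_absolute(fit.srmr),
    }


# Configural IDQ model; chi2_p=0.0009 is a stand-in for "p < .001".
configural_idq = FitIndices(chi2_p=0.0009, cfi=0.988, tli=0.979,
                            rmsea=0.092, srmr=0.052)
print(summarize_fit(configural_idq))
```

Applying the rules index by index makes explicit how a model can show excellent CFI and TLI while its RMSEA remains above the cut-off.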
There were no missing data in the Chilean sample. However, the datasets from Mexico and the United States contained some missing values: 5.01% in the Mexican sample, 6.6% in the U.S. English-speaking sample, and 7.5% in the U.S. Spanish-speaking sample. The missing-data mechanism was examined using Little’s Missing Completely at Random (MCAR) test. In all cases, MCAR test results were non-significant: Mexico χ² (114) = 112.86, p = .97; U.S. English-speaking sample χ² (127) = 124.24, p = .617; U.S. Spanish-speaking sample χ² (17) = 17.53, p = .42. Therefore, full information maximum likelihood was used for handling missing values. Finally, prevalence rates of ICD-11 depressive disorder and generalized anxiety disorder were estimated across samples, and differences in prevalence for the two diagnoses were tested across samples.
Results
Descriptives
Table 1 presents the means, standard deviations, skewness, and kurtosis for the items of the IAQ and the IDQ across the four samples. Overall, item means ranged from low to moderate values, with greater variability observed in the U.S. samples. Skew and kurtosis indices were generally within acceptable limits for univariate normality (|skew| < 2, |kurtosis| < 3; Kim, 2013), although some items, particularly IDQ6, showed more extreme values across the four samples, suggesting a possible skew toward lower scores. Taken together, the descriptive statistics indicate an adequate dispersion of responses across samples, supporting the use of confirmatory factor analyses to examine the unidimensionality of the scales.
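The normality screening described above can be sketched in Python as follows. This is an illustration under stated assumptions: it uses simple moment-based estimators and reports excess kurtosis (normal distribution = 0), whereas statistical packages often apply small-sample corrections, so values may differ slightly from those in Table 1. The function names and example responses are hypothetical.

```python
def skew_and_kurtosis(values: list[float]) -> tuple[float, float]:
    """Moment-based skewness and excess kurtosis of one item's responses."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two responses")
    mean = sum(values) / n
    # Second, third, and fourth central moments.
    m2 = sum((v - mean) ** 2 for v in values) / n
    if m2 == 0:
        raise ValueError("responses have zero variance")
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skew = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    return skew, excess_kurtosis


def within_limits(skew: float, kurtosis: float,
                  skew_limit: float = 2.0, kurtosis_limit: float = 3.0) -> bool:
    """Check |skew| < 2 and |kurtosis| < 3, the limits cited in the text."""
    return abs(skew) < skew_limit and abs(kurtosis) < kurtosis_limit


# Hypothetical responses to a single 0-4 item.
item_responses = [0, 1, 2, 3, 4]
s, k = skew_and_kurtosis(item_responses)
print(within_limits(s, k))
```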
Means, Standard Deviations, Skew, and Kurtosis for the Eight Items of the IAQ and the Nine Items of the IDQ Across the Four Samples.
Confirmatory Factor Analyses
As presented in Table 2, the confirmatory factor analyses indicated that the IAQ and IDQ demonstrated generally good model fit across samples, especially when items were treated as binary indicators. For the Chilean, Mexican, U.S. Spanish-speaking, and U.S. English-speaking samples, the one-factor models for both scales supported their unidimensionality, with fit indices within acceptable or excellent ranges. Binary models consistently outperformed models with continuous items, showing better overall fit and lower residual error. The combined models including both IAQ and IDQ items also showed adequate fit, suggesting substantial covariance between the anxiety and depression factors, consistent with their theoretical association, yet supporting discriminant validity. Values for the covariance between the IAQ and IDQ factors varied from B = .865 to .944.
Fit Indexes for the CFA Models Across Samples.
Note. Model 1 = all binary IDQ items; Model 2 = all continuous IDQ items; Model 3 = all binary IAQ items; Model 4 = all continuous IAQ items; Model 5 = all binary IAQ and IDQ items, allowing the two latent variables to covary; Model 6 = all continuous IAQ and IDQ items, allowing the two latent variables to covary; df = degrees of freedom; TLI = Tucker–Lewis Index; CFI = comparative fit index; RMSEA = root mean square error of approximation; CI = confidence interval; SRMR = standardized root mean residual.
*p < .05. **p < .01. ***p < .001.
Factor loadings and standard deviations for the binary-item solution (Model 5) and the continuous-item solution (Model 6) across samples are presented in Tables 3 and 4, respectively. For Model 5, which examined the factor structure using binary indicators, a highly consistent pattern of factor loadings was observed across samples, confirming the robustness of both the IAQ and IDQ latent constructs. All item loadings were statistically significant, and all items were meaningfully associated with their respective factors.
Standardized Values Across Samples for Binary IAQ and IDQ Items (Model 5).
Note. STDYX standardizations are displayed; IDQ = International Depression Questionnaire; IAQ = International Anxiety Questionnaire; US_S = U.S. Spanish-speaking sample; US_E = U.S. English-speaking sample.
p < .001.
Standardized Values Across Samples for Continuous IAQ and IDQ Items (Model 6).
Note. STDYX standardizations are displayed; IDQ = International Depression Questionnaire; IAQ = International Anxiety Questionnaire; US_S = U.S. Spanish-speaking sample; US_E = U.S. English-speaking sample.
p < .001.
In Model 6, which included continuous indicators of depressive items (IDQ) and anxiety items (IAQ), all standardized factor loadings were statistically significant (p < .001) and demonstrated strong associations with their respective latent constructs across the four samples (Chile, Mexico, U.S. Spanish-speaking, and U.S. English-speaking). The pattern of results indicates that both the IDQ and the IAQ adequately captured their intended constructs in diverse linguistic and cultural contexts.
Minor variations were observed for a few items, with slightly lower loadings for IDQ6, IAQ7, and IAQ8 in the Spanish-speaking samples. This pattern was consistent across both Model 5 (binary indicators) and Model 6 (continuous indicators). In particular, IAQ7 (“been easily annoyed by different things”; B = .513, SD = .07) in the Chilean sample and IAQ8 (“experienced sleep disturbances”; B = .506, SD = .09) in the Mexican sample were the only items with standardized coefficients below .600. Despite these small deviations, the general pattern supports the unidimensionality and structural validity of both scales across all groups.
Models 2, 4, and 6 were respecified using the maximum likelihood estimator with robust standard errors (MLR), which is appropriate for continuous data and provides robustness to non-normality. The corresponding fit indices are reported in Table 1S of the supplemental material, and the standardized factor loadings are presented in Table 2S. Interestingly, in three of the four samples (Chile, Mexico, and U.S. English-speaking), the fit indices for Model 6 (allowing both scales to covary) were not fully satisfactory, particularly in terms of CFI and RMSEA. To further explore sources of misfit, modification indices (MIs) were inspected. Results indicated that IDQ3 and IAQ6 shared additional residual variance beyond that explained by their respective latent factors. Conceptually, both items assess difficulties in concentration, a symptom common to both depression and anxiety within the ICD-11 framework. Therefore, an additional model (Model 6b) was tested in each sample, allowing the residuals of IDQ3 and IAQ6 to covary.
The inclusion of this theoretically justified residual covariance resulted in improved model fit in the Chilean, Mexican, and U.S. English-speaking samples, as shown in Supplemental Table 1S. In these samples, CFI and TLI increased and RMSEA decreased to more acceptable levels, indicating that the localized strain in Model 6 was largely attributable to the overlapping symptom content of these two items.
Measurement Invariance
A multi-group analysis was conducted to test measurement invariance across samples using Model 2 for the IDQ and Model 4 for the IAQ, both of which are unidimensional models with continuous items. First, measurement invariance was tested for the IDQ with ordinal indicators. The configural model yielded acceptable fit indices: χ² (127) = 391.00, p < .001, TLI = 0.979, CFI = 0.988, RMSEA = 0.092 (90% CI [0.082, 0.102]), SRMR = 0.052. Except for the RMSEA, which was slightly above conventional cut-off values, the overall model fit was considered adequate. However, the modification indices suggested the presence of local misfit, particularly in the Mexican and U.S. English-speaking samples. Both samples showed high modification index (MI) values, indicating that several adjustments would be necessary. Therefore, full configural invariance cannot be assumed. Nevertheless, the remaining fit indices suggested that the basic factorial structure of the IDQ scale was consistent across groups, supporting partial configural invariance.
Next, measurement invariance was tested for the IAQ with ordinal indicators. As observed for the IDQ, the configural model yielded acceptable fit indices: χ² (97) = 464.96, p < .001, TLI = 0.982, CFI = 0.979, RMSEA = 0.123 (90% CI [0.112, 0.134]), SRMR = 0.072. With the exception of the RMSEA, which was above conventional cut-off values, the overall model fit was considered adequate. Inspection of the modification indices (MI) revealed elevated values, particularly in the U.S. English-speaking and Mexican samples. Similar to the IDQ, full configural invariance cannot be assumed. Nevertheless, the remaining fit indices indicate that the basic factorial structure of the IAQ scale is largely consistent across groups, supporting partial configural invariance.
Convergent Validity
The IAQ and IDQ demonstrated strong positive associations across all samples, indicating the expected comorbidity between anxiety and depression. Correlations between the IAQ and IDQ ranged from r = .79 to .83 across the four groups. In support of convergent validity, the IAQ showed high correlations with the GAD-7 (r = .69–.86) and similarly strong correlations with the PHQ-9 (r = .67–.81). Likewise, the IDQ correlated strongly with the PHQ-9 (r = .75–.91) and with the GAD-7 (r = .75–.81). These patterns were consistent across samples from Chile, Mexico, and U.S. Spanish- and English-speaking participants. Overall, the strength and pattern of correlations provide evidence for the convergent validity of the Spanish versions of the IAQ and IDQ, confirming that both scales capture the anxiety and depression constructs in a manner consistent with established measures (GAD-7 and PHQ-9). Total scores, means, standard deviations, and correlations among scales are displayed in Table 5. Spearman correlations across samples are reported in Supplemental Table 3S.
Pearson Correlations, Means, and Standard Deviations Among IAQ, IDQ, GAD-7, and PHQ-9 Across the Four Samples.
Note. IAQ = International Anxiety Questionnaire; IDQ = International Depression Questionnaire; GAD-7 = Generalized Anxiety Disorder 7-item scale; PHQ-9 = Patient Health Questionnaire 9-item scale.
p < .001.
Prevalence Estimates
Table 6 presents the prevalence rates for depressive episodes and GAD across the four samples using the IDQ and IAQ, respectively. Table 6 also presents prevalence rates for MDD and GAD using cut-off scores for the PHQ-9 and GAD-7 scales, respectively. Overall, 15.3% of participants met the ICD-11 diagnostic criteria for depressive disorder and 22.4% met those for GAD. Significant cross-national differences were observed for ICD-11 GAD, χ2 (3) = 40.70, p < .001, with notably higher rates in Chile (39.3%) and Mexico (34.6%) compared with the U.S. Spanish-speaking (13.2%) and English-speaking (22.4%) samples. Differences across countries for depressive disorder were smaller and did not reach statistical significance, χ2 (3) = 4.74, p = .19.
Prevalence Rates for ICD-11 Depressive Disorder and Generalized Anxiety Disorder Across Samples.
Note. GAD = generalized anxiety disorder; DD = depressive disorder.
p < .001.
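The p values attached to the two omnibus tests reported above can be recovered from the test statistics alone. The sketch below assumes only the reported χ2 values and their 3 degrees of freedom, and it uses a closed-form upper-tail probability that is valid specifically for df = 3. The function name `chi2_sf_df3` is hypothetical, and this is a check on the reported figures rather than the authors' own procedure.

```python
import math


def chi2_sf_df3(x):
    """Upper-tail probability of a chi-square variable with 3 degrees of freedom.

    Uses the closed form erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2),
    which holds only for df = 3.
    """
    if x < 0:
        raise ValueError("chi-square statistic must be non-negative")
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


# Test statistics as reported in the text above (both with df = 3).
p_gad = chi2_sf_df3(40.70)
p_dd = chi2_sf_df3(4.74)

print(f"GAD: p = {p_gad:.2e}")
print(f"Depressive disorder: p = {p_dd:.2f}")
```

The second value rounds to .19 and the first is far below .001, in line with the reported results. For other degrees of freedom, a general-purpose routine such as `scipy.stats.chi2.sf` would be the usual choice.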
Regarding comorbidity, 13.0% of the total sample met criteria for both disorders simultaneously (i.e., overlap between depressive disorder and GAD). Among those diagnosed with depressive disorder, 81.4% also met criteria for GAD, whereas 46.7% of participants with GAD also fulfilled criteria for depressive disorder. These findings indicate a substantial degree of diagnostic overlap, particularly among individuals presenting with depressive symptomatology.
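The three comorbidity figures above all derive from a single overlap count divided by different denominators, which is why the conditional percentages are asymmetric. The sketch below makes that relationship explicit. The counts are hypothetical and chosen only for illustration; they are not the study's data, and the function name `comorbidity_summary` is likewise an assumption.

```python
def comorbidity_summary(n_total, n_dd, n_gad, n_both):
    """Express one overlap count as three different percentages."""
    if not (0 <= n_both <= min(n_dd, n_gad) <= n_total):
        raise ValueError("counts are inconsistent")
    return {
        "both_of_total": 100 * n_both / n_total,  # comorbid cases in the whole sample
        "gad_given_dd": 100 * n_both / n_dd,      # share of DD cases who also have GAD
        "dd_given_gad": 100 * n_both / n_gad,     # share of GAD cases who also have DD
    }


# Hypothetical counts for illustration only; NOT the study's data.
summary = comorbidity_summary(n_total=1000, n_dd=150, n_gad=220, n_both=120)
for name, pct in summary.items():
    print(f"{name}: {pct:.1f}%")
```

Because the numerator is shared, the conditional percentage is always larger for the less prevalent condition. This is the pattern in the reported results, where GAD among depressive disorder cases exceeds depressive disorder among GAD cases.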
Discussion
The current study aimed to examine the psychometric properties of Spanish-language translations of the IDQ and IAQ in three Spanish-speaking samples from Chile, Mexico, and the United States, as well as the psychometric properties of the English versions of the IDQ and IAQ in an English-speaking sample from the United States. This is the first study to report on the use of the IDQ and IAQ in the United States, and it is the first to adapt the IDQ and IAQ into Spanish. Overall, the IDQ and IAQ had good reliability, validity, and factor structure. Based on the current study, both measures seem appropriate for use in Spanish-speaking samples, pending additional replication.
Across all three samples, the Spanish IDQ and IAQ had good to excellent reliability, as indicated by both Cronbach’s α and McDonald’s ω. Across samples, IDQ item 6 (suicidal ideation) was found to have elevated skew and kurtosis values. This is to be expected given the distribution and base rate of suicidal ideation and behaviors (Kessler et al., 2005); however, it is unclear whether this is consistent with previous findings for the IDQ and IAQ, given that these studies did not provide item-level skew and kurtosis values. Furthermore, item-level statistics suggested there was a good dispersion of scores without notable floor or ceiling effects, indicating good measurement in the samples. Regarding the prevalence of depressive episodes, the current study found that 15.3% of participants across samples met criteria based on the IDQ diagnostic algorithm. This is lower than the 26.6% found in a Turkish sample (Alpay et al., 2023), but higher than the 7.4% found in adults from the United Kingdom (Shevlin et al., 2023) and the 8.1% found in a Ukrainian sample (Martsenkovskyi et al., 2024). Similarly, the ICD-11 prevalence of GAD in the current sample, as measured by the IAQ diagnostic algorithm, was 22.4%, whereas other studies have found rates ranging from 7.1% (Shevlin et al., 2023) to 28.5% (Alpay et al., 2023). In the present study, using cut-off scores, the prevalence of probable depression per the PHQ-9 was 47.3% and that of probable GAD per the GAD-7 was 32.0%, both much higher than the 25.5% and 20.7%, respectively, reported using the same scales in a representative sample from the U.K. (Shevlin et al., 2023). It is worth noting that the IDQ had a much lower overall caseness of depressive disorder relative to the PHQ-9 (15.3% compared with 47.3%). Although we cannot test sensitivity and specificity in the current sample, it appears the PHQ-9 is likely a more sensitive measure of depressive symptoms relative to the IDQ. The IAQ also had lower caseness compared with the GAD-7 (22.4% vs. 32.0%), but it was less discrepant. This could also be related to the Alpay et al. (2023) scoring criterion that emphasized more conservative endorsement of symptoms. Notably, the bivariate correlations with the PHQ-9 and GAD-7 were high and similar to previous investigations (Hyland et al., 2023; Shevlin et al., 2023), supporting the convergent validity of the measures. Finally, the Spanish IDQ and IAQ were highly correlated with each other (r = .78–.82), which is consistent with the overlap of generalized anxiety and depression.
This is the first study to examine the English language IDQ and IAQ in a sample from the United States. Results supported strong reliability and convergent validity in the English version in this sample. The means and standard deviations for both the IDQ and IAQ, as well as the prevalence rates of ICD-11 depressive disorder and GAD, were higher in the present sample compared with previous samples from the United Kingdom and Ireland (Hyland et al., 2023; Shevlin et al., 2023). This may be related to sample demographic differences, as the present study was composed of college students, whereas previous samples were bereaved adults (Hyland et al., 2023) and a representative sample of U.K. adults (Shevlin et al., 2023). Other studies have found higher prevalence of major depressive disorder (e.g., Kessler et al., 2015) and GAD (e.g., Ruscio et al., 2017) in U.S. samples compared with other high-income countries. In the CFAs, the English IDQ and IAQ also had good to excellent model fit, and factor loadings were high across the different models, supporting the unidimensional factor structure of each. Although the ICD-11 criteria are not used clinically in the United States, the measures are psychometrically appropriate for use. Item content for GAD and depressive disorder/MDD differs slightly between the two diagnostic systems (ICD-11 and DSM-5-TR). For instance, the DSM-5-TR major depressive disorder criteria require endorsement of five symptoms out of nine, whereas ICD-11 includes 10 symptoms. The additional symptom in ICD-11 is hopelessness about the future, whereas DSM-5-TR includes “feeling hopeless” as an example of a subjective indicator of depressed mood (First et al., 2021). Similarly, ICD-11 and DSM-5-TR share five out of the six symptoms for GAD, with ICD-11 including a symptom of sympathetic autonomic overactivity rather than the DSM-5-TR symptom of “easily fatigued” (First et al., 2021). Despite these nosological differences, the strong reliability, validity, and factor structure of the IDQ and IAQ in the U.S. sample are informative for future international collaborative studies of GAD and depressive disorder that may include a U.S. sample.
When examining the latent factor structure of the IDQ and IAQ, nine different factor models were tested across all four of the samples, resulting in a thorough and comprehensive description of the fit of these models. In general, the various analyses met the benchmarks for good or excellent fit (Hu & Bentler, 1999). The IDQ and IAQ fit well individually but also demonstrated acceptable fit when considered together as two covarying factors. The results in Table 2 show slightly lower fit using continuous scoring algorithms, especially for Model 6, in which the IDQ and IAQ were tested together and allowed to covary. The MLR estimation with specified residual covariance improved model fit. Similarly, Alpay et al. (2023) found that the binary scoring had better model fit relative to the continuous scoring of the Turkish versions of the IDQ and IAQ, though both had acceptable model fit. We expanded upon Hyland et al. (2023), who examined the factor structure of the English IDQ and IAQ together as a two-factor model, by also testing the IDQ and IAQ individually. Model fit indices of the English language IDQ and IAQ were better in the current study than in Hyland et al. (2023).
However, in Model 6, where the IDQ and IAQ were examined together using the original non-binary scoring, there were two instances of standardized factor loadings greater than one. The first was in the Chilean sample for IDQ item 3 (difficulty concentrating), and the second was in the U.S. Spanish-speaking sample for IAQ item 6 (difficulty concentrating). Given that these two items are essentially identical across both measures, including them within the same model introduces redundancy that can distort parameter estimation. Such redundancy can produce estimation anomalies, specifically negative residual variances (i.e., Heywood cases), as observed in the present models, because the model is unable to disentangle the unique variance of two items that convey nearly the same information. Results were consistent using the MLR estimation, indicating that these two items, when measured and modeled simultaneously, need to share residual variance due to overlapping item content. Because of the overlapping items, if both the IDQ and IAQ are used concurrently, there is a lack of discriminant validity for the concentration item, so researchers and clinicians cannot determine whether concentration difficulties are related to anxiety, depression, or generalized distress. Considering the content and item overlap between the IDQ and IAQ, it may be valuable to examine these measures separately rather than as a combined two-factor model to prevent model misspecification problems.
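The link between a standardized loading above one and a negative residual variance follows from simple arithmetic. When an item loads on a single factor in a fully standardized solution, its residual variance is one minus the squared loading. The sketch below illustrates this under that assumption. The loadings are hypothetical values rather than estimates from the study, and the function names are illustrative.

```python
def standardized_residual_variance(loading):
    """Residual variance implied by a standardized loading on a single factor."""
    return 1.0 - loading ** 2


def is_heywood(loading):
    """True when the implied residual variance is negative (|loading| > 1)."""
    return standardized_residual_variance(loading) < 0


# Hypothetical loadings for illustration; not estimates from the study.
for loading in (0.85, 1.04):
    theta = standardized_residual_variance(loading)
    print(
        f"loading = {loading:.2f}, residual variance = {theta:.4f}, "
        f"Heywood case = {is_heywood(loading)}"
    )
```

A loading even slightly above one therefore implies an inadmissible negative variance, which is why such estimates are treated as a sign of misspecification rather than as evidence of an unusually strong item.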
The measurement invariance testing also identified limits to the measurement and structure of the IDQ and IAQ. Both the IDQ and the IAQ demonstrated only partial configural invariance. This level of difference in factor structure across groups is not ideal. Local model fit appeared to differ across the subsamples, and further examination is warranted. Invariance may have been affected by the different sample sizes for each group, as large sample sizes are associated with increased precision of estimated parameters (Meade & Bauer, 2007), especially given that only one of our samples included more than 400 participants. Although cross-cultural measurement invariance may have provided additional support for the use of the IDQ and IAQ, there are limitations in language, theoretical concepts, and measurement that may be associated with multi-group cross-cultural invariance testing (see Fischer et al., 2025).
The current study should be evaluated in the context of some limitations. First, the samples were convenience samples of college students, which may limit the generalizability of the results. Given that the current study was not conducted in a clinical population, there may be limitations on the external validity and construct validity of the current results. Furthermore, the results relied on self-report measures of depression and anxiety, and future research should consider replicating these results with the support of multimodal assessments or clinician-rated interview measures. All diagnostic prevalence rates are estimates based on the scoring procedure described in Shevlin et al. (2023), and we do not have verified clinical diagnoses as a comparison. Data were collected online, which supported accessibility and privacy for participants; however, there may be concerns about participants’ attention or level of distraction in online data collection. There were some slight data collection differences across sites, given the unique context and resources of each of the universities in the three countries, and future research should examine the measurement of the Spanish- and English-language IDQ and IAQ in other contexts (e.g., paper-pencil) or with more controls for data quality assurance.
There are several strengths of the current study worth noting. Four samples were collected across three countries to increase sample size and generalizability of the findings. The overall samples were large and adequately powered for the primary analyses. We tested multiple factor structure models (covarying factors vs. each measure independently) and scoring procedures (binary vs. ordinal items) to be comprehensive in our analyses. Best practices in translation and back-translation were implemented to create culturally and linguistically appropriate versions of the IDQ and IAQ.
Given the results of the present analyses from three countries, the initial reliability and validity of the Spanish language IDQ and IAQ are promising. Results are generally consistent with other published psychometric examinations of the IDQ and IAQ, both the original English language version as well as other translations, and thus the current study expands the current literature and fills an important gap. Additional replication is required to support the use of the Spanish-language IDQ and IAQ and their implementation in clinical practice and research, as well as further examination of the IDQ and IAQ in English-speaking samples from the United States. Including a clinician-rated diagnostic interview would allow for sensitivity and specificity analyses to be conducted in the future to establish clinically elevated cut-off scores. Especially given the discrepancies in the rates of clinically elevated depression and anxiety symptoms on the IDQ and IAQ relative to the PHQ-9 and GAD-7, additional investigations into the performance of the new measures are vital. Future research should also consider examining the IDQ and IAQ in clinical and treatment-seeking samples to determine their utility in higher severity samples. Especially given the heterogeneity of the Spanish language, it will be imperative to examine performance of the IDQ and IAQ in additional samples for generalizability.
Supplemental Material
Supplemental material, sj-docx-1-asm-10.1177_10731911261455671 for Reliability and Validity of the International Depression Questionnaire and International Anxiety Questionnaire in English- and Spanish-Speaking Samples in Chile, Mexico, and the United States by Brianna M. Byllesby, Román Ronzón-Tirado, Ines Cano-Gonzalez, Andrés Fresno, Yuriria González Bonilla and Ruby Charak in Assessment
Footnotes
Acknowledgements
We are grateful to Vanessa Gonzalez and Ayleen Flores for their help with the translations and backtranslations for the Spanish versions of the IDQ and IAQ scales.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: Data collection in Chile was supported by FONDECYT Project No. 1230715, awarded to Andrés Fresno.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Supplemental Material
Supplemental material for this article is available online.
References
