Abstract
The Hopkins Symptom Checklist–25 (HSCL-25) is a widely applied measure of depression and anxiety. The present study examines two of its short forms, the HSCL-5 and HSCL-10, which have been proposed by previous research, in a representative sample of the German general population. To this end, we conducted exploratory and confirmatory factor analyses on two subsamples (n = 1,246 and n = 1,216). Our results suggest that, compared with the HSCL-25, both short forms represent economical ways of assessing depression and anxiety. Model fit was good, and correlations with established measures demonstrate convergent validity. Both HSCL short forms are strongly invariant across sex, and we found evidence for partial strong invariance across age groups. Further analyses showed that differences in HSCL scores can be partially explained by sociodemographic variables. Finally, we report normative values for use by researchers and clinicians. We recommend the HSCL-5 and HSCL-10 for clinical and research-oriented application.
Since its development in the 1950s (Parloff, Kelman, & Frank, 1954), the Hopkins Symptom Checklist (HSCL) has become established in clinical research. Many HSCL versions, consisting of 9 to 90 items, have been published and evaluated (Petermann & Brähler, 2013; Prinz et al., 2008). One of the most prominent versions of the HSCL family is the HSCL-25 (Glaesmer et al., 2014; Petermann & Brähler, 2013). The HSCL-25 is a widely applied screening tool for symptoms of depression, anxiety, and psychological distress in general. It consists of 25 items in total: 10 of these capture anxiety symptoms, while the other 15 deal with symptoms of depression. The sum score of the depression and anxiety subscales represents psychological distress.
Both diagnoses, depression and anxiety, are among the most prevalent mental disorders in the general population: More than 10% of all people suffer from one or both at least once during their lifetime (Bandelow & Michaelis, 2015; Jacobi et al., 2004; Kessler & Bromet, 2013; Wittchen et al., 2011). This can partly be explained by the high comorbidity not only between depression and anxiety disorders (Brady & Kendall, 1992; Roy-Byrne et al., 2000) but also with other diseases, such as coronary heart disease and diabetes (Anderson, Freedland, Clouse, & Lustman, 2001; Rudisch & Nemeroff, 2003).
Therefore, a valid and reliable assessment of symptoms of anxiety and depression is of the utmost importance in clinically applied psychology and in research settings alike. Despite the popularity of the HSCL-25, evaluations of the psychometric properties of the questionnaire pointed out some shortcomings. For instance, Glaesmer et al. (2014) evaluated the HSCL-25’s psychometric properties in a representative sample (N = 2,520; age range: 14–91 years), finding a barely acceptable model fit along with good reliability for the 25-item scale. Additionally, the depression and anxiety facets showed a very high correlation (.78), along with a lack of differential correlations of the subscales with external criteria. In a large study of N = 13,525 students (Skogen, Øverland, Smith, & Aarø, 2017), the unique variance attributed to the subscales when accounting for the general factor was comparatively low.
Strand, Dalgard, Tambs, and Rognerud (2003) examined two alternative measures: a 5-item and a 10-item version of the HSCL. These scales represent two brief, economical instruments for the assessment of symptoms of anxiety and depression. For people with limited cognitive processing capacity (e.g., older individuals, psychiatric patients) or in settings where a brief screening is more desirable (e.g., large-scale health surveys, repeated measurement in experimental settings), 25 items appear excessive. Shrout and Yager (1989) argue that an initially reliable scale can be shortened without substantial losses in specificity or sensitivity, which Strand et al. (2003) demonstrated empirically. However, a more in-depth investigation of psychometric properties, specifically model fit and measurement invariance, is still lacking. In particular, the factor structure (whether the measurement model is best represented by one or two factors) has not yet been investigated.
The present study evaluates two short versions of the HSCL, which respectively consist of 5 and 10 items. We will examine item and scale descriptive statistics, and run exploratory factor analysis (EFA) and confirmatory factor analysis (CFA) to judge model fit and measurement invariance regarding sex and age. Furthermore, we will inspect convergent validity with related measures of depression and anxiety, also comparing the HSCL-5 and HSCL-10 with the 25-item version. Finally, we will present norm values for the German general population.
Method
Participants
The study sample was recruited as part of the annual representative general population survey conducted by the University of Leipzig. With the assistance of a demographic consulting company (USUMA, Berlin, Germany), we selected a representative sample of participants, using multistage sampling. First, 258 sample point regions from all regions in Germany were randomly drawn from the most recent political election register. The second stage consisted of a random selection of households using the random route procedure. The third stage involved a random selection of household respondents using the Kish selection grid. The aim of the sampling procedure was to obtain a sample that was representative of the German population in terms of age, sex, and education. Inclusion criteria for the current study were an age of 14 years or older and the ability to read and understand the German language. All participants were visited by a trained study assistant and informed about the investigation. The participants were provided with self-rating questionnaires.
Out of the 4,069 valid addresses, 2,520 participants took part in the present study. In 539 cases, the household refused to give any information; 441 targeted persons were not willing to participate; in 513 cases, no person from the target household could be contacted after three attempts; 45 persons were out of town; and 11 persons were sick. We excluded those participants who had missing values on at least one of the HSCL items (n = 58), leading to a final sample of 2,462 participants (equaling a response rate of 60.5%). The sociodemographic makeup of the sample is described in Table 1. Based on the employed random selection of participants, one would expect the present sample to be fairly representative of the German general population (Jacobsen & Richter, 2019). A comparison of the study sample with data from the Federal Statistical Office of Germany (2019) shows that this is the case for participant age and sex: There is a 5% mismatch for participant sex and a 12.5% mismatch for participant age. The higher mismatch for age groups is likely due to the underrepresentation of younger participants. The approximated median net income in our sample is 1,723€, compared with 1,827€ in the population, a mismatch of 6%.
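The participation figures above can be checked with a few lines of arithmetic. The following Python sketch only re-derives the numbers stated in the text; the variable names are illustrative and not part of the original study materials.

```python
# All counts are taken from the paragraph above; names are illustrative.
valid_addresses = 4069
nonresponse = {
    "household refused": 539,
    "target person unwilling": 441,
    "no contact after three attempts": 513,
    "out of town": 45,
    "sick": 11,
}
missing_hscl_items = 58

# Participants = valid addresses minus all documented nonresponse cases.
participants = valid_addresses - sum(nonresponse.values())

# Final sample = participants minus cases with missing HSCL items.
final_sample = participants - missing_hscl_items

# Response rate relative to the valid addresses.
response_rate = final_sample / valid_addresses

print(participants)            # 2520
print(final_sample)            # 2462
print(f"{response_rate:.1%}")  # 60.5%
```

The sketch confirms that the reported response rate refers to the final sample after exclusion of incomplete HSCL responses.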
Table 1. Sample Description Based on HSCL-5 and HSCL-10.
Note. HSCL = Hopkins Symptom Checklist; % in pop. = population values according to the Federal Statistical Office of Germany (2019).
Ethics Statement
All participants were informed of the study procedures, data collection, and anonymization of all personal data. Additionally, a detailed data privacy statement was delivered by the study assistant. The present study posed a low risk to the participants, as it did not include medical treatments, invasive diagnostics, or procedures causing psychological, spiritual, or social harm. Therefore, according to German law, all participants provided verbal informed consent. For underage participants, parental consent was obtained. Furthermore, the study adhered to the guidelines of the ICC/ESOMAR International Code of Marketing and Social Research Practice.
Measures
The HSCL-25 (Glaesmer et al., 2014; Petermann & Brähler, 2013) assesses symptoms of anxiety and depression using 25 items on a 4-point scale, ranging from not at all to extremely. It consists of two subscales, anxiety (10 items; ω = .856) and depression (15 items; ω = .925), which are calculated by summing up the item scores. The two scale scores can then be aggregated into a total score (ω = .942) assessing psychological distress.
The Patient Health Questionnaire (PHQ-4; Kroenke, Spitzer, Williams, & Löwe, 2009; Löwe et al., 2010) is a brief screening instrument. It uses 2 items each to assess depression (ω = .813) and anxiety (ω = .848). A total score (ω = .869) is calculated by summing up all items. Response options range from 0 (not at all) to 3 (nearly every day).
The Brief Symptom Inventory (BSI-18; Franke et al., 2017; Petrowski, Schmalbach, Jagla, Franke, & Brähler, 2018) measures symptoms of somatization (ω = .821), depression (ω = .870), and anxiety (ω = .831). Six items per subscale inquire into the extent to which participants suffered from relevant symptoms on a 5-point scale from 0 (not at all) to 4 (extremely). The global severity index (ω = .931) is calculated by summing up all 18 items.
The Jenkins Sleep Scale assesses sleep-related disturbances using four items (Jenkins, Stanton, Niemcryk, & Rose, 1988). Participants rate the frequency of experiencing certain difficulties within the duration of 1 month on a 5-point scale, ranging from 0 (never) to 4 (22-31 days). Internal consistency in the present study was ω = .912.
Analysis Plan
First, we split the study sample into two comparable random samples so that EFA (n = 1,246) and CFA (n = 1,216) could be conducted on different samples. For the EFA, we employed three methods. First, we used Principal Axis Factoring with Oblimin rotation in SPSS to obtain factor loadings. Second, we used the Minimum Average Partial test (MAP; Velicer, 1976). Third, we employed Parallel Analysis (PA; Hayton, Allen, & Scarpello, 2004; Horn, 1965). The MAP test identifies the number of components that minimizes the average squared partial correlation among the items once those components have been partialed out. PA, in contrast, compares the eigenvalues of the empirical correlation matrix with eigenvalues from random data sets that parallel the empirical data in the number of cases and variables; a factor is retained only if its empirical eigenvalue exceeds the corresponding random one. O’Connor (2000) provides syntax for MAP and PA.
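To illustrate the logic of Parallel Analysis described above, the following is a minimal Python sketch using only the standard library. It is not the O’Connor (2000) syntax used in the study. The simulated one-factor data, the use of normally distributed random data, the 95th-percentile criterion, and all function names are assumptions made for illustration.

```python
import math
import random


def correlation_matrix(data):
    """Pearson correlation matrix for a list of rows (cases x items)."""
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    sds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in data) / (n - 1))
           for j in range(p)]
    corr = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i, p):
            cov = sum((row[i] - means[i]) * (row[j] - means[j])
                      for row in data) / (n - 1)
            corr[i][j] = corr[j][i] = cov / (sds[i] * sds[j])
    return corr


def jacobi_eigenvalues(matrix, sweeps=50, tol=1e-12):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations."""
    n = len(matrix)
    a = [row[:] for row in matrix]
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                diff = a[q][q] - a[p][p]
                theta = (math.pi / 4 if diff == 0
                         else 0.5 * math.atan(2 * a[p][q] / diff))
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)


def parallel_analysis(data, n_random=100, percentile=0.95, seed=1):
    """Count leading eigenvalues that exceed the random-data threshold."""
    rng = random.Random(seed)
    n, p = len(data), len(data[0])
    observed = jacobi_eigenvalues(correlation_matrix(data))
    random_eigs = []
    for _ in range(n_random):
        noise = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
        random_eigs.append(jacobi_eigenvalues(correlation_matrix(noise)))
    thresholds = []
    for k in range(p):
        kth = sorted(eigs[k] for eigs in random_eigs)
        thresholds.append(kth[int(percentile * (n_random - 1))])
    n_factors = 0
    for obs, thr in zip(observed, thresholds):
        if obs <= thr:
            break
        n_factors += 1
    return n_factors, observed, thresholds


# Hypothetical data: 300 cases, 5 items driven by one common factor.
rng = random.Random(42)
items = []
for _ in range(300):
    f = rng.gauss(0, 1)
    items.append([0.7 * f + math.sqrt(1 - 0.49) * rng.gauss(0, 1)
                  for _ in range(5)])

n_factors, observed, thresholds = parallel_analysis(items)
print(n_factors, [round(e, 2) for e in observed])
```

Because the simulated items share a single strong common factor, only the first empirical eigenvalue should exceed its random counterpart. Real HSCL items are ordinal, so an applied analysis would need a procedure suited to that response format.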
Then, we conducted the CFA using R and the packages lavaan and semTools (Rosseel, 2012; semTools Contributors, 2016). Because the HSCL offers only a 4-point response format, we treated the items as ordinal data. Consequently, we used robust diagonally weighted least squares estimation (Li, 2016). To evaluate goodness of fit, we used popular fit indices with commonly recommended cutoff criteria (Hu & Bentler, 1998, 1999; Schermelleh-Engel, Moosbrugger, & Müller, 2003): the χ2 test, which should ideally not be significant; χ2 divided by degrees of freedom (χ2/df), which should be smaller than 3; the comparative fit index (CFI) and the Tucker–Lewis index (TLI), which should both be larger than .95 for good fit or .90 for acceptable fit; and the root mean square error of approximation (RMSEA), reported with its 90% confidence interval, and the standardized root mean square residual (SRMR), which should both be smaller than .05 for good fit or .08 for acceptable fit.
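The cutoff criteria listed above can be summarized as a simple decision rule. The Python sketch below encodes them as stated in the text; the function name and the example fit values are hypothetical and do not correspond to any model reported in this study.

```python
def rate_fit(chi2, df, cfi, tli, rmsea, srmr):
    """Rate fit indices against the cutoffs named in the text."""

    def higher_is_better(value):
        # CFI and TLI: > .95 good, > .90 acceptable.
        if value > 0.95:
            return "good"
        if value > 0.90:
            return "acceptable"
        return "poor"

    def lower_is_better(value):
        # RMSEA and SRMR: < .05 good, < .08 acceptable.
        if value < 0.05:
            return "good"
        if value < 0.08:
            return "acceptable"
        return "poor"

    return {
        # The text gives a single cutoff for chi2/df (smaller than 3).
        "chi2/df": "good" if chi2 / df < 3 else "poor",
        "CFI": higher_is_better(cfi),
        "TLI": higher_is_better(tli),
        "RMSEA": lower_is_better(rmsea),
        "SRMR": lower_is_better(srmr),
    }


# Hypothetical fit values, for illustration only.
ratings = rate_fit(chi2=12.4, df=5, cfi=0.993, tli=0.94, rmsea=0.06, srmr=0.03)
for index, rating in ratings.items():
    print(f"{index}: {rating}")
```

Such cutoffs are conventions rather than strict tests, so a rating of this kind is best read alongside the full pattern of indices.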
Additionally, we examined the questionnaire’s invariance across sex and age groups by comparing increasingly constrained models (Cheung & Rensvold, 2002; Milfont & Fischer, 2010). First, we constrained factor loadings to be equal to establish weak (or metric) invariance. Second, we additionally constrained intercepts to be equal in order to test for strong (or scalar) invariance. Third, we tested strict invariance by comparing the scalar model with a model that also constrains residuals to be equal across tested groups. As recommended by Milfont and Fischer (2010), we evaluated model comparisons using the χ2 test as well as differences in CFI and gamma hat (GH; Steiger, 1989). The χ2 difference test should ideally not be significant, and CFI and GH should not decline by more than .01 between models. In cases where full invariance did not hold, we tested for partial invariance by successively releasing constraints for individual indicators.
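The stepwise comparison described above can be sketched as follows. This Python example applies only the descriptive criterion (a decline of no more than .01 in CFI and GH) and omits the χ2 difference test; the function name and all fit values are hypothetical.

```python
def compare_nested(models, max_drop=0.01):
    """Check each invariance step against the previous, less constrained model.

    models: list of (label, cfi, gamma_hat), ordered from least to most
    constrained. Returns (label, delta_cfi, delta_gh, criterion_met) tuples.
    """
    results = []
    for previous, current in zip(models, models[1:]):
        _, prev_cfi, prev_gh = previous
        label, cfi, gh = current
        # Positive deltas mean that fit declined after adding constraints.
        delta_cfi = round(prev_cfi - cfi, 3)
        delta_gh = round(prev_gh - gh, 3)
        met = delta_cfi <= max_drop and delta_gh <= max_drop
        results.append((label, delta_cfi, delta_gh, met))
    return results


# Hypothetical fit values, for illustration only.
steps = [
    ("configural", 0.995, 0.996),
    ("weak (metric)", 0.993, 0.995),
    ("strong (scalar)", 0.978, 0.990),
    ("strict", 0.976, 0.989),
]

results = compare_nested(steps)
for label, d_cfi, d_gh, met in results:
    print(f"{label}: dCFI = {d_cfi}, dGH = {d_gh}, criterion met: {met}")
```

In this made-up example, the scalar step fails the criterion, which is the situation in which individual intercept constraints would be released to test for partial invariance.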
As per recommendations from Trizano-Hermosilla and Alvarado (2016), we report McDonald’s (1999) ω as a measure of internal consistency. We differentiate between three types of the coefficient: ωtotal, which is comparable to the traditional α as a global measure of internal consistency; ωhierarchical, which indicates the share of variance attributable to a general factor; and ωsubscale, which indicates the proportion of variance traceable to the specific subscale (Rodriguez, Reise, & Haviland, 2016). Additionally, we calculated analyses of variance, comparing scores on both HSCL versions across the sociodemographic groups of the sample.
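The three ω coefficients can be computed from the standardized loadings of a bifactor model. The Python sketch below follows one common formulation for orthogonal bifactor models (cf. Rodriguez et al., 2016); it assumes standardized loadings, uncorrelated factors, and items that each load on exactly one specific factor. The loadings and item names are hypothetical and are not the estimates reported in this study.

```python
def omega_coefficients(general, specific):
    """Omega coefficients from standardized bifactor loadings.

    general:  {item: loading on the general factor}
    specific: {factor name: {item: loading on that specific factor}}
    """
    # Residual variance per item = 1 - sum of squared loadings.
    residual = {}
    for item, g_loading in general.items():
        specific_sq = sum(loads.get(item, 0.0) ** 2
                          for loads in specific.values())
        residual[item] = 1.0 - g_loading ** 2 - specific_sq

    # Variance of the total score explained by each source.
    general_part = sum(general.values()) ** 2
    specific_part = sum(sum(loads.values()) ** 2
                        for loads in specific.values())
    total_variance = general_part + specific_part + sum(residual.values())

    result = {
        "omega_total": (general_part + specific_part) / total_variance,
        "omega_hierarchical": general_part / total_variance,
    }

    # Subscale omega: variance of a subscale score due to its specific factor.
    for name, loads in specific.items():
        g_sub = sum(general[item] for item in loads) ** 2
        s_sub = sum(loads.values()) ** 2
        var_sub = g_sub + s_sub + sum(residual[item] for item in loads)
        result[f"omega_subscale_{name}"] = s_sub / var_sub
    return result


# Hypothetical standardized loadings, for illustration only.
general_loadings = {"a1": 0.7, "a2": 0.7, "d1": 0.7, "d2": 0.7}
specific_loadings = {
    "anxiety": {"a1": 0.3, "a2": 0.3},
    "depression": {"d1": 0.4, "d2": 0.4},
}

omegas = omega_coefficients(general_loadings, specific_loadings)
for name, value in omegas.items():
    print(f"{name}: {value:.3f}")
```

With these made-up loadings, most of the reliable variance is carried by the general factor, which is the pattern that would support interpreting a total score.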
Results
Descriptive Statistics
We report item and scale descriptive statistics in Table 2. The analysis of skewness and kurtosis suggests nonnormal distributions for the majority of the HSCL’s items when considering the cutoff values (2 for skewness and 4 for excess kurtosis) provided by West, Finch, and Curran (1995). For both the HSCL-5 and the HSCL-10, all three methods of EFA revealed evidence for a single factor (see Table 3). A majority of the variance (or close to it in the case of the HSCL-10) is traceable to the first latent factor. The lowest average partial correlations and eigenvalues exceeding the randomly generated ones were found for the unifactorial solutions. Factor loadings of all items were in excess of .500. The corrected item-total correlations were larger than the strict .500 cutoff for most items (Hair, Black, Babin, & Anderson, 2010).
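The normality screening mentioned above can be illustrated with a short Python sketch that computes moment-based skewness and excess kurtosis and compares them with the cutoffs of 2 and 4. The item responses are hypothetical, and the use of population moments (rather than a bias-corrected estimator, as statistical packages often report) is an assumption.

```python
def skewness_and_excess_kurtosis(values):
    """Moment-based skewness and excess kurtosis (population formulas)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skew = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    return skew, excess_kurtosis


def is_nonnormal(values, skew_cutoff=2.0, kurtosis_cutoff=4.0):
    """Flag an item if either statistic exceeds its cutoff in absolute value."""
    skew, kurt = skewness_and_excess_kurtosis(values)
    return abs(skew) > skew_cutoff or abs(kurt) > kurtosis_cutoff


# Hypothetical responses on a 4-point scale (1 to 4), strongly floor-heavy,
# as is typical for symptom items in a general population sample.
floor_heavy_item = [1] * 90 + [2] * 6 + [3] * 2 + [4] * 2

# Hypothetical symmetric item for comparison.
symmetric_item = [1, 2, 3, 4] * 25

skew, kurt = skewness_and_excess_kurtosis(floor_heavy_item)
print(round(skew, 2), round(kurt, 2), is_nonnormal(floor_heavy_item))
print(is_nonnormal(symmetric_item))
```

Strong positive skew of this kind is one reason for treating the items as ordinal in the subsequent CFA.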
Table 2. Descriptive Statistics of the HSCL-5 and HSCL-10 Items and Scales.
Note. HSCL = Hopkins Symptom Checklist; γ1 = skewness; γ2 = excess kurtosis; rit = corrected item-total correlation with respective subscale; F1 = factor loadings in the exploratory factor analysis.
Table 3. Results for Minimum Average Partial Test and Parallel Analysis for the HSCL-5 and HSCL-10.
Note. HSCL = Hopkins Symptom Checklist; MAP = Minimum Average Partial test; PA = Parallel Analysis; CI = confidence interval. The lowest average partial correlation and the smallest raw data eigenvalue which is still larger than the upper limit of the 95% CI represent the preferred number of factors.
CFA and Reliability Analysis
Based on the findings from the EFA, we first tested a one-factor solution for both scales. Next, we investigated whether a two-factor or a bifactor model would represent the data more adequately. The results of the CFA can be found in Table 4. All models had significant χ2 tests, which is to be expected given the large sample size (Bentler & Bonett, 1980). Apart from that, CFI, RMSEA, and SRMR were acceptable or even good for all models, while TLI indicated slightly worse but still acceptable fit. Specifically for the HSCL-5, we found that both the one- and two-factor models are viable solutions and roughly equivalent in fit. Additionally, we constructed a bifactor model, which evinced the best fit for the HSCL-25 (Glaesmer et al., 2014). For the HSCL-5, however, the bifactor model is just-identified and thus not informative in terms of model fit; moreover, it did not converge. For the 10-item HSCL, on the other hand, the two-factor model, and even more so the bifactor model, exhibited markedly improved fit over the one-factor solution. As a more parsimonious alternative to the bifactor model, we tested a one-factor model that allowed the errors of Items 1 and 2 to correlate. A comparison of the phrasing of all anxiety items shows that Items 1 and 2 additionally address aspects of affect, which are not included in the remaining items. This model evinced acceptable fit across all indices, which speaks to the validity of a unifactorial solution. Thus, overall model fit showed a substantial improvement over the 25-item HSCL. We report the factor loadings from all models in Table 5. All loadings were significant, except for some subscale items in the bifactor model. Internal consistency was satisfactory for the one-factor and two-factor models (see Table 6); only the anxiety subscale exhibited a mediocre coefficient.
The bifactor model had very good internal consistency for the total score and the depression subscale but also showed moderate reliability for the anxiety subscale. Additionally, the hierarchical and subscale coefficients reveal that more than two thirds of variance can be attributed to a general factor, as opposed to the specific factors, for both subscales, depression (69.6%) and anxiety (70.3%).
Table 4. CFA Results.
Note. CFA = confirmatory factor analysis; HSCL = Hopkins Symptom Checklist; df = degrees of freedom; CFI = comparative fit index; TLI = Tucker–Lewis index; RMSEA = root mean square error of approximation; CI = confidence interval; SRMR = standardized root mean square residual. The bifactor model did not converge for the HSCL-5.
This model allowed for the correlation of the error terms of Items 1 and 2.
Table 5. Standardized Factor Loadings From the Confirmatory Factor Analyses of All Tested Models.
Note. HSCL = Hopkins Symptom Checklist; Anx = anxiety factor; Dep = depression factor; G = general factor. The bifactor model did not converge for the HSCL-5.
Factor loading was not significant (at p < .05).
Table 6. Reliability Coefficients of HSCL-5 and HSCL-10.
Note. HSCL = Hopkins Symptom Checklist; ωh = hierarchical omega; ωs = subscale omega.
Measurement Invariance
Next, we tested measurement invariance of the two-factor model for both the HSCL-5 and the HSCL-10. To obtain comparable group sizes for the multigroup analysis across age groups, we combined the groups of 14- to 19-year-olds and 20- to 29-year-olds (see the appendix for more information). Tables 7 and 8 present the results of these analyses. We found evidence for strict invariance across sex groups, but only partial strict invariance across age groups. More precisely, the HSCL-5 can be reasonably expected to be strictly invariant across age groups, while we had to release multiple constraints for the HSCL-10 to exhibit acceptable fit in the more complex models.
Table 7. Tests of Measurement Invariance of the HSCL-5.
Note. HSCL = Hopkins Symptom Checklist; df = degrees of freedom; CFI = comparative fit index; GH = gamma hat.
a The residual of Item 4 was freed to vary between groups. b The intercepts of Items 4 and 9 were freed to vary between groups. c The residuals of Items 2 and 9 were freed to vary between groups.
Table 8. Tests of Measurement Invariance of the HSCL-10.
Note. HSCL = Hopkins Symptom Checklist; df = degrees of freedom; CFI = comparative fit index; GH = gamma hat.
a The intercepts of Items 1, 3, 7, and 9 were freed to vary between groups. b The residuals of Items 1, 3, and 7 were freed to vary between groups.
Convergent Validity
We correlated the HSCL-5 and HSCL-10 with the full HSCL-25 and other related measures to demonstrate convergent validity (see Table 9). We found very high correlations—in excess of .80 and .90—with the 25-item version, as well as high correlations with the respective subscales of the PHQ-4 and the BSI-18. Additionally, all HSCL versions and subscales had moderately high associations with sleep disturbances as measured by the Jenkins Sleep Scale–4.
Table 9. Correlation Matrix of the HSCL-5, HSCL-10, and HSCL-25 With Other Distress Measures.
Note. HSCL = Hopkins Symptom Checklist; PHQ-4 = Patient Health Questionnaire–4; BSI-18 = Brief Symptom Inventory–18; JSS-4 = Jenkins Sleep Scale–4; GSI = global severity index. All correlations are significant at the p < .001 level.
Sociodemographic Influences
We tested for differences in the subscale and total scores of the HSCL-5 and HSCL-10 with regard to sociodemographic variables (see Table 1). All comparisons were significant, which is not surprising given the sample size. Sex had only a very small effect in explaining differences in anxiety, depression, and the total score. Age, education, and employment status, on the other hand, exhibited slightly larger effect sizes. The largest effect, however, was traceable to household income group, which explained close to 10% of the HSCL-10’s variance.
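The share of variance explained by group membership in a one-way analysis of variance is commonly expressed as eta squared, the ratio of the between-group sum of squares to the total sum of squares. The text does not name the effect size measure used, so the choice of eta squared here is an assumption. The Python sketch below uses hypothetical scores and group labels.

```python
def eta_squared(groups):
    """Eta squared for a one-way ANOVA: SS_between / SS_total.

    groups: {group label: list of scores}
    """
    all_scores = [score for scores in groups.values() for score in scores]
    grand_mean = sum(all_scores) / len(all_scores)

    # Total variability around the grand mean.
    ss_total = sum((score - grand_mean) ** 2 for score in all_scores)

    # Variability of group means around the grand mean, weighted by group size.
    ss_between = sum(
        len(scores) * (sum(scores) / len(scores) - grand_mean) ** 2
        for scores in groups.values()
    )
    return ss_between / ss_total


# Hypothetical scores for three income groups, for illustration only.
income_groups = {
    "low": [3, 4, 5],
    "medium": [2, 3, 4],
    "high": [1, 2, 3],
}

effect = eta_squared(income_groups)
print(f"{effect:.2f}")  # 0.50
```

An eta squared of about .10, as reported for household income, would mean that roughly one tenth of the score variance lies between the income groups.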
Norm Values
In Tables 10 to 13, we report percentile ranks partitioned by sex and by age groups.
Table 10. Percentile Ranks for the HSCL-5 Scales (Female).
Note. HSCL = Hopkins Symptom Checklist; Anx = anxiety factor; Dep = depression factor.
Table 11. Percentile Ranks for the HSCL-10 Scales (Female).
Note. HSCL = Hopkins Symptom Checklist; Anx = anxiety factor; Dep = depression factor.
Table 12. Percentile Ranks for the HSCL-5 Scales (Male).
Note. HSCL = Hopkins Symptom Checklist; Anx = anxiety factor; Dep = depression factor.
Table 13. Percentile Ranks for the HSCL-10 Scales (Male).
Note. HSCL = Hopkins Symptom Checklist; Anx = anxiety factor; Dep = depression factor.
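Percentile ranks of the kind reported in the norm tables can be derived from a distribution of sum scores along the following lines. This Python sketch defines the percentile rank of a score as the cumulative percentage of respondents with that score or lower; the text does not state which definition was used, so this is an assumption, and the scores are hypothetical.

```python
from collections import Counter


def percentile_ranks(scores):
    """Map each observed score to the percentage of scores at or below it."""
    counts = Counter(scores)
    n = len(scores)
    ranks = {}
    cumulative = 0
    for value in sorted(counts):
        cumulative += counts[value]
        ranks[value] = 100.0 * cumulative / n
    return ranks


# Hypothetical HSCL-5 sum scores (possible range 5 to 20), illustration only.
hypothetical_scores = [5] * 50 + [6] * 20 + [7] * 15 + [9] * 10 + [12] * 5

ranks = percentile_ranks(hypothetical_scores)
for score, rank in ranks.items():
    print(f"score {score}: percentile rank {rank:.0f}")
```

In practice, such a table would be computed separately for each sex and age group, as in the norm tables above.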
Discussion
The aim of the present study was to test the merits of two short versions of the HSCL-25: the HSCL-5 and HSCL-10. These two instruments would allow for a more economical assessment of mental health in a variety of contexts. We found good psychometric properties for both versions. The model fit was acceptable, even good. This represents a marked improvement over the 25-item HSCL, which exhibited ambiguous fit (see Glaesmer et al., 2014). Reliability of both scales was comparable to the original, except for the anxiety subscale, which had mediocre internal consistency. Considering that the two instruments use just two and four items, respectively, to measure anxiety, this relatively low reliability comes as no surprise. The hierarchical and subscale ω coefficients for the bifactor model further indicate that a majority of the HSCL-10’s variance can be traced back to a general factor, which justifies the construction of a total score.
The present study found evidence for strict measurement invariance across sex groups and partial strict invariance across age groups. This is an important and novel finding: This level of invariance was not previously shown for the HSCL, and for the German HSCL specifically, measurement invariance has not previously been established. Without invariance, meaningful comparisons between groups are not possible.
The very high correlations of the shortened HSCL versions with the 25-item version point to their capability of adequately capturing anxiety- and depression-related symptoms, even with less than half of the original scale’s items. Correlations between the anxiety and depression subscales were relatively high, albeit slightly lower than for the HSCL-25. Most crucial as evidence for the validity of both HSCL short forms are the high associations of the analogous subscales (short and long HSCL versions) with the PHQ-4 and the BSI-18, which demonstrate the measures’ convergent validity. Finally, we found a moderately high correlation with sleep difficulties, as has been shown previously (Breslau, Roth, Rosenthal, & Andreski, 1996).
As has been previously found by Glaesmer et al. (2014), the HSCL struggles to differentiate between anxiety and depression on a level sufficient for a clinical diagnosis. Regardless of the measures used, however, previous research provided sound evidence of high comorbidity between anxiety and depression, which makes it almost impossible to differentiate between the two constructs. Specifically, negative affect has been shown to be the source of shared variance, while bodily hyperarousal is assumed to be specific to anxiety and anhedonia to depression (Renner, Hock, Bergner-Köther, & Laux, 2018). This substantial interplay is also represented in the bifactor model, which exhibits inconsistent factor loadings on the specific factors in addition to explaining a majority of variance by means of the general factor. The adapted one-factor model, which allows for a correlation between the errors of Items 1 and 2, provides further evidence that the HSCL, at least where the short versions are concerned, actually measures a single characteristic.
In sum, our results indicate that both short versions of the HSCL should mainly be used as brief symptom assessment measures and do not replace an in-depth clinical assessment of anxiety or depression. The five-item version is recommended as a screening instrument (e.g., in large-scale surveys), and the 10-item questionnaire for a more reliable but still brief assessment of symptoms of depression and anxiety (e.g., in experimental settings). As displayed in the norm value tables, younger female individuals (<30 years) reach the maximum score in the HSCL-5 (20 of 20) and almost reach it in the HSCL-10 (38 of 40). Lower maximum scores were reported for the corresponding male population and for all older female and male age groups. In general, female individuals need to report higher HSCL scores (for HSCL-5: ≥11, for HSCL-10: ≥21) to be classified in the upper 5% of the distribution compared with male individuals (for HSCL-5: ≥6, for HSCL-10: ≥18).
Comparisons between sociodemographic groups evinced a small to moderate influence of group membership on symptom severity. In particular, employment status and monthly income appeared to be significant predictors of psychological distress. This fits well with previous research demonstrating a link between socioeconomic status and mental health (Barrett & Turner, 2005; Williams, Yu, Jackson, & Anderson, 1997).
Limitations
The present study used a survey that employed the HSCL-25 to test two short versions. A cross-validation using only the respective final 5 and 10 items should complement the present findings.
Additionally, an investigation of the HSCL-5 and HSCL-10 in children, adolescents and young adults is still pending. The analyses of the present study focused on a representative sample of the German adult population, and only a small number of respondents were between the ages of 14 and 19 years (n = 137). Future research should examine the suitability of the HSCL as a psychometric instrument for younger populations.
Conclusion
The HSCL-5 and HSCL-10 are reliable and valid short forms of the HSCL-25. Both should be preferred for the economical assessment of symptoms of depression and anxiety. We recommend the 5-item version as a screening instrument, and the 10-item questionnaire for a more reliable but still brief assessment of symptoms of depression and anxiety.
Appendix
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
