Abstract
In the Diagnostic and Statistical Manual of Mental Disorders (4th ed., DSM-IV), attention-deficit/hyperactivity disorder (ADHD) is categorized as a disorder usually first diagnosed in childhood or adolescence (American Psychiatric Association, 2000). However, recent research indicates that symptoms of ADHD, such as inattention and impulsivity, often persist into adulthood (Biederman, Monuteaux et al., 2006; Faraone & Biederman, 2005; Kessler, Adler, Barkley et al., 2005; Mannuzza, Klein, & Moulton, 2003). With growing awareness of adult ADHD among clinicians and researchers, more pharmacological and psychosocial treatments are being tested in adults with this condition (Davidson, 2008; Kolar et al., 2008; Safren, Sprich, Chulvick, & Otto, 2004; Simpson & Plosker, 2004; Spencer, Biederman, & Wilens, 2004a, 2004b; Wilens, 2008). Consequently, there is an increasing need for reliable and valid patient-reported outcome (PRO) measures assessing symptoms and effects of adult ADHD.
Therefore, the purpose of this study was to examine the test–retest reliability of two previously developed ADHD-specific measures. Reliability is the degree to which an instrument is free of measurement error (Scientific Advisory Committee of the Medical Outcomes Trust, 2002). Test–retest reliability is the extent to which scores on a given instrument remain stable over time (Leidy, Revicki, & Geneste, 1999). Assessment of test–retest reliability is considered to be an important step in PRO validation because it provides an indication of the instrument’s ability to consistently measure the construct of interest without excessive measurement error (Guyatt, Feeny, & Patrick, 1993; Leidy et al., 1999; Patrick et al., 2007; Scientific Advisory Committee of the Medical Outcomes Trust, 2002). The recently issued Food and Drug Administration Draft PRO Guidance has highlighted the importance of thoroughly validating PROs, including documentation of an instrument’s stability over time (Food & Drug Administration, 2006; Patrick et al., 2007).
The first ADHD-specific measure to be examined in this study is the Adult ADHD Quality-of-Life Measure (AAQoL), which was designed to assess the effect of adult ADHD on health-related quality of life (HRQL). The items were generated on the basis of clinical expert input, patient reports, and published literature indicating that adult ADHD has an effect on multiple domains of HRQL, including work, daily activities, relationships, psychological well-being, and physical well-being (Biederman, Faraone et al., 2006; Brod, Perwien, Adler, Spencer, & Johnston, 2005; Eakin et al., 2004; Friedman et al., 2003; Kessler, Adler, Ames et al., 2005). The AAQoL has demonstrated good factor structure, internal consistency reliability, construct validity, discriminant validity, and responsiveness to change (Brod, Johnston, Able, & Swindle, 2006; Matza, Johnston, Faries, Malley, & Brod, 2007). Test–retest reliability of the AAQoL has not previously been assessed.
The second measure was the World Health Organization Adult ADHD Self-Report Scale (ASRS) Screener, which was designed to screen for ADHD in the general population (Kessler et al., 2007). In the initial validation of this screener, test–retest reliability was examined with Pearson correlations comparing assessments at three time points over a four-month period (Kessler et al., 2007). However, it is generally recommended that test–retest reliability be examined over a shorter time window using intraclass correlations (ICCs; Leidy et al., 1999). Thus, test–retest reliability of the ASRS Screener and AAQoL was assessed in the current study by comparing scores over a two-week period with ICCs.
Method
Study Design
A sample of adult patients with ADHD was recruited at a U.S. clinic specializing in adult ADHD. Participants attended two study visits, 14 (±2) days apart. To assess test–retest reliability, respondents’ symptoms and treatment must remain stable between questionnaire administrations so that score changes primarily represent measurement error rather than true change in patients’ condition (Leidy et al., 1999). Therefore, a measure of change was administered at the second study visit to assess symptom stability between the two visits, and test–retest reliability was assessed in the stable subgroup.
Participants were required to meet the DSM-IV (American Psychiatric Association, 2000) criteria for ADHD as confirmed by clinical chart review, to be at least 18 years old, to be able to read and understand English, and to be able to travel to the clinic for two scheduled visits. Potential participants were ineligible if they had a change in treatment for ADHD or another psychiatric disorder in the two weeks before study enrollment. Participants were also ineligible if they expected a change in treatment during the two weeks following their baseline visit.
Measures
AAQoL
The AAQoL was designed to assess HRQL during the past two weeks among adults with ADHD (Brod et al., 2005, 2006). Each of the 29 AAQoL items is rated by patients on a five-point Likert-type scale ranging from “Not at all/Never” (1) to “Extremely/Very Often” (5). The AAQoL yields a total score (based on all items) and four subscale scores: life productivity (11 items, including getting things done on time, completing projects or tasks, remembering important things, and balancing multiple projects), psychological health (six items, including feeling anxious, overwhelmed, and fatigued), life outlook (seven items, including perceptions that energy is well-spent, people enjoy spending time with you, and you can successfully manage your life), and relationships (five items, including tension, annoyance, and frustration in relationships). The complete list of items has been published previously (Brod et al., 2006).
Total and subscale scores are computed by (a) reversing scores for all items except the seven items in the life outlook subscale; (b) transforming all item scores to a 0-100-point scale (1 = 0, 2 = 25, 3 = 50, 4 = 75, 5 = 100), with higher scores indicating better QoL; (c) summing item scores; and (d) dividing by the item count to generate subscale and total scores. The total score is computed with up to three missing items, and each subscale score is computed with up to one missing item.
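The scoring steps above can be sketched in a few lines of Python. This is an illustrative sketch rather than the official scoring program: the function names and the (response, reverse flag) item representation are hypothetical, the item-to-subscale assignment is not reproduced here, and the sketch assumes that a score with permitted missing items is averaged over the answered items only.

```python
from typing import List, Optional, Tuple

# One item: (raw response 1-5, or None if missing; True if the item is reverse-scored).
# Per the text, every item except the seven life outlook items is reverse-scored.
Item = Tuple[Optional[int], bool]


def to_0_100(raw: int, reverse: bool) -> float:
    """Steps (a) and (b): optionally reverse a 1-5 response, then map it to 0-100."""
    if raw not in (1, 2, 3, 4, 5):
        raise ValueError(f"response must be an integer from 1 to 5, got {raw!r}")
    if reverse:
        raw = 6 - raw  # 1<->5, 2<->4, 3 unchanged
    return (raw - 1) * 25.0  # 1 = 0, 2 = 25, 3 = 50, 4 = 75, 5 = 100


def scale_score(items: List[Item], max_missing: int) -> Optional[float]:
    """Steps (c) and (d): sum transformed items and divide by the item count.

    Returns None when more than `max_missing` items are missing.
    Assumption: the divisor is the number of answered items.
    """
    answered = [(raw, rev) for raw, rev in items if raw is not None]
    n_missing = len(items) - len(answered)
    if n_missing > max_missing or not answered:
        return None
    return sum(to_0_100(raw, rev) for raw, rev in answered) / len(answered)


# Hypothetical responses to the five relationships items (one missing).
relationships = [(2, True), (4, True), (None, True), (3, True), (1, True)]
print(scale_score(relationships, max_missing=1))  # 62.5
```

Under these assumptions, a subscale would be scored with max_missing=1 and the total score with max_missing=3 over all 29 items.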
ASRS Screener
The original ASRS is a patient-reported questionnaire with 18 items assessing the frequency of all 18 DSM-IV symptoms of ADHD, with language modified to reflect the adult presentation of ADHD symptoms (Adler et al., 2006). The six-item ASRS Screener was derived from the original version based on examination of prediction accuracy and psychometric characteristics of various six-item subsets (Kessler et al., 2007). The screener was found to have strong concordance with clinician diagnoses and the ability to discriminate between DSM-IV cases and noncases. Two scoring algorithms were proposed (Kessler et al., 2007), and the current study used the scoring algorithm that takes advantage of the full range of the response options. Each of the six items is rated on a 0 to 4 scale (never, rarely, sometimes, often, and very often), and all responses are summed, yielding a summary score with a theoretical range of 0 to 24.
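The summed scoring algorithm used here can be illustrated as follows. The function and dictionary names are hypothetical, and the sketch assumes complete data because handling of missing ASRS Screener items is not described.

```python
from typing import Sequence

# Response options and their 0-4 values, in the order given in the text.
ASRS_RESPONSE_VALUES = {
    "never": 0,
    "rarely": 1,
    "sometimes": 2,
    "often": 3,
    "very often": 4,
}


def asrs_screener_score(responses: Sequence[str]) -> int:
    """Sum the six item responses, giving a score from 0 to 24."""
    if len(responses) != 6:
        raise ValueError("the ASRS Screener has exactly six items")
    return sum(ASRS_RESPONSE_VALUES[r.strip().lower()] for r in responses)


# Hypothetical respondent.
example = ["often", "sometimes", "rarely", "never", "very often", "sometimes"]
print(asrs_screener_score(example))  # 12
```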
Overall Treatment Effect Scale
The stability of participants’ ADHD symptoms between Visit 1 and Visit 2 was assessed using the Overall Treatment Effect Scale (OTE) at Visit 2 (Jaeschke, Singer, & Guyatt, 1989). The first question asks patients to indicate whether their symptoms have improved, remained the same, or worsened since Visit 1. If patients indicate that symptoms have improved, they rate the degree of improvement on a seven-point scale from “(1) Almost the same, hardly better at all” to “(7) A very great deal better.” If symptoms have worsened, they rate the degree of worsening on a seven-point scale from “(−1) Almost the same, hardly worse at all” to “(−7) A very great deal worse.” A score of 0 indicates no change in ADHD symptoms. Patients with OTE scores of −1, 0, or 1 were categorized as stable between Visit 1 and Visit 2. Patients with OTE scores less than −1 or greater than 1 were considered to be unstable.
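The stability rule can be written as a short classification function. This is a minimal sketch with hypothetical names; it assumes that the OTE responses have already been combined into a single signed score from −7 to 7.

```python
from typing import Iterable, List


def ote_stability(score: int) -> str:
    """Classify a signed OTE score (-7..7) as 'stable' or 'unstable'."""
    if not -7 <= score <= 7:
        raise ValueError(f"OTE score must be between -7 and 7, got {score!r}")
    # Scores of -1, 0, or 1 count as stable; anything further from 0 is unstable.
    return "stable" if -1 <= score <= 1 else "unstable"


def stable_subgroup(scores: Iterable[int]) -> List[int]:
    """Return the positions of respondents who meet the stability criterion."""
    return [i for i, s in enumerate(scores) if ote_stability(s) == "stable"]


# Hypothetical OTE scores for five respondents.
print(stable_subgroup([0, 3, -1, -4, 1]))  # [0, 2, 4]
```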
Statistical Analysis
Descriptive statistics were computed using data from the total enrolled sample, stable participants, unstable participants, and the participants who did not attend a second visit. The test–retest analyses were conducted only with the stable subgroup. To assess test–retest reliability of the AAQoL and the ASRS Screener, intraclass correlation coefficients (ICCs) and Spearman’s correlations were computed to evaluate the degree of association between scores at Visit 1 and Visit 2. Calculation of the ICC assumed a fixed-effects ANOVA model (Deyo, Diehr, & Patrick, 1991). ICCs above approximately .60 in stable patients over a two-week interval are generally considered acceptable (Leidy et al., 1999). In addition, paired t tests were used to evaluate whether there were statistically significant score changes between Visit 1 and Visit 2. All tests of statistical significance were two-tailed and conducted with an alpha level of .05.
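For readers unfamiliar with ANOVA-based ICCs, the following sketch shows one common form: the two-way consistency ICC with visits treated as a fixed effect, often labeled ICC(3,1). The exact formula used in this study is not reproduced here, so the choice of this form, along with the function name and the example data, is an assumption made for illustration only.

```python
from typing import Sequence


def icc_two_way_fixed(visit1: Sequence[float], visit2: Sequence[float]) -> float:
    """Consistency ICC from a two-way ANOVA with subjects x visits (k = 2).

    ICC = (MS_subjects - MS_error) / (MS_subjects + (k - 1) * MS_error)
    """
    if len(visit1) != len(visit2) or len(visit1) < 2:
        raise ValueError("need paired scores from at least two respondents")
    n, k = len(visit1), 2
    rows = list(zip(visit1, visit2))
    grand = (sum(visit1) + sum(visit2)) / (n * k)
    subject_means = [(a + b) / k for a, b in rows]
    visit_means = [sum(visit1) / n, sum(visit2) / n]

    ss_subjects = k * sum((m - grand) ** 2 for m in subject_means)
    ss_visits = n * sum((m - grand) ** 2 for m in visit_means)
    ss_total = sum((x - grand) ** 2 for row in rows for x in row)
    ss_error = ss_total - ss_subjects - ss_visits

    ms_subjects = ss_subjects / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    denominator = ms_subjects + (k - 1) * ms_error
    if denominator == 0:
        raise ValueError("scores have no variance; ICC is undefined")
    return (ms_subjects - ms_error) / denominator


# Hypothetical scores for four respondents at two visits.
print(round(icc_two_way_fixed([1, 2, 3, 4], [2, 1, 4, 3]), 3))  # 0.6
```

Because the visit effect is removed before the error term is computed, a uniform shift in scores between visits does not lower this coefficient, which is one reason mean changes are examined separately with paired t tests.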
Results
Sample Description
A total of 74 participants were enrolled in the study (Table 1). Of these, 65 (88%) returned for the second visit, and 9 (12%) did not. Of the 65 participants with two visits, 22 were excluded from the test–retest analysis because they reported that their symptoms were not stable between the two visits (14 improved and 8 worsened). Thus, there was a stable sample of 43 participants for inclusion in the test–retest analyses.
Table 1. Patient-Reported Demographic Characteristics
Note: Total enrolled sample, N = 74; stable participants, n = 43; unstable participants, n = 22; participants who did not attend Visit 2, n = 9. Patients with Overall Treatment Effect Scale (OTE) scores of −1, 0, or 1 were categorized as stable between Visit 1 and Visit 2. Patients with OTE scores less than −1 or greater than 1 were considered to be unstable.
The stable sample consisted of 65.1% men and was 90.7% White, with a mean age of 39.3 years. About half (51.2%) of the stable-sample participants were married, and the majority (58.1%) were employed full-time. Compared with the stable sample, the unstable participants were more likely to be women, less likely to be married, and more likely to be living alone. Comorbid depression was reported by 47.3% of participants, whereas 29.7% of the participants reported having “other mental health conditions.”
On the basis of chart review, participants in the total sample had been diagnosed an average of 8.5 years before study enrollment (Table 2). This duration of ADHD was similar among the stable and unstable participants. However, the participants who did not attend Visit 2 had been diagnosed a mean of 15.3 years before study enrollment. About two thirds (63.5%) of the participants were diagnosed with ADHD of the predominantly inattentive type, although the majority (55.6%) of those who did not attend Visit 2 were diagnosed with ADHD of the combined type (i.e., inattentive and hyperactive/impulsive). The most common comorbid psychiatric diagnosis in patients’ charts was depression (36.5% of the total sample), followed by anxiety disorders (9.5%) and bipolar disorder (8.1%). Almost all participants (95.9% of the total sample) were currently receiving medication for the treatment of ADHD, and about half (51.4%) were also receiving medication for other psychiatric conditions.
Table 2. Clinical Characteristics Based on Chart Review
See notes to Table 1.
Test–Retest Reliability of the AAQoL
Mean AAQoL subscale scores for the stable sample ranged from 63.0 to 67.0 at Visit 1 and 66.0 to 68.0 at Visit 2, with changes in the subscale scores ranging from 0.9 to 3.2 (Table 3). The mean AAQoL total score was 65.0 at Visit 1 and 66.5 at Visit 2. The paired t tests found no statistically significant differences between Visit 1 and Visit 2 scores. The ICCs for the AAQoL were .88 (life productivity), .75 (psychological health), .74 (life outlook), .78 (relationships), and .86 (total score; Table 4). All five ICCs suggest strong agreement between Visit 1 and Visit 2 AAQoL scores. Spearman correlations were in a similarly high range, and all were statistically significant (p < .0001).
Table 3. Comparison of AAQoL and ASRS Scores at Visit 1 and Visit 2
Note: AAQoL = Adult ADHD Quality of Life Measure; ASRS = Adult ADHD Self-Report Screener.
Data in this table are based on the stable subgroup as determined by the Overall Treatment Effect Scale (OTE) at Visit 2 (n = 43).
Table 4. Correlations Between AAQoL and ASRS Scores at Visit 1 and Visit 2
Note: AAQoL = Adult ADHD Quality of Life Measure; ASRS = Adult ADHD Self-Report Screener.
Data in this table are based on the stable subgroup as determined by the Overall Treatment Effect Scale (OTE) at Visit 2 (n = 43).
Test–Retest Reliability of the ASRS Screener
The mean ASRS Screener score for the stable sample was 11.5 at Visit 1 and 11.0 at Visit 2, and this decrease of 0.5 points was not statistically significant (Table 3). The ICC for the ASRS Screener was .86, and the Spearman correlation was .87 (p < .0001; Table 4). Both coefficients suggest strong agreement between Visit 1 and Visit 2 scores.
Discussion
The AAQoL and ASRS Screener demonstrated good test–retest reliability, with ICCs exceeding generally accepted standards for association between scores at the two administrations (Leidy et al., 1999). Mean differences between AAQoL scores at Visit 1 and Visit 2 ranged from 0.9 to 3.2, which are unlikely to represent clinically meaningful change on these 100-point scales. ASRS Screener change was similarly small, with a nonsignificant difference of 0.5 on a 24-point scale.
Overall, findings add to previous results supporting the psychometric properties of these two instruments in samples of adults with ADHD (Brod et al., 2006; Kessler et al., 2007; Matza et al., 2007). The AAQoL has now met many of the standards for PRO instrument development as described in the final PRO Guidance, recently issued by the Food and Drug Administration (FDA, 2009). After items were generated on the basis of patient perceptions, clinical expert input, and published literature, empirical research has demonstrated internal consistency reliability, validity, sensitivity to change, and in the current study, test–retest reliability (Brod et al., 2005, 2006; Matza et al., 2007). Next steps in the analysis of this instrument may be to identify interpretation guidelines, including the minimally important difference, a responder definition for use in clinical trials, and score cutoffs for ranges of HRQL impairment.
Generalizability of the results may be limited by the small sample as well as its other characteristics. For example, 25.6% of the stable-group participants completed a postgraduate degree, suggesting that this particular sample is highly educated. The degree to which results would generalize to less educated patients is not known. Furthermore, data on psychiatric comorbidities were gathered through chart review rather than through diagnostic assessment. Some patients may have had comorbid psychiatric conditions not captured in charts. Therefore, it is not possible to provide a comprehensive listing of all psychiatric comorbidities.
The level of ADHD symptom severity could also limit generalizability of the findings. The stable sample’s mean ASRS Screener score is considered to be in the “low negative range,” which is indicative of relatively mild ADHD symptoms (Kessler et al., 2007). Furthermore, the mean AAQoL scores were similar to those of a clinical trial sample following eight weeks of treatment (Matza et al., 2007). Taken together, the ASRS and AAQoL scores suggest that current results demonstrate test–retest reliability in a sample of patients with relatively mild ADHD symptoms. It is likely that symptoms were mild in this group because almost all participants were receiving treatment at the time data were collected. It is also possible that more severe ADHD symptoms could have decreased the likelihood that patients would attend the second visit, thus excluding more severe patients from test–retest analyses. Furthermore, more severe ADHD symptoms may be less likely to remain stable between the two study visits. If so, patients with more severe symptoms would be less likely to meet criteria for inclusion in the stable sample. Therefore, the extent to which current results are generalizable across a broad spectrum of patients with ADHD is not known. Future research conducted with samples of recently diagnosed yet untreated patients could assess psychometric characteristics of the AAQoL and ASRS Screener in patients with more severe ADHD symptoms.
Despite limitations, findings provide additional support for these potentially useful PRO measures. The ASRS Screener may be used as a brief index of ADHD symptoms to identify patients for whom further assessment is warranted. The AAQoL may be useful in both clinical practice and research. Clinicians could use this instrument to monitor the effect of treatment on various domains of HRQL. In addition, the AAQoL could be added to symptom measures in clinical trials of treatment for adult ADHD to provide a broader picture of treatment outcomes.
Footnotes
Acknowledgements
The authors thank Julie Meilak and Aria Gray for production assistance and Ray Hsieh for statistical programming. This study was funded by Eli Lilly and Company.
Funding for this study was provided by Eli Lilly and Company.
The authors declared no conflicts of interest with respect to the authorship and/or publication of this article.
