Abstract

Andrew Page, University of Western Australia, Geoff Hooke, Perth Clinic, Elizabeth Rutherford, University of Western Australia, Australia:
The following response by Andrew Page, Geoff Hooke and Elizabeth Rutherford to Simon Stafrace (‘Doubts about HoNOS’ 32:270) should have accompanied his letter in the same issue.
We appreciate Dr Stafrace's comments on our article [1] and that we have been given the opportunity to respond, as the letter raises some important issues.
First, Dr Stafrace infers from the low response rate that outcome measures acceptable to clinicians may not be feasible for routine use. We agree that acceptability and feasibility are not necessarily related, but would argue that the response rate is ultimately a management issue. Psychiatric facilities must measure and be able to report outcomes; low response rates are therefore a challenge for management, not an excuse to conclude that measuring outcomes is infeasible. For instance, at the Perth Clinic, our data drew management's attention to the low response rates and indicated that the provision of training was insufficient to ensure high response rates. Hospital procedures were therefore reviewed and are in the process of ongoing modification to increase compliance. Interestingly, compliance with the HoNOS has increased dramatically to nearly 90% of all patients at admission and discharge (apparently because this is a staff-administered measure), whereas compliance with the SF-36 has increased to nearly 70% of all patients (since this relies on patients to complete questionnaires), possibly reflecting the degree to which management practices impact more immediately and strongly upon staff than upon patients. Thus, we would argue that the HoNOS and SF-36 are acceptable and feasible, provided the hospital structure supports the implementation of such measures.
Second, Dr Stafrace states that we cannot conclude that the tests are reliable and valid due to the absence of test–retest and inter-rater reliability data. While it is true that we did not report these data and we cannot make an absolute claim about the reliability and validity of the instruments, it would be false to assume that our data do not speak favourably to these issues. Our measure of reliability (i.e. Cronbach's alpha) is an index of internal consistency, indicating that the items in each test are measuring the same construct. Likewise, comparisons with other instruments are not sufficient to demonstrate construct validity, but we reiterate the claim in our article (p. 379) that these correlations provide partial support for construct validity. Our data converge with other published data to provide some support for these instruments.
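For readers less familiar with the index, the following is a minimal sketch of how Cronbach's alpha is computed from item-level scores. The function name and the example ratings are hypothetical and are not data from our study; the formula is the standard one, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores).

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    Hypothetical helper for illustration; not the code used in the article.
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Transpose so that each tuple holds one item's scores across respondents.
    items = list(zip(*scores))
    item_variance_sum = sum(variance(item) for item in items)
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Made-up ratings: four respondents, three items.
example = [[1, 2, 1], [2, 2, 3], [3, 4, 3], [4, 4, 4]]
print(round(cronbach_alpha(example), 3))
```

Alpha approaches 1 as the items covary more strongly, which is why it speaks to internal consistency but not to test–retest or inter-rater reliability.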
Third, Dr Stafrace comments that we were unable to conclude that the sample was comparable to other hospitals because we did not provide a statistical test. While it is not usual practice in benchmarking to use statistical tests, one way to achieve this end is to place a 95% confidence interval around means. Doing so would support the conclusion that the present sample was no different at admission from the samples at the six private psychiatric hospitals [2] or the private psychiatric hospital reported by Boot et al. [3], but was less severe than the public psychiatric patients; however, the patients at discharge were significantly healthier than those at any of the comparison hospitals. We chose not to report such statistical comparisons because we were concerned that, with large sample sizes, statistically significant differences may not accurately reflect outcomes that are more similar than different in terms of their clinical significance. Such conclusions are better based on studies (e.g. [3]) in which data from a number of sites are collected simultaneously by a common research team under comparable conditions.
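The confidence-interval approach mentioned above can be sketched as follows. This is an illustration only: it assumes a normal approximation (reasonable for large samples), the function names are hypothetical, and the means, standard deviations and sample sizes are invented rather than taken from any of the cited studies. Checking whether two intervals overlap is a rough benchmarking heuristic, not a formal significance test.

```python
from math import sqrt
from statistics import NormalDist


def mean_ci(mean, sd, n, level=0.95):
    """Normal-approximation confidence interval around a sample mean."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    half_width = z * sd / sqrt(n)
    return mean - half_width, mean + half_width


def intervals_overlap(a, b):
    """True if two (low, high) intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Invented summary statistics for two hospitals (not real data).
site_a = mean_ci(mean=12.0, sd=5.0, n=400)
site_b = mean_ci(mean=12.6, sd=5.5, n=350)
print(site_a, site_b, intervals_overlap(site_a, site_b))
```

Because the half-width shrinks with the square root of the sample size, very large samples yield narrow intervals, so even clinically trivial differences between sites can appear statistically distinct.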
Finally, Dr Stafrace suggests that the fewer significant differences for the SF-36 relative to the HoNOS cannot be taken as evidence for a lack of sensitivity without examining significant change with other measures. Two points need to be made here. First, although we did report p-values from a repeated measures ANOVA, we also reported effect size measures. The reason for reporting effect sizes is that they are not dependent upon sample size and therefore provide a better estimate of the sensitivity of the measures. An examination of these indices clearly shows the HoNOS to be more sensitive to change than the SF-36. Second, we did report changes with five other measures (reported in Table 1), and the effect sizes of the symptom measures are comparable with those of the HoNOS and higher than those of the SF-36. It is possible that the SF-36 measures something stable that is different from all these other measures, but if that is so, it is not going to be a good measure of psychiatric outcome. Nevertheless, we must reiterate that the SF-36 used in the study was a 4-week version and greater effect sizes may be evident when a shorter window is used.
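To illustrate why an effect size is less sensitive to sample size than a p-value, here is a minimal sketch of one common index, the mean admission-to-discharge change standardized by the admission standard deviation. We stress that this is an assumption chosen for illustration and is not necessarily the index reported in the article; the function name and scores are hypothetical.

```python
from statistics import mean, stdev


def standardized_mean_change(pre, post):
    """(mean(post) - mean(pre)) / sd(pre).

    The sign depends on the direction of the scale: negative values mean
    scores fell from admission to discharge.
    """
    return (mean(post) - mean(pre)) / stdev(pre)


# Invented admission and discharge scores for four patients.
pre = [10, 12, 14, 16]
post = [6, 8, 10, 12]
small = standardized_mean_change(pre, post)

# Repeating the same scores 50 times barely moves the index, whereas a
# p-value from the same comparison would shrink markedly.
large = standardized_mean_change(pre * 50, post * 50)
print(round(small, 3), round(large, 3))
```

The two values differ only because the sample standard deviation uses an n - 1 denominator; the magnitude of change they describe is essentially the same.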
