Abstract
Background:
Preference-based measures of health-related quality of life (HRQL) are used as primary or secondary endpoints in multiple sclerosis (MS) research.
Objective:
The purpose of this paper was to evaluate the structural, convergent, and known-groups validity of the preference-based multiple sclerosis index (PBMSI) of HRQL in people with MS.
Methods:
Participants were recruited from three MS clinics in Montreal. Structural validity was assessed using polychoric correlation coefficients and factor analysis. To assess convergent validity, hypotheses were formulated about the strength of correlations between the PBMSI and other HRQL measures. Known-groups validity was assessed against different measures of disability.
Results:
The average age of the sample was 46 and 77% were women. Factor analysis supported the structural validity of the PBMSI; the items collectively were measuring one underlying construct. The PBMSI showed convergent validity against generic measures of HRQL, and known-groups validity between persons with different levels of disability.
Conclusion:
The results of this study support the construct validity of the PBMSI as an outcome measure of HRQL in MS. The PBMSI overcomes limitations observed with currently used HRQL measures in MS and may be used to contrast different interventions for people with MS.
Introduction
Health-related quality of life (HRQL) refers to the health aspects of quality of life, reflecting the impact of disease and treatment on disability and daily functioning. 1 This important construct is often used as a primary or secondary endpoint in multiple sclerosis (MS) research to evaluate the effectiveness of existing and new therapies from the patient’s perspective. 2
An established approach to measuring HRQL is through the use of preference-based measures 3 such as the EuroQol-5 dimensions (EQ-5D), 4 the Health Utilities Index 2 and 3 (HUI 2 and 3), 5,6 and the Short Form-6 dimension (SF-6D). 7 A disease-specific preference-based measure for people with MS, the preference-based multiple sclerosis index (PBMSI), was recently developed. 8 The domains for the PBMSI were established based on semi-structured interviews with a random sample of 185 people with MS recruited from three different MS clinics. 9 Individual items best reflecting each domain of quality of life were identified using Rasch analysis. 10 As per the US Food and Drug Administration (FDA) guidelines, the items then underwent qualitative review using both expert (n = 24) and patient (n = 22) feedback. 11 Patient preferences were then elicited for the PBMSI items using the rating scale method, and a scoring algorithm was developed. 8
The PBMSI comprises five items that patients with MS identified to be most important to their quality of life: walking, fatigue, concentration, mood, and roles and responsibilities. Each item includes three response levels, producing 243 (3^5) combinations of responses. The PBMSI items have been previously published. 11
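The count of 243 follows from five items with three levels each. A minimal Python sketch of the enumeration is given here, assuming levels are coded 1 to 3 and each health state is written as a five-digit string; the item labels and their order are illustrative, not the official PBMSI wording:

```python
from itertools import product

# Illustrative item labels (names and order are assumptions, not official wording).
ITEMS = ["walking", "fatigue", "concentration", "mood", "roles"]
LEVELS = (1, 2, 3)  # assumed coding: 1 = best level, 3 = worst level

# Each health state is one choice of level per item, e.g. "11111" or "21311".
states = ["".join(str(level) for level in combo)
          for combo in product(LEVELS, repeat=len(ITEMS))]

print(len(states))  # 243
print(states[0])    # 11111
print(states[-1])   # 33333
```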
The next step is to evaluate the construct validity of the PBMSI in the population for which it was developed (i.e. people with MS). Construct validity refers to the extent to which scores of a measure are consistent with hypotheses formulated regarding internal relationships (structural validity), relationships with other measures (convergent validity), and differences between groups (known-groups validity). 12
Therefore, the objective of this paper was to evaluate the structural, convergent, and known-groups validity of the PBMSI in people with MS.
Methods
Setting and procedure
The study sample comprised people with MS participating in a randomized clinical trial of exercise. The protocol for this study has been published. 13 Participants were recruited from three MS clinics in the Montreal area and were aged 19–65, diagnosed after 1994, ambulatory, and able to speak and read English or French. Participants were excluded if they had an additional illness that restricted their function, had suffered at least one relapse during the past 30 days, or were unable to respond to simple questions on orientation and memory.
The study was approved by the hospital’s ethics committee and written informed consent was obtained from participants prior to study commencement.
Measures
Measures of HRQL
PBMSI
The PBMSI is a brief patient-reported outcome measure of HRQL. It consists of five items: walking, fatigue, mood, concentration, and roles and responsibilities. 11 Each item has three response options, and the recall time frame is ‘over the past week’. The PBMSI scoring algorithm was developed using a multiplicative multi-attribute utility function. 14 The algorithm provides a score of HRQL from 0 (dead) to 1 (perfect health).
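For orientation, the general multiplicative multi-attribute utility form can be written as below. This is the textbook form and not the published PBMSI scoring algorithm: the symbols u_i (single-item utility), k_i (item weight), and k (scaling constant) are generic notation assumed here, and no PBMSI weights are implied.

```latex
% General multiplicative multi-attribute utility function for n = 5 items.
% u_i(x_i): utility of the level x_i chosen on item i, scaled from 0 to 1
% k_i: weight of item i;  k: scaling constant
u(x_1,\dots,x_5) = \frac{1}{k}\left[\prod_{i=1}^{5}\bigl(1 + k\,k_i\,u_i(x_i)\bigr) - 1\right],
\qquad \text{where } k \neq 0 \text{ solves } \; 1 + k = \prod_{i=1}^{5}\bigl(1 + k\,k_i\bigr).
```

When the weights k_i sum to exactly 1, the scaling constant is 0 and the function reduces to the additive form, the weighted sum of the single-item utilities.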
RAND-36
The RAND-36 Item Health Survey (RAND-36) is a generic health profile that consists of two summary scales: a physical component summary (PCS) and a mental component summary (MCS). The RAND-36 was included as a comparison measure because a published systematic review revealed that the RAND-36 was the most widely used generic health profile in MS. 2 Furthermore, it has demonstrated good internal consistency and convergent and discriminant validity in MS. 15,16
SF-6D
The SF-6D is a generic preference-based measure derived from the SF-36 Health Survey (or RAND-36). 7 The SF-6D has six domains: physical functioning, role limitation, social functioning, pain, mental health, and vitality. The SF-6D has demonstrated moderate to strong correlations against other measures of participation and HRQL in MS 17 and is part of the set of preference-based measures recommended by the Canadian Agency for Drugs and Technologies in Health (CADTH).
EQ-5D
The EQ-5D is a generic preference-based measure of HRQL that consists of five items or domains: mobility, self-care, usual activities, pain/discomfort, and anxiety/depression.4,18–20 Each domain has three levels: no problems, some problems, and extreme problems. The EQ-5D has demonstrated small to moderate correlations against other measures evaluating activity, participation, and HRQL in MS. 17 The EQ-5D is a recommended preference-based measure for economic evaluation by CADTH and the National Institute for Health and Care Excellence (NICE).
The patient-generated index
The patient-generated index (PGI) is an individualized measure of quality of life. 21 Participants were asked to identify up to five of the most important areas of their lives affected by MS. The PGI produces a score from 0 to 100. The PGI has demonstrated moderate correlations with the generic preference-based measures EQ-5D and SF-6D in MS. 17
Self-rated health
Self-rated health is an individual’s rating of health based on his or her own perception, experience, and frame of reference. 1 Participants were asked to rate their health state today and over the past week using a visual analogue scale (VAS) from 0 to 100, where 0 is the worst imaginable health state and 100 is the best imaginable health state. These two scales are labeled as VAS-today and VAS-week, respectively.
MS symptom checklist
Subjects were provided with a symptom list and asked to identify if they had experienced any of the symptoms over the past week. Symptoms listed were as follows: loss of co-ordination, weakness in the lower extremities, unsteadiness or loss of balance, problems with bladder, muscle stiffness or spasms, pain, feeling frustrated, and problems with sleep.
Measures of disability
6-minute walk test
The 6-minute walk test (6-MWT) is a simple performance-based test that measures functional exercise capacity. Individuals are instructed to walk as far as possible in 6 minutes along an empty corridor at their own intensity. The 6-MWT has demonstrated excellent intra- and inter-rater reliability in MS. 22
Patient-determined disease steps
The patient-determined disease steps (PDDS) is a self-reported outcome of disability in MS. 23 It has nine ordinal levels ranging from 0 (normal) to 8 (bedridden), and PDDS scores can be converted into classifications of mild, moderate, or severe disability. The PDDS is a surrogate measure of the Expanded Disability Status Scale (EDSS) and has been shown to be strongly correlated with the EDSS. 23
Peak power
Peak power was measured using an incremental graded cycle ergometer test. All persons started the test at a minimal power output of 10 W with a gradual increase of power output by 10 W per minute. The measure of performance was the highest power output individuals were able to complete. Peak power output was included as a disability measure as it has been shown to be directly associated with peak exercise capacity (VO2peak). 24 Individuals who are able to achieve higher work load have better lung function and greater breathing reserve. 25
Statistical analysis
Floor and ceiling effects
Floor and ceiling effects were calculated for the PBMSI, EQ-5D, SF-6D, and self-rated health. The percentage of respondents who had minimum and maximum scores on the measures was calculated. Values >15% were indicative of a floor or ceiling effect. 26
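The calculation described above can be sketched in a few lines of Python. The function name, its return structure, and the example scores are illustrative assumptions; the example values are made up and are not study data.

```python
def floor_ceiling(scores, min_score, max_score, threshold=15.0):
    """Return the percentage of respondents at the minimum and maximum score,
    with flags for whether each percentage exceeds the threshold (in percent)."""
    n = len(scores)
    if n == 0:
        raise ValueError("scores must not be empty")
    pct_floor = 100.0 * sum(1 for s in scores if s == min_score) / n
    pct_ceiling = 100.0 * sum(1 for s in scores if s == max_score) / n
    return {
        "pct_floor": pct_floor,
        "pct_ceiling": pct_ceiling,
        "floor_effect": pct_floor > threshold,      # values >15% indicate an effect
        "ceiling_effect": pct_ceiling > threshold,
    }

# Made-up scores for illustration only: 2 of 10 respondents at the maximum.
example = [1.0, 1.0, 0.8, 0.7, 0.6, 0.5, 0.5, 0.4, 0.3, 0.2]
result = floor_ceiling(example, min_score=0.0, max_score=1.0)
print(result["pct_ceiling"], result["ceiling_effect"])  # 20.0 True
```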
Structural validity
Preference-based measures developed using multi-attribute utility theory, like the PBMSI, should comprise items that are independent or semi-independent of each other. 14 Therefore, polychoric correlation coefficients were calculated between the PBMSI items to assess structural independence. Polychoric correlations extend tetrachoric correlations to ordinal variables with three or more categories (each PBMSI item has three response levels). 1 We hypothesized that the correlation coefficients between items would be low to moderate.
Structural validity was further assessed using exploratory common factor analysis. The Kaiser–Guttman rule, 27 which states that all factors having an eigenvalue greater than 1 should be retained, was used to identify the number of factors in the PBMSI. We hypothesized that one factor would be retained from the analysis representing the construct of HRQL.
Convergent validity
To demonstrate convergent validity, hypotheses were formulated about the strength of correlations between the PBMSI and other HRQL measures. A correlation ⩽0.30 was considered small, between 0.31 and 0.59 moderate, and ⩾0.60 as strong.28,29
We anticipated low to moderate correlations between the PBMSI and measures of disability (PDDS, 6-MWT, and peak power). We expected to observe moderate correlations between the PBMSI and the generic HRQL measures (RAND-36, EQ-5D, SF-6D, VAS-today, and VAS-week) as some items would be similar while others different (specific to MS). Furthermore, we expected to observe moderate correlations with the individualized measure (PGI), as the PGI measures the broader construct of quality of life.
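The correlation cut-offs given above can be expressed as a small helper. This is a sketch under two assumptions not stated in the text: the absolute value of the coefficient is used, and values falling between the stated bands (for example, between 0.30 and 0.31) are treated as moderate.

```python
def correlation_strength(r):
    """Label a correlation coefficient as small, moderate, or strong.

    Cut-offs: <=0.30 small, 0.31-0.59 moderate, >=0.60 strong.
    Assumption: the sign is ignored, so -0.45 is labeled like 0.45.
    """
    magnitude = abs(r)
    if magnitude >= 0.60:
        return "strong"
    if magnitude > 0.30:
        return "moderate"
    return "small"

for r in (0.25, 0.45, 0.60):
    print(r, correlation_strength(r))
```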
Known-groups validity
Known-groups validity for the PBMSI was assessed against different measures of disability, namely, the PDDS, the 6-MWT, and peak power. For the PDDS, individuals with scores between 0 and 2 were classified as having mild disability, scores of 3 and 4 as moderate disability, and scores of 5 and greater as severe disability. These cut-offs were based on individuals’ level of walking impairment as described by the PDDS descriptive system. Individuals in levels 1 and 2 had mild symptoms but no limitations in walking ability. Individuals in levels 3 and 4 had some limitations in walking and needed to use a cane occasionally, and individuals in levels 5 and 6 needed constant use of a cane for walking or a scooter for long distances. For the 6-MWT, individuals who were able to walk more than 500 m 30,31 were categorized as having high functional walking capacity, those who walked between 300 and 499 m as moderate, and those who walked less than 300 m as poor. For peak power (in watts) measured using the cycle ergometer test, power output between 140 and 240 W was classified as high, 60–139 W as moderate, and less than 60 W as poor. We hypothesized that individuals with higher levels of disability would have lower scores on the PBMSI than those with lower levels of disability. The ability of the EQ-5D and SF-6D to discriminate between different levels of disability was also assessed and compared with that of the PBMSI.
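The three grouping rules can be sketched as follows. The function names are illustrative, and two boundary choices are assumptions because the text leaves them unassigned: distances from 499 m up to and including 500 m are treated as moderate, and power outputs above 240 W are treated as high.

```python
def pdds_group(score):
    """PDDS: 0-2 mild, 3-4 moderate, 5 or greater severe."""
    if not 0 <= score <= 8:
        raise ValueError("PDDS scores range from 0 to 8")
    if score <= 2:
        return "mild"
    if score <= 4:
        return "moderate"
    return "severe"

def walk_group(distance_m):
    """6-MWT: more than 500 m high, 300-499 m moderate, under 300 m poor.
    Assumption: 499 m up to and including 500 m counts as moderate."""
    if distance_m > 500:
        return "high"
    if distance_m >= 300:
        return "moderate"
    return "poor"

def power_group(watts):
    """Peak power: 140-240 W high, 60-139 W moderate, under 60 W poor.
    Assumption: outputs above 240 W count as high."""
    if watts >= 140:
        return "high"
    if watts >= 60:
        return "moderate"
    return "poor"

print(pdds_group(3), walk_group(420), power_group(50))  # moderate moderate poor
```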
Known-groups validity was also assessed against the presence/absence of MS symptoms: specifically, loss of co-ordination, weakness in the lower extremities, unsteadiness or loss of balance, problems with bladder, muscle stiffness or spasms, pain, feeling frustrated, and problems with sleep. We hypothesized that PBMSI scores would be lower in individuals who reported experiencing a symptom than those who did not. For example, we expected people who experienced unsteadiness or loss of balance to have lower scores on the PBMSI than individuals who did not experience this symptom.
Statistically significant differences between known groups were assessed using independent t-tests for dichotomous variables and analysis of variance (ANOVA) for variables with more than two categories. Effect sizes (ESs) and 95% confidence intervals (CIs) 32 were calculated to determine the magnitude of difference between the known groups. Cohen’s criteria 33 were used for interpreting the magnitude of an ES, where an ES of ~0.2 is small, ~0.5 is moderate, and ~0.8 is large. An ES was considered statistically significant if its CI excluded 0.
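One common way to obtain an ES with a 95% CI is sketched below. It assumes a standardized mean difference (Cohen's d with a pooled standard deviation) and a large-sample normal approximation for the standard error; the method in the cited reference may differ. The function name and the example data are illustrative and are not study data.

```python
from math import sqrt
from statistics import mean, stdev

def cohens_d_with_ci(group_a, group_b, z=1.96):
    """Return (d, lower, upper, significant) for two independent groups.

    d uses the pooled standard deviation. The CI uses an approximate
    standard error (an assumption, not necessarily the study's method).
    """
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two observations")
    var_a, var_b = stdev(group_a) ** 2, stdev(group_b) ** 2
    pooled_sd = sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    if pooled_sd == 0:
        raise ValueError("pooled standard deviation is zero")
    d = (mean(group_a) - mean(group_b)) / pooled_sd
    se = sqrt((n_a + n_b) / (n_a * n_b) + d ** 2 / (2 * (n_a + n_b)))
    lower, upper = d - z * se, d + z * se
    significant = lower > 0 or upper < 0  # the CI excludes 0
    return d, lower, upper, significant

# Made-up scores for people without and with a symptom (illustration only).
without_symptom = [0.9, 0.8, 0.7, 0.8, 0.6]
with_symptom = [0.5, 0.6, 0.4, 0.7, 0.3]
d, lower, upper, significant = cohens_d_with_ci(without_symptom, with_symptom)
print(round(d, 2), round(lower, 2), round(upper, 2), significant)
```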
Results
Sample
Table 1 presents the demographic and clinical characteristics for women and men. The average age for women and men was similar at 46 and 47 years, respectively. The average number of years since diagnosis was 6.4 for women and 9.6 for men. As for the PDDS, approximately the same proportion of men and women (30%–34%) reported having minor MS symptoms. In total, 41% of women and 26% of men reported having limitations in daily activities or physically demanding activities.
Characteristics of study sample (n = 113).
SD: standard deviation.
Total sample size = 110 for Education, 52 for Expanded Disability Status Scale, 96 for patient-determined disease steps, and 102 for general health perception.
Structural validity
Table 2 presents the correlations between the items, which ranged from −0.05 to 0.60, thus supporting our a priori hypothesis that the items were low to moderately correlated with each other.
Correlation matrix of PBMSI items.
PBMSI: preference-based multiple sclerosis index.
The first factor had an eigenvalue greater than 1 (= 1.19) and was the only one retained. The remaining eigenvalues were 0.28, −0.11, −0.20, and −0.21.
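Applying the Kaiser–Guttman rule to the eigenvalues reported above is a simple filter, sketched here in Python. The function name is illustrative, and the eigenvalues are taken from the text as given rather than recomputed from data.

```python
def kaiser_guttman(eigenvalues, cutoff=1.0):
    """Return the eigenvalues retained under the Kaiser-Guttman rule (> cutoff)."""
    return [ev for ev in eigenvalues if ev > cutoff]

# Eigenvalues as reported in the text.
reported = [1.19, 0.28, -0.11, -0.20, -0.21]
retained = kaiser_guttman(reported)
print(len(retained), retained)  # 1 [1.19]
```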
Table 3 presents the mean scores on the PBMSI, EQ-5D, and SF-6D, which were 0.25, 0.81, and 0.68, respectively. The mean values on the VAS for health state today and over the past week were similar, with the former being 74 and the latter being 70 out of 100. The PBMSI demonstrated no floor effects. As for ceiling effects, 10 individuals (9% of the sample) had a score of 1.0 on the PBMSI, which is below the cut-off of 15%. The EQ-5D had no floor effects but did demonstrate ceiling effects, as 19% of the sample had a score of 1.0. The SF-6D does not represent the full range of health (0–1), as its theoretical range is between 0.3 and 1.0. However, within this range, no individuals (0%) reported the lowest or highest possible score.
Percentage of respondents with minimum (floor effect) and maximum (ceiling effect) scores on the PBMSI, EQ-5D, SF-6D, PGI, and VAS.
EQ-5D: EuroQol-5 dimensions; PBMSI: preference-based multiple sclerosis index; SF-6D: Short Form-6 dimension; VAS: visual analogue scale.
For VAS-week n = 80 and VAS-today n = 103.
Table 4 presents the responses to the five items in the PBMSI and the EQ-5D. A person who responded as having no problems on the five items was classified as 11111. A person who reported having moderate problems on any one of the five items was classified as 21111 or 12111 or 11211. Fewer people reported having no problems (11111) on the five items for the PBMSI (9%) than for the EQ-5D (19%). More people reported having some problems (i.e. chose response option 2) and severe problems (i.e. chose response option 3) on the PBMSI than on the EQ-5D.
Reported health states on the PBMSI and EQ-5D.
PBMSI: preference-based multiple sclerosis index; EQ-5D: EuroQol-5 dimensions.
n = 111; **n = 102.
Convergent validity
Table 5 presents the correlation coefficients between the PBMSI and other measures of HRQL. The correlation between the PBMSI and the EQ-5D was 0.37, and that between the PBMSI and the SF-6D was 0.66. The association between the PBMSI and the PGI was low to moderate at r = 0.32 (p = 0.001). Also as expected, the correlations between the PBMSI and the physical tests were moderate.
Convergent validity: correlation (and p values) between the PBMSI, other measures of HRQL, self-rated health, disease severity, and functional capacity.
PBMSI: preference-based multiple sclerosis index; EQ-5D: EuroQol-5 dimensions; HRQL: health-related quality of life; PDDS: patient-determined disease step; SF-6D: Short Form-6 dimension; VAS: visual analogue scale; 6-MWT: 6-minute walk test; PGI: patient-generated index; PCS: physical component summary; MCS: mental component summary.
Furthermore, moderate correlations were observed between the PBMSI and the RAND-36 PCS (r = 0.40, p < 0.0001) and MCS (r = 0.42, p < 0.0001). Moderate correlations were also observed between the PBMSI and the VAS for health state today (r = 0.4, p < 0.0001), and slightly higher correlations between the PBMSI and the VAS for health state over the past week (r = 0.5, p < 0.0001).
Known-groups validity
Table 6 presents the known-groups validity results for the PBMSI, EQ-5D, and SF-6D against different levels of the PDDS, 6-MWT, and peak power.
Known-groups validity: PBMSI, EQ-5D, and SF-6D scores by disease severity, distance walked in 6 minutes, and peak power in watts.
PBMSI: preference-based multiple sclerosis index; CI: confidence interval; EQ-5D: EuroQol-5 dimensions; PDDS: patient-determined disease step; SD: standard deviation; SF-6D: Short Form-6 dimension; 6-MWT: 6-minute walk test.
p < 0.05 using analysis of variance (ANOVA) to discriminate across the three levels for the PBMSI and EQ-5D only.
For the PDDS, the PBMSI was able to differentiate between individuals with mild and moderate disability (ES = 0.57), as were the EQ-5D (ES = 0.67) and the SF-6D (ES = 0.36). The PBMSI was able to differentiate between people with moderate and severe disability (ES = 0.21), whereas the EQ-5D and SF-6D were not (ES = 0.09 and ES = –0.31, respectively).
For the 6-MWT, the PBMSI was able to differentiate between individuals with high and moderate functional capacity (ES = 0.55) and moderate and low functional capacity (ES = 0.46). The EQ-5D was also able to differentiate between the different levels of walking capacity, whereas the SF-6D was only able to differentiate between high and moderate. An ES of 0 was observed between moderate and low functional capacity on the SF-6D.
PBMSI scores decreased as peak power output decreased. The PBMSI was able to differentiate between high and moderate peak power better than the EQ-5D and SF-6D.
Figure 1 presents the results of the symptom checklist. The most prevalent symptoms were unsteadiness or loss of balance (47%), weakness in the legs (42%), and muscle stiffness or spasms (38%). Figure 2 shows that the PBMSI was able to differentiate between people who reported having a symptom and those who did not. The ES values ranged from moderate (ES = 0.41) to large (ES = 0.91). The ES values for the PBMSI were all statistically significant, as the 95% CIs excluded zero. ES values were larger for the PBMSI than for the EQ-5D and the SF-6D when differentiating between people with and without balance problems, weakness in legs, muscle stiffness, and feeling frustrated.

Figure 1. Results of the MS symptom checklist.

Figure 2. Known-groups validity of the PBMSI, EQ-5D and SF-6D in terms of ability to discriminate between people with and without MS symptoms.
Discussion
This study evaluated the construct validity of the PBMSI in the setting of a randomized controlled trial (RCT) in people with MS, where one-third of the sample reported minor symptoms and described themselves as being in excellent or very good health. The PBMSI showed good convergent validity against generic measures of HRQL, supporting the a priori hypothesized correlation values. The PBMSI was moderately correlated with the EQ-5D and strongly correlated with the SF-6D. According to the Kaiser–Guttman rule, only one factor was retained. In other words, factor analysis supported the structural validity of the PBMSI in that the items collectively measured one underlying construct (HRQL). Known-groups validity was demonstrated between persons with different levels of disability. The PBMSI discriminated better than the EQ-5D and the SF-6D for certain MS-specific symptoms, including balance, weakness in legs, muscle stiffness, and feeling frustrated.
The PBMSI in comparison to the EQ-5D and SF-6D
The PBMSI was moderately associated with the EQ-5D and strongly associated with the SF-6D. Higher correlations were observed between the PBMSI and SF-6D than between the PBMSI and EQ-5D probably because the SF-6D includes an item on fatigue (i.e. vitality) but the EQ-5D does not.
The PBMSI scores were on average lower than the EQ-5D and SF-6D scores. The sample had a mean HRQL score of 0.25 on the PBMSI, but mean scores of 0.81 and 0.68 on the EQ-5D and SF-6D, respectively. This may be because the items in the PBMSI were more relevant to, and more likely to be impacted by, MS than the items in the generic measures. As demonstrated in Table 4, fewer people reported having no problems on the PBMSI (n = 10) than on the EQ-5D (n = 19). Conversely, more people reported having moderate or severe problems on the PBMSI (n = 101) than on the EQ-5D (n = 83).
Another explanation for the discrepancy in scores may be that the preference weights for the PBMSI were obtained from people with MS, whereas the weights for the EQ-5D and SF-6D were obtained from the general public. It is interesting to note that the average score for the PGI, whose weighting system is based on patient preferences, was also lower than those of the generic preference-based measures. Furthermore, the methods used to obtain preference weights differed between the preference-based measures. The PBMSI scoring algorithm was based on the rating scale method, the EQ-5D on the time trade-off, and the SF-6D on the standard gamble, which may also account for the discrepancies observed in scores.
Bias and generalizability
A limitation of this study was that the sample consisted predominantly of individuals with mild to moderate disease severity. The inclusion criteria for the clinical trial required participants to be ambulatory; therefore, the highest PDDS scores observed were levels 5 or 6 (requiring support to walk 25 feet, wheelchair for greater distances), reported by only 6% of the sample. The other 94% were individuals between levels 1 (minor symptoms) and 4 (need of a cane or crutch). For this reason, we were unable to thoroughly assess the ability of the PBMSI to differentiate between moderate and severe MS disability.
Furthermore, the PBMSI was not compared to disease-specific health measures, such as the Multiple Sclerosis Impact Scale-29 (MSIS-29) 34,35 or the MSIS-8. 36,37 Future work should entail performing a direct comparison of the PBMSI against these measures.
Conclusion
Disease-specific preference-based measures have been developed for different conditions such as stroke, 38 cancer, 39 and asthma. 40 Disease-specific measures are designed to fill in the gaps in generic measures by tapping specific domains. The PBMSI is the first preference-based measure developed in MS using patient preferences.
In conclusion, the domains and items for the PBMSI were developed based on interviews with people with MS, ensuring that the measure had content validity in this clinical population. The results of this study support the structural validity of the PBMSI as an outcome measure of HRQL and its convergent validity against other measures of HRQL. Moreover, the PBMSI was able to differentiate between individuals with and without MS symptoms, as well as those with mild and moderate disability. However, validation is a continuous process that develops as a measure is applied in new situations and contexts. Therefore, future work with the PBMSI will need to involve assessing its longitudinal validity including responsiveness and sensitivity to change.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The randomized controlled trial was supported by the Canadian Institutes of Health Research (grant number 119282).
