Abstract
The need to fully assess upper limb function in multiple sclerosis (MS) has become increasingly clear with recent studies revealing a high prevalence of upper limb dysfunction in persons with MS leading to increased dependency and reduced quality of life. It is important that clinicians and researchers use tailored outcome measures to systematically describe upper limb (dys)function and evaluate potential deterioration or improvement on treatment. This topical review provides a comprehensive summary of currently used upper limb outcome measures in MS, classified according to the levels of the International Classification of Functioning (ICF). The clinical utility, strengths, weaknesses and psychometric properties of common upper limb outcome measures are discussed. Based on this information, recommendations for selecting appropriate upper limb outcome measures are given. The current shortcomings in assessment which need to be addressed are identified.
Introduction
The importance of addressing upper limb function in multiple sclerosis (MS) is increasingly recognised, with recent studies revealing a high percentage of persons with MS (PwMS) reporting upper limb dysfunction, even in the early stages of the disease. In a study by Holper et al., more than 50% of 205 PwMS (mean Expanded Disability Status Scale (EDSS)=3.5, SD=2.0) reported impairment or restriction related to upper limb function, 1 with the highest prevalence of upper limb disability found in the group with progressive disease. These high percentages of self-reported impairments and restrictions were confirmed by Johansson et al. in 219 PwMS (EDSS range=0–9.5) using an objective measure of manual dexterity, the Nine-Hole Peg Test (NHPT); 76% of the patients included had disability in their manual dexterity. 2 Kierkegaard et al. found manual dexterity (measured by the NHPT) to be an important predictor of overall activity and participation within the community (as assessed by the Katz Activities of Daily Living (ADL) index and Frenchay Activity Index). 3 Upper limb dysfunction in MS contributes to reduced ability to perform ADL, resulting in decreased independence and quality of life. 4 Therefore, it is important that clinicians and researchers pay attention to the progression of upper limb dysfunction in PwMS and consider management strategies.
In order to evaluate both disease progression and the effect of management strategies on the functioning of the upper limbs, tailored outcome measures are needed. In the 1990s, a task force appointed by the National Multiple Sclerosis Society’s Advisory Committee on Clinical Trials of New Agents in Multiple Sclerosis developed a new outcome measure in response to the shortcomings of the EDSS.5,6 This new outcome measure, the Multiple Sclerosis Functional Composite (MSFC), incorporated measures addressing three important domains: ambulation, cognition and upper limb function. Since then, the NHPT has been the most frequently used outcome measure to assess upper limb dysfunction in clinical trials. 7 However, the NHPT assesses only the ability to perform fine dextrous manual movements and does not evaluate other important aspects of upper limb function, such as in-hand manipulation of objects, movements in the proximal part of the upper limb (e.g. reaching, lifting), gross manual dexterity (e.g. grasping and handling of large objects) or complex coordinated bimanual tasks (e.g. eating with knife and fork) which are crucial to perform ADL. In this wider context, one must consider lesser known outcome measures that are so far only infrequently used, mainly restricted to MS rehabilitation studies. Most of these outcome measures were originally developed to assess upper limb dysfunction in other neurological diseases (e.g. stroke). There are several reviews published describing their clinical utility and psychometric properties in the setting of other diseases;8–11 however, it has to be verified whether these outcome measures are also appropriate to evaluate upper limb function in PwMS.
Upper limb assessment according to International Classification of Functioning (ICF) levels
The ICF is a well-established framework that can be used to classify outcome measures according to three levels: ‘body functions and structures’, ‘activity’ and ‘participation’. Figure 1 provides an overview of previously used upper limb outcome measures in PwMS classified according to the ICF levels. 7

Figure 1. Upper limb outcome measures classified according to International Classification of Functioning (ICF) levels.
Outcome measures on the ‘participation’ level assess restrictions an individual may have in involvement in life situations. 12 The Katz Personal and Instrumental ADL index, the Frenchay Activities Index and the Functional Independence Measure are frequently used outcome measures assessing restriction on ‘activity’ and ‘participation’ level.2,3,13–15 These outcome measures are all, however, strongly influenced by the person’s walking ability, their ability to independently perform a transfer and cognitive function. There is no outcome measure available which investigates only the impact of upper limb dysfunction on participation.
Outcome measures on ‘activity’ level are applied to understand the problems PwMS have in performing ADL related to the upper limb. Within this level, capacity and performance measures are differentiated. Measures of capacity, by definition, assess the maximal ability to execute a task or an action in a given domain at a given moment in a standardised environment, while measures of performance assess the person’s habitual performance of a task or action in his or her normal environment. 12
The most established upper limb capacity measure in MS is the NHPT. 7 However, other outcome measures such as the Box and Block Test (BBT) and the Action Research Arm Test (ARAT) are also frequently used in rehabilitation trials.7,16 Table 1 provides information about the clinical utility, strengths and weaknesses of seven common upper limb capacity outcome measures. It is apparent from this table that all capacity outcome measures include test items requiring grasp and/or manipulation function, which may be difficult to perform for PwMS with severe hand and upper limb dysfunction, possibly leading to a floor effect if the object cannot be grasped properly. The NHPT, BBT and Purdue Pegboard mainly assess the ability to repeatedly perform one particular type of grasp function as quickly as possible, while the ARAT, Jebsen Taylor Hand Function Test (JTT), Test d’Evaluation des Membres Supérieurs de Personnes Âgées (TEMPA) and Wolf Motor Function Test (WMFT) assess the ability to perform different ADL-like tasks requiring manipulation or transportation of standardised small and large objects using different grasp, grip and pinch functions. The score of the Purdue Pegboard Test is expressed in seconds while the score of the NHPT can be expressed in seconds or pegs per second to avoid floor effects in persons with more severe hand impairment. Items of the JTT are scored using only a time score while in both the TEMPA and WMFT, an ordinal functional rating score along with a time score is given for each test item. The ARAT uses an ordinal scale, which takes the time needed to perform the task into account when scoring the test items. Besides a total score of the ARAT, the subscores for the different grip functions and gross movements can also be reported separately. 17 A drawback of outcome measures applying an ordinal scale is the risk of ceiling effects in persons with only mild impairment given that the difficulty level of included test items may vary (e.g. ARAT18,19). Furthermore, it is notable that only three out of the seven capacity outcome measures include one or more items which are performed bilaterally. All our measures assessing upper limb capacity are thus, perhaps, of limited value given the finding of Kilbreath and Heard, 20 who reported that healthy subjects mainly use both upper limbs together to perform tasks in their daily life.
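As an illustration of why a rate score can avoid the floor effect mentioned above for the NHPT, the following minimal sketch converts a trial into pegs per second. It assumes the rate is the number of pegs placed divided by the elapsed time, and that an unfinished trial is scored over a fixed time limit. The function name and the `time_limit_s` default are hypothetical choices made for this example and are not taken from the studies reviewed here.

```python
NHPT_PEGS = 9  # the Nine-Hole Peg Test uses nine pegs


def nhpt_pegs_per_second(time_s, pegs_placed=NHPT_PEGS, time_limit_s=300.0):
    """Return an NHPT trial score as a rate in pegs per second.

    time_limit_s is a hypothetical cap applied to unfinished trials;
    it is an assumption of this sketch, not a value from the review.
    """
    if time_s <= 0 or time_limit_s <= 0:
        raise ValueError("times must be positive")
    if not 0 <= pegs_placed <= NHPT_PEGS:
        raise ValueError("pegs_placed must be between 0 and 9")
    # A completed trial is scored over its own completion time;
    # an unfinished trial is scored over the full time limit.
    elapsed = time_s if pegs_placed == NHPT_PEGS else time_limit_s
    return pegs_placed / elapsed


# A completed trial of 18 seconds gives 0.5 pegs per second.
print(nhpt_pegs_per_second(18.0))
# A person who places only 3 pegs within the limit still receives a score.
print(nhpt_pegs_per_second(300.0, pegs_placed=3))
```

With a time score alone, the second trial would have no defined completion time, whereas the rate remains defined down to zero pegs.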
Table 1. Description of upper limb capacity measures.
ADL: activities of daily living; TEMPA: Test d’Evaluation des Membres Supérieurs de Personnes Âgées. Test instructions, score forms and equipment specifications of the different outcome measures can be obtained through the references or online resources (e.g. www.rehabmeasures.org, www.strokengine.ca, www.cebp.nl).
Performance measures have been introduced relatively recently in MS and can be either subjective (perceived performance) or objective (actual performance). The ABILHAND,27–29 Disabilities of the Arm, Shoulder, and Hand Scale (DASH), 30 Manual Ability Measure-36 (MAM-36)31,32 and Motor Activity Log (MAL)18,33,34 are used to assess perceived performance, while accelerometers are used to assess actual performance. Description, strengths and weaknesses of the perceived performance measures are provided in Table 2. All questionnaires include ratings of unilateral and bilateral tasks, except the ABILHAND, which includes only bilateral tasks. The ABILHAND, DASH and MAM-36 assess the perceived ease/difficulty when performing ADL regardless of which upper limb is used, while the MAL assesses the ‘amount of upper limb use’ and ‘quality of upper limb use’ during ADL. For both the ABILHAND and MAM-36, a conversion table is available to obtain a Rasch-derived score, which is regarded as superior to a summed score or calculated scores as used in the MAL and DASH. 35
Table 2. Description of upper limb perceived performance measures.
ADL: activities of daily living; DASH: Disabilities of the Arm, Shoulder, and Hand Scale; MAM-36: Manual Ability Measure-36; MAL: Motor Activity Log. Test instructions and score forms of the different questionnaires can be obtained through the references or online resources (e.g. www.rehabmeasures.org, www.strokengine.ca, www.cebp.nl, www.dash.iwh.on.ca, www.rehab-scales.org).
The use of accelerometers to objectively assess upper limb performance in daily life, similar to step count for the lower limbs, is attractive but currently still has technical challenges (e.g. filtering out general body movements or wheelchair acceleration) dictating the need to interpret the results with caution. However, a reasonable correlation with upper limb capacity measures has been shown while differences between dominant and non-dominant hand can be captured. 18
Outcome measures on the ‘activity’ level provide information on the problems that PwMS are facing during their ADL. However, to optimise treatment strategies, one should also include outcome measures identifying the contributing impairments at the ‘body functions and structures’ level. Outcome measures on this level assess the physiological functions of anatomical parts of the body. 12 A large number of different outcome measures are available to assess impairments such as ataxia, tremor, muscle weakness, decreased active range of motion, sensory dysfunction and spasticity. The most frequently used outcome measure on this level is hand grip strength, followed by ataxia and tremor rating scales, the Motricity Index to assess general upper limb strength, the Modified Ashworth Scale to evaluate spasticity, and Semmes Weinstein Monofilaments and a tuning fork to assess sensory function. 7 The majority of these outcome measures are ordinal rating scales performed by physicians and therefore rely on the competence and consistency of the rater, which makes them less suitable for intervention trials.
Psychometric properties of measures at the ‘activity’ level
Knowledge of psychometric properties such as validity, reliability and responsiveness is crucial to select the most appropriate outcome measure for a given purpose. In this review, we focus on the psychometric properties of capacity measures on ‘activity’ level, which are presented in Table 3. The results of these psychometric studies should be interpreted with caution as these values are influenced by sample size, upper limb disability level of the included study sample and the statistics that have been performed.
Table 3. Psychometric properties of upper limb capacity measures.
TEMPA: Test d’Evaluation des Membres Supérieurs de Personnes Âgées; ICC: intraclass correlation coefficient; (95% confidence interval); ρ: Spearman correlation coefficient; r: Pearson correlation coefficient; SEM: standard error of measurement; CV: coefficient of variation; MIC: minimally important change; MDC: minimal detectable change; SRC: smallest real change. a: p < 0.05. b: Significance not reported.
In general, the psychometric properties are insufficiently documented for most of these outcome measures, except for the NHPT, which has been found to be reliable and valid. For all other measures, although only few psychometric studies have been reported, the available values are encouraging. Remarkably, reliability in PwMS has not yet been investigated for the TEMPA and WMFT, while most of the outcome measures have been validated against the NHPT as a gold standard. Responsiveness was rarely investigated for any outcome measure and, when done, was studied only in relation to the longitudinal deterioration caused by disease progression. For the NHPT, a minimally important change between 0.30 and 0.51 seconds has been described, 40 with other studies suggesting a 20% threshold for clinically meaningful change.51–54 Data on responsiveness have been reported for the ARAT 40 and BBT, 43 while responsiveness of the JTT, Purdue Pegboard Test, TEMPA and WMFT has not yet been described.
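To make the 20% threshold for the NHPT concrete, the sketch below flags a change between two time scores as meaningful when it reaches 20% of the baseline value. It assumes the threshold is applied as a relative change from baseline, in either direction. The function names and this exact definition are assumptions made for illustration, and the cited studies may operationalise the threshold differently.

```python
def percent_change(baseline_s, follow_up_s):
    """Relative change from baseline, in percent (positive means slower)."""
    if baseline_s <= 0:
        raise ValueError("baseline_s must be positive")
    return 100.0 * (follow_up_s - baseline_s) / baseline_s


def is_meaningful_change(baseline_s, follow_up_s, threshold_pct=20.0):
    """True when the absolute relative change reaches the threshold.

    threshold_pct defaults to the 20% value mentioned in the text;
    applying it symmetrically is an assumption of this sketch.
    """
    return abs(percent_change(baseline_s, follow_up_s)) >= threshold_pct


# 20 s at baseline and 25 s at follow-up is a 25% slowing.
print(percent_change(20.0, 25.0), is_meaningful_change(20.0, 25.0))
# 20 s at baseline and 22 s at follow-up is a 10% slowing.
print(percent_change(20.0, 22.0), is_meaningful_change(20.0, 22.0))
```

A relative threshold scales with baseline ability, so the same absolute change in seconds counts for more in a person with a fast baseline than in one with a slow baseline.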
Psychometric properties of the perceived performance measures were mostly investigated using Rasch measurement methods.27,30,32 The ABILHAND 27 and MAM-36 32 appeared to be reliable and valid in MS, in contrast to the poor psychometric properties that were found for the DASH 30 and the lack of data for the MAL. Responsiveness has not yet been investigated for any of the perceived performance measures in MS. There are no psychometric data available yet for accelerometry as an objective performance measure in MS. Correlation coefficients between capacity and perceived performance measures were variable, indicating that both types of measures assess different aspects of upper limb function. This finding suggests the need to use both types of outcome measures to adequately assess the upper limb in MS.
Selecting appropriate upper limb outcome measures
The selection of appropriate upper limb outcome measures depends on the intended purpose, which may range from fast screening for abnormalities to a more detailed documentation of upper limb dysfunction and evaluation of management strategies.
The NHPT seems acceptable as a fast screening tool to discriminate between normal and impaired manual dexterity based on normative data. In this regard, Kierkegaard et al. proposed a cut-off value of the NHPT which may be used to identify PwMS at risk of limitation in activity or restriction of participation. 3 However, the proposed cut-off value (0.5 pegs/second or 18 seconds) seems very strict: healthy subjects in different age categories are not able to perform the NHPT in less than 18 seconds. 55 Alternatively, one could also opt to use the BBT or Purdue Pegboard Test as fast screening tools. In contrast to the NHPT, the BBT requires a less precise grip and more involvement of the proximal upper limb muscles while the Purdue Pegboard Test may detect more subtle deficits, given the smaller pegs and bimanual movements.
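The two forms of the NHPT cut-off quoted above describe the same threshold, assuming the rate is computed as the nine pegs of the test divided by the completion time:

```latex
\[
\text{rate} = \frac{9\ \text{pegs}}{18\ \text{s}} = 0.5\ \text{pegs/s}
\]
```

Under this assumption, a completion time longer than 18 seconds corresponds to a rate below 0.5 pegs per second.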
However, abnormal scores on these fast screening tools are not perfectly related to the disability found with other more comprehensive capacity measures (Table 3). By using more comprehensive capacity measures such as the ARAT, TEMPA, WMFT or JTT, one is able to better understand which components of the upper limb movement are affected (e.g. type of grasp, grip or pinch, presence of proximal or distal muscle weakness, spasticity or incoordination). Conducting these measures may also guide and facilitate individual management strategies to improve upper limb-related ADL. To select the right capacity measures, one must take into account both the clinical utility of the outcome measures and also the severity of upper limb dysfunction anticipated. However, only two studies have reported on the ceiling effect of capacity measures, and comparative data on floor and ceiling effects in MS are needed to draw firm conclusions.18,19
Recently, more attention has been paid to patient-reported outcome measures. 35 These perceived performance measures give more information about the difficulties PwMS experience when performing specific upper limb tasks in their home situation, the latter being considered as the ultimate goal of treatment. Recent studies18,33 have indicated that although scores on capacity measures are (almost) normal, PwMS still report upper limb disability affecting their ADL performance. Therefore, we recommend using the ABILHAND or MAM-36, depending on the upper limb severity level of the patient sample, to assess the ease/difficulty when performing ADL related to the upper limb. The MAL and accelerometry can be used to assess the use of the upper limbs in daily life. Albeit attractive as a concept, the psychometric properties of these need further investigation and the technical challenges for accelerometers need to be resolved.
Gaps in the upper limb assessment
We must acknowledge that, despite the large number of available assessment tools, current upper limb assessment in MS still has limitations. Studies18,31,48,56 have indicated that the relationship between capacity measures on ‘activity’ level and impairments on ‘body functions and structures’ level, such as muscle weakness, spasticity, impaired coordination and sensory function, is inconsistent. With current knowledge, the use of technology could provide more detailed information on other impairments which are rarely addressed such as incoordination, decreased force control, movement variability and selectivity. In this regard, instrumented assessment tools such as finger tapping, 57 virtual peg boards,58,59 and instrumented objects 60 assessing more complex and integrated motor function are promising tools which can give more information about the presence of impairments on ‘body functions and structures’ level while performing a functional task on ‘activity’ level. By providing more objective and quantitative data, these assessment tools may be more sensitive for the early detection of upper limb dysfunction and for changes due to deterioration/intervention. Furthermore, they also may facilitate the evolution of rehabilitation content and strategies. However, more research regarding the benefits of these instrumented assessment tools and their psychometric properties is needed.
A major limitation of the current outcome measures on ‘activity’ level is that they often assess only the ability to perform a relatively short task, while PwMS sometimes report increased fatigability interfering with the prolonged execution of functional activities such as typing or eating. Static and dynamic motor fatigue indexes, calculated based on sustained and repeated hand grip contractions, were originally developed by Schwid et al. 61 but are rarely applied. The discriminative power of both indexes needs further investigation, and the relation with prolonged daily life activities remains to be determined.
Conclusion
In this review, we have highlighted firstly the lack of studies investigating psychometric properties of upper limb outcome measures in MS, especially regarding the responsiveness of outcome measures after an intervention. Secondly, as described in other neurological pathologies8,10 and obvious from this review, there is no single outcome measure available that covers the entire range of upper limb functionality and is applicable across different upper limb disability levels. Outcome measures on the different ICF levels are essential to fully understand upper limb dysfunction and improve (the evaluation of) management strategies. The correct use and interpretation of existing scales, and the development of improved instruments, should be a priority for those looking to use quantitative measures of upper limb function in their clinical practice or in the design of clinical trials.
Footnotes
Conflict of interest
None declared.
Funding
Ilse Lamers is supported by a PhD fellowship from the Research Council of Hasselt University (BOF-grant).
