Abstract
Stroke survivors experience complex combinations of impairments, activity limitations, and participation restrictions. The essential components of stroke rehabilitation remain elusive. Determining efficacy in randomized controlled trials (RCTs) is challenging; there is no commonly agreed primary outcome measure for rehabilitation trials. Clinical guidelines depend on proof of efficacy in RCTs and meta-analyses. However, diverse trial aims, differing methods, inconsistent data collection, and use of multiple assessment tools hinder comparability across trials. Consistent data collection in acute stroke trials has facilitated meta-analyses to inform trial design and clinical practice. With few exceptions, inconsistent data collection has hindered similar progress in stroke rehabilitation research. There is an urgent need for the routine collection of a core dataset of common variables in rehabilitation trials. The European Stroke Organisation Outcomes Working Group, the National Institute of Neurological Disorders and Stroke Common Data Elements project, and the Collaborative Stroke Audit and Research project have called for consistency in data collection in stroke trials. Standardizing data collection can decrease study start-up times, facilitate data sharing, and inform clinical guidelines. Although achieving consensus on which outcome measures to use in stroke rehabilitation trials is a considerable task, a feasible starting point may be to achieve consistency in the collection of data on demography, stroke severity, and stroke onset to inclusion times. Longer term goals could include the development of a consensus process to establish the core dataset. This should be endorsed by researchers, funders, and journal editors in order to facilitate sustainable change.
Introduction
Stroke survivors experience unique combinations of impairments, activity limitations, and participation restrictions (Table 1), which add to the complex challenge of stroke rehabilitation. Factors such as the rehabilitation setting, time since stroke, and coexisting impairments further contribute to this complexity. Rehabilitation is multidisciplinary, bringing together medical consultants, stroke nurses, clinical support workers or auxiliary nursing staff, physiotherapists, and occupational and speech and language therapists. The multidisciplinary team may also include dieticians, podiatrists, orthoptists, orthotists, social workers, and neuropsychologists. Together they aim to reduce the impact of stroke on activities of daily living, maximize recovery, restitution and participation, minimize the impact of any changes in ability, and prevent avoidable complications (1). The essential components of stroke rehabilitation remain elusive. Although the majority of stroke patients receive both physiotherapy and occupational therapy, consensus on the optimum treatment regimen is lacking (2).
Table 1. Selected issues faced by individual stroke survivors
Randomized controlled trials (RCTs) and meta-analyses form the evidence base for clinical practice. However, due to the challenging nature of poststroke sequelae, RCTs in stroke rehabilitation are particularly complex. Meaningful recovery varies between patients: what one stroke survivor considers a good outcome may not be rated similarly by another. The use of a wide range of discipline-specific outcome measures, coupled with the relatively small population sizes and limited scope of trials, severely limits our ability to compare results between trials and to conduct meta-analyses. Nevertheless, there is a good indication that RCTs of many rehabilitation interventions are feasible and improve the evidence base for rehabilitation care (3,4). Meta-analyses of rehabilitation trial data have helped to drive practice change, as evidenced in work by the Stroke Unit Trialists’ Collaboration (2), in early supported discharge (5), and in community rehabilitation (6). The recognition of a need for high-quality evidence of efficacy has been accompanied by an increase in the number of stroke rehabilitation trials. Within the field of physiotherapy alone, the number of trials conducted since 1982 has grown exponentially (7). However, the translation of findings into clinical practice has been disproportionately low in comparison with the number of completed trials.
A multitude of outcome measures
Determining the efficacy of rehabilitation interventions in meta-analyses is problematic as there is no commonly agreed primary outcome measure; multiple assessment tools exist to describe similar impairments. For example, in a recent review, 129 outcome measures were recorded for trials of interventions to improve upper limb function (8). Similarly, a review of speech and language interventions for aphasia found 100 outcome measures were recorded across 39 trials (9). Examination of stroke trials published between 2001 and 2006 revealed the routine use of up to 47 different functional outcome measures (10) and a similar review of trials conducted between 2000 and 2011 highlighted 300 different assessments for cognitive and mood measures, across 408 studies (11).
The feasibility and practicability of consistent data collection have been demonstrated in acute stroke research. The requirement for Clinical Trials of Investigational Medicinal Products in the acute setting to include functional outcome measures led to recommendations for preferred use of the modified Rankin Scale (mRS). Training and certification followed to ensure consistency of scoring between assessors. Consistency in data collection has also been aided by the fact that acute stroke trials have published guidance on outcome data collection (12), typically involve interventions administered within a brief therapeutic window, record standardized physiological measures, and typically follow up patients only until three months poststroke. In spite of different individual trial aims, acute stroke trials show a remarkable level of consistency in the types of data collected. This is evidenced in the acute section of the Virtual International Stroke Trials Archive (VISTA-Acute). This clinical trials resource was established to facilitate novel exploratory analyses of secondary data to inform trial design (13). Of 29 acute stroke trials within the archive, all 29 recorded patient demographic details, medical history, and Barthel Index (BI), 19 recorded mRS, 15 recorded the National Institutes of Health Stroke Scale, and 11 recorded the Scandinavian Stroke Scale. This uniformity in data collection has facilitated meta-analyses of trial data to quantify treatment effects (14–18) and secondary analyses to generate hypotheses and pilot trial design issues (19). Although some instruments are in fairly widespread use in stroke rehabilitation research, such as the Functional Independence Measure (FIM™; Uniform Data System for Medical Rehabilitation), which is a standard measurement tool in some countries (20), comparable consensus and consistency in the collection of outcome measures have not been evident in that field.
Many different impairment and outcome measures exist in stroke rehabilitation trials. To date, the rehabilitation section of VISTA (VISTA-Rehab) (21) has amassed data from 38 trials, involving 10 224 patients and reporting outcomes using at least 44 different measures. Despite the wealth of data collected, there is little overlap in the outcome measures recorded (Fig. 1a, b). Of the current VISTA-Rehab contributions, 25/38 trials recorded the BI, 14/38 recorded the Extended Activities of Daily Living Scale, 8/38 recorded data on the European Quality of Life Score, 9/38 recorded mRS, and 7/38 included data on the Rivermead Mobility Index. Inconsistency in data collection also extends to baseline measures of stroke severity and demographics that play an important role in facilitating data comparability. As recovery from stroke depends on a range of factors such as age, gender, type and severity of stroke, location and size of lesion, adverse events, and comorbidities (22–24), the inclusion of these data is essential in order to correctly interpret analyses. Within VISTA-Rehab, only 6/38 trials recorded the type of stroke experienced by the patient, 6/38 described medical history variables, 13/38 described the time from stroke onset to intervention, 17/38 described a measure of initial impairment/activity limitation at baseline, and 25/38 trials recorded the cerebral hemisphere affected by stroke. Despite the partial overlap across some baseline and outcome variables, analyses of data from VISTA-Rehab are further limited because of variation in the times from index stroke to enrollment and a paucity of data on initial stroke severity and on confounding variables such as demography, cognition, prestroke presentation, and medical history. Diverse trial aims, differing methods, and the multiplicity of assessment tools hinder data comparability across trials, even within a similarly impaired patient population.
All of this renders secondary and meta-analyses of these data problematic, especially when further complicated by different time points for follow-up data collection and questionable consistency in training and administration of assessment tools.
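To make the scale of the baseline-data gap easier to see, the counts quoted above can be turned into percentages of the 38 VISTA-Rehab trials. The following is a minimal sketch only: the counts are those reported in the text, while the dictionary keys and the `coverage` helper are illustrative names introduced here, not part of VISTA.

```python
# Number of VISTA-Rehab trials (out of 38) recording each baseline variable,
# as quoted in the text. The labels are shorthand chosen for this sketch.
TOTAL_TRIALS = 38
baseline_counts = {
    "type of stroke": 6,
    "medical history": 6,
    "time from onset to intervention": 13,
    "baseline impairment/activity limitation": 17,
    "hemisphere affected": 25,
}


def coverage(counts, total):
    """Return {variable: percentage of trials recording it}, to 1 decimal place."""
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


baseline_coverage = coverage(baseline_counts, TOTAL_TRIALS)

# Print variables from least to most consistently recorded.
for name, pct in sorted(baseline_coverage.items(), key=lambda kv: kv[1]):
    print(f"{name}: {pct}%")
```

On these figures, fewer than half of the trials recorded any of the first four variables, which is the gap a core dataset would be intended to close.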

Fig. 1. (a) Data collection in trials within VISTA-Rehab. (b) Range of patient measures in VISTA-Rehab. VISTA, Virtual International Stroke Trials Archive.
Why are there so many different outcome measures in current use?
Multidisciplinary interventions aim to maximize the individual's functional recovery but do so using discipline-specific interventions, severity measures, and assessment tools, each with its own time frame for intervention and follow-up. In the international field, this may be further complicated by language issues, particularly in the field of communication impairment, where the very structure of the language being assessed and rehabilitated may differ between sites. The complexity of the impairments, activity limitations, and participation restrictions faced by a stroke survivor makes recovery difficult to measure. This results in equally complex outcome assessments that can have limitations affecting reliability and validity.
Historically, new assessment tools were developed in response to a need to quantify impairments where existing scales lacked validity or did not adequately capture the impairments targeted by the intervention, functionally relevant tasks, or participation components. For example, the Fugl-Meyer Assessment (FMA) was developed because contemporary measures of limb impairments lacked quantitative numerical assessment properties (25) and standardized measurements of posture and motor performance (25,26). Since the development of some of the earlier assessment tools, our understanding of clinimetric properties and of the rigor required to develop new tools has expanded considerably. Some stroke assessment tools may fail to meet current standards, but their long-standing, frequent application and the subsequent wealth of available data ensure their continued use. Improved understanding of assessment tools has also resulted in informed selection for use in RCTs. For example, the BI is widely known to have a ceiling effect in stroke patients (27), and the FMA is similarly limited in those with mild motor impairment. Therefore, some have postulated that use of the FMA in combination with the Chedoke-McMaster Disability Inventory may address the limitation of a ceiling effect in those with mild motor impairment (26). The need to compensate for the limitations of one assessment tool by using additional assessments has contributed to the vast array of measures in current use.
The impact of inconsistent data collection
Meta-analyses and the design of future trials are hindered if existing sources of evidence utilize different patient populations, enroll patients at disparate poststroke time points, use different assessment tools, or collect data inconsistently. As clinical practice is driven by evidence from RCTs and meta-analyses of trial data, the issue of comparability across different trials is of utmost importance. Furthermore, the collation of data from different trials and subsequent standardization of outcome measures by calculating the standardized mean difference for use in meta-analyses can lead to bias, which may also influence findings in systematic reviews (28).
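For readers less familiar with this standardization step, one common form of the standardized mean difference for a single trial is shown below. This is a sketch of the usual textbook definition (Cohen's d with a pooled standard deviation); the source does not state which variant is meant, and the symbols are introduced here for illustration: $\bar{x}_T$, $s_T$, $n_T$ are the mean, standard deviation, and sample size in the treatment group, and $\bar{x}_C$, $s_C$, $n_C$ the same quantities in the control group.

```latex
\[
d = \frac{\bar{x}_T - \bar{x}_C}{s_{\text{pooled}}},
\qquad
s_{\text{pooled}} = \sqrt{\frac{(n_T - 1)\,s_T^{2} + (n_C - 1)\,s_C^{2}}{n_T + n_C - 2}}
\]
```

Because $d$ is expressed in standard deviation units, results measured on different scales can be pooled, but the value then depends on the variability of each trial's sample as well as on the treatment effect. Trials that enroll different populations or use scales with different properties can therefore yield standardized differences that are not strictly comparable.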
A call for consensus on a core dataset
The need for consistency in data collection and appropriate selection of outcome measures is a theme that has been repeated by many investigators. Numerous initiatives have individually sought to produce recommendations for standardized data collection. From an acute stroke perspective, the European Stroke Organisation Outcomes Working Group (12) recommended the use of the mRS as a primary outcome measure in acute stroke trials. This consensus was reached through exhaustive review of available evidence and examination of scale properties.
In the United States, the National Institute of Neurological Disorders and Stroke Common Data Elements project (29) employed a combination of rigorous review and consultation, including a working group of experts in cerebrovascular disease trials, epidemiology, and biostatistics. They recommended 980 common data elements for a range of fields including biospecimens and biomarkers, hospital course and acute therapies, imaging, laboratory tests and vital signs, long-term therapies, medical history and prior health status, outcomes and end-points, stroke presentation, and stroke types and subtypes.
Within the rehabilitation settings of some countries, comparisons between patient outcomes across different locations have been facilitated by the standardized use of the FIM as a measure of functional status and disability (20). However, use of this outcome measure has yet to be formally adopted on a global scale. In the United Kingdom, rehabilitation-specific recommendations have been investigated in the Collaborative Stroke Audit and Research (COSTAR) initiative (30). This project sought to develop a standard method of classifying, measuring, and timing rehabilitation interventions, to promote collaboration among centers and across disciplines, to establish a set of recommended outcome measures for stroke, and to identify areas where new scales need to be developed. COSTAR highlighted the need for collection of a core dataset of common variables, including case mix indicators and outcome measures, in order to facilitate data comparability. However, in spite of the efforts of these initiatives, there remains a need for greater consistency in current data collection practices, reporting of a core dataset, and standardized outcomes across stroke rehabilitation trials. The American Physical Therapy Association's Neurology Section generated recommendations for use of 54 physiotherapy outcome measures for people with stroke (31). This is still an exceedingly high number of outcomes and is not conducive to effective analyses. It highlights the need to refine and recommend a smaller core set of variables for collection.
Recommendations of appropriate outcome measures should be based on methods that synthesize high-quality evidence of reliability and validity. Consensus meetings between clinicians, researchers, and those directly affected by stroke could help to identify recommended assessment measures.
In a climate of scarce resources, it is vital that design and delivery of rehabilitation interventions are informed by a robust evidence base, ideally across several trials. Future efforts should also focus on those interventions that are most likely to have efficacy. This may be achievable through meta-analyses and use of pooled data to generate and test hypotheses. In order to maximize the value, utility, and comparability of these data, there is an urgent need for standardization of data collection. In addition, there is a need to increase awareness that ordinal scales require a different statistical approach from that used for continuous data, and a need for valid translations of instruments into other languages that take account of different cultural influences.
Previous initiatives have identified the need for collection of a core dataset of common variables, and in the acute stroke research setting, rigorous protocols have been implemented to identify what these common data elements should be. The next steps should involve similarly rigorous identification and consistent implementation of data collection in stroke rehabilitation research. Standardizing data elements can decrease study startup times, facilitate data sharing, and promote informed clinical guidelines (29). Although there is scope to utilize assessment tools specific to the needs of the individual trial, there is a greater need for clinicians and researchers to use consistent measurement tools to ensure comparability of data in future meta-analyses and secondary analyses. Rehabilitation deals inherently with change. Therefore, stroke rehabilitation trials need to employ robust outcome measures that can detect clinically meaningful changes in the patient population. Although achieving consensus on which outcome measures to use in stroke rehabilitation trials is a considerable task, perhaps a feasible starting point is to have consistency across all rehabilitation trials in the collection of data on demography, stroke severity, and stroke onset to inclusion times. Longer term goals could include the development of a consensus process to establish the core dataset, which would be endorsed by leading researchers, funders, and journal editors.
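As an illustration of how small the suggested starting point could be, the sketch below represents one participant's baseline record with only the three groups of variables named above: demography, stroke severity, and stroke onset to inclusion time. It is a hypothetical structure for discussion only; the class name, field names, and example values are assumptions introduced here, not an agreed core dataset or a format used by any of the initiatives cited.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class CoreBaselineRecord:
    """Hypothetical minimal baseline record; every field name is illustrative."""
    age_years: Optional[int]          # demography
    sex: Optional[str]                # demography
    severity_scale: Optional[str]     # name of the stroke severity scale used
    severity_score: Optional[float]   # baseline score on that scale
    stroke_onset: Optional[date]      # date of index stroke
    inclusion: Optional[date]         # date of inclusion in the trial

    def onset_to_inclusion_days(self) -> Optional[int]:
        """Days from stroke onset to inclusion, or None if either date is missing."""
        if self.stroke_onset is None or self.inclusion is None:
            return None
        return (self.inclusion - self.stroke_onset).days

    def missing_fields(self) -> List[str]:
        """Names of core fields that were not recorded for this participant."""
        return [name for name, value in vars(self).items() if value is None]


# Made-up example values, for illustration only.
complete = CoreBaselineRecord(67, "F", "NIHSS", 8, date(2013, 1, 10), date(2013, 2, 9))
partial = CoreBaselineRecord(72, "M", None, None, None, date(2013, 3, 1))

print(complete.onset_to_inclusion_days())
print(partial.missing_fields())
```

A check such as `missing_fields` could be run when trial data are contributed to a shared archive, so that gaps in the core variables are flagged at the point of data sharing and not discovered during a later pooled analysis.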
Summary
Stroke rehabilitation is challenging; this is reflected in the complex nature of rehabilitation trials. Issues such as the lack of data comparability, the multitude of trial or impairment-specific outcome measures, inconsistent collection of data, and relatively small sizes of rehabilitation trials hinder meta-analyses to inform clinical practice. These challenges can be overcome, and road maps have been put forward by numerous initiatives. Achieving consensus on the selection of outcome measures may take time; however, a starting point would be to consistently collect data on patient demography, initial stroke severity, and time since stroke. Consensus on which outcome measures to collect will require broad agreement between clinicians, researchers, patient groups, funders, and journal editors in order to be sustainable.
