Abstract
Background:
To maximise the benefits obtainable from exercise-based cardiac rehabilitation, an evaluation of physical fitness using reliable, clinically relevant tests is strongly recommended. Recently, objective tests of physical fitness have been implemented in the SWEDEHEART register. The reliability of these tests has, however, not been examined for patients with acute coronary syndrome.
Aims:
The aim of this study was to assess the test–retest reliability and responsiveness to change of the symptom-limited bicycle ergometer test, the dynamic unilateral heel-lift test and the unilateral shoulder-flexion test in patients with acute coronary syndrome.
Methods:
In a longitudinal study design, a total of 40 patients (mean age 63.8 ± 9.5 years, five women) with acute coronary syndrome (ACS), aged < 75 years, were included at a university hospital in Sweden. The intraclass correlation coefficient (ICC) with a 95% confidence interval, standard error of measurement (SEM) and responsiveness in terms of the minimal detectable change were calculated.
Results:
Excellent reliability was found, with an ICC of 0.98 (0.96–0.99) and SEM of 4.71 W for the bicycle ergometer test, an ICC of 0.87 (0.75–0.93) and SEM of 4.62 repetitions for the shoulder-flexion test and an ICC of 0.84 (0.71–0.91) and SEM of 2.24 repetitions for the heel-lift test. The minimal detectable change was 13 W, 13 repetitions and 6 repetitions for the bicycle ergometer, shoulder-flexion and heel-lift tests respectively.
Conclusions:
The test–retest reliability of clinical tests evaluating physical fitness in patients with acute coronary syndrome included in the SWEDEHEART register was excellent. This makes the future comparison and evaluation of treatment effects in large unselected clinical populations of acute coronary syndrome possible.
Introduction
Coronary artery disease (CAD) is the single most common cause of death worldwide. 1 During the last few decades, CAD mortality rates have fallen, mainly due to evidence-based treatments and a reduction in major risk factors, thereby increasing the number of people in need of secondary prevention. 2 Acute coronary syndrome (ACS) includes the diagnosis of ST-elevation myocardial infarction, non-ST elevation myocardial infarction and unstable angina pectoris. 3
Secondary prevention in patients with ACS, administered through comprehensive cardiac rehabilitation at hospital, is a Class 1 recommendation from the European Society of Cardiology (ESC) and the American Heart Association (AHA).4–6 Cardiac rehabilitation services are provided using an interdisciplinary approach and include specific core components comprising baseline patient assessment, nutritional counselling, risk factor management, psychosocial interventions, physical activity counselling and exercise, with exercise consistently identified as a central element.4–6 Exercise should start as soon as possible, within one to two weeks of the acute onset of ACS, and should continue for three to six months to ensure the greatest effect on left ventricular remodelling. 7
Meta-analyses clearly confirm the benefits of exercise-based cardiac rehabilitation in terms of marked reductions in cardiac mortality, a reduced risk of hospital admission, favourable effects on psychological well-being, cardiac risk markers and peak oxygen uptake.8–10 Aerobic exercise capacity is a strong predictor of mortality in patients with ACS and a small gain in oxygen uptake may therefore improve not only functional capacity and quality of life but also survival prospects. 11
Adherence to exercise-based cardiac rehabilitation guidelines is important in order to achieve the established positive health benefits. 4 Guidelines recommend continuous aerobic exercise for at least 20–30 min, three days a week, at 50–80% of VO2max.5,12 Resistance exercise should be recommended in addition to aerobic exercise, including eight to 10 upper- and lower-body exercises, with 10–15 repetitions in at least one set, two to three times a week. Combined aerobic and resistance exercise has been shown to have the greatest benefits on overall risk reduction and cardiovascular fitness. 13
To maximise the benefits obtainable from exercise-based cardiac rehabilitation, an evaluation of physical fitness through exercise testing prior to starting the exercise programme is strongly recommended.4,14 A cardiopulmonary exercise test has been proposed as the gold standard to provide the most accurate quantification of maximal aerobic capacity and subject effort, as well as risk stratification and control for physiological responses in the individual patient. This test is usually a continuous load increment exercise test on a bicycle ergometer or a treadmill.14–17 Although maximal testing provides the most accurate determination of aerobic capacity, submaximal (symptom-limited) testing does not require expensive, specialised equipment or electronically braked cycle ergometers and can therefore be performed in large clinical settings. 16 Several protocols for bicycle ergometer tests, with various principles for load increments, are being used in clinical practice. 14 Research studies often adhere to strict protocols, but, in clinical practice, protocols are often not standardised. 18 Since assessments of aerobic physical capacity and haemodynamic parameters are influenced by the type of protocol, a reliable protocol that is easily applicable in clinical practice appears desirable and would enhance the comparison of results from different centres. The symptom-limited bicycle ergometer test, according to the World Health Organization (WHO) protocol using stepwise load increments, has been used in clinical exercise-based cardiac rehabilitation settings. 19 According to a WHO report, experts suggested that the exercise test best suited to universal use was exercise on an upright bicycle ergometer, performed as a continuous series of increasing workloads with an almost steady state at each level. 19 Even though this protocol has been recommended and is being used in exercise-based cardiac rehabilitation settings, it has previously not been evaluated in terms of reliability in patients with ACS.
When it comes to measuring muscle endurance, there is no gold standard. In research, the isokinetic technique, in which muscle force and power are measured with regard to both the force and the velocity of limb movements, is most commonly used. 20 This is, however, expensive and requires advanced technology. There is therefore a need for clinically useful, reliable tests designed to evaluate muscle endurance in patients with ACS that are easy to perform. Two muscle endurance tests, the dynamic unilateral heel lift and unilateral shoulder flexion, have been used in exercise-based cardiac rehabilitation settings and in research and have been found to be reliable for patients with heart failure and for healthy persons. 21
The large Swedish quality register SWEDEHEART provides continuous information on patient care needs, treatments and treatment outcomes. 22 The aim of SWEDEHEART is also to register changes in the quality of care over time, to contribute decision support and to support continuous improvement efforts. Information is collected from almost all hospitals in Sweden and the work of SWEDEHEART therefore represents an important foundation for research on ACS and has resulted in a number of publications in the most highly ranked medical journals. Its results have consequently influenced the care of cardiac disease throughout the world. Objective tests of physical fitness to evaluate the effects of exercise-based cardiac rehabilitation have been lacking in the register. In 2016, the symptom-limited bicycle ergometer test according to the WHO protocol19,23 and the dynamic unilateral heel-lift and unilateral shoulder-flexion tests 21 were implemented in the Secondary Prevention after Heart Intensive Care Admission (SEPHIA), which is a part of the SWEDEHEART register. 22 This makes the future evaluation of treatment effects in a large unselected clinical population of ACS possible.
The test–retest reliability of these clinical tests of physical fitness has, however, not been examined in patients with ACS. As we believe that these tests will be widely used in future research and quality of care, it is most important to ensure stability over time and responsiveness to change in order to ensure that the data provided are an accurate representation of differences in patients’ actual performance.
The aim of this study was therefore to assess the test–retest reliability and responsiveness in terms of the smallest detectable change in the symptom-limited bicycle ergometer test according to the WHO protocol and the dynamic unilateral heel-lift and unilateral shoulder-flexion tests in patients with ACS.
Methods
Design
Longitudinal study design.
Study population
Patients with ACS from the coronary care unit at Örebro University Hospital (USÖ) were included in the study between September 2016 and April 2017. The inclusion criteria were current hospitalisation due to ACS, age < 75 years (according to SEPHIA) and living in the geographical area belonging to the USÖ. The exclusion criteria were coronary artery bypass grafting, severe physical or mental illness preventing performance of the tests and difficulties understanding spoken and written Swedish. In all, 97 patients were consecutively assessed for eligibility. Among these, 47 patients were included. Reasons for exclusion were not meeting the inclusion criteria (n=22), declining to participate (n=18) and other reasons (n=11). Among the 18 patients who declined to participate, the three main reasons were long distance to the hospital, lack of time and not wanting to participate in a research study. Seven patients were lost to retest, leaving 40 patients with complete test–retest data for the analysis (see Figure 1). Informed, written consent was obtained from all patients prior to entering the study. The study protocol was approved by the Regional Ethical Review Board at the University of Uppsala, Dnr 2016/168. The investigation conforms with the principles outlined in the Declaration of Helsinki (Br Med J 1964; ii: 177).

Figure 1. Flow chart of included patients.
Procedure
Baseline characteristics were obtained from patients’ medical records. Age, gender, type of coronary event (ST-elevation myocardial infarction, non-ST-elevation myocardial infarction, unstable angina pectoris), percutaneous coronary intervention, medication and comorbidity were registered. The patients’ height and weight were measured. The first test was performed one to two weeks after discharge from hospital, as this is the time point when the tests are performed in exercise-based cardiac rehabilitation. The second test was performed five to 10 days later to ensure recovery between tests and to minimise differences in results due to differences in physical fitness. The symptom-limited bicycle ergometer test, the dynamic unilateral heel-lift and unilateral shoulder-flexion tests were performed on the same day. The tests were performed at the same location and at approximately the same time of day to minimise the effects of diurnal variation. The participants were asked to abstain from strenuous exercise for 24 h prior to the tests and not to consume food, caffeine or nicotine for 2 h before each test session. Standardised instructions were given to the patients before each test was performed. The tests were performed by the same test leader, a physiotherapist with extensive experience in exercise-based cardiac rehabilitation.
Tests of physical fitness
Symptom-limited bicycle ergometer exercise test
The test was performed on a bicycle ergometer according to the WHO protocol 19 with an increased workload of 25 W every 4.5 min. 23 The initial starting load, 25 W or 50 W, was based on the patient’s exertion history. At rest while sitting on the bicycle (Monark ProVO2, Monark, Varberg, Sweden), patients were informed of the symptom-limited exercise test protocol and how to rate their perceived exertion according to the Borg rating of perceived exertion (RPE) scale, 24 dyspnoea and possible chest pain according to Borg’s Category Ratio Scale (CR-10). 24 Heart rate was registered at rest with a wireless heart-rate sensor (Polar H7, Polar Electro, Bromma, Sweden). Systolic and diastolic blood pressure were measured manually in both arms and registered for the arm with the highest blood pressure. If the pressure was equal in both arms, the right arm was used for further measurements. At 2 and 4 min of each workload, heart rate and the ratings on Borg’s scales were registered. At 3 min, the systolic blood pressure was registered. The exercise test was discontinued at 17 on Borg’s RPE scale and/or 7 on Borg’s CR-10 scale. Other criteria for discontinuing the test included chest pains, drop in blood pressure, failure to increase heart rate, dizziness or some other discomfort. The time at the last increment was noted. After the test, the patient was encouraged to remain on the bicycle unless dizzy and was monitored for 4 min. During this period, heart rate and chest pains (if present) were registered at 2 and 4 min and blood pressure was registered at 3 min. If the patient did not complete a full 4.5-min workload on the last increment, a corrected maximal workload was calculated using Strandell’s formula, 25 namely corrected maximal workload = submaximal workload + (25 × n/4.5), where the submaximal workload is the watt level prior to the step of termination and n is the number of minutes completed at the watt level of the end-point of exercise.
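The correction for an unfinished final step can be illustrated with a short sketch. The function name, argument names and the example patient are hypothetical and not part of the study protocol; only the arithmetic (last completed workload plus 25 W scaled by the fraction of the 4.5-min step completed) is taken from the formula given above.

```python
def corrected_max_workload(last_completed_watts, minutes_at_final_step,
                           step_watts=25.0, step_minutes=4.5):
    """Corrected maximal workload (W) when the final step was not completed.

    last_completed_watts: watt level of the last fully completed step.
    minutes_at_final_step: minutes cycled at the step where the test ended.
    step_watts, step_minutes: increment size and duration (25 W every 4.5 min
    in the protocol described in the text).
    """
    if not 0 <= minutes_at_final_step <= step_minutes:
        raise ValueError("minutes_at_final_step must lie within one step")
    return last_completed_watts + step_watts * minutes_at_final_step / step_minutes


# Hypothetical patient: completed 100 W, stopped 3 min into the 125 W step.
print(round(corrected_max_workload(100, 3.0), 1))  # 116.7
```

A patient who stops exactly at the end of a step receives the full increment, and one who stops at the very start of a step keeps the previous workload.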
Unilateral shoulder flexion
The test was executed while sitting on a stool with a light back support and both feet on the ground holding a dumbbell (2 kg for women and 3 kg for men). The patient chose which shoulder to test. The patient was asked to elevate his/her arm, from 0° to 90° of flexion, as many times as possible using a pace of 20 contractions per minute maintained by a metronome (Matrix MR-500, Quartz Metronome). The test was terminated when the patient was unable to reach 90°, performed the motion with a bent elbow or was unable to maintain the pace. The number of repetitions was registered.
Unilateral heel lift
The test was performed unilaterally on a straight leg standing in the middle on a 10° tilted wedge. The patient chose which leg to test. The toes were placed at the front in the middle of the wedge and the other leg was held slightly above the floor. A support for balance was permitted. At first, the patient was asked to perform one maximal heel lift to mark the height on a measuring stick. The pace, 30 contractions a minute, was set with a metronome (Matrix MR-500, Quartz Metronome). The patient was then asked to perform as many heel lifts as possible. The test was terminated when the patient was unable to reach the mark, bent his/her knee or was unable to maintain the pace. The number of repetitions was registered.
Statistical analyses
Statistical analyses were performed using the Statistical Package for the Social Sciences (SPSS 20.0, Chicago, IL, USA). All the continuous data were normally distributed and were described as the mean ± one standard deviation (SD). Ordinal data were presented as median (25th–75th percentile). Absolute numbers and proportions were used to describe categorical variables. The intraclass correlation coefficient (ICC), absolute-agreement, two-way mixed effects single-measurement model with a 95% confidence interval (CI), was used to analyse the test–retest reliability. 26 The ICC varies from 0 to 1, where 1 is equivalent to perfect reliability. In this study, an ICC over 0.75 was considered excellent, 0.4 to 0.75 as fair to good and less than 0.4 as poor reliability. 27 A complementary standard error of measurement (SEM) and SEM% were presented in relation to the ICC to describe the within-subject variability attributable to repeated measurements. Responsiveness was calculated using the minimal detectable change (MDC) at the 95% confidence level, which means that 95% of stable patients demonstrate a random variation of less than this amount when tested on multiple occasions. 28
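As an illustration of the reliability coefficient described above, the following sketch computes a single-measurement, absolute-agreement ICC from two-way ANOVA mean squares. It is not the SPSS procedure used in the study; the function name and the example scores are hypothetical, and confidence intervals are not computed.

```python
from statistics import mean


def icc_absolute_agreement(scores):
    """Single-measurement, absolute-agreement ICC from a two-way layout.

    scores: one row per patient, one column per test session.
    """
    n = len(scores)         # number of patients
    k = len(scores[0])      # number of sessions
    grand = mean(v for row in scores for v in row)
    row_means = [mean(row) for row in scores]
    col_means = [mean(row[j] for row in scores) for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between patients
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between sessions
    ss_total = sum((v - grand) ** 2 for row in scores for v in row)
    ss_error = ss_total - ss_rows - ss_cols                  # residual

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical repetitions for three patients, retest always 2 higher.
print(round(icc_absolute_agreement([[10, 12], [20, 22], [30, 32]]), 3))  # 0.98
```

Because absolute agreement penalises systematic shifts between sessions, the constant 2-repetition increase in the example lowers the coefficient below 1 even though the ranking of patients is identical.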
Bland–Altman plots are presented to visualise the between-session difference versus the mean of the two sessions for each test. For normally distributed data, the scatter should be evenly dispersed along the x-axis and the mean difference should be close to zero. For good repeatability, 95% of the differences should lie within two standard deviations of the mean difference (the limits of agreement). 29 To test for systematic differences between test and retest, paired-sample t-tests were used for interval data, the Wilcoxon signed-rank test for ordinal data and Pearson’s chi-square test for independence for nominal data. A value of p < 0.05 was considered statistically significant.
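The quantities behind a Bland–Altman plot can be sketched as follows. The function name and the workload values are hypothetical; the limits are taken as the mean difference ± 2 SD, matching the figure captions in this article, and plotting itself is left out.

```python
from statistics import mean, stdev


def bland_altman(test, retest):
    """Return the points and reference lines of a Bland-Altman plot."""
    diffs = [b - a for a, b in zip(test, retest)]        # vertical axis
    means = [(a + b) / 2 for a, b in zip(test, retest)]  # horizontal axis
    bias = mean(diffs)                                   # mean difference
    sd = stdev(diffs)                                    # sample SD of differences
    return {
        "means": means,
        "diffs": diffs,
        "bias": bias,
        "lower": bias - 2 * sd,   # lower limit of agreement
        "upper": bias + 2 * sd,   # upper limit of agreement
    }


# Hypothetical corrected workloads (W) for four patients.
result = bland_altman([100, 125, 150, 110], [104, 125, 152, 112])
print(result["bias"], round(result["lower"], 2), round(result["upper"], 2))
```

A bias well away from zero would indicate a systematic difference between sessions, while points fanning out along the horizontal axis would suggest heteroscedasticity.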
We conducted a sample size calculation based on a desired reliability coefficient of 0.90 as demonstrated for the muscle endurance tests in a previous reliability study 21 and a minimum acceptable coefficient of 0.80. 28 With a one-sided 95% CI, an alpha value of 0.05 and two testing sessions, a minimum sample size of 46 was required. 30
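The sample size figure can be reproduced with a short sketch, under two assumptions that the text does not state explicitly: that the calculation follows the approximation of Walter, Eliasziw and Donner for reliability studies, and that the power was 80%. The function and parameter names are hypothetical.

```python
from math import ceil, log
from statistics import NormalDist


def reliability_sample_size(rho0, rho1, k=2, alpha=0.05, power=0.80):
    """Approximate number of subjects for a reliability study.

    rho0: minimum acceptable ICC (null value).
    rho1: desired ICC (alternative value).
    k: number of test sessions per subject.
    alpha: one-sided significance level; power: assumed, not from the text.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    z_beta = NormalDist().inv_cdf(power)
    theta0 = rho0 / (1 - rho0)
    theta1 = rho1 / (1 - rho1)
    c0 = (1 + k * theta0) / (1 + k * theta1)
    n = 1 + 2 * (z_alpha + z_beta) ** 2 * k / (log(c0) ** 2 * (k - 1))
    return ceil(n)


print(reliability_sample_size(0.80, 0.90))  # 46
```

Under these assumptions the sketch returns 46 for an acceptable ICC of 0.80 and a desired ICC of 0.90, in line with the figure reported above.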
Results
The baseline characteristics of the study population are shown in Table 1. Descriptive data from the symptom-limited bicycle ergometer test, shoulder-flexion test and heel-lift test are listed in Table 2. The same shoulder and leg were tested both times. There were no significant differences between test 1 and test 2, except for the corrected workload (p=0.03) and the number of shoulder flexions (p=0.01), with a higher test score on test 2. In overall terms, individual reasons for terminating the symptom-limited bicycle ergometer test were the same for both test sessions. A few patients changed from submaximal exertion to submaximal dyspnoea or vice versa.
Table 1. Baseline characteristics of the study population (N=40).
Nominal data are presented as n (%). Interval data are expressed as the mean ± 1 SD.
SD: standard deviation; CAD: coronary artery disease; STEMI: ST-elevation myocardial infarction; UAP: unstable angina pectoris; PCI: percutaneous coronary intervention; ACE: angiotensin-converting enzyme; COPD: chronic obstructive pulmonary disease.
Table 2. Descriptive data from each test session (N=40).
Nominal data are presented as n (%). Ordinal data are described as median (25th–75th percentile). Interval data are expressed as the mean ± 1 standard deviation (SD).
Max: maximal; RPE: rating of perceived exertion; CR-10: category ratio scale; n.o.: number of.
Test–retest reliability and responsiveness to change
The results showed excellent test–retest reliability for the symptom-limited bicycle ergometer exercise test (ICC 0.98, 95% CI 0.96–0.99), as well as the unilateral shoulder-flexion test (ICC 0.87, 95% CI 0.75–0.93) and the unilateral heel-lift test (ICC 0.84, 95% CI 0.71–0.91). The SEM and MDC show the test–retest differences in absolute values, using the same unit as the test (Table 3).
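The relationship between the SEM and the MDC can be checked with a small sketch. The article does not spell out its formulas, so the definitions below are the commonly used ones and should be read as assumptions: SEM = SD × √(1 − ICC) and MDC at the 95% level = 1.96 × √2 × SEM. The function names are hypothetical; the SEM values are those reported in the abstract.

```python
from math import sqrt


def sem_from_icc(sd, icc):
    """Standard error of measurement from a sample SD and an ICC (assumed form)."""
    return sd * sqrt(1 - icc)


def mdc95(sem):
    """Minimal detectable change at the 95% level for a test-retest difference."""
    return 1.96 * sqrt(2) * sem


# SEM values reported in the abstract (W for the bicycle test, repetitions otherwise).
reported_sem = {"bicycle ergometer": 4.71, "shoulder flexion": 4.62, "heel lift": 2.24}
for test_name, sem in reported_sem.items():
    print(test_name, round(mdc95(sem)))
```

With the reported SEM values, this definition gives 13 W, 13 repetitions and 6 repetitions after rounding, which agrees with the minimal detectable changes stated in the abstract.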
Table 3. Test–retest reliability scores, SEM and MDC for all three tests.
SEM: standard error of measurement; MDC: minimal detectable change; ICC: intraclass correlation coefficient (2,1); CI: confidence interval; n.o.: number of.
In addition, Bland–Altman plots of the between-session difference against the mean of the test and retest values showed that the mean difference was close to zero. A few observations were positioned outside the limits of agreement. There were no obvious patterns of heteroscedasticity or proportionality. Further information is given in Figures 2–4.

Figure 2. Bland–Altman plot of the symptom-limited bicycle ergometer exercise test showing the mean of test and retest (horizontal axis) versus the difference between test and retest (vertical axis). The lines represent the mean difference and the mean difference ± 2 SD.

Figure 3. Bland–Altman plot of the shoulder-flexion test showing the mean of test and retest (horizontal axis) versus the difference between test and retest (vertical axis). The lines represent the mean difference and the mean difference ± 2 SD.

Figure 4. Bland–Altman plot of the heel-lift test showing the mean of test and retest (horizontal axis) versus the difference between test and retest (vertical axis). The lines represent the mean difference and the mean difference ± 2 SD.
Discussion
The results of this study support the reliability of using the symptom-limited bicycle ergometer test, the unilateral shoulder-flexion test and the unilateral heel-lift test in patients with ACS. Determining the reproducibility of a test is essential when it is used to detect improvements in physical fitness in clinical practice, as well as in research.
The test–retest reliability of the symptom-limited bicycle ergometer test was found to be excellent in typical clinical conditions in an exercise-based cardiac rehabilitation setting. This result is consistent with a study by Michelsen 23 evaluating physical capacity in a maximal bicycle ergometer test in healthy men by using two different principles for increasing workload: stepwise increments of 50 W every 4 min or a continuous increment of 10 W at the end of every minute. Better reproducibility and a lower error percentage of cumulative work and heart rate were found for stepwise increments as compared with continuous increments. The drawback of exercise test protocols with large stage-to-stage increments in workload is, however, a weaker relationship between the measured VO2 and work rate.14,16,31 The test can be terminated because of local fatigue or orthopaedic factors in the thighs or knees, rather than cardiopulmonary end-points. 31 This may be especially pronounced in subjects who are elderly or have poor muscle strength.14,23 Increments of 25 W, which are commonly used, may therefore be regarded as more appropriate for patients with cardiovascular disease. 14 In addition, a recent systematic review has concluded that, based on validity measurements, submaximal step-test protocols are an acceptable method of estimating VO2max in the general population. 32 To date, however, the test–retest reliability of the majority of protocols has not been evaluated.
The current ESC 4 and AHA 14 guidelines for cardiac rehabilitation strongly recommend exercise testing before starting exercise-based cardiac rehabilitation programmes and regard submaximal exercise evaluation as feasible in clinical practice.4,33 Stepwise load increment exercise tests reaching steady state at each particular load may be a useful tool for prescribing individualised exercise programmes within cardiac rehabilitation. 15 A rating of perceived exertion can also be used to further refine exercise intensity and assist the patient in self-monitoring during exercise and the performance of other physical activities. 14 Although there is some variation among patients in their actual rating of perceived exertion, they appear to rate consistently between tests, 14 and ratings correlate sufficiently well with heart rate and VO2.33,34
The test–retest reliability of the unilateral shoulder-flexion test was excellent, with an ICC of 0.87, which is in agreement with the results of a previous study of patients with congestive heart failure (ICC 0.96) and healthy adults (ICC 0.8 for the right shoulder and 0.96 for the left shoulder). 21 Moreover, the unilateral heel-lift test showed excellent reliability with an ICC of 0.84, which is slightly lower than in the same previous study of patients with congestive heart failure and healthy adults, where the ICC values were 0.98 and 0.94 for the right leg and 0.94 and 0.93 for the left leg respectively. 21 Similarly, Ross and Fontenot 35 reported an ICC value of 0.96 when testing healthy subjects. The present study followed the SEPHIA protocol, stating that only one shoulder and one leg as chosen by the patient are tested, which might have influenced our result, as compared with previous studies. Additional studies are needed to investigate intra-rater and inter-rater reliability for the symptom-limited bicycle ergometer test, the dynamic unilateral heel-lift and unilateral shoulder-flexion tests in patients with ACS.
Since no previous studies have reported absolute reliability in terms of SEM and MDC for these tests of physical fitness, our findings cannot be compared with earlier results. As absolute reliability is expressed in the same units as the test of interest, results are easy to interpret in clinical practice. The values obtained in our study can be used to assess whether changes in muscle endurance (repetitions) and submaximal work capacity (watts) are due to a real change and not only to measurement error. For example, we found an MDC of six repetitions for the heel-lift test, which means that an improvement of six repetitions would be sufficient to know that measurement error has been exceeded. This does not necessarily mean, however, that a change of six repetitions is meaningful for the patients. For this purpose, the minimal clinically important difference should be evaluated in future studies.
Methodological considerations
Some of the day-to-day variation between the tests may be related to uncontrolled sources of variance, like food and beverage intake, but also to daily differences in subjective symptoms like fatigue. 36 Another important factor to consider is motivation, which is multi-faceted, and the upper limit of a performance is determined by whether the motivation or goal justifies the effort required. 37 From clinical experience, the motivational factor in this study is most important in relation to the shoulder-flexion test. When using a standardised dumbbell of 2 kg for women and 3 kg for men, some individuals are capable of performing many repetitions and may terminate the test due to motivational aspects rather than muscle fatigue. This is a limitation of the test, but, on the positive side, a standardised protocol makes results comparable.
The test leader may also influence the motivation in terms of verbal encouragement and facial expression.37,38 In the present study, the same experienced test leader performed all the tests, following standardised protocols according to SEPHIA, and no verbal encouragement was given, to minimise any impact of this kind. It was, however, still difficult to decide exactly when to terminate the muscle endurance tests, that is, to decide visually when the patient no longer reached 90° of shoulder flexion. This may have influenced the results and it would be interesting to study inter-rater reliability in future studies.
The short time period between test and retest may have caused some patients to remember the number of repetitions performed on the first test, which becomes a motivator of importance in strength testing. 38 In test–retest studies, the time frame needs to be long enough to prevent learning, carry-over effects and recall but short enough to prevent any actual change in physical fitness occurring.28,36 We chose the time period of five to 10 days between test and retest to make our results comparable with those of previous studies. 21
There was a systematic difference between test and retest for the bicycle ergometer test and for the shoulder-flexion test, with a slightly higher result on the second test. It is possible that some degree of learning effect may have occurred between the two test sessions. Furthermore, it is possible that patients experiencing fear of movement, kinesiophobia, at the first test session may have reduced their fear at retest and were consequently able to enhance their physical performance. 39 On the other hand, as the absolute difference between test sessions was small, the clinical relevance of the systematic difference is doubtful.
The study has some limitations on its generalisability. Women and ethnic minorities were underrepresented and, moreover, patients had fairly low co-morbidity. In previous studies, the same factors have been associated with limited participation in cardiac rehabilitation. 40 Future studies are needed to confirm the reliability of the tests of physical fitness in the present study for these sub-groups as well. In terms of generalisation, it is also important to consider that patients with a greater interest in exercise are more likely to attend this kind of study. Regarding sample size, we used a method based on a functional approximation to earlier exact results for a given number of observations, including the null (acceptable ICC) and alternative (desired ICC) values of reliability, significance level and power. 30 According to Portney and Watkins, 28 an ICC-value above 0.75 is indicative of good reliability. If we had used 0.75 as the acceptable ICC in the sample size calculation, a minimum sample size of 26 would have been required. For many clinical measurements, however, a higher ICC-value is recommended to justify sound clinical judgements. 28 Therefore, we chose an acceptable ICC-value of 0.8, which gave a recommended sample size of 46 patients. We included 47 patients, but seven patients were lost to retest, leaving 40 patients with complete test–retest data. This somewhat small sample size must be considered when interpreting the results. Results from a smaller sample may still be valid but are more uncertain and carry an increased risk of type II error.
Clinical implications
The excellent reliability of the tests of physical fitness included in the SWEDEHEART register now makes them available as outcomes to the benefit of patients, clinicians, researchers and health care administrators. As the tests are simple and affordable, we recommend that they be widely used in order to compare and evaluate treatment effects for patients with ACS. Since absolute reliability is expressed in the same units as the test of interest, results of minimal detectable change for each test are easy to interpret in clinical practice in terms of the amount of change that must be achieved to reflect a true difference within similar populations.
Conclusion
In conclusion, the results of this study showed excellent test–retest reliability in the symptom-limited bicycle ergometer exercise test, the unilateral shoulder-flexion test and the unilateral heel-lift test in patients with ACS. The absolute reliability for each test must be considered when evaluating treatment effects over time.
Footnotes
Acknowledgements
The authors thank the Department of Physiotherapy and the Department of Cardiology at Örebro University Hospital for financial support.
The results support the test–retest reliability of the symptom-limited bicycle ergometer exercise test in measuring submaximal exercise capacity in patients with acute coronary syndrome. The results support the test–retest reliability of the unilateral shoulder-flexion test and the unilateral heel-lift test in measuring muscle endurance in patients with acute coronary syndrome. The ease of administration makes these tests of physical fitness included in the SWEDEHEART register clinically useful alternatives in exercise-based cardiac rehabilitation settings. Reliable, standardised tests are important in order to compare and evaluate treatment effects in large clinical populations of patients with acute coronary syndrome.
Declaration of conflicting interests
The authors declare that there is no conflict of interest.
Funding
This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.
