Clinician-administered performance-based tests via telehealth in people with chronic lower limb musculoskeletal disorders: Test–retest reliability and agreement with in-person assessment

Abstract

Introduction

Uptake of telehealth has surged, yet no previous studies have evaluated the clinimetric properties of clinician-administered performance-based tests of function, strength, and balance via telehealth in people with chronic lower limb musculoskeletal pain. This study investigated (i) the test–retest reliability of performance-based tests via telehealth, and (ii) the agreement between scores obtained via telehealth and in-person.

Methods

Fifty-seven adults aged ≥45 years with chronic lower limb musculoskeletal pain underwent three testing sessions: one in-person and two via videoconferencing. Tests included 30-s chair stand, 5-m fast-paced walk, stair climb, timed up and go, step test, timed single-leg stance, and calf raises. Test–retest reliability and agreement were assessed via intraclass correlation coefficients (ICC; lower limit of 95% confidence interval (CI) ≥0.70 considered acceptable). ICCs were interpreted as poor (<0.5), moderate (0.5–0.75), good (0.75–0.9), or excellent (>0.9).

Results

Test–retest reliability was good-excellent with acceptable lower CI for stair climb test, timed up and go, right leg timed single-leg stance, and calf raises (ICC = 0.84–0.91, 95% CI lower limit = 0.71–0.79). Agreement between telehealth and in-person was good-excellent with acceptable lower CI for 30-s chair stand, left leg single-leg stance, and calf raises (ICC = 0.82–0.91, 95% CI lower limit = 0.71–0.85).

Discussion

Stair climb, timed up and go, right leg timed single-leg stance, and calf raise tests have acceptable reliability for use via telehealth in research and clinical practice. If re-testing via a different mode (telehealth/in-person), clinicians and researchers should consider using the 30-s chair stand test, left leg timed single-leg stance, and calf raise tests.

Keywords

Telehealth; telerehabilitation; videoconferencing; knee pain; hip pain; foot pain; validity; reliability; performance-based tests; physiotherapist; outcome measures; musculoskeletal; measurement; clinimetric properties

Introduction

Chronic lower limb musculoskeletal conditions, the most common of which include osteoarthritis, rheumatoid arthritis, and problems arising from accident/injury,¹ are major public health problems with enormous social, personal, and economic burden.² These conditions often give rise to chronic pain, causing individuals to seek care. In 2019, approximately 1.7 billion people worldwide had at least one musculoskeletal condition, an increase of 62% since 1990,³ contributing to one of the highest expenditures of all health conditions.^4,5

Chronic musculoskeletal conditions frequently lead to impairments in body functions, impacting an individual's ability to perform daily activities and limiting social and workforce participation.^6–8 As such, an important part of healthcare is the measurement of impairments and function over time in order to evaluate treatment response and adjust treatment if required. Self-report measures (e.g. questionnaires) and clinician-administered performance-based tests⁸ are both commonly used in clinical and research settings. While easy to implement, questionnaires are highly subjective and influenced by personal and psychosocial factors such as depression, self-efficacy, and pain,⁹ and evidence suggests that, in people with musculoskeletal conditions, performance-based measures assess different aspects of physical function than self-reported measures do.^10–13 The use of performance-based tests (e.g. measurement of walking speed, stair climbing ability, standing from sitting, balance, and strength) is therefore important.

Physiotherapists are common providers of care for people with chronic musculoskeletal pain¹⁴ and traditionally, care has been provided in person. However, physiotherapists are increasingly delivering care via telehealth to patients who may otherwise have difficulty accessing services,^15,16 and there is evidence that these models of service delivery are effective.^17,18 Implementation of telehealth-delivered physiotherapy services, particularly using videoconferencing,^19,20 has exploded during the COVID-19 pandemic.^21,22 Whilst clinicians have reported using a range of functional tests to assess patients via telehealth,¹⁹ the reliability of administering such tests via videoconferencing, and their level of agreement with in-person test scores, is unclear. Indeed, clinicians have noted that difficulties with objectively assessing patients are a barrier to the implementation of telehealth.^19,20,23

There is emerging evidence that assessment of musculoskeletal disorders via telehealth is reliable and shows sufficient agreement with in-person assessment.^24,25 A recent systematic review identified 39 studies examining the clinimetric properties of physiotherapy assessments via telehealth, concluding that they were reliable for specific types of assessment in limited populations.²⁵ However, only four of the 39 studies focused on telehealth assessment of people with chronic lower limb musculoskeletal conditions.^26–29 Those four studies used sophisticated videoconferencing systems involving remote-controlled patient cameras and inbuilt measurement tools to quantify patient physical performance.^26–29 Such systems are not readily available to clinicians or patients, and thus findings from those studies are not generalisable to the telehealth services typically implemented in routine clinical practice, which tend to use simple internet-based videoconferencing software (e.g. Zoom) or videoconferencing modules within clinical software applications (e.g. Physitrack).¹⁹ Another recent systematic review of the psychometric properties of performance-based tests via telehealth for people with chronic conditions concluded that the current evidence is of low to very low quality, reflecting the small number of studies that have been conducted.³⁰ Thus, the aims of this study were to investigate (i) the test–retest reliability of clinician-administered performance-based tests via telehealth, and (ii) the agreement between scores obtained via telehealth and in-person, in people with chronic lower limb musculoskeletal pain.

Methods

Study design

A prospective within-participant repeated-measures design was used, with participants assessed on three occasions (twice via telehealth and once in-person) over approximately two weeks. This study was reviewed and approved by the University of Melbourne Human Ethics Advisory Group and all participants provided informed written consent.

Participants and sample size

Participants with chronic lower limb musculoskeletal pain were recruited from the community in metropolitan Melbourne, Australia, via advertisements in online newsletters and on social media, as well as invitations emailed to our Centre's research volunteer database. Eligible participants were aged ≥45 years, had current chronic lower limb musculoskeletal pain (pain at any lower limb site on most days of the past three months) that interfered with function, had pain of ≥2 out of 10 on an 11-point numeric rating scale, and had access to the internet and a device capable of videoconferencing (e.g. tablet or laptop/desktop computer). Participants were ineligible if they failed the Exercise and Sports Science Australia adult pre-exercise screening,³¹ had any hearing or visual impairment that would preclude adequate participation in the telehealth assessment, had fallen in the prior 12 months, had any neurological condition that affected their balance, or were unable to understand written/spoken English.

A pre-defined intraclass correlation coefficient (ICC) of 0.80 was set as the optimal level of reliability or agreement for each performance-based test, with a minimum acceptable ICC of 0.70.³² Using the estimation method,³³ a total sample size of 51 was required for an expected reliability of 0.80 with two measurements and a 95% confidence interval (CI) width of 0.2. To account for potential dropouts, participants excluded from completing an individual test due to safety concerns, and participants excluded from analyses because of changes in their condition over time (described below), we anticipated 10% attrition and aimed to recruit 57 participants.
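The sample-size figures above can be reproduced with a short calculation. The sketch below assumes Bonett's closed-form approximation for the number of participants needed so that the 95% CI of an ICC has a given total width; that this is exactly the method in reference 33 is an assumption. The function names `icc_sample_size` and `inflate_for_attrition` are hypothetical, and the study's own analyses were run in Stata, not Python.

```python
import math
from statistics import NormalDist


def icc_sample_size(rho: float, k: int, width: float, confidence: float = 0.95) -> int:
    """Approximate n so that the CI for an ICC has the requested total width.

    Assumed formula (Bonett's approximation, not taken from the source):
    n = 1 + 8 z^2 (1 - rho)^2 (1 + (k - 1) rho)^2 / (k (k - 1) w^2)
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    n = 1 + 8 * z**2 * (1 - rho) ** 2 * (1 + (k - 1) * rho) ** 2 / (k * (k - 1) * width**2)
    return math.ceil(n)


def inflate_for_attrition(n: int, attrition: float) -> int:
    """Recruitment target so that about n participants remain after attrition."""
    return math.ceil(n / (1 - attrition))


n_required = icc_sample_size(rho=0.80, k=2, width=0.2)
n_recruit = inflate_for_attrition(n_required, attrition=0.10)
print(n_required, n_recruit)  # 51 57
```

With an expected ICC of 0.80, two measurements, and a CI width of 0.2, this approximation gives 51 participants, and inflating for 10% attrition gives 57, matching the figures reported above.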

Clinician assessors

Physiotherapists were recruited as our clinician assessors. To ensure physiotherapist availability for assessments and to maximise generalisability, we recruited four registered physiotherapists with at least one year of clinical experience to conduct assessments (demographic details in Appendix 1). Physiotherapists were trained in measurement protocols and underwent a mock in-person and telehealth testing session with one of the researchers prior to the recruitment of participants.

Performance-based tests

Participants performed up to seven performance-based tests at each testing session (Table 1): (1) 30-s chair stand; (2) 5-m fast-paced walk; (3) stair climb test; (4) timed up and go; (5) step test (on each leg); (6) timed single-leg stance (on each leg); and (7) calf raise test (on both legs and each single leg).

Table 1.

Overview of performance-based tests performed in-person and via telehealth.

Brief description of procedure	Adaptations for telehealth sessions	Outcome
30-s chair stand ³⁶ (domain: strength)
Participant sits in a chair with arms crossed over the chest. From this sitting position, the participant stands up until hips and knees are fully extended, then sits back down so that their bottom fully touches the seat. Repeated for 30 s.	Participant: - Uses own chair - Positions camera so that it faces side-on to chair Physiotherapist: Observes performance and counts repetitions	Number of repetitions
5-m fast-paced walk ³⁶ (domain: function)
Participant walks as quickly as possible, without running, along 5-m flat walkway.	Participant: - Measures out a suitable walkway and marks each end with a cone/marker - Positions camera so that it faces down length of walkway Physiotherapist: Observes performance and measures time taken	Speed (seconds per metre)
Stair climb test ³⁶ (domain: function)
Participant ascends and descends a flight of stairs as quickly as possible.	Participant: - Must have stairs with at least four steps in a straight line (i.e. not curved staircase) otherwise test is not performed - Positions camera so that it faces up the flight of stairs Physiotherapist: Observes performance and measures time taken	Speed (seconds per step)
Timed up and go ³⁶ (domain: function)
Participant sits in a chair, stands up, walks at a regular pace to a mark 3 m away, turns around and returns to sit back in the chair.	Participant: - Uses own chair - Measures out 3 m from the chair and places a cone/marker - Positions camera so that physiotherapist can see chair and marker at 3 m Physiotherapist: Observes performance and measures time taken	Time taken
Step test ³⁷ (domain: balance)
Participant stands at bottom of a step with both feet on ground. Test leg remains grounded while opposite leg is stepped up onto the step and back down as fast as possible. Repeated for 15 s.	Participant: - Has step indoors or outdoors, or small box on floor, otherwise test is not performed - Measures out 5 cm from the step and places a marker - Positions camera side-on to the step Physiotherapist: Observes performance and counts repetitions	Number of repetitions on each leg
Timed single-leg stance ³⁷ (domain: balance)
In a standing position, participant lifts one leg off ground and balances for as long as possible or for a maximum of 30 s.	Participant: - Uses a flat area of floor near a support (wall, back of chair) - Positions camera front-on to their body Physiotherapist: Observes performance and measures duration	Time taken on each leg
Calf raise test ³⁸ (domain: strength)
In (i) bilateral and (ii) unilateral standing positions, participant raises heel(s) from floor onto toes, completing as many raises as possible in 30 s.	Participant: - Uses a flat area of floor near a support (wall, back of chair) - Positions camera front-on to their body Physiotherapist: Observes performance and counts number of repetitions	Number of repetitions on: both legs; left leg; right leg

Physiotherapists were provided with a standardised testing manual that described each performance-based test (Appendix 2; a simplified clinician user manual is available online³⁴), which was adapted from existing resources describing administration in in-person settings.³⁵ Physiotherapists used their discretion to exclude an individual participant from performing a test if there were safety risks (e.g. those with a gait aid did not perform the single-leg stance test). Participants without the necessary ‘equipment’ in their own homes were unable to complete all tests (e.g. those without any stairs in their homes did not perform the stair climb test).

Testing sessions

Participants underwent three testing sessions over approximately two weeks: one in-person and two via telehealth. The order of assessments (in-person or telehealth), the assessing physiotherapist, and the order of performance-based tests (clustered by whether the test required shoes or bare feet) were randomised for each participant using password-protected software (REDCap™). The first two sessions (i.e. the in-person and first telehealth sessions) were performed within 1–3 days of each other at approximately the same time of day. To evaluate test–retest reliability, the second telehealth session was performed approximately two weeks later, allowing sufficient time to limit recall of test scores whilst also limiting the chance of real change in clinical status (consistent with other studies evaluating test–retest reliability of performance-based tests in people with musculoskeletal conditions^39–41). The same physiotherapist assessed the same participant at all three sessions, and the order of tests used in the first session was followed in each subsequent session. Participant-reported global rating of change in pain was assessed prior to the second and third sessions (on a 5-point Likert scale with response options ‘much worse’, ‘slightly worse’, ‘no change’, ‘slightly better’, and ‘much better’) to ensure that only participants whose condition was stable across sessions were included. Participants who reported ‘much worse’ or ‘much better’ did not complete the testing session.

Participants were assessed in-person in the human movement laboratory at the University of Melbourne. Equipment (e.g. chairs, tape measures, cones) was set up prior to each assessment. Physiotherapists actively guided participants through each test using the instructions described in the testing manual (Appendix 2).

Telehealth assessments were undertaken using Zoom (Zoom Video Communications, Inc., San Jose, CA). Participants were located in their own homes (or another private location), while physiotherapists were located at the University or their home/workplace. Prior to the session, participants were provided with instructions on how to download and use Zoom and a list of simple equipment to have on hand during the session (including a chair without wheels of any height, a tape measure, and four cones/markers). Participants were encouraged to use a laptop or tablet if possible; otherwise, a smartphone was acceptable. Physiotherapists verbally instructed the participant through each test using the instructions described in the testing manual (Appendix 2), including how to set up the environment (e.g. measure out a 5-m walkway) and where to position their device's camera so that the physiotherapist could view the participant performing the test.

Data collection

Participant demographic information was collected prior to the first assessment. Procedures for data collection for each performance-based test are described briefly in Table 1 and in detail in Appendix 2.

Clinical utility data were also collected, including the duration of each assessment session and whether the participant was unable to undertake any test (and the reasons why). The type of equipment used by participants at home during the telehealth sessions was recorded. At the conclusion of each session, participants were asked to respond verbally to a series of 11-point numeric rating scales (NRS) evaluating: (i) how confident they felt with the method of assessment; (ii) how comfortable they felt during the session; (iii) how safe they felt during the session; and (iv) how easy it was to perform the tests within the session (on average across all tests).

Data analysis

Descriptive statistics were summarised for demographic and clinical utility data. Percentages of maximal scores (ceiling effects) were calculated for the timed single-leg stance test since the score for this test was capped at 30 s.
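A minimal sketch of the ceiling-effect calculation, assuming scores are stored in seconds and that tests not performed are coded as missing and excluded from the denominator (both assumptions; `ceiling_percentage` is a hypothetical helper, not part of the study's Stata analysis).

```python
from typing import Optional, Sequence


def ceiling_percentage(times: Sequence[Optional[float]], cap: float = 30.0) -> float:
    """Percentage of completed trials that reached the test's maximum score.

    Missing trials (None), e.g. tests not attempted for safety reasons, are
    excluded from the denominator; this is an assumption of the sketch.
    """
    completed = [t for t in times if t is not None]
    if not completed:
        raise ValueError("no completed trials")
    at_cap = sum(1 for t in completed if t >= cap)
    return 100.0 * at_cap / len(completed)


# Hypothetical single-leg stance times (seconds); None = test not performed.
print(ceiling_percentage([30.0, 30.0, 12.5, None, 30.0]))  # 75.0
```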

Test–retest reliability of each performance-based test between the two telehealth assessments was determined using ICCs with 95% CIs, calculated using a two-way random-effects model. A Bland–Altman plot⁴² of the difference between paired measurements versus their mean was then generated for each test, showing the mean difference with its 95% CI and the 95% limits of agreement (estimated as the mean difference ± 2 standard deviations of the differences). Similar analyses were undertaken to assess agreement between telehealth and in-person tests.
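The point estimates described above can be illustrated in Python. This sketch computes ICC(2,1) (two-way random effects, absolute agreement, single measurement; the absolute-agreement form is assumed from the ICC[2,1] notation used for the SEM) from the two-way ANOVA mean squares, plus the Bland–Altman mean difference and limits of agreement using the ±2 SD multiplier stated above (1.96 is also commonly used). It omits the F-distribution-based confidence interval for the ICC and is not the Stata code used in this study; `icc_2_1` and `bland_altman` are hypothetical names.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def icc_2_1(scores: Sequence[Sequence[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.

    `scores` holds one row per participant and one column per session.
    """
    n, k = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between participants
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between sessions
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols  # residual
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_error / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


def bland_altman(
    first: Sequence[float], second: Sequence[float], multiplier: float = 2.0
) -> Tuple[float, float, float]:
    """Mean difference (first minus second) and lower/upper limits of agreement."""
    diffs = [a - b for a, b in zip(first, second)]
    bias, sd = mean(diffs), stdev(diffs)
    return bias, bias - multiplier * sd, bias + multiplier * sd


# Shrout and Fleiss (1979) example: 6 targets (rows) rated by 4 judges (columns).
ratings = [[9, 2, 5, 8], [6, 1, 3, 2], [8, 4, 6, 8], [7, 1, 2, 6], [10, 5, 6, 9], [6, 2, 4, 7]]
print(round(icc_2_1(ratings), 2))  # 0.29
print(bland_altman([1.0, 3.0, 5.0], [2.0, 3.0, 4.0]))  # (0.0, -2.0, 2.0)
```

As a check, the classic six-target, four-judge example of Shrout and Fleiss yields ICC(2,1) ≈ 0.29 with this function; in the present design each row would instead hold one participant's two session scores.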

Interpretation of ICCs was based on published recommendations of poor (ICC<0.5), moderate (0.5–0.75), good (0.75–0.9), or excellent (>0.9),⁴³ with an a priori optimal level of agreement of 0.8³² and minimum acceptable level of 0.7.³² Lower limits of 95% CIs were inspected to determine whether ICCs met the pre-determined acceptable threshold.
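The interpretation rules above can be written as two small checks: one labels the ICC point estimate using the published bands, and the other tests whether the lower 95% CI limit reaches the minimum acceptable ICC of 0.7. The text does not say how values exactly at 0.75 or 0.9 are labelled, so the boundary handling here is an assumption, and the function names are hypothetical.

```python
def interpret_icc(icc: float) -> str:
    """Label an ICC point estimate as poor, moderate, good, or excellent."""
    # Assumed boundary handling: 0.75 and 0.9 are both counted as 'good'.
    if icc < 0.5:
        return "poor"
    if icc < 0.75:
        return "moderate"
    if icc <= 0.9:
        return "good"
    return "excellent"


def meets_threshold(ci_lower: float, minimum: float = 0.70) -> bool:
    """True if the lower 95% CI limit reaches the minimum acceptable ICC."""
    return ci_lower >= minimum


# Two rows from Table 4: (test, ICC, lower 95% CI limit).
for name, icc, lower in [("5-m fast-paced walk", 0.55, 0.30), ("Calf raise, left leg", 0.91, 0.85)]:
    print(name, interpret_icc(icc), meets_threshold(lower))
```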

Statistical analyses were performed using Stata version 16.1 (StataCorp LLC, College Station, TX, USA).

Results

Participant characteristics

Fifty-seven participants were recruited (Table 2). Participants had a mean age of 63.1 (standard deviation (SD) = 9.3) years; most were female (70%), exercised four or more times per week (58%), had not previously used telehealth for physiotherapy (93%), and used videoconferencing software for communication at least once a week (63%). The most common painful body part was the knee (72%), followed by the hip (58%). Around half (53%) had received a diagnosis for their pain problem, the most commonly reported being osteoarthritis (43%), non-specific arthritis (20%), bursitis (7%), torn meniscus (7%), and plantar fasciitis (7%). Fifty-three (93%) participants were included in the analyses of agreement between telehealth and in-person tests and 54 (95%) in the test–retest reliability analyses. Three participants withdrew from the study after only one testing session, and four did not complete both telehealth sessions. The average time between in-person and telehealth testing was 2.4 (SD = 1.9) days, and between the two telehealth sessions was 14.6 (SD = 2.4) days. Missing data for each individual test, and the reasons why, are described in Appendix 3. A description of materials used for the telehealth and in-person testing sessions is in Appendix 4.

Table 2.

Participant characteristics (n = 57).

	n (%), mean [standard deviation] or median {IQR}
Age (years)	63.1 [9.3]
Gender
Male	17 (30)
Female	40 (70)
Other	0 (0)
Ethnicity
Australian/New Zealander	40 (70)
Asian	5 (9)
European	11 (19)
South American	1 (2)
Highest level of education
Primary school	0 (0)
High school	6 (11)
Trade or trade certificate	4 (7)
University or tertiary institute degree	31 (54)
Higher university degree	16 (28)
Financial situation
Find it a strain to get by from week to week	0 (0)
Have to be careful with money	15 (26)
Able to manage without much difficulty	16 (28)
Quite comfortably off	17 (30)
Very comfortably off	3 (5)
Prefer not to answer	6 (11)
Employment status
Work full-time	13 (23)
Work part-time	18 (32)
Unable to work due to health reasons	1 (2)
Retired (not due to health reasons)	21 (37)
Unemployed/not employed	4 (7)
Body mass index (kg/m²)	27.5 {24.1–30.2}
Current exercise/physical activity level^a
0–1 times per week	8 (14)
2–3 times per week	16 (28)
4–5 times per week	22 (39)
6 + times per week	11 (19)
Body part affected by pain^b
Hip	33 (58)
Thigh	16 (28)
Knee	41 (72)
Calf	7 (12)
Ankle	15 (26)
Foot	16 (28)
How long experienced pain for (years)	2 {1–5}
Average pain (NRS)^c	5 {3–7}
Physical function (NRS)^d	5 {3–7}
Seen physiotherapist before
No	7 (12)
Yes, for my lower limb musculoskeletal condition	28 (49)
Yes, for another condition	22 (39)
Previous experience with telehealth for:
(i) physiotherapy	4 (7)
(ii) another health professional	20 (35)
Frequency of using videoconferencing for communication
Never	1 (2)
Once every few months	9 (16)
Once a month	11 (19)
Once a week	11 (19)
Several times a week	14 (25)
Every day	11 (19)

IQR = interquartile range (25th to 75th percentile).

^a Participants were asked ‘do you currently participate in regular exercise and/or physical activity?’

^b Not mutually exclusive; participants were able to choose all body parts that were affected.

^c Rated on an 11-point scale ranging from 0 (no pain) to 10 (worst pain possible) in response to ‘select the number which indicates the average number of pain felt over the past week in the muscles and/or joints of your leg’

^d Rated on an 11-point scale ranging from 0 (not at all) to 10 (extremely) in response to ‘select the number which indicates how the problem with the muscles and/or joints of your lower limb have interfered with your physical function over the past week.’

Test–retest reliability of performance-based tests via telehealth

The estimated ICCs for test–retest reliability were good to excellent (ICC = 0.84–0.91, Table 3), with 95% CI lower limits above the pre-specified acceptable threshold of 0.7 (95% CI lower limit = 0.71–0.79), for four tests: the stair climb, timed up and go, right leg timed single-leg stance, and calf raise tests. The estimated ICCs for test–retest reliability were moderate to good (ICC = 0.69–0.81) for four other tests, the 30-s chair stand, 5-m fast-paced walk, step test, and left leg timed single-leg stance; however, their lower 95% CI limits did not reach acceptable levels (95% CI lower limit = 0.52–0.69).

Table 3.

Test–retest reliability of performance-based tests measured via telehealth (n = 57).

Mean (SD) scores first telehealth session	Mean (SD) scores second telehealth session	Within participant difference between sessions (session 1 minus session 2), mean (95%CI) [N]	ICC (95% CI)	SEM	MDC₉₅
30-s chair stand (number repetitions)
12.7 (3.7)	13.6 (3.8)	−1.0 (−1.7, −0.4) [N = 53]	0.77 (0.60, 0.87)	1.74	4.81
5-m fast-paced walk (seconds)
3.3 (1.0)	3.3 (1.0)	0.1 (−0.1, 0.3) [N = 48]	0.71 (0.53, 0.82)	0.55	1.52
Stair climb test (seconds per stair)
1.4 (0.5)	1.3 (0.4)	0.1 (0.0, 0.2) [N = 26]	0.91 (0.79, 0.96)	0.14	0.38
Timed up and go (seconds)
7.7 (2.0)	7.7 (2.0)	0.1 (−0.2, 0.4) [N = 53]	0.86 (0.77, 0.92)	0.74	2.05
Step test (number repetitions)
Left leg
11.7 (3.2)	12.3 (3.3)	−0.7 (−1.2, −0.1) [N = 53]	0.79 (0.66, 0.88)	1.45	4.03
Right leg
12.3 (3.4)	12.7 (3.5)	−0.7 (−1.2, −0.1) [N = 52]	0.81 (0.69, 0.89)	1.44	3.99
Timed single-leg stance (seconds)
Left leg^a
24.9 (9.0)	24.5 (9.1)	−1.6 (−3.5, 0.2) [N = 51]	0.69 (0.52, 0.81)	5.13	14.23
Right leg^b
25.8 (8.0)	24.9 (9.3)	0.7 (−0.6, 2.1) [N = 52]	0.84 (0.74, 0.91)	3.37	9.34
Calf raise test (number repetitions)
Both legs
25.5 (10.0)	27.1 (10.6)	−1.8 (−3.4, −0.3) [N = 53]	0.84 (0.73, 0.91)	3.96	10.97
Left leg
22.0 (9.9)	24.1 (10.6)	−2.1 (−3.7, −0.4) [N = 47]	0.84 (0.71, 0.91)	3.98	11.03
Right leg
23.0 (10.1)	24.6 (11.1)	−1.6 (−3.3, 0.0) [N = 47]	0.85 (0.74, 0.91)	3.95	10.96

SD = standard deviation; ICC = intra-class correlation coefficient; CI = confidence interval; SEM = standard error of measurement; MDC₉₅ = minimal detectable change at the 95% confidence level.

^a Inspection of maximum scores showed a consistent ceiling effect, with 70% and 72% of participants able to perform the test for the maximum of 30 s at the in-person and first telehealth testing sessions, respectively.

^b Inspection of maximum scores showed a consistent ceiling effect, with 71% and 70% of participants able to perform the test for the maximum of 30 s at the in-person and first telehealth testing sessions, respectively.

SEM = SD in the first telehealth session × √(1 − ICC(2,1)), where ICC(2,1) is the intra-class correlation coefficient

MDC₉₅ = SEM × 1.96 × √2
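The two formulas above translate directly into code. This minimal sketch (hypothetical function names) reproduces the timed up and go row of Table 3; small differences in the last decimal place, such as 0.75 here versus the reported 0.74, are expected because the tabulated SD and ICC are rounded.

```python
import math


def sem(sd: float, icc: float) -> float:
    """Standard error of measurement: SD x sqrt(1 - ICC)."""
    return sd * math.sqrt(1 - icc)


def mdc95(sem_value: float) -> float:
    """Minimal detectable change at 95% confidence: SEM x 1.96 x sqrt(2)."""
    return sem_value * 1.96 * math.sqrt(2)


# Timed up and go (Table 3): SD 2.0 s in the first telehealth session, ICC 0.86.
print(round(sem(2.0, 0.86), 2))  # 0.75
print(round(mdc95(0.74), 2))  # 2.05
```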

Normality and uniformity assumptions for the mean and SD of the differences between telehealth sessions appeared reasonable (Figure 1), except for the timed single-leg stance test on both the left and right legs, as few participants scored below the maximum of 30 s. Differences between paired telehealth measurements did not increase substantially in magnitude with higher counts or longer times.

Figure 1.

Bland–Altman plots of differences between telehealth sessions (session 1 minus session 2) versus averages of paired measurements for each performance-based test.

Agreement between telehealth and in-person performance-based tests

The estimated ICCs for agreement were good to excellent (ICC = 0.82–0.91, Table 4), with 95% CI lower limits above the pre-specified acceptable threshold of 0.7 (95% CI lower limit = 0.71–0.85), for three tests: the 30-s chair stand, left leg timed single-leg stance, and calf raise tests. The estimated ICCs for agreement ranged from moderate to good (ICC = 0.71–0.81) for four tests, the stair climb, timed up and go, step test, and right leg timed single-leg stance; however, their lower 95% CI limits did not reach acceptable levels (95% CI lower limit = 0.52–0.69). One test, the 5-m fast-paced walk, did not meet our minimum acceptable ICC (ICC = 0.55, 95% CI = 0.30–0.72): its point estimate indicated moderate agreement, but its 95% CI extended into the poor range (<0.5).

Table 4.

Agreement between performance-based tests when measured in-person and by telehealth (n = 57).

Performance-based test	Mean (SD) scores in-person	Mean (SD) scores via telehealth	Within-participant difference between modes (in-person minus telehealth), mean (95%CI) [N]	ICC (95% CI)
30-s chair stand (number repetitions)	13.4 (4.1)	12.7 (3.7)	0.6 (0.0, 1.3) [N = 54]	0.82 (0.71, 0.90)
5-m fast-paced walk (seconds)	2.9 (0.8)	3.3 (1.0)	−0.4 (−0.6, −0.1) [N = 49]	0.55 (0.30, 0.72)
Stair climb test (seconds per stair)	1.4 (0.5)	1.4 (0.5)	−0.0 (−0.1, 0.1) [N = 28]	0.75 (0.52, 0.88)
Timed up and go (seconds)	7.6 (1.9)	7.7 (2.0)	−0.2 (−0.5, 0.2) [N = 54]	0.81 (0.69, 0.88)
Step test (number of repetitions)
Left leg	12.7 (3.2)	11.7 (3.2)	0.9 (0.3, 1.5) [N = 54]	0.75 (0.57, 0.86)
Right leg	13.1 (3.3)	12.3 (3.4)	0.9 (0.3, 1.4) [N = 53]	0.79 (0.63, 0.88)
Timed single-leg stance (seconds)
Left leg^a	25.4 (8.4)	24.9 (9.0)	0.4 (−1.0, 1.9) [N = 52]	0.82 (0.71, 0.89)
Right leg^b	24.6 (9.2)	25.8 (8.0)	−1.2 (−3.0, 0.6) [N = 53]	0.71 (0.55, 0.82)
Calf raise test (number of repetitions)
Both legs	26.0 (8.8)	25.5 (10.0)	0.6 (−0.9, 2.1) [N = 54]	0.82 (0.71, 0.89)
Left leg	22.4 (9.7)	22.0 (9.9)	0.4 (−0.8, 1.6) [N = 49]	0.91 (0.85, 0.95)
Right leg	24.4 (11.6)	23.0 (10.1)	1.5 (−0.1, 3.1) [N = 49]	0.87 (0.77, 0.92)

SD = standard deviation; ICC = intra-class correlation coefficient; CI = confidence interval.

^a Inspection of maximum scores showed a consistent ceiling effect, with 69% and 70% of participants able to perform the test for the maximum of 30 s at the in-person and first telehealth testing sessions, respectively.

^b Inspection of maximum scores showed a consistent ceiling effect, with 69% and 71% of participants able to perform the test for the maximum of 30 s at the in-person and first telehealth testing sessions, respectively.

Assumptions of normality and uniformity of the mean and SD of the differences again appeared reasonable (Figure 2), except for the timed single-leg stance test on both the left and right legs. Differences between paired measurements did not increase substantially in magnitude with higher counts or longer times.

Figure 2.

Bland–Altman plots of differences between methods (in-person minus telehealth) versus averages of paired measurements for each performance-based test.

Clinical utility of testing sessions

Participant ratings of confidence, comfort, safety, and ease of the performance-based tests were high and similar across telehealth and in-person sessions (Table 5), with means ranging from 8.5 to 9.8 out of 10 on an 11-point NRS for the in-person session, compared with 8.4 to 9.4 out of 10 for the telehealth sessions. The in-person testing session was, on average, shorter than the first telehealth session (27.1 vs 36.2 min) but of similar duration to the second telehealth session (27.9 min).

Table 5.

Clinical utility measures relating to in-person and telehealth testing sessions.

	Mean (SD) in-person session (n = 55)	Mean (SD) first telehealth session (n = 56)	Mean (SD) second telehealth session (n = 53)
Participant confidence in method of assessment^a	9.4 (0.9)	9.0 (1.3)	9.0 (1.3)
Participant comfort during assessment^a	9.4 (1.2)	9.1 (1.3)	9.2 (1.3)
Participant feeling of safety during assessment^a	9.8 (0.6)	9.3 (1.3)	9.4 (1.2)
Participant ease of performing session^a	8.5 (1.3)	8.4 (1.7)	8.7 (1.3)
Duration of session (mins)	27.1 (8.2)	36.2 (9.5)	27.9 (8.6)
		n (%)
Device used during telehealth sessions
Laptop computer		36 (63)
Tablet		15 (26)
Mobile phone		5 (9)
Participants per physiotherapist
Physiotherapist 1		17 (30)
Physiotherapist 2		18 (32)
Physiotherapist 3		19 (33)
Physiotherapist 4		3 (5)

^a Rated on 11-point numeric rating scales ranging from 0 (not at all) to 10 (very confident/very comfortable/very safe/very easy).

SD: standard deviation.

Discussion

This study aimed to investigate the test–retest reliability of clinician-administered performance-based tests via telehealth, and agreement between scores obtained via telehealth and in-person, in adults with chronic lower limb musculoskeletal pain. We found that the stair climb, timed up and go, right leg timed single-leg stance, and calf raise tests demonstrated acceptable test–retest reliability via telehealth. The 30-s chair stand, left leg timed single-leg stance, and calf raise tests demonstrated acceptable agreement between scores obtained via telehealth and in-person.

To our knowledge, no previous studies have evaluated the test–retest reliability, or agreement with in-person assessment, of telehealth-administered stair climb, calf raise, timed single-leg stance, and 5-m fast-paced walk tests. However, a small number of studies have examined the 30-s chair stand, timed up and go, and step tests. Two studies found that the 30-s chair stand test had excellent test–retest reliability via telehealth (ICC = 0.95, 95% CI = 0.92–0.97)⁴⁴ and good-to-excellent agreement with in-person assessment (correlation coefficient = 0.95;⁴⁴ Krippendorff's alpha reliability estimate = 0.85;²⁹ 95% CIs not reported in either) in healthy young adults (without any health condition or musculoskeletal pain) assessed via Zoom⁴⁴ and in people after total knee arthroplasty assessed via sophisticated videoconferencing software (wide-angle cameras with remote-controlled panning/tilting).²⁹ Four studies found that the timed up and go test had excellent test–retest reliability (ICCs = 0.96–0.98, 95% CI lower limit = 0.86–0.98) and good-to-excellent agreement with in-person test scores (ICCs = 0.83–0.98, 95% CI lower limit = 0.27–0.96) via both simple (i.e. Zoom, WhatsApp, and Adobe Connect) and sophisticated (i.e. eHAB, which allows real-scale measurement of performance) videoconferencing software in people with chronic heart failure,⁴⁵ chronic obstructive pulmonary disease,⁴⁶ and healthy older adults.^39,44 However, the lower limits of the 95% CIs for agreement between telehealth and in-person scores in two of those studies^39,45 fell below our pre-determined acceptable threshold. One study found that the step test had excellent agreement with in-person assessment (weighted Cohen's kappa = 0.95–0.97; 95% CIs not reported) via sophisticated eHAB videoconferencing software in people with Parkinson's disease.⁴⁰

Our ICCs appear to be lower than those reported in some of the previous studies described above.²⁹,⁴⁰,⁴⁴–⁴⁶ In addition, some tests did not meet our acceptability threshold for the lower limit of the 95% CI, for either reliability or agreement with in-person scores. This may be because our study used pragmatic methods that could be easily implemented in clinical and/or research settings (e.g. non-standardised objects and spaces within people's homes, and freely available videoconferencing software on any suitable device). As such, the equipment and environments used varied between sessions (e.g. chair height for the 30-s chair stand test differed between in-person and telehealth sessions by a mean of 3 cm, step height for the step test differed by a mean of 2.5 cm, and the number of steps in the stair climb test differed by a mean of 1.3 steps (Appendix 4)). Furthermore, because participants set up each test themselves, there was likely some imprecision in the distances measured. All of these factors likely contributed to variability in test scores. In contrast, most other studies conducted telehealth assessments using standardised equipment,²⁹,⁴⁰,⁴⁴–⁴⁶ conducted tests in a clinical setting (rather than the patient's home) while the assessor was in an adjacent room,²⁹,⁴⁰,⁴⁴,⁴⁵ and/or used sophisticated videoconferencing software with in-built measurement tools.²⁹,⁴⁰ This likely reduced the variability of their test scores, but it also limits the usefulness and generalisability of their findings to pragmatic, home-based telehealth settings. Indeed, one study³⁹ in healthy older adults used a pragmatic telehealth approach similar to ours and observed ICCs similar to ours (0.79 to 0.87, with 95% CI lower limits all below our pre-determined acceptability threshold of 0.70) for the timed up and go and tests of balance/gait. Collectively, this suggests that clinicians or researchers considering pragmatic telehealth assessment of performance should be aware that telehealth test scores may be more variable and may agree less closely with in-person scores.
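The comparison above rests on a two-part judgement: the descriptive category of an ICC point estimate, and a separate acceptability check on the lower limit of its 95% CI. The sketch below separates the two. It is a minimal illustration only, assuming the cut-offs stated in the Abstract (poor <0.5, moderate 0.5–0.75, good 0.75–0.9, excellent >0.9; acceptable if the 95% CI lower limit is ≥0.70) and assuming that the boundary values 0.75 and 0.9 fall in the "good" band. The function names are hypothetical, the input values in the usage example are invented for illustration, and the code does not compute ICCs from raw scores.

```python
"""Hypothetical helpers illustrating the ICC interpretation rules.

Cut-offs follow the Abstract; how the exact boundary values (0.75, 0.9)
are assigned is an assumption made for this sketch.
"""

ACCEPTABLE_LOWER_CI = 0.70  # pre-determined threshold for the 95% CI lower limit


def interpret_icc(icc: float) -> str:
    """Return the descriptive category for an ICC point estimate."""
    if icc < 0.5:
        return "poor"
    if icc < 0.75:
        return "moderate"
    if icc <= 0.9:  # assumption: 0.75 and 0.9 are both treated as "good"
        return "good"
    return "excellent"


def is_acceptable(ci_lower: float, threshold: float = ACCEPTABLE_LOWER_CI) -> bool:
    """A test is acceptable only if its 95% CI lower limit reaches the threshold."""
    return ci_lower >= threshold


def summarise(label: str, icc: float, ci_lower: float) -> str:
    """Combine the descriptive category and the acceptability verdict."""
    verdict = "acceptable" if is_acceptable(ci_lower) else "not acceptable"
    return (
        f"{label}: ICC {icc:.2f} ({interpret_icc(icc)}), "
        f"lower CI {ci_lower:.2f} -> {verdict}"
    )


if __name__ == "__main__":
    # Hypothetical values: both point estimates are "good",
    # but only the narrower interval passes the 0.70 check.
    print(summarise("wide CI", 0.80, 0.55))
    print(summarise("narrow CI", 0.85, 0.74))
```

Under these assumptions, the pragmatic study³⁹ cited above would sit in the "good" band on its point estimates (0.79 to 0.87) yet fail the acceptability check, which is the pattern that separates a "good" ICC from an acceptable one in this discussion.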

Participant satisfaction with our telehealth testing sessions was high, suggesting that people with chronic lower limb musculoskeletal conditions felt confident, comfortable, and safe performing tests via telehealth, and did not perceive the test requirements as difficult to complete. However, telehealth did present some challenges. Participants' home environments often lacked appropriate space or stairs, which meant that some tests (e.g. the stair climb and 5-m walk tests) could not be performed. Although no adverse events were reported, 2–11% of participants were unable to complete some of the single-leg tests because of safety or balance concerns. Finally, our first telehealth assessment session took approximately 10 min longer than the in-person assessment session, which has implications for the viability of administering the full suite of our performance-based tests via telehealth in some healthcare settings (e.g. private physiotherapy clinics with limited consultation time). This was likely because physiotherapists had to guide participants through the set-up and procedure for each test, and instruct them on the camera angles needed to give the physiotherapist an appropriate view. However, our second telehealth session was shorter, and of similar duration to the in-person session, suggesting that experience can reduce the time required. Our physiotherapists followed a detailed testing manual (freely available online³⁴), which was vital in helping them adapt each test to a telehealth setting and instruct participants remotely.

Our study had several limitations. Our sample comprised mostly women (70%) who were well-educated (82% had completed a university or tertiary degree or higher) and who were physically active four or more times per week (58%). We also excluded those who were at risk of falls or who did not pass pre-exercise screening. As such, our findings may not be generalisable to men, people with lower levels of education, those who do not engage in regular physical activity, or those with balance issues or other health conditions that may affect their ability to exercise safely. Although we did not collect data on whether participants sought professional care for their musculoskeletal condition between testing sessions, we assessed changes in clinical status between sessions to ensure that only participants whose condition was stable across sessions were included.

In conclusion, the stair climb, timed up and go, right leg timed single-leg stance, and calf raise tests have acceptable reliability for use via telehealth in research and clinical practice. If re-testing via a different mode (telehealth/in-person), clinicians and researchers should consider using the 30-s chair stand test, left leg timed single-leg stance, and calf raise tests.

Abbreviations

ICC = intraclass correlation coefficient

95% CI = 95% confidence interval

Footnotes

Acknowledgements

The author(s) would like to acknowledge Mr Alex Kimp for his assistance in recruiting and phone screening participants.

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article. This work was supported by Arthritis Australia. RSH is supported by a National Health & Medical Research Council Fellowship (#1154217) and KLB by an NHMRC Investigator grant (#1174431).

ORCID iDs

Belinda J Lawford

Mark Merolli

Appendices

References

1. Vos, Abajobir, Abate, et al. Global, regional, and national incidence, prevalence, and years lived with disability for 328 diseases and injuries for 195 countries, 1990–2016: A systematic analysis for the global burden of disease study 2016. The Lancet 2017; 390: 1211–1259.

2. Ackerman, Pratt, Gorelik, et al. Projected burden of osteoarthritis and rheumatoid arthritis in Australia: A population-level analysis. Arthritis Care Res (Hoboken) 2018; 70: 877–883.

3. Cieza, Causey, Kamenov, et al. Global estimates of the need for rehabilitation based on the global burden of disease study 2019: A systematic analysis for the global burden of disease study 2019. The Lancet 2020; 396: 2006–2017.

4. Australian Institute of Health and Welfare. Disease expenditure in Australia. In: AIHW (ed) Canberra: AIHW, 2019.

5. The Burden of Musculoskeletal Disorders in the US. Cost to Treat Musculoskeletal Diseases: Bone and Joint Initiative; 2022 [Available from: https://www.boneandjointburden.org/2014-report/xe1/cost-treat-musculoskeletal-diseases].

6. Australian Institute of Health and Welfare. Musculoskeletal conditions and comorbidity in Australia. In: Australian Institute of Health and Welfare (ed). Canberra: Australia, 2019.

7. Centers for Disease Control and Prevention. National Statistics: Centers for Disease Control and Prevention; 2022 [Available from: https://www.cdc.gov/arthritis/data_statistics/national-statistics.html].

8. World Health Organisation. International Classification of Functioning, Disability and Health (ICF): World Health Organisation; 2022 [Available from: https://www.who.int/standards/classifications/international-classification-of-functioning-disability-and-health].

9. Terwee, van der Slikke, van Lummel, et al. Self-reported physical functioning was more influenced by pain than performance-based physical functioning in knee-osteoarthritis patients. J Clin Epidemiol 2006; 59: 724–731.

10. Gandhi, Tsvetkov, Davey, et al. Relationship between self-reported and performance-based tests in a hip and knee joint replacement population. Clin Rheumatol 2009; 28: 253–257.

11. Unnanuntana, Mait, Shaffer, et al. Performance-based tests and self-reported questionnaires provide distinct information for the preoperative evaluation of total hip arthroplasty patients. J Arthroplasty 2012; 27: 770–5.e1.

12. Dayton, Judd, Hogan, et al. Performance-based versus self-reported outcomes using the hip disability and osteoarthritis outcome score after total hip arthroplasty. Am J Phys Med Rehabil 2016; 95: 132–138.

13. Amris, Bandak, Kristensen, et al. Agreement between self-reported and observed functioning in patients with rheumatoid arthritis, osteoarthritis, and fibromyalgia, and the influence of pain and fatigue: A cross-sectional study. Scand J Rheumatol 2021: 1–9. DOI: 10.1080/03009742.2021.1952755. Epub ahead of print.

14. Australian Institute of Health and Welfare. Health-care expenditure on arthritis and other musculoskeletal conditions 2008–09. Canberra: AIHW, 2014.

15. Greenhalgh, Wherton, Shaw, et al. Video consultations for covid-19. BMJ 2020; 368: m998.

16. Hwang, Elkins. Telephysiotherapy. J Physiother 2020; 66: 143–144.

17. Snoswell, Chelberg, De Guzman, et al. The clinical effectiveness of telehealth: A systematic review of meta-analyses from 2010 to 2019. J Telemed Telecare 2021: 1357633X211022907. DOI: 10.1177/1357633X211022907. Epub ahead of print.

18. Snoswell, Stringer, Taylor, et al. An overview of the effect of telehealth on mortality: A systematic review of meta-analyses. J Telemed Telecare 2021: 1357633X211023700. DOI: 10.1177/1357633X211023700. Epub ahead of print.

19. Malliaras, Merolli, Williams, et al. ‘It’s not hands-on therapy, so it’s very limited’: Telehealth use and views among allied health clinicians during the coronavirus pandemic. Musculoskeletal Science and Practice 2021; 52. Epub 2021 Feb 5.

20. Barton, Ezzat, Merolli, et al. “It's second best”: A mixed-methods evaluation of the experiences and attitudes of people with musculoskeletal pain towards physiotherapist delivered telehealth during COVID-19 pandemic. Musculoskeletal Science and Practice 2021: 102500. DOI: 10.1016/j.msksp.2021.102500. Epub ahead of print.

21. World Physiotherapy. Impact of the COVID-19 Pandemic on Physiotherapy Services Globally. London, UK: World Physiotherapy, 2021.

22. Monaghesh, Hajizadeh. The role of telehealth during COVID-19 outbreak: A systematic review based on current evidence. BMC Public Health 2020; 20: 1–9.

23. Bennell, Lawford, Metcalf, et al. Physiotherapists and patients report positive experiences overall with telehealth during the COVID-19 pandemic: A mixed-methods study. J Physiother 2021; 67: 201–209.

24. Mani, Sharma, Omar, et al. Validity and reliability of internet-based physiotherapy assessment for musculoskeletal disorders: A systematic review. J Telemed Telecare 2017; 23: 379–391.

25. Zischke, Simas, Hing, et al. The utility of physiotherapy assessments delivered by telehealth: A systematic review. J Glob Health 2021; 11: 04072.

26. Russell, Blumke, Richardson, et al. Telerehabilitation mediated physiotherapy assessment of ankle disorders. Physiother Res Int 2010; 15: 167–175.

27. Richardson, Truter, Blumke, et al. Physiotherapy assessment and diagnosis of musculoskeletal disorders of the knee via telerehabilitation. J Telemed Telecare 2017; 23: 88–95.

28. Cottrell, O'Leary, Swete-Kelly, et al. Agreement between telehealth and in-person assessment of patients with chronic musculoskeletal conditions presenting to an advanced-practice physiotherapy screening clinic. Musculoskeletal Science and Practice 2018; 38: 99–105.

29. Cabana, Boissy, Tousignant, et al. Interrater agreement between telerehabilitation and face-to-face clinical outcome measurements for total knee arthroplasty. Telemed J E Health 2010; 16: 293–298.

30. Walsh C, Cahalan, Hinman, et al. Psychometric properties of performance-based measures of physical function administered via telehealth among people with chronic conditions: A systematic review. PloS One 2022; 17: e0274349.

31. Exercise and Sports Science Australia. Adult pre-exercise screening system 2012 [Available from: https://www.essa.org.au/Public/ABOUT_ESSA/Pre-Exercise_Screening_Systems.aspx?WebsiteKey=b4460de9-2eb5-46f1-aeaa-3795ae70c687].

32. de Vet HCW, Terwee, Mokkink, et al. Measurement in Medicine: A Practical Guide. New York, USA: Cambridge University Press, 2011.

33. Bonett. Sample size requirements for estimating intraclass correlations with desired precision. Stat Med 2002; 21: 1331–1335.

34. Centre for Health, Exercise and Sports Medicine. Performance-based tests for lower limb musculoskeletal pain: Centre for Health, Exercise and Sports Medicine; 2022 [Available from: https://healthsciences.unimelb.edu.au/__data/assets/pdf_file/0003/4147563/Manual-for-telehealth-administered-performance-based-tests-1.pdf].

35. Dobson, Bennell, Hinman, et al. Recommended performance-based tests to assess physical function in people diagnosed with hip or knee osteoarthritis: OARSI; 2011.

36. Bennell, Dobson, Hinman. Measures of physical performance assessments: Self-paced walk test (SPWT), stair climb test (SCT), six-minute walk test (6MWT), chair stand test (CST), timed up & go (TUG), sock test, lift and carry test (LCT), and car task. Arthritis Care Res (Hoboken) 2011; 63: S350–S370.

37. Choi, Dobson, Martin, et al. Interrater and intrarater reliability of common clinical standing balance tests for people with hip osteoarthritis. Phys Ther 2014; 94: 696–704.

38. Hébert-Losier, Newsham-West, Schneiders, et al. Raising the standards of the calf-raise test: A systematic review. J Sci Med Sport 2009; 12: 594–602.

39. Pelicioni, Waters, Still, et al. A pilot investigation of reliability and validity of balance and gait assessments using telehealth with healthy older adults. Exp Gerontol 2022; 162: 111747.

40. Russell, Hoffmann, Nelson, et al. Internet-based physical assessment of people with Parkinson disease is accurate and reliable: A pilot study. J Rehabil Res Dev 2013; 50: 643–650.

41. Dobson, Hinman, Hall, et al. Reliability and measurement error of the osteoarthritis research society international (OARSI) recommended performance-based tests of physical function in people with hip and knee osteoarthritis. Osteoarthritis Cartilage 2017; 25: 1792–1796.

42. Bland, Altman. Measuring agreement in method comparison studies. Stat Methods Med Res 1999; 8: 135–160.

43. Koo. A guideline of selecting and reporting intraclass correlation coefficients for reliability research. J Chiropr Med 2016; 15: 155–163.

44. Güngör, Ovacık, Ertan Harputlu, et al. Tele-assessment of core performance and functional capacity: Reliability, validity, and feasibility in healthy individuals. J Telemed Telecare 2022: 1357633X221117335. DOI: 10.1177/1357633X221117335. Epub ahead of print.

45. Hwang, Mandrusiak, Morris, et al. Assessing functional exercise capacity using telehealth: Is it valid and reliable in patients with chronic heart failure? J Telemed Telecare 2017; 23: 225–232.

46. Ozsoy, Kodak, Kararti, et al. Intra- and inter-rater reproducibility of the face-to-face and tele-assessment of timed-up and go and 5-times sit-to-stand tests in patients with chronic obstructive pulmonary disease. COPD: Journal of Chronic Obstructive Pulmonary Disease 2022; 19: 125–132.