Abstract
Background
There is no consensus regarding the minimum number of joints that should be included in an ultrasound (US) scoring system to reliably assess disease activity in rheumatoid arthritis (RA).
Purpose
To assess whether simplified US protocols for hand examination are as informative as the examination of 22 joints in patients with RA, and to correlate the US parameters with disease activity (DAS-28).
Material and Methods
This is a cross-sectional study of 224 RA patients stratified based on their DAS-28 scores and assessed using eight preselected US examination protocols, covering 22, 18, 16, 14, ten, and eight joints, plus two different combinations of four joints.
Results
We found a significant difference between US hand scores regarding their ability to detect active inflammation and erosions. DAS-28 scores correlated very well with the power Doppler (PD) scores generated by all eight US examination protocols (r = 0.89–1, P < 0.05), irrespective of patients' disease activity. Simplified US scores missed information on presence of PD in 20.6–40.2% patients (P < 0.05) and misdiagnosed non-erosive hand RA in 12–38.4% patients (P < 0.05), depending on the number of joints excluded from US hand examination.
Conclusion
Preselected simplified US scores are less reliable in appreciating the disease burden when compared with an extended protocol for 22 joint US examination, raising clinicians' awareness regarding the need to comprehensively assess multiple hand joints to reliably rule out subclinical inflammation.
Introduction
Rheumatoid arthritis (RA) is a chronic inflammatory condition associated with well-recognized inflammatory joint features, which are amenable to ultrasound (US) examination. The use of US has facilitated significant progress in the early diagnosis of RA, enabling a better assessment of disease activity, prognosis, and response to different therapeutic interventions. The implementation of US scoring systems in addition to clinical examination could help standardize the way RA is monitored; however, depending on local availability of US and sonographer expertise, different scoring systems have been used in clinical practice. Despite significant research progress in supporting the role of US in RA, no consensus has been reached with regard to which scoring system is the most useful. The OMERACT US Task Force defined the US pathology associated with RA (1), which combines tendon, joint, and bone abnormalities (1,2). The presence of power Doppler (PD) signal is recognized as a reliable objective measure of active joint inflammation (3). Different semi-quantitative scoring systems are currently used for assessing synovial hypertrophy (SH), joint effusion, tendon abnormalities and erosions (4), and protocols for hand and foot US examination are well-established (5).
A recent systematic review of the scoring systems used to evaluate synovitis in RA found it difficult to determine the minimum number of joints that need to be assessed for a global US score (1). The purpose of our study was to investigate how far the US examination of hands in RA can be simplified without compromising the ability of a given US scoring system to evaluate the disease activity and damage associated with hand RA. We focused on the US examination of hands as this is the most commonly used in routine clinical practice.
Material and Methods
This is a real-life, cross-sectional study, which evaluated patients referred to our US rheumatology outpatient clinics with inflammatory-sounding hand joint pain. The patients were referred on clinician indication for an US scan to help identify joint inflammation that could not be confidently assessed clinically. We examined 604 patients between January 2012 and August 2015. For each patient, a set of demographic, clinical, and laboratory data was recorded at the time of the scan. Of the 604 patients referred to our clinic, 224 patients with RA were included in the study analysis based on their final diagnosis made using the 2010 ACR/EULAR classification criteria, following complete investigations and review of the clinical notes. Fig. 1 details the patient selection and stratification based on DAS-28 scores.
Fig. 1. Flowchart of the study population.
This study evaluated the same set of reported outcomes and clinical and laboratory parameters for all the patients, to ensure homogeneity of the collected data. The following information was analyzed: disease duration (in months); hand tender joint count (TJC); swollen joint count (SJC); and a patient-reported global disease assessment score (GVAS).
Additional data on high-sensitivity C-reactive protein (hsCRP), erythrocyte sedimentation rate (ESR), and the presence of rheumatoid factor (RF), anti-citrullinated peptide antibodies (ACPA), and anti-nuclear antibodies (ANA) were collected at the time of the scan (needed to exclude associated pathology).
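The clinical and laboratory variables listed above (tender and swollen joint counts, patient global assessment, and ESR) are the inputs usually combined into a DAS-28 score. The study does not state which DAS-28 variant or cut-offs were used, so the following is only a sketch under assumptions: it uses the commonly published DAS28-ESR formula and the conventional activity cut-offs (2.6, 3.2, and 5.1). The function names `das28_esr` and `activity_group` are illustrative, and the formula expects 28-joint counts rather than hand-only counts.

```python
import math

def das28_esr(tjc28, sjc28, esr_mm_h, gh_vas_mm):
    """DAS28-ESR from 28-joint tender/swollen counts, ESR (mm/h, > 0)
    and patient global health on a 0-100 mm visual analogue scale.
    Assumption: the study may have used a different DAS-28 variant."""
    return (
        0.56 * math.sqrt(tjc28)
        + 0.28 * math.sqrt(sjc28)
        + 0.70 * math.log(esr_mm_h)
        + 0.014 * gh_vas_mm
    )

def activity_group(das28):
    """Map a DAS-28 value to a conventional activity group.
    Assumption: these cut-offs are the widely used ones, not
    necessarily those applied in this study."""
    if das28 < 2.6:
        return "remission"
    if das28 <= 3.2:
        return "low"
    if das28 <= 5.1:
        return "moderate"
    return "high"

# Hypothetical patient: 4 tender joints, 2 swollen joints,
# ESR 20 mm/h, global assessment 50 mm.
score = das28_esr(tjc28=4, sjc28=2, esr_mm_h=20, gh_vas_mm=50)
print(round(score, 2), activity_group(score))
```

Because tender and swollen joint counts enter through square roots and ESR through a logarithm, the composite score can reflect symptoms and systemic inflammation that are not visible as PD signal in any single hand joint.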
For each patient, a detailed record was compiled of their medication at the time of the US scan, including paracetamol and non-steroidal anti-inflammatory drugs (NSAIDs), disease-modifying anti-rheumatic drugs (DMARDs), biologic therapies, and glucocorticoids, either oral or intramuscular depot injection.
The US examination protocol included the extensor tendons and 22 joint assessments (dorsal longitudinal and transverse views of wrists, including extensor tendons, metacarpo-phalangeal [MCP] joints, and proximal interphalangeal [PIP] joints), as per our local clinic protocol. The same US examination protocol was used for each patient, irrespective of their hand symptoms. The US findings were scored according to the OMERACT scoring system (1). The hand US examination was performed by two clinicians (CC and LA) in the same session. Consensus was obtained for each patient.
We used a Logiq S8 US machine (GE Healthcare, Wauwatosa, WI, USA) equipped with a multi-frequency linear matrix array transducer (6–15 MHz). B-mode and PD machine settings were optimized and standardized for all our patients' US examinations. The settings used were: B-mode frequency = 11–15 MHz, depending on the depth of the anatomical area; Doppler frequency = 7.5–15 MHz, depending on the depth of the anatomical area; Doppler gain = 18–20 dB; low wall filters; and pulse repetition frequency around 800 Hz. In this study, we only used PD mode.
The information collected comprised the following US parameters: SH grade (graded 1–3); erosions (present/absent); PD signal (graded 1–3); joint effusion (present/absent); osteophytes (present/absent); and tendon abnormalities (PD signal present/absent), using the US definitions of joint pathology set out by the OMERACT group (2) (Fig. 2 exemplifies two MCP joints with different SH and PD grades). Well-controlled disease was defined as a PD score of zero (including joints and tendons).
Fig. 2. Examples of MCP joint grading: (top) SH grade 3 and PD grade 2; (bottom) SH grade 2 and PD grade 3.
To address our research question and assess how many joints would require scanning, and which joints are most likely to provide the answer as to whether or not there is active disease, we tested and compared the following scoring systems (bilateral examination):
22 joints (MCPs, PIPs, wrists); 18 joints (wrists, MCP 2–5, and PIP 2–5); 16 joints (MCP 2–5 and PIP 2–5); 14 joints (wrists, MCP 2–4, and PIP 2–4); 10 joints (wrists, MCP 2–3, and PIP 2–3); 8 joints (MCP 2–3 and PIP 2–3); 4 joints (wrists and MCP5); 4 joints (MCP 2–3).
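The eight protocols listed above can be written down as explicit joint sets, which makes the bilateral joint counts easy to check. This is a minimal sketch: the joint labels (for example `R_MCP2`), the helper `joints`, the dictionary `PROTOCOLS`, and the keys `4a` and `4b` for the two four-joint combinations are illustrative names, not taken from the study. It also assumes that the 22-joint count includes digit 1 at both the MCP and PIP level, which is what the stated total implies.

```python
SIDES = ("R", "L")  # bilateral examination

def joints(wrists=False, mcp=(), pip=()):
    """Build the bilateral set of joint labels for one protocol."""
    names = set()
    for side in SIDES:
        if wrists:
            names.add(f"{side}_WRIST")
        names.update(f"{side}_MCP{d}" for d in mcp)
        names.update(f"{side}_PIP{d}" for d in pip)
    return frozenset(names)

# Keys are hypothetical protocol labels; digit ranges follow the list above.
PROTOCOLS = {
    "22": joints(wrists=True, mcp=range(1, 6), pip=range(1, 6)),
    "18": joints(wrists=True, mcp=range(2, 6), pip=range(2, 6)),
    "16": joints(mcp=range(2, 6), pip=range(2, 6)),
    "14": joints(wrists=True, mcp=range(2, 5), pip=range(2, 5)),
    "10": joints(wrists=True, mcp=(2, 3), pip=(2, 3)),
    "8": joints(mcp=(2, 3), pip=(2, 3)),
    "4a": joints(wrists=True, mcp=(5,)),
    "4b": joints(mcp=(2, 3)),
}

for name, members in PROTOCOLS.items():
    # Every simplified protocol should be a subset of the 22-joint one.
    print(name, len(members), members <= PROTOCOLS["22"])
```

Representing each protocol as a subset of the 22-joint set mirrors the study design, in which all simplified scores are derived from the same comprehensive examination.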
The above joint combinations were selected based on our experience of performing US examination of hands in > 1000 patients, which identified that the most affected joints in RA were the wrists, MCP 2, 3 and 5, and PIP 2 and 3 (unpublished observation).
The SH grade 1 score was calculated as the total number of the joints with SH grade 1, the SH grade 2 score as the total number of the joints with SH grade 2, and the SH grade 3 score as the total number of the joints with SH grade 3 per patient. The total PD score was the sum of all individual PD scores per patient and the erosion score was calculated as the total number of erosions per patient.
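The per-patient scores described above can be sketched as follows. The data layout (a dictionary of joint-level findings with keys `sh`, `pd`, and `erosion`) and the function name `patient_scores` are assumptions made for illustration, and the erosion score is taken here as the number of joints in which an erosion was recorded.

```python
from collections import Counter

def patient_scores(findings):
    """Summarize one patient's joint-level US findings.

    `findings` maps a joint label to a dict with:
      'sh'      - synovial hypertrophy grade (0-3)
      'pd'      - power Doppler grade (0-3)
      'erosion' - True if an erosion was recorded in that joint
    """
    sh_counts = Counter(f["sh"] for f in findings.values())
    return {
        "sh1": sh_counts[1],  # number of joints with SH grade 1
        "sh2": sh_counts[2],  # number of joints with SH grade 2
        "sh3": sh_counts[3],  # number of joints with SH grade 3
        "pd_total": sum(f["pd"] for f in findings.values()),
        "erosions": sum(1 for f in findings.values() if f["erosion"]),
    }

# Hypothetical findings for a single patient (not study data).
example = {
    "R_WRIST": {"sh": 2, "pd": 1, "erosion": True},
    "R_MCP2": {"sh": 3, "pd": 2, "erosion": False},
    "L_MCP2": {"sh": 2, "pd": 0, "erosion": False},
    "L_PIP3": {"sh": 1, "pd": 0, "erosion": True},
}
scores = patient_scores(example)
print(scores)
```

Restricting `findings` to the joints of a given protocol before calling the function would give the corresponding simplified score.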
Data about active inflammation affecting tendons overlying the above-mentioned joints were also collected and reported separately. The total gray-scale scores and PD scores for joints were calculated as a sum of the individual scores for all the joints included in the US examination protocol the score refers to. The duration of the US examination was approximately 25 min/patient. This 22-joint protocol is used routinely in our US clinics, which have 30-min slots for clinical and US examination of patients with RA.
Descriptive statistics were used to characterize the RA population, and Student's t-test, Mann–Whitney U and Kruskal–Wallis tests were implemented for the assessment of different parameters and US scoring systems (IBM SPSS Statistics 22, IBM Corporation, Armonk, NY, USA). A P value < 0.05 was considered a statistically significant result. Spearman's correlation coefficients were used to correlate permutations of pairs of US scores and the total PD scores with the disease activity, as assessed by the disease activity score assessing 28 joints (DAS-28).
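As an illustration of the rank correlation named above, the sketch below computes Spearman's coefficient as the Pearson correlation of average ranks. The study used SPSS; this pure-Python version, the function names `average_ranks` and `spearman`, and the example PD totals are hypothetical and do not reproduce the study's analysis or its P values.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical total PD scores for six patients under two protocols.
pd_22_joints = [0, 2, 5, 1, 7, 3]
pd_8_joints = [0, 1, 3, 0, 4, 2]
rho = spearman(pd_22_joints, pd_8_joints)
print(round(rho, 3))
```

A score derived from a subset of joints tends to preserve the ranking of patients even when it assigns a zero to someone with activity elsewhere, which is one reason high correlation alone does not show that two protocols are interchangeable.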
The data were collected as standard of practice. The study analyzed cross-sectionally the results of the US examinations of patients seen in our US clinics over a defined period of time. No ethical approval or patient's consent were required as no patient information was used for teaching or new intervention research. The results of our study analysis had no impact on the clinical management of patients and their confidentiality was maintained.
Results
Table 1. Comparison between RA patient groups stratified based on their DAS-28 scores using the 22-joint US scoring system as detailed above (Kruskal–Wallis test, P < 0.05 shows a significant difference between the patient groups).
There were no significant differences in the total US scores for the majority of US parameters, or in the disease duration or type of medication used (for both conventional and biologic DMARDs). The only significant difference was in the proportion of patients with SH grade 2 at the US examination of their hands, which was higher in patients with moderately active and strongly active RA (P < 0.05) (Table 1). The SH grade 2 total score also correlated with the SJC (r = 0.89, P < 0.05).
Comparison between eight different US scores (P < 0.05 was considered significant).
IQR, interquartile range.
Strong correlations were found between the PD score generated by the 22-joint examination and all of the other US score combinations (r = 0.68–0.74, P < 0.05). The scores that correlated very strongly were those assessing eight, ten, and 14 joints (r = 0.92–0.96, P < 0.05). The weakest correlation was found between the eight-joint and the four-joint score (wrist and MCP 5 bilaterally) (r = 0.28, P < 0.05) (Suppl. Table 1).
The permutation comparisons between pairs of US scores related to their ability to detect the presence of active joint inflammation found no significant differences between the total PD scores assessed by eight-, ten-, 12-, and 16-joint US scores and 10-, 12-, 16-, and 18-joint scores, respectively (Suppl. Table 2). Similarly, the total gray-scale score (combining the total scores for SH grades 2 and 3) identified no significant differences between the permutation comparisons between the scores assessing eight, ten, 14, 16, and 18 joints (Suppl. Table 3).
The analysis also focused on correlating the total PD scores derived from all the pre-set US examination protocols with the DAS-28 scores in patients stratified based on their disease activity, to identify whether certain US hand examination protocols can be used differentially in patients with active disease compared to patients in remission. All the total PD scores derived from the eight US examination protocols correlated very strongly with DAS-28 assessment, irrespective of how well the disease was controlled (r = 0.88–1, P < 0.005).
In addition, we were interested in comparing the correlation of different US scoring systems with the disease activity as assessed by DAS-28, considering the clinical implications of using both clinical and US assessments. We found significant high correlations between all the total PD scores assessed by different US joint combination scores and DAS-28 scores (r = 0.88–0.99, P < 0.05), even though PD scores did not differ significantly between patients stratified based on their disease activity (as detailed in Table 1).
Discussion
This is the first large cross-sectional study correlating different US examination protocols (derived from a comprehensive 22-joint hand score) with DAS-28 score in patients with RA, stratified based on their disease activity.
Quantitative and semi-quantitative US scores have been previously compared in RA (3), and US examinations have been found to be sensitive to therapeutic interventions (4–8). A comprehensive study comparing several US score systems in RA found that all were sensitive to change when assessing the response of RA patients to adalimumab (9,10). In addition, simplified US scores (including six or 12 joints) have previously been compared with extensive US protocol examinations (assessing 12 and 44 joints, respectively) and showed good sensitivity to change in three separate studies (11–13). However, none of these studies stratified patients based on their disease activity scores or included RA patients based on the clinical indication to have an US scan, as is the case with our study. The need to use a comprehensive US scoring system, capturing both active and chronic inflammatory changes for assessment of RA disease activity, is supported by the good correlation between US and magnetic resonance imaging findings (8,14). The presence of SH and PD signal was found to be associated with structural damage in RA (15), even in patients in clinical remission (16), and was associated with risk of flares (17,18).
The role and reliability of US in the disease activity assessment in patients with RA is supported by several studies (19–21).
Previous studies reported good correlation between hand US scores and DAS-28 assessment using three different US scores (22,23), a result that was also replicated by our study, which included a larger number of joint combinations, and also assessed US parameters stratified based on the DAS-28 scores.
Our comparative analysis showed significant differences between the US hand-scoring systems tested, indicating that they are not equivalent. Our study found that age, duration of symptoms, duration of disease, type of medication, and total PD score generated by US examination of hands did not discriminate between disease activity groups, as patients stratified based on DAS-28 scores had similar parameters.
Going beyond previous studies, we were interested in exploring the amount and significance of the information missed when simplified US hand examination protocols are used. A significant proportion of patients were diagnosed as having well-controlled or non-erosive hand RA by US protocols limiting the number of joints examined (10.6–40.2% and 12–34.8%, respectively). Our study found that the assessment of our preselected eight, ten, and 14 joints captured comparable amounts of information regarding disease activity in RA (still misdiagnosing around 40% of patients as having well-controlled disease, equivalent to PD score zero), while the two four-joint scores missed significant information when compared to the others (around 60% of patients were diagnosed as being in remission despite having active disease in at least one joint). The scores including 20 and 22 joints captured more information than the eight-, ten-, and 14-joint scores, even though all eight US scores we explored correlated very well with the DAS-28 assessment. This is particularly relevant for our patient group, characterized by a small number of active joints and a clinical indication for an US scan to establish whether their disease was well controlled. In this context, underdiagnosing active disease would have erroneously led to classifying our patients as being in remission. The clinical consensus is that we cannot predict which joints are the most likely to flare in patients with RA; therefore, examining only the joints that previously flared using a patient-tailored US protocol is not justified.
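The idea of missed information can be made concrete with a small sketch: a patient is counted as missed when the comprehensive protocol finds PD signal in at least one joint but the simplified protocol yields a total PD score of zero. The function names, the joint labels, the toy data, and the choice of all patients as the denominator are assumptions for illustration, not the study's exact computation.

```python
def total_pd(pd_by_joint, protocol):
    """Sum PD grades over the joints that a protocol examines."""
    return sum(pd_by_joint.get(joint, 0) for joint in protocol)

def missed_active_fraction(patients, full, reduced):
    """Fraction of all patients with PD > 0 on `full` but PD == 0 on `reduced`."""
    missed = sum(
        1
        for pd_by_joint in patients
        if total_pd(pd_by_joint, full) > 0 and total_pd(pd_by_joint, reduced) == 0
    )
    return missed / len(patients)

# Toy protocols: a fuller joint set and a four-joint subset (MCP 2-3 bilaterally).
reduced = {"R_MCP2", "L_MCP2", "R_MCP3", "L_MCP3"}
full = reduced | {"R_WRIST", "L_WRIST", "L_PIP4"}

# Hypothetical PD grades per patient; joints not listed have PD grade 0.
patients = [
    {"R_WRIST": 2},               # active only at the wrist: missed
    {"R_MCP2": 1, "L_MCP3": 2},   # active in examined joints: detected
    {},                           # no PD signal anywhere
    {"L_PIP4": 1},                # active only at a PIP joint: missed
]
fraction = missed_active_fraction(patients, full, reduced)
print(fraction)
```

In this toy example the total PD scores from the two protocols can still rank patients similarly, while half of the patients would be labeled as having a PD score of zero by the reduced protocol.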
Even if a comprehensive hand joint score is time-consuming, it can provide significant additional information compared to a simplified score, as our study showed. As expected, all the scores correlated very well with each other, because they are derived from a comprehensive US hand score, while missing significant information proportional to the number of joints excluded from US examination. All the preselected US hand scores correlated with the disease activity scores, despite the fact that the patient groups stratified based on disease activity had similar median total PD scores. This showed that subclinical inflammation can be found in similar proportion in RA patients with different DAS-28 scores, as DAS-28 scores usually reflect a combination of active and chronic joint changes.
Our study has some limitations as no strict inclusion criteria were used; the patients were included based on clinical indication to exclude subclinical synovitis. Therefore, there is a significant selection bias, as the study did not capture patients with obvious active synovitis detected by clinical examination. In this particular clinical context, detection of active disease in at least one joint is clinically relevant, as US examination triggered treatment optimization to minimize joint damage (e.g. guided steroid injection targeting the active joints or escalation of therapy).
In conclusion, even if simplified US scores for hand assessment of RA disease activity can be useful in practice, by examining additional joints, clinicians are able to detect subclinical inflammation that is not captured by the simplified US scores. While previous studies reassured clinicians that various US examination protocols correlated well with the DAS-28 assessment or were sensitive to change following therapy, our study showed that a significant proportion of patients can be misclassified as having well-controlled or non-erosive disease as a result of simplified US protocols. Further studies, including large longitudinal cohorts, are needed to establish the smallest number of joints that need to be examined to minimize the risk of under-detecting subclinical inflammation in patients with hand RA.
Supplemental Material
Supplementary Tables 1–3: supplemental material for Diagnostic accuracy of simplified ultrasound hand examination protocols for detection of inflammation and disease burden in patients with rheumatoid arthritis by Priyanka Sivakumaran, Sidra Hussain, Laura Attipoe and Coziana Ciurtin in Acta Radiologica
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors. CC was funded by a Biomedical Research Council Funding grant – BRCIII/001.
References
