Reliability and standard error of measurement of pressure pain thresholds at articular landmarks

Abstract

Background

Pressure pain threshold (PPT) testing is a widely used quantitative sensory testing method to assess mechanical pain sensitivity. While reference values for PPT at articular-/joint-related and osseous landmarks are available, data on measurement reliability remain limited. This study aimed to determine the intra- and inter-rater reliability of PPT measurements and to report the standard error of measurement (SEM) for joint PPT.

Methods

Healthy participants aged 18–40 years were included in a two-part study. In Part A (N = 40), intra-rater reliability was assessed through three repeated PPT measurements by the same examiner. In Part B (N = 40), inter-rater reliability was evaluated with two different examiners. PPT was measured at joints (elbow, knee, ankle) and reference landmarks (sternum, forehead) using a digital algometer. ICCs (2,k model) and SEM were calculated for each landmark.

Results

Intra-rater reliability was excellent at all landmarks (ICCs ≥ 0.926) except the elbow, where it was good (ICC = 0.848). Inter-rater reliability was excellent at the sternum (ICC = 0.950) and good at the ankle, knee, elbow, and forehead (ICCs ≥ 0.813). SEM values varied across landmarks, with the lowest values observed at the reference landmarks (sternum and forehead). ICCs were similar between sexes.

Conclusions

PPT measurements at articular (joints) and bony reference landmarks show good to excellent reliability. These findings support the use of joint PPT in standardized pain sensitivity assessments and provide a basis for clinical and research applications. However, findings are limited to pain-free participants and should be replicated in broader populations.

Keywords

intra-class correlation coefficient, reliability, pain measurement, pressure pain threshold, pain sensitivity

Introduction

Quantitative Sensory Testing (QST) is a standardized psychophysical method used to assess the function of the somatosensory system by applying controlled mechanical, thermal, or pressure stimuli to elicit sensory responses.¹ Within the QST test battery, pressure pain threshold (PPT) testing using pressure algometry is employed to evaluate mechanical pain sensitivity. PPT is defined as the minimum pressure at which a mechanical pressure stimulus is first perceived as painful, offering a semi-objective alternative to solely self-reported pain scales such as visual analogue scales (VAS) or numeric rating scales (NRS). It enables the characterization of localized and widespread hyper- or hypoalgesia relevant in various clinical pain conditions as well as research settings.² Pain scales such as the VAS or NRS can validly quantify perceived pain intensity when administered with appropriate instructions and anchors.³ The combination of pain scales and PPT assessments may contribute to a more comprehensive assessment of pain sensitivity and pain perception.⁴ Indeed, measurements of PPT are widely employed as a more objective method to quantify pain sensitivity in various pain-related disorders.⁵ PPT can be described as a cost-effective and clinically feasible assessment outcome, characterized by properties (e.g., point-of-care device and time efficiency) that facilitate its implementation in clinical settings for the evaluation and management of, for instance, musculoskeletal pain.⁶ Its application as an outcome measure for evaluating the effectiveness of multi-week interventions in musculoskeletal pain conditions has been increasingly recognized.^7,8 Furthermore, to evaluate the phenomenon of exercise-induced hypoalgesia, PPTs are usually used to measure pain sensitivity before and after an acute exercise session.⁹

Recently, reference values were published for PPT measured at the ankle, knee, and elbow joints as well as at further osseous reference landmarks (i.e., sternum and forehead) in healthy individuals.¹⁰ However, the clinical and research application of these specific PPT data remains limited without corresponding information on measurement reliability and measurement error. Previous studies have demonstrated good to excellent PPT reliability in healthy participants and clinical populations, including, for instance, chronic low back pain and distal radius fracture cohorts.^11–13 In addition, methodological work has shown that reliability estimates may depend on the number of repetitions and on how inter-session and inter-repetition variability are modeled.^11,14 Although this body of literature provides important evidence for the reliability of PPT assessment in general, the transferability of these findings to joint-related landmarks is limited. PPT values have been shown to differ depending on tissue type and anatomical region, indicating that pressure pain sensitivity and measurement properties should be interpreted in a site-specific manner.¹⁵ Previous work on joint-related sites such as the temporomandibular joint capsule further indicates the relevance of site-specific PPT assessment.¹⁶ Consequently, reliability and measurement-error data derived from muscular landmarks cannot necessarily be generalized to articular or osseous landmarks.
This is particularly relevant because joint-related structures are clinically important sources of nociceptive input in many musculoskeletal pain conditions.^17–19 Moreover, articular and osseous landmarks may offer methodological advantages because they are often more superficially located than muscular landmarks, potentially allowing more direct mechanical stimulation of the intended anatomical site with less influence of overlying soft tissue.^20,21 However, before applying such assessments in clinical populations, reliability should first be established under controlled conditions in pain-free individuals, where potential confounding by ongoing pain, pain sensitization, analgesic medication, structural joint pathology, or disease-related heterogeneity is minimized. Having reliable data on PPT for articular landmarks from a healthy population is essential because these sites are sources of nociceptive input in many musculoskeletal joint conditions.^22,23

From a physiological standpoint, articular structures are mainly innervated by high-threshold Aδ fibers as well as polymodal and TrkA+ C-fibers concentrated in the periosteum, synovium, ligaments, and fat pads, which can become sensitized by, for instance, inflammation, acidosis, or nerve sprouting. In contrast, muscle tissue contains a wider range of nociceptors (i.e., mechanosensitive, mechano-heat, and polymodal C-fibers) often activated by ischemia or sustained load. Further, muscle tissue has a high stimulation threshold and is therefore not stimulated by physiological movement, muscle stretch, or moderate compression.²⁴ Thus, articular PPTs can potentially reflect localized joint sensitization, whereas muscle PPTs are more influenced by metabolic stress and diffuse nociceptor activation.²

This study therefore aims to provide data on intra- and inter-rater reliability of articular joint PPT measurements and to present data on standard error of measurement (SEM). Establishing both intra- and inter-rater reliability ensures that measurements are consistent across time and examiners, which is crucial for individual longitudinal monitoring and research standardization in diverse study designs. Furthermore, reporting the SEM provides clinicians and researchers with practical thresholds in the same units as the original measurement to distinguish true changes in pain sensitivity from deviations related to measurement error.

Methods

Study design

This study was conducted in two parts at the Department of Sports Medicine at the University of Wuppertal, Germany, to assess the reliability of joint PPT measurements. Part A examined intra-rater reliability in healthy participants, each of whom was assessed three times by the same rater. Part B evaluated inter-rater reliability in a separate group of healthy participants, with measurements performed independently by two different raters. All raters were experienced research assistants holding a bachelor's degree and received a minimum of 6 hours of the same standardized training in PPT assessment procedures prior to data collection.²⁵ To ensure consistency, all measurements were conducted at the same time of day for each participant. In both study parts, repeated measurements were separated by a minimum of 72 h to reduce the potential influence of temporal factors or sensitization effects. This study was conducted in accordance with the principles of good clinical and ethical practice and was approved by the local ethics commission. In accordance with the Declaration of Helsinki, all participants provided their written consent after being informed about the study protocol. The GRRAS checklist for reporting of studies of reliability and agreement was employed for this study.²⁶

Participants

For both study parts, the study sample comprised healthy and pain-free adults (convenience sample) recruited via word of mouth, flyers distributed at the local university, and personal contacts. Inclusion criteria were an age between 18 and 40 years and engagement in a minimum of 150 min of physical activity per week²⁷ while not being a professional athlete.²⁸ Exclusion criteria included the presence of acute or chronic pain (NRS > 0), any form of joint disease or recent musculoskeletal injury, as well as regular use of analgesic medication or analgesic intake within 24 h prior to testing. This information was obtained via a short structured verbal interview. Participants aged 18–40 years were deliberately selected to establish reliability estimates under controlled conditions in a young, pain-free adult sample. This age range was chosen to reduce heterogeneity related to age-associated differences in pressure pain sensitivity, degenerative joint changes, comorbidities, current pain, or medication use. Therefore, the present data should be interpreted as reliability estimates for young healthy adults and not as directly generalizable reference values for older adults or clinical populations with joint pathology. Participants were additionally instructed to refrain from vigorous physical activity for at least 24 h before each assessment session to avoid potential effects related to exercise-induced hypoalgesia.^29,30

Pressure pain threshold assessment

For both study parts, the same standardized PPT assessments were performed. All PPT measurements were conducted using a standardized procedure as recently described in detail elsewhere.¹⁰ A digital pressure algometer (FPX 25 Compact Digital Algometer, Wagner Instruments, Greenwich, CT, USA) with a 1 cm² rubber tip was used to apply pressure to five predefined anatomical landmarks. These included the right elbow (lateral joint space below the lateral humeral epicondyle), right knee (midpoint of the medial joint space below the medial femoral epicondyle), right ankle (lateral joint space between the lateral malleolus and the talus bone), the sternum (2 cm above its lower edge, near the xiphoid process), and the forehead (1 cm above the midpoint of the right eyebrow, at the supraorbital margin). Pictures of the measurements are to be found elsewhere.¹⁰ Prior to formal measurements, the procedure was explained in detail, and each participant completed at least one familiarization trial to minimize bias due to unfamiliarity with the stimulus. Before each trial, participants confirmed their readiness. They were instructed to verbally indicate (“stop”) when the applied pressure first became painful. Each landmark was measured three times using a circuit protocol,³¹ meaning that all five landmarks were assessed once before the next measurement round started. This procedure resulted in an interval of at least 40 s between two consecutive measurements at the same landmark. Pressure was applied at a rate of 10 Newtons (N) per second as described before.¹⁰ The order was randomized and the mean of the three measurements was used for analysis. Data are presented as N/cm².

Statistical analysis

For all reliability analyses, untransformed PPT data were used, as recommended, because this allows the results to be interpreted directly in the unit of measurement. In addition, given the robustness of ANOVA-based intraclass correlation coefficient (ICC) analyses and the sufficiently large sample size, the primary analyses were performed on the original data.³² To assess the test-retest and inter-rater reliability of PPT at each landmark, ICCs were calculated. For both parts of the study, the ICC(2,k) model (two-way mixed-effects, absolute agreement, average measures) was used.³³ This was done for the entire population as well as stratified by sex. ICC values are reported with 95% confidence intervals and were interpreted as follows: < 0.50 = poor, 0.50–0.75 = moderate, 0.75–0.90 = good, and > 0.90 = excellent reliability.³⁴
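As an illustration of the ICC described above, the following minimal sketch computes an absolute-agreement, average-measures ICC from the mean squares of a two-way ANOVA and applies the interpretation bands. It is not the SPSS procedure used in this study. The function names and the small rating matrix are hypothetical, confidence intervals are omitted, and assigning a value of exactly 0.75 to "good" is an assumption, because the stated bands share that boundary.

```python
def icc_2k(ratings):
    """Absolute-agreement, average-measures ICC from two-way ANOVA mean squares.

    ratings: one row per participant, one column per rater or session.
    """
    n = len(ratings)        # number of participants
    k = len(ratings[0])     # number of raters or sessions
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between participants
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between raters/sessions
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols                  # residual

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (ms_cols - ms_error) / n)


def interpret_icc(icc):
    """Bands as stated in the text; the handling of exact boundaries is an assumption."""
    if icc < 0.50:
        return "poor"
    if icc < 0.75:
        return "moderate"
    if icc <= 0.90:
        return "good"
    return "excellent"


# Hypothetical ratings (6 participants x 4 raters), not study data.
demo = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
print(round(icc_2k(demo), 2))   # 0.62
print(interpret_icc(0.848))     # good
```

The same function can be applied to a participants-by-sessions matrix (Part A) or a participants-by-raters matrix (Part B).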

The Standard Error of Measurement (SEM) was calculated using the formula: SEM = SD × √(1 – ICC), where SD is the mean standard deviation of measurements. This was performed for both parts of the study using the ICCs and SD of the respective study parts. In addition, the coefficient of variation (CV) was calculated as the standard deviation divided by the mean, expressed as a percentage, to assess the relative variability of the data.
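The two formulas above can be sketched in a few lines. This is an illustrative example only: the SD of 18.3 N/cm² is a hypothetical value chosen for demonstration, the function names are not taken from the study, and the use of the sample standard deviation for the CV is an assumption.

```python
import math
import statistics


def sem(sd, icc):
    """Standard error of measurement: SD x sqrt(1 - ICC), in the unit of SD."""
    return sd * math.sqrt(1.0 - icc)


def cv_percent(values):
    """Coefficient of variation in percent (sample SD divided by mean)."""
    return statistics.stdev(values) / statistics.mean(values) * 100.0


# Hypothetical SD of 18.3 N/cm² combined with an ICC of 0.950.
print(round(sem(18.3, 0.950), 1))            # 4.1

# Hypothetical PPT values in N/cm².
print(round(cv_percent([40.0, 50.0, 60.0]), 1))   # 20.0
```

Because the SEM scales with the SD, two landmarks with the same ICC can still differ in SEM when their between-subject spread differs.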

Further, the Friedman test was used to explore whether PPT values differed between the three measurements performed during the intra-rater reliability testing. In addition, the Wilcoxon test was used to test for significant differences between the measurements of the two raters in the inter-rater reliability testing. In case of a significant main effect, post-hoc tests with Bonferroni adjustment were performed. Data were analyzed using IBM SPSS Statistics 29.0, and the significance level was set at 0.05.
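To make the logic of the Friedman test for three repeated measurements concrete, a minimal sketch is given here. It is not the SPSS implementation: it omits the tie correction, relies on the chi-square approximation (which is rough for small samples), and uses hypothetical data and function names. With three sessions the test has two degrees of freedom, so the chi-square tail probability reduces to exp(-Q/2).

```python
import math


def average_ranks(values):
    """Rank values from 1 to k; tied values receive their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # mean of the 1-based positions i+1 .. j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg
        i = j + 1
    return ranks


def friedman_three_sessions(rows):
    """Friedman statistic Q and approximate p-value for exactly 3 sessions."""
    n, k = len(rows), 3
    rank_sums = [0.0] * k
    for row in rows:
        for j, r in enumerate(average_ranks(list(row))):
            rank_sums[j] += r
    q = 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)
    p = math.exp(-q / 2.0)             # chi-square survival function for df = 2
    return q, p


def bonferroni(p_values):
    """Bonferroni adjustment: multiply each p-value by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical PPT values (N/cm²) of four participants across three sessions.
demo_sessions = [
    (50.0, 52.0, 51.0),
    (80.0, 78.0, 83.0),
    (70.0, 71.0, 69.0),
    (45.0, 44.0, 46.0),
]
q, p = friedman_three_sessions(demo_sessions)
print(round(q, 2), round(p, 3))
```

Post-hoc comparisons would only be run, and then adjusted with `bonferroni`, if the overall test were significant.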

The sample size was calculated with an effect size based on two previous studies investigating the reliability of PPT that reported good to excellent reliability estimates (ICC > 0.75).^17,18 A minimum acceptable reliability (ρ₀) of 0.4, an expected reliability (ρ₁) of 0.75, a significance level of 0.05, and a power (1 – β) of 90% were assumed for k = 2 raters/repetitions per subject. This resulted in a minimum sample size for reliability testing of N = 36. Hence, a sample size of N = 40 was deemed appropriate to account for a possible dropout rate of 10% in both parts of this study.³⁵
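The reported figures can be reproduced with a widely used closed-form approximation for ICC-based sample size planning (commonly attributed to Walter, Eliasziw, and Donner). The text does not state which formula or software was used, so the choice of this approximation, the two-sided z-value, and the function names are assumptions made for illustration.

```python
import math
from statistics import NormalDist


def icc_sample_size(rho0, rho1, k, alpha=0.05, power=0.90, two_sided=True):
    """Approximate number of subjects needed to test H0: ICC = rho0 against ICC = rho1."""
    theta0 = rho0 / (1 - rho0)
    theta1 = rho1 / (1 - rho1)
    c0 = (1 + k * theta0) / (1 + k * theta1)
    tail = alpha / 2 if two_sided else alpha      # two-sided z is an assumption
    z_alpha = NormalDist().inv_cdf(1 - tail)
    z_beta = NormalDist().inv_cdf(power)
    n = 1 + (2 * (z_alpha + z_beta) ** 2 * k) / (math.log(c0) ** 2 * (k - 1))
    return math.ceil(n)


def with_dropout(n, dropout_rate):
    """Inflate the sample size by the expected dropout rate and round up."""
    return math.ceil(n * (1 + dropout_rate))


n_min = icc_sample_size(rho0=0.4, rho1=0.75, k=2)
print(n_min)                      # 36
print(with_dropout(n_min, 0.10))  # 40
```

Under these assumptions the sketch returns 36 subjects, and 40 after adding 10% for dropout; with a one-sided z-value the same formula yields a smaller number (about 30), which is why the two-sided variant is assumed here.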

Results

For intra-rater reliability assessment, 40 participants (18 female, 22 male, age: 25.8 ± 3.4 yrs, weight: 74.6 ± 12.0 kg, height: 177.6 ± 8.6 cm, BMI: 23.5 ± 2.5) were successfully included. For inter-rater reliability assessment, 40 separate participants (18 female, 22 male, age: 27.7 ± 4.7 yrs, weight: 72.2 ± 12.1 kg, height: 175.6 ± 8.9 cm, BMI: 23.3 ± 2.6) were successfully included. No dropouts occurred and no adverse events of any kind were observed.

The ICC and SEM values for PPT measurements of the different landmarks for the entire study population are presented in Table 1. Intra-rater ICCs ranged from 0.848 at the elbow to 0.941 at the sternum, indicating good to excellent reliability across landmarks. Corresponding SEM values ranged from 2.5 N/cm² at the forehead to 7.6 N/cm² at the elbow. Inter-rater ICCs ranged from 0.813 at the forehead to 0.950 at the sternum, with SEM values ranging from 4.1 N/cm² at the sternum to 10.3 N/cm² at the knee. Descriptive values (Table 2) showed small non-significant numerical variations between repeated measurements and Friedman tests indicated no significant differences between the three intra-rater measurements at any landmark (p ≥ .273). Similarly, inter-rater median values were closely comparable between raters, and Wilcoxon tests revealed no statistically significant differences between raters at any landmark (p ≥ .270).

Table 1.

Intraclass correlation coefficients (ICC) of pressure pain threshold measurements presented for each landmark for intra-rater (study part A) and inter-rater (study part B) measures. Standard error of measurement (SEM) is presented as N/cm².

Landmark	ICC (95% CI)	SEM
Ankle
Intra-rater	0.926 (0.875–0.958)	5.5
Inter-rater	0.853 (0.723–0.922)	7.7
Knee
Intra-rater	0.926 (0.875–0.958)	7.1
Inter-rater	0.887 (0.785–0.940)	10.3
Elbow
Intra-rater	0.848 (0.744–0.914)	7.6
Inter-rater	0.872 (0.757–0.932)	9.8
Sternum
Intra-rater	0.941 (0.901–0.967)	4.1
Inter-rater	0.950 (0.906–0.973)	4.1
Forehead
Intra-rater	0.927 (0.877–0.959)	2.5
Inter-rater	0.813 (0.648–0.901)	5.3

Table 2.

Median (coefficient of variation) [IQ1, IQ3] {Min-Max} values of pressure pain thresholds measured at different landmarks presented as N/cm².

Landmark	Intra-rater testing			Inter-rater testing
Landmark	1. Measurement	2. Measurement	3. Measurement	Rater 1	Rater 2
Ankle	61.8 (33.3 %) [50.4, 79.7] {37.0–132.9}	64.6 (32.9 %) [48.2, 81.2] {25.9–117.4}	65.0 (30.7 %) [50.7, 85.6] {28.3–107.9}	44.0 (45.4 %) [32.3, 67.5] {13.6–109.6}	44.7 (37.3 %) [32.5, 59.8] {15.8–86.6}
Knee	85.8 (29.9 %) [72.5, 106.9] {34.8–140.0}	80.6 (29.9 %) [69.5, 104.1] {39.0–140.0}	93.0 (28.9 %) [30.3, 140.0] {30.3–140.0}	62.3 (45.1 %) [45.4, 103.6] {29.4–130.0}	62.6 (40.8 %) [48.0, 91.3] {21.1–126.6}
Elbow	73.7 (23.8 %) [63.8, 85.2] {42.9–129.0}	68.4 (26.3 %) [59.6, 82.9] {39.6–127.6}	74.6 (28.1 %) [56.5, 90.5] {36.1–129.5}	60.5 (44.0 %) [40.1, 75.8] {24.7–126.4}	59.4 (44.3 %) [40.8, 75.9] {18.5–130.0}
Sternum	46.5 (32.6 %) [36.3, 55.5] {19.7–86.4}	45.3 (36.1 %) [37.8, 56.4] {21.2–99.0}	47.9 (35.9 %) [36.1, 59.3] {20.6–90.5}	40.0 (46.8 %) [32.0, 51.1] {22.3–118.7}	41.2 (38.2 %) [33.0, 48.1] {24.3–104.8}
Forehead	33.1 (29.0 %) [28.2, 41.0] {12.5–54.4}	36.5 (25.7 %) [26.5, 41.5] {20.3–51.8}	35.8 (27.3 %) [27.0, 41.6] {17.5–53.5}	33.5 (39.5 %) [25.4, 40.4] {15.9–78.1}	34.1 (31.8 %) [25.7, 41.7] {14.1–55.4}

Table 3 shows the ICC data for male and female participants separately. Sex-stratified analyses showed good to excellent ICCs in both male and female participants for most landmarks. In male participants, ICCs ranged from 0.787 to 0.943, while in female participants ICCs ranged from 0.732 to 0.964. The lowest value was observed for inter-rater reliability at the forehead in female participants (ICC = 0.732), which falls slightly below the threshold for good reliability.

Table 3.

Intraclass correlation coefficients (ICC) of pressure pain threshold measurements of male and female participants presented for each landmark for intra-rater (study part A) and inter-rater (study part B) measures. Standard error of measurement (SEM) is presented as N/cm².

Landmark	Male (n = 22)	Female (n = 18)
Landmark	ICC (95% CI)	ICC (95% CI)
Ankle
Intra-rater	0.924 (0.846–0.966)	0.925 (0.836–0.970)
Inter-rater	0.832 (0.599–0.930)	0.900 (0.730–0.963)
Knee
Intra-rater	0.912 (0.820–0.961)	0.937 (0.862–0.974)
Inter-rater	0.855 (0.655–0.940)	0.964 (0.859–0.980)
Elbow
Intra-rater	0.787 (0.572–0.904)	0.881 (0.738–0.925)
Inter-rater	0.905 (0.774–0.960)	0.821 (0.530–0.932)
Sternum
Intra-rater	0.933 (0.864–0.970)	0.929 (0.843–0.971)
Inter-rater	0.943 (0.864–0.976)	0.939 (0.838–0.977)
Forehead
Intra-rater	0.928 (0.855–0.968)	0.922 (0.830–0.969)
Inter-rater	0.845 (0.563–0.940)	0.732 (0.306–0.899)

Discussion

This study aimed to evaluate the intra- and inter-rater reliability of PPT measurements across joint and reference landmarks in a healthy population and to report the SEM. Overall, good to excellent reliability was found for this PPT protocol. The results demonstrated excellent intra-rater reliability for all landmarks (ICCs ≥ 0.926) with the exception of the elbow, where reliability was good (ICC = 0.848). Inter-rater ICCs showed excellent reliability at the sternum (ICC = 0.950) and good reliability (ICCs = 0.813–0.887) at the other landmarks. In addition, no statistically significant differences in PPT values were observed between the individual measurements in either the intra- or the inter-rater reliability testing.

In more detail, intra-rater reliability was highest at the sternum (ICC = 0.941) and lowest at the elbow (ICC = 0.848). Inter-rater reliability was highest at the sternum (ICC = 0.950) and lowest at the forehead (ICC = 0.813; see Table 1). These findings are consistent with previous literature demonstrating good to excellent reliability of PPTs in healthy populations. The present results are comparable to, for instance, the reliability observed at the tibialis anterior muscle in 18 healthy participants, with intra-rater ICCs of 0.96 and 0.92 (one per rater) and inter-rater ICCs of 0.94 and 0.96 (one per time point).¹⁷ Similarly high ICCs were observed for the upper trapezius muscle in a sample of 60 healthy participants, with an intra-rater reliability of 0.97 and an inter-rater reliability of 0.89.¹⁹ Another study in healthy participants reported mostly excellent within-session reliability and showed that repeated PPT measurements reduced measurement error compared with single measurements.¹¹ This supports the use of repeated measurements and averaging in the present protocol.

In a clinical pain population, generally excellent within-session reliability was reported in individuals with chronic low back pain, although systematic test-retest differences were observed at the painful lower back region.¹² This indicates that reliability estimates may differ between pain-free and painful sites and supports the need to establish reliability data under controlled conditions in healthy adults. In addition, another study showed that reliability estimates may depend on how inter-session and inter-repetition variability are modeled, highlighting that ICC values should be interpreted in relation to the specific measurement protocol.¹⁴ Clinical work in persons with conservatively managed wrist fractures further suggests that PPT can be reliably assessed in joint-related clinical conditions, although rater effects remain relevant and repeated assessments should preferably be performed by the same examiner when possible.¹³ Recent evidence from older adults undergoing foot PPT testing also supports the need for site- and population-specific reliability data.³⁶ Therefore, the present study adds to the existing literature by providing intra- and inter-rater ICCs and SEM values for predefined articular and osseous landmarks using a specific measurement protocol in young healthy adults.

The SEM provides important context for interpreting measured PPT values because it is expressed in the same units as the original variable. The SEM reflects the precision of the measurement (i.e., the expected error in a single measurement) without being affected by between-subject variability.³⁷ For example, in the inter-rater testing the sternum had a low SEM (4.1 N/cm²), whereas the knee had a higher SEM (10.3 N/cm²), highlighting the increased variability at that site along with the higher median values observed (see Table 2). However, the SEM at the knee based on intra-rater reliability was lower (7.1 N/cm²). In general, the present results show that SEM values were lower for intra-rater than for inter-rater testing, with the sternum showing identical values (4.1 N/cm²). Given the manual nature of pressure algometry used in the present study, repeated clinical assessments should preferably be performed by the same rater to reduce rater-related variability. This is also supported by previous clinical work in persons with conservatively managed wrist fractures, where intra-rater reliability was more favorable than inter-rater reliability.¹³ In another study on healthy participants, mean PPT values at the tibialis anterior of 82.2 and 75.5 N/cm² were observed along with an SEM of 12.2 N/cm².¹⁷ In addition, a mean PPT of 25.2 N/cm² (±10.2) at the upper trapezius was observed along with SEMs of 1.8 and 5.3 N/cm² for intra- and inter-rater measurements, respectively.¹⁹ These findings and comparisons highlight that anatomical site and methodology influence PPT measurement consistency. Therefore, landmark-specific reliability and measurement-error data are essential to provide a basis for reliable and precise assessments as well as accurate interpretation. The interpretation of the SEM also depends on the underlying PPT magnitude: a larger SEM may still be acceptable when mean or median PPT values are high.
In the present analysis, the CV values ranged from 23.8 % to 36.1 % for intra-rater testing and from 31.8 % to 46.8 % for inter-rater testing, indicating moderate to high relative variability. Such comparatively high CVs are, however, common in PPT testing. For instance, CV values of up to 62.7 % have been reported depending on the landmark measured.³⁸ These values suggest that, although measurement reliability is generally good to excellent, notable variability might remain between raters, underscoring the importance of standardized protocols and examiner training in PPT assessments.
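The point that the SEM should be read against the underlying PPT magnitude can be illustrated by expressing each inter-rater SEM from Table 1 as a percentage of the corresponding medians in Table 2. This relative SEM was not part of the reported analysis. Relating the SEM to the average of the two rater medians (rather than to a mean) is an assumption made only for this sketch, and the function name is hypothetical.

```python
# (landmark, inter-rater SEM, median rater 1, median rater 2), all in N/cm²,
# taken from Table 1 and Table 2.
inter_rater = [
    ("Ankle", 7.7, 44.0, 44.7),
    ("Knee", 10.3, 62.3, 62.6),
    ("Elbow", 9.8, 60.5, 59.4),
    ("Sternum", 4.1, 40.0, 41.2),
    ("Forehead", 5.3, 33.5, 34.1),
]


def relative_sem(sem, median_a, median_b):
    """SEM as a percentage of the average of the two rater medians (assumption)."""
    return sem / ((median_a + median_b) / 2) * 100


for name, sem_value, m1, m2 in inter_rater:
    print(f"{name}: {relative_sem(sem_value, m1, m2):.1f} %")
```

Read this way, the gap between the knee (about 16.5 %) and the sternum (about 10.1 %) is proportionally smaller than the absolute SEM values of 10.3 and 4.1 N/cm² suggest.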

The sex-stratified findings (see Table 3) further suggest broadly comparable reliability between male and female participants, although these subgroup analyses were exploratory and not powered to detect small sex-specific differences.

The major strength of the present study is that PPTs at joint and reference landmarks were tested for both intra- and inter-rater reliability in a sample that was based on a sample size calculation and included both sexes. The results build on the recently published reference values for these landmarks,¹⁰ and together these two studies provide a solid basis for the future clinical and research use of these PPT measurements. However, some limitations should be mentioned to adequately interpret the results presented. The sample comprised only healthy, pain-free, physically active male and female adults aged 18–40 years, which limits the generalizability of the findings to older adults, highly sedentary individuals, and clinical populations with acute or chronic pain conditions. Moreover, although the total sample size was calculated a priori and was considered adequate for the primary reliability analyses, the study was not powered to detect small subgroup effects. Therefore, the sex-stratified ICC analyses should be interpreted as exploratory and descriptive rather than as confirmatory evidence for or against sex-specific differences in PPT reliability. Further, although intra- and inter-rater reliability were assessed across repeated sessions separated by at least 72 h, the study design did not evaluate long-term measurement stability over extended time periods, which would be relevant for longitudinal monitoring in clinical or research settings. No information regarding contraceptive use, phase of the menstrual cycle, or further pain phenomena associated with the menstrual cycle (e.g., dysmenorrhea or recurrent menstrual pain³⁹) was obtained in female participants. Yet, all participants indicated a subjective pain intensity of 0, making it unlikely that such pain states were present in female participants of this study.
In addition, potential short-term influences on pain perception, such as alcohol or nicotine consumption prior to testing, were not systematically recorded. Although participants were instructed to avoid vigorous physical activity and analgesic medication before testing, these additional factors may have contributed to interindividual variability in PPT values. Lastly, body composition and local subcutaneous tissue thickness were not assessed. This should be acknowledged because experimental and modeling data indicate that overlying adipose tissue might influence stress and strain transmission during pressure algometry, particularly when deeper soft tissues such as muscle are targeted.²¹ However, the use of articular and osseous landmarks may also be regarded as a methodological strength of the present protocol. Due to the superficial anatomical location of the selected joint-related and bony landmarks, the influence of overlying subcutaneous adipose tissue is likely lower than at many muscular landmarks, where the target tissue is commonly covered by more soft tissue.²⁰ This may allow comparatively direct and standardized mechanical stimulation of the intended anatomical site, although future studies should verify this by assessing local tissue thickness and by including soft-tissue comparison landmarks since such soft-tissue landmarks were not considered in the present study.

Conclusion

This study demonstrates good to excellent intra- and inter-rater reliability of PPT measurements at articular and osseous reference landmarks in healthy male and female individuals. The presentation of SEM values enhances the use of these measures by defining thresholds for detecting changes beyond measurement error. These findings support the use of joint PPT in standardized pain sensitivity assessments and provide a basis for clinical and research applications. However, results are limited to healthy, pain-free participants of both sexes, and future research should confirm these findings in clinical populations.

Footnotes

Acknowledgments

The authors would like to thank Nina Simonis, Lina Schmidt, and Isabella Knapp for their support in this study.

ORCID iDs

Fabian Tomschi

Thomas Hilberg

Ethical considerations

Ethical approval for both study parts was obtained by the ethics committee of the University of Wuppertal (MS/AE 220811, SK/AE 240803, MS/AE 220203, MS/BBL 190517).

Consent to participate

All participants provided written informed consent to participate in this study.

Author contributions

FT wrote the manuscript, performed statistical analyses, performed data curation, was responsible for data acquisition, and had the idea of the study. TH revised the manuscript, supervised the study, and had the idea of the study.

Funding

The authors received no financial support for the research, authorship, and/or publication of this article.

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Data Availability Statement

The data that support the findings of this study are available from the corresponding author upon reasonable request.

References

Mücke

Cuhls

Radbruch

, et al. Quantitative sensorische testung (QST). Schmerz 2021; 35: 153–160.

Stausholm

Bjordal

Moe-Nilssen

, et al. Pain pressure threshold algometry in knee osteoarthritis: intra- and inter-rater reliability. Physiother Theory Pract 2023; 39: 615–622. https://doi.org/10.1080/09593985.2021.2023929

Dworkin

Turk

Farrar

, et al. Core outcome measures for chronic pain clinical trials: IMMPACT recommendations. Pain 2005; 113: 9–19. https://doi.org/10.1016/j.pain.2004.09.012

Sánchez-Sabater

Suso-Martí

Núñez-Cortés

, et al. Optimizing acute pain relief in severe knee osteoarthritis: the influence of resistance exercise volume and psychosocial factors. Musculoskelet Sci Pract 2025; 79: 103390. https://doi.org/10.1016/j.msksp.2025.103390

Amiri

Alavinia

Singh

, et al. Pressure pain threshold in patients with chronic pain: a systematic review and meta-analysis. Am J Phys Med Rehabil 2021; 100: 656–674.

Bhattacharyya

Hopkinson

Nolet

, et al. The reliability of pressure pain threshold in individuals with low back or neck pain: a systematic review. Br J Pain 2023; 17: 579–591. https://doi.org/10.1177/20494637231196647

Tomschi

Meder

Hilberg

. Effects of whole-body strength training on pain and strength in chronic low back pain. A Randomized controlled trial. J Back Musculoskelet Rehabil. 2026; 10538127261448967. https://doi.org/10.1177/10538127261448967, In press.

8. Belavy, van Oosterwijck, Clarkson, et al. Pain sensitivity is reduced by exercise training: evidence from a systematic review and meta-analysis. Neurosci Biobehav Rev 2021; 120: 100–108. https://doi.org/10.1016/j.neubiorev.2020.11.012

9. Rice, Nijs, Kosek, et al. Exercise-induced hypoalgesia in pain-free and chronic pain populations: state of the art and future directions. J Pain 2019; 20: 1249–1266. https://doi.org/10.1016/j.jpain.2019.03.005

10. Tomschi, Schmidt, Brühl, et al. Reference values of joint-specific pressure pain thresholds in healthy male individuals: a retrospective study. Eur J Pain 2025; 29: e70050. https://doi.org/10.1002/ejp.70050

11. Mailloux, Beaulieu L-D, Wideman, et al. Within-session test-retest reliability of pressure pain threshold and mechanical temporal summation in healthy subjects. PLoS One 2021; 16: e0245278. https://doi.org/10.1371/journal.pone.0245278

12. Oliveira FCL, Cossette, Mailloux, et al. Within-session test-retest reliability of pressure pain threshold and mechanical temporal summation in chronic low back pain. Clin J Pain 2023; 39: 217–225. https://doi.org/10.1097/AJP.0000000000001106

13. Saebø, Naterstad, Stausholm, et al. Reliability of pain pressure threshold algometry in persons with conservatively managed wrist fractures. Physiother Res Int 2020; 25: e1797. https://doi.org/10.1002/pri.1797

14. Liew, Lee, Rügamer, et al. A novel metric of reliability in pressure pain threshold measurement. Sci Rep 2021; 11: 6944. https://doi.org/10.1038/s41598-021-86344-6

15. Kosek, Ekholm, Nordemar. A comparison of pressure pain thresholds in different tissues and body regions. Long-term reliability of pressure algometry in healthy volunteers. Scand J Rehabil Med 1993; 25: 117–124.

16. Chung, Kim. Reliability and validity of the pressure pain thresholds (PPT) in the TMJ capsules by electronic algometer. Cranio J Craniomandib Sleep Pract 1993; 11: 171–176. https://doi.org/10.1080/08869634.1993.11677961

17. Koh, Paul, Nesovic, et al. Reliability and minimal detectable difference of pressure pain thresholds in a pain-free population. Br J Pain 2023; 17: 239–243. https://doi.org/10.1177/20494637221147185

18. Waller, Straker, O'Sullivan, et al. Reliability of pressure pain threshold testing in healthy pain free young adults. Scand J Pain 2015; 9: 38–41. https://doi.org/10.1016/j.sjpain.2015.05.004

19. Walton, MacDermid, Nielson, et al. Reliability, standard error, and minimum detectable change of clinical pressure pain threshold testing in people with and without acute neck pain. J Orthop Sports Phys Ther 2011; 41: 644–650. https://doi.org/10.2519/jospt.2011.3666

20. Finocchietti, Nielsen, Mørch, et al. Pressure-induced muscle pain and tissue biomechanics: a computational and experimental study. Eur J Pain 2011; 15: 36–44. https://doi.org/10.1016/j.ejpain.2010.05.010

21. Kosek, Ekholm, Hansson. Pressure pain thresholds in different tissues in one body region. The influence of skin sensitivity in pressure algometry. Scand J Rehabil Med 1999; 31: 89–93. https://doi.org/10.1080/003655099444597

22. Arendt-Nielsen, Nie, Laursen, et al. Sensitization in patients with painful knee osteoarthritis. Pain 2010; 149: 573–581. https://doi.org/10.1016/j.pain.2010.04.003

23. Moss, Knight, Wright. Subjects with knee osteoarthritis exhibit widespread hyperalgesia to pressure and cold. PLoS One 2016; 11: e0147526. https://doi.org/10.1371/journal.pone.0147526

24. Puntillo, Giglio, Paladini, et al. Pathophysiology of musculoskeletal pain: a narrative review. Ther Adv Musculoskelet Dis 2021; 13: 1759720X21995067. https://doi.org/10.1177/1759720X21995067

25. Reezigt, Slager GEC, Coppieters, et al. Novice assessors demonstrate good intra-rater agreement and reliability when determining pressure pain thresholds; a cross-sectional study. PeerJ 2023; 11: e14565. https://doi.org/10.7717/peerj.14565

26. Kottner, Audigé, Brorson, et al. Guidelines for reporting reliability and agreement studies (GRRAS) were proposed. J Clin Epidemiol 2011; 64: 96–106.

27. Bull, Al-Ansari, Biddle, et al. World Health Organization 2020 guidelines on physical activity and sedentary behaviour. Br J Sports Med 2020; 54: 1451–1462. https://doi.org/10.1136/bjsports-2020-102955

28. Thornton, Baird, Sheffield. Athletes and experimental pain: a systematic review and meta-analysis. J Pain 2024; 25: 104450. https://doi.org/10.1016/j.jpain.2023.12.007

29. Tomschi, Schulz, Stephan, et al. Short all-out isokinetic cycling exercises of 90 and 15 s unlock exercise-induced hypoalgesia. Eur J Pain 2024.

30. Tomschi, Schmidt, Soffner, et al. Hypoalgesia after aerobic exercise in healthy subjects: a systematic review and meta-analysis. J Sports Sci 2024; 42: 574–588. https://doi.org/10.1080/02640414.2024.2352682

31. Bisset, Evans, Tuttle. Reliability of 2 protocols for assessing pressure pain threshold in healthy young adults. J Manip Physiol Ther 2015; 38: 282–287. https://doi.org/10.1016/j.jmpt.2015.03.001

32. Tabachnick, Fidell. Using multivariate statistics. 5th ed., international ed. [reprint]. Pearson Allyn and Bacon, 2009.

33. Shrout, Fleiss. Intraclass correlations: uses in assessing rater reliability. Psychol Bull 1979; 86: 420–428. https://doi.org/10.1037//0033-2909.86.2.420

34. Koo. A guideline of selecting and reporting intraclass correlation coefficients for reliability research. J Chiropract Med 2016; 15: 155–163. https://doi.org/10.1016/j.jcm.2016.02.012

35. Walter, Eliasziw, Donner. Sample size and optimal designs for reliability studies. Stat Med 1998; 17: 101–110. https://doi.org/10.1002/(sici)1097-0258(19980115)

36. Mayorga-Vega, Jimenez-Cebrian, Barón-López, et al. Reliability of the use of foot pressure pain threshold in adults: a test-retest analysis. PeerJ 2025; 13: e19875. https://doi.org/10.7717/peerj.19875

37. Stratford, Goldsmith. Use of the standard error as a reliability index of interest: an applied example using elbow flexor strength data. Phys Ther 1997; 77: 745–750. https://doi.org/10.1093/ptj/77.7.745

38. Srimurugan Pratheep, Madeleine, Arendt-Nielsen. Relative and absolute test-retest reliabilities of pressure pain threshold in patients with knee osteoarthritis. Scand J Pain 2018; 18: 229–236. https://doi.org/10.1515/sjpain-2018-0017

39. Fortún-Rabadán, Boudreau, Bellosta-López, et al. Facilitated central pain mechanisms across the menstrual cycle in dysmenorrhea and enlarged pain distribution in women with longer pain history. J Pain 2023; 24: 1541–1554. https://doi.org/10.1016/j.jpain.2023.04.005