Abstract
Background
The Pennsylvania Shoulder Score is a common patient-reported measure of shoulder pain, function, and satisfaction. Cross-cultural adaptation is essential for non-English-speaking populations.
Methods
This prospective cross-sectional validation of the Arabic Pennsylvania Shoulder Score (PSS-AR) followed established translation and cultural adaptation steps. Adults ≥18 years with shoulder pain/dysfunction were recruited from two outpatient clinics. Participants completed the PSS-AR (Pain 0–30; Satisfaction 0–10; Function 0–60; Total = sum, higher = better) and the Arabic Simple Shoulder Test (SST-AR; 0–100). Readability was assessed using the Automatic Arabic Readability Index (AARI). Stable participants repeated the PSS-AR after 5–10 days.
Results
219 patients participated; 102 stable individuals completed retesting. AARI showed readability levels of grade 5.24 for PSS-AR and 2.92 for SST-AR, below the recommended sixth-grade threshold. Function showed excellent internal consistency (Cronbach's α = 0.96). Exploratory factor analysis supported essential unidimensionality: factor 1 explained 54.90% of variance (factor 2 added 5.19%; cumulative 60.09%) with loadings 0.47–0.85. Convergent validity with SST-AR was strong (Spearman ρ: total 0.792, function 0.788, pain 0.538; all p < 0.0001). Test–retest reliability was excellent: ICC(2,1) = 0.95 (95% confidence interval 0.93–0.97). Mean change was minimal (Δ = −0.47 ± 9.36) without systematic shift (F = 0.09, p = 0.76). The standard error of measurement (SEM) was 4.98; the minimal detectable change (MDC) was 13.80 for an individual (MDC_individual) and 1.37 for a group (MDC_group).
Conclusion
The PSS-AR is reliable, valid, and culturally appropriate for Arabic-speaking patients.
Introduction
In modern medical practice, quality-of-life assessment has emerged as a critical endpoint, especially in orthopedic care. Now an indispensable outcome measure, it is regarded as equally important as surgical and postoperative results when evaluating the effectiveness of an intervention and is increasingly employed in studies.1–4 Quality-of-life metrics capture patients' viewpoints, perspectives, and preferences. They serve as a pivotal indicator of patient satisfaction, which has potential implications for long-term outcomes. 5 Furthermore, monitoring patient-reported outcome measures (PROMs) yields actionable insights into recovery. Individuals who frequently completed PROMs while receiving automated alerts demonstrated significantly better outcomes than those receiving standard care. 6
Patients' personal standpoints give quality-of-life metrics a highly subjective dimension. To ensure consistency among measurements, reliable comparison across studies, and proper appraisal and synthesis of evidence, these metrics must be standardized. Standardized questionnaires systematically assess variables such as pain and translate them into quantifiable data that clinicians can use to judge improvement objectively. The Pennsylvania Shoulder Score (PSS) is increasingly adopted in the literature as an evaluation tool for function, satisfaction, and pain levels. It is a 100-point shoulder-specific self-report questionnaire comprising three subscales. The PSS has demonstrated favorable measurement properties, providing reliable and valid evaluation of patients with diverse shoulder disorders.7,8
For this tool to be used effectively in diverse cultural and linguistic contexts, it requires accurate translation that preserves its validity. In non-English-speaking countries, the lack of validated instruments remains a real challenge, necessitating cross-cultural adaptation. Translating an existing questionnaire instead of developing a new one allows direct comparison and valid examination of functional status across a wide range of populations. 9
This study is the first to translate the PSS questionnaire into Arabic for Arab patients presenting with upper limb pathology. By establishing both the reliability and validity of the Arabic version, it addresses the cultural and linguistic gap while enabling objective and standardized patient assessment. Hence, it may improve the quality of care while also creating new opportunities for clinical research within the Arab world.
Materials and methods
Study design and eligibility criteria
A prospective cross-sectional validation study was conducted to validate the PSS-AR. The Arabic Simple Shoulder Test (SST-AR), previously validated, served as the comparator. 10
The PSS comprises three domains: pain (0–30), satisfaction (0–10), and function (20 items; 0–60); total equals pain + satisfaction + function, with higher scores indicating better status. The SST was scored per the manual and transformed to 0–100 for comparability. Analyses used native subscale ranges and a 0–100 total scale (higher = better).
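The scoring rule described above can be sketched as follows. This is a minimal illustration, not the authors' scoring procedure: the function names are hypothetical, the subscale ranges are taken from the text, and the rescaling helper assumes a simple linear transformation (the raw maximum of 12 in the usage note is an illustrative assumption, not a value stated in this article).

```python
# Subscale ranges as stated in the text: pain 0-30, satisfaction 0-10, function 0-60.
RANGES = {"pain": (0, 30), "satisfaction": (0, 10), "function": (0, 60)}


def pss_total(pain, satisfaction, function):
    """Return the 0-100 total as pain + satisfaction + function (higher = better)."""
    scores = {"pain": pain, "satisfaction": satisfaction, "function": function}
    for name, value in scores.items():
        low, high = RANGES[name]
        if not low <= value <= high:
            raise ValueError(f"{name} score {value} is outside {low}-{high}")
    return pain + satisfaction + function


def rescale_to_100(raw, raw_max):
    """Linearly rescale a raw score on 0..raw_max to 0..100 (hypothetical helper)."""
    if raw_max <= 0 or not 0 <= raw <= raw_max:
        raise ValueError("raw score must lie within 0..raw_max")
    return 100.0 * raw / raw_max
```

For example, `pss_total(20, 7, 33.5)` gives 60.5, and `rescale_to_100(6, 12)` gives 50.0 under the assumed raw maximum.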
Consecutive patients were recruited over 2 years from the waiting rooms of two outpatient clinics during routine visits until the target sample size (>200 participants) was reached.
Inclusion criteria were adults (≥18 years) presenting with shoulder pain and/or dysfunction across common diagnoses (e.g. rotator cuff-related disorders, instability, glenohumeral osteoarthritis, adhesive capsulitis).
Exclusion criteria were a primary pain source outside the shoulder (e.g. cervical radiculopathy, brachial plexus injury, referred cardiac/visceral pain), acute traumatic injuries requiring urgent management, inability to provide informed consent or complete a patient-reported questionnaire, and insufficient proficiency in Arabic.
Permissions and authorizations
Permission to translate and adapt the PSS into Arabic was obtained from the original authors/copyright holders. 7 Permission to use the validated SST-AR as a comparator was also obtained from its authors. 10
Translation
Cross-cultural adaptation followed established guidelines. 11 Three independent forward translations of the original English Pennsylvania Shoulder Score into Arabic were produced (one sworn translator and two shoulder surgeons) and reconciled into a single forward version. A back-translation into English was then performed by a translator who is a native English speaker and blinded to the original. An expert panel (statistician/researcher, two shoulder clinicians, and sworn translators) reviewed equivalence between the back-translation and the original and resolved discrepancies. A pilot study with 10 patients assessed comprehensibility; any misunderstood items were revised. Final proofreading by all authors produced the PSS-AR. Readability of the final Arabic patient-facing questionnaires was assessed using the Automatic Arabic Readability Index (AARI). The final Arabic text of both administered PROMs, PSS-AR and SST-AR, was analyzed as presented to participants. AARI was selected because it is an Arabic-specific readability measure incorporating lexical and sentence-level features, with the final index derived from the number of characters, average characters per word, and average words per sentence. 12
Procedures and data capture
All patients completed both questionnaires (PSS-AR and SST-AR) using a standardized Google Form after providing informed consent. An author was available to offer assistance only if needed.
Furthermore, participants judged to be clinically stable (i.e. no initiation or change of treatment and no surgery during the interval) were invited to complete the PSS-AR again after 5–10 days, an interval chosen to reduce recall of prior responses while minimizing true clinical change. Contact information was recorded solely for this follow-up.
All responses were exported to a single Excel spreadsheet for statistical analysis, and each participant was assigned a unique study code to ensure anonymity.
Ethical approval and consent
Ethical approval was obtained from the authors’ affiliated institutions. The study adhered to the Declaration of Helsinki, and informed consent was obtained from all participants.
Statistical analysis
All analyses were conducted in XLSTAT (Addinsoft, Paris, France). Two-sided tests were used with α = 0.05. Effect sizes and 95% confidence intervals (CIs) are reported where applicable. Analyses were performed on complete cases; for factor analysis, listwise deletion was applied.
(a) Descriptive statistics and interpretability
For PSS-AR subscales (pain, satisfaction, function) and total, and for SST-AR total, distributions were summarized using minimum, maximum, median, and interquartile range (IQR). Interpretability was examined via floor and ceiling effects, defined as the proportion of respondents at the theoretical minimum or maximum; values <15% were considered acceptable. Readability results were reported descriptively for each Arabic PROM.
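The floor and ceiling definition above can be expressed in a few lines. This is a sketch under the stated definition (share of respondents at the theoretical minimum or maximum, acceptable when below 15%); the function names and the sample scores are illustrative and are not study data.

```python
def floor_ceiling(scores, minimum, maximum):
    """Return (floor %, ceiling %): share of respondents at the theoretical bounds."""
    n = len(scores)
    if n == 0:
        raise ValueError("scores must not be empty")
    floor_pct = 100.0 * sum(1 for s in scores if s == minimum) / n
    ceiling_pct = 100.0 * sum(1 for s in scores if s == maximum) / n
    return floor_pct, ceiling_pct


def is_acceptable(pct, threshold=15.0):
    """Apply the <15% criterion used in the text."""
    return pct < threshold


# Illustrative pain scores on the 0-30 scale (made-up values).
example = [0, 10, 30, 30, 15, 20, 25, 5, 12, 18]
floor_pct, ceiling_pct = floor_ceiling(example, 0, 30)
```

In this made-up sample, one of ten respondents sits at the minimum and two at the maximum, so the ceiling share would fail the 15% criterion while the floor share would pass.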
(b) Internal consistency (function subscale)
Internal consistency of the 20-item Function subscale was evaluated using Cronbach's α with 95% CI, corrected item-total correlations (CITC), and “α if item deleted.” CITC ≥0.30 were considered acceptable, with higher values indicating stronger item–scale coherence.
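For readers who want to see what these statistics compute, the sketch below implements Cronbach's α and the corrected item-total correlation with the Python standard library. The study itself used XLSTAT; this code is an independent illustration with hypothetical function names, it omits the confidence interval, and it assumes complete data in which every item and every total has non-zero variance.

```python
from statistics import mean, variance


def cronbach_alpha(rows):
    """rows: one list of k item scores per respondent. Uses sample variances."""
    k = len(rows[0])
    columns = list(zip(*rows))
    item_variance_sum = sum(variance(col) for col in columns)
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def corrected_item_total(rows):
    """Correlate each item with the sum of the remaining items (CITC)."""
    columns = list(zip(*rows))
    totals = [sum(row) for row in rows]
    result = []
    for col in columns:
        rest = [t - c for t, c in zip(totals, col)]  # total without this item
        result.append(pearson(col, rest))
    return result
```

"α if item deleted" can be obtained by calling `cronbach_alpha` on the rows with one column removed at a time.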
(c) Structural validity (function subscale)
Exploratory factor analysis (EFA) was conducted on the 20 Function items. Sampling adequacy was assessed with the Kaiser–Meyer–Olkin statistic and Bartlett's test of sphericity. Extraction used principal axis factoring; an oblique rotation (direct oblimin) was applied to allow for correlated factors. Factor retention was guided by the scree plot, eigenvalues >1, and interpretability (parallel analysis considered when available). Factor loadings ≥0.40 were deemed salient.
(d) Convergent validity
Convergent validity was examined between PSS-AR Total and SST-AR Total, with secondary analyses correlating each PSS-AR subscale with SST-AR Total. Spearman's rank correlation (ρ) was used because both instruments yield bounded, ordinal-derived scores with non-normal, tie-prone distributions and an expected monotonic (not strictly linear) relationship. Correlations were interpreted with 95% CI.
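Because the scores are tie-prone, the rank correlation is best understood as a Pearson correlation computed on average ranks. The sketch below shows that construction with the standard library only. It is an illustration rather than the XLSTAT routine used in the study, the function names are hypothetical, and it returns ρ alone without a p-value or confidence interval.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = shared_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho as Pearson correlation of average ranks.

    Assumes equal-length inputs and that neither input is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / (sxx * syy) ** 0.5
```

Any monotonic relationship yields |ρ| = 1 regardless of linearity, which is why the measure suits bounded questionnaire scores.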
(e) Short-interval change (5–10 Days)
For participants reassessed within 5–10 days, change scores (Δ = Time2 − Time1) were summarized as mean Δ ± SD with 95% CI for PSS-AR Total and subscales. This interval was chosen to reduce recall of prior responses while limiting true clinical change.
(f) Test–retest reliability and measurement error (stable subsample)
In participants judged clinically stable over the 5–10-day interval, test–retest reliability of PSS-AR Total was estimated using ICC(2,1) (two-way random effects, absolute agreement, single measures) with 95% CI. Systematic change between Time1 and Time2 was examined (paired testing appropriate to the distribution). Measurement error indices were derived as follows:
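The ICC(2,1) named above is derived from the mean squares of a two-way ANOVA (subjects by occasions). The sketch below shows that computation with the standard library. It is an illustration with a hypothetical function name, not the XLSTAT implementation used in the study, and it returns the point estimate only, without a confidence interval.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single measures.

    ratings: one list per subject holding k repeated scores (k = 2 for test-retest).
    """
    n = len(ratings)
    k = len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between occasions
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_error / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
```

Because this form measures absolute agreement, a constant shift between occasions lowers the coefficient even when subjects keep the same rank order: `icc_2_1([[10, 12], [20, 22], [30, 32]])` gives 200/204, or about 0.98, rather than 1.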
Standard error of measurement (SEM) = SD × √(1 − ICC), using the Time1 SD of the stable subsample. Minimal detectable change for an individual (MDC_individual) = 1.96 × √2 × SEM. Minimal detectable change for a group (MDC_group) = MDC_individual / √n.
These indices provide thresholds to interpret score changes beyond measurement error.
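The three formulas above translate directly into code. The sketch below uses hypothetical function names; the inputs in the usage note are values reported in this article.

```python
import math


def sem(sd, icc):
    """Standard error of measurement: SD x sqrt(1 - ICC)."""
    return sd * math.sqrt(1 - icc)


def mdc_individual(sem_value, z=1.96):
    """Minimal detectable change for one patient at 95% confidence."""
    return z * math.sqrt(2) * sem_value


def mdc_group(mdc_ind, n):
    """Minimal detectable change for a group of n patients."""
    return mdc_ind / math.sqrt(n)


# Reported SEM of 4.98 and the stable subsample of n = 102.
individual = mdc_individual(4.98)
group = mdc_group(individual, 102)
```

With the reported SEM of 4.98, `individual` rounds to 13.80 and `group` to 1.37, matching the reported values. Entering the rounded ICC (0.95) and the Time-1 SD (22.96) into `sem` gives about 5.13 rather than 4.98; the difference presumably reflects the use of an unrounded ICC in the original analysis.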
Results
Translation and cross-cultural adaptation of the Arabic Pennsylvania Shoulder Score
Following three independent forward translations (one sworn translator and two shoulder surgeons), reconciliation into a single version, blinded back-translation by a native English translator, expert panel review (statistician/researcher, two shoulder clinicians, and sworn translators), a 10-patient pilot for comprehensibility, and final proofreading, the finalized PSS-AR is presented in Supplementary Figure 1. Readability analysis using AARI showed that the PSS-AR had an estimated readability level of grade 5.24, while the SST-AR had an estimated readability level of grade 2.92. Both instruments were therefore easier than the recommended sixth-grade threshold for patient-facing materials.
Study population and demographics
A total of 219 patients were enrolled. Of these, 102 were judged clinically stable and completed the PSS-AR again 5–10 days later. Age was not normally distributed (Shapiro–Wilk p = 0.011); median age was 54 years (IQR 40.25–65.75). The female-to-male ratio was 126:93, and dominant limb distribution was right:left = 198:21.
Interpretability
Observed score ranges spanned the full theoretical intervals (Figure 1). Floor and ceiling effects (Table 1) did not exceed 15.00% for any scale, indicating adequate coverage without boundary saturation.

Score distributions for the Arabic Pennsylvania Shoulder Score (PSS-AR) and Arabic Simple Shoulder Test (SST-AR) (n = 219): (a) PSS-AR total, function, pain, and satisfaction; (b) SST-AR total. Box-and-whisker plots show median (line), IQR (box), and range (whiskers); points denote individual observations. IQR: interquartile range.
Descriptive statistics, floor/ceiling effects, and normality (Shapiro–Wilk) for PSS-AR (total and subscales) and SST-AR (n = 219).
PSS-AR: Arabic Pennsylvania Shoulder Score; SST-AR: Arabic Simple Shoulder Test.
Reliability / consistency
(a) Internal consistency (function subscale)
The 20-item Function subscale demonstrated excellent internal consistency, with Cronbach's α = 0.96. CITCs were all positive and ranged from 0.50 to 0.88. “α if item deleted” was 0.96 for every item, indicating that removal of any single item would not meaningfully improve internal consistency. On its native 0–60 scale, the function score had a mean of 33.67 with an SD of 15.42.
(b) Test–retest reliability
In a clinically stable subsample (n = 102), PSS-AR Total showed excellent reliability: ICC(2,1) = 0.95 with 95% CI 0.93–0.97. Time-1 scores were 55.59 ± 22.96 and Time-2 scores were 55.88 ± 23.45. Changes were small: PSS-AR total Δ = −0.47 ± 9.36, pain Δ = −0.03 ± 2.54, satisfaction Δ = −0.86 ± 3.97, and function Δ = 0.38 ± 7.68. No systematic change was detected (F = 0.09, p = 0.76). The derived SEM was 4.98, the MDC_individual was 13.80, and the MDC_group was 1.37 on the 0–100 scale, indicating that individual improvements ≥ 13.80 points exceed measurement error with 95% confidence.
(c) Construct/structural validity
EFA supported essential unidimensionality. The scree plot (Figure 2) showed a clear elbow after the first factor. Factor 1 explained 54.90% of the variance, and factor 2 added 5.19% (cumulative 60.09%). All 20 items loaded strongly on factor 1, with loadings 0.47–0.85; secondary loadings on other factors were small. These findings justify the use of a single summed Function score.

Scree plot.
(d) Convergent validity
Positive, monotonic associations were observed between PSS-AR and SST-AR scores. Correlations were strong for PSS-AR total with SST-AR total (Spearman ρ = 0.792; p < 0.0001) and for the function subscale with SST-AR total (ρ = 0.788; p < 0.0001), and moderate for the pain subscale with SST-AR total (ρ = 0.538; p < 0.0001) (Figures 3–5).

Convergent validity: PSS-AR total vs SST-AR total (Spearman ρ = 0.792; p < 0.0001). PSS-AR: Arabic Pennsylvania Shoulder Score; SST-AR: Arabic Simple Shoulder Test.

Convergent validity: PSS-AR function vs SST-AR total (Spearman ρ = 0.788; p < 0.0001). PSS-AR: Arabic Pennsylvania Shoulder Score; SST-AR: Arabic Simple Shoulder Test.

Convergent validity: PSS-AR pain vs SST-AR total (Spearman ρ = 0.538; p < 0.0001). PSS-AR: Arabic Pennsylvania Shoulder Score; SST-AR: Arabic Simple Shoulder Test.
Discussion
The present study demonstrated excellent psychometric properties of the PSS-AR. The ICC(2,1) of 0.95 (95% CI 0.93–0.97) and a Cronbach's α > 0.95 indicate high test–retest reliability and internal consistency, respectively.13,14 PSS-AR scores showed good concordance with the SST-AR, a well-established and widely used instrument for shoulder function assessment, supporting good construct validity.15–17 Additionally, the PSS-AR showed favorable interpretability and measurement-error properties. Its ease of use among Arabic-speaking patients supports its suitability as a PROM and its wider implementation across Arab countries as a reliable and practical tool in both clinical and research settings for diverse patient populations.
Furthermore, this study's findings are largely consistent with the original PSS validation. 7 In prior cohorts, the original PSS demonstrated excellent reliability and corresponding responsiveness. A recent systematic review reported excellent ICC values ranging from 0.90 to 0.94 and high construct validity (0.75 < r < 0.96), with a pooled MDC₉₀ of 12.13 and an effect size of 0.85. Therefore, this study shows strong consistency with the validity and reliability of the PSS in other populations.
The PSS, being one of the more comprehensive shoulder-specific PROMs, integrates subscales for pain, function, and patient satisfaction, allowing both domain-specific and total scoring. 7 In contrast to other PROMs such as the SST and the University of California–Los Angeles (UCLA) Shoulder Scale, the PSS had not previously been translated and validated in Arabic.10,18 Additionally, the PSS translations into Turkish and Portuguese highlight its global recognition as a valid and useful tool for outcome measurement in shoulder disorders, reflecting its perceived value, clinical applicability, and psychometric strength.19,20 Moreover, it is estimated that over 420 million people worldwide speak Arabic. 21 The large number of Arabic-speaking patients, coupled with the lack of a validated Arabic version of the PSS, places them at a clinical and research disadvantage, restricting their inclusion in studies and reducing the representativeness of global research.
However, translation on its own is insufficient. Linguistic, conceptual, experiential, and idiomatic equivalences must be considered in cross-cultural adaptation. In order to maintain the instrument's content validity, it is well-established that PROM tools intended for cross-cultural use must not only be well translated linguistically but also culturally adapted. 9 The standard process involves forward translation, reconciliation, backward translation, expert committee review, and pretesting. 22 As a result, meaningful comparisons across populations can be ensured. In addition to linguistic and cultural equivalence, the present readability assessment provides further support that the Arabic wording of the administered PROMs is suitable for patient-facing use.
Although objective clinical assessments such as range of motion, strength testing, imaging, and physical examination remain the foundation for managing shoulder pathology, they do not fully capture a patient's lived experience. PROMs have therefore filled this gap by complementing objective assessments. Surgeons have consistently reported the importance of utilizing PROMs in patient care, as they have been shown to improve functional outcomes and postoperative recovery.23–25 Over the past years, PROMs have gained increased recognition, especially in orthopedic research.26–28 The American Academy of Orthopaedic Surgeons (AAOS) emphasizes that changes in PROM scores rank among the best measures of an orthopedic procedure's “success.” 29 Randomized trials for shoulder disorders frequently include pain and function measurements, underscoring the centrality of PROMs in modern shoulder research.30,31
As previously mentioned, the PSS includes three subscales related to pain, satisfaction, and function. Pain and function represent core domains in shoulder trials, often measured exclusively through PROMs.32,33 Functional items in shoulder PROMs generally inquire about reaching, lifting, overhead, and carrying tasks, as well as activities of daily living. A validated functional domain ensures that these tasks are culturally and linguistically appropriate and adapted. Regarding satisfaction, expectations, values, and cultural norms significantly influence how it is expressed; therefore, this domain is highly sensitive to translation quality. By validating each of the three domains above, this study establishes interpretable and comparable results by ensuring that pain scores, functional limitations, and satisfaction ratings are coherent with the original version.
Validating the PSS-AR will not only enhance inclusion and comparability in shoulder research within Arabic-speaking regions but will also guide clinical decision-making by helping identify patients who lack subjective improvement. This, in turn, will contribute to elevating the quality of care among these populations.
Nonetheless, this study presents some limitations. It was conducted solely in two clinics, which may limit generalizability across Arabic-speaking regions. The sample size, while adequate for initial validation, could restrict subgroup analyses. Lastly, some cultural and dialectal nuances might remain untested, but the questionnaire was translated into Modern Standard Arabic (Foṣḥa), the official form of the language taught and understood throughout Arabic-speaking countries, thereby ensuring broad linguistic comprehensibility despite regional variations. Although readability was assessed quantitatively, formal readability indices may not fully capture regional dialectal familiarity, health-literacy variation, or item-level patient interpretation.
Conclusion
In conclusion, the current study provides the first formally validated Arabic version of the PSS. It addresses a significant gap on both the clinical and research levels. By overcoming cultural and linguistic barriers through proper adaptation, this tool enables the standardized assessment of shoulder outcomes in Arabic-speaking populations, thereby enhancing the quality of care and research through its wide applicability.
Supplemental Material
Supplemental material, sj-docx-1-sel-10.1177_17585732261459310 for Cross-cultural adaptation and validation of the Arabic version of the Pennsylvania Shoulder Score by Marc Boutros, Guy Awad, Caren Hassan, Shaza Hammad, Antonella Leba, Sami Kais, Sami Roukoz and Rami El Abiad in Shoulder & Elbow
Footnotes
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Supplemental material
Supplemental material for this article is available online.
