Abstract
Background:
Sarcoidosis is a heterogeneous granulomatous disease of unknown aetiology, characterised by a highly variable clinical behaviour. While some patients experience a self‑limited course, others develop chronic and relapsing disease. At present, no precise phenotyping strategy exists in clinical practice to reliably predict which patients will progress to chronicity.
Objectives:
The aim of this study was to identify distinct phenotypic clusters sharing common clinical characteristics and to evaluate the predictive value of these clusters for disease outcomes.
Design:
This is a retrospective, single-center study. Histologically confirmed sarcoidosis patients (N = 68), diagnosed between January 2022 and December 2023 at the 5th Pulmonary Medicine Department of SOTIRIA Chest Diseases Hospital of Athens, were included. All patients had >2 years of follow-up.
Methods:
Multiple correspondence analysis (MCA) followed by hierarchical clustering on principal components (HCPC) was performed to identify phenotypic clusters. Logistic regression was performed to identify the prognostic factors of chronicity.
Results:
A total of 68 consecutive patients with sarcoidosis were included in the study. The mean age at diagnosis was 57.6 ± 11.1 years, with the majority being females (61.8%). Thoracic involvement, including either intrathoracic lymph nodes and/or the lungs, was the most frequently detected. Overall, fatigue, cough, and arthralgia were among the most frequently reported symptoms. Two phenotypic clusters were identified, including 63 (92.6%) and 5 (7.4%) patients. Cluster 1 was characterised by minimal extrapulmonary involvement, whereas Cluster 2 showed a higher frequency of multi-organ disease, including liver (80%), musculoskeletal (75%), spleen (67%), cardiac (50%) and skin (43%) involvement. Cluster stability was moderate to weak. The clustering phenotype was not associated with chronicity. However, involvement of lymph nodes only was associated with reduced odds of chronicity (OR 0.17, 95% CI 0.02–0.69), while arthritis was strongly associated with increased chronicity risk (OR 17.02, 95% CI 2.04–471.07). Exploratory 3-cluster analysis suggested a potential arthritis-predominant phenotype, although of low stability.
Conclusion:
Two distinct phenotypes in sarcoidosis are identified by using cluster analysis. In a sensitivity analysis that allowed for three clusters, a third cluster, characterised by the presence of arthritis and predominant eye and skin involvement, emerged. Interestingly, the presence of lone intrathoracic or extrathoracic lymphadenopathy appears to be significantly protective against chronicity.
Plain language summary
Sarcoidosis is a heterogeneous granulomatous disease of unknown aetiology, characterised by highly variable clinical behaviour. While some patients experience a self-limited course, others develop chronic and relapsing disease. At present, no precise phenotyping strategy exists in clinical practice to reliably predict which patients will progress to chronicity. This study aimed to identify groups of patients who share similar patterns of symptoms and organ involvement. The chest, mainly the lungs and nearby lymph nodes, was the most affected area. Fatigue, cough, and joint pain were among the symptoms patients reported most often. We found two groups of patients: Cluster 1 included people with minimal extrapulmonary involvement, while Cluster 2 mainly involved the liver, spleen, muscles, and heart. An exploratory analysis indicated that a third cluster may exist, marked by arthritis along with eye and skin problems. The presence of arthritis was independently associated with an increased risk of chronic disease, while having only lymph node involvement, either inside or outside the chest, seemed to protect against long-term disease.
Background
Sarcoidosis, a heterogeneous granulomatous disease of unknown cause, presents a highly variable clinical behaviour. 1 It occurs worldwide and most commonly affects young and middle-aged adults of all sexes, though its incidence shows substantial geographic variation. 2 The disease primarily affects the lungs, although it can impact virtually any organ, and unfortunately, so far, its clinical progression remains unpredictable. 3
Disease severity also varies considerably; some patients are incidentally found to have radiographic abnormalities yet remain entirely asymptomatic, whereas others experience a chronic, progressive course that proves refractory to treatment. 4 Indeed, sarcoidosis can follow two different courses: a self-limited one, seen in roughly two-thirds of patients, in which the disease resolves spontaneously within 12–36 months and a chronic one requiring prolonged treatment.5,6 Approximately 25% of patients develop chronic or progressive disease, contributing substantially to morbidity, healthcare utilisation and mortality. 7 The great variability in disease presentation and the unpredictable clinical behaviour have led to the establishment of clinical and radiological patterns, phenotypes that may facilitate disease management. 8 Indeed, clinical phenotyping in such a complex and challenging disease may allow for the identification of more clearly defined subpopulations with similar clinical characteristics and prognosis.
Historically, lung involvement patterns have been classified based on the Scadding staging system, which, although it may now be surpassed by chest computed tomography (CT), remains useful for broad prognostic assessment.1,9 However, the Scadding system lacks the accuracy needed to predict disease chronicity and does not capture the whole burden of sarcoidosis. 10 Lately, the introduction of positron emission tomography (PET)-CT into clinical practice for sarcoidosis management has prompted efforts to stratify disease according to organ involvement. 11 However, a clinical phenotype should also reflect a consistent natural history and shared clinical and physiological characteristics that mirror the underlying pathology. 8 In addition, to date, most phenotyping paradigms have relied on expert opinion, introducing potential bias, and none has been widely adopted for classification. For these reasons, cluster analyses have increasingly been used to identify distinct clinical phenotypes characterised by shared trait combinations.8,12,13 One such example is the European multicentre GenPheReSa (Genotype-Phenotype Relationship in Sarcoidosis) study, which stratified patients with sarcoidosis into five distinct subsets. 14 However, these phenotypes do not reliably predict disease severity or prognosis, and their application in routine clinical practice remains laborious, complex and difficult to implement broadly. Moreover, available data on sarcoidosis are limited and inconsistent, underscoring the need for further investigation. There remains a critical need to identify sarcoidosis phenotypes at risk for severe disease and to classify them in ways that support more effective, individualised management.
Therefore, this retrospective study aimed to validate the prognostic value of previously proposed clinical and radiological phenotypes for chronicity and, in addition, to identify novel clinical phenotypes through cluster analysis on clinical and PET-CT scan characteristics, thereby developing a predictive model for chronic disease in patients with sarcoidosis.
Design and methods
Study population
This retrospective study included all consecutive patients with biopsy-proven sarcoidosis who were diagnosed from January 2022 to December 2023 and regularly followed up at the 5th Pulmonary Medicine Department of SOTIRIA Chest Diseases Hospital of Athens, ensuring more than 2 years of follow‑up by the administrative censoring date of December 2025. Patients had to meet the following criteria: (1) a compatible clinical picture and radiological findings, (2) histological evidence of non-caseating granulomas and (3) exclusion of other diseases, including infections, common variable immunodeficiency, or malignancy (Figure 1). Data on demographics, laboratory findings, medical imaging, past medical history, treatment, pulmonary function tests and duration of symptoms or relapses were collected in detail. All patients underwent a chest radiograph, chest CT, and PET-CT scan on the initial day of diagnosis. The treatment groups, organ involvement, and clinical phenotypes were assessed based on the characteristics at the time of initial diagnosis. Organ involvement was assessed using fluorodeoxyglucose (18F-FDG) PET-CT findings, consistent with previously published findings.11,15,16 PET findings were considered positive when areas that normally exhibit minimal or no physiological uptake demonstrated increased 18F‑FDG activity exceeding background levels. A quantitative assessment of metabolic activity was performed using the maximum standardised uptake value (SUVmax) for each lesion.

Figure 1. Patient flow chart.
Disease resolution was defined as the remission of symptoms and normalisation of radiological and laboratory findings. Chronicity was defined as sarcoidosis with any of the following beyond 2 years of follow-up: (a) Persistent or worsening documented sarcoidosis-related symptoms despite ongoing systemic treatment; (b) failure to withdraw systemic treatment and need for oral corticosteroids and/or steroid-sparing agents at ⩾2 years after treatment initiation; (c) relapse, defined as recurrence or worsening of sarcoidosis-related symptoms with new or worsening radiologic abnormalities requiring re-initiation or escalation of systemic treatment after a period of remission.
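As an illustration only, the three-part chronicity definition above can be written as a simple rule. The sketch below is in Python rather than the R used for the study's analyses, and the field names (`persistent_symptoms_on_treatment`, `still_on_systemic_treatment`, `relapse_after_remission`) are hypothetical, not the variables of the study dataset.

```python
from dataclasses import dataclass


@dataclass
class FollowUp:
    """Hypothetical per-patient status assessed beyond 2 years of follow-up."""
    persistent_symptoms_on_treatment: bool  # criterion (a)
    still_on_systemic_treatment: bool       # criterion (b)
    relapse_after_remission: bool           # criterion (c)


def is_chronic(status: FollowUp) -> bool:
    """Return True if any one of criteria (a)-(c) is met."""
    return (
        status.persistent_symptoms_on_treatment
        or status.still_on_systemic_treatment
        or status.relapse_after_remission
    )
```

Because the definition is an "any of" rule, a patient meeting a single criterion, for example relapse alone, is classified as chronic.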
Informed written consent to participate in the study was provided by all participants upon the diagnosis of sarcoidosis, and anonymisation of participants’ information data was performed during data collection and processing. The reporting of this study conforms to the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement (Supplemental File). 17
18F-FDG PET/CT protocol
18F‑FDG PET/CT was performed after appropriate patient preparation to suppress physiological myocardial glucose uptake. All patients followed a 24‑h low‑carbohydrate, high‑fat diet, followed by an 18‑h fasting period. Blood glucose levels were confirmed to be <180 mg/dL prior to tracer administration. An intravenous cannula was placed 10 min before the injection of 18F‑FDG. The radiotracer was administered intravenously at a dose of 4.5–5.5 MBq/kg, and imaging commenced 40 min post-injection. A low‑dose CT scan of the brain was acquired first, followed by a 10‑min PET acquisition with the patient’s arms positioned down and the head secured in a dedicated holder. Subsequently, whole‑body imaging was performed with the patient in the supine position, with arms raised above the head when feasible. Acquisition times were 3–5 min per bed position for the standard whole‑body scan and 2 min per bed position for imaging of the lower limbs.
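The weight-based dosing stated above is simple arithmetic, sketched here in Python for illustration. The function name and the example body weight are assumptions made for this sketch; only the 4.5–5.5 MBq/kg range comes from the protocol.

```python
def fdg_activity_range_mbq(
    weight_kg: float,
    low_mbq_per_kg: float = 4.5,
    high_mbq_per_kg: float = 5.5,
) -> tuple[float, float]:
    """Return the (minimum, maximum) injected activity in MBq for a body weight."""
    if weight_kg <= 0:
        raise ValueError("weight_kg must be positive")
    return (low_mbq_per_kg * weight_kg, high_mbq_per_kg * weight_kg)
```

For a hypothetical 70 kg patient, this gives an injected activity between 315 and 385 MBq.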
Statistical analysis
Patients’ characteristics were described overall and by chronicity status. Continuous variables were summarised as means with standard deviations, following assessment of their distribution, while categorical variables were summarised as counts and corresponding percentages. Organ involvement was coded as a set of binary variables (0 = absence, 1 = presence). The final dataset contained variables for the presence of intrathoracic and extrathoracic nodal, pulmonary, cutaneous, ocular, musculoskeletal (beyond joints), cardiac, hepatic, splenic, renal and central nervous system involvement, and arthritis. All variables were treated as categorical factors. To explore associations among organ involvement patterns and reduce dimensionality, we performed multiple correspondence analysis (MCA) using the FactoMineR package in R. The MCA was run with default normalisation and included all selected organ involvement variables. Individual factor coordinates from the MCA served as the input for the subsequent hierarchical clustering on principal components (HCPC) analysis, which was used to identify patient phenotypes. This method combines Ward’s hierarchical agglomeration with consolidation through k-means clustering, as described elsewhere.18,19 The clustering was performed on the MCA individual coordinates to ensure that clusters reflected the major axes of variation in organ involvement.
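MCA is commonly described as a correspondence analysis of the complete disjunctive (indicator) table, in which each category of each variable gets its own 0/1 column. The sketch below illustrates only that coding step for binary organ involvement variables. It is written in Python for illustration and does not reproduce the FactoMineR computation used in the study; the function name and the example variables are hypothetical.

```python
def disjunctive_table(
    rows: list[dict[str, int]],
) -> tuple[list[str], list[list[int]]]:
    """Expand binary variables into one indicator column per category."""
    variables = sorted(rows[0])
    # Two columns per variable: "<name>_0" (absent) and "<name>_1" (present).
    columns = [f"{name}_{level}" for name in variables for level in (0, 1)]
    table = []
    for row in rows:
        encoded = []
        for name in variables:
            value = row[name]
            if value not in (0, 1):
                raise ValueError(f"{name} must be coded 0 or 1")
            encoded.extend([1 - value, value])
        table.append(encoded)
    return columns, table


# Two hypothetical patients: one with liver involvement, one without.
columns, table = disjunctive_table(
    [{"liver": 1, "skin": 0}, {"liver": 0, "skin": 0}]
)
```

Each row of the resulting table sums to the number of variables, since every patient falls into exactly one category per variable.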
A two-cluster solution was chosen a priori based on inspection of the hierarchical dendrogram structure and so as to maximise stability, given the relatively small sample size (N = 68). Each patient was assigned to one of the two clusters. Cluster descriptions were obtained from the HCPC output by assessing over- and under-representation of each organ involvement modality using χ²-based tests. To evaluate the stability of the identified clusters, we performed bootstrap resampling. Briefly, for each bootstrap iteration, a sample of equal size was drawn with replacement, MCA was recomputed, and HCPC was applied using two clusters. Cluster labels from the bootstrap solution were then mapped back to the original subject indices. Cluster stability was quantified using the Jaccard similarity index, defined as the proportion of overlapping subjects between the original cluster and its bootstrap analogue. For each cluster, 10,000 bootstrap replications were performed, and mean Jaccard values were computed to assess stability. Following conventional thresholds, mean Jaccard indices ⩾0.75 were considered highly stable, 0.60–0.75 moderately stable, 0.50–0.60 indicative of weak cluster reproducibility, and <0.50 were considered unstable. A sensitivity, hypothesis-generating analysis with three clusters was also performed.
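The Jaccard index and the stability thresholds described above can be sketched as follows. This is a Python illustration under stated assumptions, not the study's R code: the function names are hypothetical, clusters are represented as sets of subject indices, and each lower threshold bound is treated as inclusive, a choice the text does not specify.

```python
def jaccard(original: set[int], bootstrap: set[int]) -> float:
    """Size of the intersection divided by size of the union of two clusters."""
    union = original | bootstrap
    if not union:
        return 1.0  # convention assumed here for two empty clusters
    return len(original & bootstrap) / len(union)


def stability_label(mean_jaccard: float) -> str:
    """Map a mean Jaccard index to the stability categories used in the text."""
    if mean_jaccard >= 0.75:
        return "highly stable"
    if mean_jaccard >= 0.60:
        return "moderately stable"
    if mean_jaccard >= 0.50:
        return "weak"
    return "unstable"
```

For example, clusters {1, 2, 3} and {2, 3, 4} share two of four distinct subjects, giving a Jaccard index of 0.5, which falls in the weak category.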
To allow intuitive visualisation of patterns of organ involvement and relationships between variables, patients and variables were visualised using scatter plots based on MCA dimensions, using the factoextra R package.
To identify the independent prognostic factors of chronicity, multivariable logistic regression analysis was performed. The different staging and phenotyping variables were included in separate models, along with all individuals’ characteristics (sex, age and smoking). The variable selection was based on a stepwise procedure.
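As background to the odds ratios reported from these models, a logistic regression coefficient is converted to an odds ratio by exponentiation. The Python sketch below assumes a Wald-type 95% confidence interval, which is only one common choice; the text does not state which interval method was used, and the function name and inputs are hypothetical.

```python
import math


def odds_ratio_wald(
    beta: float, se: float, z: float = 1.96
) -> tuple[float, float, float]:
    """Return (odds ratio, lower bound, upper bound) from a log-odds coefficient."""
    if se < 0:
        raise ValueError("se must be non-negative")
    return (
        math.exp(beta),
        math.exp(beta - z * se),
        math.exp(beta + z * se),
    )
```

A coefficient of zero corresponds to an odds ratio of 1, and a Wald interval is symmetric around the estimate on the log-odds scale.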
All analyses were conducted in R version 4.5.1.
Results
Demographics
A total of 68 consecutive patients with sarcoidosis were included in the study. Demographic and clinical characteristics of the patients are shown in Table 1. The mean age at diagnosis was 57.6 ± 11.1 years, and the majority were females (61.8%). No sex- or age-related differences were observed between patients with sarcoidosis who developed chronic disease and those who experienced remission. Patients with a chronic course were significantly more likely to present with symptoms and, as expected, to receive treatment, primarily corticosteroids, while their pulmonary function was marginally impaired, as expressed by the diffusing capacity of the lungs for carbon monoxide (DLCO). Arthritis was also significantly more frequent among those with a chronic course. In our cohort, 25% of the patients had chronic sarcoidosis.
Table 1. Basic characteristics of patients by chronicity.
p-Values are derived from 2-sided Fisher’s exact tests.
DLCO, diffusing capacity of the lung for carbon monoxide; FVC, forced vital capacity.
Bold indicates statistically significant differences at p<0.05.
Thoracic involvement, by either intrathoracic lymph nodes or lungs, was the most frequently detected overall. The most frequently affected organs besides the thorax were the extrathoracic lymph nodes, joints, skin, liver and musculoskeletal system. As shown in Figure 2, patients with chronic sarcoidosis exhibited arthritis significantly more often than those without chronic disease. Overall, fatigue, cough and arthralgia were among the most frequently reported symptoms. As shown in Figure 3, patients with chronic sarcoidosis experienced fatigue, arthralgia and night sweats significantly more often than those without chronic disease.

Figure 2. Organ involvement and chronicity. The asterisk indicates a statistically significant difference (p < 0.05).

Figure 3. Symptoms and chronicity. The asterisk indicates a statistically significant difference (p < 0.05).
Disease staging and phenotyping by chronicity
In addition, we applied to our cohort the previously proposed clinical and radiological phenotypes, including the Scadding stage, the GenPheReSa and the PET scan phenotyping related to chronicity. According to the Scadding x-ray stage classification, 1.5% were stage 0, 63.2% were stage I, 25% were stage II, 4.4% were stage III, and 5.9% were stage IV. No statistically significant differences in these phenotypes were observed between patients with a chronic course and those with a self-limited course, as shown in Table 2.
Key associated variables and predominant organs involved in the three clusters.
Mean Jaccard is provided as a measure of clusters’ reproducibility.
Cluster analysis
The two groups identified by the hierarchical clustering included 63 (Cluster 1, 92.6%) and 5 (Cluster 2, 7.4%) patients. Differences across clusters in clinical variables are shown in Table 3. Cluster 1 was characterised by low extrapulmonary involvement, with 98% of patients having no liver involvement, 97% no musculoskeletal and no skin involvement, and 95% no spleen and no cardiac involvement. Cluster 2 showed a higher frequency of liver (80%), musculoskeletal (75%), spleen (67%), cardiac (50%) and skin (43%) involvement. Both clusters indicated moderate to weak reproducibility (Table 3). Notably, lymph node and lung involvement did not distinguish disease behaviour across clusters, which explains their lack of discrete predominance in the clustering analysis.
Logistic regression model for the odds of chronicity.
Bold indicates statistically significant differences at p<0.05.
The two phenotypic clusters were not found to be associated with increased odds of chronicity in sarcoidosis. On the other hand, having only lymph node involvement (i.e. in the absence of any other organ involvement) was associated with decreased odds of chronicity (OR 0.17, 95% CI 0.02–0.69; p = 0.028). When arthritis was added to the above model, it emerged as the key feature associated with an increased risk of chronicity (OR 17.02, 95% CI 2.04–471.07; p = 0.027). In the presence of arthritis, the effect of having only lymph node involvement became non-significant (p = 0.071).
Sensitivity analysis
A 3-cluster solution was explored as a secondary, hypothesis-generating analysis (Figure 4). In this model, a distinct intermediate cluster emerged, primarily characterised by the presence of arthritis, along with eye and skin involvement. Notably, this cluster was not retained in the more stable 2-cluster solution, where these features were instead incorporated into the multi-organ involvement group. Due to the low number of participants, the three clusters were small (Cluster 1: n = 56, Cluster 2: n = 9 and Cluster 3: n = 3) and had weak or unstable reproducibility (mean Jaccard 0.47, 0.25 and 0.53, respectively).

Figure 4. Organ involvement variables were visualised using multiple correspondence analysis (MCA). Clusters of patients were derived from hierarchical clustering of MCA coordinates. Scatter plots (upper panel) show individual patients coloured by cluster membership, with cluster centres indicated, while variable plots (lower panel) display the contribution of each categorical variable to the MCA dimensions.
Discussion
Our study identifies two distinct phenotypes in sarcoidosis by using cluster analysis based on clinical and radiological 18F-FDG PET-CT characteristics, while an exploratory 3-cluster solution suggests a possible third. In that exploratory solution, Cluster 2 is characterised by the presence of arthritis and predominant eye and skin involvement, and arthritis itself was associated with a more chronic disease course. Interestingly, the presence of lone intrathoracic or extrathoracic lymphadenopathy appears to be significantly protective against chronicity. Importantly, the clustering approach we propose offers preliminary insight into phenotypic patterns associated with disease severity and clinical course.
According to our main clustering analysis, that is the 2-cluster approach, patients appear to segregate into two main phenotypic groups: a low-burden phenotype, characterised predominantly by the absence of major organ involvement, and a high-burden phenotype, characterised by multi-organ involvement, including hepatic, splenic, musculoskeletal and cardiac disease. Although the limited reproducibility of the 3-cluster solution precludes definitive conclusions, the consistent clinical pattern observed in its intermediate cluster is of particular interest. In conjunction with the observed association between arthritis and chronicity in our cohort, these findings suggest that this pattern may not be incidental. They may support the hypothesis that arthritis-predominant sarcoidosis represents a third, clinically relevant phenotype, potentially associated with chronic disease evolution, which is not reliably captured in smaller datasets but may emerge in larger cohorts. In general, in sarcoidosis, arthritis may often occur as part of Löfgren syndrome, where it accompanies erythema nodosum and bilateral hilar lymphadenopathy. This specific clinical phenotype, characterised by acute onset, is associated with an excellent prognosis, typically being self-limited and resolving within months.20–22 However, arthritis is not confined exclusively to the Löfgren phenotype; in a smaller but clinically important subset of patients, it occurs outside this context and is associated with chronic sarcoidosis, characterised by persistent inflammation.20–22 Chronic sarcoid arthropathy is characterised by persistent oligo- or polyarthritis in approximately 20% of patients, with arthralgia reported in up to 40%. 20
Previous data align with our findings, particularly with Cluster 2 of the exploratory 3-cluster solution, defined by arthritis together with eye and skin involvement, as chronic arthritis in the literature frequently clusters with multi-organ disease in sarcoidosis, including skin, eye, liver, and spleen involvement.14,23,24 Despite the differences between our clusters and those previously described, we found some interesting similarities. The so-called ‘ocular–cardiac–cutaneous–central nervous system (OCCC)’ and ‘musculoskeletal–cutaneous’ subsets in the GenPheReSa study probably converge in Cluster 2 of our study. 14 However, that large study was not designed to associate the phenotypes with prognosis. Furthermore, as expected based on the previously published literature, lung and mediastinal lymph node involvement occurred in the vast majority of our patients.14,25 Beyond the lungs, extrapulmonary sarcoidosis in our cohort most frequently involved the extrathoracic lymph nodes, the liver, the musculoskeletal system, and the skin. Overall, symptoms such as fatigue and cough were also common, confirming previous studies.26,27
Moreover, no differences emerged when applying other proposed phenotyping strategies.10,16 In their study, Papiris SA et al. described distinct phenotypes based on PET-CT findings. 16 The pivotal role of PET-CT has increasingly emerged, as it can reveal the full spectrum of disease activity, including clinically silent sites of sarcoid inflammation.11,15,28,29 However, their methodology for defining phenotypic subsets is based on direct clustering, which may be more sensitive to noise, variable scaling, and redundancy. For example, Cluster 3 in the study by Papiris SA et al., characterised by extensive lymph node involvement, may not provide additional clinical insight, since the extent of lymphadenopathy does not appear to correlate with distinct clinical behaviour. In our cohort, applying their clustering approach did not reveal differences in chronicity. Conversely, in our predictive model, the presence of lone lymph node involvement, whether intrathoracic or extrathoracic, was associated with a protective effect against chronicity itself. This aligns with longstanding observations that chronic sarcoidosis is associated with broader organ involvement and that, as originally described by Scadding in pulmonary sarcoidosis, Stage I disease is linked to a more favourable prognosis.26,27,30,31
Previous clustering studies in sarcoidosis have attempted to group patients based on shared clinical characteristics.8,12,13,32 However, their complex and often mechanistic approaches, combined with subjective variable selection and frequent overlap between features, introduce a risk of circular reasoning and limit their prognostic implications. For example, in the analysis by Rubio Rivas et al., different cutaneous manifestations are distributed across multiple clusters, that is, plaques and lupus pernio in Cluster 6, maculopapular lesions in Cluster 5, and erythema nodosum in Cluster 1, raising questions about the biological coherence of these groupings. 8 In addition, the authors include therapeutic interventions in their cluster analysis, such as corticosteroids and steroid-sparing agents, that inherently modify disease course and therefore may confound prognostic interpretation.8,32 In our study, we intentionally excluded corticosteroid therapy from the clustering variables, as steroid use influences the clinical course of sarcoidosis and some chronic symptoms reported by patients may be related to steroid use rather than the disease itself. In another study, Rodrigues et al. included lung function parameters such as forced vital capacity (FVC), residual airflow limitation, residual restriction, radiographic stage, and pulmonary fibrosis, variables that all reflect pulmonary involvement and substantially overlap. This overlap may have influenced their clustering results. 32 For example, it is well established that patients with sarcoidosis-related pulmonary fibrosis typically present with lower FVC values and more advanced radiographic stages at baseline.
Consequently, compared with previously published cluster analyses in sarcoidosis, our proposed phenotypic stratification not only groups patients according to shared clinical characteristics but also identifies, at the time of diagnosis, those more likely to follow an unfavourable disease course. This represents a critical tool for clinicians managing a disease known for its unpredictable behaviour. The ability to distinguish clinically meaningful phenotypes with prognostic relevance may enable earlier therapeutic interventions and potentially prevent the development of chronic, irreversible disease.
Our study presents several limitations, the most important being the relatively small number of patients and the retrospective study design. The small size of our sample may limit the internal validity and stability of our results, that is, their reproducibility. This is the reason why we opted to identify only two clusters in the main analysis. Consequently, the generalisability of the findings should be interpreted with caution, although the clusters were produced following an unsupervised method and are biologically plausible. Secondly, external validation using independent cohorts is lacking, and the robustness of our cluster analysis requires further confirmation. Future multicentre, large-sample, and prospective studies are warranted to validate the reproducibility of our clustering approach in external sarcoidosis cohorts and to better establish its generalisability. In any case, our study is a single-centre study based on a well-defined group of patients with histologically proven sarcoidosis.
Conclusion
Two distinct phenotypes in sarcoidosis are identified by using cluster analysis. In a sensitivity analysis that allowed for three clusters, a third cluster, characterised by the presence of arthritis and predominant eye and skin involvement, emerged. Interestingly, the presence of lone intrathoracic or extrathoracic lymphadenopathy appears to be significantly protective against chronicity.
Supplemental Material
Supplemental material, sj-pdf-1-tar-10.1177_17534666261462403 for Decoding sarcoidosis chronicity through phenotypic profiling by Ioannis Tomos, Georgia Vourli, Andreas M. Matthaiou, Nikoleta Bizymi, Pantelis Avarlis, Vasiliki Bessa, Chrysavgi Kosti, Serafeim Chrysikos and Adamantia Liapikou in Therapeutic Advances in Respiratory Disease