Artificial Intelligence Performance on Pressure Injury Diagnosis: A Systematic Review and Meta-Analysis

Abstract

Objective:

With the advent of artificial intelligence (AI), new diagnostic tools have emerged, potentially offering more consistent and accurate detection. This study aimed to systematically evaluate the diagnostic performance of AI in identifying pressure injuries (PIs).

Approach:

This systematic review and meta-analysis was conducted following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines and was registered in the International Prospective Register of Systematic Reviews (PROSPERO; CRD 42024618716). Comprehensive literature searches were performed across PubMed, Embase, IEEE, Cochrane, arXiv, and ACM databases for studies published up to November 2024. Two independent reviewers screened studies, extracted data, and assessed methodological quality using the QUADAS-AI (Quality Assessment of Diagnostic Accuracy Studies–Artificial Intelligence) tool. Statistical analyses were performed using R 4.3.3, RevMan 5.4, Stata 17, and Meta-Disc 1.4.

Results:

A total of 16 studies met the inclusion criteria, among which 12 provided 147 2 × 2 confusion matrices for quantitative synthesis. The meta-analysis showed a pooled sensitivity of 0.77 (95% CI: 0.76–0.77), specificity of 0.92 (95% CI: 0.92–0.92), and an area under the summary receiver operating characteristic curve of 0.928 (SE = 0.0079), indicating high diagnostic accuracy. Methodological quality was generally fair, but most studies were retrospective and lacked external validation.

Innovations:

This study applied the QUADAS-AI tool for quality assessment and conducted subgroup analyses by PIs stage, algorithm type, and region to offer a nuanced understanding of AI diagnostic performance.

Conclusions:

AI-based systems demonstrate promising diagnostic accuracy in detecting PIs, with high sensitivity and specificity.

Keywords

artificial intelligence, pressure injury, diagnosis, systematic review, meta-analysis

INTRODUCTION

Pressure injuries (PIs) are localized damage to the skin and underlying soft tissues, typically resulting from compression between bony prominences and external surfaces or medical devices. These injuries represent a significant clinical concern in hospitals and long-term care facilities globally.¹ Epidemiological data indicate that approximately 12.9% of patients admitted to acute care hospitals in Australia and New Zealand present with PIs, while 7.9% of patients develop PIs during hospitalization.² Meanwhile, multinational clinical studies conducted across surgical centers in China, South Korea, and other countries have demonstrated that the incidence of intraoperative PIs ranges from 4.1% to 41.75%.³ The impact of PIs extends beyond patient discomfort, significantly affecting mortality rates and health care costs.⁴ Systematic reviews have established that patients with PIs face approximately twice the mortality risk compared with those without such injuries.⁵ In addition, PIs are classified into six categories based on the severity and depth of tissue loss: stages 1–4, unstageable, and deep tissue injury.⁶ Severe pressure ulcers (stage III and above) are associated with higher health care costs.⁷ Early diagnosis of PIs and prompt intervention are therefore essential to prevent disease progression and facilitate timely recovery.⁸

Xu Zhang, PhD

Current diagnostic approaches for PIs rely primarily on visual assessment by clinical nurses. While clinical practice utilizes approximately 40 risk assessment scales (e.g., Braden, Waterlow, Norton⁹) for predicting PIs development risk, these tools do not assist in diagnosing existing wounds or determining their stage. In addition, the accuracy of visual diagnosis is highly dependent on the professional knowledge and experience of nurses, which can easily lead to errors in diagnosis and staging.¹⁰ This limitation has prompted the exploration of more objective and consistent diagnostic technologies. In recent years, artificial intelligence (AI) has emerged as a promising solution in nursing practice, with increasing applications in diagnosis, complication management, prognosis prediction, and relapse assessment.^11,12 Specifically, AI algorithms have demonstrated potential for enhancing the diagnostic accuracy of PIs through standardized image analysis and pattern recognition capabilities.

Currently, the two main types of AI techniques applied to diagnosing and staging PIs are machine learning (ML) and deep learning (DL). ML recognizes and classifies dermatological diseases using features that experts extract and select manually.¹³ Common algorithms include logistic regression, support vector machines, and least absolute shrinkage and selection operator methods.¹⁴ Liu et al. developed an ML-based algorithm capable of rapidly determining PIs stages and reported accuracy values of 0.98, 0.97, 0.95, and 0.95 for stages I through IV, respectively.¹⁵ DL, representing the most recent evolution of ML, learns to recognize patterns and features from extensive datasets without requiring manual feature extraction. DL automatically processes and assimilates complex data characteristics through multilayer neural networks to identify and classify skin conditions.¹⁶ Prominent DL algorithms in PIs assessment include convolutional neural networks, deep convolutional neural networks, and deep neural networks.¹⁴ Tusar et al. developed an algorithmic model based on You Only Look Once version 8, which classified different stages of PIs and achieved an accuracy of 0.9 for deep tissue damage in PIs.¹¹

Despite several systematic reviews and meta-analyses examining AI applications in PIs management, significant limitations persist in the current literature. Pei et al.¹⁷ and Zhou et al.¹⁸ conducted systematic reviews and meta-analyses focusing exclusively on AI for PIs prediction rather than diagnostic performance assessment. While Dweekat et al.¹⁹ summarized the contribution of ML in the field of PIs and explained its potential to intervene at various stages of PIs, the literature included in this systematic review was all published before July 2022. Given the rapid advancement of AI algorithms, updated analyses incorporating recent publications are essential to accurately assess the current state of AI-based PIs diagnosis. Furthermore, methodological considerations necessitate reassessment of previous studies. The QUADAS-AI (Quality Assessment of Diagnostic Accuracy Studies–Artificial Intelligence), a quality assessment tool specifically for AI diagnostic-type studies, was proposed by Sounderajah et al.,²⁰ suggesting that quality assessments in earlier reviews may require updating. Finally, many researchers have developed algorithms to quantify the diagnostic performance of different stages of PIs²¹; however, there is a lack of quantitative cross-sectional comparisons of the effectiveness of multiple AI algorithms for diagnosing PIs at different stages. Therefore, this study aims to evaluate the diagnostic accuracy of AI algorithms in PIs detection based on published data through a systematic literature review and meta-analysis. Subgroup analyses will be used to compare the following: (1) the accuracy of AI algorithms in diagnosing PIs across different regions; (2) the accuracy of AI algorithms in diagnosing PIs at different stages (stages 1–4); and (3) the accuracy of AI algorithms developed using different types of technologies (ML, DL, and other algorithm types) for diagnosing PIs, thereby providing evidence-based guidance for clinical application.

METHODS

This systematic review and meta-analysis was registered in the International Prospective Register of Systematic Reviews (PROSPERO, CRD 42024618716). All methodological procedures strictly adhered to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses of Diagnostic Test Accuracy guidelines (Supplementary Table S1)²² and Meta-Analyses Of Observational Studies in Epidemiology²³ reporting standards, as well as the rigor checklist for AI in health care and clinical research.²⁴

Search strategy

Two researchers (X.Z. and Y.G.) independently conducted a comprehensive literature search across six electronic databases: PubMed, Embase, IEEE Xplore, Cochrane Library, arXiv, and ACM Digital Library. The search encompassed all literature published from database inception through November 2024 examining the diagnostic accuracy of AI applications for PIs. The search was conducted using a combination of subject terms and free terms, and was adapted to the characteristics of each database. In addition, manual search and literature tracing were also conducted to collect research literature that met the requirements based on the search results. The results of the literature screening process conducted by the two researchers were then compared. In instances where a consensus could not be reached, the disputed literature was to be submitted to a third researcher for adjudication. The English search terms include “artificial intelligence,” “diagnosis,” “performance,” and “pressure injury”. The complete search terms and strategies for all databases are provided in Supplementary Table S2.

Eligibility criteria

We established precise inclusion and exclusion criteria before the literature search to ensure methodological rigor and relevance to our research question. Studies were included if they met all of the following criteria: (i) Published diagnostic studies evaluating AI applications for PIs assessment. (ii) Utilized established clinical diagnostic criteria as the reference standard, including the European Pressure Ulcer Advisory Panel (EPUAP),²⁵ National Pressure Injury Advisory Panel (NPIAP),²⁶ Pan Pacific Pressure Injury Alliance (PPPIA),⁸ or histopathological confirmation, or utilized validated PIs datasets. (iii) Studies that reported AI performance metrics. (iv) Published in English or Chinese with full-text accessibility.

Studies were excluded if they met any of the following criteria: (i) Nonprimary research publications (case reports, conference abstracts, reviews, letters, commentaries, editorials). (ii) Studies focusing solely on image processing techniques without diagnostic classification (e.g., image segmentation, wound measurement). (iii) Studies where the original text was not available or relevant data were not available.

Data extraction

Two researchers independently screened the eligible studies based on the predefined inclusion and exclusion criteria. Following the screening process, relevant data were extracted from each study using a standardized data extraction form. Data extraction included first author’s name, year of publication, country, sample size, gold standard, internal validation type, external validation, algorithms’ architecture, and assessment metrics for diagnostic efficacy. When discrepancies arose between data extracted by two researchers, these would be submitted to a third researcher for adjudication.

Risk-of-bias assessment

The risk of bias for each included study was evaluated using the QUADAS-AI tool.²⁰ Two researchers independently performed the assessments, and any disagreements were resolved through arbitration by a third senior reviewer. The tool comprises four domains: Patient selection, Index test, Reference standard, and Flow and timing. The first three domains also incorporate an evaluation of clinical applicability. Each domain was rated as having a high, low, or unclear risk of bias.

Statistical analysis

Statistical analyses were performed using R 4.3.3, RevMan 5.4, Stata 17, and Meta-Disc 1.4. RevMan 5.4 was used to assess the quality of the included studies and generate the corresponding graphs; the remaining analyses were run in R 4.3.3, Stata 17, and Meta-Disc 1.4. First, heterogeneity due to threshold effects was examined by plotting the summary receiver operating characteristic (SROC) curve and by Spearman correlation analysis. A pronounced “shoulder-arm” pattern in the SROC plot together with a statistically significant Spearman correlation coefficient (p < 0.05) was taken to indicate a threshold effect; in that case, only the area under the SROC curve (SROC-AUC) was pooled and used for subgroup comparisons. Conversely, if no threshold effect was present, heterogeneity attributable to nonthreshold effects was assessed with Cochran’s Q test and the I² statistic. Unless substantial evidence indicated homogeneity of effects across studies of varying methodological quality, data were synthesized with a random-effects model.²⁷ Finally, the risk of publication bias was assessed using Deeks funnel plot, and the sources of heterogeneity were explored through three subgroup analyses: (i) study region (Asia, Europe, North America); (ii) stage of PI (stages 1–4); and (iii) type of AI algorithm (ML, DL, and other algorithms).
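The threshold-effect check described above can be illustrated with a short Python sketch. It is not the authors' analysis code (the study used Meta-Disc, Stata, and R). It assumes one common formulation, in which the Spearman coefficient is computed between logit(sensitivity) and logit(1 − specificity) across 2 × 2 tables; whether the authors used this form or correlated sensitivity with specificity directly is not stated. The function names, the 0.5 continuity correction, and the four example tables are hypothetical.

```python
import math

def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))

def rates(tp, fp, fn, tn, cc=0.5):
    """Sensitivity and false-positive rate, with a continuity correction
    (assumed here) so that zero cells do not break the logit."""
    tp, fp, fn, tn = (x + cc for x in (tp, fp, fn, tn))
    return tp / (tp + fn), fp / (fp + tn)

def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            out[order[k]] = mean_rank
        i = j + 1
    return out

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman_threshold(tables):
    """Spearman rho between logit(TPR) and logit(FPR) over 2 x 2 tables."""
    pairs = [rates(*t) for t in tables]
    logit_tpr = [logit(s) for s, _ in pairs]
    logit_fpr = [logit(f) for _, f in pairs]
    return pearson(ranks(logit_tpr), ranks(logit_fpr))

# Hypothetical tables (tp, fp, fn, tn): sensitivity rises as specificity falls,
# the pattern expected when studies differ only in their decision threshold.
tables = [(50, 5, 50, 95), (70, 15, 30, 85), (85, 30, 15, 70), (95, 50, 5, 50)]
rho = spearman_threshold(tables)
print(f"Spearman rho = {rho:.3f}")
```

In this constructed example the two rates move together perfectly, so the coefficient is at its maximum. Real data would also need a p value for the coefficient, which this sketch leaves out.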

RESULTS

A total of 214 articles related to the research topic were initially retrieved from various databases. After removing duplicates, 199 unique records remained. An additional 23 articles were identified through manual searches and reference tracking. Following a rigorous multistage screening process based on predefined inclusion and exclusion criteria, 16 high-quality studies were ultimately included in the review,^28–43 with 12 undergoing meta-analysis.^{28–31,34,36,37,39–43} Supplementary Table S3 summarizes the listings extracted from the included studies. A total of 147 2 × 2 confusion matrices on diagnostic performance were provided. The screening process and results of the literature are presented in the Preferred Reporting Items for Systematic Reviews and Meta-Analyses flow diagram (Fig. 1).

Figure 1.

Flowchart of literature screening. The flowchart illustrates the process of identifying and screening studies, from database searches to the final selection of studies meeting the eligibility criteria.

Basic characteristics and quality assessment

The basic characteristics of the 16 included studies are summarized in Table 1. Four studies were conducted in China,^33,34,40,41 three in Turkey,^28,42,43 two each in the United Kingdom,^29,38 Japan,^30,36 and South Korea,^31,37 with one study each from Greece,³² Italy,³⁵ and the United States,³⁹ indicating a broad international distribution.

Table 1.

Basic characteristics of included studies

First Author and Year	Country	Sample Size (n)	Gold Standard	Algorithmic Architecture	Source of Data	Open Access Data	Internal Validation Type	Number of Images for Training/Testing	External Validation	Annotation Method	Consistency Among Raters
Ay et al 2022²⁸	Turkey	1,091	EPUAP	CNN	Retrospective study with data from Pressure Injury Images Dataset (PIID)	Yes	4:1 Randomized split-sample verification	874/217	No	A clinical doctor and a dermatologist	NR
Fergus et al 2023²⁹	The United Kingdom	4,290	NICE	Faster R-CNN	Retrospective study with data from Medetec dataset (publicly available) and Google Images (not publicly available)	No	2:1 Randomized split-sample verification	3,861/429	Yes	A district nurse with expertise in pressure ulcer	No
Ikuta et al 2024³⁰	Japan	14,472	DESIGN-R®	CNN	Retrospective study with data from Tottori University Hospital, Japan	No	133:1 Randomized split-sample verification	14,364/108	No	plastic surgeons	Mark according to medical records and correct any obvious errors.
Kim et al 2023³¹	South Korea	3,098	NPUAP	SE-ResNext 101(CNN)	Retrospective study with data from Severance and Gangnam Severance hospitals in Korea	No	A 10-fold cross-validation	2,614/484	Yes	Four certified wound nurses with more than 1 year of experience and one board-certified dermatologist	Vote on images that are subject to disagreement. If the voting results are inconclusive, the professionals convened and reviewed the relevant image and consultation report to arrive at a consensus.
Kosmopoulos et al 2007³²	Greece	85	EPUAP	SVM	Retrospective study with data from various hospitals	No	NR	NR	No	Operator	NR
Lau et al 2022³³	China	1,442	NPIAP	YOLOv4	Retrospective study with data from a publicly available GitHub repository, consisting of images from the Medetec Wound Database and other public sources	No	A 10-fold cross-validation	1,278/144	Yes	An experienced (>10 years) wound nurse	Reviewed and confirmed by another nurse (member of the WCET)
Liu et al 2022³⁴	China	528	EPUAP/NPIAP/PPPIA	CNN	Retrospective study with data from National Taiwan University College of Medicine Hospital	No	3:2 Randomized split-sample verification	327/201	Yes	Three plastic surgeons	If there are any disagreements, discuss them and reach a consensus.
Orciuoli et al 2020³⁵	Italy	NR	NPUAP/EPUAP	MobileNet V2	Retrospective study with data from MAGALDI INNOVA Srl	No	Randomized split-sample verification	NR	No	the field operator	NR
Sakakibara et al 2023³⁶	Japan	51	EPUAP	Image identification algorithm	Retrospective study with data from Kobe University, Japan	No	1:1 randomized split sample verification	27/24	No	Two plastic surgeons with experience in wound care	The two evaluators agreed on their assessment of all the images.
Seo et al 2023³⁷	South Korea	2,464	EPUAP/NPIAP/PPPIA	VGG16/ ResNet50/ ResNet152/ DenseNet201/ EfficientNet-B4	Retrospective study with data from SMG-SNU Boramae Medical Center, Korea	No	9:1 Randomized split-sample verification	2,218/246	No	one plastic surgeon and two wound care specialized nurses	Vote on controversial images until a consensus is reached, while also comparing the effect values of the algorithm with those of the other two nurses’ diagnoses.
Shiraishi et al 2024³⁸	The United Kingdom	10	NPIAP	GPT-4 Turbo/ BingAI Creative mode	Retrospective study with data from NHS England website	Yes	NR	NR	No	NR	NR
Swerdlow et al 2023³⁹	The United States	969	NR	Mask-R-CNN	Retrospective study with dataset from eKare USA, Inc.	No	7:1 Randomized split-sample verification	848/121	No	Images were segmented and classified manually by two study authors with medical training; images that resulted in disagreements were discussed until a unifying conclusion was reached.	NR
Wang et al 2021⁴⁰	China	246	NR	SVM/RF/CNN	Prospective study, with data collected from follow-up of selected patients in the First Affiliated Hospital of Wenzhou Medical University	No	Fourfold cross-validation	164/82	No	Nurses and doctors	NR
Wang et al 2024⁴¹	China	1,519	NPUAP	CNN	Retrospective study with a multicenter cohort of pressure injury images, including three internal datasets (Wuhan University People’s Hospital, Union Hospital of Tongji Medical College of Huazhong University of Science and Technology, and Zhongshan Hospital of Xiamen University) and one publicly available dataset (Pressure Injury Image Dataset, PIID)	No	4:1 Randomized split-sample verification	1,216/303	No	Experienced doctors and nurses (>10 years of experience)	NR
Yilmaz et al 2021⁴²	Turkey	142	EPUAP/NPUAP	ANN/LR	Retrospective study with data from public dataset	Yes	4:1 Randomized split-sample verification	114/28	No	A plastic surgeon	NR
Zalluhoğlu et al 2024⁴³	Turkey	1,202	EPUAP	AlexNet/ DenseNet169/EfficientNetX-B0-5/GoogLeNet/MobileNetV2/ResNet50/ VGG16/CNN	Retrospective study with data from Ankara Pursaklar State Hospital	No	3:1 Randomized split-sample verification	907/295	No	Experts	NR

CNN, convolutional neural network; DESIGN-R®, Development of the Seven-Item Pressure Ulcer Surveillance Scale by the Scientific and Educational Committee of the Pressure Ulcer Society of Japan; EPUAP, European Pressure Ulcer Advisory Panel; n, total number of images included in the study; NICE, National Institute for Health and Care Excellence; NPIAP, National Pressure Injury Advisory Panel; NPUAP, National Pressure Ulcer Advisory Panel; NR, not reported; PPPIA, Pan Pacific Pressure Injury Alliance; SVM, support vector machine; YOLOv4, You Only Look Once version 4.

Reliable diagnostic reference standards were used in 14 studies,^{28–38,41–43} whereas two studies^39,40 did not clearly define the reference standard. Regarding internal validation methods, 11 studies^{28–30,34–37,39,41–43} utilized random splitting of data, two adopted 10-fold cross-validation,^31,33 one used fourfold cross-validation,⁴⁰ and two did not specify the method.^32,38 Only four studies conducted external validation.^29,31,33,34

Supplementary Figures S1 and S2 provide the results of the risk-of-bias evaluation of the included studies. In the patient selection domain (image selection for PIs diagnosis), seven studies were deemed high risk and eight had unclear risk. Clinical applicability in this domain was unclear in nine studies, possibly due to the reliance on publicly available image databases without clear inclusion and exclusion criteria. For the index test domain, 10 studies demonstrated a high risk of bias, while clinical applicability remained unclear in eight, largely due to the absence of external validation. Regarding reference standards, 14 studies were judged to have low risk, and two were of unclear risk. In the domain of flow and timing, seven studies had low risk, seven were unclear, and two were high risk.

Pooled diagnostic performance of AI algorithms for PI detection

In this study, the overall Spearman correlation coefficients and subgroup-specific correlation coefficients were calculated to assess the presence of a threshold effect. As shown in Supplementary Table S4, the combined Spearman correlation coefficient across all studies was –0.068 (p = 0.416), indicating no evidence of a threshold effect. However, subgroup analysis revealed significant threshold effects in two cases: PIs stage 4 (correlation coefficient = –0.547, p = 0.002) and studies using other types of algorithms (correlation coefficient = –0.943, p = 0.005).

Table 2 and Supplementary Table S4 present the meta-analysis results. The pooled sensitivity of AI algorithms for diagnosing PIs was 0.77 (95% CI: 0.76–0.77), with substantial interstudy heterogeneity (I² = 88.5%, χ² = 1271.6, p < 0.001). The pooled specificity was 0.92 (95% CI: 0.92–0.92), also demonstrating high heterogeneity (I² = 91.7%, χ² = 1761.06, p < 0.001). The combined positive likelihood ratio (LR+) was 9.87 (95% CI: 8.62–11.31), and the negative likelihood ratio (LR−) was 0.26 (95% CI: 0.23–0.29). The pooled diagnostic odds ratio was 40.86 (95% CI: 33.75–49.46). The SROC-AUC was 0.928, with a standard error of 0.0079 (Fig. 2A).
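As a reading aid, the reported I² values can be related to the reported χ² (Cochran's Q) statistics through the standard Higgins formula, I² = max(0, (Q − df)/Q). The Python sketch below assumes that the degrees of freedom equal the number of pooled confusion matrices minus one (147 − 1 = 146); the text does not state this, and the function name is illustrative.

```python
def i_squared(q, df):
    """Higgins I^2 (in percent) from Cochran's Q and its degrees of freedom."""
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0

# Assumption: df = number of 2 x 2 confusion matrices minus one.
DF = 147 - 1

i2_sensitivity = i_squared(1271.6, DF)    # chi-square reported for sensitivity
i2_specificity = i_squared(1761.06, DF)   # chi-square reported for specificity

print(f"I2 (sensitivity) = {i2_sensitivity:.1f}%")
print(f"I2 (specificity) = {i2_specificity:.1f}%")
```

Under that assumption, the computed values round to the 88.5% and 91.7% given in the text.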

Figure 2.

Overall effectiveness of the 12 studies. (A) Area under the SROC curve (SROC-AUC) of included studies in the meta-analysis. (B) Fagan plot of included studies in the meta-analysis. The symmetric and asymmetrical SROC models exhibit nearly identical performance. The Fagan nomogram results indicate that a positive likelihood ratio of 12 increases the post-test probability to 92% from a 50% prior probability, while a negative likelihood ratio of 0.24 reduces it to 19%. SROC, summary receiver operating characteristic.
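The post-test probabilities quoted in the Figure 2 caption follow from the odds form of Bayes' theorem, which is what a Fagan nomogram encodes graphically. The minimal Python sketch below reproduces that arithmetic; the function name is illustrative and is not taken from the software used in the study.

```python
def post_test_probability(prior, likelihood_ratio):
    """Convert a pre-test probability to a post-test probability via odds."""
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must lie strictly between 0 and 1")
    pre_odds = prior / (1.0 - prior)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)

# Values taken from the caption: 50% prior, LR+ = 12, LR- = 0.24.
after_positive = post_test_probability(0.50, 12)
after_negative = post_test_probability(0.50, 0.24)

print(f"After a positive result: {after_positive:.1%}")
print(f"After a negative result: {after_negative:.1%}")
```

With a 50% prior, the pre-test odds are 1, so the post-test probability reduces to LR/(1 + LR): 12/13 for a positive result and 0.24/1.24 for a negative result.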

Table 2.

Results of combined analysis of artificial intelligence for diagnosis of pressure injuries

Categorization	Number of Studies	Sensitivity (95% CI)	Specificity (95% CI)	Positive Likelihood Ratio (95% CI)	Negative Likelihood Ratio (95% CI)	Diagnostic Odds Ratio (95% CI)	AUC (SE)
Overall	12^{28–31,34,36,37,39–43}	0.77 (0.76–0.77)	0.92 (0.92–0.92)	9.87 (8.62–11.31)	0.26 (0.23–0.29)	40.86 (33.75–49.46)	0.928 (0.0079)
Continent
Asia	7^{30,31,34,36,37,40,41}	0.82 (0.81–0.83)	0.92 (0.91–0.93)	11.18 (8.77–14.26)	0.18 (0.15–0.22)	70.4 (47.93–103.41)	0.946 (0.0102)
Europe	4^28,29,42,43	0.73 (0.72–0.74)	0.92 (0.91–0.92)	9.04 (7.66–10.66)	0.31 (0.28–0.35)	30.46 (24.61–37.71)	0.905 (0.0121)
North America	1³⁹	0.94 (0.88–0.98)	0.98 (0.95–0.99)	25.89 (14.3–46.9)	0.08 (0.02–0.34)	471.16 (148.03–1499.65)	0.991 (0.0051)
Pressure ulcer staging
Stage 1	4^28,29,39,42	0.76 (0.73–0.78)	0.96 (0.96–0.97)	18.68 (14.72–23.71)	0.25 (0.19–0.32)	81.84 (54.51–122.87)	0.976 (0.0109)
Stage 2	5^{28,29,37,39,42}	0.60 (0.57–0.62)	0.93 (0.92–0.93)	8.10 (6.66–9.86)	0.43 (0.37–0.5)	20.19 (14.81–27.53)	0.923 (0.0240)
Stage 3	5^{28,29,37,39,42}	0.79 (0.78–0.80)	0.84 (0.83–0.85)	5.09 (4.08–6.36)	0.28 (0.22–0.35)	19.48 (13.94–27.22)	0.887 (0.0159)
Stage 4	5^{28,29,37,39,42}	Presence of threshold effects					0.962 (0.0187)
Algorithm type
DL	9^{28,29,31,34,37,39–41,43}	0.76 (0.75–0.77)	0.92 (0.92–0.92)	10.17 (8.84–11.7)	0.27 (0.24–0.3)	41.07 (33.75–49.98)	0.928 (0.0085)
ML	2^40,42	0.85 (0.77–0.91)	0.82 (0.70–0.91)	4.14 (2.21–7.75)	0.2 (0.11–0.36)	22.97 (7.15–73.74)	0.898 (0.0487)
Other algorithms	3^30,34,36	Presence of threshold effects					0.934 (0.0292)

AUC, area under the curve; DL, deep learning; ML, machine learning.

Figure 2B shows the Fagan plot analysis. The LR+ indicates that the current diagnostic tests perform well and are suitable for confirmatory diagnosis in high-prevalence populations. However, the LR− suggests insufficient power to rule out PIs, so a negative result needs to be combined with other test results before a clinical decision is made. Supplementary Figure S3 shows the funnel plot analysis; the significant Deeks test (p < 0.05) indicates statistical asymmetry in the funnel plot, which suggests possible publication bias. This may occur because studies reporting higher AI diagnostic sensitivity are more likely to be published, inflating the overall estimates. Supplementary Figures S4–S7 present the crosshair plots summarizing all confusion matrices as well as the crosshair plots for individual subgroups.

Subgroup analysis

Table 2 shows the results of the subgroup analyses by region. The pooled sensitivity in North America (0.94; 95% CI: 0.88–0.98) was higher than in Asia (0.82; 95% CI: 0.81–0.83) and Europe (0.73; 95% CI: 0.72–0.74). However, the North American subgroup comprised only one study, so this estimate rests on a small sample. In contrast, the Asian subgroup comprised seven studies, which gives a more reliable estimate. The pooled specificity exceeded 0.9 in all three regions. Figure 3 shows the SROC-AUC for each region: 0.946 (SE = 0.0102) in Asia, 0.905 (SE = 0.0121) in Europe, and 0.991 (SE = 0.0051) in North America. Overall, the evidence for AI diagnosis of PIs was strongest and most consistent in Asia.

Figure 3.

sROC AUC in different regions. (A) Asia: AUC = 0.9455; SE(AUC) = 0.0102; Q* = 0.8846; SE(Q*) = 0.0133. (B) North America: AUC = 0.9914; SE(AUC) = 0.0051; Q* = 0.9621; SE(Q*) = 0.0133. (C) Europe: AUC = 0.9051; SE(AUC) = 0.0121; Q* = 0.8368; SE(Q*) = 0.0132.
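The AUC and Q* values in this caption can be related to each other under a symmetric SROC model, in which the diagnostic odds ratio (DOR) is constant along the curve. In that case Q* = √DOR/(1 + √DOR) and AUC = DOR/(DOR − 1)² × [(DOR − 1) − ln DOR]. The Python sketch below assumes this symmetric model purely for illustration; it is not stated to be the fitting procedure used by the authors, and the function names are hypothetical.

```python
import math

def dor_from_q_star(q_star):
    """DOR implied by Q* on a symmetric SROC curve (sensitivity = specificity = Q*)."""
    return (q_star / (1.0 - q_star)) ** 2

def symmetric_sroc_auc(dor):
    """Area under a symmetric SROC curve with constant diagnostic odds ratio."""
    if dor == 1.0:
        return 0.5  # limiting case: the chance diagonal
    return dor / (dor - 1.0) ** 2 * ((dor - 1.0) - math.log(dor))

# (Q*, AUC) pairs as reported in the caption.
reported = {
    "Asia": (0.8846, 0.9455),
    "North America": (0.9621, 0.9914),
    "Europe": (0.8368, 0.9051),
}

implied = {
    region: symmetric_sroc_auc(dor_from_q_star(q_star))
    for region, (q_star, _) in reported.items()
}

for region, (q_star, auc) in reported.items():
    print(f"{region}: reported AUC {auc:.4f}, implied by Q* {implied[region]:.4f}")
```

The AUC implied by each reported Q* agrees with the reported AUC to about three decimal places, which is in line with the Figure 2 caption's remark that the symmetric and asymmetric models perform almost identically.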

In the subgroup analysis by PIs stage (Table 2), the pooled sensitivity was 0.79 (95% CI: 0.78–0.80) for stage 3, 0.76 (95% CI: 0.73–0.78) for stage 1, and 0.60 (95% CI: 0.57–0.62) for stage 2; the sensitivity of AI was therefore higher for stage 3 than for the other two stages. In contrast, the pooled specificity was 0.96 (95% CI: 0.96–0.97) for stage 1, 0.93 (95% CI: 0.92–0.93) for stage 2, and 0.84 (95% CI: 0.83–0.85) for stage 3, so AI diagnosis of stage 3 was less specific than that of the other two stages. Because a threshold effect was present among the stage 4 studies, only the AUC values were pooled and compared across stages (Fig. 4): the SROC-AUC was 0.976 (SE = 0.0109) for stage 1, 0.923 (SE = 0.0240) for stage 2, 0.887 (SE = 0.0159) for stage 3, and 0.962 (SE = 0.0187) for stage 4. Stage 1 had the highest SROC-AUC value.

Figure 4.

sROC AUC with different pressure injury staging. (A) Stage 1: AUC = 0.9755; SE(AUC) = 0.0109; Q* = 0.9290; SE(Q*) = 0.0190. (B) Stage 2: AUC = 0.9230; SE(AUC) = 0.0240; Q* = 0.8569; SE(Q*) = 0.0279. (C) Stage 3: AUC = 0.8867; SE(AUC) = 0.0159; Q* = 0.8173; SE(Q*) = 0.0164. (D) Stage 4: AUC = 0.9618; SE(AUC) = 0.0187; Q* = 0.9071; SE(Q*) = 0.0278.

In the subgroup analysis by algorithm type (Table 2), the sensitivity of DL algorithms for diagnosing PIs was 0.76 (95% CI: 0.75–0.77), lower than the 0.85 (95% CI: 0.77–0.91) of ML algorithms; however, the specificity of DL algorithms was 0.92 (95% CI: 0.92–0.92), higher than the 0.82 (95% CI: 0.70–0.91) of ML algorithms. A threshold effect was present among the other algorithm types, so only the AUC values were pooled and compared. According to Fig. 5, the SROC-AUC was 0.928 (SE = 0.0085) for DL algorithms, 0.898 (SE = 0.0487) for ML algorithms, and 0.934 (SE = 0.0292) for the other algorithm types.

Figure 5.

sROC AUC for different algorithm types. (A) Deep learning (DL): AUC = 0.9283; SE(AUC) = 0.0085; Q* = 0.8631; SE(Q*) = 0.0101. (B) Machine learning (ML): AUC = 0.8983; SE(AUC) = 0.0487; Q* = 0.8294; SE(Q*) = 0.0518. (C) Other algorithms: AUC = 0.9344; SE(AUC) = 0.0292; Q* = 0.8705; SE(Q*) = 0.0359.

DISCUSSION

PIs remain a major concern for hospitalized patients, with significant implications for patient outcomes and health care costs.⁴⁴ While various risk assessment tools exist, their effectiveness is often limited by caregiver expertise and inconsistent diagnostic criteria.⁴⁵ Recent advances in AI have introduced new possibilities for assisting in PIs diagnosis.³⁴ However, previous reviews have primarily focused on predictive modeling or image segmentation, with limited attention to diagnostic performance.^46–48 This study fills that gap by systematically evaluating the diagnostic accuracy of AI algorithms for PIs, offering quantitative evidence to support clinical and research applications.

This meta-analysis of 12 studies incorporating 147 2 × 2 confusion matrices demonstrated AI’s robust diagnostic performance in PIs detection, with a pooled sensitivity of 0.77 and a specificity of 0.92, achieving a diagnostic odds ratio of 40.86 and an AUC of 0.93. Significant heterogeneity (Supplementary Table S4) primarily originated from different algorithm types (Supplementary Table S3). However, the discriminatory capacity remained consistent with established AI meta-analyses in oncological imaging¹⁴ (sensitivity 0.88) and musculoskeletal diagnostics.⁴⁹ Future investigations should adopt standardized external validation protocols and prospective multicenter designs to address algorithmic heterogeneity. The demonstrated diagnostic accuracy supports AI’s capacity to enhance secondary prevention through higher early detection rates, thereby enabling timely therapeutic interventions and improved rehabilitation outcomes in PIs management. However, funnel plot analysis revealed statistical asymmetry, suggesting a publication bias toward positive results. Future efforts should promote the preregistration of diagnostic AI studies (e.g., through the PROSPERO platform). According to the QUADAS-AI assessment, methodological flaws in the included studies may have affected the validity and applicability of the meta-analysis. First, significant patient selection bias (seven high-risk, eight unclear risk) and unclear applicability (nine studies) may introduce spectrum bias. Because PIs images were taken directly from databases, challenging cases may have been excluded, which would overestimate diagnostic accuracy. Second, there are methodological flaws in the index test domain (10 high-risk studies, 8 studies with unclear applicability). Since most of the included studies lack external validation, the models may be overfitted, and their performance in practical applications may be overestimated. Therefore, caution is advised when translating these study results into clinical practice, and future studies should carefully consider the criteria for including images and conduct more external validation studies.

Regional subgroup analyses demonstrated consistent specificity (>0.90) across all geographic regions, with pooled specificities of 0.98 in North America, 0.92 in Asia, and 0.92 in Europe. Although sensitivities varied (0.94 in North America, 0.82 in Asia, and 0.73 in Europe), the overall diagnostic performance remained comparable, potentially due to standardized clinical criteria and similar wound segmentation architectures in AI systems.⁵⁰ Meanwhile, the factor of skin color has a relatively low impact on the segmentation of wound parameters (color, size, depth) in the image preprocessing process.⁵¹

The updated NPIAP²⁵ classification system defines six PIs categories: stages 1–4, unstageable, and deep tissue injury. Current AI-driven diagnostic studies predominantly focus on stages 1–4, a scope mirrored in our subgroup analyses. The AUC was high at every stage: 0.98 (stage 1), 0.92 (stage 2), 0.89 (stage 3), and 0.96 (stage 4). Performance was best for stage 1 and stage 4 PIs, paralleling clinical caregivers’ accuracy in PIs staging assessments.⁵² The lower diagnostic accuracy for stage 3 PIs appears attributable to two interrelated mechanisms. First, the clinical presentation of this stage overlaps substantially with other chronic wound etiologies such as diabetic and arterial ulcers, creating diagnostic uncertainty⁵³; second, stage 3 wound beds are often obscured by necrotic tissue or eschar, which compromises depth evaluation and leads to misclassification.⁵⁴ The higher accuracy for stage 1 PIs likely relates to their prevalence: their larger share of training datasets allows AI algorithms to optimize diagnostic parameters.⁵⁵ Collectively, these findings support the clinical value of AI-driven nursing systems in facilitating early-stage PIs detection and intervention. However, clinically significant unstageable PIs and deep tissue injuries, as well as wounds with overlapping etiologies (such as diabetic ulcers and PIs), have received insufficient attention. Future research should focus more on these areas.

Subgroup analysis by algorithm type revealed distinct performance characteristics: DL had a pooled sensitivity of 0.76 and a specificity of 0.92, whereas ML showed higher sensitivity (0.85) but lower specificity (0.82). Other algorithm types achieved a higher pooled AUC (0.934) than both, although with fewer included studies than DL. This pattern suggests greater diagnostic consistency in DL, whereas results for ML and other algorithms may vary because of publication bias or random effects during meta-analysis.⁵⁶ These findings align with established diagnostic research, particularly Wu et al’s comparative analysis,⁵⁷ which found comparable accuracy between DL (sensitivity = 0.82) and ML (sensitivity = 0.93) in chronic obstructive pulmonary disease detection, with no significant difference in performance. Overall, ML had a lower missed-diagnosis rate, consistent with its higher sensitivity, and DL showed better overall diagnostic performance; however, these comparative outcomes require confirmation through rigorously designed prospective clinical studies.

Overall, the high diagnostic accuracy (pooled sensitivity 0.77, specificity 0.92) demonstrates AI’s potential to transform PIs management. First, in triage settings, AI’s high discriminative ability (AUC: 0.98 for stage 1 and 0.96 for stage 4 injuries) enables nurses to rapidly initiate interventions for early-stage injuries to prevent progression while prioritizing critical cases. Second, the high pooled specificity (0.92) supports the use of AI-assisted diagnosis to improve efficiency and standardize decision-making.³⁷ Finally, integration with mobile applications³⁵ facilitates remote screening: patients can submit images for community nurse evaluation with AI support, which increases diagnostic throughput in telemedicine workflows despite variable user experience levels. Therefore, in future clinical practice, AI can be integrated into the diagnostic process for PIs to optimize staging and improve patient outcomes. For example, after candidate systems are trained on PIs datasets and clinical guidelines, the most appropriate AI diagnostic system can be selected and embedded into the hospital system. Nurses can then photograph wounds and upload the images, and the system returns diagnostic criteria, nursing care recommendations, and prevention guidance in real time (Supplementary Fig. S8). However, widespread implementation is constrained by critical limitations: prevalent methodological deficiencies in existing research designs, datasets with ill-defined evaluation criteria, and insufficient external validation in most of the analyzed studies. To bridge this gap, future investigations should prioritize rigorous external validation of diagnostic algorithms to ensure clinical applicability and generalizability. In addition, device compatibility is critical, as differences in smartphone and tablet camera quality (e.g., resolution, light calibration) can affect the accuracy of image-based AI, especially in resource-poor settings. Training needs for health care workers also pose a challenge, including technical proficiency (application operation, standardized image acquisition) and clinical interpretation of AI outputs to avoid overreliance. Furthermore, regulatory and ethical challenges include algorithm transparency (concerns about “black boxes”), liability allocation for misdiagnoses, and patient data privacy during image transmission. Addressing these issues through adaptive AI design, competency-based training programs, and clear governance frameworks will be key to achieving practical application.

This meta-analysis has the following limitations: (i) Only one of the included studies was prospective; the remainder were retrospective. This limits the strength of the evidence and introduces selection bias. Future prospective studies with larger sample sizes are required. (ii) The ML subgroup analyses had limited statistical power because few studies were available, so the pooled effect estimates are less reliable and require validation through dedicated ML-focused meta-analyses. (iii) Heterogeneous reference standards and labeling processes across studies may compromise the validity of the pooled accuracy estimates. (iv) Substantial variation in sample sizes across studies could disproportionately influence the pooled results, and we lacked a clear strategy to mitigate this. (v) Among the 16 studies included, only four incorporated external validation. This raises concerns about overfitting and about the generalizability of AI models to real-world clinical settings, and it undermines their credibility for application in independent populations. Therefore, the applicability of our findings to different populations and health care settings needs to be explored further through external validation in future studies.
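
Limitation (iv) can be illustrated with a small Python sketch, assuming hypothetical true-positive and false-negative counts for three studies (not taken from the included literature). It contrasts two simple ways of summarizing sensitivity: pooling raw counts, where the largest study dominates, and taking an unweighted mean of per-study sensitivities. Neither is presented as the model used in this meta-analysis, and the function names are illustrative.

```python
from statistics import mean

# Hypothetical (true positives, false negatives) per study.
# The first study is far larger than the other two.
studies = [(900, 100), (8, 12), (9, 11)]


def pooled_by_counts(studies):
    """Sum counts across studies first, so large studies carry more weight."""
    total_tp = sum(tp for tp, _ in studies)
    total_positive = sum(tp + fn for tp, fn in studies)
    return total_tp / total_positive


def unweighted_mean(studies):
    """Average the per-study sensitivities, giving each study equal weight."""
    return mean(tp / (tp + fn) for tp, fn in studies)


count_pooled = pooled_by_counts(studies)
simple_mean = unweighted_mean(studies)
print(f"pooled by counts = {count_pooled:.2f}")
print(f"unweighted mean  = {simple_mean:.2f}")
```

With these illustrative counts, pooling by counts gives about 0.88, while the unweighted mean is about 0.58, which shows how a single large study can pull a pooled estimate toward its own result.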

CONCLUSION

The diagnostic efficacy of AI for PIs is high, but risks such as overfitting remain because of the limited timeliness of the databases used in the included studies and the lack of external validation. The data used to train and test the AI models may have been collected over a limited or outdated period and may therefore not reflect the latest imaging technologies, clinical practices, or patient demographics. To make AI algorithms more suitable for clinical application, future research should include external validation or prospective validation with a predefined cutoff.

INNOVATION

This study advances the field by applying the QUADAS-AI tool—a methodological innovation tailored for AI-based diagnostic accuracy studies—to systematically assess the quality of included evidence. It also integrates subgroup analyses across injury stages, algorithm types, and geographical regions, offering a more granular understanding of AI performance variability. By combining rigorous quality assessment with comprehensive subgroup analysis, the study provides a critical foundation for guiding future research and clinical implementation of AI in PIs diagnosis.

KEY FINDINGS

AI systems achieved a pooled sensitivity of 0.77 and specificity of 0.92 in diagnosing PIs, indicating high diagnostic accuracy.

The SROC-AUC reached 0.928, demonstrating strong overall performance across included models.

Subgroup analyses revealed variation in performance by injury stage, algorithm type, and geographic region.

AI-based tools hold promise for early and standardized PIs detection in clinical settings.

Most studies were retrospective and lacked external validation, highlighting the need for higher quality evidence.

AUTHORS’ CONTRIBUTIONS

Y.G. developed the study concept and design, conducted the electronic searches, performed data extraction, and wrote the first draft of this article. Y.G. and X.Z. contributed to data analysis and interpretation, and risk-of-bias assessment. S.A., Y.M., and X.H. assisted with electronic searches and literature screening. X.Z. contributed to the contextualization, critical review, and supervision. All authors have approved the final version of this article.

Footnotes

ACKNOWLEDGMENTS AND FUNDING SOURCES

Research reported in this publication was supported by the National Natural Science Foundation of China (No. 72404014).

AUTHOR DISCLOSURE AND GHOSTWRITING

No competing financial interests exist. The content of this article was expressly written by the authors listed. No ghostwriters were used to write this article.

ABOUT THE AUTHORS

Yu Ge, BS, Subinuer Aihemaitiniyazi, Yutong Mao, and Xin Hu are undergraduate students at the School of Nursing, Peking University. Xu Zhang, PhD, is an Associate Professor at the School of Nursing, Sun Yat-sen University.

References

1. Mervis, Phillips. Pressure ulcers: Pathophysiology, epidemiology, risk factors, and presentation. J Am Acad Dermatol, 2019; 81(4):881–890; doi: 10.1016/j.jaad.2018.12.069

2. Rodgers, Sim, Clifton. Systematic review of pressure injury prevalence in Australian and New Zealand hospitals. Collegian, 2021; 28(3):310–323; doi: 10.1016/j.colegn.2020.08.012

3. […], Zhao, Wu, et al. Prediction models for intraoperative acquired pressure injury of adults: A systematic review and critical appraisal. Adv Wound Care (New Rochelle), 2025; doi: 10.1089/wound.2024.0238

4. Trozic, Fischer, Deckert, et al. Impact of the degree of synergy between patient and nurse perceptions on the clinical outcome of pressure injury prevention: A mixed-methods systematic review protocol. BMJ Open, 2024; 14(9):e080542; doi: 10.1136/bmjopen-2023-080542

5. Song, Shen, Cai, et al. The relationship between pressure injury complication and mortality risk of older patients in follow-up: A systematic review and meta-analysis. Int Wound J, 2019; 16(6):1533–1544; doi: 10.1111/iwj.13243

6. Sim, Wilson, Tuqiri. The pressure injury prevalence and practice improvements (PIPPI) study: A multiple methods evaluation of pressure injury prevention practices in an acute-care hospital. Int Wound J, 2024; 21(10):e70050; doi: 10.1111/iwj.70050

7. Anand, Kranker, Chen. Estimating the hospital costs of inpatient harms. Health Serv Res, 2019; 54(1):86–96; doi: 10.1111/1475-6773.13066

8. Kottner, Cuddigan, Carville, et al. Prevention and treatment of pressure ulcers/injuries: The protocol for the second update of the international Clinical Practice Guideline 2019. J Tissue Viability, 2019; 28(2):51–58; doi: 10.1016/j.jtv.2019.01.001

9. Hillier, Scandrett, Coombe, et al. Accuracy and clinical effectiveness of risk prediction tools for pressure injury occurrence: An umbrella review. PLoS Med, 2025; 22(2):e1004518; doi: 10.1371/journal.pmed.1004518

10. Charalambous, Koulori, Vasilopoulos, et al. Evaluation of the validity and reliability of the Waterlow pressure ulcer risk assessment scale. Med Arch, 2018; 72(2):141–144; doi: 10.5455/medarh.2018.72.141-144

11. Tusar, Fayyazbakhsh, Zendehdel, et al. AI-powered image-based assessment of pressure injuries using you only look once (YOLO) Version 8 models. Adv Wound Care (New Rochelle), 2025; doi: 10.1089/wound.2024.0245

12. Alves, Azevedo, Marques, et al. Pressure injury prediction in intensive care units using artificial intelligence: A scoping review. Nurs Rep, 2025; 15(4):126; doi: 10.3390/nursrep15040126

13. Deo. Machine learning in medicine. Circulation, 2015; 132(20):1920–1930; doi: 10.1161/circulationaha.115.001593

14. […], Gong, Liu, et al. Artificial intelligence performance in image-based ovarian cancer identification: A systematic review and meta-analysis. EClinicalMedicine, 2022; 53:101662; doi: 10.1016/j.eclinm.2022.101662

15. Liu, Dou, Guo, et al. A novel technique for rapid determination of pressure injury stages using intelligent machine vision. Geriatr Nurs, 2025; 61:98–105; doi: 10.1016/j.gerinurse.2024.10.046

16. […], Zhang, Zhao, et al. Deep learning algorithms for melanoma detection using dermoscopic images: A systematic review and meta-analysis. Artif Intell Med, 2024; 155:102934; doi: 10.1016/j.artmed.2024.102934

17. Pei, Guo, Tao, et al. Machine learning-based prediction models for pressure injury: A systematic review and meta-analysis. Int Wound J, 2023; 20(10):4328–4339; doi: 10.1111/iwj.14280

18. Zhou, Yang, Ma, et al. A systematic review of predictive models for hospital-acquired pressure injury using machine learning. Nurs Open, 2023; 10(3):1234–1246; doi: 10.1002/nop2.1429

19. Dweekat, Lam, McGrath. Machine learning techniques, applications, and potential future opportunities in pressure injuries (bedsores) management: A systematic review. Int J Environ Res Public Health, 2023; 20(1):796; doi: 10.3390/ijerph20010796

20. Sounderajah, Ashrafian, Rose, et al. A quality assessment tool for artificial intelligence-centered diagnostic test accuracy studies: QUADAS-AI. Nat Med, 2021; 27(10):1663–1665; doi: 10.1038/s41591-021-01517-0

21. Liu, Hu, Zhou, et al. Application of deep learning to pressure injury staging. J Wound Care, 2024; 33(5):368–378; doi: 10.12968/jowc.2024.33.5.368

22. Page, McKenzie, Bossuyt, et al. The PRISMA 2020 statement: An updated guideline for reporting systematic reviews. BMJ, 2021; 372:n71; doi: 10.1136/bmj.n71

23. Stroup, Berlin, Morton, et al. Meta-analysis of observational studies in epidemiology: A proposal for reporting. Meta-analysis Of Observational Studies in Epidemiology (MOOSE) group. JAMA, 2000; 283(15):2008–2012; doi: 10.1001/jama.283.15.2008

24. Sen, DeMazumder. Getting started on artificial intelligence in health care and clinical research: Includes Rigor Checklist for authors and reviewers. Adv Wound Care (New Rochelle), 2025; doi: 10.1177/21621918251380217

25. Defloor, Clark, Witherow, et al.; European Pressure Ulcer Advisory Panel. EPUAP statement on prevalence and incidence monitoring of pressure ulcer occurrence. J Tissue Viability, 2005; 15(3):20–27; doi: 10.1016/s0965-206x(05)53004-3

26. Tescher, Deppisch, Munro, et al. Perioperative pressure injury prevention: National Pressure Injury Advisory Panel root cause analysis toolkit 3.0. J Wound Care, 2022; 31(Sup12):S4–S9; doi: 10.12968/jowc.2022.31.Sup12.S4

27. Wood, Egger, Gluud, et al. Empirical evidence of bias in treatment effect estimates in controlled trials with different interventions and outcomes: Meta-epidemiological study. BMJ, 2008; 336(7644):601–605; doi: 10.1136/bmj.39465.451748.AD

28. […], Tasar, Utlu, et al. Deep transfer learning-based visual classification of pressure injuries stages. Neural Comput & Applic, 2022; 34(18):16157–16168; doi: 10.1007/s00521-022-07274-6

29. Fergus, Chalmers, Henderson, et al. Pressure ulcer categorization and reporting in domiciliary settings using deep learning and mobile devices: A clinical trial to evaluate end-to-end performance. IEEE Access, 2023; 11:65138–65152; doi: 10.1109/access.2023.3289839

30. Ikuta, Fukuoka, Kimura, et al. An ingenious deep learning approach for pressure injury depth evaluation with limited data. J Tissue Viability, 2024; 33(3):387–392; doi: 10.1016/j.jtv.2024.05.009

31. Kim, Lee, Choi, et al. Augmented decision-making in wound care: Evaluating the clinical utility of a deep-learning model for pressure injury staging. Int J Med Inform, 2023; 180:105266; doi: 10.1016/j.ijmedinf.2023.105266

32. Kosmopoulos, Tzevelekou. Automated pressure ulcer lesion diagnosis for telemedicine systems. IEEE Eng Med Biol Mag, 2007; 26(5):18–22; doi: 10.1109/emb.2007.901786

33. Lau, Yu KHO, Yip, et al. An artificial intelligence-enabled smartphone app for real-time pressure injury assessment. Front Med Technol, 2022; 4:905074; doi: 10.3389/fmedt.2022.905074

34. Liu, Christian, Chu, et al. A pressure ulcers assessment system for diagnosis and decision making using convolutional neural networks. J Formos Med Assoc, 2022; 121(11):2227–2236; doi: 10.1016/j.jfma.2022.04.010

35. Orciuoli, Orciuoli, Peduto. A mobile clinical DSS based on augmented reality and deep learning for the home cares of patients afflicted by bedsores. Procedia Comput Sci, 2020; 175:181–188; doi: 10.1016/j.procs.2020.07.028

36. Sakakibara, Takekawa, et al. Construction and validation of an image discrimination algorithm to discriminate necrosis from wounds in pressure ulcers. J Clin Med, 2023; 12(6):2194; doi: 10.3390/jcm12062194

37. Seo, Kang, Eom, et al. Visual classification of pressure injury stages for nurses: A deep learning model applying modern convolutional neural networks. J Adv Nurs, 2023; 79(8):3047–3056; doi: 10.1111/jan.15584

38. Shiraishi, Kanayama, Kurita, et al. Performance of artificial intelligence chatbots in interpreting clinical images of pressure injuries. Wound Repair Regen, 2024; 32(5):652–654; doi: 10.1111/wrr.13189

39. Swerdlow, Guler, Yaakov, et al. Simultaneous Segmentation and Classification of Pressure Injury Image Data Using Mask-R-CNN. Comput Math Methods Med, 2023; 2023:3858997; doi: 10.1155/2023/3858997

40. Wang, Jiang, Yu, et al. Infrared thermal images classification for pressure injury prevention incorporating the convolutional neural networks. IEEE Access, 2021; 9:15181–15190; doi: 10.1109/access.2021.3051095

41. Wang, Guo, Zhong, et al. A novel deep-learning based weighted feature fusion architecture for precise classification of pressure injury. Front Physiol, 2024; 15:1304829; doi: 10.3389/fphys.2024.1304829

42. Yilmaz, Atagün, Demırcan FÖ, et al. Classification of pressure ulcer images with logistic regression. IEEE International Symposium, 2021; pp. 1–6; doi: 10.1109/INISTA52262.2021.9548585

43. Zalluhoglu, Akdogan, Karakaya, et al. Region-based semi-two-stream convolutional neural networks for pressure ulcer recognition. J Imaging Inform Med, 2024; 37(2):801–813; doi: 10.1007/s10278-023-00960-4

44. Kandula. Impact of multifaceted interventions on pressure injury prevention: A systematic review. BMC Nurs, 2025; 24(1):11; doi: 10.1186/s12912-024-02558-9

45. Karadaǧ, Çakar, Demir. An inter-assessor reliability study on the categorization and staging of pressure injuries. J Tissue Viability, 2024; 33(4):786–791; doi: 10.1016/j.jtv.2024.09.009

46. Jiang, Wang, et al. Application of an infrared thermography-based model to detect pressure injuries: A prospective cohort study. Br J Dermatol, 2022; 187(4):571–579; doi: 10.1111/bjd.21665

47. Toffaha, Simsekler MCE, Omar. Leveraging artificial intelligence and decision support systems in hospital-acquired pressure injuries prediction: A comprehensive review. Artif Intell Med, 2023; 141:102560; doi: 10.1016/j.artmed.2023.102560

48. Zahia, Zapirain MBG, Sevillano, et al. Pressure injury image analysis with machine learning techniques: A systematic review on previous and possible future methods. Artif Intell Med, 2020; 102:101742; doi: 10.1016/j.artmed.2019.101742

49. Gao, Jiao, Feng, et al. Application of artificial intelligence in diagnosis of osteoporosis using medical images: A systematic review and meta-analysis. Osteoporos Int, 2021; 32(7):1279–1286; doi: 10.1007/s00198-021-05887-6

50. Jiang, Ma, Guo, et al. Using machine learning technologies in pressure injury management: Systematic review. JMIR Med Inform, 2021; 9(3):e25704; doi: 10.2196/25704

51. Mesa, Veredas, Morente. A hybrid approach for tissue recognition on wound images. International Conference on Hybrid Intelligent Systems (HIS), 2008; pp. 120–125; doi: 10.1109/HIS.2008.33

52. Fulbrook, Lovegrove. Reporting accuracy of pressure injury categorisation in an acute tertiary hospital: A four-year analysis. J Clin Nurs, 2023; 32(17–18):6403–6414; doi: 10.1111/jocn.16662

53. Lan, Li, Chen. FusionSegNet: Fusing global foot features and local wound features to diagnose diabetic foot. Comput Biol Med, 2023; 152:106456; doi: 10.1016/j.compbiomed.2022.106456

54. Zaratkiewicz, Goetcheus, Vance. Unstageable pressure injuries: Identification, treatment, and outcomes among critical care patients. Crit Care Nurs Clin North Am, 2020; 32(4):543–561; doi: 10.1016/j.cnc.2020.08.005

55. Kayser, VanGilder, Ayello, et al. Prevalence and analysis of medical device-related pressure injuries: Results from the International Pressure Ulcer Prevalence Survey. Adv Skin Wound Care, 2018; 31(6):276–285; doi: 10.1097/01.ASW.0000532475.11971.aa

56. Saidi, Dasarathy, Berisha. Unraveling overoptimism and publication bias in ML-driven science. Patterns (N Y), 2025; 6(4):101185; doi: 10.1016/j.patter.2025.101185

57. Wu, Guo, Li, et al. Deep learning and machine learning in CT-based COPD diagnosis: Systematic review and meta-analysis. Int J Med Inform, 2025; 196:105812; doi: 10.1016/j.ijmedinf.2025.105812
