Artificial Intelligence Techniques for Prognostic and Diagnostic Assessments in Peripheral Artery Disease: A Scoping Review

Abstract

Peripheral artery disease (PAD) is a major public health concern worldwide, associated with high risk of mortality and morbidity related to cardiovascular and adverse limb events. Despite significant advances in both medical and interventional therapies, PAD often remains under-diagnosed, and the prognosis of patients can be difficult to predict. Artificial intelligence (AI) has brought a wide range of opportunities to improve the management of cardiovascular diseases, from advanced imaging analysis to machine-learning (ML)-based predictive models, and medical data management using natural language processing (NLP). The aim of this review is to summarize and discuss current techniques based on AI that have been proposed for the diagnosis and the evaluation of the prognosis in patients with PAD. The review focused on clinical studies that proposed AI-methods for the detection and the classification of PAD as well as studies that used AI-models to predict outcomes of patients. Through evaluation of study design, we discuss model choices including variability in dataset inputs, model complexity, interpretability, and challenges linked to performance metrics used. In the light of the results, we discuss potential interest for clinical decision support and highlight future directions for research and clinical practice.

Keywords

peripheral artery disease (PAD)artificial intelligence (AI)machine learning (ML)diagnosis prognosis

Introduction

Cardiovascular diseases persist as a leading cause of global mortality worldwide, contributing to a significant global burden of disease and associated risks.¹ Among them, Peripheral Artery Disease (PAD) affects over 236 million individuals worldwide.² PAD is characterized by the narrowing or obstruction of arteries, predominantly in the lower limbs. Atherosclerosis serves as the primary cause, leading to a chronic inflammatory process that adversely affects the condition of vessels.³ Main risk factors contributing to the development of the disease include cardiovascular risk factors such as age, smoking status, hypertension, hyperlipidemia, and diabetes status.⁴ In addition, genetic predispositions as well as lifestyle factors have also been identified as contributing risk factors.⁵ Although main risk factors have been identified, PAD often remains underdiagnosed and patients may not always benefit from the best medical treatment.^6,7

Artificial Intelligence (AI) has brought new insights in cardiovascular diseases, offering new techniques for data management and analysis that may help to support decision making.^8,9 The former is related to the ability to use AI techniques to manage, integrate, and reveal valuable evidence from medical records, clinical, biological or imaging information labeled as “big data.”¹⁰ The latter relates to the use of AI to develop new methods to detect, diagnose or classify vascular lesions, or to predict an event or an outcome. While AI has brought perspectives of applications in patients with PAD,^8,9 it is necessary to analyze current strengths and limitations regarding study design, datasets and methods used to test the performances of the models. Understanding the main principles of developed methods and techniques is essential to establish trust and transparency between computer scientists and physicians and to enable the development of robust models capable of significantly impacting patients’ care. In this scoping review, we summarize current knowledge on AI-based methods that have been used for the diagnosis and the prognosis in patients with PAD. We discuss technical aspects of the developed models along with their impact on the interpretability of the results and highlight future directions in the field.

Methods

Search Strategy

A literature search was conducted using PubMed and Google Scholar in January 2024 to identify studies reporting the use of AI for the diagnosis or the prognosis in patients with PAD. The Prisma flow chart details the article selection process (Figure 1). We searched for articles published between the 1st of January 1990 and the 31st of December 2023. The following queries were used in PubMed for literature search: ((((((“machine learning” OR “artificial intelligence” OR “deep learning” OR machine algorithm*) AND (“peripheral vascular disease” OR “peripheral artery disease” OR PAD OR PVD OR “peripheral arterial disease”)) NOT (coronary)) NOT (murine)) NOT (rat)) NOT (mouse)) NOT (pig). We also queried Google Scholar all support with similar queries (allintitle: peripheral artery disease Model OR prediction—coronary—murine—rat—mouse—pig). The literature search was completed by analyzing three related medical reviews.^8,9,11

Figure 1.

PRISMA flow chart diagram for selection of articles.

Eligibility Criteria

Inclusion criteria were studies that reported applications of AI for the diagnosis and/or the prognosis supported by patient data only, therefore excluding animal-based studies. Only original articles written in English language were included. Literature reviews related to the topic were checked to identify and add relevant original articles fitting with the eligibility criteria. Case reports, editorials, letters, and original articles investigating other vascular diseases (i.e., coronary, aortic, carotid, and neurovascular diseases) were excluded. After titles and abstracts were identified and retrieved, full texts were checked. Content-related additional criteria were applied and incomplete articles regarding the model training and AI methods without any technical details were also excluded.

Data Extraction and Analysis

Data were extracted from the selected articles and stored in an Excel spreadsheet. Data extracted included general information (title, journal, year of publication, authors, country) and study characteristics (aim, study design and dataset, AI model, results). The aim of the study is classified into two main categories: diagnosis and/or prognosis. Diagnosis refers to the process of detecting, confirming, characterizing, or classifying PAD. Prognosis refers to the prediction of outcomes related to PAD. The study design and dataset are composed of the characteristics describing the type of data but also the methodology used for collection. The type of data was classified into five classes: demographics (e.g., age, sex), clinical (e.g., smoking status, cardiovascular comorbidities, ankle-brachial index (ABI), Critical Limb Threatening Ischemia (CLTI) status, claudication status), imaging (e.g., Computed Tomography Angiography, ultrasound), genetics/biomarker/laboratory measurements (e.g., inflammatory proteomic markers), and other (e.g., motion data such as walking speed, decreased step length, etc.). The methodology used for data collection included the sample size with the number of subjects, the data collection period, the centers involved (monocentric or multicentric), and the design (retrospective or prospective). The AI model refers to the algorithm used, training specification, methods for validation, number of predictors in the model, rationale for selecting the predictors and performance metrics. Due to heterogeneity of aim and study design among the studies, a scoping review was performed and we conducted a descriptive analysis.

Results

Literature Search and Study Characteristics

Among the 560 citations identified, 28 original articles met the inclusion criteria and were finally included in the analysis. It included 15 AI models focusing on diagnosis, 13 on prognosis, and 1 on both topics. Among them, 12 studies were performed using monocentric databases and 16 using multicentric databases.

AI Models for the Diagnosis of PAD

Detection of PAD

One of the challenges of PAD is undiagnosed cases whereas an early diagnosis would allow early treatment and intervention to limit complications and adverse cardiovascular events. The vast majority of the studies focused on the identification of PAD patients. Table 1 summarizes the main aim of the studies, methodology, and datasets used. From a methodological point a view, the diagnosis of PAD can be considered as a binary classification (i.e., patient with PAD or not) and several AI-models have been proposed to enable this task.

Table 1.

Summary of Main Studies Reporting the Use of AI-Techniques for the Diagnosis of Patients with PAD.

Aim	PAD characteristics		Dataset	Methods	Main results	Sources
PAD detection	Fontaine’s-based Binary classification	Stable PAD (sPAD, Fontaine Class I and II), unstable PAD (unPAD, Fontaine Class III and IV)	Monocentric—clinical and demographics—56 patients	– RF, LR, stepwise regression model (optimization on Accuracy) – Scaling the severity of the PAD between 0 and 100 (cutoff = 50).	AI-score (RF model) is comparable to the ABI for PAD patient identification.	Sonnenschein et al. (2021)
	ABI-based PAD binary classification	ABI <0.90 in either limb on the date of first ABI measurement.	Multicentric—clinical and demographics—8093 patients	– Features correlation to ABI (with RF model)– LR model of the PAD status (maximizing PPV)– ROC-AUC– External validation set (NHANES)	The LR model had an ROC-AUC of 0.68 (internal validation—threshold of 0.53) and an ROC-AUC of 0.72 (external validation—threshold 0.55).	Sonderman et al. (2023)
		PAD was defined as an ABI <0.9 from MVP code mABI	Multicentric (Millions Veteran Program and UK biobank)—clinical, demographics, genomics—243,060 patients from MVP and 394,408 from UK biobank	– Two-phased GWAS to determine the PAD risk– NLP to extract ABI– Adjusted LR (incl. age and sex) to find loci.	18 new PAD-related loci were discovered.	Klarin et al. (2019)
	PAD binary classification	Combination of ABI (defined as ≤0.9 or >1.4)/atherosclerotic stenosis or occlusion/prior peripheral revascularization or nontraumatic amputation	Multicentric—clinical and demographics—6861 patients	– NLP modeling + LASSO LR (10-fold CV– Performance metric ( ROC-AUC and PRC)– Delong-test comparison of ROC-AUCs.	NLP classifier overperformed the LASSO LR	Weissler et al. (2020)
		PAD was defined as an ABI <0.9 at rest or 1 min after exercise; or ABI >1.40 or ankle systolic blood pressure >255 mm Hg)	Multicentric—clinical and demographics—1569 patients	– NLP (MedTagger model)– Performance metrics: accuracy, PPV, specificity	The NLP classifier algorithm has been optimized on the accuracy (91.8%) and present high PPV (92,9%) and specificity (92.5%)	Afzal (2017)
		Clinicians’ primary and secondary diagnosis and surgical codes existing in the UK biobank.	Multicentric (UK biobank)—clinical and demographics and genomics—502,536 patients	– PRS selection derived from the PAD GWAS (Klarin et al.)– GLM model– 90:10 (training:test) split + (10-folds CV)	The PRS can discriminate risk of PAD and other cardiovascular diseases (ROC AUC of 0.76)	Wang et al. (2022)
		Based on the medical records.	Monocentric—clinical, demographics and walking Motion—270 patients	– 75:25 (training:test) split.– Random Forest, SVM, and LR, and ANN– Performance metrics: Accuracy and Matthew’s correlation coefficient	Neural Networks or Random Forest algorithms with 89% accuracy (0.64 Matthew’s Correlation Coefficient) using all laboratory-based gait variables.	Al-Ramini et al. (2022)
		Defined as ≥50% luminal obstruction in ≥1 peripheral arteria	Multicentric (CASABLANCA trial)—imaging and clinical and demographics—354 patients	– LASSO LR– 80:20 (training:test) split: Monte Carlo CV– optimal cutoff for the diagnosis score is 5.546 (optimized on Youden index)	A diagnosis score composed of one clinical variable and six biomarkers with an ROC-AUC of 0.85.	McCarthy et al. (2018)
		Based on the medical records.	Multicentric—clinical and demographics—20,031 patients	– NLP from Duval et al. to extract the features.– LR, RF, RNN models to predict the PAD status	RNN utilizing time-engineered features outperformed RF and LR models (average AUCs 0.96, 0.91, and 0.81, respectively)	Ghanzouri et al. (2022)
	DUS-based PAD binary classification	Based on the Point-of-care DUS.	Monocentric—imaging, clinical and demographics—305 patients	– LSTM network for row signal classification– SVM and LR– 90:10 (training:test) split.	Doppler arterial spectral waveforms based model to discriminate PAD patient with ROC-AUC of 0.93	Normahani et al. (2022)
	PAD multiple classification	Critical limb ischemia (CLI), severe PAD, moderate PAD and no PAD.	Monocentric—imaging and clinical and demographics—703 patients	– RF, NN, GLM– Sensitivities, specificities, PPV, NPV, ROC-AUC	No strong relationship between 6MWT distance and PAD severity (ROC-AUC = 0.69); comorbidities affect 6MWT distance.	Qutrio Baloch et al. (2020)
PAD characterization	PAD stenosis classification	The discretization of a percentage of lumen stenosis (5-group)	Monocentric—imaging and clinical and demographics—265 patients	– p-EffNet model– 89.5:9.5 (training:test) split	The p-EffNet ROC-AUC of above 0.98 for both below and above knee classifier.	Dai et al. (2021)
Binary claudication classfication	Symptom of PAD (claudication or none).	Monocentric—clinical and demographics and walking Motion—120 patients	– Models CatBoost, ET, XGBoost, GBC, LightGBM, RF– 80:20 (training:test) split 10-fold CV	Detection of claudication with an accuracy of 92.25% with XGBoost with gait feature	Pinto et al. (2023)	No-claudication, PAD, lumber spinal canal stenosis affected the vertebra L5 or the L4)
	Monocentric—clinical and demographics and Walking Motion—46 patients	– Leave-one-out cross-validation– SVM, RF	Discriminant classifier total accuracy of 83%.	Watanabe et al. (2014)	Stenosis location classification	Subclassification into normal, aortoiliac disease, femoral-popliteal arterial disease, and trifurcation
Monocentric—imaging and clinical and demographics—5761 patients	– CNN (MobilNets) to subclassify three forms of waveforms– HNN, SVM, RF to subclassify the three PAD locations	HNN outperformed SVM, RF on classifying task.	Ara et al. (2019)

Abbreviations: ABI, ankle-brachial index; ANN, artificial neural network; CNN, convolutional neural network; CTA, computed tomography angiography; DUS, duplex ultrasound; ET, extra trees classifier; GBC, gradient boosting classifier; GLM, generalized linear model; GWAS, genome-wide association study; HNN, hierarchical neural networks; LASSO, least absolute shrinkage and selection operator; LightGBM, light gradient boosting machine; LR, logistic regression; LSTM, long short-term m emory (recurrent neural network); MVP, million veteran program; NLP, natural language processing; OR, odds ratio; PA, peripheral artery; PAD, peripheral artery disease; PR, precision-recall curve; PRS, polygenic risk score; RF, random forest classifier; RNN, reccurent neural network; SVM, support vector machine; UK, United Kingdom; XGBoost, extreme gradient boosting.

Using a Natural Language Processing (NLP) algorithm for the feature extraction out of the Electronic Health Records (EHR), Weissler et al.¹² managed to reach an Area Under the Curve of the Receiver Operating Characteristic (ROC-AUC) of 0.88 with a Least Absolute Shrinkage and Selection Operator (LASSO) regression for the prediction. Other investigators also optimized an open-source NLP algorithm called MedTagger with text processing components and a PAD classifier and the model reached an accuracy of 91.8% with a high positive predictive value (PPV) of 92.9% and a specificity of 92.5%.¹³ Sonderman et al.¹⁴ achieved a more modest ROC-AUC of 0.72 by using a logistic regression model. On the other hand, another study reported a ROC-AUC of 0.85 with logistic regression on the training set.¹⁵ While all three studies are based on a multicentric dataset, only one study provided results on an external testing dataset while optimizing the PPV value instead of the ROC-AUC.¹⁴ The generalization capabilities of the other models thus remain to be investigated. Finally, despite the lack of external validation, Ghanzouri et al.¹⁶ applied a Residual Neural Network (RNN) and obtained a ROC-AUC of 0.96 outperforming both linear regression (0.81) and the random forest (0.91).

While all the above studies aimed to develop AI-models for PAD detection, a variability in the criteria used to define or diagnose PAD is observed. PAD diagnosis mainly refers to a binary classification problem to assess, meaning whether the patient has PAD or not. However, most of the datasets used were retrospective and the criteria used to discriminate PAD patients were heterogeneous. Some relied on the reported ABI alone,^5,14 whereas others used a combination of the ABI, presence of significant atherosclerotic stenosis or occlusion, prior peripheral revascularization, or non-traumatic amputation.¹² Some retrospective studies directly relied on PAD diagnosis confirmation from medical records or patient’s history,¹⁷ or used corresponding standardized coding systems identifying PAD patients.¹⁸ Finally, some investigators relied on imaging data and used Computed-Tomography Angiography (CTA) to classify patients into two groups, with and without obstructive PAD.¹⁵ In this study, the binary classification was based on the measurement of the lumen arterial stenosis (≥50% luminal obstruction in ≥1 peripheral artery).

In addition to develop AI-models to identify PAD patients, a last approach aimed at investigating molecular and biological patterns from patients with PAD.^5,15 Such an approach may help to better understand factors contributing to PAD development and could potentially lead to the identification of relevant diagnostic biomarkers. As an example, Klarin et al.⁵ identified 19 genomic loci implied in the raising of the PAD risk and atherosclerosis in general. Using LASSO regression, McCarthy et al.¹⁵ developed a diagnosis score composed of one clinical variable and six proteomic biomarkers including molecules linked to inflammation (interleukin-23, eotaxin-1), vessel structure stability (angiopoietin-1) and fibrinolysis (midkine).

The most frequent Machine learning (ML) algorithms used were Linear Regression (LR) and Random Forest (RF) which are explainable and favor transparency.¹⁹ RF models are also more suitable for categorical variables since they handle it better than LR. Another approach relies on deep learning methods, which reached comparable or even better results than the LR and RF but poses challenges regarding the interpretability of models.^13,16

Classification of PAD

While the detection of PAD is a first step in the diagnosis, PAD characterization is necessary to assess the severity and stage of the disease. Clinical stages, calcium scoring index and the degree of stenosis are most often used to characterize and classify PAD lesions. Some investigators developed an AI-algorithm to detect claudication with an accuracy of 92.25%.²⁰ Claudication, typically associated with PAD, is also a symptom of lumbar spinal canal stenosis. A recent study categorized claudication symptoms into five groups based on the underlying condition: absence of claudication, PAD, and lumbar spinal canal stenosis affecting either vertebra L5 or L4.²¹ Finally, Dai et al.²² designed five stages of PAD stenosis by discretizing the lumen obstruction stenosis ratio and built a combined Convolutional Neural Network (CNN) and Recurrent Neural Network (RNN) called p-EffNet that exhibited a good performance of 91.5% accuracy and a ROC-AUC of 0.987 in classifying above-knee artery and 90.9% accuracy and a ROC-AUC of 0.981 in classifying below-knee artery. The location of vascular lesions has also been analyzed and subclassified into four states (i.e., no-PAD, aortoiliac disease, femoral-popliteal arterial disease, and below-knee lesions).²³ In this study, the CNN model reached an F1-score for each class between 0.85 and 0.98.²³ Despite providing innovative and new characterization, those papers present class-balancing sensitive performance metrics (i.e., accuracy).

AI-Methods for the Prognosis of PAD

Several studies investigated the use of AI-models to predict the likely course and outcomes of patients with PAD. Table 2 summarizes the predicted outcomes, study population, modeling methods, and main results of the PAD prognosis publications.

Table 2.

Summary of Main Studies Reporting the Use of AI-Techniques for the Prognosis of Patients with PAD.

Aim	Predicted outcome		Study population	Methods	Main results	Sources
Mortality	Short-term mortality	30-day procedure-related mortality prediction.	Multicentric—clinical and demographics—14,444 patients	– Missing data (Optimal imputation)– Features selection with mRMR– SVM, RF, XGBoost, MLP modeling	RF outperformed others: an ROC-AUC of 0.75, accuracy of 0.87 sensitivity of 0.77 and specificity of 0.68 (test set)	Cox et al. (2022)
	Long-term mortality	Binary risk mortality (median split)	Monocentric (GenePAD)—clinical, genetics and demographics—1755 patients	– 10-fold CV– Stepwise LR and RF modeling	RF models predicting future mortality (AUC 0.76)	Ross et al. (2016)
	Long-term mortality, MALE, MACE	Mortality and MALE and MACE at 5 years.	Multicentric—imaging and clinical and demographics—10,437 patients	– Binary classifier DNN and 1-dimensional CNN with ReLU as activation function– (60; 20; 20) training;validation;test split—100 epochs– CR to estimate the HR with log-rank test and C-index to measure discrimination	Doppler waveform based model provided an independent prediction of death (HR = 2.45), MACE (HR = 1.98) and MALE (HR = 11.65) at 5 years with similar results at 1 year.	McBane et al. (2023)
Major bleeding event	Major Bleeding events = an intracranial and/or gastrointestinal bleeding event and/or hemorrhagic stroke) after 1 year of hospitalization is determined/4 risk groups predictions		Multicentric (Insurance Fund—BARMER)—clinical and demographics—81,930 patients	– CR of the four risk groups with C-index to discriminate of the score (OAC3-PAD Risk Score)– 60:40 (training/validation) split	A risk score composed of eight independent predictors (oral anticoagulation, age >80 years, Chronic Limb threatening ischemia, congestve heart failure, chronic kidney disease, Prior Bleeding events, Anemia, Dementia) informs on the risk and benefits of antithrombotic strategies	Behrendt et al. (2022)
CLTI	CLTI at 1 year = A composite of survival with remission of ischemic rest pain, wound healing, and freedom of MALE without recurrent CLTI.		Multicentric (PREVENT III clinical trial)—clinical and demographics—1238 patients	– Topic modeling NLP (supervised latent Dirichlet)	Three distinct stages of 1-year CLTI-free survival were identified with a respective survival rates of 82.3%, 61.1%, and 53.4%.	Chung et al. (2022)
AFS	Short-term AFS	AFS at 30 days post infrainguinal endovascular intervention	Multicentric—clinical and demographics—14,444 patients	– RF– Model interpretation with Gini importance	Most important features are claudication, wound infection, white blood cell count, and albumin (RF ROC-AUC of 0.81)	Cox et al. (2022)
	Long-term AFS	AFS = A prognosis at 5 years.	Multicentric (Insurance Fund—BARMER)—clinical and demographics—81,930 patients	– 10 most important features selected with LASSO– 4:1 (training/validation) split– 10-folds CV– CR model (5 equal sized risks groups)	GermanVascScore accuracy is c-statistic = 0.79 being stratified by CLTI	Kreutzburg et al. (2021)
		The postoperative AFS at 1/3/5-year was predicted.	Monocentric—clinical and demographics—2130 patients	– Features selection with LASSO– 8:2 (training/validation) split– LR, GBoost, RF, XGBoost, ANN, CR, RSF	RSF algorithm had best performance with 1/3/5-year ROC-AUCs 0.741, 0.768, 0.836 (validation set) and GermanVasc Score (c-index 0.788 vs 0.730).	Liu et al. (2023)
AFS	Long-term AFS	AFS at 6 months, Mortality, reintervention	Multicentric—clinical and demographics—88,898 patients	– LR, RF– Performance metric LR; McFadden pseudo-R2	For AFS at 6 months, we see age 85+ and urban residing as predictors and for death and reintervention. both models (LR and RF) indicate age is most predictive.	Austin et al. (2022)
ICLU	Ischaemic Chronic Leg Ulcers		Multicentric—clinical and demographics—80 patients	– PredyCLU model based on the fuzzy system logic.	ICLU risk	Serra et al. (2020)
MALE	MALE includes critical limb ischemia and/or limb amputation at 2 years.		Monocentric—imaging, clinical and demographics—392 patients	– OR computed with a univariate LR for feature selection– ANN and LR models– ROC-AUC with Delong test	The ANN predicted the MALE at 2 years with (ROC AUC = 0.80) a sensitivity of 0.62 and a specificity of 0.90.	Pan et al. (2022)
Life years, quality-adjusted life years, costs of healthcare	Estimate of the healthcare costs per life year.		Multicentric (VIVA trial)—clinical and demographics—50,000 patients	– Markov decision model	Screening for PAD is clinically effective and cost-effective over a lifetime. Low dose of rivaroxaban could decrease costs per life year gained by 40%.	Lindholt and Søgaard (2021)
Appropriate PAD treatment	Definition of clinical stenosis criteria based on the TASC classification to identify patients that are more suitable for drug treatment or surgery		Monocentric—clinical and demographics—186 patients	– Manual selection of 16 clinical features (clinician decision)– 10-folds CV– Model RBFNN– ROC-AUC, accuracy, sensitivity (recall), specificity, PPV, NPV, F-score, and Yuden Index	An effective assessment tool for femoral peripheral arterial disease treatment.	Yurtkuran et al. (2013)

Abbreviations: AFS, amputation free survival; ANN, artificial neural network; CLTI, chronic limb-threatening ischemia; CNN, convolution neural network; CR, Cox regression; CV, cross validation; DNN, deep neural network; F-score, F1-score performance metric; HR, Hazard ratio; ICLU, ischaemic chronic leg ulcers; iCLU, ischaemic chronic leg ulcer; LASSO, least absolute shrinkage and selection operator; LR, logistic regression; MACE, major adverse cardiac event; MALE, major adverse limb event; MLP, multilayer perceptrons; mRMR, minimum redundancy maximum relevance; OAC3-PAD, oral-anticoagulation, age, chronic limb threatening ischaemia, congestive heart failure, chronic kidney disease—peripheral artery disease; NLP, natural language processing; OR, odds ratio; PAD, peripheral artery disease; PredyCLU, prediction model of chronic leg ulcer; RBFNN, Radial Basis Function Neural Network; ReLU, Rectified Linear Unit; RF, random forest; ROC-AUC, area under the curve; RSF, random survival forest; SSI, surgical site infection; SVM, support vector machine; SVM, support vector machines; XGBoost, eXtreme gradient boosting.

Mortality Risk Prediction

Cox et al.²⁴ elaborated models based on clinical and demographic data to determine mortality risk 30 days after lower extremity endovascular revascularization procedure. The resulting random forest model achieves a ROC-AUC of 0.75, accuracy of 0.87 sensitivity of 0.77, and specificity of 0.68 on the test set.

Other studies aimed to predict mortality after longer follow-up. Ross et al.²⁵ designed a model to predict the risk of mortality based on the GenePAD study that lasted 8 years. A binarization of a time-to-event outcome implies a definition of a decision threshold to generate the two groups. This threshold must be defined before the modeling to ensure a rigorous method and also needs to be justified to avoid a bias due to the balancing of the outcome. In this case, the dataset was divided into two groups using the median survival time as the split point which is the common practice. The median follow-up time for included patients was 5.3 years with a total of 129 patients with a death event and 918 controls. Patient tracking was facilitated through regular reviews of EHR and check-up phone calls. The random forest model demonstrated superior performance with an ROC-AUC of 0.76 compared with stepwise regression. Analysis of feature importance identified age, serum glucose level, serum creatinine, systolic blood pressure, and ABI as the five most crucial variables for predicting mortality.

Limb Adverse Events

PAD is associated with risks of amputation with a major impact on patients’ quality of life. Some authors aimed to predict amputation free survival (AFS) at 30 days after lower extremity infra-inguinal endovascular intervention.²⁴ The random forest model achieved a ROC-AUC of 0.81 on the test set to distinguish high and low AFS. The risk of amputation was associated with elective vascular surgery designation meaning scheduled in advance without emergency constraint, claudication, open wound/wound infection, white blood cell count, and albumin level. On longer follow-up, another study aimed to predict the AFS at 5 years and used the database obtained from the second largest insurance fund in Germany.²⁶ The AI-model, called the GermanVascScore, developed on a dataset randomly stratified by the intermittent claudication variable to homogenize the training set, reached an accuracy score c-statistics of 0.70.²⁶ Some investigators focused on the prediction of major adverse limb events (MALE) using clinical, and demographic predictors and imaging data.²⁷ The Artificial neural network achieved a ROC-AUC of 0.80 on the validation set to predict MALE at 2 years. MALE can also include reintervention events (i.e., limb revascularization) and acute and chronic limb ischemia.²⁸

Beside the risk of amputation, PAD is associated with the risk of ischemic chronic limb ulcer (ICLU). Stratification of the risk of ICLU might help to improve early detection and care. Some investigators developed a model based on fuzzy logic concept, called “PredyCLU,” to determine risk groups (low, medium and high) of ICLU in patients with PAD.²⁹

Major Bleeding Events

The recent publication of the Oral-anticoagulation, Age, Chronic limb threatening ischaemia, Congestive heart failure, Chronic kidney disease (OAC³)-PAD bleeding risk score has brought new insights to evaluate this outcome in patients with PAD.³⁰ The score was developed using data from Germany’s second largest health insurance fund and included 81,930 patients.³⁰ Using LASSO, the authors selected eight predictor features and classified the risk of 1-year major bleeding into four risk groups. The OAC³-PAD risk score exhibited adequate calibration and discrimination between four risk groups (Harrell’s c-index = 0.69, 95% CI 0.67–0.71).³⁰ Interestingly, the score was then externally validated in another prospective German cohort and the model discrimination was adequate (Harrell’s c-index = 0.61, 95% CI 0.43–0.80).³¹ Additional external validation using the French National Health data corroborated these findings. Using a 10-year retrospective cohort including 161,205 patients who underwent PAD repair in French public and private health institutions, the OAC³-PAD model achieved an AUC of 0.650 to predict 1-year major bleeding (95% CI 0.645–0.655), with a sensitivity of 0.67 and a specificity of 0.57.³² These results suggest the robustness of the model and its potential for further generalization of its use to guide the management of patients with PAD.

Critical Limb Threatening Ischemia (CLTI)

CLTI is a composite endpoint associated with mortality, risk of amputation, and impaired quality of life.⁶ The Society for Vascular Surgery elaborated the Wound, Ischemia, and Foot Infection (WIfI) score with a classification system.³³ Chung et al.³⁴ aimed to elaborate a ML-based risk stratification scheme using the WIfI score. The methodology followed the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) guidelines allowing a standardization of the reporting process. Furthermore, this study also introduced the use of Topic models (NLP methods) to cluster the patients into K groups defined by extracted features from the clinical documents. The data were analyzed from K = 2 to K = 9. The authors state K = 3 based on the possibility by a clinician to use features to assess a patient’s group and they observed that the a-posteriori probability obtained by a Bayesian model selection was the highest.

Clinical Decision Support

In addition to predict patient’s outcomes related to the disease or to post-operative complications, ML has also been proposed to support clinical decisions and therapeutic decision making. Some investigators designed a model to identify patients who are more suitable for medical treatment or surgery.³⁵ The model Radial Basis Function Neural Network (RBFNN) combined pre-selected features by human experts (cardiologists, surgeons, and anesthetists) and reported performance metrics, as well as the confusion matrix,³⁶ examined the economic and life quality effectiveness of PAD screening and associated interventions in men during the early stages. They demonstrate the importance of an early screening of the PAD for both patient outcomes (success rate of revascularization, major bleeding risk reduction) and also the impact on the cost of life year gain. To compare the medical protocols and treatment benefits, they applied a decision model such as the Markov Decision model widely used to estimate the clinical and economic effect of a treatment or to compare treatment procedures. It requires the outcome target to be split into exclusive states; in the article, they split it into no/detected/undetected PAD which are the Markov states. However the modeling comes with limits regarding the states; it assumes that a series of sequential phases (e.g., the patient can go from no PAD, to undetected PAD and to Detected) and a nonrestricted transition between all the states with a known probability, an individual without PAD may receive a diagnosis (transitioning to a detected PAD state). If the condition is successfully treated (with a certain probability or rate of success), there is a possibility of reverting to the “no PAD” state.

Discussion

This section delves into comprehending and analyzing publication outcomes, spotlighting implicit assumptions and methodological constraints. It covers the key points of the AI application process by following the stages: (i) data and outcome preparation (ii) modeling, and, (iii) reporting.

Data and Outcome Preparation

Input Data

Dataset Availability

Data availability directly affects the choice of AI-model development and validation. The datasets including routinely acquired clinical and demographic data related to patient care are the most commonly available. Since the features are collected as a standard of care, large datasets such as UK-Biobank and Million Veteran Program (MVP) with more than 240,000 patients could be used. While imaging data (mainly Doppler ultrasound, Computed Tomography Arteriography) are commonly performed in practice, they require more constraints to be collected and extracted from the storage service (e.g., Picture Archiving and Communication System—PACS). There are two main drawbacks around the imaging data, the first one is the need of large storage capacities to store all the images and the second one is linked to the need of annotation in order to be used in the modeling (i.e., tissues and vascular lesions have to be segmented on the image). Annotating data involves outlining the regions of interest within an image, a process that demands a considerable amount of time and consequently restricts the size of the final annotated dataset. The challenge is tackled by employing and developing automatic segmentation methods relying on CNN for example, to aid in the time-consuming task. Further variables, such as gait motion and genomic data, can be available, but they are mainly performed in research settings and not collected in routine clinical settings. Including new data sources allows the detection of new predictors (e.g., such as genomic loci) or offers new trackers of the disease evolution (e.g., phone application tracking the gait motion). To enhance their broader utilization, affordable costs and standardization are necessary, as it currently results in relatively small final datasets in the literature.^5,20 The more the data are heterogeneous and aggregated, the more the dataset size is small. In addition, complex eligibility criteria in the cohort selection can restrict the model’s clinical usefulness due to the challenges in real-world data acquisition.

Structured Versus Unstructured Data

EHR is a digital version of the paper-based medical chart. It concentrates all the patient-centered records in real-time in a text format. With digitalization, the number of (unstructured) data increased tremendously and are not arranged according to a model or target. On the opposite, the structured dataset (fewer compared with the unstructured ones) requires hand-crafting to be organized which is time-consuming and costly.

Big Data Versus Good Data

The training dataset size affects the model’s robustness, necessitating the training sample size calculation to estimate the number of data points to include. The calculation relies on statistical methodologies and takes into account the prediction modeling method, the number of features, the target balancing in cases of categorical variables, and the desired predictive performance often attached to clinical minimal requirements.³⁷ However, a big dataset size does not automatically mean good performance as the data quality is also essential. Especially in structured data with a lot of handcrafting, the different manipulation, the incomplete reporting and wrong data inputs may lead to bias induced by the dataset.

Variability in the Ground Truth

Partial Definition

Various PAD classifications have been reported.³ Among them, Rutherford and Fontaine classifications consist of respectively six and four grades. While Fontaine’s classification relies only on clinical symptoms, Rutherford also incorporates non-invasive clinical symptom measures like ankle blood pressure after treadmill tests.³ Since the observed PAD classification papers rely mainly on retrospective studies, heterogeneity is found regarding the definitions used for PAD diagnosis and classification. Some studies relied on unique criteria such as the ABI measurement only;^14,38 this sub-optimal definition of the disease may include as ground truth false negative patients since its value is affected by artery stiffness due to calcification that will be associated with a higher ABI score.³⁹ Other papers proposed to use a CTA imaging based definition of the PAD by using the ratio of the lumen in the arteries but this requires image acquisition.¹⁵ The variability of definitions illustrates the difficulty of identifying PAD patients in research and for care purposes.

Clinical Relevance

The binarization of a clinical outcome in the prognosis makes a lot of sense from a computational scientist’s point of view (e.g., the risk of amputation 30 days after an endovascular intervention²⁴ since patients are categorized into low and high-risk subcohorts). However, the clinical use and the actual usefulness for decision support are more unclear and difficult to evaluate. The models do not currently provide clear meaningful information on the therapeutic decision to choose and patient management such as recommendations on the frequency of visits for follow-up. This highlights one of the main current limitations of AI in healthcare which is linked to the practical use of the model.

Composite Endpoints

Finally, the composite survival risk scores derived from time-to-event datasets are frequently contingent on the specific dataset used. These composite endpints often incorporate multiple parameters and demand careful consideration when comparing conclusions across articles, as they may vary from one study to another. It adds a new obstacle to the generalization capabilities of a model.

Modelling Choice

Predictors Discovery with NLP in Unstructured Data

The use of deep learning especially NLP allows the highlighting of predictors.^12,13,23,34 The NLP offers the possibility to deal with a really large dataset and perform a feature selection by definition.¹¹ Among the NLP algorithms, the topic models are one example. It can be used to cluster patients into a predefined number of K-patient risk groups by identifying topics that co-occur together. The topic models can be supervised by criteria according to the latent Dirichlet allocation model, allowing supervising the formation of the clusters.⁴⁰ Chung et al.³⁴ used the 1-year CLTI-free survival outcome to supervise the model and managed to distinguish three clusters of patients corresponding to three stages of CLTI risk groups.

Feature Selection

This can be performed by using data-driven ML techniques where the most relevant features will be considered as covariates for the outcomes. It can also be performed before the model training during the univariate analysis where the relationship between the outcome and potential covariates is analyzed (e.g., Spearman correlation) or during the model training itself with penalization process (e.g., LASSO) or directly with the model itself (e.g., RF, RNN). Using these techniques revealed covariates but lacked expert knowledge, such as understanding the dependencies between the covariates, which constitutes knowledge-driven analysis. In addition, some studies also added human expertise for feature selection. This is for example the case of the studies published by Sonderman et al.¹⁴ that included vascular specialists in the features selection to model the PAD diagnosis model and by Afzal¹³ who requested medical expertise to guide the NLP algorithm. Collaborating with experts enables a meticulous review of the data, revealing subtle connections between features, and/or fostering trust in the modeling process.

Fuzzy System Approach

The fuzzy inference system is a form of artificial intelligence and is suitable for modeling complex systems by integrating expert knowledge as logic-based rules in the model decision process.

Explainability and Interpretability Over Complexity

Interpretability focuses on giving insights of the inner mechanism of models while explainability focuses on detailing the decision scheme. Despite more performant but complex models, the choice of interpretable models such as random forest, logistic regression, and Rule-based learner¹³ for PAD diagnosis encourages community trust. Among the prognosis papers, the OAC³-PAD risk score from Behrendt et al.³⁰ is also built on a Cox regression model and then reformulated into a counting risk with only eight covariates to build the score for Major bleeding event risk.

Performance Evaluation

Performance Metrics Choice

Misleading Measurements

Heterogeneity was observed between studies regarding metrics used to report the performances of the AI-models, making very challenging to perform any comparison between the studies. Table 3 summarizes the metrics reported and presents their advantages and limitations, respectively. When diagnosing with categorical endpoints, the variability of performance metrics can be significant due to factors like randomness, data size, perturbations, and class imbalances. Too many publications rely on unique metrics reporting without considering the potential pitfalls. The best example is the reporting of the accuracy only which can be misleading in the case of unbalanced datasets. A general approach would be to report the confusion matrix providing complementary information about the overall performance of the models and allowing the computation of informative but class balance dependent metrics such as the sensitivity, the specificity, the Positive Predictive Value (PPV) and the Negative Predictive Value (NPV). Only very few studies reported all these metrics together.³⁵ Some metrics are less sensitive such as the area under the receiver operating characteristic curve (ROC-AUC), and the area under the precision-recall curve (PR-AUC), F1 measure, balanced accuracy, and the C-index score for the time-to-event prognosis are more suitable for the performance assessments.

Table 3.

Summary of Most Frequent Performance Metrics of AI-Techniques for Diagnosis and Prognosis of Patients with PAD.

Metrics	Objectives	Advantages	Limitations
ROC-AUC	Trade-off between the TP rate and the FP rate under different probability thresholds.	– Global measurement– Not sensible to the threshold’s definition for binary classes data splitting.	– ROC-AUC is an aggregated classification performance index that does not reflect the performance at a specific threshold.– Misleading if the class proportion of the outcomes is unbalanced.
Accuracy	Ratio of the number of the true prediction (TP + TN) on all the samples (TP + TN + FP + FN).	ExplainabilityPerformance index with a specific classification threshold.	Misleading if the class proportion of the outcomes is unbalanced.
Sensitivity	Ability to determine the patient cases correctly.Ratio of the True positive prediction (TP) on the all the positive case (TP + FN)
Specificity	Ability to determine the healthy cases correctly.Ratio of the True negative prediction (TN) on the all the negative case (TN + FP)
PPV	Ratio of the number of true positive prediction (TP) on the positive sample (TP + FP)
NPV	Ratio of the number of true negative predictions (TN) on the negative sample (TN + FN)
PR-AUC	Trade-off between the precision (PPV) and the recall (TP rate) under different probability thresholds.	Good with unbalanced dataset with high cost on the FN or FP (positive class is rare)	Misleading if the class proportion of the outcomes is extremely unbalanced (1/100 ratio)
(Harrel) C-index	The proportion of all usable patient pairs in which the predictions and outcomes are concordant.Interpretation similar to the ROC-AUC	Global measurement	/
F1-score	Concatenate the Precision and the recall by comparing the TP to the errors (FN + FP)	Good with unbalanced dataset with high cost on the FN or FP (positive class is rare)	/
Matthew’s correlation coefficient	Measure the difference between the predicted values and actual values	Summarize perfectly the confusion matrix	/

Abbreviations: C-index, concordance-index; FN, false negative; FP, false positive; HR, Hazard ratio; NPV, negative predictive value; OR, odds ratio; PPV, positive predictive value; PR-AUC, precision recall curve—area under curve; ROC-AUC, receiver operating characteristic curve—area under the curve; TN, true negative; TP, true positive.

Compare and Select a Model

The Cross-Validation

The Cross-Validation (CV) approach is a prevalent method in ML, featured in many publications reviewed especially the ML model based.^15,18,26,41 It involves dividing data into training and validation sets. CV is employed to assess a model’s performance, allowing for the evaluation and comparison of algorithm accuracy on a portion of the data not used during training but originating from the same dataset referred to as the validation set.⁴² The k-fold CV is used in most of the publications however the initial data splitting and the number of folds in the training is various. Nevertheless, it’s essential not to distinguish the objectives of CV with robustness, which is characterized by a model’s proficiency in performing well on external and unseen data, typically referred to as the test set. Robustness is more commonly linked to a model’s generalization capabilities, and its effectiveness in dealing with noise, variations, and uncertainties within the data. Despite the utilization of K-fold CV, certain studies^42,43 address the overestimation introduced when employing K-fold CV, also known as “record-wise CV,” in comparison to leave-subject-out CV, or “subject-wise CV.” The concept revolves around the notion that “record-wise CV” establishes a dependency between the training and validation sets, leading to an overestimation of predictions on the validation set and an underestimation of prediction errors. It also suggests that elevated performance metrics do not necessarily indicate the model’s robustness.

Model Comparability

Based on the principle that performance on a dedicated dataset doesn’t mean the same results on another one, the use of an external validation dataset (i.e., a testing set) is crucial to assess the model’s reproducibility and comparability. Among the selected articles, only four articles reported external validation results. In practice, external validation remains challenging due to constraints linked to data protection, regulation, governance and availability (e.g., genomic or gait data).

Perspectives

Limitations

A limitation of the study is linked to research boundaries related to searchable keywords used that may impact on article selections. Only articles written in English language were checked, potentially selecting publications from western countries and reducing studies from other countries if not written in English. Nevertheless, we used a combination of relevant keywords related to AI and PAD to provide an updated literature overview of technical aspects and AI-methods used for the diagnosis and prognosis of patients with PAD. Based on these results, the growing volume of AI-related publications in healthcare underscores crucial aspects of ML and AI: (i) the need of data enrichment for constructing robust models and earning trust within the scientific community and, (ii) the importance of establishing frameworks and guidelines in reporting to delineate boundaries in the reporting domain.

Database Enrichment

The quantity of data points, referring to the number of patients in a study, serves as the lifeblood of AI. In addition, the type of data, their quality and accuracy in reporting is a critical point. In the field of vascular surgery, national claim databases have been built in many European and Western countries and offer the advantage to allow access to large cohorts of patients. However, the access to specific information can sometimes be limited depending on the availability of coding system used and therefore do not always contain all the relevant clinical features. National and regional vascular registries are also very useful and can provide meaningful information on outcomes following vascular interventions. Efforts to generate real world evidence have pointed to the need to build collaboration between centers and countries to collect and analyze relevant, reliable and representative health data. In this context, international registry collaborations have been created and will undoubtedly play a major role to help developing AI-tools in patients with PAD.^10,44,45 Among others, the VASCUNET committee endorsed by the European Society of Vascular Surgery (ESVS), the European Vascular Research Collaborative (EVRC) and the European Research Hub (ERH) aim to facilitate high quality research in vascular disease throughout Europe and internationally. The Society for Vascular Surgery (SVS) Vascular Quality Initiative (VQI) is very active mainly in the United States and Canada, and is also involved to develop collaboration with ESVS-VASCUNET and the Medical Device Epidemiology Network (MDEpiNet) through the International Consortium of Vascular Registries (ICVR). The development of international multicenter studies and databases is a cornerstone for the development of efficient AI-applications that also bring major challenges regarding data protection, security and governance. Federated learning (FL) has been proposed as an alternative to ensure respect of patient privacy and confidentiality and solve issues related to data sharing, governance and ownership.⁴⁶ FL allows training algorithms collaboratively without exchanging the data themselves.⁴⁶ Each participating center trains ML using their own data in parallel and the model is then be sent to a central server that will aggregate the results into a consensus model. The consensus model is then sent to other collaborative centers to be further trained and tested to assess. Such approach might be useful in the future to develop AI-models in patients with PAD.

Frameworks and Guidelines

In recent years, several frameworks have been developed to enhance readability and establish boundaries in reporting. One such framework is the updated Consolidated Framework for Implementation Research (CFIR).⁴⁷ While serving as a generalist guideline, it offers concepts that facilitate the proper evaluation of results. This framework also offers an extensive vocabulary for describing the data, methods, and results of a research application. Additionally, more specific frameworks tailored to AI or AI extensions to existing guidelines have been proposed and might help to improve and standardize methodology used to develop and evaluate AI-applications.⁴⁸ Among studies investigated in this scoping review, only a few reported the use of guidelines and included: the STrengthening the Reporting of OBservational studies in Epidemiology (STROBE),³⁴ the Prediction model Risk Of Bias ASsessment Tool (PROBAST)^30,49 and the MINimum Information for Medical AI Reporting (MINIMAR).⁵⁰ This highlights the real need to strengthen the methodology in future studies reporting AI-tools for the diagnosis or the prognosis in patients with PAD.

Conclusion

AI offers perspectives to enhance the diagnosis of patients with PAD through innovative applications enabling the detection and/or the classification of the disease. Various ML-models have also shown encouraging results to evaluate the prognosis of patients with PAD, allowing to predict various complications (mortality, limb adverse events, major bleedings, CLTI) and develop support for decision making. However, many technical and methodological challenges remain before implementation in clinical practice from data preparation, to modelling choice and methods used to evaluate the performances of the models. Current efforts to develop large multicenter international databases and the development of AI-frameworks and guidelines would help in the future to build safe, robust, efficient and clinically relevant applications to improve the management and care for patients with PAD.

Footnotes

Abbreviations

ABI, Ankle Brachial Index; AFS, amputation free survival; AI, artificial intelligence; AUC, area under the curve; CFIR, consolidated framework for implementation research; CLTI, critical limb threatening ischemia; CNN, convolutional neural network; COMPASS, cardiovascular outcomes for people using anticoagulation strategies; CR, cox regression; CTA, computed-tomography angiography; CV, cross validation; EHR, electronic health record; ERH, European Research Hub; ESVS, European Society of Vascular Surgery; EVRC, European Vascular Research Collaborative; FDA, Food and Drug Administration; FL, federated learning; HR, Hazard ratio; ICLU, ischemic chronic limb ulcer; ICVR, International Consortium of Vascular Registries; LASSO, least absolute shrinkage and selection operator; LR, linear regression; MDEpiNet, Medical Device Epidemiology Network; ML, machine learning; MVP, Million Veteran Program; NLP, natural language processing; NPV, negative predictive value; OAC³-PAD, oral-anticoagulation, age, chronic limb threatening ischaemia, congestive heart failure, chronic kidney disease—peripheral artery disease; OR, odds ratio; PACS, picture archiving and communication system; PAD, peripheral artery disease; PLAN, Patient risk, Limb severity, and ANatomic complexity; PPV, positive predictive value; PREVENT, Project of Ex Vivo Vein Graft Engineering via Transfection; RBFNN, Radial Basis Function Neural Network; RF, random forest; RNN, residual neural network; ROC, receiver operating characteristic; SSI, surgical site infection; STROBE, strengthening the reporting of observational studies in epidemiology; SVS, Society of Vascular Surgery; VQI, vascular quality initiative; WIFI, Society for Vascular Surgery Wound, Ischemia, and Foot Infection.

Author Contributions

Sébastien Goffart, Juliette Raffort, Hervé Delingette and Fabien Lareyre designed and drafted the manuscript. The rest of the authors participated in analysis and interpretation of data and revising it critically for important intellectual content. All authors approved the final version of the manuscript and agreement to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the French government through the National Research Agency (ANR) with the reference number ANR-22-CE45-0023-01 and through 3IA Côte d’Azur Investments in the Future project, managed with reference number ANR-19-P3IA-0002. It was also supported through the HORIZON-HLTH2022-101080947-VASCUL-AID.

ORCID iD

Fabien Lareyre

References

Roth

Mensah

Fuster

The global burden of cardiovascular diseases and risks. J Am Coll Cardiol. 2020;76(25):2980-1.

Song

Rudan

Zhu

, et al. Global, regional, and national prevalence and risk factors for peripheral artery disease in 2015: an updated systematic review and analysis. Lancet Glob Health. 2019;7(8):e1020-30.

Hardman

Jazaeri

, et al. Overview of classification systems in peripheral artery disease. Semin Interv Radiol. 2014;31(4):378-88.

Meijer

Grobbee

Hunink

, et al. Determinants of peripheral arterial disease in the elderly: the Rotterdam study. Arch Intern Med. 2000;160(19):2934-8.

Klarin

Lynch

Aragam

, et al. Genome-wide association study of peripheral artery disease in the Million Veteran Program. Nat Med. 2019;25(8):1274-9.

Nordanstig

Behrendt

C-A

Baumgartner

, et al. Editor’s choice—European Society for Vascular Surgery (ESVS) 2024 clinical practice guidelines on the management of asymptomatic lower limb peripheral arterial disease and intermittent claudication. Eur J Vasc Endovasc Surg Off J Eur Soc Vasc Surg. 2024;67(1):9-96.

Aboyans

Ricco

J-B

Bartelink

M-LEL

, et al. Editor’s choice—2017 ESC guidelines on the diagnosis and treatment of peripheral arterial diseases, in collaboration with the European Society for Vascular Surgery (ESVS). Eur J Vasc Endovasc Surg. 2018;55(3):305-68.

Raffort

Adam

Carrier

, et al. Fundamentals in artificial intelligence for vascular surgeons. Ann Vasc Surg. 2020;65:254-60.

Flores

Demsas

Leeper

, et al. Leveraging machine learning and artificial intelligence to improve peripheral artery disease detection, treatment, and outcomes. Circ Res. 2021;128(12):1833-50.

10.

Lareyre

Behrendt

C-A

Chaudhuri

, et al. Big Data and artificial intelligence in vascular surgery: time for multidisciplinary cross-border collaboration. Angiology. 2022;73(8):697-700.

11.

Lareyre

Behrendt

C-A

Chaudhuri

, et al. Applications of artificial intelligence for patients with peripheral artery disease. J Vasc Surg. 2023;77(2):650-58.e1.

12.

Weissler

Zhang

Lippmann

, et al. Use of natural language processing to improve identification of patients with peripheral artery disease. Circ Cardiovasc Interv. 2020;13(10): e009447.

13.

Afzal

Mining peripheral arterial disease cases from narrative clinical notes using natural language processing. J Vasc Surg. 2017;65(6):1753-61

14.

Sonderman

Aday

Farber-Eger

, et al. Identifying patients with peripheral artery disease using the electronic health record: a pragmatic approach. JACC Adv. 2023;2(7):100566.

15.

McCarthy

Ibrahim

van Kimmenade

RRJ

, et al. A clinical and proteomics approach to predict the presence of obstructive peripheral arterial disease: from the Catheter Sampled Blood Archive in Cardiovascular Diseases (CASABLANCA) study. Clin Cardiol. 2018;41(7):903-9.

16.

Ghanzouri

Amal

, et al. Performance and usability testing of an automated tool for detection of peripheral artery disease using electronic health records. Sci Rep. 2022;12(1):13364.

17.

Al-Ramini

Hassan

Fallahtafti

, et al. Machine learning-based peripheral artery disease identification using laboratory-based gait data. Sensors. 2022;22(19):7432.

18.

Wang

Ghanzouri

Leeper

, et al. Development of a polygenic risk score to improve detection of peripheral artery disease. Vasc Med. 2022;27(3):219-27.

19.

Duval

Explainable Artificial Intelligence (XAI). 2019. doi:10.13140/RG.2.2.24722.09929

20.

Pinto

Correia

Paredes

, et al. Detection of intermittent claudication from smartphone inertial data in community walks using machine learning classifiers. Sensors. 2023;23(3):1581.

21.

Watanabe

Yoneyama

Hayashi

, et al. Identification of the causative disease of intermittent claudication through walking motion analysis: feature analysis and differentiation. Sci World J. 2014;1:861529.

22.

Dai

Zhou

, et al. Deep learning-based classification of lower extremity arterial stenosis in computed tomography angiography. Eur J Radiol. 2021;136:109528.

23.

Ara

Luo

Sawchuk

, et al. Automate the peripheral arterial disease prediction in lower extremity arterial doppler study using machine learning and neural networks. In: Proceedings of the 10th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics. Niagara Falls, NY, USA: ACM Press, p. 130-5.

24.

Cox

Panagides

Tabari

, et al. Risk stratification with explainable machine learning for 30-day procedure-related mortality and 30-day unplanned readmission in patients with peripheral arterial disease. PLoS One. 2022;17(11):e0277507.

25.

Ross

Jung

Dudley

, et al. Predicting future cardiovascular events in patients with peripheral artery disease using electronic health record data. Circ Cardiovasc Qual Outcomes. 2019;12(3):e004741.

26.

Kreutzburg

Peters

Kuchenbecker

, et al. Editor’s choice—the GermanVasc Score: a pragmatic risk score predicts five year amputation free survival in patients with peripheral arterial occlusive disease. Eur J Vasc Endovasc Surg. 2021;61(2):248-56.

27.

Pan

Jiang

Liu

, et al. Prediction of 2-year major adverse limb event-free survival after percutaneous transluminal angioplasty and stenting for lower limb atherosclerosis obliterans: a machine learning-based study. Front Cardiovasc Med. 2022;9:783336.

28.

McBane

Murphree

Liedl

, et al. Artificial intelligence of arterial Doppler waveforms to predict major adverse outcomes among patients evaluated for peripheral artery disease. J Am Heart Assoc. 2024;13(3):e031880

29.

Serra

Bracale

Barbetta

, et al. PredyCLU: a prediction system for chronic leg ulcers based on fuzzy logic; part II—exploring the arterial side. Int Wound J. 2020;17(4):987-91.

30.

Behrendt

C-A

Kreutzburg

Nordanstig

, et al. The OAC3-PAD risk score predicts major bleeding events one year after hospitalisation for peripheral artery disease. Eur J Vasc Endovasc Surg. 2022;63(3):503-10.

31.

Peters

Behrendt

C-A.

External validation of the OAC3-PAD risk score to predict major bleeding events using the prospective GermanVasc Cohort Study. Eur J Vasc Endovasc Surg. 2022;64(4):429-30.

32.

Lareyre

Behrendt

C-A

Pradier

, et al. Nationwide study in France to predict one year major bleeding and validate the OAC3-PAD score in patients undergoing revascularisation for lower extremity arterial disease. Eur J Vasc Endovasc Surg. 2023;66(2):213-9.

33.

Mills

Conte

Armstrong

, et al. The society for vascular surgery lower extremity threatened limb classification system: risk stratification based on Wound, Ischemia, and foot Infection (WIfI). J Vasc Surg. 2014;59(1):220-34.e2.

34.

Chung

Freeman

NLB

Kosorok

, et al. Analysis of a machine learning-based risk stratification scheme for chronic limb-threatening ischemia. JAMA Netw Open. 2022;5(3): e223424.

35.

Yurtkuran

Tok

Emel

A clinical decision support system for femoral peripheral arterial disease treatment. Comput Math Methods Med. 2013;1:898041.

36.

Lindholt

Søgaard

Clinical benefit, harm, and cost effectiveness of screening men for peripheral artery disease: a Markov model based on the VIVA trial. Eur J Vasc Endovasc Surg. 2021;61(6):971-9.

37.

de Hond

AAH

Leeuwenberg

Hooft

, et al. Guidelines and quality criteria for artificial intelligence-based prediction models in healthcare: a scoping review. NPJ Digit Med. 2022;5(1):2.

38.

Aboyans

Denenberg

, et al. The association between elevated ankle systolic pressures and peripheral occlusive arterial disease in diabetic and nondiabetic subjects. J Vasc Surg. 2008;48(5):1197-203.

39.

Gupta

Skali

Claggett

, et al. Heart failure risk across the spectrum of Ankle-Brachial Index: the atherosclerosis risk in communities study. JACC Heart Fail. 2014;2(5):447-54.

40.

Mcauliffe

Blei

Supervised topic models. In:Advances in neural information processing systems. Curran Associates. [cited 2024 Oct 28] Available from: https://papers.nips.cc/paper_files/paper/2007/hash/d56b9fc4b0f1be8871f5e1c40c0067e7-Abstract.html

41.

Ross

Shah

Dalman

, et al. The use of machine learning for the identification of peripheral artery disease and future mortality risk. J Vasc Surg. 2016;64(5):1515-22.e3.

42.

Saeb

Lonini

Jayaraman

, et al. The need to approximate the use-case in clinical machine learning. GigaScience. 2017;6(5):1-9.

43.

Little

Varoquaux

Saeb

, et al. Using and understanding cross-validation strategies. Perspectives on Saeb et al. GigaScience. 2017;6(5):1-6.

44.

Kakkos

Antoniou

Hinchliffe

, et al. European research hub: European Society for Vascular Surgery research initiative has materialised. Eur J Vasc Endovasc Surg Off J Eur Soc Vasc Surg. 2024;67(3):367-9.

45.

Lareyre

Raffort

Mind the gap between clinical trials and real world data: learning from patients revascularised for peripheral artery disease in France. Eur J Vasc Endovasc Surg. 2024;67(6):979

46.

Rieke

Hancox

, et al. The future of digital health with federated learning. NPJ Digit Med. 2020;3:119.

47.

Damschroder

Reardon

Widerquist

MAO

, et al. The updated consolidated framework for implementation research based on user feedback. Implement Sci. 2022;17(1):75.

48.

Lareyre

Wanhainen

Raffort

. Artificial intelligence-powered technologies for the management of vascular diseases: building guidelines and moving forward evidence generation. J Endovasc Ther. 2025;32(3):514-44.

49.

Wolff

Moons

KGM

Riley

, et al. PROBAST: A tool to assess the risk of bias and applicability of prediction model studies. Ann Intern Med. 2019;170(1):51-8.

50.

Hernandez-Boussard

Bozkurt

Ioannidis

JPA

, et al. MINIMAR (MINimum Information for Medical AI Reporting): developing reporting standards for artificial intelligence in health care. J Am Med Inform Assoc. 2020;27(12):2011-5.