Abstract
Background:
The automated assessment and prediction of diabetic foot ulcer (DFU) severity depends heavily on precise segmentation of the ulcer region. Built-in segmentation tools often lack the accuracy needed to delineate wound boundaries effectively, so this study avoided relying on them. The objective of this study was to develop and evaluate an artificial intelligence (AI)-driven method for ulcer segmentation and severity classification of DFU using Wagner’s grading system.
Methods:
A novel method was introduced for segmenting the boundaries of DFUs, paired with a lightweight classification model for predicting ulcer severity as per Wagner’s grade. The segmentation method incorporated an enhanced active contour model, combined with Sobel edge detection, to achieve precise delineation of ulcer edges. It was developed using a retrospective cohort of patients in India: a total of 1339 ulcer images were collected from 510 patients and augmented to 6579 images to improve AI-model generalizability. An AI-powered mobile application was developed to facilitate the real-time and remote assessment of the severity of DFUs.
Results:
The proposed segmentation approach successfully delineated ulcer regions, achieving a Dice similarity coefficient of 0.99. The classification model attained an accuracy of 95.58%, with a sensitivity of 95.58%, a specificity of 99.16%, and an F1 score of 95.53%. The method also recorded a false-positive rate of 0.84% and a false-negative rate of 4.83%, reflecting improved classification performance compared to existing methods.
Conclusions:
The comparative analysis demonstrated that the proposed method significantly improved both segmentation and classification of DFUs, thereby supporting enhanced clinical management of the condition.
Keywords
Introduction
According to the International Diabetes Federation, approximately 463 million adults worldwide, equivalent to one in every eleven individuals, have diabetes. One of the most severe complications associated with diabetes is diabetic foot ulcer (DFU), which significantly impacts patients’ quality of life and increases the risk of lower limb amputation. 1 DFUs often arise from poorly managed diabetes, which leads to neuropathy; this can result in skin damage, primarily on the plantar surface of the foot and beneath the halluxes, potentially exposing underlying tissues and increasing the risk of bone damage. The lifetime risk of developing a DFU among people with diabetes ranges from 19% to 34%. 2 The pathophysiology of DFUs involves a combination of neuropathy, trauma, and, frequently, peripheral arterial disease. Recent advancements in computer vision algorithms have enhanced medical imaging applications for detecting lesions, such as DFUs. 3 Automated detection, segmentation, and classification of DFUs remain critical areas of research. Wagner’s grading system is commonly employed to categorize DFUs by severity, ranging from intact skin with a foot at risk (Grade 0) to extensive gangrene affecting the whole foot (Grade 5). 4,5 Accurate diagnosis of DFUs requires a detailed medical history, thorough foot examination, and supporting diagnostic tests, including imaging. This study aimed to improve the automated segmentation of ulcer boundaries and the classification of ulcer severity grades using image processing and deep-learning techniques, aligning with Wagner’s grading system. 6 In addition, the classification model was deployed in a smartphone environment to support early screening and prompt treatment planning.
This article is organized as follows: section “Introduction” introduces the study; section “Literature Survey” provides a comprehensive survey of prior studies; section “Data Set Description” describes the data set; section “Materials” outlines the materials, including data preprocessing, augmentation, and image rescaling; section “Methods” explains the methods employed, including wound region estimation, segmentation techniques, and the severity classification model; section “Results and Analysis” presents the experimental results along with detailed analysis; section “Discussion” provides a comparative discussion of the present study with prior works; and section “Conclusion” concludes this article and proposes directions for future research.
Literature Survey
The literature review has been structured into three subsections for clarity: (1) segmentation-only methods, (2) classification-only methods, and (3) hybrid pipelines.
Segmentation-Only Methods
Several studies have addressed DFU segmentation using classical and deep-learning–based approaches.
Niri et al 7 introduced a superpixel-based convolutional neural network (CNN) architecture using U-Net, achieving 92.68% accuracy and a Dice score of 75.74% on a data set of 219 images by employing Simple Linear Iterative Clustering (SLIC) superpixels and morphological operations. Heras-Tang et al 8 developed a hybrid segmentation algorithm combining logistic regression, Density-Based Spatial Clustering of Applications with Noise (DBSCAN), and morphological operations, which resulted in 94% accuracy, 86% sensitivity, and 91% precision, but was applied to approximately 70 images without external validation. Wang et al 9 proposed Improved Self-Adaptive Gradient Vector Flow (ISAGVF) as an enhancement to the Gradient Vector Flow (GVF) Snake model, improving edge preservation and deep concavity convergence. Chen et al 10 introduced an Active Contour Model based on Jeffreys divergence and K-medoids clustering, achieving segmentation efficiencies of 94.5% (Fuzzy C-means [FCM]), 91.5% (K-means), and 95.7% (K-medoids), although applicability was limited to double-phase images. Peng et al 11 developed a 2D OTSU multi-threshold segmentation algorithm using an optimized genetic algorithm that improved peak signal-to-noise ratio (PSNR) by 13.3768 and reduced segmentation time. Canales et al 12 presented GA11, an automated algorithm that selected optimal segmentation parameters using a genetic algorithm, achieving a Generalized Cross Entropy (GCE) of 0.0240, outperforming the 0.0185 obtained with Principal Component Analysis (PCA). EU-Net 13 combined a U-Net architecture with a differential evolutionary algorithm, improving segmentation accuracy for liver (95.9%), left kidney (90.0%), right kidney (86.8%), and spleen (87.6%). Wang et al 14 created a MobileNetV2-based framework for segmenting wound regions, achieving a Dice score of 90.4% on an augmented data set of 5000 images generated from 1109 originals, but did not include severity assessment.
Classification-Only Methods
Alzubaidi et al 15 applied 18-fold augmentation to a 754-image data set for binary classification (normal vs DFU), achieving 95.8% accuracy, but without segmentation or severity grading. Al-Garaawi et al 16 developed a CNN-based classification method using Mapped Binary Patterns from RGB images, reporting an area under the curve (AUC) of 98.1% and F1-score of 95.2%. Toofanee et al 17 emphasized rapid DFU intervention and proposed a Siamese Neural Network combining CNN and Vision Transformers, achieving 95% accuracy for four-category ulcer classification.
Hybrid Segmentation-And-Classification Pipelines
In Motta et al, 18 K-means clustering followed by an artificial neural network yielded an error rate of less than 8%, integrating segmentation with downstream classification. Rajathi et al 19 designed the DUTC-Net for DFU classification, attaining 97.9% sensitivity and 97.5% accuracy in stage classification.
Limitations of Prior Studies
Although prior research has advanced the analysis of DFUs, these studies remain constrained by several limitations:
Few studies integrate segmentation and ulcer grading within a unified framework or validate models under varied image acquisition conditions. As a result, most approaches remain fragmented and less applicable to real-world clinical workflows.
Data Set Description
This retrospective randomized observational study was conducted among resident Indian patients with type 2 diabetes at hospitals in Kolkata, India, from 2012 to 2021. A large data set was used to train and test the classification model.
Study Population
Our ulcer wound data set consisted of 1339 images obtained from 510 outpatient visits at ILS Hospital, Kolkata, West Bengal. The hospital functions as a referral center serving a broad catchment area across eastern India, and the data set included patients not only from Kolkata and its suburban areas (eg, Howrah, Barasat, Madhyamgram, Salt Lake, New Town, Durgapur, Burdwan) but also from neighboring states such as Bihar, Jharkhand, and Tripura. The data set therefore reflects a geographically and demographically diverse population across eastern India, rather than being restricted to a single urban catchment area. The images were captured using a Samsung Galaxy S II (GT-I9100) smartphone, released in 2011, equipped with an 8-megapixel rear camera (f/2.6, 1/3.2” sensor) capable of 1080p resolution, with autofocus and LED flash support. The device had a Super AMOLED Plus display. During acquisition, a flexigrid was placed adjacent to the ulcer to capture wound dimensions (height and breadth). For each visit, two images were captured: one before debridement and one immediately after. Images were stored in JPEG format at the device’s native resolution (3264 × 2448 pixels). Image acquisition steps involved:
All patients underwent initial assessment using a standardized foot proforma.
Ulcer images were captured before and after debridement and at each follow-up visit to monitor progression.
A flexigrid was used during imaging to ensure accurate scaling of ulcer dimensions.
Images were taken in the medical examination room or at the bedside for severe cases, following a consistent protocol.
Infection status was assessed according to the Infectious Diseases Society of America (IDSA) classification, which categorizes infections as mild, moderate, and severe. Deep-tissue cultures were taken from all patients, and a negative culture result was interpreted as confirmation of infection control. Ischemia was assessed in every patient using the Ankle-Brachial Index (ABI), where values between 0.90 and 1.40 were considered normal, 0.70 to 0.89 mild disease, 0.50 to 0.69 moderate disease, and less than 0.50 severe disease. Charcot foot was identified when the temperature difference between the affected foot and the contralateral foot was greater than 2°C, after other causes were excluded. The diagnosis was then confirmed radiographically by identifying characteristic bony changes on a foot X-ray. In this study, “healing at eight weeks” was defined as an area reduction of the ulcer region by more than 50%.
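The ABI bands and the healing criterion described above can be sketched as two small functions. This is an illustrative sketch only: the function names are hypothetical, and the label returned for ABI values above 1.40 is an assumption, since the text does not assign a category to that range.

```python
def classify_abi(abi: float) -> str:
    """Map an Ankle-Brachial Index value to the ischemia band given in the text."""
    if abi < 0:
        raise ValueError("ABI must be non-negative")
    if abi > 1.40:
        # The text gives no label above 1.40; this label is an assumption.
        return "outside stated range"
    if abi >= 0.90:
        return "normal"
    if abi >= 0.70:
        return "mild"
    if abi >= 0.50:
        return "moderate"
    return "severe"


def healed_at_eight_weeks(initial_area: float, area_at_8_weeks: float) -> bool:
    """True when the ulcer area has shrunk by more than 50% of its initial value."""
    if initial_area <= 0:
        raise ValueError("initial_area must be positive")
    reduction = (initial_area - area_at_8_weeks) / initial_area
    return reduction > 0.5
```

For example, `classify_abi(0.65)` falls in the moderate band, and an ulcer shrinking from 10 to 4 area units counts as healing under this definition.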
To ensure consistency in image quality and wound visibility, photographs were taken post-debridement in cases involving slough, and heavy dressings were removed before image acquisition. These measures were implemented to minimize variability and improve the reliability of the data set.
A clinical expert from ILS Hospital, Kolkata, India, annotated all ulcer images according to the Wagner grading system (grades 0-5). Given that this was a retrospective study using existing, de-identified data, formal ethical approval was not required, as per institutional guidelines. To maintain patient confidentiality, particularly with sensitive image data, we used pseudonymization. This method replaced all original patient identifiers with randomly generated, non-identifiable tokens. This approach allowed us to securely link data for analytical purposes (such as diagnosis or age) while ensuring that a patient’s true identity remained protected. The critical mapping between original identifiers and pseudonymized tokens is secured in a separate, highly restricted-access system that adheres to data protection protocols. To improve model generalization, we employed data augmentation techniques, such as cropping, rotation, and flipping, which increased the sample size to 6579. Table 1 outlines the severity grades along with the corresponding number of images for each grade.
Wagner’s classification of DFU images in different grades.
The inclusion criteria for participants in this study were as follows: (a) diagnosed with type 2 diabetes, (b) diagnosed with diabetic neuropathy or peripheral arterial disease, (c) having an active foot ulcer below the ankle, (d) aged between 16 and 80 years, and (e) capable of using a handheld device. The exclusion criteria included (a) individuals aged under 16 or over 80, (b) those with a total foot amputation, (c) patients with other types of peripheral neuropathy (such as those caused by alcohol abuse, familial conditions, or hypovitaminosis), and (d) poor-quality images, such as those that are blurry, too dark, or too bright. The data set comprises images with varying resolutions, spanning from 576 × 1040 to 4608 × 4608 pixels. Representative examples of these graded images are presented in Figure 1.

Annotation of ulcer wounds as per Wagner’s grades.
Study Population—Train, Test, and Validation Data Sets
The patient data set was randomly divided into training and testing subsets in an 80:20 ratio.
In addition, internal validation was conducted on 600 image samples, separate from the total of 6579 samples, to ensure the model’s generalizability. The internal cohort of 600 patients had no overlap with the training and test data sets, ie, the image samples differ in terms of geography and/or time. Similar inclusion and exclusion criteria were applied to this image data set.
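The 80:20 random split described above can be sketched as follows. The function name and the fixed seed are hypothetical, and rounding the training share up is an assumption, chosen here because it yields 5264 and 1315 items for 6579 samples.

```python
import random


def split_80_20(items, seed=0):
    """Shuffle items reproducibly and split them 80:20 into train and test lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    # Integer ceiling of 0.8 * n, avoiding floating-point rounding issues.
    cut = (4 * len(shuffled) + 4) // 5
    return shuffled[:cut], shuffled[cut:]


train, test = split_80_20(range(6579))
print(len(train), len(test))
```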
Materials
Data Preprocessing
Data Augmentation
To overcome the challenge of limited image samples and reduce the risk of overfitting, data augmentation techniques were employed. Deep-learning models, especially convolutional neural networks (CNNs), require large volumes of training data to perform optimally. With a smaller data set, the CNN’s parameters were less likely to converge effectively, leading to overfitting. To address this, various augmentation strategies, including cropping, random rotations at 45, 90, 135, and 180 degrees, as well as horizontal and vertical flipping, were applied. As a result, the data set was expanded to a total of 6579 samples. These enhancements improved the model’s generalization ability by introducing greater variability into the training data.
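The rotation and flipping steps above can be sketched in plain Python on an image stored as a list of rows. This is a minimal illustration, not the authors' pipeline: the function names are hypothetical, nearest-neighbour sampling with zero fill is an assumption, and cropping is omitted.

```python
import math


def flip_horizontal(img):
    """Mirror each row left to right."""
    return [row[::-1] for row in img]


def flip_vertical(img):
    """Reverse the order of the rows."""
    return [row[:] for row in img[::-1]]


def rotate(img, degrees, fill=0):
    """Rotate counter-clockwise about the image centre (nearest neighbour)."""
    h, w = len(img), len(img[0])
    cy, cx = (h - 1) / 2, (w - 1) / 2
    theta = math.radians(degrees)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Inverse mapping: locate the source pixel for each output pixel.
            dx, dy = x - cx, y - cy
            sx = round(cos_t * dx - sin_t * dy + cx)
            sy = round(sin_t * dx + cos_t * dy + cy)
            if 0 <= sx < w and 0 <= sy < h:
                out[y][x] = img[sy][sx]
    return out


def augment(img):
    """Return the four rotated and two flipped variants of one image."""
    variants = [rotate(img, angle) for angle in (45, 90, 135, 180)]
    variants += [flip_horizontal(img), flip_vertical(img)]
    return variants
```

Rotations by 45 and 135 degrees leave empty corners, which this sketch fills with zeros; a production pipeline would more likely use an image library with proper interpolation.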
Image Rescale
To improve efficiency, all images were resized to a uniform dimension of 300 × 300 pixels. This process decreased the computational load and ensured consistent input dimensions for subsequent stages, facilitating smoother processing.
Grayscale Conversion
After resizing, the images were converted to grayscale, as different tissues exhibited distinct grayscale values, allowing for targeted analysis of intensity variations that are critical for detecting ulcer regions.
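The resizing and grayscale steps can be sketched as below. The text does not state the interpolation method or the channel weights, so nearest-neighbour sampling and the ITU-R BT.601 luma weights are assumptions, and the function names are hypothetical.

```python
def resize_nearest(img, new_h=300, new_w=300):
    """Resize an image (list of rows) by nearest-neighbour sampling."""
    h, w = len(img), len(img[0])
    return [[img[y * h // new_h][x * w // new_w] for x in range(new_w)]
            for y in range(new_h)]


def to_grayscale(rgb_img):
    """Convert (R, G, B) pixels to one intensity using BT.601 luma weights."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
            for row in rgb_img]


# Usage: resize first, then convert, matching the order described in the text.
tiny = [[(255, 0, 0), (0, 255, 0)], [(0, 0, 255), (255, 255, 255)]]
gray = to_grayscale(resize_nearest(tiny, 4, 4))
```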
Aims
We developed a multi-stage segmentation and classification model for DFUs aimed at improving early and accurate diagnosis based on Wagner’s grading system. The model included the following:
A novel segmentation approach was developed to detect and isolate the foot ulcer region by constructing a bounding box, which was subsequently used for classifying ulcer severity based on Wagner grades.
A carefully structured, multi-stage segmentation pipeline was designed, comprising (a) morphological operations for noise reduction and structural refinement, (b) an improved Active Contour Model (iACM) that enabled robust boundary fitting, even in complex wound geometries, (c) a Sobel Operator (SO) that enhanced gradient transitions and guided contour convergence, and (d) Contour Extraction (CE) for further refinement of the segmentation output.
A lightweight MobileNetV3-Small deep-learning model was employed to categorize DFUs across six Wagner grades, from early stage lesions (Grade 0) to advanced gangrene (Grade 5). This compact architecture was deliberately selected to support seamless integration into a smartphone application, with the aim of enabling accessible, app-based early screening in health care settings.
Validation was performed with an independent patient cohort with no overlap with the training data, ensuring the model’s performance reflected its robustness in real-world clinical scenarios.
Methods
Following image preprocessing, the ulcer region of interest (ROI) was detected using a combination of Canny edge detection, image binarization, and morphological operations. This multi-step approach enhanced structural features and isolated the wound area for precise segmentation. A lightweight classifier was then employed to assign Wagner grades to the segmented ulcers. As this study exclusively utilized ulcer images for segmentation and classification, no tabular clinical data or patient-level variables were incorporated into the modeling process. Consequently, missing data handling and Least Absolute Shrinkage and Selection Operator (LASSO)-based variable selection procedures were not applicable. The complete workflow, spanning preprocessing, wound segmentation, and severity classification, is depicted in Figure 2.

Process overview of the proposed segmentation method and classification model of DFU.
Refinement of Ulcer Images
This section focuses on utilizing the Canny edge detector, image binarization, and morphological operations to identify the ROI or ulcerous wound regions for segmentation.
Canny Edge Detector
The Canny Edge Detector 20 is a multi-stage algorithm that detects edges, here the potential boundaries of ulcers, with a low error rate. It begins by applying a Gaussian filter to remove noise from the ulcer images, then computes the intensity gradient, applies non-maximum suppression to thin the detected edges, and uses hysteresis thresholding to discard weak, spurious edges. The resulting edge maps highlight the boundaries of the ulcers, as shown in Figure 3.

Ulcer edge maps after applying the Canny edge detector.
Binarization
The edge maps generated by the Canny detector were transformed into binary masks using a threshold-based binarization process. This step enhanced the contrast between the ulcer (foreground) and the surrounding tissue (background), making it easier to isolate contour boundaries specific to the affected region. By leveraging pixel intensity gradients, the binarization step ensured smooth and distinct edge delineation. Pixels with intensities above the defined threshold of 80 were treated as strong edges and assigned a value of 255 (white), while those at or below the threshold were set to 0 (black). The resulting binary image clearly outlined the ulcer margins as bright contours against a dark backdrop, streamlining the segmentation process.
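The thresholding rule above is simple enough to state directly in code. The function name is hypothetical; the threshold of 80 and the output values of 0 and 255 are taken from the text.

```python
def binarize(edge_map, threshold=80):
    """Set pixels above the threshold to 255 (white) and all others to 0 (black)."""
    return [[255 if px > threshold else 0 for px in row] for row in edge_map]


mask = binarize([[0, 80, 81], [255, 79, 200]])
```

Note that a pixel exactly equal to the threshold is mapped to 0, matching the "at or below" wording.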
Morphological Operations
Once the binary mask was generated, morphological operations such as erosion and dilation were applied to remove noise from the binary ulcer images. Erosion smoothed and reduced the boundaries of the ulcer regions, while dilation enhanced them. 21 A closing operation was performed prior to opening to further refine the ulcer contours. 21
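For reference, the standard set-theoretic definitions of these four operations can be written as follows. The symbols are assumptions and may differ from the authors' notation: $A$ is the set of foreground pixels in the binary ulcer image, $B$ is the structuring element, $B_z$ is $B$ translated by $z$, and $\hat{B}$ is the reflection of $B$ about its origin.

```latex
\begin{aligned}
A \ominus B &= \{\, z \mid B_z \subseteq A \,\} && \text{(erosion)} \\
A \oplus B  &= \{\, z \mid \hat{B}_z \cap A \neq \varnothing \,\} && \text{(dilation)} \\
A \bullet B &= (A \oplus B) \ominus B && \text{(closing)} \\
A \circ B   &= (A \ominus B) \oplus B && \text{(opening)}
\end{aligned}
```

Closing is dilation followed by erosion and opening is erosion followed by dilation, which matches the order of operations described in the text.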
Closing Operation
The closing operation, which consisted of dilation followed by erosion, was used to fill small holes and gaps within the foreground regions of the ulcer image. This process helped produce a more cohesive representation of the target features, thereby enhancing the overall quality of the binary image. The closing operation was defined in equation 3.
Opening Operation
The opening operation reduced noise while preserving the structural integrity of the ulcer regions. Erosion was first applied to eliminate small noise artifacts, although it also slightly shrank the ulcer boundaries. To compensate, dilation was subsequently used to restore the eroded ulcer regions without introducing additional noise. This combined process effectively removed thin lines while maintaining the shape and size of the significant ulcer areas in the binary image, as illustrated in Figure 4. The mathematical representation of the opening operation was provided in equation 4.
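The closing and opening operations described above can be sketched on a binary image stored as a list of rows of 0s and 1s. This is a minimal illustration with hypothetical function names. It assumes a square structuring element and treats pixels outside the image as background, a choice that differs from the default border handling in some image libraries.

```python
def _neighbourhood(img, y, x, r):
    """Yield the values under a (2r+1) x (2r+1) window; outside the image counts as 0."""
    h, w = len(img), len(img[0])
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            yy, xx = y + dy, x + dx
            yield img[yy][xx] if 0 <= yy < h and 0 <= xx < w else 0


def erode(img, k=3):
    """A pixel stays 1 only if every pixel under the k x k window is 1."""
    r = k // 2
    return [[int(all(_neighbourhood(img, y, x, r))) for x in range(len(img[0]))]
            for y in range(len(img))]


def dilate(img, k=3):
    """A pixel becomes 1 if any pixel under the k x k window is 1."""
    r = k // 2
    return [[int(any(_neighbourhood(img, y, x, r))) for x in range(len(img[0]))]
            for y in range(len(img))]


def closing(img, k=3):
    """Dilation followed by erosion: fills small holes and gaps."""
    return erode(dilate(img, k), k)


def opening(img, k=3):
    """Erosion followed by dilation: removes small specks and thin lines."""
    return dilate(erode(img, k), k)
```

On a 5 × 5 block with a one-pixel hole, `closing` fills the hole; on a 3 × 3 block accompanied by an isolated noise pixel, `opening` removes the noise pixel and restores the block to its original size.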

Ulcer edge maps after applying binarization and morphological operations.
Segmentation of Ulcer Region
After generating the edge maps of the ulcers, the active contour model (commonly known as the snake model) 22 was applied to segment the wound area. However, due to its reliance on the initial selection of edge points, the conventional snake model struggled to segment complex objects automatically. To overcome this limitation, the model was enhanced to improve its effectiveness.
Improved Active Contour Model
We enhanced the original snake model 23 using a controlled continuity spline framework, influenced by both image-derived and external constraint forces. Internal spline forces maintained piecewise smoothness, while image-based forces directed the snake toward key features such as edges and contours. The position of the snake was represented parametrically as $v(s) = (x(s), y(s))$, with $s \in [0, 1]$.
The term
In this context,
The radius
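For reference, the classical snake formulation of Kass, Witkin, and Terzopoulos, on which active contour models of this kind build, is usually written as below. This is the textbook energy, not the authors' modified one: the symbols $\alpha(s)$ and $\beta(s)$ (elasticity and rigidity weights) and the energy names are the conventional ones and are assumed here, with $v(s) = (x(s), y(s))$ denoting the contour.

```latex
E_{\text{snake}} = \int_0^1 \Big[ E_{\text{int}}\big(v(s)\big) + E_{\text{image}}\big(v(s)\big) + E_{\text{con}}\big(v(s)\big) \Big]\, ds,
\qquad
E_{\text{int}} = \tfrac{1}{2}\Big( \alpha(s)\,\lvert v_s(s)\rvert^2 + \beta(s)\,\lvert v_{ss}(s)\rvert^2 \Big)
```

The internal term penalizes stretching and bending, which corresponds to the internal spline forces; $E_{\text{image}}$ attracts the contour to edges, and $E_{\text{con}}$ carries the external constraint forces.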
Sobel Operator
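The SO 24 was applied within the region defined by the bounding box generated by the iACM to detect and delineate the prominent edges of the ulcer area. This technique significantly enhanced the visual clarity of the ulcer boundaries, thereby facilitating the extraction of precise contours. Two 3 × 3 convolution kernels were used for this process, as illustrated in equations 12 and 13.
$$G_x = \begin{bmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{bmatrix} \quad (12)$$
$$G_y = \begin{bmatrix} -1 & -2 & -1 \\ 0 & 0 & 0 \\ 1 & 2 & 1 \end{bmatrix} \quad (13)$$
with the gradient magnitude at each pixel computed as $G = \sqrt{G_x^2 + G_y^2}$, where $G_x$ and $G_y$ here denote the responses of the two kernels.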
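A minimal sketch of the Sobel gradient magnitude on a grayscale image stored as a list of rows is given below. The function and constant names are hypothetical, and leaving border pixels at zero is an assumption about edge handling.

```python
import math

# Standard 3 x 3 Sobel kernels for horizontal and vertical intensity changes.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_magnitude(gray):
    """Return the gradient magnitude per pixel; border pixels are left at 0."""
    h, w = len(gray), len(gray[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = gy = 0
            for j in range(3):
                for i in range(3):
                    px = gray[y + j - 1][x + i - 1]
                    gx += SOBEL_X[j][i] * px
                    gy += SOBEL_Y[j][i] * px
            out[y][x] = math.hypot(gx, gy)
    return out
```

A vertical step edge produces a strong response from the horizontal kernel and none from the vertical one, while a flat region yields zero everywhere.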
Binary Mask Creation
At this stage, a refined binary mask had been generated from the outputs of the iACM and the SO. This step removed thin lines and irrelevant small objects, retaining only the ulcer region. The resulting binary mask is illustrated in Figure 5.

Binary mask of the wound region after applying iACM and SO.
Contour Extraction
From the refined binary mask, contours were extracted. Among the detected contours, the most significant one was selected, as it accurately represented the ulcer region. A bounding box was then generated around the isolated ulcer to ensure alignment with the image dimensions, as illustrated in Figure 6. The generated segmentation masks were systematically stored in structured folders, categorized by the severity grade of their corresponding original images. This ensured each mask was accurately linked to its clinically assigned severity grade, which then served as input for training the classification model, thereby eliminating the need for manual annotation of the segmentation masks.
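The selection of the dominant region and its bounding box can be sketched as follows. The text does not define "most significant", so this sketch assumes it means the largest 8-connected foreground region, and it uses a flood fill rather than an explicit contour tracer; the function name is hypothetical.

```python
from collections import deque


def largest_component_bbox(mask):
    """Return (x_min, y_min, x_max, y_max) of the largest 8-connected region."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    best = []
    for y0 in range(h):
        for x0 in range(w):
            if not mask[y0][x0] or seen[y0][x0]:
                continue
            # Breadth-first flood fill collecting one connected region.
            seen[y0][x0] = True
            queue, region = deque([(y0, x0)]), []
            while queue:
                y, x = queue.popleft()
                region.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            if len(region) > len(best):
                best = region
    if not best:
        return None  # no foreground pixels at all
    ys = [p[0] for p in best]
    xs = [p[1] for p in best]
    return min(xs), min(ys), max(xs), max(ys)
```

The returned box uses inclusive pixel coordinates, so it can be used directly to crop the ulcer region from the original image.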

Bounding box of the segmented ulcer region.
Classification of Ulcer Severity
Figure 7 illustrates the workflow of our proposed iACMSOCE method with the MobileNetV3-Small classifier. The MobileNetV3-Small classifier, which is pre-trained on ImageNet, is used to classify the severity of ulcer wounds. This variant was chosen over the MobileNetV3-Large due to its compact architecture, lower parameter count, and reduced number of floating-point operations. Its efficient design allows for faster inference and lower memory usage, making it well-suited for mobile applications.
We have developed a mobile app that integrates this lightweight classifier, making it ideal for use in a smartphone environment. The mobile app is designed to be simple and user-friendly for both patients and health care providers, as shown in Figure 8. New users complete a quick registration, providing basic personal and medical details (name, email, username, password, gender, and age). The app calculates body mass index (BMI) and offers a notes section for additional health information. The interface is centered around a dashboard with a bottom navigation bar for key functions: Dashboard, Scan, History, and Profile. The dashboard provides a summary of the user’s activity and record completeness. The Scan feature allows users to either capture a new photo of an affected foot or upload an existing one. This image is then combined with the user’s health data to run the built-in prediction model for DFU detection and grading. The results, including ulcer grade and risk level, are presented clearly on an output screen. All results are saved in the History section, enabling users to track their condition over time and share updates with their doctors. The workflow is a straightforward progression: Registration or Login → Profile Setup → Dashboard → Image Capture/Upload → AI Prediction & Result → History Tracking. Users have the flexibility to skip medical data entry and add it later.
This AI-powered mobile application was designed to support primary care physicians, community health workers, remote clinicians, and patients by providing real-time assessment of DFU severity using the Wagner grading system. Its primary utility lies in facilitating early detection of DFUs to enable timely intervention, guiding precise treatment planning based on severity classification, and enhancing remote monitoring for continuity of care, particularly in underserved areas. Furthermore, the application ensures secure, centralized access to patient images and records through cloud storage, thereby bridging the gap between advanced technical capabilities and practical clinical utility, especially in resource-limited or remote settings. The app interface displays ulcer severity outcomes based on inputs provided by the user, which include clinical features and images of foot ulcers. In the context of mobile deployment, images can be captured or uploaded from the mobile device and are transmitted to the cloud backend using secure, encrypted communication protocols (eg, HTTPS with TLS 1.2). To minimize risk, images are stored on the mobile device for the absolute minimum necessary time, ideally not at all, before deletion upon successful upload. For secure, centralized storage and data persistence, all patient data, including personal details and medical imagery, were maintained on a confidential cloud server. The app was deployed on a cloud hosting platform, allowing for real-time, remote DFU severity prediction through a lightweight classifier. This architecture enabled seamless and secure communication between the app and the server, making it accessible to clinicians, patients, and remote health care providers from any location.

Schematic representation of the workflow of the proposed iACMSOCE-based segmentation method and MobileNetV3-Small classifier.

Design pipeline of the AI-powered mobile app for DFU severity prediction.
Results and Analysis
All experiments were performed using Python on an Intel i5 processor equipped with 8 GB of RAM. A data set comprising 510 patients was collected, initially containing 1339 image samples. These were subsequently augmented to a total of 6579 samples. The data set was divided into train and test sets in an 80:20 ratio, yielding 5264 samples for training and 1315 for testing. The proposed method was validated using an independent cohort of 600 patients, with predictor variables consistent with those in the augmented data set (Table 2). To enhance the prominence of ulcer edges, one iteration of closing and two iterations of opening were applied using a 3 × 3 kernel. All ulcer images were resized to 300 × 300 pixels to improve generalization. The radius
Distribution of augmented ulcer images across Wagner’s grades in the augmented data set.
Edge detection was performed using the SO with a
The area of overlap indicated the intersection between the predicted and ground-truth masks, where a Dice Coefficient of 1 represented perfect alignment and 0 denoted no overlap. A total of 300 images were analyzed, with 50 images for each of Wagner’s grades (0-5). The computed Dice scores were as follows: Grade 0 (0.97), Grade 1 (0.99), Grade 2 (0.99), Grade 3 (0.99), Grade 4 (0.99), and Grade 5 (0.98). The model consistently achieved Dice scores between 0.97 and 0.99, demonstrating high accuracy in segmenting ulcer regions across all severity levels. This level of precision contributed to a reduction in misclassification errors during the severity grading. We compared our iACMSOCE segmentation method with several existing segmentation approaches.19,25-27 The bounding box in Figure 9 highlights that the iACMSOCE method outperformed existing segmentation approaches in accurately delineating the ulcer region. Different classifiers, including MobileNetV3, Artificial Neural Networks, and MobileNetV2, 28 were employed to analyze existing approaches19,25-27 and evaluate classification accuracy. A brief interpretation of each metric and its clinical relevance is as follows:
Accuracy reflects the overall proportion of DFUs correctly classified into their respective Wagner grades.
Precision indicates the proportion of DFUs predicted as a specific grade (eg, Grade 3) that are truly of that grade. High precision minimizes over-classification and avoids unnecessary escalation of care.
Recall measures the proportion of actual DFUs of a given grade (eg, Grade 4) that the model correctly identifies. High recall is essential for detecting severe cases requiring urgent intervention.
Specificity is the model’s ability to correctly identify DFUs that are not a specific grade, preventing it from incorrectly flagging low-grade DFUs as more severe.
F1-score balances precision and recall, offering a robust measure of performance, especially in data sets with class imbalance.
False-Negative Rate (FNR) quantifies the proportion of severe DFUs that the model fails to identify.
The performance of each approach was assessed based on accuracy, precision, sensitivity, specificity, F1 score, false-positive rate (FPR), and FNR as presented in Table 3.
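The Dice coefficient and the per-grade metrics listed above can be sketched as below. The function names are hypothetical, the confusion matrix is assumed to be indexed as `cm[true][predicted]`, and the metrics are computed one-vs-rest for a single grade; the text does not state how per-grade values were averaged into the reported figures.

```python
def dice_coefficient(pred, truth):
    """Dice = 2 * |P ∩ T| / (|P| + |T|) for two binary masks of equal shape."""
    inter = size_p = size_t = 0
    for row_p, row_t in zip(pred, truth):
        for p, t in zip(row_p, row_t):
            inter += 1 if (p and t) else 0
            size_p += 1 if p else 0
            size_t += 1 if t else 0
    if size_p + size_t == 0:
        return 1.0  # both masks empty: treated as perfect agreement (assumption)
    return 2 * inter / (size_p + size_t)


def accuracy(cm):
    """Share of all samples that lie on the diagonal of the confusion matrix."""
    return sum(cm[i][i] for i in range(len(cm))) / sum(map(sum, cm))


def per_class_metrics(cm, k):
    """One-vs-rest metrics for class k from confusion matrix cm[true][pred]."""
    tp = cm[k][k]
    fn = sum(cm[k]) - tp
    fp = sum(row[k] for row in cm) - tp
    tn = sum(map(sum, cm)) - tp - fn - fp
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall,
            "specificity": specificity, "f1": f1,
            "fpr": 1 - specificity, "fnr": 1 - recall}
```

Under these definitions the false-positive rate is the complement of specificity and the false-negative rate is the complement of recall for the same grade.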

Segmented ulcer (ROI) using the iACMSOCE method and existing methods.
Comparative evaluation of iACMSOCE method coupled with MobileNetV3-Small versus existing segmentation methods and classifiers.
The iACMSOCE segmentation with the MobileNetV3-Small classifier achieved an accuracy of 95.58%, a precision of 95.54%, a sensitivity of 95.58%, a specificity of 99.16%, and an F1 score of 95.53% after 100 testing epochs. The proposed method achieved the lowest FPR (0.84%) and FNR (4.83%). In contrast, ACM 19 recorded 4.56% and 23.76%, K-means 25 reported 10.88% and 7.97%, FCM 25 reported 1.08% and 6.35%, ResNet18 26 recorded 9.12% and 19.43%, and GVF 27 reported 9.88% and 5.03%, respectively. This demonstrates that the iACMSOCE with MobileNetV3-Small consistently minimized both false positives and false negatives, thereby improving overall reliability in clinical assessment. To further evaluate the generalizability of the proposed model, internal validation was conducted using 600 ulcer images collected from the same hospital but from a different patient cohort than those included in the training and testing sets. The validation results demonstrated a strong specificity of 94.83%, effectively reducing false-positive rates. Furthermore, the method achieved a sensitivity of 89.04% and a precision of 86.80% on the validation data set.
Discussion
The Wagner Grading System was selected for this technical feasibility study due to its direct clinical relevance to ulcer depth, aligning with our image-based visual assessment. Its simplicity ensures consistent application and enhances inter-rater reliability for initial model validation. As it is commonly documented in retrospective and resource-limited settings, it provided a pragmatic choice for early development. Wagner also maintains its role as a foundational tool in clinical guidelines, complementing broader assessment frameworks. Although the Wagner grading system’s use has declined in favor of more comprehensive frameworks such as the wound/ischemia/foot infection (WIfI), University of Texas (UT), and International Working Group on the Diabetic Foot (IWGDF) systems, some centers still employ it. Our collaborating clinical sites have since transitioned to using these updated systems. In future studies, we plan to evaluate and integrate these comprehensive frameworks into our algorithms to enhance clinical relevance and alignment with current practice. This study serves as a technical proof of concept, demonstrating the model’s ability to identify key visual features. Future work will incorporate ischemic and systemic parameters for classification with comprehensive systems like WIfI and UT to enhance clinical utility. Wagner’s continued relevance is supported by recent clinical studies29-31 which utilize or compare this grading system. The iACMSOCE with MobileNetV3-Small framework is a multi-stage pipeline integrating an iACM with Sobel edge detection, CE, and a lightweight MobileNetV3-Small classifier. Our method utilized a diverse data set of 1339 original ulcer images from 510 patients. This was significantly augmented to 6579 images through various techniques, including rotations, scaling, and brightness variations. 
This augmentation strategy was important for improving the classifier’s generalization across ulcer grades and for mitigating overfitting, a common issue in previous studies with smaller data sets. In contrast to earlier efforts such as Niri et al 7 with 219 images, Heras-Tang et al 8 with 37 images, and Rajathi et al 19 with 1500 images, iACMSOCE with MobileNetV3-Small benefited from broader visual diversity. Similar augmentation strategies were employed by Wang et al 9 with 4050 images, Yu et al 13 with 62 124 images, Alzubaidi et al 15 with 13 572 images, and Toofanee et al 17 with 15 683 images, all of whom reported strong classification results using large data sets and hybrid networks. The performance measures suggest that targeted data augmentation improved both training diversity and the efficacy of compact classification models in clinical practice.
Quantitatively, iACMSOCE outperformed a range of existing DFU segmentation and classification methods. For segmentation, it achieved DSCs between 0.97 and 0.99 across all Wagner grades, exceeding the performance of Niri et al’s 7 superpixel-based CNN (Dice: 0.75 on 219 images) and Wang et al’s 14 MobileNetV2 framework (Dice: 0.90 on 5000 images). Hybrid techniques, such as those by Heras-Tang et al 8 and Chen et al, 10 delivered moderate results (eg, Jaccard index: 0.81, accuracy: 94%, sensitivity: 86%) but lacked robustness across complex ulcer geometries. In classification, iACMSOCE with MobileNetV3-Small achieved 95.58% accuracy, 95.58% sensitivity, 99.16% specificity, and an F1 score of 95.53%, with an FPR of 0.84% and an FNR of 4.83%, while remaining lightweight and suitable for mobile deployment.
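For reference, the two overlap measures cited in the segmentation comparison are defined on a predicted mask A and a ground-truth mask B as DSC = 2|A ∩ B| / (|A| + |B|) and Jaccard = |A ∩ B| / |A ∪ B|. The sketch below computes both for binary masks. The function names and the toy masks are illustrative assumptions, and returning 1.0 when both masks are empty is one common convention rather than something specified in the study.

```python
import numpy as np


def dice_coefficient(pred, truth):
    """Dice similarity coefficient between two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    denom = pred.sum() + truth.sum()
    if denom == 0:
        return 1.0  # both masks empty: treated as perfect agreement
    return float(2.0 * np.logical_and(pred, truth).sum() / denom)


def jaccard_index(pred, truth):
    """Jaccard index (intersection over union) between two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    union = np.logical_or(pred, truth).sum()
    if union == 0:
        return 1.0  # both masks empty: treated as perfect agreement
    return float(np.logical_and(pred, truth).sum() / union)


# Toy masks: the prediction misses one row of a 6 x 6 ground-truth region.
truth_mask = np.zeros((10, 10), dtype=bool)
truth_mask[2:8, 2:8] = True
pred_mask = np.zeros((10, 10), dtype=bool)
pred_mask[3:8, 2:8] = True

dsc = dice_coefficient(pred_mask, truth_mask)
iou = jaccard_index(pred_mask, truth_mask)
```

The two measures are monotonically related by Jaccard = DSC / (2 − DSC), so a Dice score and a Jaccard index reported by different studies can be converted before they are compared.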
An FNR of 4.83% indicates that approximately 5% of severe DFU cases may be misclassified, which could have critical clinical consequences, including delayed treatment, accelerated disease progression, increased amputation risk, patient harm, and a higher health care burden. While this study is a technical feasibility assessment, we prioritize reducing this FNR through model refinement, broader validation, and clinical integration to protect patient safety. Among the comparators, ACM achieved 80.20% accuracy, 80.20% sensitivity, 95.44% specificity, and an F1 score of 80.00%; 19 K-means achieved 89.03% accuracy and an F1 score of 88.99%; 25 FCM achieved 91.12% accuracy and an F1 score of 88.72%; 25 ResNet18 achieved 88.57% accuracy and an F1 score of 88.44%; 26 and GVF achieved 94.97% accuracy and an F1 score of 94.83%. 27 Although DUTC-Net 19 achieved strong scores (97.5% accuracy, 97.1% sensitivity, 98% specificity), its reliance on computationally heavy preprocessing, such as reflection and hair removal, made it less feasible for mobile applications. By contrast, the lightweight architecture of iACMSOCE with MobileNetV3-Small, paired with its strong performance, makes it well suited to resource-constrained mobile health environments, offering both precision and scalability for real-time DFU screening and severity classification.
The proposed model serves primarily as a triage and remote monitoring tool to assist in the early detection and management of DFUs. It enables non-specialists and primary care providers to identify ulcer severity at an early stage, facilitating prompt referral, earlier diagnosis of infection, and timely management. A pilot test involving 100 patients at ILS Hospital indicated that the app was easy to use, had a smooth workflow, and produced the desired results. Although larger usability tests are planned, the current version shows promise as a tool for DFU triage and severity assessment.
Despite the strong performance of the proposed framework, several limitations must be acknowledged. The retrospective design of this study carries a risk of selection bias. Furthermore, its single-center origin limits the generalizability of our findings to broader populations. Because patient demographics, clinical practices, and wound care protocols may vary across regions and institutions, multi-center data collection in future studies will help validate the model’s robustness and ensure wider clinical applicability. External validation and model calibration are currently being performed as part of an ongoing follow-up study and will be presented in a future publication. A significant methodological constraint is the reliance on Wagner’s classification, which does not incorporate ischemic parameters and therefore omits a critical dimension of the true severity of DFUs. In addition, vascular status and offloading practices were not included in the data set; both may influence ulcer healing outcomes and could affect the overall performance and generalizability of the model.
Conclusion
The proposed method integrates an iACM, the SO, and CE to isolate ulcer regions in foot images, and the segmented regions are classified using MobileNetV3-Small, a lightweight deep-learning model optimized for resource-constrained devices. Using a data set spanning six severity grades, the algorithm automates the segmentation process without relying on any built-in tools. The comparative study indicates that this method improves classification accuracy relative to the ACM, K-means, FCM, ResNet18, and GVF baselines across the reported evaluation metrics. By combining an optimized segmentation algorithm with a lightweight classification architecture, this approach facilitates efficient foot ulcer grading, supporting timely clinical decision-making and, potentially, better patient outcomes. In addition, a mobile app was developed for real-time severity grading of ulcers according to the Wagner classification system. In the future, the model will be validated using an external data set that does not overlap with the training and testing data sets to ensure its robustness and generalizability.
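As an illustration of the SO stage only, the sketch below computes a Sobel gradient magnitude and a thresholded edge map with NumPy. It does not reproduce the improved active contour model or the contour extraction step. The helper names, the edge-replicated padding, and the relative threshold of 0.5 are assumptions made for the example.

```python
import numpy as np

# Standard 3 x 3 Sobel kernels for horizontal and vertical gradients.
SOBEL_X = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T


def filter3x3(img, kernel):
    """Cross-correlate a 2-D image with a 3 x 3 kernel (edge-replicated padding)."""
    h, w = img.shape
    padded = np.pad(img.astype(float), 1, mode="edge")
    out = np.zeros((h, w), dtype=float)
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out


def sobel_magnitude(gray):
    """Gradient magnitude of a grayscale image using the Sobel operator."""
    gx = filter3x3(gray, SOBEL_X)
    gy = filter3x3(gray, SOBEL_Y)
    return np.hypot(gx, gy)


def edge_mask(gray, rel_threshold=0.5):
    """Boolean edge map: magnitude at or above a fraction of the peak magnitude."""
    mag = sobel_magnitude(gray)
    peak = mag.max()
    if peak == 0:
        return np.zeros_like(mag, dtype=bool)  # flat image, no edges
    return mag >= rel_threshold * peak


# Toy image: a bright 4 x 4 square on a dark background.
toy = np.zeros((8, 8))
toy[2:6, 2:6] = 255
edges = edge_mask(toy)
```

In a pipeline of the kind described, an edge map like this would typically guide or stop the evolving contour near strong intensity boundaries. How the study couples the two stages is not shown here.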
Footnotes
Acknowledgements
The authors would like to thank the staff at the ILS Hospital, Kolkata, India, for their assistance with data collection. The authors confirm that generative AI was not used to create ulcer images or tables, to apply mathematical operations or formulas during ulcer segmentation, or to produce accuracy results, Dice scores, or references. Generative AI was used exclusively to improve the English language structure of the text.
Abbreviations
ACM, active contour model; AI, artificial intelligence; CE, contour extraction; CNN, convolutional neural network; DFU, diabetic foot ulcer; DSC, Dice similarity coefficient; FCM, fuzzy C-means; FNR, false-negative rate; FPR, false-positive rate; GVF, gradient vector flow; iACM, improved active contour model; iACMSOCE, improved active contour model + Sobel operator + contour extraction; IWGDF, International Working Group on the Diabetic Foot; PSNR, peak signal-to-noise ratio; ROI, region of interest; SO, Sobel operator; U-Net, U-shaped convolutional network; UT, University of Texas classification; WIfI, wound/ischemia/foot infection classification system.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
