Abstract
Objective:
Kidney stone disease is a growing global health problem, with increasing diagnostic challenges due to heterogeneous clinical presentations and imaging demands. This review aims to evaluate the role of artificial intelligence (AI) in imaging-based detection and assessment of urinary tract stones.
Methods:
A systematic review was conducted by searching PubMed/MEDLINE, EMBASE, Cochrane CENTRAL, and Web of Science from database inception to May 2025 for studies evaluating AI-based imaging in the diagnosis of urinary tract stones. The search followed Preferred Reporting Items for Systematic reviews and Meta-Analyses literature search extension (PRISMA-S) guidance and used a Population, Intervention, Comparison, Outcomes and Study (PICOS) framework to identify eligible adult studies applying AI models to computed tomography (CT), ultrasound, or radiographic imaging, with expert interpretation as reference and diagnostic performance outcomes.
Results:
From 1142 records identified, 11 studies published between 2017 and 2025 met the inclusion criteria. Most studies used CT imaging and reported high diagnostic performance, with accuracies exceeding 90% in eight studies and reaching over 95% in several deep learning–based approaches, while ultrasound-based AI models also demonstrated strong performance with sensitivities and accuracies above 90%.
Conclusion:
AI-based imaging demonstrates high diagnostic accuracy for urinary tract stone detection, particularly with CT. AI-enhanced ultrasound represents a practical and cost-effective alternative for implementation in resource-limited and rural settings.
Level of evidence:
2
Introduction
Kidney stone disease, or urolithiasis, is a growing global health concern, with its prevalence and incidence rates steadily increasing. In the United States, approximately 1 in 10 individuals are expected to experience a kidney stone event during their lifetime, corresponding to a prevalence of 9% and an incidence rate of 239 per 100,000 person-years. 1 This upward trend is attributed to various factors, including dietary habits, climate, and metabolic syndromes.
In Indonesia, the burden of kidney stone disease is significant. Studies indicate that urolithiasis accounts for approximately 10% of patients with kidney failure in tertiary-care centers in Jakarta, a figure higher than reports from Western countries. 2 The disease predominantly affects males, with a male-to-female ratio of 1.8:1, and is most prevalent among individuals aged 51–60 years. The most common type of kidney stone in the Indonesian population is calcium oxalate, followed by uric acid stones. 3 Factors contributing to the high prevalence include dietary habits, warm climate, limited access to clean water, and the increasing incidence of metabolic diseases such as obesity and diabetes. 2
Diagnosing and managing kidney stone disease is complex due to its heterogeneous clinical presentation, which often overlaps with other urological and non-urological conditions. Accurate diagnosis requires a comprehensive assessment of the stone burden, including size, number, anatomical location, and chemical composition. These parameters are crucial for guiding treatment decisions, ranging from conservative management to surgical intervention, and for predicting recurrence risk.4,5
Artificial Intelligence (AI) has emerged as a transformative tool in medical imaging and diagnostics, offering novel solutions to long-standing challenges in the evaluation of stone disease. 6 By leveraging machine learning algorithms, particularly deep learning techniques such as convolutional neural networks (CNNs) and vision transformers (ViTs), AI enhances diagnostic accuracy, speed, and consistency in imaging-based evaluations. These technologies enable automatic detection and segmentation of urinary stones from various imaging modalities, including non-contrast computed tomography (NCCT), renal ultrasonography (RUS), and plain abdominal radiography (KUB).
In Indonesia, the application of AI in diagnosing kidney stones is still in its nascent stages. 7 While some studies have explored AI-based classification of urinary stones using micro-computed tomography (micro-CT), there is a pressing need for more extensive research and development in this area. 8 The integration of AI into clinical practice could significantly enhance diagnostic capabilities, particularly in regions with limited access to specialized radiological expertise.
Despite the growing body of literature on AI applications in urology, consolidated evidence focusing specifically on imaging-based detection of urinary tract stones remains scarce. Most reviews to date have either been limited to a single modality or have not analyzed stone detection techniques in depth. This review addresses that gap by synthesizing the evidence across imaging modalities. It also emphasizes the translational potential of AI in daily radiological practice and underscores the need for standardized datasets, external validation, and regulatory considerations as AI continues to move toward widespread adoption in urologic imaging.
Materials and methods
Protocol and reporting standards
We conducted a comprehensive systematic review by searching PubMed, Scopus, EMBASE, and Web of Science from database inception through May 2025 for studies evaluating AI in imaging-based diagnosis of urinary tract stones. Our search strategy combined terms related to urinary calculi and AI techniques, including renal calculi, kidney stones, deep learning, machine learning, CNNs, ultrasound (US), computed tomography (CT), and radiography. Reference lists of relevant articles and recent reviews were also screened to identify additional eligible studies. Two independent reviewers screened titles and abstracts, followed by full-text assessment; any discrepancies were resolved through discussion with a third reviewer.
Data sources and search strategy
We performed a comprehensive literature search in May 2025 across four electronic databases: PubMed/MEDLINE, EMBASE, Cochrane Central Register of Controlled Trials (CENTRAL), and Web of Science. The search combined controlled vocabulary (e.g., MeSH terms) and free-text keywords related to urinary tract stones (“renal calculi,” “kidney stones,” “ureterolithiasis,” “urolithiasis,” “nephrolithiasis”), artificial intelligence (“machine learning,” “deep learning,” “convolutional neural network”), and imaging modalities (“computed tomography,” “ultrasound,” “plain radiography”). The literature search was limited to publications from January 2010 to May 2025.
Search methods adhered to Preferred Reporting Items for Systematic reviews and Meta-Analyses literature search extension (PRISMA-S) guidance for reporting literature searches and Cochrane Handbook recommendations to maximize sensitivity and reproducibility. No language restrictions were applied, but non-English articles without accessible translation were excluded.
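The concept groups described above are commonly combined with OR within a group and AND across groups. The following is a minimal sketch of that assembly, assuming generic Boolean syntax; the exact search strings and database-specific field tags used in this review are not reproduced here, and the helper name `build_query` is hypothetical.

```python
# Illustrative only: the term lists come from the search description above;
# the Boolean layout (OR within a concept, AND across concepts) is an assumption.
STONE_TERMS = ["renal calculi", "kidney stones", "ureterolithiasis",
               "urolithiasis", "nephrolithiasis"]
AI_TERMS = ["machine learning", "deep learning", "convolutional neural network"]
IMAGING_TERMS = ["computed tomography", "ultrasound", "plain radiography"]


def build_query(*concept_groups):
    """Join terms with OR inside each group and AND between groups."""
    blocks = []
    for terms in concept_groups:
        quoted = " OR ".join(f'"{term}"' for term in terms)
        blocks.append(f"({quoted})")
    return " AND ".join(blocks)


query = build_query(STONE_TERMS, AI_TERMS, IMAGING_TERMS)
print(query)
```

In practice each database would need its own controlled-vocabulary tags (for example MeSH in PubMed), which this generic string does not include.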
Eligibility criteria
Studies were selected according to the PICOS framework:
Population: Adult patients undergoing imaging (non-contrast CT, CT urography, US, or KUB radiograph) for urinary tract stones.
Intervention: Application of an AI model for stone detection, segmentation, or characterization on medical images.
Comparison: Reference standard interpretation by expert radiologists or urologists, or conventional image analysis without AI.
Outcomes: At least one diagnostic performance metric—sensitivity, specificity, accuracy, or area under the receiver operating characteristic curve (AUC/ROC)—and data on stone location and size.
Study design: Prospective or retrospective cohort studies, diagnostic accuracy studies, and randomized controlled trials published up to May 2025.
Exclusion criteria included phantom or animal studies, case reports, narrative reviews, and studies lacking extractable diagnostic performance data.
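For reference, the diagnostic performance metrics named in the outcomes criterion can all be derived from a 2x2 confusion matrix. The sketch below shows the standard definitions; the class name, function names, and example counts are hypothetical and do not come from any included study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    tp: int  # stone present, AI positive
    fp: int  # stone absent, AI positive
    fn: int  # stone present, AI negative
    tn: int  # stone absent, AI negative


def sensitivity(c):
    """True positive rate: TP / (TP + FN)."""
    return c.tp / (c.tp + c.fn)


def specificity(c):
    """True negative rate: TN / (TN + FP)."""
    return c.tn / (c.tn + c.fp)


def accuracy(c):
    """Proportion of all cases classified correctly."""
    return (c.tp + c.tn) / (c.tp + c.fp + c.fn + c.tn)


def ppv(c):
    """Positive predictive value (precision): TP / (TP + FP)."""
    return c.tp / (c.tp + c.fp)


def f_beta(c, beta=1.0):
    """F-beta score; beta > 1 weights sensitivity more heavily than PPV."""
    p, r = ppv(c), sensitivity(c)
    b2 = beta * beta
    return (1 + b2) * p * r / (b2 * p + r)


# Hypothetical counts, for illustration only.
example = ConfusionCounts(tp=90, fp=5, fn=10, tn=95)
print(sensitivity(example), specificity(example), accuracy(example))
print(ppv(example), f_beta(example, beta=1.0), f_beta(example, beta=2.0))
```

The functions assume non-empty denominators; a study with no positive or no negative cases would need explicit handling.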
Study selection
Two reviewers independently screened titles and abstracts for relevance. Full texts of potentially eligible studies were then assessed against the inclusion criteria. Any disagreements were resolved by discussion, with arbitration by a third reviewer if necessary. The selection process is documented in a PRISMA flow diagram.
Data extraction
Data extraction was performed using a piloted, standardized form by two independent reviewers who collated comprehensive details from each eligible study. For study characteristics, we recorded the first author’s name, year of publication, country of origin, and study design. Patient and imaging data including the total number of cases or images analyzed, stone location (kidney, ureter, or bladder), and stone dimensions (reported as mean ± standard deviation or range) were also captured. Information on the AI models comprised the type of algorithm, the imaging modality used, and the proportions of data allocated to training, validation, and testing. Diagnostic performance metrics extracted included sensitivity, specificity, overall accuracy, and AUC/ROC. In studies that evaluated more than one AI architecture or imaging modality, each distinct analysis was treated as a separate entry in the dataset. Any discrepancies between reviewers’ extractions were resolved through discussion until consensus was achieved.
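One way to represent the extraction form described above is a flat record with one entry per distinct analysis. The sketch below is an assumption about how such a form could be structured, not the form actually used; all field names and the example values are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExtractionEntry:
    # Study characteristics
    first_author: str
    year: int
    country: str
    study_design: str
    # Patient and imaging data (None when not reported)
    n_cases: Optional[int]
    stone_location: Optional[str]
    stone_size: Optional[str]
    # AI model details
    algorithm: str
    modality: str
    split: Optional[str]
    # Diagnostic performance (None when not reported)
    sensitivity: Optional[float] = None
    specificity: Optional[float] = None
    accuracy: Optional[float] = None
    auc: Optional[float] = None


def entries_for_study(shared, analyses):
    """Return one ExtractionEntry per distinct analysis within a study."""
    return [ExtractionEntry(**shared, **analysis) for analysis in analyses]


# Hypothetical study that evaluated two architectures on the same cohort.
shared = dict(first_author="Example", year=2020, country="N/A",
              study_design="retrospective", n_cases=100,
              stone_location="kidney", stone_size=None)
analyses = [
    dict(algorithm="CNN", modality="CT", split="70/15/15", accuracy=0.90),
    dict(algorithm="SVM", modality="CT", split="70/15/15", accuracy=0.85),
]
rows = entries_for_study(shared, analyses)
print(len(rows), [row.algorithm for row in rows])
```

Keeping unreported values as `None` rather than zero avoids confusing missing data with poor performance during synthesis.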
Results
Study selection
A total of 1142 records were retrieved from PubMed, Scopus, EMBASE, and Web of Science. After removing 312 duplicates, 830 titles and abstracts were screened, yielding 768 exclusions for reasons such as non-AI imaging applications, ineligible modalities, phantom or animal studies, or absence of diagnostic performance data. Sixty-two full-text articles were then assessed, of which 51 were excluded due to lack of a human imaging cohort, insufficient reporting of sensitivity/specificity/accuracy/AUC, or mixed-modality analyses without separate AI results. Ultimately, 11 studies fulfilled all inclusion criteria and were incorporated into our systematic review (Figure 1).
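The counts reported in the selection process can be checked with simple arithmetic. The short sketch below uses only the numbers stated above; the variable names are illustrative.

```python
# Counts as reported in the study selection paragraph.
identified = 1142
duplicates_removed = 312
excluded_at_screening = 768
excluded_at_full_text = 51

screened = identified - duplicates_removed
full_text_assessed = screened - excluded_at_screening
included = full_text_assessed - excluded_at_full_text

print(f"screened={screened}, full_text={full_text_assessed}, included={included}")
```

The derived values match the 830 screened records, 62 full-text articles, and 11 included studies reported in the text.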

Figure 1. PRISMA flowchart of the study.
The 11 included studies, published between 2017 and 2025, evaluated machine learning and deep learning methods for detecting urinary tract stones on CT or US images. Sample sizes ranged from healthy control datasets 9 to more than 12,000 CT scans, 10 and stones were located in the renal sinus, renal parenchyma, ureter, or more broadly throughout the urinary tract. Table 1 summarizes the characteristics of the 11 included studies.
Table 1. Characteristics of the included studies.
CT imaging was the predominant modality. Eight investigations reported accuracies consistently above 90%. For example, Verma et al. 9 used K-nearest neighbors (KNNs) and support vector machines (SVMs) on a public CT dataset, achieving over 90% accuracy for stone versus no-stone classification. In Japan, Kobayashi et al. 11 applied a 17-layer ResNet (F-score 0.752; sensitivity 0.872; specificity 0.662), and Preedanan et al. 12 used a cascaded U-Net to segment small-to-medium stones (pixel-wise F₂ score 71.3%). Other CT studies include Patro et al.’s 13 Kronecker CNN in Turkey (98.6% accuracy), Islam et al.’s 10 Swin Transformer in Norway (99.3% multi-class accuracy for stones, cysts and tumors), Cui et al.’s 14 3D U-Net plus thresholding in China (95.9% detection accuracy; PPV 98.7%), Parakh et al.’s 15 cascading CNN in Korea (up to 95% sensitivity, specificity and overall accuracy; AUC 0.954), and Längkvist et al.’s 16 2.5D CNN in Sweden (100% sensitivity for ureteral stones).
Meanwhile, three studies focused on US images. In India, Selvarani and Rajendran paired a particle swarm optimization (PSO)-SVM with real-time US (false alarm rate 1.8%; false rejection rate 3.3%; accuracy 98.8%), 17 while in the United States, Khan et al. 18 applied principal component analysis (PCA) feature extraction with median filtering and thresholding to clinical US, reporting 92.2% sensitivity and 96.8% accuracy.
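False alarm and false rejection rates can be related to the sensitivity and specificity used elsewhere in this review. The sketch below assumes the common definitions FAR = FP / (FP + TN) and FRR = FN / (FN + TP); the cited study may define these rates differently, and the function names are hypothetical.

```python
def implied_rates(false_alarm_rate, false_rejection_rate):
    """Convert FAR/FRR to (sensitivity, specificity).

    Assumes FAR = FP / (FP + TN) and FRR = FN / (FN + TP); these are common
    definitions, not necessarily the ones used in the cited study.
    """
    specificity = 1.0 - false_alarm_rate
    sensitivity = 1.0 - false_rejection_rate
    return sensitivity, specificity


def accuracy_at_prevalence(sensitivity, specificity, prevalence):
    """Accuracy as the prevalence-weighted mean of sensitivity and specificity."""
    return sensitivity * prevalence + specificity * (1.0 - prevalence)


# FAR 1.8% and FRR 3.3% as reported for the PSO-SVM ultrasound model.
sens, spec = implied_rates(false_alarm_rate=0.018, false_rejection_rate=0.033)
print(f"sensitivity={sens:.3f}, specificity={spec:.3f}")
```

Under these assumed definitions, overall accuracy would lie between the two implied rates (96.7% and 98.2%) at any prevalence, so the reported 98.8% suggests that the original study may have defined one or both rates differently.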
Geographically, research spans Asia (India, Japan, Turkey, China, Korea), Europe (Sweden, Norway), and North America (USA). Stone size reporting varied; some studies omitted size details, while others covered the full spectrum from under 4 mm to over 10 mm or were restricted to small-to-medium calculi. Traditional machine learning models (KNN, SVM, PSO-SVM) and classical image processing pipelines remain competitive for US applications, while CNNs, U-Net variants, and transformers dominate CT-based approaches. Overall, several CT-based models reported accuracies above 95%, and the US-based methods described above reported accuracies above 96%, suggesting that deep learning architectures, particularly CNN and transformer models, hold promise for clinical stone detection and characterization.
Discussion
The systematic review encompassed 11 studies published between 2017 and 2025, focusing on the application of AI in detecting urinary tract stones using imaging modalities such as CT and US. The majority of these studies utilized CT imaging, reporting high accuracy rates. For instance, Parakh et al. 15 developed a cascading CNN model enriched with modality-specific CT images, achieving an AUC of 0.954 and an accuracy of 95% in detecting urinary tract stones. Similarly, Längkvist et al. 16 employed a deep learning CNN model to distinguish ureteric stones from phleboliths based on thin-slice CT images, reporting a sensitivity of 100%.
US-based studies, though fewer, also demonstrated promising results. Selvarani and Rajendran combined a PSO-SVM with real-time US imaging, achieving a false alarm rate of 1.8%, a false rejection rate of 3.3%, and an overall accuracy of 98.8%. 17 Khan et al. 18 applied PCA-based feature extraction with median filtering and thresholding to clinical US images, reporting a sensitivity of 92.2% and accuracy of 96.8%.
These findings align with other studies in the literature. For example, a study by Imamura et al. 20 highlighted the importance of selecting an appropriate imaging modality for stone diagnosis, noting that CT scans provide the most accurate diagnosis but expose patients to ionizing radiation, whereas US has lower sensitivity and specificity but avoids radiation exposure. In addition, a review by Altunhan et al. 21 reported that AI models achieved an average precision of 96.9% in stone detection tasks.
Implementing AI-assisted urinary stone detection in Indonesia, particularly in rural areas, presents both opportunities and challenges. CT imaging, while highly accurate, may not be readily available in rural healthcare facilities due to cost and infrastructure limitations. US imaging, being more accessible and cost-effective, offers a viable alternative. 7 The high accuracy reported in US-based AI models suggests that integrating AI with US imaging could enhance diagnostic capabilities in resource-limited settings. 22
Several factors will determine whether this potential can be realized in rural Indonesia. The integration of AI into ultrasound practice has been reported to enhance diagnostic accuracy and efficiency.7,23 First, infrastructure development is critical: portable US machines equipped with AI capabilities must be available in rural healthcare facilities, and advances in AI-enabled US devices have made them more accessible and user-friendly for areas with limited resources. Second, training and education of healthcare professionals are paramount so that medical staff can use AI-assisted diagnostic tools effectively; AI integration has been shown to enhance the diagnostic capabilities of healthcare providers, even those with limited experience. Third, data localization matters: AI models trained on local population data can be tailored to the specific characteristics of the Indonesian population, improving the accuracy and relevance of diagnostic tools. Finally, clear policies and regulations are needed to govern the use of AI in medical diagnostics, including standards for data privacy, algorithm transparency, and clinical validation to ensure the safety and efficacy of AI-assisted tools. 24 By addressing these factors, AI-assisted US imaging could become a valuable tool for improving urinary stone detection and overall healthcare delivery in rural Indonesian communities.
From a practical clinical perspective, the integration of AI into urinary stone detection should be viewed not merely as a diagnostic enhancement but as a workflow-augmenting tool. In high-volume settings, AI can function as a triage system—prioritizing suspected stone-positive scans, reducing radiologist workload, and shortening time-to-diagnosis, particularly in emergency contexts where rapid decision-making is critical. However, despite the high performance metrics reported, these models often operate under controlled experimental conditions that may not reflect real-world variability. Dataset heterogeneity—including differences in scanner types, imaging protocols, and patient demographics—remains a significant barrier to generalizability. Moreover, the predominance of single-center, retrospective designs introduces risks of overfitting and spectrum bias, limiting external validity. Regulatory and medico-legal considerations further complicate implementation, especially in low- and middle-income countries, where standardized approval pathways for AI-based medical devices are still evolving. Importantly, most current models focus on detection alone, with limited capability in clinically relevant extensions such as stone composition analysis, obstruction severity, or treatment guidance. 25
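The triage role described above can be illustrated with a simple worklist reordering. This is a minimal sketch under stated assumptions, not a description of any reviewed system: the `Scan` record, the `stone_probability` score, and the 0.5 threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scan:
    accession: str
    arrival_order: int
    stone_probability: float  # hypothetical AI output in [0, 1]


def triage(worklist, threshold=0.5):
    """Move AI-flagged scans to the front; keep arrival order otherwise."""
    flagged = [s for s in worklist if s.stone_probability >= threshold]
    routine = [s for s in worklist if s.stone_probability < threshold]
    # Flagged scans: highest score first, ties broken by arrival order.
    flagged.sort(key=lambda s: (-s.stone_probability, s.arrival_order))
    # Unflagged scans keep their original reading order.
    routine.sort(key=lambda s: s.arrival_order)
    return flagged + routine


worklist = [
    Scan("A", 0, 0.20),
    Scan("B", 1, 0.90),
    Scan("C", 2, 0.60),
    Scan("D", 3, 0.10),
]
ordered = [s.accession for s in triage(worklist)]
print(ordered)
```

In this sketch every scan still reaches a radiologist; only the reading order changes, which is consistent with a clinician-in-the-loop design.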
While the reviewed studies demonstrate promising results, several limitations must be considered. Many studies lacked external validation, raising concerns about the generalizability of the AI models across different populations and clinical settings. Variability in imaging protocols and equipment across studies may affect the performance and reproducibility of AI models. Few studies addressed the ability of AI models to predict stone composition, which is crucial for determining appropriate treatment strategies. The use of retrospective data and potential selection biases in study populations may impact the validity of the findings. Implementing AI-assisted diagnostics requires significant resources, including computing infrastructure and trained personnel, which may be challenging in low-resource settings. Addressing these limitations through prospective studies, standardized imaging protocols, and inclusive datasets will be essential for the successful integration of AI into urinary stone detection and management.
Future research should therefore move beyond proof-of-concept accuracy studies toward multi-center, prospective validation with standardized imaging protocols and reporting frameworks. Integration into clinical workflows must also be emphasized, including interoperability with picture archiving and communication systems (PACS), real-time decision support, and clinician-in-the-loop models to maintain accountability. In addition, the development of locally trained or fine-tuned algorithms using population-specific datasets will be crucial to ensure equitable performance across diverse healthcare settings. Ultimately, the clinical value of AI in urinary stone disease will depend not only on its diagnostic accuracy but also on its ability to demonstrably improve patient outcomes, optimize resource utilization, and integrate seamlessly into existing healthcare infrastructures.
Conclusion
In conclusion, this systematic review demonstrates that AI, particularly models using CT and US imaging, shows high accuracy in detecting urinary tract stones and holds promise for enhancing diagnostic precision. While CT offers superior accuracy, US combined with AI is more feasible for use in rural settings like those in Indonesia due to its accessibility and lower cost.
Footnotes
Acknowledgements
The authors would like to thank the Department of Urology, Faculty of Medicine, Universitas Brawijaya, Malang, Indonesia, for their support in this research. No specific funding was received for this study.
Ethical Considerations
Ethical approval was not sought for this article because this study is a systematic review based exclusively on previously published literature.
Consent to participate
Not applicable. As this research is a systematic review and relies entirely on previously published data, no new human subjects were involved, and formal informed consent was not required.
Consent for publication
Not applicable.
Author Contributions
All authors contributed to the planning and design of the study. J.V., P.S., and T.N.B. conducted the literature search, study selection, and data extraction. Data synthesis and analysis were performed by all authors. The first draft of the manuscript was written collaboratively by all authors. All authors discussed, critically revised, and approved the final manuscript.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data availability statement
The datasets generated and/or analyzed during the current study are available from the corresponding author upon reasonable request. All original studies included in this review are appropriately cited.
Informed consent
Informed consent was not sought for this article because this study did not involve individual patient data.
Guarantor
J.V.
