Advancements in artificial intelligence for detecting urinary tract stones: A systematic review of imaging-based approaches

Abstract

Objective:

Kidney stone disease is a growing global health problem, with increasing diagnostic challenges due to heterogeneous clinical presentations and imaging demands. This review aims to evaluate the role of artificial intelligence (AI) in imaging-based detection and assessment of urinary tract stones.

Methods:

A systematic review was conducted by searching PubMed/MEDLINE, EMBASE, Cochrane CENTRAL, and Web of Science from database inception to May 2025 for studies evaluating AI-based imaging in the diagnosis of urinary tract stones. The search followed Preferred Reporting Items for Systematic reviews and Meta-Analyses literature search extension (PRISMA-S) guidance and used a Population, Intervention, Comparison, Outcomes and Study (PICOS) framework to identify eligible adult studies applying AI models to computed tomography (CT), ultrasound, or radiographic imaging, with expert interpretation as reference and diagnostic performance outcomes.

Results:

From 1142 records identified, 11 studies published between 2017 and 2025 met the inclusion criteria. Most studies used CT imaging and reported high diagnostic performance, with accuracies exceeding 90% in eight studies and reaching over 95% in several deep learning–based approaches, while ultrasound-based AI models also demonstrated strong performance with sensitivities and accuracies above 90%.

Conclusion:

AI-based imaging demonstrates high diagnostic accuracy for urinary tract stone detection, particularly with CT. AI-enhanced ultrasound represents a practical and cost-effective alternative for implementation in resource-limited and rural settings.

Level of evidence:

Keywords

artificial intelligence, urinary tract stones, medical imaging, diagnostic accuracy

Introduction

Kidney stone disease, or urolithiasis, is a growing global health concern, with its prevalence and incidence rates steadily increasing. In the United States, approximately 1 in 10 individuals are expected to experience a kidney stone event during their lifetime, corresponding to a prevalence of 9% and an incidence rate of 239 per 100,000 person-years.¹ This upward trend is attributed to various factors, including dietary habits, climate, and metabolic syndromes.

In Indonesia, the burden of kidney stone disease is significant. Studies indicate that urolithiasis accounts for approximately 10% of patients with kidney failure in tertiary-care centers in Jakarta, a figure higher than reports from Western countries.² The disease predominantly affects males, with a male-to-female ratio of 1.8:1, and is most prevalent among individuals aged 51–60 years. The most common type of kidney stone in the Indonesian population is calcium oxalate, followed by uric acid stones.³ Factors contributing to the high prevalence include dietary habits, warm climate, limited access to clean water, and the increasing incidence of metabolic diseases such as obesity and diabetes.²

Diagnosing and managing kidney stone disease is complex due to its heterogeneous clinical presentation, which often overlaps with other urological and non-urological conditions. Accurate diagnosis requires a comprehensive assessment of the stone burden, including size, number, anatomical location, and chemical composition. These parameters are crucial for guiding treatment decisions, ranging from conservative management to surgical intervention, and for predicting recurrence risk.⁴,⁵

Artificial Intelligence (AI) has emerged as a transformative tool in medical imaging and diagnostics, offering novel solutions to long-standing challenges in the evaluation of stone disease.⁶ By leveraging machine learning algorithms, particularly deep learning techniques such as convolutional neural networks (CNNs) and vision transformers (ViTs), AI enhances diagnostic accuracy, speed, and consistency in imaging-based evaluations. These technologies enable automatic detection and segmentation of urinary stones from various imaging modalities, including non-contrast computed tomography (NCCT), renal ultrasonography (RUS), and plain abdominal radiography (KUB).

In Indonesia, the application of AI in diagnosing kidney stones is still in its nascent stages.⁷ While some studies have explored AI-based classification of urinary stones using micro-computed tomography (micro-CT), there is a pressing need for more extensive research and development in this area.⁸ The integration of AI into clinical practice could significantly enhance diagnostic capabilities, particularly in regions with limited access to specialized radiological expertise.

Despite the growing body of literature on AI applications in urology, there remains a notable gap in consolidated evidence focusing specifically on imaging-based detection of urinary tract stones. Most reviews to date have either been limited to a single modality or have not provided an in-depth analysis of stone detection techniques. This review addresses that gap by evaluating AI models applied to CT, ultrasound, and radiographic imaging. It also emphasizes the translational potential of AI in daily radiological practice and underscores the need for standardized datasets, external validation, and regulatory considerations as AI continues to move toward widespread adoption in urologic imaging.

Materials and methods

Protocol and reporting standards

We conducted a comprehensive systematic review by searching PubMed, Scopus, EMBASE, and Web of Science from database inception through May 2025 for studies evaluating AI in imaging-based diagnosis of urinary tract stones. Our search strategy combined terms related to urinary calculi and AI techniques, including renal calculi, kidney stones, deep learning, machine learning, CNNs, ultrasound (US), computed tomography (CT), and radiography. Reference lists of relevant articles and recent reviews were also screened to identify additional eligible studies. Two independent reviewers screened titles and abstracts, followed by full-text assessment; any discrepancies were resolved through discussion with a third reviewer.

Data sources and search strategy

We performed a comprehensive literature search in May 2025 across four electronic databases: PubMed/MEDLINE, EMBASE, Cochrane Central Register of Controlled Trials (CENTRAL), and Web of Science. The search combined controlled vocabulary (e.g. MeSH terms) and free-text keywords related to urinary tract stones (“renal calculi,” “kidney stones,” “ureterolithiasis,” “urolithiasis,” “nephrolithiasis”), artificial intelligence (“machine learning,” “deep learning,” “convolutional neural network”), and imaging modalities (“computed tomography,” “ultrasound,” “plain radiography”). The date range for the literature search was limited from January 2010 to May 2025.

Search methods adhered to Preferred Reporting Items for Systematic reviews and Meta-Analyses literature search extension (PRISMA-S) guidance for reporting literature searches and Cochrane Handbook recommendations to maximize sensitivity and reproducibility. No language restrictions were applied, but non-English articles without accessible translation were excluded.

Eligibility criteria

Studies were selected according to the PICOS framework:

Population: Adult patients undergoing imaging (non-contrast CT, CT urography, US, or KUB radiograph) for urinary tract stones.

Intervention: Application of an AI model for stone detection, segmentation, or characterization on medical images.

Comparison: Reference standard interpretation by expert radiologists or urologists, or conventional image analysis without AI.

Outcomes: At least one diagnostic performance metric—sensitivity, specificity, accuracy, or area under the receiver operating characteristic curve (AUC/ROC)—and data on stone location and size.

Study design: Prospective or retrospective cohort studies, diagnostic accuracy studies, and randomized controlled trials published up to May 2025.

Exclusion criteria included phantom or animal studies, case reports, narrative reviews, and studies lacking extractable diagnostic performance data.
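The diagnostic performance metrics named in the outcome criterion can be illustrated with a minimal Python sketch. The class and function names and the example counts below are hypothetical and are not taken from any included study; AUC is computed here in its rank-based form, as the probability that a randomly chosen stone-positive case receives a higher model score than a randomly chosen stone-negative case.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts from comparing AI output with the expert reference standard."""
    tp: int  # stone present, AI positive
    fp: int  # stone absent, AI positive
    fn: int  # stone present, AI negative
    tn: int  # stone absent, AI negative


def sensitivity(c: ConfusionCounts) -> float:
    """Proportion of stone-positive cases that the model flags."""
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """Proportion of stone-negative cases that the model clears."""
    return c.tn / (c.tn + c.fp)


def accuracy(c: ConfusionCounts) -> float:
    """Proportion of all cases classified correctly."""
    return (c.tp + c.tn) / (c.tp + c.fp + c.fn + c.tn)


def auc(scores_pos: Sequence[float], scores_neg: Sequence[float]) -> float:
    """Rank-based AUC; tied scores count as half a win."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical counts for illustration only.
example = ConfusionCounts(tp=90, fp=5, fn=10, tn=95)
print(sensitivity(example), specificity(example), accuracy(example))
```

Because accuracy depends on the ratio of stone-positive to stone-negative cases in the test set, sensitivity and specificity are generally easier to compare across studies with different case mixes.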

Study selection

Two reviewers independently screened titles and abstracts for relevance. Full texts of potentially eligible studies were then assessed against the inclusion criteria. Any disagreements were resolved by discussion, with arbitration by a third reviewer if necessary. The selection process is documented in a PRISMA flow diagram.

Data extraction

Data extraction was performed using a piloted, standardized form by two independent reviewers who collated comprehensive details from each eligible study. For study characteristics, we recorded the first author’s name, year of publication, country of origin, and study design. Patient and imaging data including the total number of cases or images analyzed, stone location (kidney, ureter, or bladder), and stone dimensions (reported as mean ± standard deviation or range) were also captured. Information on the AI models comprised the type of algorithm, the imaging modality used, and the proportions of data allocated to training, validation, and testing. Diagnostic performance metrics extracted included sensitivity, specificity, overall accuracy, and AUC/ROC. In studies that evaluated more than one AI architecture or imaging modality, each distinct analysis was treated as a separate entry in the dataset. Any discrepancies between reviewers’ extractions were resolved through discussion until consensus was achieved.
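As an illustration of the rule that each distinct analysis within a study becomes its own dataset entry, the extraction form could be represented as follows. This is a sketch only: the field names, the `expand_study` helper, and the example study are hypothetical and do not reproduce the actual form used by the reviewers.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ExtractionEntry:
    """One row of the extraction dataset (hypothetical field names)."""
    first_author: str
    year: int
    country: str
    modality: str
    ai_model: str
    sensitivity: Optional[float] = None  # None stands for "not reported"
    specificity: Optional[float] = None
    accuracy: Optional[float] = None
    auc: Optional[float] = None


def expand_study(first_author: str, year: int, country: str,
                 analyses: List[Dict]) -> List[ExtractionEntry]:
    """Create one entry per AI architecture or imaging modality analysed."""
    return [
        ExtractionEntry(first_author=first_author, year=year,
                        country=country, **analysis)
        for analysis in analyses
    ]


# Invented study that evaluated two architectures on the same modality.
entries = expand_study(
    "Example", 2020, "Nowhere",
    analyses=[
        {"modality": "CT", "ai_model": "CNN", "accuracy": 0.95},
        {"modality": "CT", "ai_model": "SVM", "sensitivity": 0.90},
    ],
)
print(len(entries))
```

Keeping unreported metrics as `None` rather than zero avoids mistaking a missing value for poor performance during synthesis.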

Results

Study selection

A total of 1142 records were retrieved from PubMed, Scopus, EMBASE, and Web of Science. After removing 312 duplicates, 830 titles and abstracts were screened, yielding 768 exclusions for reasons such as non-AI imaging applications, ineligible modalities, phantom or animal studies, or absence of diagnostic performance data. Sixty-two full-text articles were then assessed, of which 51 were excluded due to lack of a human imaging cohort, insufficient reporting of sensitivity/specificity/accuracy/AUC, or mixed-modality analyses without separate AI results. Ultimately, 11 studies fulfilled all inclusion criteria and were incorporated into our systematic review (Figure 1).

Figure 1.

PRISMA flowchart of the study.
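The counts reported for the selection process can be checked with simple arithmetic. The short Python sketch below uses only the numbers stated in the text; the variable names are illustrative.

```python
# Record counts as reported in the study selection paragraph.
identified = 1142
duplicates_removed = 312
excluded_at_screening = 768
excluded_at_full_text = 51

screened = identified - duplicates_removed
full_text_assessed = screened - excluded_at_screening
included = full_text_assessed - excluded_at_full_text

print(screened, full_text_assessed, included)
```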

A total of 11 studies published between 2017 and 2025 evaluated machine learning and deep learning methods for detecting urinary tract stones on CT or US images. Dataset sizes ranged from a public dataset of healthy subjects⁹ to more than 12,000 CT images,¹⁰ and stones were located in the renal sinus, renal parenchyma, ureter, or more broadly throughout the urinary tract. Table 1 summarizes the characteristics of the 11 included studies.

Table 1.

Characteristics of the included studies.

Author	Country	Patient characteristics	Stone location	Stone size	AI model	Outcome	Sensitivity	Specificity	Accuracy	AUC/ROC
Verma et al.⁹	India	Healthy subjects (UCI dataset)	Kidney	Not specified	KNN, SVM	Binary classification (stone vs. no stone)	Not reported	Not reported	Accuracy >90%	Not reported
Preedanan et al.¹²	Japan	Patients with urinary stones	Urinary tract	Small to medium	Cascaded U-Net	F2 score for segmentation	Not reported	Not reported	F2 score 71.28% (pixel-wise)	Not reported
Kobayashi et al.¹¹	Japan	1017 patients with upper tract stones	Upper urinary tract	Not specified	ResNet (17-layer)	F-score 0.752	0.872	0.662	Not reported	Not reported
Patro et al.¹³	Multi-national	CT images from public dataset	Kidney	All sizes	Kronecker CNN	Stone detection	Not reported	Not reported	98.56%	Not reported
Islam et al.¹⁰	Norway	12,446 CT images	Kidney	All sizes	Swin Transformer	Multi-class classification (stone, cyst, tumor)	Not specified	Not specified	99.30%	Not reported
Yildirim et al.¹⁹	Turkey	433 subjects, 1799 images	Kidney	All sizes	Deep Learning (CNN)	Stone detection	Not reported	Not reported	96.82%	Not reported
Cui et al.¹⁴	China	117 test cases	Renal sinus	Measured by algorithm	3D U-Net + thresholding	Stone detection and S.T.O.N.E scoring	95.9%	98.7% (PPV)	Not reported	Not reported
Parakh et al.¹⁵	Korea	535 CT scans	Urinary tract	<4 mm to ⩾10 mm	Cascading CNN	Stone detection	Up to 95%	Up to 95%	95%	0.954
Längkvist et al.¹⁶	Sweden	465 CT scans	Ureter	All sizes	CNN (2.5D)	Ureteral stone detection	100%	Not reported	Not reported	Not reported
Selvarani & Rajendran¹⁷	India	Real-time US images from Mithra Scans, India (150 calculi, 100 healthy)	Renal calculi	Not specified	Meta-heuristic PSO-SVM	FAR: 1.8%, FRR: 3.3%	Not reported	Not reported	98.8%	Not reported
Khan et al.¹⁸	The United States	Clinical US images from St. Lukes Roosevelt Hospital, NY (50 test cases)	Kidney stones	Not specified	PCA-based feature extraction + median filtering + thresholding	PSNR: 1.82, Avg. SNR: 1.58	92.16%	Not reported	96.82%	Not reported

CT imaging was the predominant modality, and eight investigations reported accuracies above 90%. For example, Verma et al.⁹ used K-nearest neighbors (KNN) and support vector machines (SVM) on a public CT dataset, achieving over 90% accuracy for stone versus no-stone classification. In Japan, Kobayashi et al.¹¹ applied a 17-layer ResNet (F-score 0.752; sensitivity 0.872; specificity 0.662), and Preedanan et al.¹² used a cascaded U-Net to segment small-to-medium stones (pixel-wise F₂ score 71.3%). Other CT studies include Patro et al.’s¹³ Kronecker CNN on a public multi-national dataset (98.6% accuracy), Yildirim et al.’s¹⁹ CNN in Turkey (96.8% accuracy), Islam et al.’s¹⁰ Swin Transformer in Norway (99.3% multi-class accuracy for stones, cysts, and tumors), Cui et al.’s¹⁴ 3D U-Net plus thresholding in China (95.9% sensitivity; PPV 98.7%), Parakh et al.’s¹⁵ cascading CNN in Korea (up to 95% sensitivity, specificity, and overall accuracy; AUC 0.954), and Längkvist et al.’s¹⁶ 2.5D CNN in Sweden (100% sensitivity for ureteral stones).
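Two of the studies above report F-scores rather than accuracy. For readers unfamiliar with the metric, the following Python sketch gives the standard F-beta definition, in which beta = 1 yields the usual F-score and beta = 2 weights recall (sensitivity) more heavily than precision. The function name and the input values are illustrative and are not drawn from the included studies.

```python
def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall.

    beta > 1 favours recall; beta < 1 favours precision.
    """
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)


# Illustrative values: high precision, low recall.
f1 = f_beta(1.0, 0.5, beta=1.0)
f2 = f_beta(1.0, 0.5, beta=2.0)
print(round(f1, 3), round(f2, 3))
```

With these inputs the F₂ score is lower than the F₁ score, because the low recall is penalized more strongly when beta is 2. This is why F₂ is often preferred when missed stones matter more than false alarms.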

Meanwhile, two studies focused on US images. In India, Selvarani and Rajendran paired a particle swarm optimization (PSO)-SVM with real-time US (false alarm rate 1.8%; false rejection rate 3.3%; accuracy 98.8%),¹⁷ while in the United States, Khan et al.¹⁸ applied principal component analysis (PCA) feature extraction with median filtering and thresholding to clinical US, reporting 92.2% sensitivity and 96.8% accuracy.

Geographically, research spans Asia (India, Japan, Turkey, China, Korea), Europe (Sweden, Norway), and North America (USA). Stone size reporting varied; some studies omitted size details, and others covered the full spectrum from under 4 mm to over 10 mm or small-to-medium calculi. Traditional machine learning models (KNN, SVM, PSO-SVM) and classical image processing pipelines remain competitive for US applications, while CNNs, U-Net variants, and transformers dominate CT-based approaches. Overall, CT-based models that reported accuracy typically reached 95% or higher, and both US-based methods achieved accuracies above 96%, suggesting that deep learning architectures, particularly CNN and transformer models, hold promise for clinical stone detection and characterization.

Discussion

The systematic review encompassed 11 studies published between 2017 and 2025, focusing on the application of AI in detecting urinary tract stones using imaging modalities such as CT and US. The majority of these studies utilized CT imaging, reporting high accuracy rates. For instance, Parakh et al.¹⁵ developed a cascading CNN model enriched with modality-specific CT images, achieving an AUC of 0.954 and an accuracy of 95% in detecting urinary tract stones. Similarly, Längkvist et al.¹⁶ employed a deep learning CNN model to distinguish ureteric stones from phleboliths based on thin-slice CT images, reporting a sensitivity of 100%.

US-based studies, though fewer, also demonstrated promising results. Selvarani and Rajendran combined a PSO-SVM with real-time US imaging, achieving a false alarm rate of 1.8%, a false rejection rate of 3.3%, and an overall accuracy of 98.8%.¹⁷ Khan et al.¹⁸ applied PCA-based feature extraction with median filtering and thresholding to clinical US images, reporting a sensitivity of 92.2% and accuracy of 96.8%.

These findings align with other studies in the literature. For example, a study by Imamura et al.²⁰ highlighted the importance of selecting an appropriate imaging modality for stone diagnosis, noting that CT scans provide the most accurate diagnosis but expose patients to ionizing radiation, whereas US has lower sensitivity and specificity but avoids radiation exposure. In addition, a review by Altunhan et al.²¹ reported that AI models achieved an average precision of 96.9% in stone detection tasks.

Implementing AI-assisted urinary stone detection in Indonesia, particularly in rural areas, presents both opportunities and challenges. CT imaging, while highly accurate, may not be readily available in rural healthcare facilities due to cost and infrastructure limitations. US imaging, being more accessible and cost-effective, offers a viable alternative.⁷ The high accuracy reported in US-based AI models suggests that integrating AI with US imaging could enhance diagnostic capabilities in resource-limited settings.²²

The integration of AI into ultrasound practice has improved diagnostic accuracy and efficiency in medical imaging,⁷,²³ and several factors will determine whether these gains can be realized in rural Indonesia. Infrastructure is critical: portable US machines equipped with AI capabilities must be available in rural healthcare settings, and advances in AI-enabled US devices have made them more accessible and user-friendly in areas with limited resources. Training and education of healthcare professionals are equally important, so that medical staff can use AI-assisted diagnostic tools effectively; AI integration has been shown to enhance the diagnostic capabilities of healthcare providers, even those with limited experience. Data localization is another consideration: AI models trained on local population data can be tailored to the specific characteristics of the Indonesian population, improving their accuracy and relevance. Finally, clear policies and regulations are needed to govern the use of AI in medical diagnostics, including standards for data privacy, algorithm transparency, and clinical validation to ensure the safety and efficacy of AI-assisted tools.²⁴ By addressing these factors, AI-assisted US imaging could become a valuable tool in improving urinary stone detection and overall healthcare delivery in rural Indonesian communities.

From a practical clinical perspective, the integration of AI into urinary stone detection should be viewed not merely as a diagnostic enhancement but as a workflow-augmenting tool. In high-volume settings, AI can function as a triage system—prioritizing suspected stone-positive scans, reducing radiologist workload, and shortening time-to-diagnosis, particularly in emergency contexts where rapid decision-making is critical. However, despite the high performance metrics reported, these models often operate under controlled experimental conditions that may not reflect real-world variability. Dataset heterogeneity—including differences in scanner types, imaging protocols, and patient demographics—remains a significant barrier to generalizability. Moreover, the predominance of single-center, retrospective designs introduces risks of overfitting and spectrum bias, limiting external validity. Regulatory and medico-legal considerations further complicate implementation, especially in low- and middle-income countries, where standardized approval pathways for AI-based medical devices are still evolving. Importantly, most current models focus on detection alone, with limited capability in clinically relevant extensions such as stone composition analysis, obstruction severity, or treatment guidance.²⁵
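The triage role described above can be sketched as a simple reordering of a reading worklist. The example below is a hypothetical illustration, not a description of any deployed system: the `Scan` fields, the `triage` function, and the 0.5 threshold are all assumptions. Scans whose model score meets the threshold are moved to the front in order of decreasing score, and the remaining scans keep their arrival order, so every scan is still read by a clinician.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Scan:
    scan_id: str
    stone_probability: float  # model output in [0, 1] (hypothetical)
    arrival_order: int        # position in the original worklist


def triage(worklist: List[Scan], threshold: float = 0.5) -> List[Scan]:
    """Put suspected stone-positive scans first; keep the rest in arrival order."""
    flagged = sorted(
        (s for s in worklist if s.stone_probability >= threshold),
        key=lambda s: -s.stone_probability,
    )
    routine = sorted(
        (s for s in worklist if s.stone_probability < threshold),
        key=lambda s: s.arrival_order,
    )
    return flagged + routine


# Invented worklist for illustration.
worklist = [
    Scan("A", 0.20, 0),
    Scan("B", 0.90, 1),
    Scan("C", 0.60, 2),
    Scan("D", 0.10, 3),
]
print([s.scan_id for s in triage(worklist)])
```

The sketch only reorders the list and never removes a scan, which reflects the clinician-in-the-loop use of AI rather than autonomous diagnosis.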

While the reviewed studies demonstrate promising results, several limitations must be considered. Many studies lacked external validation, raising concerns about the generalizability of the AI models across different populations and clinical settings. Variability in imaging protocols and equipment across studies may affect the performance and reproducibility of AI models. Few studies addressed the ability of AI models to predict stone composition, which is crucial for determining appropriate treatment strategies. The use of retrospective data and potential selection biases in study populations may impact the validity of the findings. Implementing AI-assisted diagnostics requires significant resources, including computing infrastructure and trained personnel, which may be challenging in low-resource settings. Addressing these limitations through prospective studies, standardized imaging protocols, and inclusive datasets will be essential for the successful integration of AI into urinary stone detection and management.

Future research should therefore move beyond proof-of-concept accuracy studies toward multi-center, prospective validation with standardized imaging protocols and reporting frameworks. Integration into clinical workflows must also be emphasized, including interoperability with picture archiving and communication systems (PACS), real-time decision support, and clinician-in-the-loop models to maintain accountability. In addition, the development of locally trained or fine-tuned algorithms using population-specific datasets will be crucial to ensure equitable performance across diverse healthcare settings. Ultimately, the clinical value of AI in urinary stone disease will depend not only on its diagnostic accuracy but also on its ability to demonstrably improve patient outcomes, optimize resource utilization, and integrate seamlessly into existing healthcare infrastructures.

Conclusion

In conclusion, this systematic review demonstrates that AI, particularly models using CT and US imaging, shows high accuracy in detecting urinary tract stones and holds promise for enhancing diagnostic precision. While CT offers superior accuracy, US combined with AI is more feasible for use in rural settings like those in Indonesia due to its accessibility and lower cost.

Footnotes

Acknowledgements

The authors would like to thank the Department of Urology, Faculty of Medicine, Universitas Brawijaya, Malang, Indonesia, for their support in this research. No specific funding was received for this study.

ORCID iDs

Taufiq N. Budaya

Juvensius Viosandy

Ethical Considerations

Ethical approval was not sought for this article because this study is a systematic review based exclusively on previously published literature.

Consent to participate

Not applicable. As this research is a systematic review and relies entirely on previously published data, no new human subjects were involved, and formal informed consent was not required.

Consent for publication

Not applicable.

Author Contributions

All authors contributed to the planning and design of the study. J.V., P.S., and T.N.B. conducted the literature search, study selection, and data extraction. Data synthesis and analysis were performed by all authors. The first draft of the manuscript was written collaboratively by all authors. All authors discussed, critically revised, and approved the final manuscript.

Funding

The authors received no financial support for the research, authorship, and/or publication of this article.

Declaration of Conflicting Interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Data availability statement

The datasets generated and/or analyzed during the current study are available from the corresponding author upon reasonable request. All original studies included in this review are appropriately cited.

Informed consent

Informed consent was not sought for this article because this study did not involve individual patient data.

Guarantor

J.V.

References

1. Aiumtrakul, Thongprayoon, Suppadungsuk, et al. Global trends in kidney stone awareness: a time series analysis from 2004–2023. Clin Pract 2024; 14: 915–927.

2. Hustrini, Susalit, Lydia, et al. The etiology of kidney failure in Indonesia: a multicenter study in tertiary-care centers in Jakarta. Ann Glob Health 2023; 89: 36.

3. Widyasmara, Birowo, Rasyid. Urinary stone composition analysis in Indonesian population: a single major centre analysis. Indones J Urol 2018; 25(2): 1–6. https://doi.org/10.32421/juri.v25i2.406

4. Shastri, Patel, Sambandam, et al. Kidney stone pathophysiology, evaluation and management: core curriculum 2023. Am J Kidney Dis 2023; 82(5): 617–634.

5. Akram, Jahrreiss, Skolarikos, et al. Urological guidelines for kidney stones: overview and comprehensive update. J Clin Med 2024; 13: 1114.

6. Cil, Dogan. The efficacy of artificial intelligence in urology: a detailed analysis of kidney stone-related queries. World J Urol 2024; 42: 158.

7. Sulaksono, Kurniawati. Penerapan artificial intelligence dalam Mendeteksi Batu Ginjal secara Otomatis pada Citra CT Scan. J Imejing Diagn 2024; 10: 42–46.

8. Fitri, Haryanto, Arimura, et al. Automated classification of urinary stones based on microcomputed tomography images using convolutional neural network. Physica Medica 2020; 78: 201–208.

9. Verma, Nath, Tripathi, et al. Analysis and identification of kidney stone using Kth nearest neighbour (KNN) and support vector machine (SVM) classification techniques. Pattern Recognit Image Anal 2017; 27: 574–580.

10. Islam, Hasan, Hossain, et al. Vision transformer and explainable transfer learning models for auto detection of kidney cyst, stone and tumor from CT-radiography. Sci Rep 2022; 12: 11440.

11. Kobayashi, Ishioka, Matsuoka, et al. Computer-aided diagnosis with a convolutional neural network algorithm for automated detection of urinary tract stones on plain X-ray. BMC Urol 2021; 21: 102.

12. Preedanan, Suzuki, Kondo, et al. Urinary stones segmentation in abdominal X-ray images using cascaded U-net pipeline with stone-embedding augmentation and lesion-size reweighting approach. IEEE Access 2023; 11: 25702–25712.

13. Patro, Allam, Neelapu, et al. Application of Kronecker convolutions in deep learning technique for automated detection of kidney stones with coronal CT images. Inf Sci 2023; 640: 119005.

14. Cui, Sun, et al. Automatic detection and scoring of kidney stones on noncontrast CT images using S.T.O.N.E. Mol Imaging Biol 2021; 23(3): 436–445.

15. Parakh, Lee, et al. Urinary stone detection on CT images using deep convolutional neural networks: evaluation of model performance and generalization. Radiol Artif Intell 2019; 1(4): e180066.

16. Längkvist, Jendeberg, Thunberg, et al. Computer aided detection of ureteral stones in thin slice computed tomography volumes using convolutional neural networks. Comput Biol Med 2018; 97: 153–160.

17. Selvarani, Rajendran. Detection of renal calculi in ultrasound image using meta-heuristic support vector machine. J Med Syst 2019; 43: 300.

18. Khan, Das, Parameshwara MC. Detection of kidney stone using digital image processing: a holistic approach. Eng Res Express 2022; 4: 035040.

19. Yildirim, Olcucu, Colak. Trends in the treatment of urinary stone disease in Turkey. PeerJ 2018; 6: e5390.

20. Imamura, Kawamura, Sazuka, et al. Development of a nomogram for predicting the stone-free rate after transurethral ureterolithotripsy using semi-rigid ureteroscope. Int J Urol 2013; 20(6): 616–621.

21. Altunhan, Soyturk, Guldibi, et al. Artificial intelligence in urolithiasis: a systematic review of utilization and effectiveness. World J Urol 2024; 42: 579.

22. Perez, Wisniewski, Ari, et al. Investigation into application of AI and telemedicine in rural communities: a systematic literature review. Healthcare 2025; 13: 324.

23. Yan, et al. Progress in the application of artificial intelligence in ultrasound-assisted medical diagnosis. Bioengineering 2025; 12: 288.

24. Herington, McCradden, Creel, et al. Ethical considerations for artificial intelligence in medical imaging: deployment and governance. J Nucl Med 2023; 64: 1509–1515.

25. Najjar. Redefining radiology: a review of artificial intelligence integration in medical imaging. Diagnostics 2023; 13: 2760.