Abstract
Objective
To evaluate the diagnostic accuracy of a multi-disease offline artificial intelligence system (Medios-AI, MAI), integrated into a smartphone-based fundus camera, for simultaneous screening of diabetic retinopathy (DR), glaucoma, and age-related macular degeneration (AMD) in a real-world setting.
Methods
In this prospective cross-sectional study, 193 adults (371 eyes) aged ≥18 years with DR, glaucoma, AMD, or normal fundus were enrolled between May and December 2024. Dilated fundus imaging was performed using the Remidio Fundus on Phone (FoP) and Zeiss Clarus 500 cameras. Ungradable images were excluded. The offline MAI algorithm generated disease-specific reports, which were compared to masked grading of Clarus images by two fellowship-trained ophthalmologists. In ambiguous cases, the AI report defaulted to “either DR or AMD.”
Results
MAI achieved sensitivity of 99.3% (95% CI: 96–100), specificity of 95.7% (95% CI: 92–98), and AUROC of 0.99 for detecting any retinal disease. For glaucoma (n = 109), sensitivity was 98.2% (95% CI: 94–100), specificity 99.0% (95% CI: 97–100), AUROC 0.99. For AMD (n = 56), sensitivity was 88.9% (95% CI: 77–96), specificity 97.5% (95% CI: 95–99), AUROC 0.93. For DR (n = 78), sensitivity was 84.6% (95% CI: 75–92), specificity 99.0% (95% CI: 97–100), AUROC 0.92. Agreement on vertical cup-to-disc ratio between AI and graders ranged from −0.1 to +0.1, with intergrader ICC of 0.97 (P < 0.001 for all comparisons).
Conclusions
MAI demonstrated high diagnostic accuracy for DR, glaucoma, and AMD using an offline, smartphone-based platform, supporting scalable, point-of-care retinal screening in resource-limited settings.
Keywords
Introduction
The integration of artificial intelligence (AI) in ophthalmology has revolutionized the approach to diagnosing and managing retinal diseases, particularly diabetic retinopathy (DR), glaucoma, and age-related macular degeneration (AMD).1–3 Medios-AI (Remidio, India) is an offline, smartphone-integrated AI tool designed to detect DR, AMD, and glaucoma from fundus photographs. It runs on the Remidio Fundus on Phone (FoP) device without internet connectivity, providing rapid results (<15 s) and detailed annotated reports, making it particularly suited for remote and resource-limited settings. The Medios-AI software has previously demonstrated its efficacy in automating the detection of these conditions through separate models tailored for each pathology.4–8 However, the advent of the Multidisease AI (MAI), which consolidates the detection capabilities for all three diseases into a single module, represents a significant advancement in the field.
This integrated approach allows simultaneous evaluation of multiple ocular conditions and enhances diagnostic efficiency, thereby streamlining the workflow for healthcare providers. Despite the promising results achieved with the individual Medios-AI models, the Multidisease AI has yet to be evaluated in a clinical setting. Mounted on the FoP camera, this tool is designed for offline, real-time disease detection, making it particularly valuable in resource-limited environments where access to specialized eye care may be restricted.
The current study aims to assess the performance of the Multidisease AI, including its diagnostic capabilities and potential impact on the early detection of DR, glaucoma, and AMD. By addressing the limitations of previous models that focused solely on individual diseases, this study seeks to contribute to the growing body of evidence supporting the use of AI in comprehensive ocular health assessments.
Methods
This cross-sectional study was conducted at a tertiary eye care center in western India. The study was approved by the institutional review board, and informed consent was obtained from all participants. The study adhered to the principles of the Declaration of Helsinki.
Participants
Consecutive consenting eligible patients aged ≥18 years were prospectively recruited between May 2024 and December 2024. Participants with an established diagnosis of DR, glaucoma, AMD, or normal fundus (controls), based on prior clinical evaluation, were included. Patients presenting for their first visit were excluded to avoid inclusion of ambiguous or incompletely characterised cases. Enrolment continued until a minimum of 50 eyes had been obtained in each of the four predefined categories (normal, DR, AMD, and glaucoma). Because recruitment continued until the smallest subgroup reached this threshold, some categories exceeded the minimum target (Figure 1).

Figure 1. Study flow diagram showing patient enrollment and eye-level distribution. Consecutive eligible patients with confirmed diagnoses were prospectively recruited, and enrollment continued until a minimum of 50 eyes was achieved in each predefined category. Unequal group sizes reflect this recruitment strategy.
All included participants underwent mydriatic retinal photography using both the Remidio FoP smartphone-based fundus camera (Remidio Innovative Solutions Pvt Ltd, Bangalore, India) and the Zeiss Clarus desktop fundus camera (Carl Zeiss Meditec, Germany). Exclusion criteria included media opacities affecting image capture, previous retinal surgeries, active infections, optic disc abnormalities, or other ocular comorbidities that could interfere with image interpretation. Additionally, images not meeting quality thresholds were automatically flagged by the MAI software in real time during acquisition and were not saved to the device; accordingly, no record of such acquisition attempts was maintained.
Baseline examination
Demographics, best-corrected visual acuity (BCVA), intraocular pressure (IOP), and medical and ophthalmic histories were recorded for all participants. A comprehensive dilated eye examination was performed, including slit-lamp biomicroscopy and indirect ophthalmoscopy using +90D and +20D lenses, by fellowship-trained retina (AKK) and glaucoma (PB) specialists.
Classifications
Diabetic retinopathy was classified according to the International Clinical Diabetic Retinopathy (ICDR) classification,9 AMD was classified according to the Age-Related Eye Disease Study (AREDS) classification,10 and glaucoma was diagnosed based on optic disc assessment.
Fundus imaging
After clinical examination, all fundus images were acquired using the Remidio FoP device and the Zeiss Clarus 500 desktop fundus camera by imaging technicians at our institute. The Remidio Fundus on Phone (FoP) device was integrated with Medios AI on an iPhone 13 platform. The smartphone was used exclusively for the study and was adapted by the manufacturer for retinal imaging, possibly including enhancements such as infrared guidance. The device guided users in real time to acquire centered, gradable images, thereby standardizing image acquisition for both AI and human interpretation. Images not meeting quality thresholds were automatically flagged by the software during acquisition and were not saved to the device. Once acquired, images were stored locally in the FoP app on the iPhone, and the AI generated an annotated diagnostic report in under 15 s. Although cloud storage and remote syncing options were available via a HIPAA-compliant Amazon server for teleophthalmology use cases, these features were not used in the current study.
Fundus images obtained using the FoP and the Clarus were independently assessed for image quality by one ophthalmologist (YG) and one retina specialist (MRB), both masked to the participant's clinical findings and imaging modality. Based on predefined criteria, images were categorized as ‘excellent’ (clear visibility of third-order vessels, including capillaries), ‘acceptable’ (visibility of first- and second-order vessels, typically the major arcade vessels), or ‘ungradable’ (no vessels visible).11 For comparative analysis, Clarus 500 images were cropped to match the field of view of the FoP. Each image, irrespective of acquisition modality, was independently reviewed by two fellowship-trained retina specialists (SS, ASK) and two glaucoma specialists (MK, HJ). Images were presented in a randomized order to prevent paired-eye bias, and graders were blinded to both the image source and clinical details. However, for the purpose of defining the reference standard diagnosis, the final disease classification was based on consensus derived from Clarus 500 images, which are of higher resolution and served as the diagnostic benchmark.
Grading and report format
The output from Medios AI included disease-specific reports for DR, glaucoma, and AMD. For DR, the AI reported whether diabetic changes were present and whether the case was referrable or not (i.e., moderate NPDR or worse and/or macular edema). For glaucoma, the AI provided a vertical cup-to-disc ratio (VCDR) along with a referral recommendation. For AMD, the presence of AMD-related signs was noted, along with a referrability status. In cases of uncertainty between DR and AMD, the AI defaulted to the more probable diagnosis based on its internal probability scores and applied the corresponding disease-specific referral criteria.
Human grading classifications were defined as follows: DR was graded as no DR, mild/moderate/severe non-proliferative diabetic retinopathy (NPDR), or proliferative diabetic retinopathy (PDR); glaucoma was assessed using VCDR and referral status, and AMD was recorded as present or absent based on characteristics of fundus findings.
Outcome measures
The primary outcome measures included sensitivity, specificity, inter-observer agreement, and the area under the receiver operating characteristic curve (AUROC) for overall and individual disease detection using the AI-based FoP camera in comparison to the Zeiss Clarus 500 desktop camera with human grading.
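As a rough illustration of the AUROC measure, the sketch below computes it as the Mann-Whitney probability that a diseased eye receives a higher score than a normal eye, counting ties as one half. The source does not state whether AUROC was derived from continuous AI scores or from the binary disease/no-disease output; when the output is binary, this estimate reduces to the mean of sensitivity and specificity. The function name and the toy labels are hypothetical and are not study data.

```python
def auroc(labels, scores):
    """Mann-Whitney estimate of AUROC.

    labels: 1 for diseased, 0 for normal.
    scores: higher means more suspicious; positive/negative ties count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one diseased and one normal eye")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical): 10 diseased and 10 normal eyes, binary AI output.
# 9 of 10 diseased eyes are flagged (sensitivity 0.9);
# 8 of 10 normal eyes are cleared (specificity 0.8).
labels = [1] * 10 + [0] * 10
scores = [1] * 9 + [0] * 1 + [0] * 8 + [1] * 2

print(round(auroc(labels, scores), 3))  # equals (0.9 + 0.8) / 2 for binary output
```

With a continuous score per eye, the same function traces the full ranking rather than a single operating point, so the two ways of computing AUROC can give different values for the same classifier.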
Sample size estimation
To determine the appropriate sample size for evaluating the sensitivity and specificity of the MAI for detecting retinal disease compared with human graders, we performed a binomial confidence interval analysis based on the expected discordance rate between the AI and the human reference standard. Discordance was defined as the proportion of images for which the AI and human grader classifications differed.
We considered a range of plausible true discordance rates (10%, 20%, 30%, 40%, and 50%) and calculated the corresponding 95% confidence intervals (CIs) for various sample sizes (n = 50, 100, 150, 200, 250, 300). The calculations were based on the binomial distribution, using the formula:
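The expression itself is not reproduced in this text. A plausible reading, stated here as an assumption rather than as the authors' exact formula, is the normal-approximation (Wald) interval for a binomial proportion, where \hat{p} is the discordance rate and n the number of images. At n = 200 it gives half-widths of roughly 4% to 6% for discordance rates of 10% to 30%.

```latex
\[
\hat{p} \;\pm\; z_{1-\alpha/2}\,\sqrt{\frac{\hat{p}\,(1-\hat{p})}{n}},
\qquad z_{0.975} \approx 1.96
\]
```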
Based on these calculations, a sample size of 200 images was selected, as it provided a 95% CI width of approximately ±4–6% for true discordance rates between 10% and 30%. This level of precision was deemed sufficient to characterize the agreement between MAI and human graders. To ensure balanced representation, the 200 images were evenly distributed across four disease categories (a minimum of 50 images per category).
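The precision argument can be checked numerically. The sketch below tabulates 95% CI half-widths over the grid of discordance rates and sample sizes listed in the text, assuming the normal-approximation (Wald) interval for a binomial proportion. That choice of interval is an assumption (the source does not name the method), and the function name is illustrative.

```python
import math

Z_95 = 1.96  # two-sided 95% normal quantile (assumed; not stated in the source)


def wald_half_width(p, n, z=Z_95):
    """Half-width of the normal-approximation CI for a binomial proportion."""
    if not 0.0 <= p <= 1.0 or n <= 0:
        raise ValueError("p must lie in [0, 1] and n must be positive")
    return z * math.sqrt(p * (1.0 - p) / n)


# Grid taken from the text: plausible discordance rates and candidate sample sizes.
rates = [0.10, 0.20, 0.30, 0.40, 0.50]
sizes = [50, 100, 150, 200, 250, 300]

print("n    " + "  ".join(f"p={p:.2f}" for p in rates))
for n in sizes:
    row = "  ".join(f"{100 * wald_half_width(p, n):5.1f}%" for p in rates)
    print(f"{n:<4} {row}")
```

Under this assumption the n = 200 row gives about ±4.2% at a 10% discordance rate and about ±6.4% at 30%, in line with the ±4–6% precision quoted above.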
Statistical analysis
Sensitivity, specificity, and area under the receiver operating characteristic curve (AUROC) were calculated to assess the diagnostic performance of Medios AI for overall and individual disease detection (AMD, DR, and glaucoma). For disease-specific analyses, each pathology group was compared against normal eyes, which served as the reference standard for true negatives. All eyes included in the DR group had a confirmed diagnosis of diabetic retinopathy, and no eyes from non-diabetic individuals were included in this group.
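For readers who want to reproduce this kind of calculation, the sketch below derives sensitivity and specificity from eye-level counts and attaches a 95% confidence interval. The interval method used in the study is not stated, so the Wilson score interval is swapped in here as an assumption. The counts are also illustrative: 66 detected out of 78 DR eyes is one split that reproduces the reported 84.6% sensitivity, not a figure taken from the source.

```python
import math


def sensitivity(tp, fn):
    """Proportion of diseased eyes that the AI flags as diseased."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Proportion of normal eyes that the AI reports as normal."""
    return tn / (tn + fp)


def wilson_ci(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (assumed method)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need 0 <= successes <= n and n > 0")
    p = successes / n
    denom = 1.0 + z * z / n
    centre = (p + z * z / (2.0 * n)) / denom
    half = z * math.sqrt(p * (1.0 - p) / n + z * z / (4.0 * n * n)) / denom
    return centre - half, centre + half


# Hypothetical counts consistent with the reported DR sensitivity (66/78 = 84.6%).
tp, fn = 66, 12
low, high = wilson_ci(tp, tp + fn)
print(f"sensitivity = {100 * sensitivity(tp, fn):.1f}% "
      f"(95% CI: {100 * low:.0f} to {100 * high:.0f})")
```

An exact (Clopper-Pearson) interval would be slightly wider at the same counts, so small differences from the published bounds are expected.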
Intergrader agreement between the two fellowship-trained graders for categorical disease diagnosis was assessed using Cohen's kappa statistic, with equal weighting for all categories. In the event of disagreement between the two graders, adjudication was performed through mutual discussion. If consensus could not be reached, the more severe of the two grades was considered for analysis. This approach was applied to 7 eyes (1.9% of the dataset), in which the disagreement was restricted to severity level (e.g., mild vs moderate NPDR) rather than disease presence. Intra-grader agreement was similarly evaluated by comparing the same grader's diagnosis across FoP and Clarus 500 images. The agreement between Medios AI and human graders for continuous parameters such as cup-to-disc ratio was evaluated using intraclass correlation coefficients (ICC) and Bland-Altman plots.
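A minimal sketch of unweighted Cohen's kappa follows, to make the agreement statistic concrete. The study computed kappa with the irr package in R; this plain-Python version is an illustrative equivalent, and the grader labels below are invented toy data.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items given identical labels.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in set(freq_a) | set(freq_b)) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Toy data (hypothetical): 12 eyes, one disagreement between DR and AMD.
grader_1 = ["normal"] * 4 + ["DR"] * 3 + ["AMD"] * 2 + ["glaucoma"] * 3
grader_2 = ["normal"] * 4 + ["DR"] * 2 + ["AMD"] * 3 + ["glaucoma"] * 3

print(round(cohens_kappa(grader_1, grader_2), 3))
```

Because kappa corrects for chance agreement, a single disagreement in 12 eyes yields a value below the raw agreement of 11/12.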
Post-hoc power analysis was performed for the AMD cohort to assess whether the achieved sample size provided sufficient statistical power (defined as >0.80) to support conclusions regarding AMD detection accuracy. The kappa statistic was calculated using the irr package in R, and power was calculated using the pwr package in R, based on the observed sensitivity and sample size. All remaining statistical analyses were conducted using Stata 12.1 I/C (StataCorp, College Station, Texas), and a P value < 0.05 was considered statistically significant throughout.
This study was designed, conducted, and reported in accordance with the Guidelines on Clinical Research Evaluation of Artificial Intelligence in Ophthalmology,12 including transparent reporting of reference standards, performance metrics, and real-world clinical applicability.
Results
A total of 193 patients (371 eyes) were included in the final analysis. The final eye-level distribution was 140 normal eyes, 78 eyes with DR, 56 eyes with AMD, and 109 eyes with glaucoma. Unequal subgroup sizes reflect the prospective recruitment strategy, whereby enrollment continued until each disease category reached a minimum target of 50 eyes. The mean age of participants was 61.4 ± 10.9 years, and 56% were men.
For overall disease detection, defined as the detection of any of the three included retinal pathologies (Figure 2), the MAI achieved a sensitivity of 99.3% (95% CI: 96–100) and a specificity of 95.7% (95% CI: 92–98), with an area under the receiver operating characteristic curve (AUROC) of 0.99 (Table 1). The intergrader agreement between grader 1 and grader 2 for disease diagnosis was almost perfect, with a Cohen's kappa of 0.98 (p < 0.001). Disagreements were noted in only 7/371 eyes (1.9%), all of which involved differences in severity level (e.g., mild vs moderate NPDR or intermediate vs advanced AMD), rather than disagreement about disease presence or absence. Similarly, intra-grader reliability between diagnoses made from FoP images and Clarus 500 images was excellent, with a Cohen's kappa of 0.98 (p < 0.001) for both graders. Again, the discrepancies (n = 5) between the FoP and Clarus 500 images for both graders were due to differences in the grading of diabetic retinopathy severity, especially between mild and moderate NPDR, rather than misclassification of disease presence.

Figure 2. (A–C) Mild non-proliferative diabetic retinopathy (NPDR) as seen on the Clarus fundus image (A), Remidio Fundus on Phone (FoP) image (B), and the Medios-AI user interface (UI) detecting DR (C). (D–F) Proliferative diabetic retinopathy (PDR) with neovascularization of the disc (NVD) and elsewhere (NVE) visualized on the Clarus image (D), FoP image (E), and correctly identified by Medios-AI (F). (G–I) Drusen in a case of age-related macular degeneration (AMD) on the Clarus image (G), FoP image (H), and Medios-AI UI showing detection of AMD (I).
Table 1. Sensitivity and specificity of disease detection by the Medios multi-disease AI.
DR = Diabetic Retinopathy, AMD = Age-related Macular Degeneration.
In the detection of DR (Figure 2 A-F), the sensitivity was 84.6% (95% CI: 75–92) and specificity was 99% (95% CI: 97–100), with an AUROC of 0.92 (n = 78). For referable DR (n = 44; Figure 2 D-F), the sensitivity was higher at 95% (95% CI: 85–99), with specificity again at 99% (95% CI: 91–99). For AMD (Figure 2 G-I), the sensitivity was 88.9% (95% CI: 77–96), specificity was 97.5% (95% CI: 95–99), and the AUROC was 0.93 (n = 56). The post-hoc power analysis for the AMD cohort demonstrated a power of 0.87, indicating that the sample size was sufficient to support the observed sensitivity for AMD detection.
The highest performance was observed in glaucoma detection (Figure 3 A-C), where the sensitivity reached 98.2% (95% CI: 94–100) and specificity was 99% (95% CI: 97–100), yielding an AUROC of 0.99 (n = 109); AMD and DR followed in that order (Table 1). The mean (SD) and median (IQR) cup-to-disc ratios (CDRs) in glaucoma eyes were 0.754 (0.103) and 0.750 (0.125) for Grader 1, 0.752 (0.110) and 0.750 (0.125) for Grader 2, and 0.737 (0.096) and 0.730 (0.115) for MAI, respectively. In contrast, normal eyes showed substantially lower CDR values: 0.349 (0.104) and 0.300 (0.100) for Grader 1, 0.358 (0.116) and 0.300 (0.100) for Grader 2, and 0.419 (0.105) and 0.410 (0.150) for MAI. Differences in CDR between human graders and MAI ranged from −0.1 to +0.1 (Figure 4), and intergrader agreement between the two graders was excellent (ICC = 0.97).

Figure 3. (A–C) Glaucoma case correctly identified: Clarus fundus image (A), Remidio Fundus on Phone (FoP) image (B), and Medios-AI user interface (UI) showing glaucoma detection (C). (D–F) Case of high myopia with peripapillary atrophy shown on Clarus image (D), FoP image (E), and misclassified as AMD by Medios-AI (F). (G–I) Severe non-proliferative diabetic retinopathy (NPDR) with hard exudates seen on Clarus image (G), FoP image (H), and labeled as “AMD or DR” by Medios-AI (I).

Figure 4. Bland-Altman plot showing agreement in vertical cup-to-disc ratio (VCDR) between human graders and Medios-AI.
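The quantities behind a Bland-Altman plot can be sketched as follows: the mean difference (bias) between two methods and the 95% limits of agreement, taken as bias ± 1.96 standard deviations of the paired differences. The VCDR readings below are invented for illustration and are not study data, and the function name is hypothetical.

```python
import statistics


def bland_altman(method_a, method_b):
    """Return (bias, lower limit, upper limit) for paired measurements."""
    if len(method_a) != len(method_b) or len(method_a) < 2:
        raise ValueError("need at least two paired measurements")
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical VCDR readings for six eyes.
grader_vcdr = [0.30, 0.35, 0.40, 0.70, 0.75, 0.80]
ai_vcdr = [0.35, 0.40, 0.40, 0.70, 0.70, 0.75]

bias, lower, upper = bland_altman(ai_vcdr, grader_vcdr)
print(f"bias = {bias:+.3f}, limits of agreement = {lower:+.3f} to {upper:+.3f}")
```

In a plot, each eye's difference is drawn against the mean of the two readings, with horizontal lines at the bias and at the two limits.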
Most missed cases of DR were in the mild NPDR category, although two cases of severe NPDR were misclassified as AMD. A confusion matrix illustrating diagnostic discrepancies between human grading and AI output at the disease level is provided in Supplementary Table 1. A qualitative review of one case (Figure 3 G-I) revealed that the retinal appearance was dominated by large, confluent hard exudates with fewer microaneurysms and hemorrhages, which may have contributed to the misclassification as AMD by the AI system.
Discussion
This study evaluated the performance of a Multi-disease Medios-AI algorithm integrated with the Remidio FoP for the screening of AMD, DR, and glaucoma in a real-world clinical setting. The AI demonstrated excellent overall accuracy with a sensitivity of 99.3% and specificity of 95.7% for detecting any retinal disease. Among individual pathologies, it performed best in glaucoma detection (AUROC 0.99), followed by AMD (AUROC 0.93) and DR (AUROC 0.92). While most missed DR cases were mild NPDR, one case of severe NPDR was incorrectly labelled as AMD, but not missed entirely. The agreement on cup-to-disc ratio between AI and human graders ranged from −0.1 to +0.1, highlighting the reliability of the system.
A key advantage of the MAI system lies in its ability to detect multiple diseases from a single retinal image while functioning entirely offline. When mounted on the Remidio FoP, which provides high-resolution fundus images via a user-friendly, portable smartphone interface, the integration becomes seamless. This capability is particularly valuable in remote or resource-limited settings where internet access and trained ophthalmologists may not be available. A recent large-scale study by Dong et al. validated the RAIDS deep learning system, which detects ten retinal conditions from fundus images in a centralized, cloud-based setting.13 While both RAIDS and MAI enable multi-disease detection, a key distinction lies in MAI's ability to operate in real time, completely offline, and be embedded within the same smartphone that captures the image, streamlining the workflow and enhancing accessibility. Given that AMD, DR, and glaucoma together account for over 50% of preventable blindness worldwide, a reliable AI system that can screen for all three conditions without relying on internet infrastructure could transform community eye health screening and referral networks.
To further contextualize MAI's performance, Table 2 summarizes key features of different AI systems. Unlike single-disease models such as IDx-DR and EyeArt, which are optimized exclusively for DR detection, Medios AI balances performance across three disease pathways using a single image per eye. Despite this broader task, Medios AI achieved 95% sensitivity for referable DR, comparable to established single-disease systems.
Table 2. Comparison of commonly used AI models for the detection of various retinal diseases with the multi-disease Medios AI.
DR = Diabetic Retinopathy, AMD = Age-related Macular Degeneration, FoP = Fundus on Phone.
Compared to its performance as a standalone DR detection system, the multi-disease MAI exhibited lower sensitivity for DR in this study, although specificity was higher. Previous evaluations of Medios AI for DR alone have shown sensitivities ranging from 93% to 98.8% and specificities from 86.7% to 95.5%.4,14,15 In our study, the sensitivity was 84.6%, with a specificity of 99%. The drop in sensitivity may be attributed to the broader scope of the MAI, which now simultaneously processes multiple disease signatures, potentially introducing subtle trade-offs in disease-specific optimization. Nonetheless, the specificity remains high, and the real-world performance in detecting moderate-to-severe NPDR and PDR remains acceptable, although further refinement is needed for detecting early-stage DR.
In contrast, the MAI's performance for glaucoma detection remained robust and comparable to previously validated standalone models. The high sensitivity and specificity observed (98.2% and 99%, respectively) align closely with results reported by Shroff et al.6 and Upadhayay et al.,5 where the glaucoma-specific Medios-AI demonstrated AUROCs above 0.95. Importantly, the MAI maintained strong agreement in CDR estimation, with intergrader ICC of 0.97 and minimal variance compared to human grading. This consistency reinforces its clinical utility for identifying glaucomatous optic neuropathy, particularly in teleophthalmology settings where remote and accurate evaluation of the optic nerve head is critical.
The results for AMD detection also showed good alignment with prior studies involving AI-based AMD models.7,14 Sensitivity and specificity in our study were 88.9% and 97.5%, respectively, which is comparable to findings reported by previous studies.16 While there are fewer studies on standalone AI models for AMD compared to DR and glaucoma, those available confirm the utility of AI in detecting early AMD signs. The MAI offers the added advantage of integrating AMD screening into a broader framework, facilitating efficient population-level screening without sacrificing diagnostic performance. However, variability in fundus presentation and fewer ground-truth datasets for AMD may continue to pose challenges in improving sensitivity further.
The primary merits of this study include its prospective, real-world design; use of both human expert grading and machine interpretation; and application of rigorous diagnostic metrics such as AUROC and intergrader agreement. The use of both the Remidio FoP and Zeiss Clarus systems allowed direct benchmarking of AI outputs against a gold-standard imaging modality. However, the study is not without limitations. Sample sizes for individual disease groups, particularly AMD, were relatively small. Additionally, image quality from the smartphone-based system, while generally high, can vary based on operator skill and ambient conditions. The offline nature of MAI, while advantageous for scalability, also means real-time cloud-based updates and continuous learning are currently limited. We also acknowledge that image exclusion rates could not be computed, as the Medios AI-integrated FoP system filters ungradable images during acquisition and does not save them. While this mimics real-world use where the AI guides acquisition and screening, it limits our ability to report precise failure rates or compare quality-related exclusions across disease subgroups. Lastly, the single-center design may introduce selection bias; future studies involving multi-center datasets and diverse populations are warranted to validate and generalize these findings. In particular, this tertiary referral setting has a high disease prevalence, which likely leads to an inflated positive predictive value (PPV) compared to what would be observed in general population or primary care settings. As PPV is directly influenced by disease prevalence, real-world community screening studies are essential to accurately estimate the system's performance in lower-prevalence contexts.
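The dependence of PPV on prevalence follows directly from Bayes' rule and can be made concrete with a short sketch. The sensitivity and specificity below are the overall values reported in this study (99.3% and 95.7%); the prevalence values are hypothetical screening scenarios chosen for illustration, not measured figures.

```python
def ppv(sens, spec, prevalence):
    """Positive predictive value from sensitivity, specificity, and prevalence."""
    true_pos = sens * prevalence
    false_pos = (1.0 - spec) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


def npv(sens, spec, prevalence):
    """Negative predictive value from sensitivity, specificity, and prevalence."""
    true_neg = spec * (1.0 - prevalence)
    false_neg = (1.0 - sens) * prevalence
    return true_neg / (true_neg + false_neg)


SENS, SPEC = 0.993, 0.957  # overall values reported in this study

# Hypothetical prevalence scenarios, from community screening to referral clinic.
for prev in (0.05, 0.20, 0.50):
    print(f"prevalence {prev:.0%}: PPV = {ppv(SENS, SPEC, prev):.1%}, "
          f"NPV = {npv(SENS, SPEC, prev):.2%}")
```

At an assumed 5% prevalence, roughly half of positive results would be true positives even with these accuracy figures, whereas at 50% prevalence the PPV exceeds 95%. This is why predictive values from a referral setting do not carry over to lower-prevalence screening populations.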
Future research should focus on expanding the MAI's training datasets, particularly for underrepresented disease stages like mild NPDR and early AMD, to improve sensitivity. There is also scope to incorporate multimodal data, such as OCT or autofluorescence, in future versions of the AI to enhance diagnostic granularity. Additionally, cost-effectiveness studies and deployment trials in rural or low-resource areas will be valuable in validating the MAI's scalability and real-world impact. Integration with mobile health applications and screening workflows can further augment its utility for primary and secondary prevention programs.
In conclusion, the multi-disease Medios-AI, when integrated with a smartphone-based fundus imaging platform, demonstrates encouraging diagnostic performance for the screening of AMD, DR, and glaucoma. Its ability to function offline, accurately detect multiple diseases, and maintain high agreement with human graders makes it a promising tool for expanding access to retinal screening in underserved regions. Continued refinement, especially in DR detection, and further validation in diverse settings are warranted to fully realize its potential in combating avoidable blindness.
Supplemental Material
Supplemental material, sj-docx-1-ejo-10.1177_11206721261462321 for Smartphone-based offline AI for multi-disease retinal screening: Real-world accuracy by Aditya Kelkar, Jai Kelkar, Yash Garg, Harsh H. Jain and Sabyasachi Sengupta in European Journal of Ophthalmology
Footnotes
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Conflict of interest
The authors declare no conflicts of interest with this manuscript.
Sources of support
The Remidio Fundus on Phone (FoP) device integrated with the Medios-AI was provided by the manufacturer (Remidio Innovative Solutions Pvt Ltd, Bangalore, India) for study purposes. The company had no role in the study design, data collection, analysis, interpretation, or preparation of the manuscript.
Supplemental material
Supplemental material for this article is available online.
References
