Abstract
Objective
This systematic review aims to evaluate current digital twin (DT) applications in healthcare, explore their technological foundations, and propose a roadmap for scalable, patient-centered implementation.
Methods
Following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) 2020 guidelines, a systematic search was conducted across Medline, Scopus, Web of Science, and EBSCO up to May 2025. Eligible studies included peer-reviewed research on DT applications in clinical or healthcare settings involving human or patient-related data. Methodological quality was assessed using the Joanna Briggs Institute critical appraisal tool appropriate to each study design. The systematic review protocol was prospectively registered in the International Prospective Register of Systematic Reviews (PROSPERO; registration number: CRD420251120304).
Results
Twenty-six studies were included, most of them published between 2023 and 2025. DT applications spanned diagnostics, therapy optimization, physiological monitoring, and system-level modeling. Simulation-based designs dominated, often integrating artificial intelligence, internet of things, and machine learning. While several studies reported strong technical performance (e.g. up to 96.3% accuracy), real-world clinical integration was rare. Notable outcomes included better glycemic control, pain management, and disease progression prediction. Barriers included insufficient infrastructure detail, limited validation, and equity concerns. The roadmap highlights three enablers: privacy-preserving mechanisms, validation pipelines, and interoperability.
Conclusion
DTs offer transformative potential for predictive, personalized, and participatory healthcare. Realizing clinical impact requires bridging the translational gap and scaling personalization. This review outlines key strategies for interdisciplinary innovation and deployment of DTs in healthcare.
Introduction
The ongoing digital transformation of healthcare is revolutionizing diagnostic processes, therapeutic strategies, and patient monitoring by enabling real-time data integration and analysis.1,2
Among the emerging technologies driving this shift, digital twins (DTs) have gained significant attention for their potential to enhance personalization, prediction, and precision in medical care.
DT technology is an innovative concept that involves the creation of precise virtual replicas of physical systems, processes, or entities. 3 Originally developed in the aerospace and manufacturing sectors, DTs are designed to mirror the behavior and state of their real-world counterparts in real time. 4 This synchronization is enabled through continuous data streams collected from sensors, which are then processed using advanced computational models. 5 The primary applications of DTs are diverse. On the one hand, they can support decision-making, optimize processes, improve system performance, and predict maintenance needs before real-world deployment. On the other hand, they offer a framework for continuous monitoring and adaptation, enabling systems to evolve based on real-time data. 1
DTs integrate different technologies, including the internet of things (IoT), artificial intelligence (AI), machine learning (ML), and advanced data analytics, to create dynamic, data-driven models that reflect the ongoing state of their physical counterparts. 6 These models are continuously updated through sensor data and historical performance records, allowing for real-time analysis. 5 The combination of real-time data processing and predictive analytics enables DTs to optimize performance, reduce downtime, and enhance reliability across a wide range of industrial applications. 7
In healthcare, a DT is a dynamic virtual representation of a physical system, such as the human body, continuously updated with real-world data from sensors.8–11 When coupled with AI and ML, DTs become powerful tools for simulating patient-specific scenarios. They enable data-driven experimentation and the optimization of treatment strategies through simulation, without putting patients at risk. 12
In recent years, the adoption of DTs in healthcare has accelerated, driven by the need for more personalized and proactive approaches to complex medical conditions. DTs enable high-fidelity modeling of individual patients by integrating multimodal data sources. This facilitates not only personalized diagnostics and treatment planning, but also predictive modeling that supports early intervention. 12
Their applications span various domains, including chronic disease management,13–15 surgical planning,16–18 and rehabilitation.19,20 In these contexts, DTs allow for virtual testing of therapies and the optimization of recovery trajectories. Furthermore, they promote integrated clinical management by enhancing the efficiency with which information is shared among specialists. This fosters a coordinated, multidisciplinary approach to caring for patients with chronic conditions. The implementation of DTs thus paves the way for a healthcare model that is increasingly predictive, preventive, personalized, and participatory. 21
Despite their promise, the integration of DTs into clinical practice presents substantial challenges related to data integration and model fidelity. 6 Nevertheless, their transformative potential is undeniable: DTs pave the way toward an era of healthcare that is not only reactive but also proactively tailored and continuously adaptive.12,22,23 Several recent reviews have summarized the expanding landscape of DT applications in healthcare, including a broad scoping review by Katsoulakis et al. 24 and a meta-review by Ringeval et al. 25 Katsoulakis et al. 24 provided an extensive overview of DT use cases across clinical care, biomedical research, and health-system operations, but, as a scoping review, their work neither appraises the quality of primary studies nor distinguishes between conceptual proposals, technical prototypes, and applied patient-level implementations. Similarly, Ringeval et al. 25 synthesized existing narrative and systematic reviews, mapping application categories and implementation challenges, but they reviewed reviews rather than evaluating individual DT studies. In contrast, the present review examines only primary empirical studies and applies a structured quality assessment framework, enabling an appraisal of methodological robustness and of the maturity of reported applications, an aspect not addressed in prior reviews. The aim of this systematic review is therefore to assess current DT applications in healthcare and in personalized health management, and to identify future directions that could support their implementation in clinical contexts.
Materials and methods
Study design
This study was conducted as a systematic review following the guidelines outlined in the Cochrane Handbook for Systematic Reviews of Interventions. 26 The reporting of findings adheres to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) 2020 statement 27 (Table S1). The Population-Situation framework, represented in Table 1, was used to define the review scope and guide the search strategy. 28
Population-situation (PS) elements table.
Protocol registration
The systematic review protocol was prospectively registered in the International Prospective Register of Systematic Reviews under the registration number CRD420251120304. 29
Search strategy
A comprehensive search was performed in four major databases: Medline, Scopus, Web of Science, and EBSCO. The search included all records until May 2025 (Table S2). Keywords and controlled vocabulary (e.g. MeSH terms) were combined using Boolean operators (AND, OR) and tailored to each database's indexing system. The search terms included combinations of “Digital Twin,” “Virtual Twin,” “Healthcare,” “Predictive Analytics,” and their synonyms. No restrictions were applied regarding study design, publication date, or geographic origin.
Study selection
All retrieved records were imported into Rayyan® for systematic review management. 30 Two independent reviewers (PP, LG) conducted a two-step screening process: title and abstract review followed by full-text screening. Discrepancies during screening were resolved through discussion or consultation with a third reviewer (VR). Studies were included if they met all of the following criteria: a) they involved the application of DT technology within healthcare settings; b) they were original peer-reviewed research articles, including qualitative, quantitative, or mixed-methods studies; and c) they involved human subjects or incorporated patient-related data. Studies were excluded if: a) they focused on non-healthcare domains such as manufacturing or aerospace; b) they consisted of reviews, editorials, opinion papers, or conference abstracts without original data; or c) they addressed purely theoretical aspects of DT technology without any practical or clinical implementation.
Data extraction
Data extraction was carried out independently by two reviewers (AG, VR) using a standardized and piloted extraction form to ensure consistency and accuracy. For each eligible study, key information was systematically recorded, including: a) authors and year of publication; b) country of origin and the specific healthcare setting; c) study's objectives and methodological design; d) clinical or healthcare domain under investigation; e) detailed description of the DT model, including data sources, system architecture, and integrated technologies; and f) reported outcomes, categorized as clinical, operational, or technical. This process aimed to capture both the contextual relevance and the functional implementation of DT technologies across healthcare applications.
Risk of bias assessment
To assess methodological rigor, each study was evaluated using critical appraisal tools appropriate to its design, following the Joanna Briggs Institute (JBI) framework.31,32 Specifically:
Analytical Cross-Sectional Studies were assessed using a structured checklist that evaluates criteria such as sampling strategy, measurement validity, confounding control, and reliability (Figure 1).
Quasi-Experimental Studies were appraised for elements including clarity of cause–effect relationships, the presence of control groups or comparable participants, and the appropriateness of statistical analysis (Figure 2).
Text and Opinion Papers were evaluated based on the credibility of the source, logical coherence of arguments, reference to relevant literature, and the relevance of perspectives to the target population (Figure 3).

Checklist results for textual evidence and opinion papers. It illustrates the appraisal of expert opinion and conceptual papers based on JBI textual evidence criteria.

Checklist results for analytical cross-sectional studies. It summarizes the methodological appraisal of cross-sectional studies using the JBI checklist.

JBI checklist results for quasi-experimental studies. It presents the quality assessment of quasi-experimental studies.
Each checklist consisted of 6 to 10 items, and individual criteria were assessed using the options “Yes,” “No,” “Unclear,” or “Not Applicable.” This structure allowed for a nuanced evaluation of each study's alignment between its theoretical framework, methodological rigor, data collection, analysis, and interpretation. The appraisal also considered potential sources of bias, including selection, measurement, and reporting bias, particularly in empirical and simulation-based designs.
The final quality rating of each study was based on the number of criteria marked as “Yes.” All studies scored six or higher and were therefore retained for data extraction and synthesis.
To ensure reliability and minimize reviewer bias, all appraisals were conducted independently by two reviewers (LG and PP). Discrepancies were resolved through discussion, and when consensus could not be reached, a third reviewer (AG) was consulted to adjudicate.
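The scoring rule described above can be sketched in a few lines. This is a minimal illustration, assuming that the score is the count of “Yes” answers and that the percentage is taken over all checklist items (the handling of “Not Applicable” items is not stated in the text). The function name `jbi_score` and the example responses are hypothetical and are not taken from any included study.

```python
from typing import Sequence, Tuple

# Response options named in the text for each checklist item.
VALID_RESPONSES = {"Yes", "No", "Unclear", "Not Applicable"}


def jbi_score(responses: Sequence[str]) -> Tuple[int, float]:
    """Return (number of 'Yes' answers, percentage of items answered 'Yes').

    Assumption: the denominator is the full checklist length, so
    'Not Applicable' items are not removed before computing the percentage.
    """
    if not responses:
        raise ValueError("checklist has no items")
    unknown = set(responses) - VALID_RESPONSES
    if unknown:
        raise ValueError(f"unexpected response(s): {sorted(unknown)}")
    yes_count = sum(1 for r in responses if r == "Yes")
    return yes_count, round(100 * yes_count / len(responses), 1)


# Hypothetical nine-item checklist in which two items are not met.
example = ["Yes"] * 7 + ["No", "Unclear"]
score, percent = jbi_score(example)
print(f"{score}/{len(example)} items met ({percent}%)")
```

Under these assumptions, a study meeting seven of nine items scores 77.8%.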
Results
Search results
The study selection process is illustrated in the PRISMA flow diagram (Figure 4). The initial database search identified 1420 records. After removal of duplicates (520 duplicates, 36.62%), a total of 890 unique records remained for title and abstract screening. Following the application of predefined inclusion and exclusion criteria, which focused on clinical relevance, incorporation of DT methodologies, and sufficient methodological reporting, 840 records were excluded during the screening phase. Reasons for exclusion included lack of focus on clinical application (e.g. purely engineering or systems design papers). Subsequently, 50 full-text articles were assessed for eligibility, of which 32 were excluded, most commonly due to missing clinical context. The database search therefore contributed 18 studies. In addition, eight supplementary studies were identified through manual searches of reference lists. In total, 26 studies were included in the final review.

PRISMA flow diagram of study selection process. PRISMA: Preferred Reporting Items for Systematic Reviews and Meta-Analyses.
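The screening counts reported in the search results can be tallied with a short script. The sketch below is only a bookkeeping aid: it starts from the 890 records screened by title and abstract, uses the counts given in the text, and the function and key names are illustrative.

```python
def prisma_flow(screened: int,
                excluded_at_screening: int,
                excluded_at_full_text: int,
                from_other_sources: int) -> dict:
    """Tally the stages of a PRISMA-style selection process."""
    full_text_assessed = screened - excluded_at_screening
    included_from_databases = full_text_assessed - excluded_at_full_text
    return {
        "full_text_assessed": full_text_assessed,
        "included_from_databases": included_from_databases,
        "included_total": included_from_databases + from_other_sources,
    }


# Counts as reported in the text.
flow = prisma_flow(screened=890,
                   excluded_at_screening=840,
                   excluded_at_full_text=32,
                   from_other_sources=8)
for stage, count in flow.items():
    print(f"{stage}: {count}")

# Share of duplicates among the records initially identified.
duplicate_share = round(100 * 520 / 1420, 2)
print(f"duplicates: {duplicate_share}%")
```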
Study characteristics
A total of 26 studies were included in this systematic review, published between 2019 and 2025, reflecting a marked increase in DT research in healthcare, especially in the last three years (2023–2025, accounting for over half the studies). This temporal clustering suggests growing momentum in both the conceptualization and implementation of DT technologies in clinical and preclinical settings.
In terms of study design, the largest group (11/26, 42.3%) employed simulation-based modeling approaches, often incorporating AI, ML, or mechanistic modeling to simulate disease trajectories or treatment responses. Conceptual or prototype development studies accounted for 6/26 (23.1%) and included software framework design, virtual twin creation, or early-stage feasibility assessments. Retrospective or observational studies using real patient data accounted for 4/26 (15.4%), while randomized controlled trials (RCTs) and case reports made up 3/26 (11.5%) and 2/26 (7.7%), respectively. Sample sizes varied widely, from single-case interventions to virtual populations exceeding 5000 simulated patients, making generalizability and clinical relevance uneven across studies. Human populations studied were predominantly adults or elderly patients, often with chronic or complex conditions such as type 2 diabetes, multiple sclerosis, cardiovascular disease, cancer, or post-operative complications. Several studies also used entirely virtual patients, particularly in pharmacological modeling or surgical planning. DT applications targeted a wide range of clinical objectives, including individualized therapy planning, physiological monitoring, predictive modeling, drug-response simulation, and radiotherapy optimization. While some interventions featured real-time feedback via IoT-enabled devices or mobile apps, many studies remained proof-of-concept with limited or no direct patient interaction. Reported outcomes were equally heterogeneous. Several studies focused exclusively on performance metrics (e.g. accuracy, F1-score, recall), especially in simulation or AI-based diagnostics. Others reported clinical outcomes such as HbA1c reduction, tumor shrinkage, pain levels, or brain atrophy onset, albeit often within small or simulated cohorts. Full details of study characteristics are presented in Table 2.
Figure 5 presents the distribution of included studies by macro area and study design, visually summarizing the diversity and methodological breadth of the evidence base.
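The design proportions reported in the study characteristics can be reproduced directly from the counts. The sketch below uses only the counts stated in the text; the dictionary keys are shortened labels chosen for illustration.

```python
# Study counts per design, as reported in the text.
design_counts = {
    "simulation-based modeling": 11,
    "conceptual or prototype development": 6,
    "retrospective or observational": 4,
    "randomized controlled trial": 3,
    "case report": 2,
}

total = sum(design_counts.values())

# Percentage share of each design, rounded to one decimal place.
shares = {design: round(100 * n / total, 1)
          for design, n in design_counts.items()}

for design, n in design_counts.items():
    print(f"{design:<38}{n:>3}/{total}  {shares[design]:>5.1f}%")
```

The five categories sum to the 26 included studies, and no single design accounts for a majority.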
Summary characteristics of included studies.
DT: digital twin; IoT: internet of things; PCOS: polycystic ovary syndrome.
Risk of bias results
For studies classified under textual evidence and opinion (n = 10) and under analytical cross-sectional studies (n = 3), JBI scores were all 6 out of 6 and 8 out of 8 respectively, reflecting high quality of the results from these studies. In quasi-experimental studies (n = 13), scores ranged from 7 to 9 out of 9, indicating strong methodological rigor, particularly in the description of interventions, measurement reliability, and outcome assessment. When analyzed by application domain (Table S3), the mean quality scores were consistently high across all areas, ranging from 77.8% to 100%. The highest scores were observed in cardiology, oncology, and chronic disease monitoring, where studies commonly employed simulation or prototype validation designs. Slightly lower, but still robust, scores were reported in neurology, endocrinology, and surgical applications.
Clinical applications of digital twin technology
The 26 included studies demonstrate a broad range of clinical applications for DT technologies (Figure 6). When examined collectively, four overarching areas of application emerged: diagnostic accuracy, treatment personalization, continuous patient monitoring, and system-level planning. The strength of evidence varied notably across these domains, with most contributions remaining early-stage or simulation-based.
Diagnostic accuracy
Nine of the included reports used DTs to support diagnostic tasks,34,35,38,43–45,55,57 generally by integrating multimodal physiological or imaging data into individualized computational models. These DTs were used as augmentative tools that improved signal interpretation, early risk stratification, or anomaly detection rather than replacing established diagnostic workflows. A recurring pattern was that diagnostic gains were strongest in models incorporating high-dimensional and biologically meaningful data inputs. Conversely, DTs built on single-source or low-resolution data produced more variable improvements, and several prototypes lacked external validation, underscoring the early stage of clinical readiness in this area.
Treatment personalization
Treatment personalization represented the most clinically oriented domain of DT application and showed some of the most promising signals for real-world impact.40,41,48,49,51–54,56,57 A variety of models leveraged individualized physiological, pharmacokinetic, or behavioral data to simulate treatment responses and identify optimal therapeutic strategies. Across contexts, DT-enabled personalization generally outperformed standard protocols in simulated settings. Real-world studies were less common, but when conducted, they consistently showed clinically meaningful improvements. Notably, DT-driven nutritional interventions and insulin-dosing support improved metabolic outcomes in patients with type 2 diabetes, and virtual drug-response modeling reduced variability in pain control among simulated oncology populations. A cross-study pattern was that DTs incorporating real-time or longitudinal data updates produced the most substantial performance gains, highlighting the importance of dynamic adaptation for personalized therapy. Nonetheless, most models remained confined to simulation or retrospective testing, and only a minority were evaluated through controlled clinical trials. Future studies should focus on testing the efficacy of DT interventions in RCTs.
Continuous patient monitoring
Monitoring-focused studies represented a smaller but distinct application area.37,39,41,46,50 These studies deployed IoT devices, wearables, or integrated imaging systems to generate continuously updated representations of physiological status. Across these works, DTs served primarily as real-time data aggregators, linking incoming patient signals to digital models capable of detecting deviations, estimating risk, or triggering alerts. A common pattern was the absence of detailed reporting on data exchange infrastructure, which limits interpretability, reproducibility, and clinical scalability. For this reason, although monitoring DTs demonstrated conceptual feasibility, they remain largely exploratory and technically constrained.
System-level planning
A smaller group of studies extended DT use to health-system operations, modeling services, environmental risks, or clinical workflows.33,42,47,48 These applications were largely oriented toward scenario testing, examining, for example, how respiratory infections might propagate in indoor settings, how vaccination center layouts influence throughput, or how telesurgery systems perform under fluctuating network conditions. Evaluation in this area relied on system-level and logistical indicators rather than clinical outcomes, highlighting their focus on operational insight rather than patient care. Although these models provided a safe environment for exploring alternative strategies without disrupting real services, most remained conceptual and were not validated against real-world data, underscoring that system-level DTs are still in a relatively early stage of development within healthcare.
Evaluation metrics and data infrastructure
Among the 26 included studies, 11 (42.3%) reported at least one quantifiable performance metric for evaluating their DT systems. Accuracy values ranged from 84.25% to 98.9%, with precision and F1 scores similarly high in a subset of studies. The most comprehensive metrics were reported by Zhang et al., 30 who achieved 96.3% accuracy, 95.4% precision, and a 96.4% F1 score using an XGBoost model for fetal distress prediction. Other studies, including those by Yuan et al., 44 Qu et al., 40 and Thamotharan et al., 51 reported balanced metric sets across accuracy, recall, and F1 score. However, many studies lacked external validation, and a number relied on internal simulations without real-world deployment, limiting the generalizability of reported results. Only six studies (23%) provided explicit descriptions of the communication devices or data infrastructure linking the physical and digital environments. Some, such as Joshi et al. 41 and Shamanna et al., 43 integrated multiple real-time data sources (e.g. CGMs, wearable trackers, mobile apps), enabling dynamic feedback loops. Most studies, however, referred only vaguely to “IoT” systems or cloud platforms without specifying hardware, protocols, or data flow mechanisms. Furthermore, critical infrastructure aspects such as latency, synchronization, and calibration were largely unaddressed, underscoring the experimental nature of many current DT implementations in healthcare. Performance metrics reported across studies are summarized in Figure 7, which displays a radar chart visualizing the accuracy, precision, recall, and F1 scores aggregated from key evaluations.
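For readers less familiar with the metrics compared above, the sketch below shows how accuracy, precision, recall, and F1 score are derived from a binary confusion matrix. The definitions are the standard ones; the counts are hypothetical and do not come from any included study, and the function name is illustrative.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute standard binary classification metrics from confusion-matrix counts.

    tp/fp/fn/tn are true positives, false positives, false negatives,
    and true negatives.
    """
    total = tp + fp + fn + tn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts, for illustration only.
metrics = classification_metrics(tp=90, fp=10, fn=5, tn=95)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because accuracy depends on class balance, a high accuracy reported without precision and recall, or without external validation, says little about performance in a different patient population.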

Distribution of included studies by macro area and study design. The horizontal stacked bar chart illustrates the number of studies across major healthcare domains (macro areas).

Mapping of intervention types to clinical outcomes across included studies. Bubble sizes represent study sample sizes, and colors indicate the direction of the reported effect (green for positive, orange for unclear).
Digital twin interventions in personalized health management
Five studies (Shamanna et al., 43 Thamotharan et al., 51 Joshi et al., 41 Bahrami et al., 37 and Susilo et al. 46 ) analyzed DT applications in personalized health management, encompassing both real-world clinical interventions and individualized virtual modeling. Three studies involved direct application to patient care. A single-patient case report by Shamanna et al. 43 evaluated a DT-guided dietary and lifestyle intervention for polycystic ovary syndrome using wearable sensors and AI-driven personalized recommendations. The authors reported significant improvements in metabolic health and reproductive markers over a 12-month period. A short-term observational study by Thamotharan et al. 51 involving 15 elderly patients with type 2 diabetes demonstrated improved glycemic control through DT-guided insulin dosing; time-in-range increased to 86–97%, with concurrent reductions in hypo- and hyperglycemic episodes. A large-scale RCT by Joshi et al. 41 in 319 adults with type 2 diabetes compared standard care to a DT-enabled lifestyle intervention. The intervention group achieved significant reductions in HbA1c (−2.9% vs. −0.3%), weight (−7.4 kg), liver fat, and insulin resistance (HOMA2-IR), with 72.7% achieving diabetes remission (p < 0.001 for all outcomes). Two additional simulation-based studies37,46 used virtual patient populations to model individualized treatment responses. Bahrami et al. 37 simulated transdermal fentanyl therapy in 3000 virtual cancer patients, showing improved pain control and reduced variability in plasma drug levels by tailoring dosing to physiological features. Susilo et al. 46 applied mechanistic quantitative systems pharmacology modeling to generate multiple DTs per patient (25 each) for non-Hodgkin's lymphoma, enabling prediction of tumor response and identification of predictive biomarkers. Figure 8 summarizes key outcome metrics across these five studies, illustrating the breadth of DT applications.
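Time-in-range, one of the glycemic outcomes reported above, is the share of glucose readings that fall within a target interval. The sketch below is a minimal illustration that assumes the commonly used 70–180 mg/dL target range; the review does not state which thresholds the cited study applied, and the readings shown are hypothetical.

```python
from typing import Sequence


def time_in_range(readings_mg_dl: Sequence[float],
                  low: float = 70.0,
                  high: float = 180.0) -> float:
    """Return the percentage of glucose readings within [low, high].

    Assumption: readings are evenly spaced in time, so the share of
    readings approximates the share of time spent in range.
    """
    if not readings_mg_dl:
        raise ValueError("no glucose readings provided")
    in_range = sum(1 for g in readings_mg_dl if low <= g <= high)
    return 100 * in_range / len(readings_mg_dl)


# Hypothetical continuous glucose monitor readings in mg/dL.
readings = [65, 90, 110, 150, 175, 185, 120, 100, 95, 140]
tir = time_in_range(readings)
print(f"time in range: {tir:.1f}%")
```

In this hypothetical series, eight of the ten readings fall within the assumed range.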

Radar chart summarizing key performance metrics reported in DT studies, including accuracy, precision, recall, and F1 score. Values represent aggregated results from the subset of studies providing comprehensive evaluation data, illustrating the overall high performance of DT models. DT: digital twin.
Discussion
As the healthcare sector continues its digital transformation, DTs are emerging as a promising yet still maturing innovation. 3 This systematic review provides a synthesis of current DT applications, highlighting their growing presence across different clinical settings. While the reviewed studies demonstrated considerable technological advancement and early clinical promise, 59 they also reveal persistent gaps that must be addressed to enable meaningful and scalable adoption. In this discussion, we focus on two central themes derived from the evidence base: (1) the translational gap between simulation-based innovation and real-world clinical integration, and (2) the challenge of achieving personalized care at scale without compromising equity or system performance. These themes frame our analysis of the current landscape and shape recommendations for research, clinical practice, and policy. Compared with previous reviews, our findings reveal several significant differences. Katsoulakis et al. 24 offer a comprehensive overview of DT initiatives in healthcare, mapping eight broad application categories and including studies ranging from conceptual frameworks to industrial prototypes. In contrast, our synthesis is limited to patient-focused DT applications and shows that, within this subset, the evidence falls into four reproducible functional categories: diagnostics, treatment personalization, continuous monitoring, and system-level planning. Ringeval et al. 25 conducted a meta-review of reviews and identified three overarching application areas: personalized medicine, operational efficiency, and research. Our results partially align with these categories, but the distribution of evidence is different: treatment personalization dominates the empirical literature, whereas research-oriented or operational DTs, which were prominent in the review of conceptual work by Ringeval et al., are underrepresented among primary studies with patient-level data.
Furthermore, while previous reviews summarize conceptual opportunities and challenges, neither assesses the methodological quality of empirical DT studies. In contrast, our review shows that current DT applications mostly rely on small samples, retrospective designs, and limited validation. This indicates that real-world clinical deployment of DTs remains premature.
Bridging the translational gap: from simulation to real-world clinical integration
This systematic review highlights a pronounced translational gap between the promise of DT technologies and their current implementation in clinical environments. While over 40% of the reviewed studies reported high technical performance in simulation-based settings, only a few demonstrated integration into routine clinical workflows or patient-facing applications.41,51
Most models remain in the proof-of-concept or early prototype phase, reflecting ongoing challenges in transitioning DTs from controlled environments to dynamic, data-rich clinical settings. Critical limitations include the lack of real-time infrastructure, inconsistent reporting of IoT or communication devices, and sparse attention to key operational metrics.60,61
Another barrier to translation is the absence of standardized validation protocols, both for technical reproducibility and clinical safety. Without harmonized frameworks for validation there is a risk of overpromising clinical impact without adequate safeguards. Moreover, ethical concerns related to data provenance, algorithmic opacity, and decision accountability remain underexplored in many studies.62,63
To address this translational chasm, future work must prioritize interdisciplinary implementation science, where clinical experts, engineers, and regulatory stakeholders co-develop pilot programs. Particular attention should be given to chronic disease management and rehabilitation settings, where continuous monitoring and personalization are already common, offering a natural entry point for DT-enabled care models. These early implementations can serve as clinical sandboxes, generating evidence for broader regulatory acceptance and healthcare system integration.
Personalization at scale: balancing individualized care with system-level optimization
One of the most compelling contributions of DT technology is its capacity to deliver personalized, predictive, and participatory healthcare.3,4 Several studies in this review demonstrated clinically meaningful improvements in the outcomes evaluated when DTs were applied to tailor interventions to individual patients. Specifically, intervention and longitudinal case studies reported improvements in glycemic control and remission of chronic conditions. 64
Simultaneously, DTs have shown promise in macro-level planning and population health modeling, such as optimizing surgical workflows, managing infectious disease risks, or designing vaccination infrastructure.47,65,66 These studies emphasize the dual utility of DTs, not only as individualized care tools but also as systems-level decision aids capable of informing policy.
This duality, however, presents a challenge: how to scale personalization without sacrificing equity, generalizability, or system-level efficiency. 60 Adaptive algorithms must strike a balance between responsiveness to individual variability and robustness across diverse patient populations. Additionally, real-time personalization risks inadvertently reinforcing health disparities if underlying data reflect existing biases or fail to account for underrepresented populations. 67
To ensure both effectiveness and fairness, future DT frameworks should prioritize transparent model governance, continuous performance auditing, and equity-aware training data strategies. Developing modular DT ecosystems, composed of interoperable micro-models for patients, clinicians, and health infrastructure, may allow systems to flexibly integrate personalized inputs without compromising operational cohesion.
Toward a convergent future: a roadmap for integrated digital twin ecosystems
To overcome the translational and personalization challenges in DT innovation, a comprehensive roadmap must bridge the technological, clinical, and ethical dimensions of healthcare. Future DT systems should operate across multiple levels, ranging from organ-level simulations and personalized treatment plans to hospital-level coordination and system-wide policy optimization.
Based on the included studies, we developed a conceptual framework for DT implementation in healthcare (Figure 9). This framework consolidates the recurring themes identified during analysis into a coherent structure that distinguishes between (1) the prerequisites required to operationalize a DT system (Inputs), (2) the DT core itself, and (3) the resulting impact and application domains (Outputs). These elements are organized into “Key Requirements” (Inputs) and “Impact & Application” (Outputs).

Summary of key outcome metrics across five DT studies in personalized health management. Clinical interventions41,43,51 reported improvements in metabolic health outcomes such as weight loss, insulin resistance, glycemic control (time in range, TIR), and diabetes remission rates. Simulation-based studies37,46 highlighted advances in individualized treatment optimization through virtual patient simulation. DT: digital twin; TIR: time in range.

A central circle labeled “Digital Twin” connects to six surrounding blocks via dotted arrows. On the left, four colored blocks represent foundational inputs: Privacy preserving, Validation pipelines, Interoperability, Infrastructure and Integration. On the right, two corresponding blocks illustrate outputs: Healthcare applications and Personalization of care. The layout visually links the enabling factors to their impacts within a DT healthcare ecosystem. DT: digital twin.
The framework identifies four Inputs that represent the foundational conditions repeatedly emphasized across the included studies. The first is privacy-preserving mechanisms, which ensure that sensitive patient data remain protected during DT development and deployment. Several studies emphasized that such approaches are essential for enabling continuous learning across institutions without requiring centralized data pooling. The second requirement concerns validation pipelines, reflecting the need for rigorous procedures that integrate simulation-based benchmarks with principles derived from clinical trial methodology. Ensuring model reliability, reproducibility, and safety was a consistent concern across empirical work. The third requirement is interoperability, referring to the need for seamless synchronization between patient-facing systems, provider tools, and healthcare infrastructures. Many studies highlighted limited interoperability as a significant barrier to scaling DT implementations. Finally, infrastructure and integration is positioned as a prerequisite, as robust data systems, reliable connectivity, and real-time integration capabilities form essential technical foundations for DT operation.
Together, these four requirements enable the functioning of the DT core system. Once established, this system leads to a range of impacts and applications identified in the reviewed studies. These include the use of DTs in diagnostics, monitoring, surgical guidance, and chronic disease management. The framework also highlights healthcare applications and personalization of care as Outputs of DTs, whereby individualized, data-driven modeling enhances treatment planning, predicts health trajectories, and supports precision medicine.
Collectively, these six pillars provide an evidence-informed and operational roadmap to guide the safe, equitable, and clinically meaningful implementation of DTs in healthcare. By investing in scalable yet individualized architectures, the DT paradigm can evolve from fragmented innovation into a unified platform for next-generation healthcare that is predictive, ethical, adaptable, and deeply patient-centered.
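The privacy-preserving requirement, learning across institutions without centralized data pooling, can be made concrete with a small sketch. The included studies are not quoted here as using any particular algorithm; federated averaging is used below as one common way to realize the idea, and it is an assumption of this example. Each hypothetical site fits a one-variable linear model on its own synthetic data and shares only model parameters, which a coordinator averages in proportion to site size. All function names, the learning rate, the number of rounds, and the data are illustrative.

```python
def local_update(weights, data, lr=0.1, epochs=20):
    """Run gradient descent for y ~ w*x + b on one site's private data.

    Only the updated (w, b) pair leaves the site; the raw data never does.
    """
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)


def federated_average(site_weights, site_sizes):
    """Average site parameters, weighted by each site's sample count."""
    total = sum(site_sizes)
    w = sum(wt[0] * n for wt, n in zip(site_weights, site_sizes)) / total
    b = sum(wt[1] * n for wt, n in zip(site_weights, site_sizes)) / total
    return (w, b)


def train(sites, rounds=50):
    """Alternate local training and parameter averaging for a fixed number of rounds."""
    global_weights = (0.0, 0.0)
    for _ in range(rounds):
        updates = [local_update(global_weights, data) for data in sites]
        global_weights = federated_average(updates, [len(d) for d in sites])
    return global_weights


# Synthetic, noise-free data generated from y = 2x + 1, split across two sites.
site_a = [(0.0, 1.0), (1.0, 3.0)]
site_b = [(2.0, 5.0), (3.0, 7.0), (1.5, 4.0)]
w, b = train([site_a, site_b])
print(round(w, 3), round(b, 3))
```

Parameter sharing alone does not guarantee privacy; a real deployment would likely need additional safeguards such as secure aggregation or differential privacy, which this sketch omits.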
Implications for research, practice, and policy
The findings of this review have implications for multiple stakeholders in healthcare innovation. For researchers, the diversity of applications and methodologies highlights the need for cross-disciplinary frameworks that integrate clinical objectives with engineering standards. Future studies should explicitly report on data flows, technical constraints, and model generalizability to facilitate reproducibility and meta-analytic synthesis.
For clinicians and healthcare providers, DTs offer a novel means to enhance diagnostic accuracy, treatment precision, prevention, and longitudinal monitoring. However, adoption requires adequate training, workflow integration, and decision support mechanisms. This underscores the importance of developing clinician-centered design principles and involving end-users early in development pipelines.
At the policy level, DT integration into healthcare demands regulatory innovation. Standard-setting bodies must address the classification of DTs, whether as software-as-a-medical-device,68 clinical decision support tools, or infrastructure components. Transparent regulatory pathways will be essential to incentivize investment while safeguarding patient welfare and ensuring ethical standards. Furthermore, robust data governance frameworks and interoperability standards will be critical to foster trust and facilitate widespread adoption. Moreover, equitable access to DT technologies, particularly for low-resource settings, must be prioritized to avoid digital exclusion in personalized healthcare and to promote global health equity.
Limitations
Several limitations emerged during the synthesis of available evidence. First, the heterogeneity of study designs, patient populations, endpoints, and evaluation frameworks, coupled with the lack of standardized or externally validated methods, makes quantitative comparison challenging and constrains the comparability of findings. Most studies employed simulation-based approaches, often lacking external validation or longitudinal follow-up, which limits clinical generalizability. Second, descriptions of technical infrastructure were often insufficient, with vague references to “IoT” or “cloud platforms” without specifications of hardware, latency, data synchronization, or user interfaces. These omissions hinder replication and real-world deployment. Third, many studies used synthetic or retrospective datasets without accounting for demographic diversity, raising concerns about bias, representativeness, and model fairness. Most of the studies also originated from a limited number of countries, suggesting potential geographic skew in the current literature. Finally, ethical considerations, such as informed consent, algorithmic transparency, and liability in DT-assisted decisions, were scarcely addressed, despite being critical for clinical adoption and public trust.
Conclusion
DTs represent a rapidly evolving paradigm with broad potential applications across healthcare, from personalized treatment planning and clinical decision support to system-level optimization and medical device development. Current progress demonstrates growing technical maturity in data integration, modeling fidelity, and real-time simulation, yet challenges remain in achieving robust validation, interoperability, and scalability. Methodologically, trends point toward increasing use of multimodal data fusion, hybrid modeling approaches, and privacy-preserving architectures that enable safe and adaptive learning from clinical environments.
The roadmap proposed in this review emphasizes these enablers as key to advancing DTs from research prototypes to practical, patient-centered tools. By focusing on technical performance and methodological rigor, rather than immediate clinical outcomes, the field can establish a solid foundation for future translational progress. Ultimately, these developments may facilitate a healthcare ecosystem that is more predictive, preventive, personalized, and participatory.
Supplemental Material
Supplemental material (sj-docx-1-dhj-10.1177_20552076261415934, sj-docx-2-dhj-10.1177_20552076261415934, and sj-docx-3-dhj-10.1177_20552076261415934) for Digital twins in healthcare: A systematic review of current applications, frameworks, and future directions by Valeria Calcaterra, Luca Guardamagna, Alessandro Gatti, Virginia Rossi, Pamela Patanè, Luca Marin, Matteo Vandoni and Gianvincenzo Zuccotti in DIGITAL HEALTH
Footnotes
Author contributions
Conceptualization was done by VC, LM, MV, and GZ; independent collection of the contributions was done by LG, AG, VR, and PP; writing (original draft preparation) was done by LG, AG, VR, and PP; writing (review and editing) was done by VC, LM, MV, and GZ; supervision was done by VC, LM, MV, and GZ. All authors have read and agreed to the published version of the manuscript.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The review was conducted in the context of the PODiaCar Project (101128946-PODiaCar-EU4H-2022-PJ-3), co-funded by the European Union.
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data availability
Data sharing is not applicable to this article as no new data were created or analyzed in this study.
AI tools disclosure
The English language of the manuscript was refined using ChatGPT-5 (OpenAI); all AI suggestions were carefully reviewed and approved by the authors. No generative AI tools were involved in the study's conception, analysis, or writing.
Supplemental material
Supplemental material for this article is available online.
References
