Abstract
Introduction
Training in congenital cardiac surgery varies widely across programs due to its technical complexity and the unique challenges of the specialty, including limited operative volume, significant anatomical variability, and long learning curves. These factors make it difficult to provide consistent operative exposure for trainees. At the same time, trainees are expected to develop competence across a broad range of complex procedures, often with relatively low case numbers and close scrutiny of outcomes. 1 In a national survey of cardiothoracic trainees in the United Kingdom and Ireland, over two-thirds of trainees reported no dedicated placement in congenital cardiac surgery, and more than a quarter trained in regions without access to a congenital cardiac center. Although early interest in the subspecialty was common, limited operative exposure, poorly structured training pathways, and uncertainty regarding progression were frequently cited as barriers to sustained commitment. 2 Similar challenges persist at the fellowship level, where there remains substantial variability in both training duration and primary operator case volume, with many trainees completing fewer than 10 neonatal cases as primary surgeon. 3
As operative exposure becomes increasingly limited, the need for robust, objective methods to assess technical development and competence has become more pressing. Traditional metrics of training progression, including operative case numbers, duration of residency, and subjective supervisor judgment, do not offer a comprehensive insight into a trainee's technical performance. The shift towards competency-based training, therefore, necessitates assessment frameworks that are reproducible, transparent, and independent of variability in operating room exposure.
Simulation-based training offers a way to separate learning from patient care and therefore allows deliberate, repetitive practice in a controlled, risk-free environment. Trainees can practice operative steps, decision-making, and error management without time pressure or clinical consequence. In adult cardiac surgery, simulation-based training has been associated with significant improvements in procedural performance, technical skills, and complication management. 4 Within congenital cardiac surgery, there is growing evidence that simulation can support structured, competency-based training and may shorten the time to independent practice. 5 Figure 1 shows a silicone three-dimensional (3D) model used for cardiac surgical training.

Figure 1. Silicone model of double outlet right ventricle with remote ventricular septal defect (VSD) used in hands-on surgical training (HOST). Left panel: Complete model. Middle panel: Model following removal of the right ventricular wall to demonstrate intracardiac anatomy. Right panel: Model post-VSD closure.
However, the educational value of simulation is contingent on the use of objective assessment tools that can quantify performance, track progression, and define competency thresholds. Despite increasing interest in simulation-based training, the application of objective assessment tools to evaluate trainee performance using 3D cardiac models in congenital cardiac surgery remains limited and highly heterogeneous. This systematic review aimed to evaluate the types, validity, and outcomes of objective assessments used to assess 3D cardiac models in training congenital cardiac surgeons.
Methodology
Search Strategy
A systematic review was conducted in accordance with the Cochrane Collaboration guidelines and the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement. A search for all relevant literature was performed using the PubMed, Scopus, Web of Science, and Embase databases.
Search terms included (“congenital heart surgery” OR “paediatric cardiac surgery” OR “pediatric cardiac surgery” OR “congenital cardiac surgery” OR “congenital heart defects” OR “congenital heart disease”) AND (“simulation-based training” OR “surgical simulation” OR “simulation” OR “hands-on surgical training” OR “skills simulation” OR “3D printed model” OR “three-dimensional print” OR “3D model” OR “silicone model” OR “anatomic model” OR “heart model”) AND (“training” OR “education” OR “surgical training” OR “skills training” OR “curriculum” OR “skill*” OR “technical performance”).
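As an illustration only, the three concept blocks above can be assembled programmatically so that the same string is reused across databases. The following is a minimal Python sketch; the function name `build_query` and the variable names are hypothetical, and database-specific field tags or syntax, which differ between PubMed, Scopus, Web of Science, and Embase, are deliberately left out.

```python
# Concept blocks copied from the search terms listed above.
POPULATION = [
    "congenital heart surgery", "paediatric cardiac surgery",
    "pediatric cardiac surgery", "congenital cardiac surgery",
    "congenital heart defects", "congenital heart disease",
]
INTERVENTION = [
    "simulation-based training", "surgical simulation", "simulation",
    "hands-on surgical training", "skills simulation", "3D printed model",
    "three-dimensional print", "3D model", "silicone model",
    "anatomic model", "heart model",
]
OUTCOME = [
    "training", "education", "surgical training", "skills training",
    "curriculum", "skill*", "technical performance",
]


def build_query(*blocks):
    """Join terms with OR inside each block and join the blocks with AND."""
    grouped = []
    for block in blocks:
        quoted = " OR ".join(f'"{term}"' for term in block)
        grouped.append(f"({quoted})")
    return " AND ".join(grouped)


query = build_query(POPULATION, INTERVENTION, OUTCOME)
print(query)
```

Keeping each concept block as a separate list makes it straightforward to add a synonym to one block without disturbing the Boolean structure of the others.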
Eligibility Criteria
Studies were included if they evaluated the use of 3D-printed or silicone-molded cardiac models for training congenital cardiac surgeons, with performance assessed using objective measures, including technical performance, procedure-specific tasks, global rating scales, time-based measures, or clinically relevant outcomes. Eligible studies involved trainees at any stage, including residents, fellows, or early-career surgeons. Studies were excluded if they relied solely on subjective assessments without objective performance measures, focused exclusively on adult cardiac surgery without a clearly defined congenital component, or evaluated simulations related only to cardiology, imaging, or intensive care without a surgical training objective. Editorials, narrative reviews, expert opinions, conference abstracts, and non-English language studies were also excluded. References of identified papers were reviewed to determine whether additional studies could be included for screening.
Study Selection
Duplicates were removed prior to screening. Titles and abstracts of all remaining articles were screened by 2 independent reviewers (AG and SB), and a third independent reviewer (JF) resolved any disagreements. Articles that passed screening then underwent full-text review by the same 2 authors (AG and SB), with inclusion requiring a unanimous decision; where there was disagreement, the final decision was made by the third reviewer (JF). Figure S1 in the Supplemental Material presents the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) flow diagram.
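The dual-reviewer process with third-reviewer adjudication can be expressed as a simple decision rule. The sketch below is illustrative only: the record identifiers, votes, and the function name `resolve_screening` are hypothetical and are not taken from the review's screening data.

```python
def resolve_screening(reviewer_a, reviewer_b, adjudicator):
    """Return final include/exclude decisions keyed by record ID.

    reviewer_a, reviewer_b: dicts mapping record ID -> True (include) or False.
    adjudicator: callable taking a record ID and returning the tie-break decision.
    """
    final = {}
    for record_id in reviewer_a:
        a, b = reviewer_a[record_id], reviewer_b[record_id]
        # Agreement stands; any disagreement is passed to the third reviewer.
        final[record_id] = a if a == b else adjudicator(record_id)
    return final


# Hypothetical records and votes, for illustration only.
votes_first = {"rec1": True, "rec2": False, "rec3": True}
votes_second = {"rec1": True, "rec2": False, "rec3": False}
third_reviewer = {"rec3": True}  # hypothetical decision on the disputed record

decisions = resolve_screening(votes_first, votes_second, third_reviewer.__getitem__)
print(decisions)
```

The third reviewer is consulted only for records on which the first two reviewers disagree, which mirrors the workflow described above.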
Data Extraction and Appraisal of Evidence
Using a preestablished protocol, the following data were extracted: first author, year of publication, study design, study aim, trainee characteristics, sample size, outcome assessment, and key findings related to technical, nontechnical, or clinical performance. A data extraction sheet for this review was developed, pilot-tested using 3 randomly selected included studies, and refined accordingly. Data extraction was performed by 2 review authors (AG and SB). The accuracy of the tabulated data was validated by a third author (JF). Institutional review board approval was not required as this study involved analysis of previously published studies and did not involve human participants or identifiable patient data.
Due to heterogeneity in study design, simulation models, assessment frameworks, and outcome measures across included studies, a quantitative meta-analysis was not appropriate. Findings were therefore summarized descriptively.
Risk of Bias
The methodological quality of the included studies was assessed using the Joanna Briggs Institute (JBI) critical appraisal tool. The JBI framework evaluates potential sources of bias across key methodological domains, including study participation, outcome measurement, statistical analysis, and identification of potential confounding factors. Each study was independently assessed by 2 reviewers (SB and AG), and any discrepancies were resolved through discussion with a third reviewer (JF). Based on the appraisal criteria, studies were categorized as having a low, moderate, or high risk of bias. A summary of the risk-of-bias assessment for all included studies is presented in Figure S2 in the Supplemental Material.
Results
Nine studies met the inclusion criteria and evaluated the use of 3D heart models for training in congenital heart surgery using objective assessment frameworks. A summary of the included studies is found in Table S1 in the Supplemental Material.
The majority of studies were small, single-center studies using procedure- or lesion-specific assessment tools, often focusing on complex or infrequently performed procedures. Study participants included cardiac surgery residents, fellows, and early-career congenital cardiac surgeons, with several studies involving participants with limited or no prior experience of surgical simulation.
Across the included studies, simulation-based training predominantly focused on complex congenital lesions. Transposition physiology was the most frequently represented category,6–9 followed by septal defects,6,10,11 single-ventricle defects,6,9,12 arch and outflow tract anomalies,6,9,13 and complex biventricular anatomy.6,9,14 Conotruncal anomalies were less commonly studied.6,11 The distribution of the types and frequency of congenital heart disease (CHD) addressed across the included studies is shown in Figure 2.

Figure 2. Types and frequency of congenital heart defects simulated in included studies.
Seventy-eight percent (n = 7/9) of studies utilized time-based objective measurements,6,8,10–14 with 89% (n = 8/9) including procedural scoring.6–14 Of these studies, 4 used a previously validated assessment tool.6–8,10,12 Only 2 studies reported clinical outcomes following simulation.9,10 Table S2 in the Supplemental Material summarizes the objective assessment domains reported across all included studies, indicating the presence (●) or the absence (−) of each domain.
All studies reported consistent improvement with repeated simulation exposure, reflected in both technical performance and reductions in procedural time. Procedural time fell across a range of lesions and levels of trainee experience, with reductions of 15% to 25% following repeated simulation in structured training programs.6,8,12 Lesion-specific time improvements were also reported, with ventricular septal defect (VSD) patch repair time decreasing from 34.4 to 17.3 min and right ventricular outflow tract (RVOT) patch repair time decreasing from 21.4 to 13.6 min over 3 simulation attempts. 11 Technical performance scores improved in all studies using structured scoring systems, including checklist-based scores increasing from 65% to 83% after 2 simulation sessions, 13 and procedure-specific hands-on surgical training-congenital heart surgery (HOST-CHS) score improvements ranging from ∼9% between first and second attempts in complex procedures to cumulative improvements exceeding 80% across repeated simulations.6,8,12 Where clinical correlation was assessed, simulation training was associated with reduced cross-clamp time. 10
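To place the lesion-specific times on the same scale as the percentage reductions quoted for structured programs, the reported values can be converted to relative changes. The short Python sketch below uses only the figures given above; the function name `percent_reduction` is hypothetical, and the calculation treats the reported values as simple before and after summary figures.

```python
def percent_reduction(before, after):
    """Relative reduction from `before` to `after`, expressed as a percentage."""
    return (before - after) / before * 100


# Repair times in minutes, as reported above over 3 simulation attempts.
vsd = percent_reduction(34.4, 17.3)
rvot = percent_reduction(21.4, 13.6)

# Checklist score rose from 65% to 83% after 2 simulation sessions.
absolute_gain = 83 - 65                 # percentage points
relative_gain = (83 - 65) / 65 * 100    # percent of the baseline score

print(f"VSD patch repair time reduction:  {vsd:.1f}%")
print(f"RVOT patch repair time reduction: {rvot:.1f}%")
print(f"Checklist gain: {absolute_gain} points ({relative_gain:.1f}% relative)")
```

On these figures the VSD and RVOT repair times fall by roughly 49.7% and 36.4%, respectively, and the checklist gain of 18 percentage points corresponds to a relative increase of about 27.7%. Stating whether a change is absolute or relative avoids ambiguity when comparing studies.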
Technical performance scores refer to objective, procedure-specific measures, including checklist-based or weighted assessments of a procedure's tasks. Time-based metrics are objective measures of procedure completion time. Use of a validated assessment tool indicates application of a formally developed and previously validated scoring instrument. Reliability testing refers to the assessment of inter- and/or intrarater agreement in performance scoring. Discriminative validity highlights the ability of an assessment tool to differentiate performance between surgeons of differing experience levels. Skill retention reflects reassessment of performance after a delay following initial training. Clinical outcomes include objective intra- or postoperative patient metrics, including cross-clamp time, cardiopulmonary bypass time, or subsequent complication rates.
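As an illustration of what reliability testing involves, interrater agreement on a categorical scale is often summarized with Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The sketch below is an assumption made for illustration: the included studies are not stated here to have used this particular statistic, and the ratings and the function name `cohens_kappa` are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater1, rater2):
    """Cohen's kappa for two raters scoring the same items on a categorical scale."""
    if len(rater1) != len(rater2) or not rater1:
        raise ValueError("Both raters must score the same, non-empty set of items.")
    n = len(rater1)
    observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    counts1, counts2 = Counter(rater1), Counter(rater2)
    # Chance agreement: product of the two raters' marginal counts per category.
    expected = sum(counts1[c] * counts2[c] for c in counts1) / (n * n)
    if expected == 1:
        return 1.0  # both raters used one identical category throughout
    return (observed - expected) / (1 - expected)


# Hypothetical pass (1) / fail (0) ratings of ten checklist steps by two assessors.
assessor_1 = [1, 1, 1, 0, 1, 0, 1, 1, 0, 1]
assessor_2 = [1, 1, 0, 0, 1, 0, 1, 1, 1, 1]

kappa = cohens_kappa(assessor_1, assessor_2)
print(f"Cohen's kappa: {kappa:.2f}")
```

In this hypothetical example the assessors agree on 8 of 10 steps, but because both pass most steps, a substantial share of that agreement is expected by chance, so kappa is noticeably lower than the raw agreement of 0.80.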
Discussion
All studies in this systematic review reported improvements in measured performance following exposure to 3D simulation models.6–14 The included studies most commonly investigated early technical skill development, practice of rare or complex procedures, assessment of trainee progression, and clinical correlation with repeated simulation practice. Most studies focused on objective assessments of procedure- or lesion-specific models, with few highlighting the correlation between simulation-based practice and real-world clinical improvement.
We found that objective assessments tended to prioritize what is easiest to score, such as speed and task-specific checklists, rather than the attributes that truly determine safe independent operating, including judgment, adaptability, and team-based performance. Surgical simulation has long been proposed as a response to limited clinical exposure, but its educational value depends on whether performance can be measured meaningfully and used to guide progression rather than simply demonstrate improvement. 15
Technical Performance Improvements
All included studies reported improvement following simulation exposure, typically expressed as increased checklist scores, reduced procedural time, or both. This was seen across a spectrum of lesions and procedures, including coarctation of the aorta, 13 the arterial switch operation (ASO), 12 and the Norwood operation. 8 Improvement in objective performance was also observed for trainees with varying levels of experience, from trainees with minimal congenital heart exposure11,13 to more senior trainees and early-career surgeons.7,10
Importantly, performance improvements were observed despite differences in model fidelity, assessment tools, and training structure. For example, one study 13 demonstrated technical improvement after just 2 simulation attempts on 3D printed models, while another 11 reported progressive improvement across multiple training sessions using a tetralogy of Fallot (TOF) model with structured video-based scoring. Some studies also showed consistent improvements using formally developed procedure-specific tools for highly complex operations.7,8
Procedure-specific checklists were the assessment tool used in the majority of our included studies, particularly for complex lesions and procedures.6,8,12 These tools are appropriate for assessing procedural steps, missed critical steps, and overall technical execution. Their main strengths are standardization, reproducibility, and close alignment with established operative practice. However, the credibility of these findings depends heavily on how performance is defined and measured. Some studies employed expert-derived checklists and video-based assessments, demonstrating statistically significant improvements over repeated attempts.11,13 While this approach offers face validity, many of these tools lack formal evaluation of reliability, discrimination, or construct validity. Consequently, improvements may simply reflect increased familiarity with the task rather than actual acquisition of transferable surgical skills.
In contrast, the HOST-CHS program6–8,12 represents a more methodologically rigorous approach by explicitly addressing ways in which assessment tools were developed, subsequent inter- and intrarater reliability, and the discriminative validity between different levels of expertise. By demonstrating that procedure-specific tools can reliably differentiate between junior and expert surgeons,6,12 it supports the argument that simulation-based assessment can move beyond descriptive training aids toward validated competency evaluation. Importantly, this highlights a key methodological weakness in the literature as “objective assessments” are not equal, and in the absence of evidence for reliability and validity, their objectivity is questionable. It is also important to recognize that improvements observed in simulation may overestimate true operative competence. Evidence from the broader simulation literature suggests that skills are most transferable when training is embedded within deliberate practice frameworks and assessed longitudinally across increasing task complexity. 16
Reliance on checklist performance also introduces an important limitation, as smooth task completion may be mistaken for true operative competence. In congenital cardiac surgery, unexpected complications, anomalous anatomy, and hemodynamic changes commonly arise intraoperatively, and checklists are poorly suited to capturing the adaptive behaviors needed to manage them. While weighted scoring systems partially address this by emphasizing critical steps, they still favor planned execution over real-time judgment.
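The difference between unweighted and weighted checklist scoring can be shown with a small worked example. The sketch below is hypothetical throughout: the four-step checklist, the weights, and the function name `weighted_score` are invented for illustration and do not reproduce the HOST-CHS instrument or any other published tool.

```python
def weighted_score(steps):
    """Percentage score in which each step contributes in proportion to its weight.

    steps: list of (weight, completed) tuples; weight > 0, completed is a bool.
    """
    total = sum(weight for weight, _ in steps)
    if total <= 0:
        raise ValueError("At least one step with a positive weight is required.")
    earned = sum(weight for weight, done in steps if done)
    return earned / total * 100


# Hypothetical four-step checklist; the weight-3 item stands in for a critical step.
attempt = [(1, True), (1, True), (3, False), (1, True)]

unweighted = sum(done for _, done in attempt) / len(attempt) * 100
weighted = weighted_score(attempt)

print(f"Unweighted: {unweighted:.0f}%  Weighted: {weighted:.0f}%")
```

Here the trainee completes 3 of 4 steps, giving an unweighted score of 75%, but missing the single heavily weighted step lowers the weighted score to 50%. Both scores still describe only whether planned steps were executed, which is the limitation noted above.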
Time-Based Performance
Given that operative duration is a critical determinant of patient safety in surgery, many of our included studies reported on time-to-completion as an objective outcome.7,11–14 Reduced operative duration is commonly interpreted as improved efficiency and confidence. However, faster execution does not necessarily equate to improved patient safety, particularly in early training phases where deliberate pacing may mitigate error.
By contrast, studies that consider time alongside measures of performance quality offer a more balanced and informative assessment. For example, one study 11 reported a reduction in time-to-completion alongside improved structured performance scores, suggesting overall skill acquisition rather than just task acceleration. This is particularly important when comparing simpler procedures with more complex neonatal procedures, where cautious pacing may initially be protective rather than detrimental. Therefore, time should be interpreted contextually, as a secondary outcome contingent on acceptable technical execution. Future studies would benefit from explicitly defining whether time is intended to represent efficiency after competence acquisition or merely responsiveness to repetition. Frameworks that quantify operative autonomy, such as the Zwisch scale, offer a measure of progression beyond time reduction alone. 17
Clinical Improvement
Ultimately, the value of simulation-based training is based on whether observed improvements in a simulated environment translate into measurable benefits in the operating room and, by extension, patient outcomes. Unfortunately, only a small number of studies attempted to link simulation-based practice to subsequent clinical outcomes. For example, one study 10 demonstrated a shorter cross-clamp time and reduced patch leak following a structured VSD simulation program. Although the study had a small sample size, this approach explicitly tests the assumption that simulator performance translates into safer and more efficient real-world surgery. Similarly, another study 9 demonstrated a reduction in cardiopulmonary bypass time and cross-clamp time following practice on patient-specific models before the operation.
Although causality cannot be established due to the absence of control groups and the multifactorial nature of operative outcomes, these studies suggest that 3D models may also support preoperative planning and operative decision-making. The need for caution is consistent with the broader literature, where improvements in simulation performance frequently fail to translate into demonstrable operative benefit, largely due to weak study design and absence of appropriate control groups. 18
Heterogeneity of Assessment Tools
Although all included studies employed objective assessment, how performance was quantified differed fundamentally, reflecting varying assumptions about what constitutes technical competence in simulated 3D models. Several studies assessed performance using single summary measures, such as overall procedure time or total checklist scores.6,11,13 These approaches provide clear, easily interpretable outcomes but offer limited insight into which parts of a procedure improve with repeated practice. As a result, observed gains reflect overall task completion rather than changes in specific technical elements or subtasks.
Heterogeneity was also evident in the temporal application of assessment tools. Most studies assessed performance at only 2 timepoints, typically comparing early and late simulation attempts, which provides limited information on learning trajectories and performance variability.6,11,13 Only one study assessed delayed skill retention. 6 Therefore, objective assessments predominantly capture short-term change rather than sustained skill development, which limits their ability to inform decisions about progression or readiness for independent practice.
Some studies utilized video assessments of simulation performance,7,8,10,12 while others used a single experienced assessor applying a structured scoring system.11,13,14 Even where validated tools are used, the utility of the assessment tool remains contingent on consistent rater interpretation, the scoring framework, and the context in which performance is observed. As a result, “objective assessment” reflects a range of different measurement approaches across studies, rather than a single, uniform standard.
Limitations
Despite the overall positive findings, this systematic review has limitations that should be acknowledged. Most included studies had small sample sizes and were conducted within single centers, which limits generalizability. Repeated practice of the same task raises the possibility that gains reflect task repetition rather than skill acquisition, particularly when the same models are reused. Assessment of skill retention beyond the immediate posttraining period was inconsistent, with only a small number of studies evaluating delayed performance. 7 In addition, assessor blinding was variably reported, which introduces the potential for expectation bias when scoring participants. Importantly, nontechnical skills (NOTS) were not examined in any of the studies, despite their central importance in operative decision-making, team communication, and patient safety. 19 These limitations do not negate the observed benefits of simulation but highlight the early stage of the field and the need for greater methodological standardization, longitudinal evaluation, and objective assessments that extend beyond technical performance.
Future Directions
This systematic review supports the use of objectively assessed 3D simulation practice in congenital cardiac surgery training, particularly for early stage skill acquisition and exposure to infrequently encountered, high-risk procedures. Simulation should function as an adjunct to operative exposure, augmenting training rather than replacing operative experience. Its greatest value lies in standardizing surgical skills, reducing variability in early training exposure, and providing assessment benchmarks in an era of limited case volumes. Beyond its role in technical skills acquisition, validated assessment frameworks, including global rating scales and structured operative assessment tools, have been incorporated into simulation-based curricula to objectively evaluate trainee performance and progression. 17
In Europe and North America, the shift toward competency-based training has also emphasized structured assessment of operative performance alongside traditional case-volume outcomes. Hands-on workshops using 3D heart models have already been incorporated into international meetings such as the American Association for Thoracic Surgery (AATS) annual meeting 20 and the European Association for Cardio-Thoracic Surgery (EACTS) annual meeting, 21 where participant feedback has been highly positive and there is growing enthusiasm for integrating such simulation platforms into formal surgical training curricula. 6 Although simulation has not yet been widely adopted as a formal requirement for certification in cardiothoracic surgery, its integration into training curricula across several surgical specialties suggests that simulation-based evaluation may play an increasing role in future credentialing.16,18
Future work must move beyond demonstrating improvement toward establishing meaningful competency thresholds, transferable operative competence, and, importantly, equitable access. The educational value of increasingly sophisticated models is limited if access remains restricted to a small proportion of trainees worldwide. 22 Without widespread availability, simulation risks widening existing training inequities rather than addressing them. Future efforts should therefore prioritize scalability, cost-effectiveness, and structured incorporation into training pathways, ensuring that simulation-based learning is accessible to all trainees.
Conclusion
We demonstrate that objective assessment of training on 3D heart models offers a feasible and reproducible approach to developing technical skills, particularly in light of increasingly limited operative exposure for trainees. Simulation was consistently associated with measurable improvements in technical performance and, in the small number of studies that assessed them, in clinical measures, supporting its role as a valuable adjunct to traditional training. While current assessment strategies focus primarily on procedural execution, this provides a strong foundation on which a more comprehensive evaluation of adaptive competence and NOTS can be developed. The impact of simulation training will depend on how effectively it is implemented across all training programs to ensure equitable access for all trainees.
Supplemental Material
Supplemental material, sj-docx-1-pch-10.1177_21501351261449201 for Objective Evaluation of Three-Dimensional Models for Training Congenital Cardiac Surgeons: A Systematic Review by Jeevan Francis, Sara Brophy, Alan George and Nabil Hussein in World Journal for Pediatric and Congenital Heart Surgery
Footnotes
Acknowledgments
None.
Ethical Consideration
None required.
Author Contributions
JF: conceptualization, data collection, data analysis, manuscript drafting, and revision. SB and AG: data collection. NH: supervision, conceptualization, data analysis, manuscript drafting, and revision.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Supplemental Material
Supplemental material for this article is available online.
References
