Abstract
Introduction
Extracorporeal membrane oxygenation (ECMO) emergencies require skilled clinical specialists (CS) who manage ECMO circuits. While tools for assessing CS skills have been published, there is significant variation in protocols and circuit design. This study aims to further develop these checklists to produce a generalizable ECMO skill assessment with adequate validity evidence to support its use as a summative evaluation tool.
Methods
An initial survey determined variation in ECMO circuit components and configurations, and the original checklists and simulations were altered through a modified Delphi process. The finalized checklist and simulation were then assessed for validity and reliability. Three trained raters assessed ten simulations from five subjects at two different institutions using two circuit designs. Data analysis was conducted using a fully crossed subject x rater x circuit generalizability (G) and decision (D) study.
Results
The G-study coefficient was 0 with 0% variance across subject and circuit. The greatest variance was among raters (28.7%). Significant variance was also associated with the subject and pump type relationship (27%).
Conclusion
Despite the rigorous process used to modify the assessment, generalizability was poor. Lack of familiarity with center-specific circuit design played a key role. Future endeavors in ECMO skill assessment should focus either on developing and validating site-specific tools or standardizing circuit designs.
Introduction
Extracorporeal membrane oxygenation (ECMO) technology is a complex intervention used to urgently (and in some cases emergently) stabilize critically ill patients with refractory respiratory and/or cardiac failure. Given the highly unstable nature of the patient population involved, ECMO circuit problems are high-risk events that can compromise patient outcomes. Emergency situations that arise on ECMO require precise and timely responses to any equipment or patient change, making ECMO the classic low-volume, high-risk procedure ideally suited for simulation-based training and assessment.1,2
Simulation-based skill assessment checklists are an objective means of assuring that a CS can perform defined competencies based on specific performance criteria.3–6 By utilizing simulated environments, the clinical setting can be standardized, allowing for reproducible assessments to be performed. There is wide variation amongst institutions in protocols used to address ECMO circuit emergencies, however, making such standardization challenging.7 Abulebda et al. previously developed and provided preliminary validity evidence for a set of simulation-based assessments designed to evaluate the clinical performance of ECMO CS in three ECMO circuit emergencies: (1) venous air, (2) arterial air, and (3) oxygenator failure.8 A modified Delphi technique was used to develop and validate the assessment tools’ content.8 They were then used in a simulated environment to measure CS efficacy and potentially assess CS skills in real time. While these assessment tools have promising evidence to support their use across institutions, they were developed at a single site and are thus tailored to that institution’s ECMO system setup. In addition, subsequent psychometric studies performed by our team showed that these assessment tools demonstrated only moderate internal consistency and inter-rater reliability, key aspects of assessment validity. In this preliminary analysis, the arterial air assessment tool achieved the highest inter-rater reliability of 0.79 by intraclass correlation, and internal consistency of 0.75-0.78 by Kuder-Richardson 20. This suggests that, while promising, this initial development phase did not produce tools with sufficient generalizability to use across institutional settings, particularly if high-stakes decisions are in view.
Given the need for a generalizable assessment approach for ECMO skills, our team set out to iteratively refine these checklists, with the initial emphasis on the assessment of an arterial air emergency. The goal of our study was to describe this refinement process and provide validity data for the revised arterial air assessment tool, using Messick’s unified framework 9 to support its utilization in local credentialling processes.
Methods
A validity study is intended to demonstrate that an assessment tool can produce scores with sufficient robustness to make specific decisions with reference to a specific construct within the learner group and environment in which it was tested. The validity of the assessment tool is the degree to which the tool believably measures aspects of the construct under consideration, hence assisting the decision-making process. In our study, the construct in view was skill at managing arterial air emergencies, and the decision in view was the local credentialling of ECMO providers in this skill. Current literature-based guidance also recommends using a validity framework to guide the argument.10,11 We examined the validity of the revised ECMO arterial air checklist using Messick’s framework, 12 which is composed of five forms of evidence: Content, Response Process, Internal Structure, Relationship to Other Variables and Consequence. We chose to focus on three of these sources of validity evidence: content, response process, and internal structure.
Content validity
Assessment tool survey development
Content evidence evaluates how well the content of the assessment tool aligns with the construct being measured and is often assessed through techniques like the modified Delphi process. Having such a process is important as it ensures that experts in the field agree on both the overall content and the wording of individual items, enhancing the generalizability of the tool. In this study, we utilized the modified Delphi process for both the tool itself and the simulation scenario, as these are both integral parts of the assessment process.
An ECMO circuit comprises a number of components. A standard ECMO circuit consists of a mechanical blood pump, a gas exchange device, a heater, and tubing to connect the circuit together. Different centers, however, typically utilize customized variants of the core design based on budgets, experience, preference and patient need. The size of tubing; the monitoring of blood flow, pressure, and patient information; the circuit access sites; and the use of a bridge or hemofilter can all vary significantly, representing a challenging landscape for generalized assessment. To address this, a prior (2017) Extracorporeal Life Support Organization (ELSO) internal survey of the circuit designs of 31 different centers was used. Centers answered questions related to circuit design, and this information was used to create an initial survey with the intent to identify (1) a global set of qualitatively similar ECMO circuit designs in current use, and (2) the impact that these design differences would have on our ability to evaluate CS performance using the current assessment tool. Fifteen centers completed this initial survey (Supplemental Table 1), and five submitted photographs of their current ECMO circuit designs as requested.
Two experienced ECMO physicians at one institution (JFD, AC) and one experienced ECMO physician from a second institution (KA) collaborated to modify the initial CS arterial air simulation and assessment tool with the intent to create a generalizable tool. Specific changes typically included the identification of key unique design elements applicable to specific centers, followed by an attempt to phrase the question in a more generalizable manner (e.g., altering “Remove yellow cap from oxygenator” to read “Remove cap from the de-aeration port of oxygenator”). Each element of the simulation and simulation assessment tool was identified separately, and questions pertaining to each step were drafted in a linear stepwise process reflecting the proper order of actions performed in case of an emergency. The modified assessment tool and scenario were then refined further via a three-round modified Delphi process using a panel of content experts.
Modified Delphi process
Demographics and institutions of the expert panel.
All experts were sent an introductory communication via e-mail with a full explanation of the modified Delphi process structure as well as the arterial air assessment tool and simulation developed by Abulebda et al. and modified using the initial survey. Experts were asked to consider whether each item/component is useful for the assessment of the particular skill described and should be included in the final tool, whether the item is clearly phrased, and whether it is worded in a way that would be applicable to that institution’s ECMO management practices. A rating scale of “Strongly Agree, Agree, Neither Agree nor Disagree, Disagree, Strongly Disagree” was used. Experts were asked to score each item, suggest revisions as necessary and include comments explaining all suggested revisions, deletions, and additions. The experts were blinded to the identity of other experts to prevent any individual from influencing the panel’s decisions about items. Authors reviewing the panel’s scoring, with the exception of the author team (JFD, AC, KA), were also blinded to the identity of the experts. Items from the simulation and assessment tool that received an agreement rate of over 80% (either “Strongly Agree” or “Agree”) were kept, whereas items for which over 80% of experts selected “Disagree” or “Strongly Disagree” were either modified as suggested or removed from the assessment tool.
A response of “Neither Agree nor Disagree” was addressed only if experts made comments for further revisions. This process was repeated for rounds two and three using the modified scenario and assessment tool from prior rounds. Steps taken to develop the assessment tool and the Delphi phases are demonstrated in Figure 1. The response rate for all three rounds was 100%, and by the end of round three all items achieved a rating of >80%. The final assessment tool is displayed in Figure 2. Steps for Delphi method revisions. Final modified assessment checklist.
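The retention rule described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' tooling: the function name `classify_item`, the returned labels, and the handling of items that meet neither threshold (returned here as "unresolved") are assumptions, since the text does not specify how such items were processed beyond the review of expert comments.

```python
# Minimal sketch of the >80% agreement rule from the modified Delphi process.
# The function name, return labels, and the "unresolved" branch are
# hypothetical; only the two >80% thresholds come from the text.

AGREE = {"strongly agree", "agree"}
DISAGREE = {"disagree", "strongly disagree"}


def classify_item(ratings, threshold=0.80):
    """Classify one checklist item from a list of expert Likert ratings."""
    if not ratings:
        raise ValueError("at least one rating is required")
    normalized = [r.strip().lower() for r in ratings]
    n = len(normalized)
    agree_rate = sum(r in AGREE for r in normalized) / n
    disagree_rate = sum(r in DISAGREE for r in normalized) / n
    if agree_rate > threshold:
        return "keep"
    if disagree_rate > threshold:
        return "modify_or_remove"
    # Neither threshold met: the text only says neutral responses were
    # addressed when experts left comments, so no decision is made here.
    return "unresolved"


if __name__ == "__main__":
    # Hypothetical ratings for a single item from an eight-member panel.
    example = ["Strongly Agree"] * 5 + ["Agree"] * 2 + ["Disagree"]
    print(classify_item(example))
```

Note that the comparison is strict, so an item with exactly 80% agreement is not retained automatically under this reading of "over 80%".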

Response process and internal structure
Rater training
Response process evidence addresses the manner in which the tool is scored by raters, as well as the training process used to ensure accuracy. Internal structure evidence examined in this study included internal consistency, which assesses how many constructs are present in the tool, and a Generalizability Study (G-study). Generalizability analysis was chosen as it not only enables the calculation of an overall generalizability “score” that speaks to a tool’s overall psychometric reliability under the conditions studied, but also provides detailed information on potential sources of unwanted variance that can detract from tool performance. This analysis of variance is not limited to the facets alone, but is also measured for relationships between facets, allowing for a more granular evaluation to be conducted.13,14
Three separate raters were chosen based on ECMO CS expertise and knowledge of variations in ECMO circuit setup. The raters were trained via videoconferencing during which the assessment tool was presented and an explanation of use was given. Special emphasis was given to the use of the tool within the given simulation and how the generalized language of the tool was intended to function across sites, learners, and circuit designs. Raters then reviewed three videorecorded CS arterial air example simulations with predefined scores (score filmed: 2, 7, 12), which were used to provide further feedback on tool use.
Assessment simulations
Comparison of two circuit designs.
Statistical analysis
Ratings were analyzed using a fully crossed subject x rater x circuit design generalizability study (G-study) with study subject, rater, and circuit design included as facets. Circuit design was chosen as the final facet due to our desire to create a tool capable of generalizable assessment across institutions and circuit designs. Finally, a decision study (D-study) was also performed, which allowed us to predict the reliability of the same data collected under different conditions and gain a sense of the conditions under which adequate generalizability might be achieved. IBM SPSS version 29 was used to calculate variance components.
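For readers unfamiliar with the mechanics, the following Python sketch shows one common way to estimate variance components for a fully crossed subject x rater x circuit design with one score per cell, and then to project a relative G coefficient for other numbers of raters and circuits (the D-study step). It is not the SPSS procedure used in this study. It assumes a random-effects model estimated from ANOVA expected mean squares, truncates negative estimates to zero, and uses hypothetical names (`g_study`, `g_coefficient`) and hypothetical scores.

```python
# Sketch of a fully crossed p x r x c G-study (one score per cell) using
# ANOVA expected mean squares. Assumptions: all facets random, negative
# variance estimates set to 0. Function names and data are hypothetical.
from itertools import product
from statistics import mean


def g_study(scores):
    """scores[p][r][c] = score for subject p, rater r, circuit c."""
    n_p, n_r, n_c = len(scores), len(scores[0]), len(scores[0][0])
    P, R, C = range(n_p), range(n_r), range(n_c)
    grand = mean(scores[p][r][c] for p, r, c in product(P, R, C))
    # Marginal means for each facet and each two-way combination.
    m_p = [mean(scores[p][r][c] for r, c in product(R, C)) for p in P]
    m_r = [mean(scores[p][r][c] for p, c in product(P, C)) for r in R]
    m_c = [mean(scores[p][r][c] for p, r in product(P, R)) for c in C]
    m_pr = [[mean(scores[p][r][c] for c in C) for r in R] for p in P]
    m_pc = [[mean(scores[p][r][c] for r in R) for c in C] for p in P]
    m_rc = [[mean(scores[p][r][c] for p in P) for c in C] for r in R]
    # Sums of squares for main effects and two-way interactions.
    ss = {
        "p": n_r * n_c * sum((m - grand) ** 2 for m in m_p),
        "r": n_p * n_c * sum((m - grand) ** 2 for m in m_r),
        "c": n_p * n_r * sum((m - grand) ** 2 for m in m_c),
        "pr": n_c * sum((m_pr[p][r] - m_p[p] - m_r[r] + grand) ** 2
                        for p, r in product(P, R)),
        "pc": n_r * sum((m_pc[p][c] - m_p[p] - m_c[c] + grand) ** 2
                        for p, c in product(P, C)),
        "rc": n_p * sum((m_rc[r][c] - m_r[r] - m_c[c] + grand) ** 2
                        for r, c in product(R, C)),
    }
    total = sum((scores[p][r][c] - grand) ** 2 for p, r, c in product(P, R, C))
    # Three-way interaction is confounded with error when there is one
    # observation per cell, so it is obtained as the residual.
    ss["prc,e"] = total - sum(ss.values())
    df = {"p": n_p - 1, "r": n_r - 1, "c": n_c - 1}
    df["pr"], df["pc"], df["rc"] = (df["p"] * df["r"], df["p"] * df["c"],
                                    df["r"] * df["c"])
    df["prc,e"] = df["p"] * df["r"] * df["c"]
    ms = {k: ss[k] / df[k] for k in ss}
    e = ms["prc,e"]
    var = {
        "p": (ms["p"] - ms["pr"] - ms["pc"] + e) / (n_r * n_c),
        "r": (ms["r"] - ms["pr"] - ms["rc"] + e) / (n_p * n_c),
        "c": (ms["c"] - ms["pc"] - ms["rc"] + e) / (n_p * n_r),
        "pr": (ms["pr"] - e) / n_c,
        "pc": (ms["pc"] - e) / n_r,
        "rc": (ms["rc"] - e) / n_p,
        "prc,e": e,
    }
    return {k: max(v, 0.0) for k, v in var.items()}


def g_coefficient(var, n_r, n_c):
    """Relative G coefficient for a D-study with n_r raters, n_c circuits."""
    rel_error = (var["pr"] / n_r + var["pc"] / n_c
                 + var["prc,e"] / (n_r * n_c))
    denom = var["p"] + rel_error
    return var["p"] / denom if denom > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical scores: 5 subjects x 3 raters x 2 circuits.
    demo = [[[p + r + c + (p * r) % 2 for c in range(2)] for r in range(3)]
            for p in range(5)]
    components = g_study(demo)
    for n_raters in (1, 3, 5):  # D-study: vary the number of raters
        print(n_raters, round(g_coefficient(components, n_raters, 2), 3))
```

Each facet needs at least two levels for the degrees of freedom to be non-zero. The dimensions in the usage example mirror the five subjects, three raters, and two circuit designs described in this study, but the scores themselves are invented for illustration.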
Results
G-study results with estimate of variance attributable to each component (subject, pump and rater) and the interaction of these components.
Discussion
The previously published simulation assessment tool for ECMO circuit emergencies is one of the few published assessment tools in the literature. 8 Although initial psychometric testing showed reasonable content validity, it ultimately revealed insufficient reliability for generalized assessment of ECMO CS skill, making it unsuitable for certification purposes. Despite a rigorous process for modification, this study was unable to achieve higher levels of reliability and generalizability.
The institutional survey and modified Delphi process described above were intended to convert the initial tool into a more generalizable assessment that could function across institutions and circuit types. By incorporating feedback from multiple, diverse experts in the field of ECMO and obtaining consensus on all items, we believe the potential generalizability of the previous emergency arterial air assessment tool was strengthened as much as possible. The G-study, however, showed that the tool was essentially unable to detect variation in learner skill across institutions (G-coefficient of 0).
The individual variance components shed further light on this result. The highest variance seen in this study was rater variance, indicating relatively low inter-rater reliability, and raising questions as to whether each rater was able to utilize the more generalized language intuitively, despite receiving training. One of the strengths of the G-study is the ability to split the overall variation within the study into its component parts, allowing for deeper analysis. In this study no variance (0%) was seen in relation to the design of the two pumps taken as an isolated facet. The second highest level of variance, however, was in the subject by pump category, suggesting that the bulk of subject variance was attributable to specific subject expertise (or lack thereof) with the specific pump setups used within the study. In other words, different subjects demonstrated radically different performance on different pumps. This is perhaps unsurprising as individuals receive training on only one (or at most two) specific pump types within their institutions. Lack of familiarity with equipment during a simulated emergency is likely to contribute to inability to correctly perform a skill. Additionally, different circuit setups may require different de-airing techniques, meaning that the specific components of this skill may not be the same between institutions. Analyzing the variance in rater by subject and rater by pump showed equal levels of variance, further suggesting that the raters may have been similarly influenced by subject performance and/or their familiarity with specific circuit designs. The “error” of 18.9% suggests that there may be a potentially important interaction between all three variables (raters, subjects and pump types), although a fourth, unassessed variable is also a possibility.
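The link between the absence of subject variance and a G coefficient of 0 can be made explicit. As a sketch, assuming the reported value is the standard relative G coefficient for a fully crossed subject (p) x rater (r) x circuit (c) design (the text does not state which coefficient was computed), with n_r and n_c denoting the numbers of raters and circuits:

```latex
E\rho^2 \;=\; \frac{\sigma^2_{p}}
{\sigma^2_{p} + \dfrac{\sigma^2_{pr}}{n_r} + \dfrac{\sigma^2_{pc}}{n_c} + \dfrac{\sigma^2_{prc,e}}{n_r\, n_c}}
```

When the subject variance component is zero, the numerator vanishes and the coefficient is 0 whatever the values of n_r and n_c. Under this assumption, increasing the number of raters or circuits shrinks the error terms in the denominator but cannot raise the coefficient above 0.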
Based on these findings, it appears that checklist scores are strongly influenced by rater and subject experience with specific pump designs. The concept of generalizability in assessment is based on the idea that the construct in question (in this case the skill needed to successfully manage arterial air) can be consistently demonstrated across different contexts where the skill is needed. Despite the rigor of the content development process and rater training, the overall generalizability of the tool was still quite poor, calling into question whether it is possible to develop an assessment process for ECMO skill that is truly generalizable across pump types and between institutions. Our data suggest that this is largely due to center-specific variation in circuit design.
ECMO as a field is characterized by significant practice variability, including indications/contraindications, cannulation strategies, anticoagulation practice and circuit design components, and this heterogeneity is a growing topic of interest. This can pose challenges when comparing data from multiple centers, developing definitions of adverse events, and educating future specialists. A particular focus of this interest is ECMO circuit construction, as this varies greatly in relation to cannula type, pump type, tubing, connectors, access points and incorporated ancillary equipment (such as renal replacement therapy circuits).15 The development of a single universal circuit design (or at most a small set of designs) would be a significant step toward addressing this issue, and the findings from this study offer valuable insights toward that end.
The attempt to develop a generalizable checklist or assessment tool is also particularly timely due to the recent publication of a compendium of robust simulations that has an accompanying “Simulation Scenario Critical Actions Assessment” checklist included with each.16 This material certainly meets a critical gap in ECMO training, but our findings raise concern as to the ultimate validity of these checklists for making critical decisions regarding CS performance. While it is worth noting that this publication makes no claim regarding validity, it calls for careful consideration of significant local revision, contextualization, and validation of these tools to be used in the intended manner. Though they may be valuable as templates, more specificity will likely be required. The development of a more universal approach to ECMO as discussed above represents the best long-term solution to this issue.17
In the interim, however, ECMO circuit design and practice variation will continue to exist, and to address current training and assessment needs during this period we recommend the creation of a generalizable process that could be used to develop assessment checklists consistent with local practice as a possible solution. Figure 3 provides a suggested approach for developing such checklists. Proposed guideline for adapting a prior validated internal ECMO assessment tool to local practice.
Limitations
The use of a Delphi method for the modification of simulations and checklists is lengthy, and at times experts may have overlooked or missed items, creating potential bias or incompleteness. It is also possible that the use of this process with other checklists (e.g., venous air or oxygenator failure) might have resulted in a more generalizable tool, although the initial poor inter-rater reliability suggests that this is an unlikely outcome. We were also limited to 8 respondents, raising the possibility that a lack of representativeness in this cohort may also be a contributing factor. It is also important to note that two of the raters did know some of the subjects performing the simulation, and this could have influenced the rater by subject variance. Finally, the use of videos in the assessment allowed the raters to replay recordings and vary playback speed; however, not all parts of the circuit may have always been visible, which limits the assessment.
Conclusion
Our results suggest that a generalizable ECMO skill assessment tool that can function well across institutions and circuit types may not be possible given current heterogeneity. These negative results support a push toward the creation of a universal design or limited set of designs for ECMO circuits in the pediatric ECMO community, which could then serve as the foundation for standardized training and assessment approaches. Until this standardization can be implemented, we propose the development of a generalizable process for the creation of institutionally specific assessment tools based on published best practices in validity. Further studies will be needed to refine and develop this approach to local contextualization.
Supplemental Material
Supplemental Material for Developing a generalizable pediatric ECMO emergency checklist for clinical specialist: Progress and challenges by Jamie M. Furlong-Dillard, Kamal Abulebda and Aaron W. Calhoun in Perfusion.
Footnotes
Acknowledgements
We thank the following raters for their participation and efforts to make this study possible: Brad Oelkers, Noelle Vasquez, Teka Siebenalar.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Supplemental Material
Supplemental material for this article is available online.
References
