Abstract
Objective
To test the feasibility and impact of a simulation training program for myringotomy and tube (M&T) placement.
Study Design
Prospective randomized controlled trial.
Setting
Multi-institutional.
Subjects and Methods
An M&T simulator was used to assess the impact of simulation training vs no simulation training on the rate of achieving competency. Novice trainees were assessed using posttest simulator Objective Structured Assessment of Technical Skills (OSATS) scores, OSATS score for initial intraoperative tube insertion, and number of procedures to obtain competency. The effect of simulation training was analyzed using χ2 tests, Wilcoxon-Mann-Whitney tests, and Cox proportional hazards regression.
Results
A total of 101 residents and 105 raters from 65 institutions were enrolled; however, just 63 residents had sufficient data to be analyzed due to substantial breaches in protocol. There was no difference in simulator pretest scores between intervention and control groups; however, the intervention group had better OSATS global scores on the simulator (17.4 vs 13.7, P = .0003) and OSATS task scores on the simulator (4.5 vs 3.6, P = .02). No difference in OSATS scores was observed during initial live surgery rating (P = .73 and P = .41). OSATS scores were predictive of the rate at which residents achieved competence in performing myringotomy; however, the intervention was not associated with subsequent OSATS scores during live surgeries (P = .44 and P = .91) or the rate of achieving competence (P = .16).
Conclusions
A multi-institutional simulation study is feasible. Novices trained using the M&T simulator achieved higher scores on the simulator but not on initial intraoperative OSATS, and they did not reach competency sooner than those not trained on the simulator.
Interest in simulation has increased over the past decade as health care providers have recognized its role in education, research, and system improvement. In 2011, the Board of Directors for the American Academy of Otolaryngology–Head and Neck Surgery Foundation (AAO-HNSF) convened a task force to study and report on the potential impact of simulation technology on the specialty of otolaryngology. At that time, the task force concluded that “in general, though, despite several distinct areas of excellence, simulation’s penetration into Otolaryngology is at a relatively early stage, and generalized appreciation and incorporation of simulation is not widespread” (E. S. Deutsch, personal communication, June 2011). In addition, the task force noted that “otolaryngology simulation has the potential to be of value for learners at the full range of their careers, from medical students, to practicing Otolaryngologists seeking ongoing professional development.” Members of the task force conveyed that “our greatest opportunity cost will not come from investing in the development of simulation within the field of Otolaryngology, but from failing to do so.”
As a result of those discussions, a simulation interest group was formed and 3 main projects were developed to expose academy membership to simulation. These included development of 2 opportunities during the AAO-HNSF’s annual meeting: a “simulation experience” designed to offer individuals the opportunity to participate in a simulation training bootcamp and the “simulation showcase and reception” providing a venue for individuals to showcase developing simulation technologies and applications in an informal demonstration. The third initiative was the development of the “SimTube project,” which is the focus of this report.
The overarching purpose of the SimTube project was to (1) provide broad exposure to simulation training and research in a unified, multi-institutional experience; (2) create a mechanism or “pathway” for establishing future simulation-based training protocols and conducting effectiveness research; and (3) support a national dialogue regarding standardized curriculum implementation and develop a collaborative simulation community. To achieve these goals, the task force instituted a simple simulation exercise using a validated, low-cost myringotomy and tube (M&T) simulator and an easily implemented study protocol.
The hypotheses of the SimTube project were as follows:
1. A specialty-wide, multi-institutional simulation training study is feasible.
2. Phase 1: Novice residents trained using a low-cost M&T insertion simulator, compared with standard training, will demonstrate improved performance:
a. On the simulator
b. In the initial operative experience as measured on a validated and objective assessment instrument
3. Phase 2: Novice residents trained on the simulator will reach intraoperative competence sooner than those not trained on the simulator.
Methods
A prospective randomized controlled trial conducted from July 2015 through June 2018 assessed the impact of simulation training on the rate of achieving M&T placement competency using a low-cost M&T simulator (Figure 1). The study was reviewed and approved for exemption by the Nationwide Children’s Hospital (NCH) Institutional Review Board (IRB; Protocol #IRB14-00865, Federal Assurance #FWA 000002860) in January 2015. Participating institutions either obtained site-specific approval using the materials developed at NCH as templates or ceded review to the NCH IRB.

Figure 1. SimTube study flow. Hypothesis 1 addresses the feasibility of the study and is not indicated in this figure.
The physical model chosen to provide the low-cost simulation platform was that instituted by Malekzadeh et al 1 for a skills laboratory to train novice residents in the psychomotor skills for M&T insertion (Figure 2). In that original study, the authors also presented a validated Objective Structured Assessment of Technical Skills (OSATS) composed of a task-based procedural checklist (range, 0-5) and a global rating scale (range, 5-25) developed and validated by standard methodology as defined by Martin et al. 2 That instrument, including its OSATS questions and rating scales, was used for the assessment component of the SimTube project (Figure 3). Raters determined competence subjectively based on the following definition: “ready to perform procedure on own without supervision,” which was provided as a competent/not competent answer choice within the rating instrument.

Figure 2. SimTube simulator materials and assembly.

Figure 3. (A) Objective Structured Assessment of Technical Skills (OSATS) assessment tool: Global Rating Scale. (B) OSATS assessment tool: Task Checklist.
Introductory information about the project was presented in 2014 at the Society of University Otolaryngologists and the AAO-HNSF Annual Meeting Simulation Interest Group meeting, as well as in 2015 at the Combined Otolaryngology Societies Meeting. The AAO-HNSF “ENTConnect” online communications portal was used to provide information regarding the project to interested programs, including a brochure detailing the steps in the study, instructions for enrolling institutions in the project, and IRB materials developed at NCH. Following this, a series of instructional webinars was conducted to educate otolaryngology faculty interested in participating as coinvestigators or raters in the project. These online sessions provided detailed study information, protocol review, and an opportunity for local primary investigators (PIs) to ask questions. Potential raters were provided with access to a series of video performances depicting various levels of competency in the M&T procedure. Each participating program had to include a faculty member to manage enrollment and at least 1 additional faculty member, blinded to the intervention or control status of the participant, to serve as a rater. Neither faculty nor residents were offered financial incentives.
In phase 1, beginning in July 2015, all otolaryngology trainees participating in the study were required to watch a tutorial video on M&T indications and placement. 3 Residents were then randomized to either simulation training (intervention) or no additional training (control). Using the low-cost simulator, their M&T skills were assessed using the OSATS as referenced above. The intervention group trained on the simulator for 1 hour. OSATS ratings occurred on the simulator before randomization and after initial simulator training. In phase 2, OSATS ratings were then performed for first possible live tube insertion and during successive live procedures until competency. Deidentified data were collected using the residents’ assigned random numbers and submitted by computer or mobile phone link via REDCap electronic data capture tools hosted at NCH.4,5 Original study subject inclusion criteria included residents in Accreditation Council for Graduate Medical Education (ACGME)–approved otolaryngology training programs who had not previously performed M&T insertion and consented to participate. Due to anticipated logistical difficulty in initiating subjects into the study before their first intraoperative M&T procedure, inclusion criteria were expanded (prior to initiating the study) to include residents who had performed previous M&T insertion but were still considered novices. Resident enrollment, including year of postgraduate training, was determined by the participating program.
Personal communication with David J. Brown, MD, an expert in otolaryngology education, revealed that trainees in past myringotomy and tympanostomy training programs required an average of 20 procedures to attain competence (personal communication, 2014). In the absence of baseline data, consultation with Brown and other experts in otolaryngology education suggested that a reduction of 5 procedures to reach competence would be an appropriate target for the simulation intervention. A sample size of 196 participants per group was calculated a priori for a total of 392 participants to provide 90% power (at α level of .01) to detect an average reduction of 5 surgeries to attain competence in the simulator group assuming the control group requires an average of 20 surgeries to attain competence.
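The arithmetic behind an a priori calculation of this kind can be sketched as follows. This is an illustration only: the report does not state which formula or variability assumption was used, so the normal-approximation formula for a two-sided comparison of two means, the function name `n_per_group`, and the standard deviation passed to it are all assumptions introduced here rather than the study's actual inputs.

```python
import math
from statistics import NormalDist


def n_per_group(delta, sd, alpha=0.01, power=0.90):
    """Per-group sample size for a two-sided, two-sample comparison of means.

    Uses the normal approximation n = 2 * ((z_{1-alpha/2} + z_{power}) * sd / delta)^2.
    `delta` is the difference to detect (e.g., 5 fewer procedures) and `sd` is the
    assumed common standard deviation (hypothetical; not reported in the source).
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_beta = z.inv_cdf(power)           # quantile corresponding to the target power
    return math.ceil(2 * ((z_alpha + z_beta) * sd / delta) ** 2)


# Hypothetical SD of 10 procedures, with the alpha and power stated in the text.
print(n_per_group(delta=5, sd=10, alpha=0.01, power=0.90))
```

Because the required sample size grows with the square of the assumed standard deviation, the result is highly sensitive to that assumption, which is one reason pilot or baseline data are valuable when planning such a study.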
The first outcome was performance on the simulator following the phase 1 intervention. Differences in the OSATS-Global and OSATS-Task scores were calculated for participants who completed both the pretest and posttest on the simulator (n = 54). Pretest OSATS by intervention group was evaluated using nonparametric Wilcoxon-Mann-Whitney tests. Generalized linear models were used to estimate the effect of the intervention on posttest OSATS after adjusting for pretest OSATS.
Subsequently, during phase 2, we evaluated the effect of the intervention on live surgery performance. Because of substantial breaches in the study protocol, we limited this analysis to participants who had at least 1 evaluation on the simulator and at least 1 evaluation during a live surgery (n = 45). OSATS scores and the number of previous live surgeries for the first evaluated live surgery were analyzed using Wilcoxon-Mann-Whitney tests. Categorical resident and surgery characteristics were evaluated with χ2 and Fisher exact tests. Generalized linear models were again used to estimate the effect of the intervention on OSATS scores after adjusting for the number of previous live surgeries.
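For readers less familiar with the Wilcoxon-Mann-Whitney test used for these group comparisons, a minimal sketch is shown here. It is not the study's code (the analyses were run in SAS); the function name `mann_whitney_u` and the scores are hypothetical, and the P value uses a tie-corrected normal approximation without continuity correction, which is only a rough guide for very small samples.

```python
import math
from statistics import NormalDist


def mann_whitney_u(x, y):
    """Return (U statistic for x, two-sided P value) using midranks for ties."""
    # Pool both samples, tagging each value with its group (0 = x, 1 = y).
    pooled = sorted((v, g) for g, sample in enumerate((x, y)) for v in sample)
    n1, n2 = len(x), len(y)
    n = n1 + n2
    rank_sum_x, tie_term, i = 0.0, 0.0, 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1                       # pooled[i:j] share one value
        midrank = (i + 1 + j) / 2        # average of ranks i+1 .. j
        t = j - i
        tie_term += t ** 3 - t
        rank_sum_x += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u - mean_u) / math.sqrt(var_u)  # undefined if every value is tied
    return u, 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical OSATS-Global scores (range 5-25) for two small groups.
intervention = [18, 20, 15, 17, 21]
control = [13, 14, 16, 12, 15]
u, p = mann_whitney_u(intervention, control)
print(f"U = {u}, P = {p:.3f}")
```

The test compares rank sums rather than means, which suits bounded ordinal scores such as the OSATS that may not be normally distributed.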
We evaluated the effect of the intervention on the rate at which residents achieve competence among participants who had at least 1 evaluation on the simulator and at least 1 evaluation during a live surgery (n = 45). Raters determined competence subjectively based on the following definition: “ready to perform procedure on own without supervision,” and the number of previous live surgeries was the “timing” of competence. We conducted bivariate analyses assessing unadjusted relationships between competence, the intervention, and various resident characteristics. We estimated survival, or in this case the rate of achieving competence, by intervention group with an unadjusted Kaplan-Meier curve. Finally, we conducted a Cox proportional hazards regression to assess the relationship between the intervention and the rate of achieving competence, adjusting for covariates. For the Cox model, the time scale was the number of live surgeries that residents had completed at the time of evaluation. The benefits of a Cox model for this study question were the ability to account for right censoring (loss to follow-up) that occurred among residents at different points in their training and the ability to model the rate of achieving competence rather than merely ever achieving competence. Statistical analyses were conducted in SAS Enterprise 7.15 (SAS Institute, Cary, North Carolina) and an α level of .05 was used to determine statistical significance.
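The Kaplan-Meier estimate described above, with the number of completed live surgeries as the time scale, can be sketched as follows. The function name `kaplan_meier` and the data are hypothetical and serve only to show how right-censored residents (those lost to follow-up before being rated competent) remain in the risk set until their last evaluation.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate on a 'number of surgeries' time scale.

    times:  surgeries completed when competence was first rated, or at last evaluation
    events: True if competence was observed, False if the resident was censored
    Returns [(t, S(t))], where S(t) is the estimated proportion NOT yet rated
    competent after t surgeries.
    """
    surv, curve = 1.0, []
    for t in sorted({ti for ti, e in zip(times, events) if e}):
        at_risk = sum(1 for ti in times if ti >= t)            # still under observation
        achieved = sum(1 for ti, e in zip(times, events) if ti == t and e)
        surv *= 1 - achieved / at_risk
        curve.append((t, surv))
    return curve


# Hypothetical residents: three reached competence; two were censored.
times = [3, 5, 5, 8, 10]
events = [True, True, False, True, False]
for t, s in kaplan_meier(times, events):
    print(f"after {t} surgeries: {1 - s:.2f} estimated to have reached competence")
```

A Cox model extends this idea by relating the rate of achieving competence to covariates such as intervention group and OSATS score while handling censoring in the same way.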
Resident characteristics (sex, handedness) and opinions were evaluated with a survey at the end of data collection. Just 24 residents participated in this survey, and as a result, there is significant missingness in the data. Descriptive statistics are used to evaluate these results.
Results
Initially, 65 residency programs expressed an interest in participating, although not all completed resident ratings. In total, 105 raters provided assessments of the study subjects’ performance, with a median of 2 ratings per rater (range, 1-74) (Figure 4).

Figure 4. Number of ratings per rater.
A total of 101 residents were enrolled; however, just 63 residents had sufficient data to be analyzed (Table 1). Randomization to treatment was effective in that no baseline differences in sex, postgraduate year (PGY), or handedness were detected between the intervention and control groups. Handedness was included as a demographic since it is a common parameter associated with surgical performance. However, no participants in the study identified as left-handed.
Table 1. Study Subject Characteristics (N = 63).
Abbreviation: PGY, postgraduate year.
Phase 1: Simulation
In phase 1, 54 residents underwent OSATS evaluations before and after the simulation. The intervention and control groups received comparable OSATS scores on the pretest (P = .15 for OSATS-Task and P = .90 for OSATS-Global); however, posttest scores were notably higher in the intervention group (Figure 5). Mean (SD) OSATS-Global scores were 17.4 (3.5) after the simulation in the intervention group compared to 13.7 (3.3) in the control group (P = .0003). There was also a benefit in OSATS-Task scores in the intervention group, such that the mean (SD) score in the intervention group was 4.5 (0.8) compared to 3.6 (1.4) in the control group (P = .02). Generalized linear models suggest a positive effect of the intervention on posttest OSATS after adjusting for pretest OSATS (P < .0001 for OSATS-Global and P = .04 for OSATS-Task).

Figure 5. Mean Objective Structured Assessment of Technical Skills (OSATS) score for the intervention (simulation) and control (no simulation) groups. *Significant difference.
Phase 2: Intraoperative Performance
During phase 2, a subsequent intraoperative case was evaluated by a rater at the institution. Forty-five residents who participated in phase 1 and also had an evaluation of a live surgery (phase 2) were included in this analysis. OSATS scores from the first evaluated intraoperative M&T were comparable in the intervention and control groups (P = .41 for OSATS-Task and P = .73 for OSATS-Global) (Figure 6). The number of previous live cases ranged from 1 to 18 at the time of the first evaluation, and the mean (SD) number of live cases performed prior to the first rating was 4 (3.0) in the control group and 2.9 (4.3) in the intervention group. In a multivariate regression model, we assessed whether this difference in previous live surgeries contributed to the null effect of the intervention on OSATS score. There was no significant effect of the intervention on OSATS-Global (P = .82) or OSATS-Task (P = .53) after adjusting for the number of previous live surgeries.

Figure 6. Number of live cases and mean Objective Structured Assessment of Technical Skills (OSATS) scores at first intraoperative assessment.
Competence
Of the 45 participants with an assessment during both phases 1 and 2, 32 (71%) were evaluated as competent during at least 1 surgery. The mean (SD) number of live surgeries at the time competence was achieved was 13.2 (11.6), but the group that did not achieve competence had a mean (SD) of 7.5 (4.8) previous cases to date. Residents in the intervention group were no more likely to be evaluated as competent (P = .95), although competence was strongly associated with OSATS scores. Residents who reached competence had mean OSATS global scores of 20.3 and task scores of 4.9 (vs 13.2 and 3.5, respectively).
Multivariate survival analysis confirmed the findings of the bivariate analysis, suggesting that OSATS scores were a key driver of competency (Table 2). A 1-unit increase in OSATS-Global score was associated with a 10% increase in the rate of achieving competence (hazard ratio [HR], 1.10; confidence interval [CI], 1.05-1.16). Residents in the intervention group who achieved competence had fewer previous cases on average compared to those in the control group (10.7 vs 14.7), but the intervention was not significantly associated with a faster rate (ie, fewer procedures needed) of achieving competence (P = .16).
Table 2. Multivariate Survival Analysis of OSATS Scores (n = 45).
Abbreviations: OSATS, Objective Structured Assessment of Technical Skills; PGY, postgraduate year.
Survey
Twenty-seven residents completed learning evaluations, 12 of whom were in the intervention group. All 12 residents suggested that the simulator was somewhat or very useful. One resident commented that “the most valuable part was getting used to using the instruments. The model itself wasn’t the most realistic but definitely served its purpose.” When asked, “What would you change?” residents commented that a model with an ear replica with an external auditory canal, different angulations of the tympanic membrane, and different tube sizes would have been helpful. Also, residents commented on the desire for feedback during the practice hour with an attending and, specifically, more help with visualizing the tympanic membrane through the microscope.
Discussion
Hypothesis 1 addressed feasibility. Interest was expressed by approximately two-thirds of potentially participating programs across the United States and Canada, 6 and based on the assumption that an individual resident might initially participate during either PGY1 or PGY2 (but not both), almost a third of potential subjects were evaluated. 6 These results suggest an interest in initiating national simulation training research protocols. After obtaining initial IRB approval from NCH, the sponsoring institution, there was little difficulty obtaining IRB approval at most additional participating institutions. Some local IRBs required only the letter of approval from NCH; others reviewed the entire protocol prior to approval.
Other medical training programs have also demonstrated an interest in national simulation training.7-9 Evaluations of the impact of educational interventions are challenging in individual otolaryngology programs, which generally include only 1 to 5 residents per year. This challenge has been described in other subspecialty surgical simulation reviews that cite small sample size and lack of methodologic rigor as undermining factors in evaluation efforts to date.10,11 In addition, as the bulk of literature surrounding simulation in otolaryngologic training describes the feasibility of simulation models rather than their impact, there is a relative paucity of information on simulation training’s widespread efficacy.12-18 Thus, the development of research collaboratives, such as the one described in this study, provides the potential for more substantial and definitive evaluations of educational interventions.
Hypothesis 2a posited that novices in the intervention (simulation) group will demonstrate improved proficiency when tested using the simulator immediately following the simulator-based training. This was based on the finding that surgical simulation training has the greatest benefit for novice trainees compared with experts.19,20 We observed improvement in both the task-based OSATS and global OSATS scores. Task-based OSATS share characteristics of checklists, 2 which are often used to ensure at least minimum standards. Global rating scores were most affected by the simulation; they tend to capture the process in a more holistic manner and may reflect fluency and higher-level skills, such as anticipation. Results detailing outcomes in task-specific and global rating scores following simulation training vs conventional methods have been conflicting.1,21,22 However, this study represents the largest prospective evaluation demonstrating improvement in both indicators.
Contrary to hypothesis 2b, novices in the intervention arm did not achieve higher scores on initial intraoperative OSATS; contrary to hypothesis 3, there was no difference between the groups in the rate of achieving competency. There are several potential explanations for these findings.
Task Characteristics
It is possible that M&T skills are relatively simple and can be rapidly acquired regardless of training activities. Our results suggest that residents reach competency after a mean (SD) of 13 (11.6) cases.
The intraoperative evaluation occurred, on average, at the fourth surgical case; it is possible that the learning curve from the previous cases overwhelmed any skill differences that may have been present at the initial surgical case.
Data Characteristics
It is possible that time intervals between study enrollment and the first evaluated surgical M&T confounded results. The mean (SD) time between the simulation and the most recent live surgery was 3 (3.5) months (range, 0-14 months).
It is possible that enrollment numbers were insufficient to detect differences. Some residents may have reached competence earlier than indicated in the data, but this is undetectable because of lack of evaluations for each surgery.
The median number of live surgery evaluations was 2. Some residents with few evaluations may have reached competence, but subsequent evaluations are not available.
Simulator Characteristics
It is possible that the simulation intervention was too modest or unrealistic.
Several participants received evaluations indicating competence followed by subsequent ratings of not competent. This may reflect subjectivity by the rater, or it may reflect natural variations in skills over time. It is possible that a resident could perform competently on his or her third M&T in 1 day and not as well if the next opportunity for M&T placement involves a patient with more challenging anatomy or does not occur for a prolonged period of time. Notably, deterioration of benefits garnered from simulation training has been previously noted.23,24
When the full study group (both intervention and control subjects) is analyzed, subjects with higher OSATS scores developed competence more quickly. This finding is reflective of the current literature on this topic.1,25,26 A 1-point difference in global OSATS score translated into a 10% faster rate of gaining competence. This hazard ratio does not indicate who is more likely to become competent but rather who is likely to gain competence faster. Because most residents will gain competence during their residency, the more informative question is whether the intervention sped up the rate at which residents became competent.
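To make the per-point hazard ratio more concrete, the sketch below scales it to a larger score difference. It assumes the usual Cox model structure, in which the covariate acts linearly on the log-hazard scale, so a k-point difference corresponds to the per-point hazard ratio raised to the power k. The 5-point difference and the function name `scale_hazard_ratio` are illustrative choices rather than values or methods from the study, and extrapolating beyond the observed score range would not be justified.

```python
def scale_hazard_ratio(hr_per_unit, units):
    """Hazard ratio for a `units`-point covariate difference, assuming a
    log-linear covariate effect (as in a standard Cox model)."""
    return hr_per_unit ** units


# Reported estimate and confidence limits per 1-point OSATS-Global difference.
hr, lower, upper = 1.10, 1.05, 1.16

k = 5  # hypothetical 5-point difference between two residents
print(f"HR for a {k}-point difference: {scale_hazard_ratio(hr, k):.2f} "
      f"({scale_hazard_ratio(lower, k):.2f}-{scale_hazard_ratio(upper, k):.2f})")
```

Under that assumption, the confidence limits scale in the same way because they are computed on the log-hazard scale.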
Residents appreciated the opportunity to practice but requested additional features for the simulator and additional guidance during their practice. It is likely that adding physical features such as external ears would increase the realism of the simulator. With the increasing availability of silicone molds and 3-dimensional printing,18,27 it may become easier to create such models in an inexpensive manner.
The most important limitation of this study was the low number of subjects, despite continuing enrollment for 3 years. The study was closed because of declining interest. Funding for rater or resident incentives, or the use of a program coordinator or research assistant to manage enrollment at each site, may have facilitated overall execution of the study. In addition, despite efforts to create a simple protocol, there was insufficient clarification of the protocol and inadequate investigator training. The protocol also required a blinded rater. The local PI, who would likely be the most motivated to execute the study, was responsible for assigning subjects to the different arms and would therefore have been recused from performing the ratings, even though the local PI would be the most likely to have watched the ratings videos and therefore the best trained to execute the OSATS.
Although study participants were novice residents, there was a lack of information about resident surgical experience prior to inclusion in the study. The low-cost simulation model may not have been a sufficient training model, despite previous validation. Finally, raters were not required to review rater training videos.
Differences in the degree or quality of teaching during live surgeries could create variance within raters and within institutions. A mixed-effects model, taking the rater into account, did not change the findings; however, this variance still could have influenced which residents reached competence and the rate at which they did so.
Conclusions
A specialty-wide multi-institutional simulation study is challenging but feasible. Novices trained using a low-cost M&T simulator, compared with standard training, achieved higher scores on the simulator but not during initial intraoperative OSATS. Novices trained on the simulator did not reach competency sooner than those not trained on the simulator. The additional finding that residents with higher global OSATS scores developed competence more quickly aligns with other literature and may provide guidance for the design of future studies.
Footnotes
Acknowledgements
The authors appreciate the generous contributions of the participating otolaryngology faculty and residents, as well as the essential administrative support of the American Academy of Otolaryngology–Head and Neck Surgery Foundation. Faculty who contributed are listed alphabetically: Sarah N. Bowe, Jennifer Brinkmeier, Jessica Campbell, Swapna Chandran, Dary Costa, Earl Harley, Evelyne Kalyoussf, Brian Kellermeyer, Eleanor P. Kiell, Adrienne M. Laury, Meredith Lind, Diana Mahalbasic, Bruce H. Matt, Anna Messner, Ted A. Meyer, Michel Nassar, Shaun A. Nguyen, Carol Nhan, Huma Quraishi, Hassan Ramadan, Scott Walen, Drik K. Weitzel, Christina Yang, and Yu-Lan Mary Ying. The residents and additional faculty participated anonymously. We also appreciate the helpful suggestions provided by anonymous reviewers.
Paper presented at the Scientific Oral Presentations at the American Academy of Otolaryngology–Head and Neck Surgery 2019 Annual Meeting; September 16, 2019; New Orleans, Louisiana.
