Abstract
Objective
There is concern that current otolaryngology residents may not receive adequate surgical training. We aimed to characterize residents’ surgical experiences at 5 academic centers in performing the 14 key indicator procedures (KIPs) outlined by the Accreditation Council for Graduate Medical Education.
Study Design
Prospective study.
Setting
Five otolaryngology training programs.
Methods
Data were gathered from December 2019 to December 2020 with a smartphone application from the Society for Improving Medical Professional Learning. After each operation, residents and faculty rated trainee autonomy on a 4-level Zwisch scale and performance on a 5-level modified Dreyfus scale.
Results
Residents and attendings (n = 92 and 78, respectively) logged 2984 evaluations. Attending ratings of resident autonomy and performance increased with training level (P < .001). Resident self-assessments of autonomy and performance were lower than paired attending assessments (P < .001). Among attending evaluations of KIPs performed by senior residents (postgraduate year 4 or 5), 55% of cases were performed with meaningful autonomy (passive help or supervision only). Similarly, attendings rated 55% of these cases as a practice-ready or exceptional performance. Senior residents had meaningful autonomy for ≥50% of cases for most KIPs, with the exception of flaps and grafts (40%), pediatric/adult airway (39%), and stapedectomy/ossiculoplasty (33%). Similarly, senior residents received practice-ready or exceptional performance ratings for ≥50% of cases across all KIPs other than pediatric/adult airway (42%) and stapedectomy/ossiculoplasty (33%).
Conclusion
In this multicenter study, resident surgical autonomy and performance varied across otolaryngology KIPs. The development of nationwide benchmarks will help programs and residents set educational goals.
Level of evidence
2.
The operative experience in modern surgical training has changed amid growing emphasis on patient safety, adequate trainee supervision, and compliance with duty hour restrictions. Recent studies have sounded an alarm: a survey of general surgery programs suggested that up to 25% of graduating residents are not confident performing a variety of open surgical procedures. 1 The number of procedures performed has expanded, while the number of operations that residents perform has stayed roughly the same. 2 Other than the Accreditation Council for Graduate Medical Education (ACGME) key indicator procedure (KIP) minima and the basic skills descriptions embedded within ACGME Milestones criteria, there is currently little standardization of operative experiences among surgical training programs.
In the face of growing concerns about the surgical confidence and competence of graduating trainees, there exists a need to develop new instruments to evaluate and track surgical experiences. A novel smartphone-based application, SIMPL (Society for Improving Medical Professional Learning), has allowed for rapid, structured assessments of resident surgical autonomy and operative performance. The use of this tool has been adapted for numerous surgical specialties3,4 and has been piloted at a single otolaryngology residency.5,6 Herein, we sought to quantitatively characterize the state of otolaryngology training nationally in a multicenter study of resident operative experiences.
Methods
We prospectively collected operative assessments of surgical trainees between December 2019 and December 2020 on the SIMPL smartphone application. Participants at each of 5 centers underwent standardized rater training sessions, either in person or with a prerecorded video. These 1-hour sessions were previously demonstrated to be sufficient to train surgeons to reliably use SIMPL. 7 For each procedure, participants were asked to assess the complexity of the case relative to similar cases in tertiles (easiest, average, and hardest). Residents and faculty were also asked to assess the level of autonomy and performance achieved by the resident during each operation. Autonomy was rated on a 4-level Zwisch scale: show and tell, active help, passive help, and supervision only (see Table 1 for descriptions).5,8 Meaningful autonomy was defined as passive help or supervision only. 3 If autonomy was rated higher than show and tell, performance was evaluated on a 5-level modified Dreyfus scale: unprepared/critical deficiency, inexperienced with procedure, intermediate performance, practice-ready performance, exceptional performance (see Table 1 for descriptions). 5 For cases with multiple unrelated components (eg, mastoidectomy performed with a parotidectomy), participants were asked to log only 1 of these procedures as the focus of each assessment. At the end of the evaluation, attendings could provide an audio recording of feedback to the resident about the case. All assessments were completed or expired within 72 hours of the reported procedure time, as prior research has described the degradation of evaluation quality after this time. 9 When residents increased their postgraduate year (PGY) standing in July of the study, this was reflected in the application database. 
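The rating rules described above can be sketched in code. This is a minimal illustration, not the SIMPL application's actual data model: the numeric codes assigned to each level and all function names are hypothetical, chosen only so that the scales sort from lowest to highest.

```python
from enum import IntEnum
from typing import Optional


class Zwisch(IntEnum):
    """4-level autonomy scale; the integer codes are assumed, not from SIMPL."""
    SHOW_AND_TELL = 1
    ACTIVE_HELP = 2
    PASSIVE_HELP = 3
    SUPERVISION_ONLY = 4


class Dreyfus(IntEnum):
    """5-level modified performance scale; the integer codes are assumed."""
    UNPREPARED = 1
    INEXPERIENCED = 2
    INTERMEDIATE = 3
    PRACTICE_READY = 4
    EXCEPTIONAL = 5


def is_meaningful_autonomy(autonomy: Zwisch) -> bool:
    # Meaningful autonomy was defined as passive help or supervision only.
    return autonomy >= Zwisch.PASSIVE_HELP


def performance_is_solicited(autonomy: Zwisch) -> bool:
    # Performance was rated only if autonomy was higher than show and tell.
    return autonomy > Zwisch.SHOW_AND_TELL


def is_practice_ready(performance: Optional[Dreyfus]) -> bool:
    # None stands for "not applicable" (no performance rating solicited).
    return performance is not None and performance >= Dreyfus.PRACTICE_READY
```

Under these assumptions, a case rated "active help" would have a performance rating solicited but would not count toward meaningful autonomy.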
This study was deemed exempt from review by the Institutional Review Board at each study site (Massachusetts Eye and Ear, University of California San Francisco, Boston University, University of Michigan, Mount Sinai Hospital).
Table 1. Levels of Resident Autonomy and Performance Measured by the SIMPL Smartphone Application.a
Abbreviation: SIMPL, Society for Improving Medical Professional Learning.
a Adapted from Chen et al. 6
Statistical Analysis
Descriptive statistics were used to examine SIMPL data. Correlations between PGY level and studied outcomes (attending ratings of autonomy, performance, case complexity) were calculated with a Spearman’s rank correlation coefficient. Attending surgeons’ ratings were used in analyses unless otherwise indicated. A comparison of resident self-evaluations against attending ratings was conducted via Wilcoxon matched-pairs signed rank tests. PGY 1 residents were excluded from analyses where indicated due to limited data collection. Statistical analyses were performed with Prism version 7.01 (GraphPad).
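For readers unfamiliar with the correlation measure, the following is a minimal pure-Python sketch of Spearman’s rank correlation for ordinal ratings with many ties. It is illustrative only: the study's analyses were run in Prism, the function names are hypothetical, and no P value is computed here.

```python
from math import sqrt
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of 1-based positions in the run
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the tie-averaged ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return cov / sqrt(var_x * var_y)
```

Here `x` could hold the PGY level for each case and `y` the attending's coded autonomy rating. Averaging tied ranks matters for this kind of data, because a 4- or 5-level scale produces many identical values.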
Results
A total of 2984 evaluations were logged by 92 residents and 78 attendings, covering 367 types of procedures. Residents logged 1746 evaluations and attendings logged 1238. Each resident logged a median of 10.5 evaluations (range, 1-86; right-skewed); each attending logged a median of 10 (range, 1-83; right-skewed). Fifty-four percent of residents and 55% of faculty logged at least 10 cases.
Across all cases, attending ratings of resident operative autonomy were positively correlated with level of training (Spearman’s ρ = 0.36, P < .001; Figure 1A ). Similarly, ratings of resident operative performance were positively correlated with level of training (Spearman’s ρ = 0.25, P < .001; Figure 1B ). Participants rated 11% (n = 339) of all cases as among the easiest one-third in case complexity, 68% (n = 2020) as average, and 21% (n = 625) as hardest. There was a significant correlation between attending ratings of case complexity and level of training (Spearman’s ρ = 0.15, P < .001; Figure 1C ).

Figure 1. Attending surgeon ratings for (A) resident autonomy, (B) performance, and (C) case complexity across training levels. Postgraduate year 1 (PGY 1) was excluded as the number was small. *Not applicable: performance ratings not solicited if autonomy was rated “show and tell.”
Among all cases performed by senior residents (PGY 4 or 5), attendings reported that residents achieved meaningful autonomy (passive help or supervision only) in 62% of cases and practice-ready or exceptional performance ratings in 61%.
Key Indicator Procedures
Attendings evaluated 517 cases classified by the ACGME as otolaryngology KIPs, of which 276 were performed by senior residents (PGY 4 and 5). Attending ratings of resident operative autonomy and performance in KIPs were positively correlated with training level (autonomy, Spearman’s ρ = 0.46, P < .001; performance, Spearman’s ρ = 0.43, P < .001). Attending ratings of case complexity for KIPs were also correlated with training level (Spearman’s ρ = 0.11, P = .01).
Senior residents (PGY 4 or 5) achieved meaningful autonomy (passive help or supervision only) in 55% of KIP cases ( Figure 2A ); similarly, they received practice-ready or exceptional performance ratings for 55% ( Figure 2B ). Senior residents achieved meaningful autonomy for ≥50% of cases for most KIPs with the exception of flaps and grafts (40%), pediatric/adult airway (39%), and stapedectomy/ossiculoplasty (33%). Senior residents received practice-ready or exceptional performance ratings for ≥50% of cases across KIPs other than pediatric/adult airway (42%) and stapedectomy/ossiculoplasty (33%).

Figure 2. Percentage of key indicator procedures performed by postgraduate year 4 and 5 (PGY 4 and 5) residents with (A) meaningful autonomy (passive help or supervision only) or (B) at least practice-ready operative performance in attending surgeon ratings. M/M, mandible/midface.
Resident Self-assessments
Overall, 1181 cases had paired evaluations completed by both the resident and the attending. Resident self-assessments of autonomy, performance, and case complexity were all significantly lower than paired attending ratings (P < .001 for all). For autonomy and performance, this difference was significant at every analyzed PGY level (PGY 2-5, P < .05 for each). For assessments of case complexity, PGY 2 and 3 residents’ ratings were not significantly different from paired attending ratings (P > .05 for each), whereas PGY 4 and 5 residents tended to underestimate the difficulty of cases as compared with attendings (P < .001 and P = .02, respectively).
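The paired comparison rests on signed ranks of the within-case differences. The sketch below computes only the positive and negative rank sums that underlie a Wilcoxon matched-pairs signed rank test; it assumes ratings are coded as integers, drops zero differences (one common convention), and does not compute a P value. The function name is hypothetical and this is not the Prism procedure used in the study.

```python
from typing import Sequence, Tuple


def signed_rank_sums(
    resident: Sequence[int], attending: Sequence[int]
) -> Tuple[float, float, int]:
    """Return (W_plus, W_minus, n_nonzero) for paired ordinal ratings."""
    if len(resident) != len(attending):
        raise ValueError("ratings must be paired one-to-one")
    # Difference per case; pairs with identical ratings are discarded.
    diffs = [r - a for r, a in zip(resident, attending) if r != a]
    magnitudes = [abs(d) for d in diffs]
    order = sorted(range(len(diffs)), key=lambda i: magnitudes[i])
    ranks = [0.0] * len(diffs)
    start = 0
    while start < len(order):
        end = start
        # Tied magnitudes share the mean of the ranks they occupy.
        while end + 1 < len(order) and magnitudes[order[end + 1]] == magnitudes[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    w_plus = sum(rank for rank, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(rank for rank, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus, len(diffs)
```

With differences taken as resident minus attending, a negative rank sum that is much larger than the positive one corresponds to residents rating themselves lower than their attendings did.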
Discussion
This is the largest prospective multi-institutional study tracking otolaryngology surgical training experiences to date. Senior residents (PGY 4 and 5) in this cohort attained meaningful autonomy and achieved practice-ready performance for the majority of all logged procedures (62% and 61%, respectively) and of ACGME-defined KIPs (55% for both). However, residents had a lower proportion of cases with meaningful autonomy for specific KIPs, such as flaps and grafts, pediatric/adult airway, and stapedectomy/ossiculoplasty. Larger studies will allow subsets of senior residents to be studied, such as those in their last 3 to 6 months of residency or those going into fellowships versus general practice. The use of a novel assessment strategy like SIMPL to track resident operative experiences facilitates the identification of such areas for possible targeted educational interventions. Such interventions could include changes to case assignments to emphasize specific procedures for certain learners, feedback to faculty on areas in which residents broadly need further experience, and supplemental educational activities in the cadaver or simulation laboratory.
The ability to track operative experiences will be vital to the modernization of surgical training. Currently, residency program directors and accreditation bodies are tasked with overseeing the progression of residents’ surgical skills with limited information about their day-to-day operative experiences. Individual residents also have limited ability to reflect on the longitudinal trajectory of their surgical experiences, making it difficult to set goals and create learning plans. Such information would complement case logs, which are commonly used to evaluate programs yet may be inaccurate and uncorrelated with actual competency.10,11 More detailed assessments, such as Objective Structured Assessments of Technical Skills, are infrequently conducted at programs that use them and may not be directly comparable to the scope of training experiences that SIMPL aims to track. In the field of general surgery, for example, SIMPL is currently used at 68 residency programs, allowing for pooling of data across institutions, which could drive research initiatives and policy decisions.12,13
By using SIMPL to compare personal assessments with regional and national benchmarks, residents who need remediation may be more readily identified by themselves and their program directors; this is especially important as studies have shown that trainees who perform the worst in multiprofessional and surgical skills assessments are also the least able to identify their weaknesses.14,15 A recent survey of otolaryngology program directors indicated that while 37.5% of programs have residents who require remediation, 16 there is currently no consensus on what constitutes an evidence-based remediation strategy. The ability to track assessments over time would allow trainees and program directors to identify specific areas for improvement. Attending surgeons and program directors would additionally receive actionable feedback. For example, attending surgeons may learn that they give significantly less autonomy than other colleagues, or a program may discover that it has a deficit of experience in particular procedures or subspecialties.
In addition to the identification of low-performing outliers, the ability to compare attending and trainee assessments is critical. This study corroborates the existing literature that surgical residents, particularly senior residents, consistently underestimate their own skills as compared with attending surgeons’ assessments,6,15 which is problematic since self-reflection is critical for personal improvement. Interestingly, the present study found that senior residents also underestimated the difficulty of their cases as compared with paired attending ratings. We suspect that this may be a result of where these residents train, at large academic centers, where the average case complexity is already high. Senior residents are routinely assigned to the most advanced cases and can lose perspective on how complex these cases are relative to the full spectrum of such procedures in the community. The ability to generate reliable self-assessments should be deliberately refined during training, as it has implications for the self-confidence, self-regulation, and long-term skills development of an independent practitioner. Novel educational instruments that longitudinally compare self-assessments with those of trusted surgical mentors are a possible first step toward cultivating this skill.
Study Limitations
A number of limitations exist in our study design and timing. First, this study was conducted at 5 large academic training centers. Our results may not generalize to other residency programs with different attributes (residency size, case mix, academic versus community setting, etc). Second, logged data could be confounded by recall bias because of the difficulty of separating the procedure to be evaluated from the context of the larger surgical procedure performed. For example, a resident who performs the initial mastoidectomy during surgery with what ultimately becomes a difficult translabyrinthine approach to the skull base may evaluate their autonomy and performance from memory of the more difficult portions of the case. Furthermore, as this study found, resident and attending ratings of operative experiences are not always equivalent. This is a limitation inherent to all research on educational assessments. Third, all evaluations were voluntary and could have been susceptible to selection bias. It is unknown what proportion of all cases performed by residents/attendings was logged on SIMPL and whether evaluated cases were representative. Fourth, this study used SIMPL, which is an application that cannot be customized in its assessment design. We chose to use this application for an annual fee (presently ~$2000 for a program of 25 residents) for its useful features rather than designing our own data collection instrument. For example, residents and faculty could visualize graphs of their autonomy/performance on the interface. By using SIMPL, we were also contributing to the wider research collaborative that now spans >120 training programs in the United States across surgical specialties.3,4,17,18 Last, this study occurred during the SARS-CoV-2/COVID-19 pandemic, which may have had significant effects on resident training, such as reductions in overall case numbers.
Therefore, this study may underestimate the average trainee’s operative assessments during a nonpandemic year, a conclusion that may be reassuring to many surgical educators who are concerned about the impact of COVID-19 on surgical training. However, 1 of the study sites (Massachusetts Eye and Ear) participated in a single-center pilot study with SIMPL during a prepandemic year with comparable outcomes. 6
Conclusion
In this multi-institutional study, senior otolaryngology residents achieved ratings of meaningful surgical autonomy and practice-ready performance in the majority of cases logged for most KIPs. The future development of nationwide benchmarks will help programs and residents set specific educational goals and personalize training plans.
Footnotes
Acknowledgements
We are profoundly grateful to all the residents and faculty who participated in this study at each institution and to the faculty, trainees, and institutional members who collectively sustain the work of the Society for Improving Medical Professional Learning.
This article will be presented at the AAO-HNSF Annual Meeting & OTO Experience; October 6, 2021; Los Angeles, California.
