Multi-institutional Study of Otolaryngology Resident Intraoperative Experiences for Key Indicator Procedures

Abstract

Objective

There is concern that current otolaryngology residents may not receive adequate surgical training. We aimed to characterize residents’ experiences performing the 14 key indicator procedures (KIPs) outlined by the Accreditation Council for Graduate Medical Education at 5 academic centers.

Study Design

Prospective study.

Setting

Five otolaryngology training programs.

Methods

Data were gathered from December 2019 to December 2020 with a smartphone application from the Society for Improving Medical Professional Learning. After each operation, residents and faculty rated trainee autonomy on a 4-level Zwisch scale and performance on a 5-level modified Dreyfus scale.

Results

Residents and attendings (n = 92 and 78, respectively) logged 2984 evaluations. Attending ratings of resident autonomy and performance increased with training level (P < .001). Resident self-assessments of autonomy and performance were lower than paired attending assessments (P < .001). Among attending evaluations of KIPs performed by senior residents (postgraduate year 4 or 5), 55% of cases were performed with meaningful autonomy (passive help or supervision only). Similarly, attendings rated 55% of these cases as a practice-ready or exceptional performance. Senior residents had meaningful autonomy for ≥50% of cases for most KIPs, with the exception of flaps and grafts (40%), pediatric/adult airway (39%), and stapedectomy/ossiculoplasty (33%). Similarly, senior residents received practice-ready or exceptional performance ratings for ≥50% of cases across all KIPs other than pediatric/adult airway (42%) and stapedectomy/ossiculoplasty (33%).

Conclusion

In this multicenter study, resident surgical autonomy and performance varied across otolaryngology KIPs. The development of nationwide benchmarks will help programs and residents set educational goals.

Level of evidence

Keywords

surgical education, competency, residency training, autonomy, performance

The operative experience in modern surgical training has changed amid growing emphasis on patient safety, adequate trainee supervision, and compliance with duty hour restrictions. Recent studies have sounded an alarm: a survey of general surgery programs suggested that up to 25% of graduating residents are not confident performing a variety of open surgical procedures.¹ The number of procedures performed has expanded, while the number of operations that residents perform has stayed roughly the same.² Other than the Accreditation Council for Graduate Medical Education (ACGME) key indicator procedure (KIP) minima and the basic skills descriptions embedded within ACGME Milestones criteria, there is currently little standardization of operative experiences among surgical training programs.

In the face of growing concerns about the surgical confidence and competence of graduating trainees, there exists a need to develop new instruments to evaluate and track surgical experiences. A novel smartphone-based application, SIMPL (Society for Improving Medical Professional Learning), has allowed for rapid, structured assessments of resident surgical autonomy and operative performance. The use of this tool has been adapted for numerous surgical specialties^3,4 and has been piloted at a single otolaryngology residency.^5,6 Herein, we sought to quantitatively characterize the state of otolaryngology training nationally in a multicenter study of resident operative experiences.

Methods

We prospectively collected operative assessments of surgical trainees between December 2019 and December 2020 on the SIMPL smartphone application. Participants at each of 5 centers underwent standardized rater training sessions, either in person or with a prerecorded video. These 1-hour sessions were previously demonstrated to be sufficient to train surgeons to reliably use SIMPL.⁷ For each procedure, participants were asked to assess the complexity of the case relative to similar cases in tertiles (easiest, average, and hardest). Residents and faculty were also asked to assess the level of autonomy and performance achieved by the resident during each operation. Autonomy was rated on a 4-level Zwisch scale: show and tell, active help, passive help, and supervision only (see Table 1 for descriptions).^5,8 Meaningful autonomy was defined as passive help or supervision only.³ If autonomy was rated higher than show and tell, performance was evaluated on a 5-level modified Dreyfus scale: unprepared/critical deficiency, inexperienced with procedure, intermediate performance, practice-ready performance, exceptional performance (see Table 1 for descriptions).⁵ For cases with multiple unrelated components (eg, mastoidectomy performed with a parotidectomy), participants were asked to log only 1 of these procedures as the focus of each assessment. At the end of the evaluation, attendings could provide an audio recording of feedback to the resident about the case. All assessments were completed or expired within 72 hours of the reported procedure time, as prior research has described the degradation of evaluation quality after this time.⁹ When residents advanced to the next postgraduate year (PGY) in July of the study period, the change was reflected in the application database.
This study was deemed exempt from review by the Institutional Review Board at each study site (Massachusetts Eye and Ear, University of California San Francisco, Boston University, University of Michigan, Mount Sinai Hospital).

Table 1.

Levels of Resident Autonomy and Performance Measured by the SIMPL Smartphone Application.^a

Zwisch scale of operative autonomy
Level	Rating	Characteristic
1	Show and tell	• Attending does the majority of key portions of the procedure, narrating the case and the anatomy • Resident opens, closes, first assists
2	Active help	• Attending leads the case for >50% of key portions, identifying anatomy, optimizing the surgical field, and coaching technical skills • Resident actively assists and practices component skills
3	Passive help	• Attending follows the resident’s lead for >50% of key portions and coaches for refinement • Resident recognizes transition points in the procedure and accomplishes next steps
4	Supervision only	• Attending gives no unsolicited advice for >50% of key portions, monitoring for safety • Resident mimics independence and recovers from most errors
Modified Dreyfus scale of operative performance
Level	Rating	Characteristic
1	Unprepared/critical deficiency	• Resident poorly prepared to perform the procedure
2	Inexperienced with procedure	• Resident is inexperienced with the procedure with frequent problems in technique, execution, planning
3	Intermediate performance	• Resident is at an intermediate phase of development; performance of procedural elements is variable but acceptable for the amount of experience with the procedure
4	Practice-ready performance	• Resident is able to perform the procedure safely and independently
5	Exceptional performance	• Resident performs above the level expected of graduating residents

Abbreviation: SIMPL, Society for Improving Medical Professional Learning.

^a Adapted from Chen et al.⁶
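For readers who wish to work with ratings of this kind, the two scales and the derived definitions from the Methods can be encoded as follows. This is a minimal sketch, not the SIMPL implementation: the class and function names are hypothetical, and treating an unsolicited performance rating as "not practice-ready" is an assumption made for illustration only.

```python
from enum import IntEnum
from typing import Optional


class Zwisch(IntEnum):
    """4-level Zwisch scale of operative autonomy (Table 1)."""
    SHOW_AND_TELL = 1
    ACTIVE_HELP = 2
    PASSIVE_HELP = 3
    SUPERVISION_ONLY = 4


class Dreyfus(IntEnum):
    """5-level modified Dreyfus scale of operative performance (Table 1)."""
    UNPREPARED = 1
    INEXPERIENCED = 2
    INTERMEDIATE = 3
    PRACTICE_READY = 4
    EXCEPTIONAL = 5


def is_meaningful_autonomy(autonomy: Zwisch) -> bool:
    """Meaningful autonomy: passive help or supervision only."""
    return autonomy >= Zwisch.PASSIVE_HELP


def performance_is_solicited(autonomy: Zwisch) -> bool:
    """Performance is rated only when autonomy exceeds show and tell."""
    return autonomy > Zwisch.SHOW_AND_TELL


def is_practice_ready(performance: Optional[Dreyfus]) -> bool:
    """Practice-ready or exceptional performance.

    Assumption for this sketch: a missing rating (None, ie, performance
    was not solicited) is counted as not practice-ready.
    """
    return performance is not None and performance >= Dreyfus.PRACTICE_READY
```

Using ordered integer levels keeps the ordinal nature of both scales explicit, which is what the rank-based statistics in this study rely on.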

Statistical Analysis

Descriptive statistics were used to examine SIMPL data. Correlations between PGY level and studied outcomes (attending ratings of autonomy, performance, case complexity) were calculated with a Spearman’s rank correlation coefficient. Attending surgeons’ ratings were used in analyses unless otherwise indicated. A comparison of resident self-evaluations against attending ratings was conducted via Wilcoxon matched-pairs signed rank tests. PGY 1 residents were excluded from analyses where indicated due to limited data collection. Statistical analyses were performed with Prism version 7.01 (GraphPad).
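As an illustration of the correlation analysis, Spearman’s rank correlation can be computed as the Pearson correlation of rank-transformed data, with tied values sharing their average rank. The sketch below uses only the Python standard library rather than Prism, so it should not be read as a reproduction of the study analysis; the function names and the example ratings are invented for illustration.

```python
from math import sqrt
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Assign 1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return cov / sqrt(var_x * var_y)


# Invented example (not study data): PGY level vs attending-rated Zwisch level.
pgy = [2, 2, 3, 3, 4, 4, 5, 5]
zwisch = [1, 2, 2, 2, 3, 2, 3, 4]
rho = spearman_rho(pgy, zwisch)
```

A rank-based coefficient suits these data because PGY level, the Zwisch scale, and the modified Dreyfus scale are all ordinal with many ties.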

Results

A total of 2984 evaluations were logged by 92 residents and 78 attendings, covering 367 types of procedures. Residents logged 1746 evaluations and attendings logged 1238. Each resident performed a median of 10.5 ratings (range, 1-86, right skew); each attending performed a median of 10 (range, 1-83, right skew). Fifty-four percent of residents and 55% of faculty logged at least 10 cases.

Across all cases, attending ratings of resident operative autonomy were positively correlated with level of training (Spearman’s ρ = 0.36, P < .001; Figure 1A). Similarly, ratings of resident operative performance were positively correlated with level of training (Spearman’s ρ = 0.25, P < .001; Figure 1B). Participants rated 11% (n = 339) of all cases as among the easiest one-third in case complexity, 68% (n = 2020) as average, and 21% (n = 625) as hardest. There was a significant correlation between attending ratings of case complexity and level of training (Spearman’s ρ = 0.15, P < .001; Figure 1C).

Figure 1.

Attending surgeon ratings for (A) resident autonomy, (B) performance, and (C) case complexity across training levels. Postgraduate year 1 (PGY 1) was excluded as the number was small. *Not applicable: performance ratings not solicited if autonomy was rated “show and tell.”

Among all cases performed by senior residents (PGY 4 or 5), attendings reported that residents achieved meaningful autonomy (passive help or supervision only) in 62% of cases and practice-ready or exceptional performance ratings in 61%.

Key Indicator Procedures

Attendings evaluated 517 cases classified by the ACGME as otolaryngology KIPs, of which 276 were performed by senior residents (PGY 4 and 5). Attending ratings of resident operative autonomy and performance in KIPs were positively correlated with training level (autonomy, Spearman’s ρ = 0.46, P < .001; performance, Spearman’s ρ = 0.43, P < .001). Attending ratings of case complexity for KIPs were also correlated with training level (Spearman’s ρ = 0.11, P = .01).

Senior residents (PGY 4 or 5) achieved meaningful autonomy (passive help or supervision only) in 55% of KIP cases (Figure 2A); similarly, they received practice-ready or exceptional performance ratings for 55% (Figure 2B). Senior residents achieved meaningful autonomy for ≥50% of cases for most KIPs with the exception of flaps and grafts (40%), pediatric/adult airway (39%), and stapedectomy/ossiculoplasty (33%). Senior residents received practice-ready or exceptional performance ratings for ≥50% of cases across KIPs other than pediatric/adult airway (42%) and stapedectomy/ossiculoplasty (33%).
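The per-procedure proportions above amount to grouping attending ratings by KIP and counting the share at passive help or higher. A minimal sketch of that tabulation follows, assuming each case is a (procedure name, Zwisch level 1 to 4) pair; the function names are hypothetical and the example ratings are invented, not study data.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

PASSIVE_HELP = 3  # lowest Zwisch level counted as meaningful autonomy


def pct_meaningful_by_kip(cases: Iterable[Tuple[str, int]]) -> Dict[str, float]:
    """Percentage of cases per procedure rated passive help or higher."""
    totals: Dict[str, int] = defaultdict(int)
    meaningful: Dict[str, int] = defaultdict(int)
    for kip, zwisch_level in cases:
        totals[kip] += 1
        if zwisch_level >= PASSIVE_HELP:
            meaningful[kip] += 1
    return {kip: 100 * meaningful[kip] / totals[kip] for kip in totals}


def below_threshold(pct_by_kip: Dict[str, float], threshold: float = 50) -> List[str]:
    """Procedures under the threshold, lowest percentage first."""
    flagged = [kip for kip, pct in pct_by_kip.items() if pct < threshold]
    return sorted(flagged, key=lambda kip: pct_by_kip[kip])


# Invented ratings for illustration only.
example_cases = [
    ("mandible/midface", 4),
    ("mandible/midface", 3),
    ("mandible/midface", 2),
    ("stapedectomy/ossiculoplasty", 1),
    ("stapedectomy/ossiculoplasty", 2),
    ("stapedectomy/ossiculoplasty", 3),
]
percentages = pct_meaningful_by_kip(example_cases)
flagged = below_threshold(percentages)
```

The same tabulation applies to performance ratings by swapping the Zwisch cutoff for the practice-ready level of the modified Dreyfus scale.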

Figure 2.

Percentage of key indicator procedures performed by postgraduate year 4 and 5 (PGY 4 and 5) residents with (A) meaningful autonomy (passive help or supervision only) or (B) at least practice-ready operative performance in attending surgeon ratings. M/M, mandible/midface.

Resident Self-assessments

Overall, 1181 cases had paired evaluations completed by both the resident and the attending. Resident self-assessments of autonomy, performance, and case complexity were all significantly lower than paired attending ratings (P < .001 for all). For ratings of autonomy and performance, this difference was significant at every analyzed PGY level (PGY 2-5, P < .05 for each). For assessments of case complexity, PGY 2 and 3 residents’ ratings were not significantly different from paired attending ratings (P > .05 for each), whereas PGY 4 and 5 residents tended to underestimate the difficulty of cases as compared with attendings (P < .001 and P = .02, respectively).
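The paired comparison can be illustrated with a small implementation of the Wilcoxon matched-pairs signed rank test. This is a sketch under stated assumptions rather than the procedure used by Prism: zero differences are dropped, tied absolute differences share average ranks, and the two-sided P value comes from a tie-corrected normal approximation without continuity correction, which is rough for small samples. The function name and example ratings are invented.

```python
from math import erfc, sqrt
from typing import Sequence, Tuple


def wilcoxon_signed_rank(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float, float]:
    """Return (W_plus, W_minus, two-sided p) for paired samples a and b."""
    if len(a) != len(b):
        raise ValueError("a and b must be paired (same length)")
    diffs = [x - y for x, y in zip(a, b) if x != y]  # drop zero differences
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    tie_term = 0.0
    start = 0
    while start < n:
        end = start
        while end + 1 < n and abs(diffs[order[end + 1]]) == abs(diffs[order[start]]):
            end += 1
        t = end - start + 1  # size of this tie group
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        tie_term += t ** 3 - t
        start = end + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    mean = n * (n + 1) / 4
    variance = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (min(w_plus, w_minus) - mean) / sqrt(variance)
    p_two_sided = erfc(abs(z) / sqrt(2))
    return w_plus, w_minus, p_two_sided


# Invented paired Zwisch ratings (not study data) for the same 10 cases.
resident = [2, 2, 3, 3, 2, 3, 2, 3, 3, 2]
attending = [3, 3, 3, 4, 3, 4, 3, 3, 4, 3]
w_plus, w_minus, p = wilcoxon_signed_rank(resident, attending)
```

In this invented example every nonzero difference is negative, so W_plus is 0 and the test indicates that the resident ratings are systematically lower than the attending ratings.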

Discussion

This is the largest prospective multi-institutional study tracking otolaryngology surgical training experiences to date. Senior residents (PGY 4 and 5) in this cohort attained meaningful autonomy and achieved practice-ready performance for the majority (61%) of all logged procedures, including ACGME-defined KIPs. However, residents had a lower proportion of cases with meaningful autonomy for specific KIPs, such as flaps and grafts, pediatric/adult airway, and stapedectomy/ossiculoplasty. Larger studies will allow subsets of senior residents to be studied, such as those in their last 3 to 6 months of residency or those going into fellowships versus general practice. The use of a novel assessment strategy like SIMPL to track resident operative experiences facilitates the identification of such areas for possible targeted educational interventions. Such interventions could include changes to case assignments to emphasize specific procedures for certain learners, feedback to faculty on areas in which residents broadly need further experience, and supplemental educational activities in the cadaver or simulation laboratory.

The ability to track operative experiences will be vital to the modernization of surgical training. Currently, residency program directors and accreditation bodies are tasked with overseeing the progression of residents’ surgical skills with limited information about their day-to-day operative experiences. Individual residents also have limited ability to reflect on the longitudinal trajectory of their surgical experiences, making it difficult to set goals and create learning plans. Such information would complement case logs, which are commonly used to evaluate programs yet may be inaccurate and uncorrelated with actual competency.^10,11 More detailed assessments, such as Objective Structured Assessments of Technical Skills, are infrequently conducted at programs that use them and may not be directly comparable to the scope of training experiences that SIMPL aims to track. In the field of general surgery, for example, SIMPL is currently used at 68 residency programs, allowing for pooling of data across institutions, which could drive research initiatives and policy decisions.^12,13

By using SIMPL to compare personal assessments with regional and national benchmarks, residents who need remediation may be more readily identified by themselves and their program directors; this is especially important as studies have shown that trainees who perform the worst in multiprofessional and surgical skills assessments are also the least able to identify their weaknesses.^14,15 A recent survey of otolaryngology program directors indicated that while 37.5% of programs have residents who require remediation,¹⁶ there is currently no consensus on what constitutes an evidence-based remediation strategy. The ability to track assessments over time would allow trainees and program directors to identify specific areas for improvement. Attending surgeons and program directors would additionally receive actionable feedback. For example, attending surgeons may learn that they give significantly less autonomy than other colleagues, or a program may discover that it has a deficit of experience in particular procedures or subspecialties.

In addition to the identification of low-performing outliers, the ability to compare attending and trainee assessments is critical. This study corroborates the existing literature that surgical residents, particularly senior residents, consistently underestimate their own skills as compared with attending surgeons’ assessments,^6,15 which is problematic since self-reflection is critical for personal improvement. Interestingly, the present study found that senior residents also underestimated the difficulty of their cases as compared with paired attending ratings. We suspect that this may be a result of where these residents train, at large academic centers, where the average case complexity is already high. Senior residents are routinely assigned to the most advanced cases and can lose perspective on how complex these cases are relative to the full spectrum of such procedures in the community. The ability to generate reliable self-assessments should be deliberately refined during training, as it has implications for the self-confidence, self-regulation, and long-term skills development of an independent practitioner. Novel educational instruments that longitudinally compare self-assessments with those of trusted surgical mentors are a possible first step toward cultivating this skill.

Study Limitations

A number of limitations exist in our study design and timing. First, this study was conducted at 5 large academic training centers. Our results may not generalize to other residency programs with different attributes (residency size, case mix, academic versus community setting, etc). Second, data logged could be confounded by elements of recall bias due to challenges of separating the procedure to be evaluated from the context of the larger surgical procedure performed. For example, a resident who performs the initial mastoidectomy during surgery with what ultimately becomes a difficult translabyrinthine approach to the skull base may evaluate their own autonomy and performance by memory of the more difficult portions of the case. Furthermore, as this study found, resident and attending ratings of operative experiences are not always equivalent. This is a limitation inherent to all research on educational assessments. Third, all evaluations were voluntary and could have been susceptible to selection bias. It is unknown what proportion of all cases performed by residents/attendings were logged on SIMPL and whether evaluated cases were representative. Fourth, this study used SIMPL, which is an application that cannot be customized in its assessment design. We chose to use this application for an annual fee (presently ~$2000 for a program of 25 residents) for its useful features rather than designing our own data collection instrument. For example, residents and faculty could visualize graphs of their autonomy/performance on the interface. By using SIMPL, we were also contributing to the wider research collaborative that now spans >120 training programs in the United States across surgical specialties.^3,4,17,18 Last, this study occurred during the SARS-CoV-2/COVID-19 pandemic, which may have had significant effects on resident training, such as reductions in overall case numbers. Therefore, this study may underestimate the average trainee’s operative assessments during a nonpandemic year, a conclusion that may be reassuring to many surgical educators who are concerned about the impact of COVID-19 on surgical training. However, 1 of the study sites (Massachusetts Eye and Ear) participated in a single-center pilot study with SIMPL during a prepandemic year with comparable outcomes.⁶

Conclusion

In this multi-institutional study, senior otolaryngology residents achieved ratings of meaningful surgical autonomy and practice-ready performance in the majority of cases logged for most KIPs. The future development of nationwide benchmarks will help programs and residents set specific educational goals and personalize training plans.

Footnotes

Acknowledgements

We are profoundly grateful to all the residents and faculty who participated in this study at each institution and to the faculty, trainees, and institutional members who collectively sustain the work of the Society for Improving Medical Professional Learning.

This article will be presented at the AAO-HNSF Annual Meeting & OTO Experience; October 6, 2021; Los Angeles, California.

Author Contributions

Jenny X. Chen, study conception and design, data collection, data analysis, manuscript writing, approval of final submission; Francis Deng, study design, data analysis, manuscript writing, approval of final submission; Andrey Filimonov, study design, data collection, manuscript writing, approval of final submission; Elizabeth Shuman, study design, data collection, manuscript revising, approval of final submission; Emily Marchiano, study design, data collection, manuscript revising, approval of final submission; Brian George, study design, data analysis, manuscript revising, approval of final submission; Marc Thorne, study design, data collection, manuscript revising, approval of final submission; Steven Pletcher, study design, data collection, manuscript revising, approval of final submission; Michael Platt, study design, data collection, manuscript revising, approval of final submission; Marita Teng, study design, data collection, manuscript revising, approval of final submission; Elliott D. Kozin, study conception and design, data collection, manuscript revising, approval of final submission; Stacey T. Gray, study conception and design, data collection, manuscript revising, approval of final submission.

Disclosures

Competing interests: None.

Sponsorships: None.

Funding source: American Academy of Otolaryngology–Head and Neck Surgery Foundation Women in Otolaryngology Research Grant (2019).

References

1. Fonseca, Reddy, Longo, Gusberg. Graduating general surgery resident operative confidence: perspective from a national survey. J Surg Res. 2014;190(2):419-428. doi:10.1016/j.jss.2014.05.014

2. Malangoni, Biester, Jones, Klingensmith, Lewis. Operative experience of surgery residents: trends and challenges. J Surg Educ. 2013;70(6):783-788. doi:10.1016/j.jsurg.2013.09.015

3. George, Bohnen, Williams, et al; Procedural Learning and Safety Collaborative. Readiness of US general surgery residents for independent practice. Ann Surg. 2017;266(4):582-594. doi:10.1097/SLA.0000000000002414

4. Kaban, Cappetta, George, Lahey, Bohnen, Troulis. Evaluation of oral and maxillofacial surgery residents’ operative skills: feasibility and engagement study using SIMPL software for a mobile phone. J Oral Maxillofac Surg. 2017;75(10):2041-2047. doi:10.1016/j.joms.2017.05.036

5. Chen, Kozin, Bohnen, et al. Assessments of otolaryngology resident operative experiences using mobile technology: a pilot study. Otolaryngol Head Neck Surg. 2019;161(5):939-945. doi:10.1177/0194599819868165

6. Chen, Kozin, Bohnen, et al. Tracking operative autonomy and performance in otolaryngology training using smartphone technology: a single institution pilot study. Laryngoscope Investig Otolaryngol. 2019;4(6):578-586. doi:10.1002/lio2.323

7. George, Teitelbaum, Darosa, et al. Duration of faculty training needed to ensure reliable OR performance ratings. J Surg Educ. 2013;70(6):703-708. doi:10.1016/j.jsurg.2013.06.015

8. George, Teitelbaum, Meyerson, et al. Reliability, validity, and feasibility of the Zwisch scale for the assessment of intraoperative performance. J Surg Educ. 2014;71(6):e90-e96. doi:10.1016/j.jsurg.2014.06.018

9. Williams, Chen, Sanfey, Markwell, Mellinger, Dunnington. The measured effect of delay in completing operative performance ratings on clarity and detail of ratings assigned. J Surg Educ. 2014;71(6):e132-e138. doi:10.1016/j.jsurg.2014.06.015

10. Shah, Haisch, Noland. Case reporting, competence, and confidence: a discrepancy in the numbers. J Surg Educ. 2018;75(2):304-312. doi:10.1016/j.jsurg.2018.01.007

11. Collins, Dudas, Johnson, et al. ACGME operative case log accuracy varies among surgical programs. J Surg Educ. 2020;77(6):e78-e85. doi:10.1016/j.jsurg.2020.08.045

12. Meyerson, Odell, Zwischenberger, et al; Procedural Learning and Safety Collaborative. The effect of gender on operative autonomy in general surgery residents. Surgery. 2019;166(5):738-743. doi:10.1016/j.surg.2019.06.006

13. Bohnen, George, Zwischenberger, et al. Trainee autonomy in minimally invasive general surgery in the United States: establishing a national benchmark. J Surg Educ. 2020;77(6):e52-e62. doi:10.1016/j.jsurg.2020.07.033

14. Lipsett, Harris, Downing. Resident self-other assessor agreement: influence of assessor, competency, and performance level. Arch Surg. 2011;146(8):901-906. doi:10.1001/archsurg.2011.172

15. Kendrick, Clark, Fischer, Bohnen, Kim, George. The reliability of resident self-evaluation of operative performance. Am J Surg. Published online December 3, 2020. doi:10.1016/j.amjsurg.2020.11.054

16. Brown, Thompson, Bhatti. Assessment of operative competency in otolaryngology residency: survey of US program directors. Laryngoscope. 2008;118(10):1761-1764. doi:10.1097/MLG.0b013e31817e2c62

17. Kobraei, Bohnen, George, et al. Uniting evidence-based evaluation with the ACGME Plastic Surgery Milestones: a simple and reliable assessment of resident operative performance. Plast Reconstr Surg. 2016;138(2):349e-357e. doi:10.1097/PRS.0000000000002411

18. Meyerson, Sternbach, Zwischenberger, Bender. Resident autonomy in the operating room: expectations versus reality. Ann Thorac Surg. 2017;104(3):1062-1068. doi:10.1016/j.athoracsur.2017.05.034