Development and Validation of Instrument for Operative Competency Assessment in Selective Neck Dissection

Abstract

Background:

Instruments to assess surgical skills have been validated for several key indicator procedures in otolaryngology. Selective neck dissection is a core procedure for which trainees must integrate knowledge of complex head and neck anatomy with technical surgical skills. An instrument for assessment of surgical performance in selective neck dissection has not been previously developed. The objective of the current study is to develop and validate an instrument for assessing surgical competency for level II-IV selective neck dissection.

Design:

A Delphi working group composed of 23 fellowship-trained head and neck surgeons from 17 institutions was assembled. The modified Delphi method encompassed a 3-step process, including 2 anonymous voting rounds to successively refine individual items and establish levels of consensus. The threshold for strong consensus, >80% agreement, was determined a priori. The resulting instrument was subsequently validated in a prospective cohort of 17 resident surgeons spanning postgraduate years 1 to 5. Participants were asked to perform a level II-IV selective neck dissection on fresh-frozen cadaveric specimens. Performance was scored by 2 independent, blinded observers using the devised instrument, and construct validity was assessed.

Results:

Through the modified Delphi process, a final list of 30 items, considered the most essential for achieving the goals of a level II-IV selective neck dissection, was developed. Construct validity was supported by a positive association between instrument scores and both resident postgraduate year level and number of head and neck rotations completed.

Conclusion:

The development and validation of a novel instrument for assessment of surgical competency in level II-IV selective neck dissection, a key indicator case in otolaryngology, is described. This new instrument may be used to provide objective feedback on overall and task-specific competency to identify surgical deficiencies and offer granular feedback to enhance surgical training.

Keywords

operative competency instrument; otolaryngology; head and neck; miscellaneous; neck dissection; surgical training

Introduction

Objective assessment of trainees’ abilities to perform a set of core surgical procedures safely, efficiently, and independently is a challenging but necessary task for surgical training programs. The Accreditation Council for Graduate Medical Education (ACGME) model of competency-based medical education (CBME) evaluates programs based on trainees meeting defined milestones in 6 Core Competencies encompassing patient care and medical knowledge. There has been an increasing drive to incorporate objective measures of surgical competency into this model. This motivation is further intensified by clinical and trainee constraints, such as limiting concurrent surgery and duty-hour restrictions, and reduced case numbers associated with the COVID-19 global pandemic. Training programs are being encouraged to implement objective assessment tools that not only help trainees safely and efficiently achieve competency, but also serve to evaluate overall program performance and facilitate granular feedback.¹

There is currently no widely accepted standard for assessing surgical skill development and procedural competency. Commonly used methods of evaluation include case log review, informal work-based assessment, and subjective end-of-rotation evaluation.^2,3 While case logs show quantitative experience, they are not reliable proxies for surgical competency. Furthermore, differences in case-log entry may reflect variance in resident coding practices rather than true disparity in surgical experience. End-of-rotation evaluations rely on faculty opinion and recollection of trainee performance. These can be influenced by confirmation bias, recall bias, or the “halo or horn” effect and have shown poor reliability and validity: faculty members’ global perceptions of resident performance during rotations may influence specific scoring of surgical acumen.⁴ In addition, these methods often do not provide sufficiently granular feedback on specific deficiencies that could prompt trainee-specific adjustments and remediation. An ideal instrument would objectively measure performance in surgical skills and provide reproducible, consistent, and actionable feedback.

Current objective measures use global rating scales (GRSs) and task-based checklists (TBCs). Previously validated GRSs are generalizable to any surgical procedure, assessing fundamental skills such as respect for tissue, appropriate instrument selection, and economy of movement. While these generic instruments can assess general surgical skills, they lack the ability to assess performance of discrete procedural steps. Task-based checklists account for this by assessing a surgeon’s ability to effectively navigate individual steps within a specific procedure. Originally developed for general surgery procedures, Objective Structured Assessments of Technical Skills (OSATS) are performance-based evaluation tools for assessing surgical competence. These involve a combination of GRSs and TBCs for evaluation of surgeon competency either in live animal models or bench models.⁵ Since the original inception, OSATS have expanded to the operating room setting and have been successfully adopted into training curricula of specialties including obstetrics and gynecology, general surgery, and orthopedic surgery.^6-8

Given the heterogeneous subspecialties within otolaryngology, many unique technical skills are required to perform procedures within each domain of the field. Instruments to assess otolaryngology surgical skills have been previously validated for endoscopic sinus surgery, thyroidectomy, mastoidectomy, direct laryngoscopy, tracheostomy, myringotomy, tonsillectomy, septoplasty, and microvascular free flaps.^9-24 Still, many key otolaryngology procedures lack objective evaluation methods.⁵ Selective neck dissection (SND) is a core procedure in otolaryngology for which trainees must integrate knowledge of complex head and neck anatomy with technical surgical skills. To the best of the authors’ knowledge, there is currently no assessment tool available for SND. Thus, the chief objective of this study is to develop and validate an operative performance instrument for SND, a key indicator case for otolaryngology residency training.

Materials and Methods

Modified Delphi Process

A modified Delphi method was used to develop a novel OSATS-based instrument for SND competency assessment. The Delphi process is an established method of consensus building that leverages the expertise of a group of individuals who have professional and experiential knowledge of the content under investigation.²⁵ The process inherently facilitates controlled feedback and systematic progress toward consensus among a panel of experts during completion of a series of anonymous online questionnaires.²⁶ Each round of voting is used to successively refine the next iteration of the questionnaire until a sufficient level of consensus is achieved.

In a modified Delphi process, an initial collection of statements, items, or questions is provided for critique and open-ended questions are eliminated. The modified Delphi method employed in this study encompassed development of an inclusive list of instrument items by the 3-member Delphi steering committee (E.M.D., D.L.P., and M.L.C.), followed by 2 voting rounds in which members of the Delphi panel successively refined individual statements and established levels of consensus (Figure 1). To improve generalizability and construct validity, the Delphi working group comprised 23 head and neck surgeons with representation from 17 institutions. In each round, a prospective list of items was presented alongside forced-response multiple-choice options; free-text responses were also supported, offering respondents an opportunity to provide feedback. A 5-point Likert scale was applied to each item. A score of 3 was chosen as the minimum acceptable level to perform the procedure independently, allowing for improvement beyond minimally acceptable levels. The modified Delphi process for each statement was considered complete when there was convergence of opinion or when a point of diminishing returns was reached. The 3-step modified Delphi process was completed over an 8-week period, from December 1, 2018, to January 26, 2019. All voting was performed anonymously via the SurveyMonkey online survey tool (SurveyMonkey, Portland, OR). The threshold for strong consensus, >80% agreement, was determined a priori.
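As an illustration of how an a priori consensus threshold might be applied to one voting round, the following is a minimal Python sketch. It is not the authors' procedure: the response labels ("include", "revise", "exclude"), the function names, and the definition of agreement as the share of panelists choosing the most common response are all assumptions made for this example. Only the >80% threshold and the panel size of 23 come from the text.

```python
from collections import Counter

# From the text: strong consensus requires agreement strictly above 80%.
CONSENSUS_THRESHOLD = 0.80


def agreement_fraction(votes):
    """Return the share of votes cast for the most common response.

    Assumption: "agreement" means the proportion of panelists who chose
    the leading response option for an item.
    """
    if not votes:
        raise ValueError("at least one vote is required")
    _, top_count = Counter(votes).most_common(1)[0]
    return top_count / len(votes)


def has_strong_consensus(votes, threshold=CONSENSUS_THRESHOLD):
    """True when the leading response exceeds the a priori threshold."""
    return agreement_fraction(votes) > threshold


# Hypothetical round: 23 panelists vote on one candidate checklist item.
example_votes = ["include"] * 19 + ["revise"] * 3 + ["exclude"]
print(round(agreement_fraction(example_votes), 3))
print(has_strong_consensus(example_votes))
```

Because the threshold is a strict inequality, an item with exactly 80% agreement would return to the next round under this sketch.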

Figure 1.

The modified Delphi method employed a 3-step process consisting of 1 pre-voting round and 2 subsequent voting rounds to successively refine individual statements and establish levels of consensus by all members of the panel.

Instrument Validation

Institutional review board approval was obtained for a prospective study designed to validate the novel 30-item SND competency assessment tool that was developed using the methodology described above (IRB#18-007477). Seventeen residents, ranging from postgraduate year (PGY) 1 to 5, performed level II-IV SNDs on fresh-frozen cadaveric specimens in a controlled anatomy laboratory setting. Prior to starting the exercise, trainees were instructed to perform the procedure as if it were a live operative case. Specifically, residents were instructed to choose appropriate instrumentation, delicately navigate vital structures, and ligate necessary vasculature. Video documentation was obtained using an unmodified high-definition GoPro HERO 7 Black digital action camera (GoPro Inc, San Mateo, CA) and a commercially available head mount (Figure 2). The camera was worn by a senior-level trainee acting as a surgical assistant for the procedure. The surgical assistant provided no feedback and assisted only according to the operating trainee’s instructions. Captured video was edited to remove participant identifiers. Video of left-handed trainees was inverted 180 degrees in the horizontal plane so that all videos displayed right-handed dissection, enhancing subject anonymity. The video was evaluated using the newly developed assessment instrument for level II-IV SND by 2 independent observers (E.M.D. and D.L.P.) who were blinded to the identity of the resident they were evaluating and to each other’s assessment.

Figure 2.

Equipment used to record trainee operative performance, including an unmodified GoPro camera with head mount and recording computer.

Construct validity was evaluated for each component of the tool by comparing trainees’ mean scores across advancing PGY levels and number of previous head and neck rotations completed. Inter-item reliability for the instrument was measured by assessing its internal consistency with Cronbach α, with a value of at least .80 considered acceptable. Inter-rater reliability was calculated using Cohen’s κ. Scores given for each item were evaluated, with an inter-rater score difference of 1 point or less on the Likert scale considered agreement. The Pearson correlation coefficient was used to determine the relationship between scores on individual instrument tasks and overall score. Continuous features were summarized with means and 95% confidence intervals. Analyses were performed using Microsoft Excel 2010 (Microsoft Corporation, Redmond, WA).
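The analyses in this study were performed in Microsoft Excel. Purely as an illustration of the internal-consistency statistic, the standard Cronbach α formula, α = k/(k − 1) × (1 − Σ item variances / variance of total scores), can be sketched in Python as follows. The function name, the row-per-evaluation data layout, and the example scores are hypothetical and are not study data.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a score matrix.

    scores: list of rows, one per evaluation; each row holds the k item
    scores given in that evaluation. Sample variances (n - 1) are used.
    """
    if len(scores) < 2:
        raise ValueError("at least two evaluations are required")
    k = len(scores[0])
    if k < 2:
        raise ValueError("at least two items are required")
    if any(len(row) != k for row in scores):
        raise ValueError("every evaluation must score the same items")

    item_variances = [variance([row[i] for row in scores]) for i in range(k)]
    total_variance = variance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("alpha is undefined when total scores do not vary")
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical 5-point Likert scores: 4 evaluations of a 3-item instrument.
example = [
    [2, 3, 2],
    [3, 3, 4],
    [4, 5, 4],
    [5, 5, 5],
]
print(round(cronbach_alpha(example), 2))
```

Under the criterion stated above, a returned value of at least .80 would be read as acceptable internal consistency.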

Results

Through the modified Delphi process a final list of 30 items, considered to be the most essential assessable items for achieving the goals of a level II-IV SND, was developed (Figure 3). A total of 23 evaluations were completed for 17 distinct trainees across 5 PGY levels (levels 1-5). A breakdown of the number of trainees by PGY level and number of head and neck rotations is shown in Table 1. Six trainees underwent a second evaluation after advancing in PGY level. The Cronbach α to evaluate the tool’s internal consistency was .98. Evaluation of 6 different trainees was completed by both raters and was thus used to calculate inter-rater reliability on each item in the instrument. Inter-rater reliability showed moderate concordance (Cohen’s κ .65, 86% agreement).
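The two inter-rater figures reported above, a chance-corrected κ and a raw agreement rate, can be illustrated with a short Python sketch. The text does not specify how the 1-point tolerance entered the κ calculation, so this example makes an assumption: unweighted Cohen's κ is computed on exact score matches, and the within-1-point agreement rate is computed separately. Function names and scores are hypothetical.

```python
from collections import Counter


def _check_paired(rater_a, rater_b):
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of items")


def agreement_within(rater_a, rater_b, tolerance=1):
    """Share of items on which the two scores differ by at most `tolerance`."""
    _check_paired(rater_a, rater_b)
    hits = sum(abs(a - b) <= tolerance for a, b in zip(rater_a, rater_b))
    return hits / len(rater_a)


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa: (observed - expected) / (1 - expected)."""
    _check_paired(rater_a, rater_b)
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement from each rater's marginal score distribution.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when both raters use a single category")
    return (observed - expected) / (1 - expected)


# Hypothetical Likert scores from two blinded raters on six items.
rater_1 = [3, 4, 2, 5, 3, 1]
rater_2 = [3, 3, 2, 4, 5, 1]
print(round(agreement_within(rater_1, rater_2), 2))
print(round(cohen_kappa(rater_1, rater_2), 2))
```

A weighted κ would be another reasonable choice for ordinal Likert data; the unweighted form is used here only because it is the simplest to verify by hand.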

Figure 3.

Selective neck dissection (levels II-IV) task specific checklist and global assessment tool.

Table 1.

Number of Participating Trainees According to Post-Graduate Year Level and Number of Rotations Completed.

PGY level	Number of participants
1	6
2	4
3	3
4	2
5	2
Number of head and neck rotations	Number of participants
0	8
1 to 2	2
3 to 4	4
5+	3

For this study, the instrument demonstrated construct validity. There was a clear association of increasing scores by PGY level in the TBC, GRS, and overall score (Figure 4A-C). PGY-3 trainees had significantly better overall scores than PGY-1 trainees (mean difference 0.98; P = .004), and PGY-4-5 trainees had significantly better overall scores than PGY-3 trainees (mean difference 1.30; P < .001). The transition to surgical competence (average score ≥3) appeared to occur at the PGY-3 level. The time required to complete the exercise did not significantly correlate with PGY level (Figure 4D).

Figure 4.

Mean instrument scores and time to complete the exercise by PGY level. (A) Mean Global Rating Scale Score. (B) Mean Task Based Checklist Score. (C) Mean overall instrument score. (D) Mean time to complete the exercise in minutes. Error bars represent 95% confidence intervals.
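Group means with 95% confidence intervals of the kind plotted as error bars can be computed as in the sketch below. The text does not state how the intervals were derived, so a two-sided interval based on Student's t distribution is assumed here, with critical values hard-coded for small groups. The function name and the example scores are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

# Two-sided 95% critical values of Student's t for 1 to 10 degrees of freedom.
T_CRIT_95 = {
    1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
    6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228,
}


def mean_ci95(values):
    """Return (mean, lower, upper) for a t-based 95% confidence interval."""
    df = len(values) - 1
    if df not in T_CRIT_95:
        raise ValueError("this sketch supports groups of 2 to 11 observations")
    m = mean(values)
    half_width = T_CRIT_95[df] * stdev(values) / sqrt(len(values))
    return m, m - half_width, m + half_width


# Hypothetical overall scores for one PGY group (not study data).
group_scores = [2.1, 2.4, 1.9, 2.6, 2.2, 2.0]
m, lower, upper = mean_ci95(group_scores)
print(f"mean {m:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

With groups as small as two or three trainees, intervals computed this way are wide, which is worth keeping in mind when comparing adjacent groups.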

Construct validity was also appraised within the context of the number of head and neck rotations completed by the evaluated trainee (0, 1-2, 3-4, and 5-7). The results of this assessment also demonstrate a positive association between the number of head and neck rotations completed and higher instrument scores for TBC, GRS, and overall score (Figure 5A-C). The transition to surgical competence appeared to occur between 1-2 and 3-4 head and neck rotations completed. No significant correlation was found between time to complete the exercise and number of head and neck rotations completed (Figure 5D).

Figure 5.

Mean instrument scores and time to complete the exercise by number of head and neck rotations completed. (A) Mean Global Rating Scale Score. (B) Mean Task Based Checklist Score. (C) Mean overall instrument score. (D) Mean time to complete the exercise in minutes. Error bars represent 95% confidence intervals.

To better elucidate the discrete surgical steps that correlate most strongly with overall score, and thus have the potential to differentiate trainees by level of performance, the mean score of each step and its correlation with overall score were calculated. The surgical steps with the greatest degree of correlation are shown in Table 2. Those steps with the least correlation are shown in Table 3.

Table 2.

Task-Based Checklist Items Demonstrating Strongest Correlation With Final Instrument Score.

Item	Surgical step	Average score	Correlation coefficient
25	Branches of internal jugular vein anticipated and managed during final packet dissection.	2.65	.94
20	Omohyoid skeletonized and retracted inferiorly to allow access to level IV.	2.94	.91
7	Posterior belly of digastric traced posteriorly and retracted anteriorly.	3.24	.91
21	Transverse cervical artery identified and managed appropriately.	2.53	.91

Table 3.

Task-Based Checklist Items Demonstrating Lowest Correlation With Final Instrument Score.

Item	Surgical step	Average score	Correlation coefficient
1	Appropriate patient positioning.	4.47	.03
2	Appropriate identification of landmarks.	3.71	.42
17	Level IIB packet transposed deep to spinal accessory nerve.	3.18	.59
8	Hypoglossal nerve identified deep to intermediate tendon of digastric.	2.88	.59
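The item-level correlations in Tables 2 and 3 relate each checklist item to the overall score. A minimal Python sketch of that kind of calculation follows. It assumes the overall score is the mean of all item scores in an evaluation, including the item being correlated; the text does not specify this. Function names and data are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length, at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


def item_total_correlations(scores):
    """Correlate each item's scores with the per-evaluation overall score.

    scores: list of rows, one per evaluation, each a list of item scores.
    Assumption: overall score = mean of all items in the row.
    """
    overall = [sum(row) / len(row) for row in scores]
    n_items = len(scores[0])
    return [
        pearson_r([row[i] for row in scores], overall)
        for i in range(n_items)
    ]


# Hypothetical scores: 5 evaluations of a 3-item checklist (not study data).
example = [
    [4, 2, 1],
    [5, 2, 2],
    [4, 3, 3],
    [5, 4, 4],
    [4, 5, 5],
]
for item, r in enumerate(item_total_correlations(example), start=1):
    print(item, round(r, 2))
```

An item that nearly every trainee performs well, such as patient positioning in Table 3, varies little and therefore tends to correlate weakly with overall score regardless of its clinical importance.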

Discussion

This study describes the systematic development and validation of a novel surgical assessment instrument for SND. Instrument scores correlate with PGY level and with the number of head and neck surgical rotations completed, supporting construct validity, and inter-rater reliability was moderate. Individual instrument items were rigorously examined and refined through the established Delphi process by an extended group of 23 fellowship-trained head and neck surgeons from 17 institutions. Items involving key vascular structures, the omohyoid muscle, and the posterior belly of the digastric were most strongly associated with overall score. The instrument can feasibly be incorporated into a training program’s armamentarium alongside other previously validated instruments to identify deficiencies early, provide formative feedback, and facilitate graduated autonomy as performance improves.

Too often, resident procedural deficiencies are identified late in training, when remediation is more challenging and arduous. Ideally, these deficiencies should be identified early, so that minor course corrections can be implemented throughout training. Moreover, accurate assessment of surgical proficiency may be used to accelerate graduated autonomy, whereby residents who demonstrate more advanced skills may be given early opportunities for increasing independence in the operating room when appropriate.

A recent systematic review by Labbé et al²⁷ identified a paucity of validated assessment tools for otolaryngology procedures, with tools having been developed for only 11 core otolaryngology procedures. Selective neck dissection is among the procedures that lack an objective assessment tool. This pilot study presents a valid, reliable, and practical instrument for assessing operative performance in SND in cadaveric dissection with the potential to be used in live surgery. Intuitively, instrument performance should reflect not only PGY level but also the number of head and neck rotations completed. Indeed, our data confirm that a higher number of rotations completed in the head and neck subspecialty is associated with improved surgical performance in SND.

Producing a safe and technically proficient surgeon is arguably one of the most important objectives of a surgical training program. Work-hour restrictions, limits on concurrent surgery, and pressure imposed on faculty to increase clinical productivity highlight the need for more innovative training and assessment strategies. With the impetus from accreditation bodies toward competency-based medical education, more objective measures of trainee performance, including multiple operative instruments, have been developed, validated, and implemented by various surgical specialties. These instruments can measure performance and provide formative feedback for improvement. In contrast to traditional evaluation methods that generally provide less reliable subjective feedback, objective assessment tools improve the ability to identify and correct deficiencies early on, provide formative and directed feedback, and ensure certification of a proficient and confident surgeon at the conclusion of training.

Despite the theoretical advantages of incorporating objective procedural assessment into resident evaluation, from a practical standpoint, implementation of these tools is often challenging. Coordination of busy trainee schedules with anatomy laboratory appointments, specimen availability, and logistics of capturing and distributing video footage requires a systematic approach with significant investment from the department as a whole. Integration of regular resident education time into our curriculum facilitates participation in activities such as this. Another major challenge with implementing assessment tools is the burden placed on faculty, who must devote significant time to observing trainees, either directly or indirectly, evaluating their performance, and completing assessments. This emphasizes the importance of faculty buy-in to ensure that timely formative feedback is provided.

This study is limited by several factors. Direct observational assessment in the operating room setting is the gold standard for evaluation. However, cadaveric dissection was used for this instrument for several reasons. While cadaveric dissection does not offer the true fidelity of a living, bleeding human model, it allows for reasonably uniform difficulty with normal anatomy and a lack of disease involvement. In addition, cadaveric dissection allowed for evaluation of junior trainees without placing patients at undue risk. Trainees performed the dissections independently and feedback was withheld until the procedure had concluded. Future application of this instrument in a live operating room setting under appropriate supervision will lend insight into its validity and allow for further modification. Also, due to scheduling constraints, a disproportionate number of junior trainees participated in the exercise. While the data show a clear trend toward improvement in senior residents, the small number of senior-level evaluations may have affected reliability. Ideally, each resident would have been evaluated annually throughout training, but the cross-sectional nature of this study was not conducive to serial evaluations. Video was recorded using a head-mounted camera, which was the least labor-intensive method of maintaining an appropriate field of view but created significant motion. A dedicated videographer would produce the best recordings but is not practical in most situations. As camera stabilization improves, this will become less of an issue, and we found that upgrading the camera was beneficial.

Rather than direct observation, video recording was obtained for evaluation. While this was logistically more complicated than direct observation, it allowed for anonymity of the participants and blinding of the evaluators, who are familiar with the trainees. Bias due to known PGY level or the “halo or horn” effect was circumvented using this method. Furthermore, the video allowed for rewind and control of playback speed and did not require direct, real-time observation by the faculty. Remote scoring of otolaryngology procedures using video review has previously been validated by Bowles et al.²⁸ While the TBC developed for this study is based on rigorous literature review and consensus, the tasks described reflect the methodology used at the authors’ institution and are therefore not generalizable to the practice at every institution. Ultimately, multi-institutional application and crowd-sourced feedback will facilitate instrument revision to improve feasibility and generalizability.

Conclusion

Herein, we present a novel instrument for objective assessment of surgical competency in level II-IV SND. It allows for specific and objective documentation of trainee skills and is intended to facilitate appropriate advancement in the operating room setting and identify specific deficiencies for targeted skill development. This instrument can feasibly be adopted into a training program curriculum to monitor progression of trainee skills.

Footnotes

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

ORCID iDs

Eric Dowling

Matthew L. Carlson

References

1. Wagner, Fahim, Dunn, Reid, Sonnadara RR. Otolaryngology residency education: a scoping review on the shift towards competency-based medical education. Clin Otolaryngol. 2017;42(3):564-572.

2. Wanzel, Ward, Reznick RK. Teaching the surgical craft: from selection to certification. Curr Probl Surg. 2002;39(6):573-659.

3. Brown, Thompson, Bhatti NI. Assessment of operative competency in otolaryngology residency: survey of US program directors. Laryngoscope. 2008;118(10):1761-1764.

4. Martin, Regehr, Reznick, et al. Objective structured assessment of technical skill (OSATS) for surgical residents. Br J Surg. 1997;84(2):273-278.

5. van Hove, Tuijthof GJM, Verdaasdonk EGG, Stassen LPS, Dankelman. Objective assessment of technical surgical skills. Br J Surg. 2010;97(7):972-987.

6. Christopherson, Buchsbaum, Voet, Lifshitz. The canine laboratory in the training of the oncology fellow. Gynecol Oncol. 1986;23(1):26-34.

7. Lossing, Hatswell, Gilas, Reznick, Smith LC. A technical-skills course for 1st-year residents in general surgery: a descriptive study. Can J Surg. 1992;35(5):536-540.

8. Lippert, Spolek, Kirkpatrick, Briggs, Clawson DK. A psychomotor skills course for orthopaedic residents. Acad Med. 1975;50(10):982-983.

9. Lin, Laeeq, Ishii, et al. Development and pilot-testing of a feasible, reliable, and valid operative competency assessment tool for endoscopic sinus surgery. Am J Rhinol Allergy. 2009;23(3):354-359.

10. Syme-Grant, White, McAleer JP. Measuring competence in endoscopic sinus surgery. Surgeon. 2008;6(1):37-44.

11. Diaz Voss Varela, Malik, Thompson, Cummings, Bhatti, Tufano RP. Comprehensive assessment of thyroidectomy skills development: a pilot project. Laryngoscope. 2012;122(1):103-109.

12. Stack Jr, Siegel, Bodenner, Carr MM. A study of resident proficiency with thyroid surgery: creation of a thyroid-specific tool. Otolaryngol Head Neck Surg. 2010;142(6):856-862.

13. Laeeq, Bhatti, Carey, et al. Pilot testing of an assessment tool for competency in mastoidectomy. Laryngoscope. 2009;119(12):2402-2410.

14. Francis, Masood, Laeeq, Bhatti NI. Defining milestones toward competency in mastoidectomy using a skills assessment paradigm. Laryngoscope. 2010;120(7):1417-1421.

15. Sethia, Kerwin, Wiet GJ. Performance assessment for mastoidectomy: state of the art review. Otolaryngol Head Neck Surg. 2017;156(1):61-69.

16. Piromchai, Kasemsiri, Wijewickrema, Ioannou, Kennedy, O'Leary. The construct validity and reliability of an assessment tool for competency in cochlear implant surgery. Biomed Res Int. 2014;2014:192741.

17. Dowling, Carlson ML. Assessing operative competency in cochlear implantation across the residency training continuum. Otol Neurotol. 2021;42(2):e153-e156.

18. Ishman, Brown, Boss, et al. Development and pilot testing of an operative competency assessment tool for pediatric direct laryngoscopy and rigid bronchoscopy. Laryngoscope. 2010;120(11):2294-2300.

19. Al-Qahtani, Alkhalidi, Islam. Tool for assessing surgical tracheostomy skills in otolaryngology residents. B-ENT. 2015;11(4):275-280.

20. Schwartz, Costescu, Mascarella, et al. Objective assessment of myringotomy and tympanostomy tube insertion: a prospective single-blinded validation study. Laryngoscope. 2016;126(9):2140-2146.

21. Ahmed, Ishman, Laeeq, Bhatti NI. Assessment of improvement of trainee surgical skills in the operating room for tonsillectomy. Laryngoscope. 2013;123(7):1639-1644.

22. Roberson, Kentala, Forbes. Development and validation of an objective instrument to measure surgical performance at tonsillectomy. Laryngoscope. 2005;115(12):2127-2137.

23. Obeid, AL-Qahtani, Ashraf, Alghamdi, Marglani, Alherabi. Development and testing for an operative competency assessment tool for nasal septoplasty surgery. Am J Rhinol Allergy. 2014;28(4):e163-e167.

24. Schoeff, Hernandez, Robinson, Jameson, Shonka Jr. Microvascular anastomosis simulation using a chicken thigh model: interval versus massed training. Laryngoscope. 2017;127(11):2490-2494.

25. Helmer. Systematic Use of Expert Opinions (Report No. P-3721). The RAND Corporation; 1967.

26. Yousuf MI. Using experts’ opinions through Delphi technique. Pract Assess Res Eval. 2007;12:4.

27. Labbé, Young, Nguyen LHP. Toolbox of assessment tools of technical skills in otolaryngology-head and neck surgery: a systematic review. Laryngoscope. 2018;128(7):1571-1575. doi:10.1002/lary.26943

28. Bowles, Harries, Young, Das, Saunders, Fleming JC. A validation study on the use of intra-operative video recording as an objective assessment tool for core ENT surgery. Clin Otolaryngol. 2014;39(2):102-107.