Abstract
Background
Assessing surgical competency in otolaryngology is challenging, and residency programs are now responsible for ensuring the surgical competency of their graduates. Therefore, more objective assessment tools are being incorporated into the evaluation process. Objective structured assessment of technical skills (OSATS) tools have been developed for multiple otolaryngology procedures, including tonsillectomy, endoscopic sinus surgery, thyroidectomy, mastoidectomy, direct laryngoscopy, and rigid bronchoscopy. The purpose of this study was to develop a new assessment tool for septoplasty surgery and to test its feasibility, reliability, and construct validity in measuring the development of trainees’ surgical skills in the operating room.
Methods
A new OSATS-based instrument for septoplasty was developed. During the 2-year study period, 21 otolaryngology–head and neck surgery residents (postgraduate years 2 to 5) were each evaluated intraoperatively by one faculty member, yielding a total of 175 evaluations. Surgical performance was rated using a seven-item task-specific checklist (TSC) and a global rating scale (GRS). The TSC assessed specific septoplasty technical skills, and the GRS assessed overall surgical performance.
Results
Our tool showed construct validity for both components of the assessment instrument, with increasing mean scores with advancing clinical levels. Cronbach's α, a measure of internal consistency, was 0.911 for TSC and 0.898 for GRS. Strong correlation between the TSC and GRS was established (r = 0.955; p < 0.01).
Conclusion
This study proved our educational tool to be a valid, reliable, and feasible method for assessing competency in septoplasty surgery. It can be integrated into surgical training programs to facilitate direct formative feedback. Assessing trainees’ learning curves enables insight into their progression, ensuring their appropriate development.
Residency training programs face increasing demands from accreditation councils to ensure the competency of their graduates in all six core competencies (patient care, medical knowledge, practice-based learning and improvement, interpersonal and communication skills, professionalism, and systems-based practice). This includes surgical skills, which are a component of patient care. 3 Several evaluation tools have been developed and are currently in use for assessing surgical performance. Moreover, these tools serve to provide feedback on training, to assess trainees’ learning curves, and to act as examination tools at different stages of training. 4
Objective structured assessment of technical skills (OSATS) was one of the first methods designed for objective skills assessment. It is also the instrument that has been studied most extensively and is one of the few that are actually used in clinical practice. It consists of a global rating scale (GRS) and a procedure task-specific checklist (TSC). Originally, it was designed to be used in laboratory settings and skill labs, but it is now also used in the operating theater.5,6 A recent review by van Hove et al concluded that OSATS is currently the “gold standard” for objective skills assessment, provided that it is used in the appropriate setting, as an assessment, feedback, and discussion tool (formative assessment), and not for important examination decisions. 6
OSATS forms were proven feasible, reliable, and valid for multiple otolaryngology procedures. These include tonsillectomy, endoscopic sinus surgery, thyroidectomy, mastoidectomy, direct laryngoscopy, and rigid bronchoscopy.7–11 A Medline search was performed and did not reveal any assessment tool for performing septoplasty.
Objectives
The aim of the study was to develop a feasible, valid, and reliable assessment tool of surgical skills for otolaryngology trainees performing septoplasty.
Materials and Methods
This was a prospective, observational study conducted from September 2011 to August 2013 within the operating theaters of three teaching hospitals in Riyadh, Saudi Arabia.
OSATS evaluation forms were developed and tested earlier in a smaller-scale pilot study. 12 Three expert rhinologists identified the main steps in septoplasty surgery using a modified Delphi technique. 13 The experts were first asked to list the key steps they deemed essential for septoplasty surgery. 14 A summary of the proposed steps was then presented to them, along with the reasons each step was considered essential, and the list was revised accordingly. This process was repeated three times, after which a “septoplasty TSC” of eight items with a 5-point rating scale was developed. The checklist primarily helps in providing formative feedback to residents after the procedure (Fig. 1).

Task-specific checklist (TSC) for septoplasty surgery.
A second instrument, the “septoplasty GRS,” was modified from Reznick. 5 A global assessment is useful for estimating overall aspects of surgical performance. A seven-item form with a 5-point rating scale linked to clear descriptors was created (Fig. 2). The faculty set the minimal acceptable (“pass”) level at 3 out of 5. This approach allows residents to continue improving their skills beyond the minimally acceptable level up to full competency (5 out of 5).

Global rating scale (GRS) for septoplasty surgery.
Institutional Review Board approval was obtained from King Saud University, Riyadh, Saudi Arabia. All participants gave informed consent to be in the study.
Supervising faculty members were asked to fill out the forms immediately after the surgery was performed. Any form that lacked the evaluator's signature or a rating for any item was excluded from the study. Twenty-one residents, ranging from postgraduate years 2–5 (R2–R5), from the Saudi Board of Otolaryngology and Head and Neck Surgery training program, were observed while performing septoplasty surgery in the operating theater. No residents from the 1st postgraduate year were included because they spend their first year in general surgery training. Each resident was evaluated by one faculty member of the division of Otolaryngology and Head and Neck Surgery, who was scrubbed with the trainee and evaluated every step directly after its completion. TSC and GRS forms were completed immediately after the procedure and then handed to the operating residents. Forms were collected on a regular basis throughout the study period.
Results were analyzed using IBM SPSS Statistics V.20 (IBM, Armonk, NY). Internal consistency was assessed by measuring Cronbach's α as a measure of interitem reliability for both checklist and global items. Construct validity was assessed by determining the mean percentages of total scores for residents at different levels and whether they increase with advancing training level. In addition, one-way ANOVA was used to compare scores across different training levels. For all statistical purposes, a value of p < 0.05 was considered significant.
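To make the internal consistency measure concrete, the following is a minimal sketch of how Cronbach's α can be computed from a set of completed forms. It is not the SPSS procedure used in the study; the function name `cronbach_alpha` and the example ratings are hypothetical and serve only as an illustration. Each row represents one evaluation form and each column one rated item.

```python
from statistics import variance


def cronbach_alpha(ratings):
    """Cronbach's alpha for a list of forms, each a list of k item scores."""
    k = len(ratings[0])
    if k < 2 or len(ratings) < 2:
        raise ValueError("need at least two items and two evaluations")
    # Sample variance of each item (column) across all evaluations.
    item_vars = [variance([row[i] for row in ratings]) for i in range(k)]
    # Sample variance of the summed score of each evaluation.
    total_var = variance([sum(row) for row in ratings])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical 5-point ratings (illustration only, not study data):
# five forms, four items each.
example_forms = [
    [2, 2, 3, 2],
    [3, 3, 3, 4],
    [4, 4, 5, 4],
    [5, 4, 5, 5],
    [3, 2, 3, 3],
]
alpha = cronbach_alpha(example_forms)
print(f"Cronbach's alpha = {alpha:.3f}")
```

Values closer to 1 indicate that the items vary together across evaluations. The sketch assumes the total scores are not all identical, since the total variance appears in the denominator.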
Results
A total of 195 assessments were completed for 25 residents. Of these, 175 assessments for 21 residents were included, and 20 assessments were excluded for missing data. Over a period of 2 years, 10 faculty members participated in evaluating residents as they performed septoplasty in the operating room and filled out the evaluation forms afterward. Each resident was monitored by one faculty member. The evaluation tool was found to be feasible based on faculty feedback. The average completion time was 5 minutes. Residents felt that the tool provided them with immediate, informative feedback on their surgical performance.
A strong, significant correlation between the two instruments was noted (r = 0.955; p < 0.01). Internal consistency, measured with Cronbach's α, was high at 0.911 and 0.898 for the TSC and GRS, respectively.
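As an illustration of the correlation reported between the two instruments, the sketch below computes a Pearson correlation coefficient from paired scores. The function name `pearson_r` and the paired percentage scores are hypothetical and are not taken from the study; the sketch only shows the calculation, assuming each evaluation yields one TSC total and one GRS total.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products of deviations from the means.
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Square roots of the sums of squared deviations.
    dev_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cross / (dev_x * dev_y)


# Hypothetical paired percentage scores (illustration only, not study data).
tsc_pct = [45.0, 55.0, 62.5, 70.0, 80.0, 90.0]
grs_pct = [40.0, 57.1, 60.0, 74.3, 77.1, 91.4]
r = pearson_r(tsc_pct, grs_pct)
print(f"r = {r:.3f}")
```

A coefficient near 1 means that evaluations scoring high on one form also tend to score high on the other. The sketch does not compute a p value.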
Construct validity refers to whether a test can differentiate between different expertise levels15,16 and was confirmed by the increasing mean task-specific (Fig. 3) and global scores with advancing postgraduate year (Fig. 4). Seven residents took part in 2 consecutive years of training. Junior residents showed much more improvement in their average individual scores than did senior residents (Fig. 5). Because no gold standard tool currently exists for evaluating surgical competency in septoplasty, criterion validity could not be assessed. Face and content validity refer to whether the developed tool measures what it is supposed to measure and to what extent it measures it adequately, respectively. Both were derived from the fact that three expert rhinologists used the modified Delphi technique to develop the instrument.15,16

Mean and 95% confidence interval scores of the task-specific checklist (TSC) for trainees at different levels. R2, 2nd year of residency; R3, 3rd year of residency; R4, 4th year of residency; R5, 5th year of residency.

Mean and 95% confidence interval scores of the global rating scale (GRS) for trainees at different levels. R2, 2nd year of residency; R3, 3rd year of residency; R4, 4th year of residency; R5, 5th year of residency.

Mean performance scores for both task-specific checklist (TSC) and global rating scale (GRS) with progressing training levels. R2, 2nd year of residency; R3, 3rd year of residency; R4, 4th year of residency; R5, 5th year of residency.
One-way ANOVA showed a significant difference in TSC and GRS scores between training levels (F = 60.408, p < 0.001, and F = 68.130, p < 0.001, respectively). Post hoc testing (Tukey HSD) for the TSC and GRS showed significant differences in all pairwise comparisons, the only exception being R4 versus R5 residents, who did not differ significantly (p = 0.191 and p = 0.756, respectively).
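The F statistic in a one-way ANOVA is the ratio of between-group to within-group variability. The sketch below shows that calculation for scores grouped by training level. The function name `one_way_anova_f` and the scores are hypothetical (illustration only, not study data), and the sketch stops at the F statistic and its degrees of freedom; obtaining the p value and the Tukey HSD comparisons would require a statistics package such as the one used in the study.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of score lists."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least two groups and more scores than groups")
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Variability of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variability of individual scores around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical percentage scores per training level (not study data).
scores_by_level = {
    "R2": [42.0, 48.0, 51.0, 45.0],
    "R3": [58.0, 63.0, 60.0, 66.0],
    "R4": [78.0, 82.0, 80.0, 85.0],
    "R5": [84.0, 81.0, 88.0, 86.0],
}
f_stat, df_b, df_w = one_way_anova_f(list(scores_by_level.values()))
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

A large F value indicates that the differences between training levels are large relative to the spread of scores within each level.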
Discussion
All surgical training programs in Saudi Arabia assess trainees with a serial end-of-rotation faculty evaluation and a surgical logbook every 3 months. This includes an evaluation of overall surgical skills based on the faculty's recall of observed performance. These methods, despite their importance, have poor reliability and validity.4,17–19 Assessing surgical competence in otolaryngology is challenging, 6 and residency programs are now responsible for ensuring the surgical competency of their graduates. Therefore, more objective assessment tools are being incorporated into the evaluation process. These may include dissection labs, virtual reality simulators, video assessment, and various evaluation checklists.6,19
Otolaryngology training requires trainees to master various surgical skills, including general surgery skills, endoscopic skills, 20 and microscopic skills. Proper objective evaluation of each of these aspects is essential for training programs; hence, our motivation was to develop and implement OSATS-based assessment tools for all common procedures in otolaryngology.
With less time dedicated to surgical training because of increased hospital workload and an increasing number of residents, valid objective tools are vital to ensure residents’ proper skill development.
Our study goal was to develop an evaluation tool for septoplasty surgery that is valid, reliable, and feasible.
TSC was developed to include the various tasks that faculty members deemed necessary for a competent execution of this type of surgery. Deconstructing the surgery into multiple tasks will allow for more observation and emphasis on each of these specific areas (tasks). Moreover, the TSC provides structure to the evaluator to give timely formative feedback on different aspects, which allows for early detection and correction of errors to tailor surgical skills training to individual needs.
Feasibility was assessed by calculating the average time spent filling out the forms. Evaluators reported that the forms were easy to understand and required no more than 5 minutes on average to complete, an important factor in a busy surgical schedule.
Internal consistency refers to how consistently the different items of a composite score contribute to the final score. It was high for both the TSC and the GRS, suggesting that all of the items on both evaluation forms are necessary for the final evaluation and that none of them should be omitted.
Construct validity of our newly developed tool was assessed by its ability to discriminate residents of different experience levels. Interestingly, the major score difference for both the TSC and the GRS seen in our results was between R2 and R3 residents and between the R3 and R4 residents, which corresponds to the early “steep slope” or rapid improvement phase of a learning curve. This curve plateaus at R4, reflecting minimal skill acquisition in the last year of training. The smaller difference observed between R4 and R5 residents could be caused by the limited interest in performing septoplasty at a more senior level.
Reliability is another important factor of any evaluation tool that needs to be ensured before implementing it into a training program. Both the TSC and the GRS achieved acceptable interitem reliability. Therefore, we believe our tool has established its validity, feasibility, and reliability as an assessment tool that can be implemented in otolaryngology training programs for assessing the surgical competency in performing septoplasty.
A potential weakness of our study was that each trainee was evaluated by a single evaluator who already knew them, making the evaluation vulnerable to faculty bias. One way to overcome this in the future is blinded video assessment, which would ensure a more objective evaluation. Moreover, including more faculty members as raters would enable us to assess whether this tool shows acceptable interrater reliability and would further validate our assessment instrument. Another weakness was the lack of standardization: surgical cases present different pathology, and standardizing them would be very difficult, if not impossible.
Conclusion
This study suggests that a valid and reliable tool can be developed to objectively assess the surgical competency of residents performing septoplasty. This instrument can facilitate timely formative and summative feedback on operative performance. Moreover, it was found to be valid and easy to use in a limited evaluation; however, larger studies are required to further validate its usefulness.
Acknowledgments
The authors thank all of the residents and consultants who participated in the study.
