Abstract
Discrepancies between intraoperative consultations with frozen section diagnosis and the final pathology report have the potential to alter treatment decisions and affect patient care. Monitoring these correlations is a key component of laboratory quality assurance; however, identifying specific areas for improvement can be difficult. Our goal is to develop a standardized method utilizing root cause analysis and a modified Eindhoven classification schematic to identify the source of discrepancies and deferrals and subsequently to guide performance improvement initiatives. A retrospective review of intraoperative consultations performed at a tertiary-level hospital and cancer center over a 6-month period identified deferrals and discrepancies between the intraoperative consult report and the final pathology report. We developed and applied a classification tool to identify the process errors and cognitive errors leading to discrepant results. A total of 48 (4.6%) discrepancies and 24 (2.3%) deferrals were identified from the 1042 frozen sections. Within the entire data set of frozen sections, the process errors (n = 26, 54.2%) were due to gross sampling (n = 16, 33.3%), histologic sampling (n = 8, 16.7%), and surgical sampling (n = 2, 4.2%). Interpretation errors (n = 22, 45.8%) included undercalls/false negatives (n = 8, 16.7%), overcalls/false positives (n = 10, 20.8%), and misclassification errors (n = 4, 8.3%). Application of our classification tool demonstrated that the root cause of discrepancies and deferrals varied both between organ systems and by specific organs and that classification models may be utilized as a standardized method to identify focused areas for improvement.
Introduction
Monitoring the correlation of intraoperative consultation (IOC) by frozen section with the final diagnosis as determined by comparison with the subsequent permanent sections is a key component of anatomic pathology laboratory quality assurance (QA), is required for certification by the College of American Pathologists (CAP), and is an analytic monitor within the published recommendations of the Association of Directors of Anatomic and Surgical Pathology (ADASP).1,2 Prior studies evaluating concordance of intraoperative frozen sections have shown a rate of concordance with final diagnosis between 92% and 98%.3-7 The CAP performed multiple large interinstitutional studies that identified a mean concordance rate of approximately 98% and demonstrated that laboratories that participated in monitoring of correlation were capable of decreasing the discordance rate to less than 1% over 5 years.8-12 Although monitoring of correlation rates is almost uniformly performed within anatomic laboratories, the method of correlation varies by institution and ranges from a simple correlation of agreement to an in-depth analysis, including tissue type, etiology of error, and grading of clinical impact. Unfortunately, a more detailed root cause analysis of discrepancies and the development of subsequent interventions are frequently impeded by cost, time, and a culture reluctant to focus on error.13 Our goal within this project was to develop a classification scheme that is efficient without the loss of granular details that are imperative in the identification of specific areas for improvement.
Several systems have been suggested over recent years to classify diagnostic discrepancies in surgical pathology and have proven effective at determining patterns of error which can subsequently be helpful in identifying the source of error.14-16 White and Trotter 3 demonstrated a relationship between the percentage of discrepancies/deferrals and numerous variables intrinsic to the frozen section procedure including the anatomic site of tissue being studied and the type of pathology procedure requested (eg, diagnosis of neoplasm vs margin assessment). Of note, a comparison of studies demonstrates that the specific areas with high rates of discrepancy and deferral show substantial variation by institution. We propose a standardized classification model that may be applied to institutional QA of frozen sections with the goal of identifying the source of discrepancies and deferrals and to subsequently guide performance improvement initiatives.
Materials and Methods
Institutional review board approval was obtained for this quality improvement project (COMIRB 12-1350).
A retrospective review of all intraoperative consultations (n = 1042) performed at a tertiary-level hospital and cancer center over a 6-month period was conducted by 2 staff pathologists (SBS and JAW). The diagnosis from the final surgical pathology report was compared with each intraoperative frozen diagnosis, and each case was classified as concordant, discrepant, or deferred. Cases in which the frozen diagnosis was strictly upheld in the final diagnosis were classified as concordant. By contrast, a final diagnosis that did not reflect the frozen diagnosis was categorized as discrepant and further subclassified to delineate etiology. Discrepant cases included cases in which the frozen diagnosis was completely incorrect or partially incorrect due to insufficient information or extraneous information. Deferrals were defined as any of the following: a frozen section diagnosis that gave an extensive differential diagnosis, a vague or ambiguous diagnosis such as “atypical”, or a request for further tissue or further workup (ie, immunohistochemistry or flow cytometry). Each frozen section was subsequently assigned to 1 of the 3 following clinical impact categories: no change in clinical management, minor potential clinical significance, and major potential clinical significance. Minor clinical significance was defined as a change to the clinical treatment that caused minimal or no increased risk to the patient. An example of minor potential clinical significance would be the procurement of additional surgical margins at the time of the initial surgery. Major potential clinical significance was defined as a change in clinical treatment that incurred additional risk of harm. Examples of this category include unnecessary surgical staging or the requirement of additional surgeries.
For events in which the clinical impact of discrepant or deferred cases was not readily apparent, a targeted review of the electronic medical record was performed to make the determination.
We designed a classification tool based on categories from the Eindhoven Classification Model of process and cognitive failures to further delineate the etiology behind discrepant results.17,18 Process failures were subclassified as due to gross sampling, histologic sampling, or surgical sampling. A gross sampling error occurred when lesional tissue was present in the specimen but was not sampled in the tissue submitted for frozen section. Lesional tissue present within the frozen tissue but not present on the frozen section slide was defined as a histologic sampling error. These discrepancies were identified on permanent section as histology cut deeper into the block. A surgical sampling error was identified when the surgical team provided nonlesional tissue for frozen section and subsequently submitted lesional tissue as a separate specimen for permanent histology. Interpretation failures were subclassified as undercalls (false-negative result), overcalls (false-positive result), and misclassifications.
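The categories above can be written down as a small data model. The following Python sketch is illustrative only and is not the tool used in this project: the names `RootCause`, `failure_type`, and `process_cause`, and the three boolean inputs describing where lesional tissue was found, are assumptions introduced for the example.

```python
from enum import Enum
from typing import Optional


class RootCause(Enum):
    # Process failures
    GROSS_SAMPLING = "gross sampling"
    HISTOLOGIC_SAMPLING = "histologic sampling"
    SURGICAL_SAMPLING = "surgical sampling"
    # Interpretation failures
    UNDERCALL = "undercall (false negative)"
    OVERCALL = "overcall (false positive)"
    MISCLASSIFICATION = "misclassification"


PROCESS_CAUSES = {
    RootCause.GROSS_SAMPLING,
    RootCause.HISTOLOGIC_SAMPLING,
    RootCause.SURGICAL_SAMPLING,
}


def failure_type(cause: RootCause) -> str:
    """Return the top-level category for a root cause."""
    return "process" if cause in PROCESS_CAUSES else "interpretation"


def process_cause(
    lesion_in_frozen_specimen: bool,
    lesion_in_frozen_block: bool,
    lesion_on_frozen_slide: bool,
) -> Optional[RootCause]:
    """Map where lesional tissue was found to a process failure.

    The inputs are hypothetical reviewer findings. Returns None when the
    lesion was on the frozen section slide, in which case any discrepancy
    is an interpretation failure and needs pathologist review to subclassify.
    """
    if lesion_on_frozen_slide:
        return None
    if lesion_in_frozen_block:
        # In the block but not on the slide: found on deeper permanent cuts.
        return RootCause.HISTOLOGIC_SAMPLING
    if lesion_in_frozen_specimen:
        # In the specimen but not in the tissue chosen for freezing.
        return RootCause.GROSS_SAMPLING
    # Lesion arrived only in a separately submitted specimen.
    return RootCause.SURGICAL_SAMPLING


print(process_cause(True, False, False))
print(failure_type(RootCause.OVERCALL))
```

Encoding the definitions this way makes the order of the checks explicit: a case is assigned to the latest step at which lesional tissue was still available but missed.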
The review of all intraoperative consultations with determination of concordance and classification of discrepant results was divided between and performed by 2 staff pathologists. If the classification was not readily apparent, the case was discussed by the 2 pathologists and consensus was determined. In this manner the rate of interobserver disagreement is presumed to be low because, over time and with discussion, a more uniform agreement on the definitions was established.
We then identified the total number of frozen sections, discrepancies (total number and percentage), and deferrals (total number and percentage) within organ systems and specific organs. Our classification tool was applied to the entire data set, to specific organ systems, and to specific organs showing the highest incidence of discrepancies and deferrals.
Results
A total of 48 (4.6%) discrepancies and 24 (2.3%) deferrals were identified from the 1042 frozen sections and no major clinical harm was identified (Table 1). Six cases with a discrepancy were classified as having a minor clinical significance. The majority of discrepancies (95.8%) occurred during gross room processing or interpretation and only a small percentage were preanalytic (due to surgical sampling). Within the entire data set of frozen sections, process errors (n = 26, 54.2%) were due to gross sampling (n = 16, 33.3%), histologic sampling (n = 8, 16.7%), and surgical sampling (n = 2, 4.2%). Interpretation errors (n = 22, 45.8%) included undercalls/false negatives (n = 8, 16.7%), overcalls/false positives (n = 10, 20.8%), and misclassification errors (n = 4, 8.3%) (Figure 1).
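The percentages above use two different denominators: overall discrepancy and deferral rates are fractions of the 1042 frozen sections, whereas root cause percentages are fractions of the 48 discrepancies. A minimal Python sketch of that arithmetic follows; the counts come from the text, while the variable and function names are illustrative assumptions.

```python
# Counts are taken from the text; names are illustrative only.
TOTAL_FROZEN_SECTIONS = 1042
DEFERRALS = 24

PROCESS_ERRORS = {
    "gross sampling": 16,
    "histologic sampling": 8,
    "surgical sampling": 2,
}
INTERPRETATION_ERRORS = {
    "undercall (false negative)": 8,
    "overcall (false positive)": 10,
    "misclassification": 4,
}


def pct(count: int, total: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100 * count / total, 1)


process_total = sum(PROCESS_ERRORS.values())
interpretation_total = sum(INTERPRETATION_ERRORS.values())
discrepancy_total = process_total + interpretation_total

# Overall rates: denominator is all frozen sections.
print("discrepancy rate:", pct(discrepancy_total, TOTAL_FROZEN_SECTIONS))
print("deferral rate:", pct(DEFERRALS, TOTAL_FROZEN_SECTIONS))

# Root cause shares: denominator is all discrepancies.
print("process share:", pct(process_total, discrepancy_total))
print("interpretation share:", pct(interpretation_total, discrepancy_total))
for name, count in {**PROCESS_ERRORS, **INTERPRETATION_ERRORS}.items():
    print(f"  {name}: n = {count}, {pct(count, discrepancy_total)}%")

# Discrepancies arising in gross room processing or interpretation,
# that is, everything except surgical (preanalytic) sampling.
non_preanalytic = discrepancy_total - PROCESS_ERRORS["surgical sampling"]
print("non-preanalytic share:", pct(non_preanalytic, discrepancy_total))
```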
Table 1. Detailed listing of frozen section discrepancies and deferrals.

Figure 1. Classification of all frozen sections.
Total discrepancies and deferrals by organ system and by individual organs are shown in Figures 2 and 3. Organ systems with a percentage of discrepancies/deferrals greater than 10% included gastrointestinal (13.4%), soft tissue (11.1%), and gynecologic (10.1%). Results of application of our classification tool on the gynecologic and gastrointestinal organ systems are shown in Figures 4 and 5. Among gynecologic frozen sections, a total of 15 (9.5%) discrepancies and 1 (0.6%) deferral were identified from 158 frozen sections. Process errors (n = 11, 73.3%) were subdivided into gross sampling (n = 10, 66.7%) and histologic sampling (n = 1, 6.7%). Interpretation errors (n = 4, 26.7%) consisted of overcalls/false positives (n = 3, 20%) and misclassification errors (n = 1, 6.7%, Figure 4). Among gastrointestinal frozen sections, a total of 6 (6.2%) discrepancies and 7 (7.2%) deferrals were identified from 97 frozen sections. The process errors (n = 2, 33.3%) were due to histologic sampling. The interpretive errors (n = 4, 66.7%) were equally divided between undercalls/false negatives (n = 2, 33.3%) and overcalls/false positives (n = 2, 33.3%, Figure 5).
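The organ system screen described above (combined discrepancy and deferral percentage greater than 10%) can be sketched as follows. This is an illustrative Python example, not the project's actual workflow; the `OrganSystemTally` class and `flag_for_review` function are hypothetical names, and only the gynecologic and gastrointestinal counts are included because those are the only organ system counts given in the text.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class OrganSystemTally:
    name: str
    frozen_sections: int
    discrepancies: int
    deferrals: int

    @property
    def combined_rate(self) -> float:
        """Discrepancies plus deferrals as a percentage of frozen sections."""
        events = self.discrepancies + self.deferrals
        return round(100 * events / self.frozen_sections, 1)


def flag_for_review(tallies: List[OrganSystemTally], threshold: float = 10.0) -> List[str]:
    """Return organ systems whose combined rate exceeds the threshold."""
    return [t.name for t in tallies if t.combined_rate > threshold]


tallies = [
    OrganSystemTally("gynecologic", frozen_sections=158, discrepancies=15, deferrals=1),
    OrganSystemTally("gastrointestinal", frozen_sections=97, discrepancies=6, deferrals=7),
]

for t in tallies:
    print(t.name, t.combined_rate)  # 10.1 and 13.4, matching the text

print(flag_for_review(tallies))
```

The 10% threshold mirrors the cutoff used in the paragraph above; an institution applying this approach would presumably choose its own.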

Figure 2. Concordance, discrepant and deferrals by organ system.

Figure 3. Concordance, discrepant and deferrals by specific organ.

Figure 4. Classification of gynecologic frozen section discrepancies.

Figure 5. Classification of gastrointestinal frozen section discrepancies.
Discussion
Our data from this retrospective review of frozen section concordance over a 6-month time period show a slightly higher discrepancy rate than those previously published (Table 2). We propose that our higher rates reflect a more stringent application of criteria to define concordance, particularly given the comparable rates of significant clinical impact. Although one could make the argument that application of a low threshold is clinically insignificant, or simply a function of self-reporting bias, we believe that a lower threshold allows the institution to better identify potential areas for improvement and thereby reduce the potential for more significant events. Frozen section discrepancies with clinical impact are relatively uncommon yet these events, however infrequent, have the potential to cause significant harm. Therefore, we feel that it is imperative to cast a broader net and identify the factors leading to near-miss events that may ultimately contribute to a more significant event.
Table 2. Rates of Frozen Section Discrepancies and Deferrals.
These studies are Q-Probes multi-institutional studies.
The data from Table 2 also reflect a variety of methods utilized to compile concordance statistics, ranging from self-reporting to retrospective peer review, each with its own inherent biases. In this project, we chose a retrospective review performed by 1 of 2 board-certified pathologists who were blinded to the identity of the original frozen section pathologist. Cases that the primary reviewer thought were ambiguous or complicated were reviewed together to reach a consensus. In addition, in rare cases in which the frozen section diagnosis was deemed by consensus to be the “true” diagnosis, the case was designated as concordant. Limiting the review to a small number of reviewers, as opposed to a multitude of pathologists each responsible for their own cases, yields a more standardized and uniform classification of data and prevents the inherent tendency to discount discrepancies and deferrals when performing self-review. A drawback to this approach, however, is the bias incurred because the reviewer is already aware of the final diagnosis at the time of frozen section diagnosis review. An independent third-party reviewer might be the best possible solution for obtaining standardized data with minimal bias.
Likewise, it is worth noting that, when comparing institutional rates, there appears to be an inverse relationship between discrepancy and deferral rate, which likely reflects a shift in classification from discrepant to deferral. A “deferral” may not have a uniform agreed-upon definition and perhaps is better divided into “appropriate deferral” and “inappropriate deferral.” For example, the majority of pathologists would agree that a frozen section diagnosis of “spindle cell neoplasm” for a myometrial lesion might be considered an appropriate deferral and no added benefit is gained with the application of root cause analysis. However, the diagnosis of “spindle cell neoplasm” would no longer be an appropriate deferral if necrosis, marked cytologic atypia, and significant mitotic figures were readily apparent on the frozen section slide. This scenario would be best classified either as an “inappropriate deferral” or “discrepant (undercall/false negative)”, depending on the reviewer and standard of a given institution. To identify near-miss events it ultimately may be more useful to exclude appropriate deferrals from root cause analysis and to combine inappropriate deferrals with discrepancies.
The CAP Q-Tracks program has shown that participation in QA, including the monitoring of frozen section discordance, leads to overall improvement and reduction in error frequency over time.12 We propose that our algorithm is a simple classification tool that can be easily applied to these QA data to better identify areas for targeted interventions. This is demonstrated within our data set by the comparison of 2 organ systems with high rates of discrepancy/deferral: gastrointestinal and gynecologic. Of the 158 gynecologic frozen section specimens, a minority were deferred (n = 1, 0.6%) and 15 (9.5%) were discrepant. In contrast, among 97 gastrointestinal frozen section specimens, almost as many were deferred (n = 7, 7.2%) as were discrepant (n = 6, 6.2%). The higher rate of deferrals may reflect perceived cognitive complexity that left the pathologist uncomfortable rendering, or unable to render, a more definitive diagnosis. When we compare the root cause of discrepant cases in these 2 organ systems, we see that the majority of gynecologic discrepancies are due to process errors (n = 11, 73.3% of gynecologic discrepancies) and most specifically gross sampling (n = 10, 66.7%). Gastrointestinal discrepancies were much more likely to be a result of interpretation errors (n = 4, 66.7%), with an equal number being undercalls (n = 2, 33.3%) as overcalls (n = 2, 33.3%). Based on these data, interventions at our institution to target gynecologic frozen section discrepancies should be directed toward the gross process of sampling. Examples of interventions that may be attempted include increased oversight by the pathologist of the area to be sampled by the pathology assistant, resident, or fellow, or a requirement to sample and submit a minimum of 2 frozen blocks per gynecologic case. In contrast, interventions at our institution designed to target gastrointestinal discrepancies and deferrals might be more effective if aimed at cognitive deficiencies.
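The comparison in the preceding paragraph amounts to finding the dominant failure type within each organ system. A minimal Python sketch of that step is shown here, using the process and interpretation counts reported for the 2 organ systems; the function name `dominant_failure_type` is an assumption made for illustration.

```python
from typing import Dict, Tuple

# Discrepancy counts by failure type, as reported in the text.
discrepancies_by_failure_type: Dict[str, Dict[str, int]] = {
    "gynecologic": {"process": 11, "interpretation": 4},
    "gastrointestinal": {"process": 2, "interpretation": 4},
}


def dominant_failure_type(counts: Dict[str, int]) -> Tuple[str, float]:
    """Return the most frequent failure type and its share (percent).

    On a tie, the first category listed is returned.
    """
    name, count = max(counts.items(), key=lambda item: item[1])
    share = round(100 * count / sum(counts.values()), 1)
    return name, share


for organ_system, counts in discrepancies_by_failure_type.items():
    name, share = dominant_failure_type(counts)
    print(f"{organ_system}: {name} ({share}% of discrepancies)")
```

With counts this small, the dominant category is a prompt for discussion rather than a statistical conclusion.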
With this type of detailed monitoring and classification of frozen section deferrals and discrepancies, we determined that these 2 organ systems would each require an entirely different process improvement strategy. This type of information gleaned from in-depth analysis can be quite useful to the surgical pathology unit that is serious about improving the accuracy of its diagnoses. It is worth noting that we believe that the application of this model will provide information specific to a single institution that can be used to guide specific interventions but should not necessarily be generalized beyond the participating group.
This project supports the findings of White et al 3 that discrepant and deferral rates vary significantly by organ site. The retrospective design of this project serves as a significant limitation. The impact of this model may be best shown as an ongoing prospective project, ideally integrated into a comprehensive quality assurance program. This project does not demonstrate how this information could be used for improvement. In an ongoing system, data-driven practice improvements could be devised and response to the changes could be monitored over time. Fortunately, discrepancies between frozen section diagnosis and the final diagnosis are uncommon, even when using a sensitive method for identification; this reliability, however, limits the findings of this project given its relatively short time period uncovering only 48 discrepancy events and 24 deferral events. The strength of the model is the provision of granular details that illuminate the source of the discrepancy; this necessarily requires stratification of these events, generating multiple categories and leaving fewer events in each. Thus, more total events are needed. An increased collection period would not only provide more events but also allow for discovery of trends over time within individual categories. Despite these limitations, this project has shown the potential of such a model.
Although the majority of institutions currently monitor frozen section discrepancy and deferral rates, we propose that a monitoring system that includes the organ site and error type will provide substantial additional information with the potential to reduce the overall rates of discrepancies and deferrals. This model transforms a quality assurance program into a system of continuous quality improvement, and potentially provides a more powerful tool to the pathologist.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
