Abstract
Background:
For thyroid nodules with indeterminate cytology, the Afirma Gene Expression Classifier (GEC) identified benign nodules to reduce diagnostic surgery, though many nodules classified as suspicious still proved histopathologically benign. The current Afirma Genomic Sequencing Classifier (GSC) demonstrates improved specificity, suggesting more nodules will have a benign result (benign call rate [BCR]), but independent data are needed to confirm this in clinical practice.
Methods:
Retrospective analysis was performed of all Bethesda III or IV cytology thyroid nodules ≥1 cm tested with GEC (between January 1, 2011, and July 19, 2017) or GSC (between July 20, 2017, and August 27, 2018) at the authors' institution. Afirma testing was not performed reflexively for all nodules with Bethesda III or IV cytology, but rather was applied based on physician–patient decision making. Demographic, sonographic, and cytologic data were collected. The BCR for GEC- versus GSC-tested nodules was compared and further stratified by Bethesda classifications.
Results:
The study evaluated 600 nodules in 563 patients tested with either GEC (n = 486) or GSC (n = 114). The BCR was 233/486 (47.9%) for the GEC compared to 75/114 (65.8%) for the GSC (p = 0.0006). Hürthle-cell cytology was present in 99/486 (20.4%) nodules in the GEC group compared to 31/114 (27.2%) nodules in the GSC group (p = 0.28). The GSC BCR was significantly higher than the GEC BCR for Bethesda III nodules characterized by Hürthle cells (p = 0.006), but the BCRs were similar for nodules with architectural or cytologic atypia. In Bethesda IV nodules suspicious for follicular neoplasm, BCR for the GEC and GSC were similar (p = 0.68), but for cytology suspicious for Hürthle-cell neoplasm, the GSC BCR was 64.0% (16/25) compared to the GEC BCR of 16.4% (10/61; p < 0.0001). Positive predictive value in resected nodules with a suspicious result was 16/32 (50%) for GSC nodules and 75/221 (33.9%) for GEC nodules (p = 0.1).
Conclusions:
The higher BCR for the GSC compared to the GEC for indeterminate thyroid nodules, predominantly among nodules with Hürthle-cell cytology, will likely lead to further reduction in surgical management.
Introduction
Thyroid nodules are common in clinical practice and are most often benign (1,2). For patients without a suppressed serum thyrotropin (TSH) concentration, ultrasound (US)-guided fine-needle aspiration (FNA) is recommended as the principal diagnostic test to assess for a potential cancer in most nodules over a certain size (3,4).
The Bethesda System for Reporting Thyroid Cytopathology (TBSRTC) has standardized reporting of thyroid FNA cytopathology into six groups, stratified by risk of malignancy. While cytopathologic evaluation is highly accurate in many cases, up to one third of aspirates have indeterminate findings (5). Nodules in the Bethesda III or IV categories present a clinical dilemma, since the risk of malignancy is relatively low but not excluded (5,6). Surgical resection is recommended for many such nodules due to the risk of cancer, though most prove to be benign (2,5–7). For patients with benign nodules, superfluous surgery carries unnecessary risk (8,9), while initial diagnostic surgery may be suboptimal for those with malignancy (10,11).
The introduction of molecular testing for indeterminate thyroid FNA samples has improved diagnostic risk assessment. The Afirma Gene Expression Classifier (GEC) measured RNA expression to identify benign nodules with high sensitivity and negative predictive value (NPV) (12,13). For nodules with a benign GEC result, surveillance was generally recommended (14–17), whereas diagnostic thyroid surgery was most often undertaken when the GEC result was suspicious (17,18). The specificity and positive predictive value (PPV) of the GEC were modest (13), and subsequent investigations found that nodules with Hürthle-cell (or oncocytic) cytology were often classified as suspicious by GEC but were unlikely to be malignant (19–21). This low benign call rate (BCR) and PPV called into question the value of GEC testing for such nodules (22).
The updated Afirma Genomic Sequencing Classifier (GSC) uses next-generation RNA sequencing and has demonstrated improved test specificity while maintaining high sensitivity and NPV in the same prospective blinded multicenter cohort used to validate the GEC robustly (13,23). The GSC specifically incorporated Hürthle-cell classifiers and showed marked improvement of the low specificity in this subgroup (23). However, these data were based on a small number of samples and warrant independent confirmation.
The purpose of this study was to compare the performance of the Afirma GSC with its predecessor, the Afirma GEC, in Bethesda III or IV nodules ≥1 cm, with particular emphasis on differences in test performance for nodules with Hürthle cytology, as well as other cytologic subtypes within the Bethesda III category.
Methods
The study was a retrospective evaluation of all nodules ≥1 cm with Bethesda III or IV cytology tested with either the Afirma GEC (from January 1, 2011, to July 19, 2017) or GSC (from July 20, 2017, to August 27, 2018) at the Brigham and Women's Hospital Thyroid Nodule Clinic. Demographic, cytologic, and sonographic data were collected on all nodules. A very small number of nodules that had a repeat cytology prior to Afirma that was not indeterminate (benign, suspicious for malignancy, or malignant) were excluded. Nodules with a final Afirma result of “no result” (insufficient RNA) were not included in the study population.
As previously described (2,24), the clinical management of all patients undergoing thyroid nodule evaluation included thyroid US by a radiologist with subspecialty expertise. US reports included nodule size in three dimensions, echogenicity, presence and percent of any cystic component, and any suspicious sonographic features (microcalcifications, extrathyroidal extension, or abnormal lymphadenopathy). For this study, nodules were assigned retrospectively to American Thyroid Association (ATA) sonographic suspicion categories of high, intermediate, low, or very low (3) after blinded radiologist review (H.T.H.) using a picture archiving and communication system. For analysis, high and intermediate suspicion nodules were grouped together, since FNA is recommended when the nodule size is ≥1 cm for either (3), and compared to low and very low suspicion nodules.
FNA was performed by a thyroidologist using US guidance, typically with three aspirations with 25- to 27-gauge needles. Aspirates were processed using a liquid-based cytology preparation (ThinPrep; Hologic, Marlborough, MA). Evaluation of FNA samples was performed by a pathologist with thyroid cytopathology expertise using TBSRTC criteria (6,25). Bethesda III (atypia of undetermined significance/follicular lesion of undetermined significance [AUS/FLUS]) included those with cytologic atypia (AUS-C), only architectural atypia (AUS-A), or low cellularity specimen comprised exclusively (or almost exclusively) of Hürthle cells (AUS-H) (25,26). Bethesda IV (follicular neoplasm/suspicious for follicular neoplasm [FN/SFN]) included those considered suspicious for Hürthle-cell neoplasm (SHCN). The definition for Hürthle-cell cytology in this study included AUS-H or SHCN together. For Afirma GEC or GSC testing, one or two additional passes were performed for nodules with an indeterminate cytologic diagnosis. The decision to use molecular testing was based on shared physician–patient decision making after the cytology result. The indications for Afirma were Bethesda III or IV cytology, patient willingness to undergo observation or surgery based on the results, and absence of any clear indication for surgery (e.g., compressive symptoms, Graves' disease, a separate nodule with a cytology suspicious or positive for malignancy, and suspicious lymph nodes). Rare patients, in whom nodule features were particularly low risk and who were unlikely to benefit from treatment because of extreme old age or life-limiting comorbidity, were not subjected to molecular testing.
Histologic data were collected on all nodules that were removed surgically. To compare the pathologic results between the two groups more directly, cases of follicular variant of papillary thyroid carcinoma were reviewed to identify those that fulfilled criteria for noninvasive follicular thyroid neoplasm with papillary-like nuclear features (NIFTP) (27). Tumors defined as NIFTP were included with malignant tumors because NIFTP are considered low risk but are not definitively benign (27,28). Malignancies occurring in the thyroid but outside the Afirma-tested nodule were not considered in this comparison of diagnostic performance. The PPV was calculated based on Afirma suspicious nodules that underwent surgical resection.
The percentage of nodules with a benign result was defined as the BCR for the GEC or GSC. The BCRs for the GEC and GSC were compared overall and by subtypes within the Bethesda III and IV categories, including nodules with AUS-C compared to those with AUS-A. Additionally, the BCRs for nodules with Hürthle-cell cytology were compared.
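As a minimal illustration of the BCR definition above, the following Python sketch computes the rate from a benign count and a total tested count. The function name `benign_call_rate` is hypothetical and is not part of the study's analysis; the counts are the overall figures reported in the abstract.

```python
def benign_call_rate(benign: int, tested: int) -> float:
    """Return the benign call rate (BCR) as a percentage of tested nodules."""
    if tested <= 0:
        raise ValueError("tested must be positive")
    if not 0 <= benign <= tested:
        raise ValueError("benign must lie between 0 and tested")
    return 100.0 * benign / tested


# Overall counts as reported in the abstract (benign results / nodules tested).
gec_bcr = benign_call_rate(233, 486)
gsc_bcr = benign_call_rate(75, 114)

print(f"GEC BCR: {gec_bcr:.1f}%")  # 47.9%
print(f"GSC BCR: {gsc_bcr:.1f}%")  # 65.8%
```

The same function can be applied to any cytologic subgroup by passing that subgroup's benign and tested counts.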
Statistical analysis
Demographics and nodule characteristics are shown as number (percentage), mean (±standard deviation) or median (range). Statistical testing was performed using Fisher's exact test for categorical variables and Student's t-test for continuous variables. Statistical significance was defined as a two-tailed p-value of <0.05 for all analyses. Analyses were performed using GraphPad v5.0 (GraphPad Software, Inc., La Jolla, CA) and figures were created with Microsoft PowerPoint for Mac v15.34 (Microsoft, Inc., Redmond, WA). Approval for this study was obtained from the Institutional Review Board of the Brigham and Women's Hospital, which allowed a waiver of informed consent for this study. This study did not receive any financial support, approbation, or review by any commercial entity.
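The categorical comparisons described in the statistical analysis can be sketched in Python as a two-sided Fisher's exact test on a 2×2 table. This is an illustrative standard-library implementation under the usual "sum of tables no more probable than the observed one" definition, not the GraphPad procedure the authors used. The function name `fisher_exact_two_sided` is hypothetical, and the example counts are the overall benign and suspicious results reported for each test.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins whose probability does not exceed that of the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # A small relative tolerance guards against floating-point ties.
    threshold = p_observed * (1 + 1e-7)
    total = sum(p for p in (prob(x) for x in range(lowest, highest + 1))
                if p <= threshold)
    return min(1.0, total)


# Rows: GEC, GSC. Columns: benign result, suspicious result.
p_overall = fisher_exact_two_sided(233, 486 - 233, 75, 114 - 75)
print(f"Overall BCR comparison, two-sided p = {p_overall:.4f}")
```

For these counts the sketch should return a p-value well below the 0.05 threshold, of the same order as the reported p = 0.0006, although small differences from GraphPad's output are possible.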
Results
During the study period, 652 nodules in 615 patients were eligible for inclusion. After exclusion of nodules with a Bethesda II or V result on repeat cytology (n = 27), size <1.0 cm (n = 11), or Afirma “no result” (n = 14), 600 nodules in 563 patients were included in this study. Of these, 486 nodules had a GEC test result and 114 had a GSC test result (Fig. 1). Patient demographics and nodule characteristics are shown in Table 1. Overall, patients were predominantly female (454/563; 80.6%), and the median patient age was 56 years (range 20–87 years). Sex distribution and median patient age were similar between the GEC and GSC groups.

Flow chart of nodule outcomes. There were initially 652 nodules with indeterminate cytology identified, of which 52 were excluded.* There were 486 nodules tested with the Afirma Gene Expression Classifier (GEC) from January 2011 to July 2017 and 114 nodules tested with the Afirma Genomic Sequencing Classifier (GSC) from July 2017 to August 2018. Of each, the proportion with a benign result (benign call rate) was higher for the GSC (65.8%) than for the GEC (47.9%; p = 0.0006). Of nodules with a suspicious result, a similar proportion underwent surgical resection, and of these nodules, there was a trend toward a higher proportion that were cancer or noninvasive follicular thyroid neoplasm with papillary-like nuclear features (NIFTP) in the GSC cohort. FNA, fine-needle aspiration. *Rate of insufficient results not significantly different between the two tests (p = 0.48).
Comparison of Patient and Nodule Characteristics for Thyroid Nodules Tested with GEC or GSC
Includes AUS/FLUS of predominantly Hürthle-cell subtype and suspicious for Hürthle cell neoplasm.
GEC, gene expression classifier; GSC, genomic sequencing classifier; AUS/FLUS, atypia of undetermined significance/follicular lesion of undetermined significance; FN/SFN, follicular neoplasm/suspicious for follicular neoplasm; ATA, American Thyroid Association.
Comparing GEC- to GSC-tested nodules (Table 1), the median nodule size in the largest dimension was 1.9 cm (range 1.0–6.7 cm) and 2.0 cm (range 1.0–6.9 cm), respectively (p = 0.18). The proportion of nodules ≥3 cm was similar between the two groups. The proportion of nodules with AUS/FLUS and FN/SFN cytology was similar, as was the proportion of nodules with Hürthle and non-Hürthle cytology. When comparing ATA sonographic patterns, there was a similar proportion of nodules with high or intermediate suspicion compared to low or very low suspicion in the GEC and GSC groups. In contrast, 56/486 (11.5%) GEC nodules were predominantly cystic (>50%) compared to only 3/114 (2.6%) GSC nodules (p = 0.003).
Of the 486 nodules tested with the Afirma GEC, a benign result was obtained in 233 (47.9%). In comparison, 75/114 (65.8%) GSC-tested nodules had a benign result (p = 0.0006; Fig. 1). The BCR for the GEC and GSC was further analyzed after stratifying cytologic results (Table 2). In Bethesda III nodules, 48.7% (153/314) of GEC tests were benign compared to 65.7% (44/67) of GSC results (p = 0.01). For nodules with AUS-H cytology, the GSC BCR was 100% (6/6) compared to a 36.8% (14/38) BCR for the GEC (p = 0.006). Though the BCRs were higher for the GSC compared to the GEC for AUS-A and AUS-C, the differences were not statistically significant. Similarly, in Bethesda IV nodules, the BCR for GSC was 65.9% (31/47) compared to 46.5% (80/172) for the GEC (p = 0.02). Specifically for SHCN nodules, the GSC BCR was 64.0% (16/25) compared to the GEC BCR of 16.4% (10/61; p < 0.0001). In contrast, the BCRs were similar for Bethesda IV nodules that were FN/SFN. Evaluating Hürthle-cell lesions together (AUS-H and SHCN), the GEC BCR was only 24.2% (24/99) compared to the BCR of 71% (22/31) for the GSC (p < 0.0001).
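Because the stratified counts should add up to the overall totals reported in the abstract (233/486 for the GEC and 75/114 for the GSC), a short Python sketch can serve as a consistency check. The dictionary layout and the helper name `pooled` are illustrative assumptions; the counts are the Bethesda III, Bethesda IV, AUS-H, and SHCN figures given in the stratified results above.

```python
# (benign, tested) counts by Bethesda category, from the stratified results.
by_bethesda = {
    "GEC": {"Bethesda III": (153, 314), "Bethesda IV": (80, 172)},
    "GSC": {"Bethesda III": (44, 67), "Bethesda IV": (31, 47)},
}
# Hürthle-cell subgroups, from the same stratified results.
by_hurthle = {
    "GEC": {"AUS-H": (14, 38), "SHCN": (10, 61)},
    "GSC": {"AUS-H": (6, 6), "SHCN": (16, 25)},
}


def pooled(strata: dict) -> tuple:
    """Sum benign and tested counts across strata."""
    benign = sum(b for b, _ in strata.values())
    tested = sum(n for _, n in strata.values())
    return benign, tested


for assay in ("GEC", "GSC"):
    b, n = pooled(by_bethesda[assay])
    hb, hn = pooled(by_hurthle[assay])
    print(f"{assay}: overall {b}/{n} ({100 * b / n:.1f}%), "
          f"AUS-H + SHCN {hb}/{hn} ({100 * hb / hn:.1f}%)")
```

Pooling the two Bethesda categories recovers the overall counts for each test, and pooling AUS-H with SHCN recovers the combined Hürthle-cell counts.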
Effect of Cytology Category on Benign Call Rate of GEC and GSC in Thyroid Nodules with Indeterminate Cytology
Bold indicates significance.
Unspecified atypia (n = 2) not shown, both GEC tested, both with benign result.
Includes those with cytologic atypia alone and those with cytologic and architectural atypia present.
The surgical management and histopathologic outcomes of GEC- and GSC-tested nodules are shown in Figure 1. In nodules with a suspicious result by GEC and GSC, a similar proportion underwent surgical resection (221/253 [87.4%] and 32/39 [82.1%], respectively). The PPV of the GEC was 33.9% (75/221). The PPV for the GSC was higher at 50% (16/32), though the difference only showed a statistical trend at this sample size (p = 0.1). The histologic diagnoses of all resected nodules are shown in Table 3. A similar proportion of positive histological results were NIFTP in each group.
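The resection rates and PPVs above follow directly from the reported counts, as the following Python sketch shows. It assumes, as stated in the Methods, that NIFTP is counted with malignant tumors and that the PPV denominator is suspicious nodules that were resected. The names `proportion`, `cohorts`, and `summary` are hypothetical.

```python
def proportion(numerator: int, denominator: int) -> float:
    """Return numerator/denominator as a percentage."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return 100.0 * numerator / denominator


# Counts from the Results: suspicious results, resected suspicious nodules,
# and resected suspicious nodules that were malignant or NIFTP on histology.
cohorts = {
    "GEC": {"suspicious": 253, "resected": 221, "malignant_or_niftp": 75},
    "GSC": {"suspicious": 39, "resected": 32, "malignant_or_niftp": 16},
}

summary = {}
for name, c in cohorts.items():
    summary[name] = {
        "resection_rate": proportion(c["resected"], c["suspicious"]),
        "ppv": proportion(c["malignant_or_niftp"], c["resected"]),
    }
    print(f"{name}: resected {summary[name]['resection_rate']:.1f}% "
          f"of suspicious nodules, PPV {summary[name]['ppv']:.1f}%")

# Absolute difference in PPV, in percentage points.
ppv_gain = summary["GSC"]["ppv"] - summary["GEC"]["ppv"]
print(f"PPV difference (GSC minus GEC): {ppv_gain:.1f} percentage points")
```

Because unresected suspicious nodules have no histology, this PPV describes only the operated subset.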
Histopathologic Diagnoses Stratified by Test Result
NIFTP, noninvasive follicular thyroid neoplasm with papillary-like nuclear features.
Discussion
The Afirma GSC is a recently available molecular test for thyroid nodules with Bethesda III or IV FNA cytology that improves upon observed limitations of the Afirma GEC while maintaining high sensitivity and NPV, and was validated using the same robust prospective cohort of nodules that established the accuracy of the GEC (23). Though that initial investigation showed that a greater number of nodules would be accurately categorized as benign with improved test specificity compared to the GEC, this result has not been completely confirmed by independent evaluation. This study provides an independent real-world clinical assessment showing that the BCR was higher for the GSC compared to the GEC for Bethesda III or IV nodules, and this improvement was predominantly seen for nodules with Hürthle-cell cytology. To the authors' knowledge, this is the first publication to compare independently the performance of the GSC with that of the GEC for architectural and cytologic subtypes of AUS/FLUS.
The Afirma GEC was widely adopted in clinical practice, and subsequent analyses confirmed high NPV and clinical utility in decreasing unnecessary surgery (17,21), as well as long-term stability and patient safety during follow-up (16,29,30). However, to assure the accuracy of a benign GEC result and the safety of conservative management, the specificity and PPV were modest. A specific source of concern raised by several investigators was that nodules with Hürthle-cell cytologies (AUS-H or SHCN) were more likely to receive a suspicious GEC result but were unlikely to be malignant when resected (13,18–21,31). For such nodules, previous reports indicated a BCR of 10–32% (20,21), while the PPV was 14–41% (18–21,31).
The newer GSC sought to improve specificity while maintaining high sensitivity and NPV. When evaluated using 190 samples from the same cohort of Bethesda III and IV nodules used to validate the GEC (23), the specificity and PPV of the GSC were improved at 68.3% [confidence interval (CI) 60–76%] and 47.1% [CI 36–58%]. Fifty-four percent of GSC tests were benign. The current study demonstrated a higher BCR for the GSC of 65.8%, and this was significantly higher than 47.9% (p = 0.0006) in the GEC-tested cohort, which had similar patient demographics and nodule characteristics. This improvement in the BCR was seen in both the Bethesda III and Bethesda IV nodule groups. Within Bethesda III (AUS/FLUS), the BCR was modestly improved for AUS-A nodules (73.7% vs. 63.6%) and AUS-C (43.5% vs. 37.9%), though these did not reach statistical significance.
The GSC also attempted to improve test performance for Hürthle-cell lesions. Hürthle neoplasms are relatively rare thyroid tumors with distinct molecular biology that is only recently becoming more completely elucidated (32) and for which correct classification is more challenging. The GSC introduced two RNA classifiers assessing the presence of Hürthle cells to classify nodules with Hürthle cytology more accurately (23). The GSC validation report demonstrated that 10/17 (58.8%) benign Hürthle lesions were correctly identified, suggesting improvement over the GEC. In that study, however, nodules were evaluated by histology and not classified by Hürthle or non-Hürthle cytology. In the current study, the GSC showed a significantly higher BCR than the GEC for AUS-H (100% vs. 36.8%) and SHCN (64.0% vs. 16.4%).
Recently, Harrell et al. (33) compared nodules previously tested with GEC (n = 481) to those tested with GSC (n = 139). Their results showed an overall decrease in the percentage of samples called suspicious (58.4% vs. 38.8%), with a larger decrease in the subanalysis of oncocytic specimens (82.7% vs. 35.3%). In operated patients, the PPV of the GSC compared to the GEC improved from 57% to 76%. The data on GSC performance in the present study confirm and extend the findings of Harrell et al. Importantly, 9.3% (8/85) of GSC benign nodules were resected in the study by Harrell et al., and only 5.3% (4/75) were resected in the present study, confirming the clinical utility of obtaining a benign GSC result. Use of the GSC reduced surgeries compared to the GEC by 17% in the present study and by 23% in the report by Harrell et al. In the current study, a higher percentage of GSC suspicious nodules underwent resection than in Harrell et al.'s study. The PPV improvement of 16% for the GSC found in the present study does confirm the 19% increase observed by Harrell et al. The minor dissimilarity in absolute risks and effect sizes that exists between the two studies may be due to differences in referral patterns, cytologic evaluation, or underlying prevalence of malignancy between the populations.
It was considered that the change in BCR may have been due to differences between patients in the GEC and GSC cohorts, given that these patients were treated during different periods (2011–2017 vs. 2017–2018, respectively). Clinical practices in nodule assessment may have changed over this time, along with revised guidelines from the ATA in 2016 recommending against routine FNA for some nodules with a lower-risk sonographic phenotype (3). Significant differences in the proportion of higher and lower ATA sonographic risk nodules were not detected between the two groups. Similarly, patient age, patient sex, nodule size, and cytologic distribution were similar between the groups. The effect of more cystic nodules and fewer nodules with Hürthle cytology in the GEC group likely was small, but if anything this could have favored a lower malignancy risk and a higher BCR compared to the GSC cohort, which further supports the result of the analysis.
There are limitations to this study that should be considered. The two cohorts had similar patient and nodule characteristics, but it was not possible to evaluate all potential risk factors for malignancy, such as TSH (34) or other factors, and the possibility of confounding cannot be excluded. Given the stability of practice over this study period, any such effects are considered to be unlikely. Since not all patients with a suspicious result underwent surgery, this was not a complete representation of PPV, but a similar proportion of suspicious nodules were resected in both groups, and unresected suspicious nodules in both groups were similar with respect to cytology and ATA sonographic risk. Although there was only a trend observed for the differences in PPV between tests, it is likely that this was due to a lack of power in the available sample size.
In conclusion, the Afirma GSC identifies more indeterminate cytology nodules as benign compared to the Afirma GEC, predominantly among samples with cytology characterized by Hürthle cells. This reduction in suspicious results is likely to lead to fewer thyroid surgeries and may yield a higher PPV.
Footnotes
Author Disclosure Statement
None of the authors have any financial disclosures.
