Abstract
Background
Society consensus guidelines are commonly used to guide management of pancreatic cystic neoplasms (PCNs). However, these guidelines can lead to both unnecessary surgery and missed malignancy. The aim of this study was to use computed tomography (CT)-based deep learning techniques to predict malignancy of PCNs.
Materials and Methods
Patients with PCNs who underwent resection were retrospectively reviewed. Axial CT images of the cystic neoplasms were collected, and each image was assigned a binary outcome of advanced neoplasia or benign based on final pathology. Advanced neoplasia was defined as adenocarcinoma or intraductal papillary mucinous neoplasm with high-grade dysplasia. A convolutional neural network (CNN) deep learning model was trained on 66% of the images, and the trained model was then tested on the remaining 33%. Predictions from the deep learning model were compared with those of the Fukuoka guidelines.
Results
Twenty-seven patients met the inclusion criteria, with 18 used for training and 9 for model testing. The trained deep learning model correctly predicted 3 of 3 malignant lesions and 5 of 6 benign lesions. Fukuoka guidelines correctly classified 2 of 3 malignant lesions as high risk and 4 of 6 benign lesions as worrisome. Following deep learning model predictions would have avoided 1 missed malignancy and 1 unnecessary operation.
Discussion
In this pilot study, a deep learning model correctly classified 8 of 9 PCNs and performed better than consensus guidelines. Deep learning can be used to predict malignancy of PCNs; however, further model improvements are necessary before clinical use.
Introduction
Pancreatic cystic neoplasms (PCNs) are rare tumors of the pancreas and account for approximately .5% of all pancreatic neoplasms. 1 PCN is a broad term encompassing multiple histologically distinct subtypes, including serous cystic tumors, mucinous cystic neoplasms (MCNs), intraductal papillary mucinous neoplasms (IPMNs), and solid pseudopapillary neoplasms. While most PCNs are benign, MCNs, IPMNs, and solid pseudopapillary tumors carry a risk of malignancy. 2 With increasing identification of PCNs on incidental imaging3,4 and inconsistent diagnostic accuracy, 5 there has been interest in creating schemas to guide management of PCNs. In general, surveillance is recommended for serous cystic tumors, and resection is recommended for main-duct IPMNs and solid pseudopapillary tumors because of their risk of malignancy or malignant transformation. However, for intermediate lesions such as MCNs or side-branch IPMNs, most clinicians follow consensus guidelines for recommendations regarding resection. The most prominent of these are the 2015 American Gastroenterological Association (AGA) guidelines 6 (for asymptomatic pancreatic neoplastic cysts such as serous cystic tumors and MCNs) and the Fukuoka guidelines 7 (for IPMNs). These guidelines describe high-risk and worrisome criteria for PCNs based on preoperative imaging and clinical symptoms. Retrospective studies have demonstrated high specificity; however, these guidelines have poor sensitivity, and in 1 series, relying on the guidelines alone would have missed approximately 50% of advanced neoplasia.8,9 At the same time, nearly 20% of patients with a benign lesion 5 undergo operations that carry high morbidity. 10 Given that existing guidelines are insufficient for identifying high-risk PCNs, an opportunity exists to leverage new technologies to aid in the diagnosis of potentially malignant lesions.
Deep learning has emerged as a promising technology for image analysis in medicine.11-14 Deep learning is an artificial intelligence (AI) machine learning technique that uses a “neural network” to make predictions from input data. During training, the model iteratively compares its output predictions with the known outcomes and adjusts its internal weights to reduce the error, so that its accuracy improves as it is exposed to more examples. The advantage of deep learning over other machine learning techniques is “representation learning,” whereby the model learns the relevant features directly from raw data and can identify relationships within a data set that human operators cannot perceive. 11 A deep learning model can therefore take an image as input, represented as individual pixels, and find relationships between the spatial arrangement and intensity of these pixels. There is growing interest in the use of deep learning in medicine, with over 500 articles published since 2015. 12 The majority of these articles focus on the use of deep learning for classification and detection of objects in radiographic images. A study of deep learning for identification of prostate cancer on histopathological slides demonstrated 99% accuracy for determination of malignancy. 13 A deep learning model was used to identify breast cancer on mammography alone with over 85% accuracy. 14 Deep learning in medicine is a growing field with considerable potential for clinical application in the right context.
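The iterative training loop described above can be illustrated with a deliberately tiny model: a single logistic “neuron” fitted by gradient descent on the cross-entropy loss. This is a minimal Python sketch, not the CNN used in this study; the toy data, the learning rate of 0.5, and the name `train_neuron` are hypothetical choices made only for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_neuron(xs, ys, lr=0.5, iterations=200):
    """Fit p = sigmoid(w*x + b) by gradient descent on the mean cross-entropy loss.

    Returns the learned weight, bias, and the loss computed before each update.
    """
    w, b = 0.0, 0.0
    n = len(xs)
    losses = []
    for _ in range(iterations):
        preds = [sigmoid(w * x + b) for x in xs]
        # Compare the predictions with the known labels.
        loss = -sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                    for p, y in zip(preds, ys)) / n
        losses.append(loss)
        # Gradient of the mean cross-entropy with respect to w and b.
        grad_w = sum((p - y) * x for p, y, x in zip(preds, ys, xs)) / n
        grad_b = sum(p - y for p, y in zip(preds, ys)) / n
        # Adjust the weights a small step in the direction that lowers the loss.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b, losses


# Toy data (hypothetical): larger x values tend to belong to class 1.
xs = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]
ys = [0, 0, 0, 1, 0, 1, 1, 1]
w, b, losses = train_neuron(xs, ys)
print(f"first-iteration loss: {losses[0]:.3f}, last-iteration loss: {losses[-1]:.3f}")
```

The loss starts at ln 2 (every prediction is 0.5 before any update) and falls as the weights are adjusted. A CNN applies the same compare-and-adjust idea to many more weights.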
Deep learning is a powerful and increasingly used tool that has previously identified malignancy accurately from images alone. Given the uncertainty in preoperative diagnosis of PCNs, a deep learning model may be able to find difficult-to-characterize patterns in images that can aid in their diagnosis. The aim of this study was to compare an AI deep learning model with existing guidelines in the diagnosis of high-grade dysplasia or malignancy in PCNs. We hypothesized that a deep learning model would have higher accuracy than the Fukuoka guidelines, which are the current standard of care.
Patients and Methods
Patient Selection and Image Analysis
Patients were identified from our institution’s Research Electronic Data Capture (REDCap) database between March 2008 and March 2017. Institutional review board approval was received (IRB#04-18-14E); because this was a retrospective study using deidentified data, informed consent was not required. Inclusion criteria were a preoperative diagnosis of IPMN, mucinous cystic neoplasm, or serous cystic neoplasm. Patients with pancreatic pseudocysts were excluded from analysis, as were patients without a preoperative computed tomography (CT) scan available for review. Included patients were also required to have a pathology report with a pathologic diagnosis of their lesion. Patient characteristics were determined through retrospective record review and included age, sex, operation performed, preoperative diagnosis, and postoperative pathologic diagnosis. Data are reported as median with interquartile range (IQR) or count with percentage.
For each patient, the axial image with the largest cross-sectional lesion diameter on preoperative CT was collected and deidentified. All structures surrounding the lesion were removed from the image to isolate the lesion, which was then centered on a 150-pixel × 150-pixel white background for standardization (Figure 1). Using the pathologic diagnosis, each image was assigned a binary outcome of advanced neoplasia or benign for the purposes of model creation. Advanced neoplasia was defined as adenocarcinoma or IPMN with high-grade dysplasia, while benign was defined as IPMN with intermediate- or low-grade dysplasia or serous cystadenoma. A “training set” was created from a random 66% of the lesion images, and the remaining 33% formed a “testing set” for model testing.
Figure 1. Example of images used to train the deep learning model. The largest cross-sectional diameter of the lesion in the axial plane was isolated and placed on a 150-pixel × 150-pixel white background.
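The two preprocessing steps described above, centering the isolated lesion on a fixed-size white canvas and randomly splitting the images into training and testing sets, can be sketched as follows. This is a hypothetical Python illustration: it assumes 8-bit grayscale pixels (white = 255), represents images as nested lists, and uses a 2/3 training fraction, which yields the 18/9 split reported in the Results. The function names and the random seed are not from the study.

```python
import random

CANVAS_SIZE = 150  # 150 x 150 pixels, as described in the text
WHITE = 255        # assumed 8-bit grayscale value for a white background


def center_on_canvas(lesion, size=CANVAS_SIZE, background=WHITE):
    """Place a cropped lesion (list of pixel rows) at the center of a square canvas."""
    h, w = len(lesion), len(lesion[0])
    if h > size or w > size:
        raise ValueError("lesion crop is larger than the canvas")
    canvas = [[background] * size for _ in range(size)]
    top, left = (size - h) // 2, (size - w) // 2
    for r, row in enumerate(lesion):
        canvas[top + r][left:left + w] = row  # copy the lesion row into place
    return canvas


def split_train_test(ids, train_fraction=2 / 3, seed=0):
    """Randomly assign image IDs to a training set and a testing set."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


lesion = [[40, 60], [60, 80]]  # hypothetical 2 x 2 lesion crop
canvas = center_on_canvas(lesion)
train_ids, test_ids = split_train_test(range(27))
print(len(canvas), len(train_ids), len(test_ids))  # 150 18 9
```

Fixing the seed makes the random split reproducible, which matters when the testing set is as small as 9 images.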
Deep Learning Model
The deep learning model used in this experiment was modeled after the LeNet architecture and included 3 convolutional layers, 1 flattening layer, and 2 fully connected layers. The batch size was set to 3. During training, the 2 input classes were labeled “yes” or “no” for advanced neoplasia. Overall, 20% of the data were reserved for internal validation during convolutional neural network (CNN) training. Images for both testing and training were standardized by resizing to 128 × 128 pixels with 3 color channels.
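Because CT images are grayscale while the model input has 3 color channels, one plausible way to produce a 128 × 128 × 3 input is to resize the 150 × 150 canvas and copy the single channel 3 times. The Python sketch below assumes nearest-neighbour resizing and channel replication; the study does not report which interpolation method or channel conversion was used, and the function names are hypothetical.

```python
def resize_nearest(image, out_h=128, out_w=128):
    """Nearest-neighbour resize of a 2-D list of pixel values."""
    in_h, in_w = len(image), len(image[0])
    return [[image[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
            for r in range(out_h)]


def to_three_channels(image):
    """Copy a single grayscale channel into 3 identical channels."""
    return [[[v, v, v] for v in row] for row in image]


# Hypothetical 150 x 150 grayscale canvas with a simple gradient pattern.
canvas = [[(r + c) % 256 for c in range(150)] for r in range(150)]
model_input = to_three_channels(resize_nearest(canvas))
print(len(model_input), len(model_input[0]), len(model_input[0][0]))  # 128 128 3
```

Smoother interpolation methods (eg, bilinear) are equally plausible; the choice mainly affects how edges of the isolated lesion are rendered at the lower resolution.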
The neural network parameters were specified as follows: the first 2 convolutional layers each used 32 filters with a filter size of 3, and the third convolutional layer used 64 filters with a filter size of 3. The first fully connected layer contained 128 units and used the rectified linear unit (ReLU) activation function. The learning rate was set to .0001 to limit overfitting during model construction. The model was trained for 4000 iterations; training was not continued beyond 4000 iterations because additional training decreased accuracy and increased the cost function, consistent with overtraining.
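To make the layer specifications above concrete, the sketch below tallies output shapes and trainable parameter counts for a LeNet-style network with these settings. Several details are assumptions not stated in the text: “same” padding, a 2 × 2 max-pooling step after each convolutional layer (typical of LeNet-style designs), and a 2-unit output layer for the 2 classes. Changing any of these changes the totals.

```python
def conv_params(kernel, in_channels, filters):
    # One kernel x kernel x in_channels weight tensor plus one bias per filter.
    return kernel * kernel * in_channels * filters + filters


def dense_params(inputs, units):
    # One weight per input-unit pair plus one bias per unit.
    return inputs * units + units


def lenet_style_summary(size=128, channels=3, conv_filters=(32, 32, 64),
                        kernel=3, fc_units=128, n_classes=2):
    """List (layer name, output shape, trainable parameters) under stated assumptions."""
    layers = []
    h = w = size
    c = channels
    for i, filters in enumerate(conv_filters, start=1):
        # Assumed "same" padding: the convolution keeps the h x w size.
        layers.append((f"conv{i}", (h, w, filters), conv_params(kernel, c, filters)))
        # Assumed 2 x 2 max-pooling: halves height and width, adds no parameters.
        h, w, c = h // 2, w // 2, filters
        layers.append((f"pool{i}", (h, w, c), 0))
    flat = h * w * c
    layers.append(("flatten", (flat,), 0))
    layers.append(("fc1 (ReLU)", (fc_units,), dense_params(flat, fc_units)))
    layers.append(("fc2 (output)", (n_classes,), dense_params(fc_units, n_classes)))
    return layers


summary = lenet_style_summary()
for name, shape, n_params in summary:
    print(f"{name:14s} {str(shape):16s} {n_params:>12,d}")
total = sum(n for _, _, n in summary)
print(f"total trainable parameters: {total:,d}")
```

Under these assumptions the network has 2,126,178 trainable parameters, of which 2,097,280 sit in the first fully connected layer, a large number relative to 18 training images.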
After CNN training and internal validation were complete, the model was evaluated on a separate set of images that it had not seen during training. Predicted class probabilities for these test images were calculated using the trained CNN weights. These values were recorded with the corresponding classification label (advanced neoplasia or benign) and later evaluated for the model’s discriminative ability using receiver operating characteristic (ROC) analysis.
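For ROC analysis, the area under the curve (AUC) has a simple pairwise interpretation: it equals the probability that a randomly chosen advanced-neoplasia case receives a higher predicted probability than a randomly chosen benign case, with ties counted as one half. The Python sketch below computes the AUC that way; the function name and the scores in the example are hypothetical, not model outputs from this study.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 1 = advanced neoplasia, 0 = benign; scores: predicted probabilities.
    Ties between a positive and a negative score count as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical example: 3 of the 4 positive/negative pairs are ranked correctly.
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.2]))  # 0.75
```

Unlike accuracy at a fixed 50% cutoff, the AUC summarizes ranking across all possible cutoffs, although with only 3 advanced-neoplasia and 6 benign test images it can take just 18 distinct pairwise values.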
Model Validation and Comparison to Established Guidelines
The 9 “testing set” images were analyzed by the model, and each image was assigned a predicted probability of advanced neoplasia. If the predicted probability exceeded 50%, the model was considered to have predicted advanced neoplasia; if it was below 50%, the model was considered to have predicted benign pathology. These predictions were compared with the final pathology report to determine model accuracy. Each lesion was then evaluated by an attending hepato-pancreato-biliary surgeon to determine whether it met “high risk” or “worrisome” criteria of the Fukuoka guidelines. 7 A guideline classification of “high risk” was considered a prediction of advanced neoplasia, while a classification of “worrisome” was considered a prediction of benign pathology. The model predictions were then compared with the Fukuoka guideline predictions. Data were analyzed with Stata (StataCorp LLC, College Station, Texas, USA).
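The decision rules above (a 50% cutoff for the model, and “high risk” versus “worrisome” for the guidelines) and the resulting counts of missed neoplasms and unnecessary operations can be expressed as a short Python sketch. The label orderings below are hypothetical: they were constructed only to reproduce the aggregate counts reported in the Results (model: 3 of 3 advanced neoplasia and 5 of 6 benign; Fukuoka: 2 of 3 and 4 of 6) and do not represent individual patients. How a prediction of exactly 50% was handled is not stated; the sketch assumes it is classified as benign.

```python
def classify(probability, cutoff=0.5):
    """Model rule: predicted probability above the cutoff -> advanced neoplasia (1)."""
    # A prediction of exactly 50% is not addressed in the text; benign (0) is assumed.
    return 1 if probability > cutoff else 0


def fukuoka_to_label(category):
    """Guideline rule: "high risk" -> advanced neoplasia (1), "worrisome" -> benign (0)."""
    return {"high risk": 1, "worrisome": 0}[category]


def compare_with_pathology(truth, predicted):
    tp = sum(t == 1 and p == 1 for t, p in zip(truth, predicted))
    tn = sum(t == 0 and p == 0 for t, p in zip(truth, predicted))
    fn = sum(t == 1 and p == 0 for t, p in zip(truth, predicted))  # missed neoplasm
    fp = sum(t == 0 and p == 1 for t, p in zip(truth, predicted))  # unnecessary operation
    return {"correct": tp + tn, "missed": fn, "unnecessary": fp,
            "sensitivity": tp / (tp + fn), "specificity": tn / (tn + fp)}


# Hypothetical orderings that reproduce only the aggregate counts in the Results.
pathology = [1, 1, 1, 0, 0, 0, 0, 0, 0]     # 1 = advanced neoplasia
model_labels = [1, 1, 1, 1, 0, 0, 0, 0, 0]  # stands in for classify(p) per image
guideline_labels = [fukuoka_to_label(c) for c in
                    ["high risk", "high risk", "worrisome", "high risk", "high risk",
                     "worrisome", "worrisome", "worrisome", "worrisome"]]

model_result = compare_with_pathology(pathology, model_labels)
guideline_result = compare_with_pathology(pathology, guideline_labels)
print("model:", model_result)
print("Fukuoka:", guideline_result)
```

Framing errors as false negatives (missed neoplasms) and false positives (unnecessary operations) maps the comparison directly onto the clinical consequences discussed later.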
Results
In total, 27 patients met the inclusion criteria. All patients underwent pancreatic resection: 11 (40.7%) underwent left pancreatectomy and 16 (59.3%) underwent pancreaticoduodenectomy. Median age was 69 (IQR 57.8-73) years, and 57.7% of patients were male and 42.3% female. Postoperative pathology was reviewed for all patients. In total, 9 (33.3%) patients had advanced neoplasia, consisting of 7 (25.9%) with invasive pancreatic adenocarcinoma and 2 (7.4%) with IPMN with high-grade dysplasia. Eighteen patients had benign pathology, including 5 (18.5%) with serous cystadenoma and 13 (48.1%) with IPMN with intermediate- or low-grade dysplasia. Surgery was indicated in 10 patients (37.0%) deemed “high risk” and 15 (55.6%) deemed “worrisome” based on the Fukuoka guidelines. Of the remaining 5 (18.5%) patients, surgery was indicated for abdominal pain (4; 14.8%) or recurrent pancreatitis (1; 3.7%).
Model Predictions for All Tested Lesions.
aFukuoka classification as “high risk” was considered a prediction of advanced neoplasia and classification as “worrisome” was considered a prediction of benign.
Comparison of Deep Learning Model Predictions with Fukuoka Guidelines.
aFukuoka classification as “high risk” was considered a prediction of advanced neoplasia and classification as “worrisome” was considered a prediction of benign.
bAdvanced neoplasia pathology included adenocarcinoma and intraductal papillary mucinous neoplasms (IPMNs) with high-grade dysplasia, while benign pathology included IPMN with intermediate- or low-grade dysplasia and serous cystadenoma.
Discussion
In this pilot study of prediction of advanced neoplasia from preoperative CT imaging, a deep learning model outperformed consensus guidelines. The deep learning model correctly classified 8 of 9 PCNs in the testing set, while the consensus guidelines were correct 6 of 9 times. In terms of possible patient outcomes, following the deep learning model’s predictions would have resulted in 1 unnecessary operation, while following consensus guidelines would have resulted in 2 unnecessary operations and 1 missed neoplasm.
Artificial intelligence and deep learning are exciting new technologies that have shown promise in clinical applications, with over 300 publications since 2015. 12 Many applications of deep learning thus far have focused on oncology, predicting malignant behavior from tissue samples, 15 histology slides, 13 or radiographic images. 14 Litjens et al 13 demonstrated that a deep learning model identified specific areas of cancerous tissue in prostate cancer biopsies with an area under the curve (AUC) of .999. In the same study, a different deep learning model identified areas of cancerous tissue in breast sentinel lymph node samples with an AUC of .90. 13 Another study evaluated a deep learning model for identification of suspicious microcalcifications in diagnostic mammography and compared it with other machine learning techniques. 14 The authors found that the deep learning model was superior to the other techniques and identified microcalcifications with an accuracy of 82%. 14 These studies demonstrate the potential of deep learning techniques to aid clinicians in cancer diagnosis. For a disease such as PCN, where the current standard of diagnosis is insufficient, 5 new approaches are needed.
To our knowledge, this study is the first to use preoperative CT imaging to predict dysplastic and malignant biology of a PCN. Previous studies have used deep learning models incorporating analysis of pancreatic cyst fine-needle aspirate (FNA) to classify PCNs. 15 Kurita et al 15 retrospectively reviewed data from patients undergoing endoscopic ultrasound (EUS) and FNA of PCNs over a 10-year period. Using carcinoembryonic antigen (CEA) value, cancer antigen 19-9, fluid amylase, patient sex, cyst characteristics, and cytology, the authors created a deep learning CNN model. The predicted outcome was malignant or benign, based on final pathology or subsequent clinical follow-up. This study found that the deep learning model had an AUC of .966 and accuracy of 92.9% in high-risk lesions, which was more accurate than CEA or cytology alone. While the deep learning model had high accuracy compared with other measures, the study population was limited to patients undergoing diagnostic EUS-FNA, introducing a selection bias toward high-risk lesions. The study did not report the indications for EUS-FNA; however, patients with high-risk features under consensus guidelines are likely overrepresented. The broad applicability of this model is limited because the study population may not represent all patients with PCNs and because the model would require patients to undergo EUS-FNA, which is costly and carries a risk of procedure-related complications.
Another study used preoperative magnetic resonance imaging (MRI) to create a deep learning model to classify malignant behavior of PCNs in patients undergoing surgical resection. 16 The authors obtained T2-weighted and postcontrast T1-weighted MRI images of the pancreatic region for use in a CNN. The same images were used for both model creation and model testing, rather than separate “training” and “testing” sets. The authors found that the AUC of the deep learning model (.78) was similar to that of the AGA (.76) and Fukuoka (.77) consensus guidelines. While this study demonstrated the feasibility of using deep learning to predict malignant biology from preoperative imaging, its discrimination was no better than that of existing, simpler consensus guidelines. Additionally, using the same image data set for both model creation and model validation risks overfitting the model to that data set and limits its generalizability to new data.
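The concern about evaluating a model on the same images used to build it can be shown with a toy example. A classifier that simply memorizes its training examples (here a 1-nearest-neighbour rule, chosen only for illustration and unrelated to the CNN in the MRI study) scores perfectly when evaluated on its own training data even when the labels are pure noise, whereas its accuracy on held-out data has no reason to exceed chance. The random data, sizes, and function names below are hypothetical.

```python
import random


def nearest_neighbour_predict(train_x, train_y, x):
    # Predict the label of the closest stored example (squared Euclidean distance).
    best = min(range(len(train_x)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], x)))
    return train_y[best]


def accuracy(train_x, train_y, eval_x, eval_y):
    hits = sum(nearest_neighbour_predict(train_x, train_y, x) == y
               for x, y in zip(eval_x, eval_y))
    return hits / len(eval_y)


rng = random.Random(42)
# Random features with independently drawn labels: there is no real signal to learn.
xs = [[rng.random() for _ in range(10)] for _ in range(60)]
ys = [rng.randint(0, 1) for _ in range(60)]
train_x, train_y, test_x, test_y = xs[:40], ys[:40], xs[40:], ys[40:]

print("scored on training data:", accuracy(train_x, train_y, train_x, train_y))
print("scored on held-out data:", accuracy(train_x, train_y, test_x, test_y))
```

The perfect training-set score reflects memorization rather than learning, which is why a separate testing set gives a more honest estimate of performance on new patients.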
Our study has several important limitations. First, this is a pilot study with a small sample size; therefore, we cannot make broad statements about the performance of our model. However, initial results from our model are promising, and techniques such as image modulation may help increase the number of images available for input into the model. 17 Because the predictive ability of deep learning models generally improves with more training data, the model is expected to improve as the sample size grows. Second, the input for each patient was a single axial image at the largest diameter of the target lesion, without surrounding pancreatic tissue or additional images of the lesion. Therefore, our model lacks information about surrounding pancreatic structures, location in the pancreas, and features of the lesion visible on other axial images. Further model construction using multiple images of the pancreas, including the lesion, may improve model accuracy. Third, the retrospective nature of this study introduces inherent selection bias, and patients with high-risk features may be overrepresented. Because this study included only patients who underwent surgical resection and had a pathologic diagnosis, patients with low-risk features undergoing surveillance were excluded, and the model may therefore be biased toward lesions with high-risk features. Finally, we chose a deep learning prediction cutoff of 50%; however, this may not reflect the threshold a surgeon would choose in practice. This pilot study demonstrates the potential of deep learning image analysis for characterizing the malignant potential of PCNs, but the model is not ready for clinical application without further refinement.
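One common way to enlarge a small image data set, assumed here to be the kind of technique referred to as image modulation, is geometric augmentation. The Python sketch below generates the 4 rotations of a lesion image and of its mirror image (8 variants in total). Whether every orientation is appropriate for isolated CT lesion images is an assumption that would need to be checked, and the function names are hypothetical.

```python
def rotate90(image):
    """Rotate a 2-D list of pixels 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def flip_horizontal(image):
    """Mirror a 2-D list of pixels left to right."""
    return [row[::-1] for row in image]


def dihedral_augment(image):
    """Return the 4 rotations of an image and of its mirror image (8 variants)."""
    variants = []
    for base in (image, flip_horizontal(image)):
        current = base
        for _ in range(4):
            variants.append(current)
            current = rotate90(current)
    return variants


lesion = [[1, 2], [3, 4]]  # hypothetical tiny image with distinct pixel values
print(len(dihedral_augment(lesion)))  # 8
```

Applied to 18 training images, this would give 144 inputs. The variants are transformations of the same lesions rather than new patients, so augmentation complements, but does not replace, a larger cohort.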
Future studies should be directed toward using techniques to increase the number of images for model creation 17 and increasing the number of patients evaluated. Because deep learning models generally improve with more input data, this will likely improve the model’s predictive ability. As PCNs are uncommon, multi-institutional collaboration may help create a larger cohort of patients to evaluate. Additionally, future studies should investigate more computationally complex models that can incorporate clinical information, additional CT images including surrounding structures, and standardized higher-resolution CT protocols (eg, 1.5-mm vs 5-mm slices).
In conclusion, this pilot study of a small group of patients (n = 27) suggests that a deep learning model based only on preoperative CT imaging can predict advanced neoplasia better than generally accepted consensus guidelines. Further study should be directed toward creating a larger, more sophisticated model and ultimately toward prospective validation of a model for predicting malignant behavior of PCNs.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
