Abstract
Background:
Blurry images in teledermatology and consultation increase the diagnostic difficulty for both deep learning models and physicians. We aimed to determine how much diagnostic accuracy is restored after blurry images are deblurred by deep learning models.
Methods:
We used 19,191 skin images from a public skin image dataset that includes 23 skin disease categories, 54 skin images from a public dataset of blurry skin images, and 53 blurry dermatology consultation photos from a medical center. We compared the diagnostic accuracy of trained diagnostic deep learning models and the subjective sharpness between blurry and deblurred images. We evaluated five deblurring models, trained for motion blur, Gaussian blur, Bokeh blur, mixed slight blur, and mixed strong blur.
Main Outcomes and Measures:
Diagnostic accuracy was measured as sensitivity and precision of correct model prediction of the skin disease category. Sharpness rating was performed by board-certified dermatologists on a 4-point scale, with 4 being the highest image clarity.
Results:
The sensitivity of diagnostic models dropped by 0.15 and 0.22 on slightly and strongly blurred images, respectively, and the deblurring models restored 0.14 and 0.17 of sensitivity in the respective groups. The sharpness ratings given by dermatologists improved from 1.87 to 2.51 after deblurring. Activation maps showed that the focus of diagnostic models was compromised by blurriness but was restored after deblurring.
Conclusions:
Deep learning models can restore the diagnostic accuracy of diagnostic models for blurry images and increase image sharpness perceived by dermatologists. These models can be incorporated into teledermatology to aid diagnosis from blurry images.
Introduction
Deep learning models have demonstrated expert-level accuracy in classifying skin images. 1 Most of these models adopt common underlying structures of convolutional neural networks (CNN) for feature recognition. They have been incorporated into the decision-support systems of mobile applications, 2 used to increase diagnostic accuracy in teledermatology, 3,4 and deployed in cloud applications for remote diagnosis of skin cancers. 5 However, these models suffer from accuracy drops when the input images are blurry because the models are typically trained and tested on curated datasets composed of high-quality photos.
Unfortunately, image blurriness is an especially pressing issue in dermatology because images are the only way to evaluate skin lesions in certain situations. 6 For example, in store-and-forward teledermatology, patients take digital images and send them to a platform where physicians or a built-in diagnostic model can view and diagnose the skin disease asynchronously. 7 Diagnosis becomes harder as image blurriness increases because there is no alternative way to assess the lesion. Moreover, compelling evidence shows that diagnostic accuracy drops for both physicians 8 and CNN models 9 when the images are blurry. Previous studies tried to solve this problem by automatically assessing and categorizing patient-uploaded pictures as excellent or poor quality. 10–12 However, accurate image quality evaluation does not guarantee clear images because technical issues may persist, such as a low-resolution camera or hand tremor. Therefore, we sought a more direct solution by building a deblurring model that converts the blurry image into a sharper one.
Deblurring has been applied to various medical images such as sonography, 13 computed tomography, 14 magnetic resonance imaging, 15 and fundus images. 16 Most of these studies relied upon a deep learning-based approach, which allows end-to-end deblurring and has achieved state-of-the-art performance in blind deblurring tasks. 17 In dermatology, deep image deblurring has been proposed as a preprocessing step for dermatoscopic photos. 18 However, the performance of deblurring models has not been systematically examined on camera-acquired skin photos, which make up most images on teledermatology platforms.
In this study, we built a deblurring model to remove blurriness in skin images. We evaluated the usefulness of the model by first measuring how much the accuracy of diagnostic models drops on blurry inputs and then measuring how much accuracy is recovered after the images are deblurred. We also evaluated whether practicing dermatologists rated the images as subjectively sharper after deblurring. The deblurring model can become a crucial tool on teledermatology platforms as remote health care rapidly expands.
Methods
DATASET AND STUDY DESIGN
For model training, we used the public Dermnet dataset on Kaggle, which included 19,559 pictures from the website Dermnet (http://www.dermnet.com/). Each image was labeled with one of 23 disease categories, which cover inflammatory diseases, infections, and benign and malignant lesions (Supplementary Fig. S1). We excluded duplicated images and divided the remaining images into training, validation, and test sets of 12,453, 3,104, and 3,634 pictures, respectively (19,191 pictures in total).
MODEL TRAINING
We used the above dataset to train both diagnostic and deblurring models, the former being the tool for comparison and the latter being the subject of the study.
We chose three state-of-the-art image classification models for diagnosis: (1) EfficientNet-B4, 19 (2) SE-ResNeXt-101, 20 and (3) DenseNet. 21 We applied transfer learning and fine-tuned the models on our dataset using the Adam optimizer, a learning rate of 1e−3, and a batch size of 32 for 70 epochs.
We adopted Restoration Transformer (Restormer) 22 for the deblurring model because of its high performance on many datasets. 23,24 Training a deblurring model requires pairs of sharp and blurry images. Because such datasets are lacking for skin images, we generated the pairs by applying blurring algorithms to the training set (Supplementary Fig. S2). Five types of blurriness were applied to the images: Gaussian blur, motion blur, Bokeh blur (Gaussian blur in the foreground), mixed slight blur, and mixed strong blur. The latter two were produced by randomly applying Gaussian blur, Bokeh blur, or motion blur with parameters tailored for slight or strong blurriness. For all types of blur, only 75% of the images underwent blurring, whereas the remaining 25% were deliberately kept unchanged to help the model learn the distinction between blurry and sharp images. The pretrained model was fine-tuned on our dataset using an AdamW optimizer with a learning rate declining from 3e−4 to 1e−6, a batch size of 4, and 46 epochs (details described in Supplementary Data S1).
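The pair-generation step can be illustrated with a minimal NumPy sketch. It is not the authors' pipeline: the kernel sizes, the sigma value, the horizontal motion direction, and the function names (`gaussian_kernel`, `motion_kernel`, `filter2d`, `make_pair`) are all assumptions chosen for illustration, Bokeh blur is omitted, and only grayscale arrays are handled. The actual blur parameters are described in Supplementary Data S1.

```python
import numpy as np


def gaussian_kernel(size: int, sigma: float) -> np.ndarray:
    """Normalized 2-D Gaussian kernel with odd side length `size`."""
    ax = np.arange(size) - size // 2
    g = np.exp(-(ax ** 2) / (2.0 * sigma ** 2))
    kernel = np.outer(g, g)
    return kernel / kernel.sum()


def motion_kernel(size: int) -> np.ndarray:
    """Horizontal linear motion-blur kernel (one normalized row)."""
    kernel = np.zeros((size, size))
    kernel[size // 2, :] = 1.0 / size
    return kernel


def filter2d(img: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Same-size filtering of a 2-D image with edge padding.

    This is a correlation; for the symmetric kernels above it is
    identical to a convolution.
    """
    kh, kw = kernel.shape
    padded = np.pad(img, ((kh // 2, kh // 2), (kw // 2, kw // 2)), mode="edge")
    out = np.zeros(img.shape, dtype=float)
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * padded[i:i + img.shape[0], j:j + img.shape[1]]
    return out


def make_pair(sharp: np.ndarray, rng: np.random.Generator, blur_prob: float = 0.75):
    """Return (model_input, target).

    With probability `blur_prob` the input is blurred by a randomly chosen
    kernel; otherwise the sharp image is passed through unchanged.
    """
    if rng.random() >= blur_prob:
        return sharp.copy(), sharp
    # Hypothetical parameters: 9x9 kernels, sigma = 2.0.
    kernel = gaussian_kernel(9, 2.0) if rng.random() < 0.5 else motion_kernel(9)
    return filter2d(sharp, kernel), sharp
```

Calling `make_pair(image, np.random.default_rng(0))` repeatedly over a training set would leave roughly a quarter of the inputs sharp, mirroring the 75%/25% split described above.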
EVALUATION OF BLURRING AND DEBLURRING EFFECTS
The performance of the diagnostic models was evaluated by weighted sensitivity and precision across all classes of skin diseases. The weighted metrics accounted for the imbalance between classes and were obtained by weighting the per-class sensitivities and precisions by the number of images within each category. The three diagnostic models were evaluated under various conditions, including sharp, mixed slightly blurred, slightly deblurred, mixed strongly blurred, and strongly deblurred images, and the mean and standard deviation of the weighted metrics were calculated. Mixed types of blurriness were selected to simulate the diverse range of blurriness encountered in real-world scenarios. Heatmaps of the image areas on which the models relied for diagnosis were produced by GradCam++. 25 Two traditional metrics for measuring the extent of image reconstruction in computer science, peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM), were also used for evaluation.
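The weighting described above can be written out as a short sketch. The function name and the toy labels are hypothetical, and treating a class that is never predicted as having a precision of 0 is an assumption of this sketch rather than a detail reported in the study.

```python
from collections import Counter


def weighted_sensitivity_precision(y_true, y_pred):
    """Support-weighted sensitivity and precision over all classes.

    Each class's metric is weighted by the fraction of true images
    that belong to that class.
    """
    support = Counter(y_true)      # number of true images per class
    predicted = Counter(y_pred)    # number of predictions per class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    total = len(y_true)

    sensitivity = 0.0
    precision = 0.0
    for cls, count in support.items():
        weight = count / total
        sensitivity += weight * correct[cls] / count
        # Assumption: a class that is never predicted contributes 0.
        if predicted[cls]:
            precision += weight * correct[cls] / predicted[cls]
    return sensitivity, precision


# Toy example with two hypothetical classes.
truth = ["eczema", "eczema", "eczema", "nevus"]
preds = ["eczema", "eczema", "nevus", "nevus"]
sens, prec = weighted_sensitivity_precision(truth, preds)
```

Because each per-class sensitivity is weighted by that class's own share of the images, the weighted sensitivity is algebraically equal to the overall fraction of correctly classified images.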
To evaluate the deblurring effect on real-world blurry images, we collected a dataset of 107 blurry images taken in both clinical and daily life settings. The dataset contains 53 blurry skin photographs captured by an iPhone XS, which were extracted from the dermatology consultation records at Kaohsiung Chang Gung Memorial Hospital from June 2021 to November 2021, and 54 de-identified daily life blurry skin images from a public dataset that focused on human hands. The consultation photos encompass various anatomical sites, including 19 images of the trunk, 28 of the limbs, 2 of the palms, 2 of the soles, and 2 of the face. The demographics of the patients in the dermatology consultation photos were unavailable because the photos were anonymous. Each image was processed by the five deblurring models, and the least blurry output was manually selected for a head-to-head comparison with the raw blurry image. Five board-certified dermatologists were provided with written instructions for rating the images. They were instructed to assess the images based on the ease of making a clinical diagnosis, using Likert-type integer scores ranging from 1 to 4, with 4 being the highest level of image sharpness. The interrater agreement between pairs of raters ranged from 0.46 to 0.65, indicating a moderately positive correlation. The five raters had similar levels of experience in teledermatology, each seeing about 5–10 patients per week via telemedicine for 2–3 years. For each image, the scores were averaged across all raters. The mean and the standard error of the averaged scores for the raw and deblurred images were calculated, and their differences were tested by paired t-test using the Python library SciPy version 1.7.3.
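The rating analysis can be sketched in a few lines of standard-library Python. The scores below are invented purely for illustration and are not study data, and the helper names are hypothetical. The study used SciPy for the test; `scipy.stats.ttest_rel` computes the same statistic and also returns the p-value, which this sketch does not attempt.

```python
import math
import statistics


def mean_scores(ratings):
    """Average each image's scores across raters (one row per image)."""
    return [statistics.mean(row) for row in ratings]


def paired_t(before, after):
    """Paired t statistic and degrees of freedom for matched samples."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    standard_error = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / standard_error, n - 1


# Invented 1-4 scores from five raters for four images (illustration only).
raw_ratings = [[1, 2, 2, 1, 2], [2, 2, 3, 2, 2], [1, 1, 2, 1, 1], [2, 3, 2, 2, 3]]
deblurred_ratings = [[2, 3, 3, 2, 3], [3, 3, 3, 2, 3], [2, 2, 2, 2, 2], [3, 3, 3, 3, 3]]

raw_means = mean_scores(raw_ratings)
deblurred_means = mean_scores(deblurred_ratings)
t_stat, dof = paired_t(raw_means, deblurred_means)
```

A paired test fits this design because every deblurred image is compared with its own raw counterpart, so between-image variation in baseline sharpness cancels out of the differences.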
DATA SHARING STATEMENT
The dataset is available on the Dermnet website for academic use (https://dermnetnz.org/).
The real-world blurry dataset is available at https://github.com/shipchou/A-skin-blurry-dataset-focused-on-hand-regions
INSTITUTIONAL REVIEW BOARD APPROVAL STATUS
Approval for this study was obtained from the Institutional Review Board of Chang Gung Memorial Hospital in Taiwan (202200353B0).
Results
DEBLURRING RESTORES BLURRINESS BOTH QUALITATIVELY AND QUANTITATIVELY
To assess the qualitative effects of blurring and deblurring on the digital photos, we compared the heatmaps of the CNN model when the inputs were sharp, blurry, and deblurred (Fig. 1). In the sharp image, the critical area is mainly located at a more inflamed area; after blurring, the intensely colored region spreads out over less typical skin lesions. The hot area was successfully restored after image deblurring, focusing on regions similar to those in the sharp image. Distortion of the image content was unnoticeable. On visual inspection, there was no apparent difference in color or brightness between the deblurred outputs and the original image.

Fig. 1. Examples of sharp, blurry, and deblurred images and their heatmaps for classification. The upper row shows the sharp, blurry, and deblurred versions of a test image. The lower row shows the GradCam++ activation maps of SE-ResNeXt-101 for the three images. Red areas indicate greater importance for classification by the CNN model. The model’s focus spread out over less typical skin lesions in the blurry image but was restored after deblurring. CNN, convolutional neural network.
We evaluated the quantitative performance by PSNR and SSIM (see Methods), and all deblurring models showed remarkable objective improvement in both metrics over the raw images (Supplementary Table S1). Mixed types of blur led to less improvement in both PSNR and SSIM than a single type of blur, and strong mixed blur reduced deblurring performance more than slight mixed blur, indicating that more complex blurriness is harder to remove.
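For readers unfamiliar with PSNR, a minimal NumPy sketch of the standard definition follows. The function name and the 8-bit peak value of 255 are assumptions of this sketch. SSIM is omitted because it involves local windowed statistics that do not fit in a short example.

```python
import numpy as np


def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio in decibels between two same-shape images.

    PSNR = 10 * log10(max_value^2 / MSE); identical images give infinity.
    """
    ref = np.asarray(reference, dtype=np.float64)
    out = np.asarray(restored, dtype=np.float64)
    mse = np.mean((ref - out) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(max_value ** 2 / mse)
```

A higher PSNR means the restored image is numerically closer to the sharp reference, so a successful deblurring model should raise the PSNR of its output above that of the blurry input.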
BLURRING WORSENS CLASSIFICATION PERFORMANCE, WHICH CAN BE REVERSED BY DEBLURRING
To examine whether successful deblurring leads to changes in diagnostic performance, we used the CNN classification models as a comparison tool in the experiment. The results are summarized in Table 1. The decrease in sensitivity and precision grew with increasing image blurriness. The sensitivity dropped by 0.16 and 0.22, and the precision dropped by 0.14 and 0.19 in slightly and strongly blurry images, respectively. After deblurring, the sensitivity improved by 0.14 and 0.17, and the precision improved by 0.12 and 0.14 on slightly and strongly deblurred images, respectively. In both conditions, the deblurring restored the metrics to a level comparable to that of sharp images. We also evaluated whether the deblurring model distorts images when the original inputs are sharp. The performance changed minimally (sensitivity 0.63 ± 0.01, precision 0.64 ± 0.01) with such inputs. This suggests that the deblurring model acts selectively on blurry photos and that the improvement in image sharpness is related to diagnostic accuracy. The remaining question is whether the model helps diagnosis in a realistic setting where the photos are not taken by professionals or in a controlled environment.
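As a reading aid, the reported drops and improvements can be expressed as the fraction of each drop that was recovered. This ratio is our own illustrative summary computed from the rounded values quoted above, not a metric reported in the study, and the function and variable names are hypothetical.

```python
def fraction_restored(drop, recovered):
    """Share of a metric drop that was recovered after deblurring."""
    if drop <= 0:
        raise ValueError("drop must be positive")
    return recovered / drop


# (drop on blurry images, improvement after deblurring), as quoted in the text.
reported = {
    "slight": {"sensitivity": (0.16, 0.14), "precision": (0.14, 0.12)},
    "strong": {"sensitivity": (0.22, 0.17), "precision": (0.19, 0.14)},
}

summary = {
    level: {metric: fraction_restored(*pair) for metric, pair in metrics.items()}
    for level, metrics in reported.items()
}
```

On these rounded figures, about 0.88 of the sensitivity drop is recovered for slight blur and about 0.77 for strong blur, consistent with the observation that stronger blur is harder to reverse.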
Table 1. Restoration of Averaged Sensitivity and Precision of Deep-Learning-Based Classification Models on Sharp, Blurry, and Deblurred Images
Blurry images produced by algorithms with parameters inducing slight/strong blurring.
Images with slight/strong blurriness restored by applying deblurring models for slight/strong blurring.
DEBLURRED IMAGES HAVE HIGHER SHARPNESS RATINGS THAN BLURRY IMAGES TAKEN IN DAILY CLINICAL SETTINGS
We compared the sharpness ratings given by dermatologists before and after deblurring for blurry daily photos and clinical consultation photos. The mean dermatologist-rated scores of blurry and deblurred images were 1.87 ± 0.65 and 2.51 ± 0.55, respectively. The mean scores were significantly higher for the deblurred images (p < 0.001). For the subset of 53 images collected from dermatology consultation, the mean scores of blurry and deblurred images were 2.35 ± 0.52 and 2.75 ± 0.46, respectively, which also showed a significant difference (p < 0.001). The 54 blurry photos from the public dataset received ratings of 1.40 ± 0.35, whereas their deblurred counterparts were rated 2.28 ± 0.53, showing a significant difference (p < 0.001). These results demonstrate that photos processed by the deblurring model received higher sharpness scores from dermatologists, which may help in recognizing subtle features essential for correct diagnosis.
Discussion
In this study, we demonstrated that diagnostic performance declined as image blurriness increased and that our deblurring models, which were trained on simulated blurry images, could restore the performance drop. When applied to real-world clinical photos, the deblurring models improved the dermatologist-rated sharpness of blurry clinical photos. This finding expands the use cases of the already popular convolution-based neural networks for skin image diagnosis, enabling them to perform comparably well on blurry inputs on which previous models might have failed. The deblurring models can not only be incorporated into existing diagnostic models 5 but also be used in store-and-forward teledermatology settings to avoid diagnostic delay when patients send blurry photos.
The performance drop with increased blurriness was similar to the results of previous studies on generic images. 9 Pei et al. applied nine different degrading algorithms to the Caltech-256 dataset, 26 including the three kinds of blur in our study, and also found that classification accuracy dropped rapidly with an increased level of blurriness. However, their deblurring models could restore classification accuracy only partially. The failure to recover accuracy can result from suboptimal deblurring performance of the models or from deblurring that does not target the patterns crucial for classification. In our test images, the restoration of model focus shown in the heatmaps matches the recovery in classification performance, suggesting that the deblurring successfully targets the patterns identified by the diagnostic model.
We did not observe a noticeable effect on diagnostic performance when the deblurring model for slight blur processed clear images. This indicates that the model can selectively change the sharpness of blurry images but not clear ones. This selectivity is an essential feature for deblurring models deployed in clinical settings, where distortion of clear images should be avoided.
We found that the deblurring models trained on a single type of blurriness performed better than those trained on mixed types of blurs. This reflects the difficulty in building a one-size-fits-all model that deblurs all kinds of blurriness. Larger models or more efficient structures may be required in future studies. On our teledermatology platform, we address this problem by simultaneously showing the results of all deblurring models and letting physicians decide which one to use for diagnosis.
Our study has several limitations. First, for images with multiple types of blur, the models might perform less well. Second, each image must be processed by five different deblurring models, which requires more computational resources on the teledermatology platform. Third, the deblurring process might introduce artifacts that cause diagnostic uncertainty or fail to reconstruct small lesions such as freckles.
Conclusions
Deep-learning-based deblurring models could restore the diagnostic performance of deep-learning classification models on blurry images and increase image sharpness on real-world blurry clinical images.
Footnotes
Acknowledgments
The authors thank Chang Gung Research Program for their financial support and the Higher Education Sprout Project of the National Yang Ming Chiao Tung University and Ministry of Education, Taiwan. The authors also thank SciDM and National Center for High-performance Computing for providing computational and storage resources.
Authors’ Contributions
H-H.Y.: Conceptualization, methodology, software, validation, formal analysis, and writing—original draft. B.W-Y.H.: Validation, formal analysis, and data curation. S.-Y.C.: Software, validation, formal analysis, and data curation. T-J.H. and B.W-Y.H.: Formal analysis and data curation. V.S.T.: Writing—reviewing and editing, supervision, and funding acquisition. C-H.L.: Writing—reviewing and editing, supervision, and funding acquisition.
Disclosure Statement
No competing financial interests exist.
Funding Information
This study received grant support from Chang Gung Research Program (CMRPG8K0081 and CORPG8L0051).
Supplementary Material
Supplementary Data S1
Supplementary Figure S1
Supplementary Figure S2
Supplementary Table S1
