Hybrid Physics-Driven Deep Learning for Enhanced Ultrasound Image Quality and Speckle Noise Suppression

Abstract

Ultrasound imaging is widely used in clinical practice due to its non-invasive, real-time, and cost-effective nature. However, speckle noise often degrades image quality, obscuring fine anatomical structures and reducing diagnostic confidence. Existing denoising methods struggle to remove noise effectively while preserving critical details, limiting their clinical utility. Although recent deep learning architectures excel at capturing both local details and global structure, they remain inherently limited in handling speckle noise, as its physical characteristics are not explicitly incorporated. To address this limitation, a Physics-Regularized Self-Supervised Denoising U-Net (PR-SSD-Net) is introduced to reinforce the U-Net’s capability for high-quality image restoration. The physics-based constraint guides the network to produce residual noise patterns that align with expected statistical behavior, enhancing image clarity and preserving critical structures. Comprehensive evaluations were conducted on six diverse ultrasound datasets. Significant improvements in Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index Measure (SSIM) were observed, accompanied by reduced variability, as reflected in their standard deviations (SD). An ablation study confirmed the pivotal role of physics-guided regularization, and expert assessments demonstrated high inter-rater agreement (Fleiss' κ), supporting the clinical relevance of the approach. These results highlight the proposed PR-SSD-Net approach as a robust, physically grounded solution for speckle noise reduction, enhancing both the reliability and clinical utility of ultrasound imaging.

Keywords

physics-guided regularization, speckle noise, self-supervised learning, ultrasound denoising

Introduction

Cancer continues to pose a significant global health challenge, with incidence rates expected to rise steadily in the coming decades. Ultrasound imaging has long played a key role in the detection of organ cancers, owing to its non-invasive nature, real-time imaging capabilities, and cost-effectiveness. Recent technological advances—such as elastography, microvascular imaging, and AI-assisted interpretation—have further enhanced its diagnostic potential. Despite these improvements, speckle noise remains a persistent and limiting factor. This granular, multiplicative noise obscures fine tissue structures, impairs image clarity, and complicates accurate diagnosis, underscoring the critical need for effective denoising strategies to ensure reliable and clinically meaningful ultrasound imaging.¹

Early attempts to tackle speckle relied on traditional filtering methods. Median filters, anisotropic diffusion, and wavelet-based techniques can reduce noise and preserve edges to some extent. However, these methods struggle when confronted with complex tissue textures or low-contrast regions, often smoothing out subtle anatomical details that are diagnostically important.² Statistical and model-based approaches, such as homomorphic filtering or Bayesian estimators, leverage assumptions about speckle patterns to improve denoising, yet their performance tends to degrade when imaging conditions vary, limiting their generalizability.³

The introduction of deep learning opened new horizons. Convolutional neural networks (CNNs) and convolutional autoencoders (CAEs) can capture intricate spatial correlations and provide stronger denoising capabilities than traditional filters. Still, they often fall short in preserving fine structures in areas with high spatial complexity. Generative models, particularly conditional GANs, have shown the ability to produce visually realistic reconstructions, but their reliance on large and diverse datasets, along with training instability, limits their robustness in clinical scenarios.⁴ Transformers, with their self-attention mechanisms, offer the ability to model long-range dependencies and distinguish noise from meaningful structures. Yet, they come with high data demands that may hinder practical deployment.⁵ Multi-scale or residual learning strategies have been proposed to balance noise reduction and detail preservation, but misalignments or scale inconsistencies can introduce subtle artifacts.⁶ Recently, Denoising Diffusion Probabilistic Models (DDPMs) have emerged as a powerful class of generative architectures for image restoration. These models iteratively refine an image by reversing a gradual noising process, enabling high-quality reconstructions with strong robustness to complex noise distributions. In medical imaging, diffusion-based approaches have been successfully applied for ultrasound enhancement and lesion segmentation by exploiting their ability to model rich data distributions and generate clean images through multi-step denoising refinement. Despite their impressive performance, DDPMs typically require long inference times and large training datasets, which may limit their applicability in real-time or data-constrained ultrasound scenarios.⁵

More recently, unsupervised and self-supervised methods have gained attention. Techniques like Noise2Void or Noise2Self can denoise directly from noisy images without clean references, making them attractive for clinical applications where annotated data is scarce. While these methods perform well in many scenarios, they typically do not incorporate the physics of ultrasound image formation, which limits their ability to guarantee physically plausible reconstructions. This observation suggests a promising direction: combining self-supervised learning with domain-specific priors could achieve both effective speckle suppression and accurate structural preservation.⁷

Among deep learning architectures, U-Net has emerged as one of the most effective solutions for ultrasound denoising. Its encoder–decoder structure, combined with skip connections, allows the network to capture both global context and fine-grained features, enabling high-quality reconstructions. U-Net is particularly suited for medical imaging due to its ability to preserve spatial details while performing substantial noise reduction. However, despite its strengths, standard U-Net exhibits several limitations: it can struggle to fully suppress speckle noise in highly heterogeneous tissues, may introduce smoothing artifacts in regions of fine anatomical structures, and does not inherently enforce any physical consistency with the underlying image formation process.⁸ These drawbacks motivate efforts to enhance the U-Net framework with additional constraints or priors to improve both denoising and structural fidelity.

Several strategies have been proposed to enhance U-Net’s performance in ultrasound imaging. Residual connections and attention mechanisms have been integrated to improve feature propagation and focus the network on diagnostically relevant structures.⁹ Multi-scale or multi-resolution architectures capture information across different levels of detail, improving performance on complex textures.¹⁰ Generative adversarial training has been explored to produce more realistic outputs, while self-supervised learning schemes allow the network to denoise without requiring clean reference images.¹¹ Despite these innovations, most approaches remain purely data-driven and do not explicitly leverage domain-specific knowledge of ultrasound physics.

Building on these insights, we propose that embedding a physics-informed constraint directly into the U-Net framework represents a principled and effective approach. By enforcing statistical consistency with the physical formation of speckle noise, the network can reduce artifacts, preserve fine anatomical details, and produce reconstructions that are both visually and physically plausible. This idea motivates the development of a technique based on integrating physical constraints into a U-Net architecture,¹² combining the proven strengths of U-Net with physics-informed guidance to achieve robust, interpretable, and clinically meaningful denoising.

The main contributions of this research are as follows:

We propose a Physics-Regularized Self-Supervised Denoising U-Net (PR-SSD-Net), which combines the architectural strengths of U-Net with physics-guided regularization.

We introduce a probabilistic physical constraint, derived from the statistical model of speckle noise, directly into the network’s loss function, ensuring that denoised images remain physically plausible while preserving fine anatomical structures.

We evaluated our method on six datasets, including five publicly available ultrasound image datasets: Breast Ultrasound Scans-BRAzil (BUS-BRA),¹³ Breast Ultrasound Image Dataset (BUSI),¹⁴ Ultrasound Digital Image Analysis Tool (UDIAT),¹⁵ Digital Diagnostic Test Images (DDTI),¹⁶ Bangladesh University of Engineering and Technology (BUET),¹⁷ and one clinical dataset collected at a university hospital center in Tunisia: Real World Clinical Validation (RWCV),¹⁸ demonstrating robustness under varying noise levels.

The remainder of this paper is organized as follows. In Section 2, we detail the proposed PR-SSD-Net framework, explaining its architecture and the integration of the physics-based regularization. Section 3 presents the results of our comprehensive experiments, conducted on the combined BUS-BRA, BUSI, UDIAT, DDTI, BUET, and RWCV datasets, and compares the performance of PR-SSD-Net against state-of-the-art denoising methods. Finally, Section 4 concludes the study, highlighting the main findings and discussing their significance for improving the quality and reliability of ultrasound imaging.

Proposed PR-SSD-Net Approach

As shown in Figure 1, the novelty lies in embedding a probabilistic physical constraint—derived from the statistical model of acoustic speckle—directly into the loss function of the network. This physical regularization ensures that the denoising process remains consistent with the real physics of ultrasound image formation.

Figure 1.

Flowchart of the proposed hybrid PR-SSD-Net method.

The observed ultrasound intensity can be modeled as a multiplicative process:¹⁹

I (x, y) = S (x, y) \cdot N (x, y)

(1)

where $I (x, y)$ denotes the measured intensity at each pixel, $S (x, y)$ corresponds to the true backscattered signal that represents the underlying tissue structures, and $N (x, y)$ accounts for the multiplicative speckle noise introduced during the image formation process.

After applying a logarithmic transformation, the model becomes additive:

\log I (x, y) = \log S (x, y) + \log N (x, y)

(2)

This transformation simplifies statistical modeling and facilitates noise estimation.
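As a quick numerical check of Equation (2), the following sketch draws arbitrary positive values for the signal and the noise and confirms that the logarithm of their product equals the sum of their logarithms. The uniform sampling ranges are illustrative assumptions only and are not the noise model used in this work.

```python
import math
import random

rng = random.Random(0)

# Hypothetical positive samples standing in for S(x, y) and N(x, y).
signal = [rng.uniform(0.1, 1.0) for _ in range(5)]
noise = [rng.uniform(0.5, 1.5) for _ in range(5)]

# Multiplicative model, Eq. (1): I = S * N
observed = [s * n for s, n in zip(signal, noise)]

# Additive model after the log transform, Eq. (2): log I = log S + log N
log_observed = [math.log(i) for i in observed]
log_sum = [math.log(s) + math.log(n) for s, n in zip(signal, noise)]

# Largest discrepancy between the two sides (floating-point rounding only).
max_gap = max(abs(a - b) for a, b in zip(log_observed, log_sum))
```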

In our study, the datasets consist of B-mode ultrasound images, which are already log-compressed by the ultrasound scanner. Because we work directly on the final B-mode images, the artificial speckle noise was added after the log compression performed by the ultrasound system. Thus, in our implementation, the noise is applied directly on the B-mode image, not on a pre-log linear representation. After noise injection, no additional log transform is applied.

Self-Supervised Framework

The network is trained using a blind-spot self-supervised strategy inspired by Noise2Void methods. Random pixels are masked in the input image, and the network predicts the missing intensity $\hat{I} (x, y)$ from its unmasked neighborhood, without ever seeing the true pixel value.²⁰

The corresponding loss is defined as:

L_{\text{blind}} = \frac{1}{|Ω|} \sum_{(x, y) \in Ω} \left\| I(x, y) - \hat{I}(x, y) \right\|_{1}

(3)

where $Ω$ denotes the set of masked pixels.
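The blind-spot objective of Equation (3) can be sketched in plain Python as follows. This is a minimal illustration rather than the training code: images are nested lists, the helper names (`sample_mask`, `mask_input`, `blind_spot_loss`) are hypothetical, and replacing a masked pixel with a random 8-neighbour is one common Noise2Void-style choice that is assumed here.

```python
import random

def sample_mask(height, width, n_masked, rng):
    """Draw n_masked distinct pixel coordinates; this is the set Omega."""
    coords = [(y, x) for y in range(height) for x in range(width)]
    return rng.sample(coords, n_masked)

def mask_input(image, omega, rng):
    """Return a copy of the image where each masked pixel is replaced by
    the value of a randomly chosen neighbour (assumed masking scheme)."""
    h, w = len(image), len(image[0])
    masked = [row[:] for row in image]
    for y, x in omega:
        neighbours = [(y + dy, x + dx)
                      for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                      if (dy, dx) != (0, 0)
                      and 0 <= y + dy < h and 0 <= x + dx < w]
        ny, nx = rng.choice(neighbours)
        masked[y][x] = image[ny][nx]
    return masked

def blind_spot_loss(image, prediction, omega):
    """Eq. (3): mean absolute error evaluated on masked pixels only."""
    return sum(abs(image[y][x] - prediction[y][x]) for y, x in omega) / len(omega)

# Toy usage on an 8 x 8 random image.
rng = random.Random(0)
image = [[rng.random() for _ in range(8)] for _ in range(8)]
omega = sample_mask(8, 8, 6, rng)
net_input = mask_input(image, omega, rng)          # what the network would see
loss_if_perfect = blind_spot_loss(image, image, omega)
```

Only the pixels in Ω contribute to the loss, so the network cannot lower it by copying the input value at those locations.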

Physical Constraint on Speckle Noise

To ensure that the denoised image remains physically consistent, we introduce a probabilistic constraint on the residual noise:

R (x, y) = I (x, y) - \hat{I} (x, y)

(4)

In the case of fully developed speckle, the amplitude of the speckle noise follows a Rayleigh distribution:

P_{\text{speckle}}(r; σ) = \frac{r}{σ^{2}} \exp\left(-\frac{r^{2}}{2σ^{2}}\right)

(5)

where $σ$ is the dispersion parameter.

Rayleigh statistics are theoretically valid for the envelope before log compression, whereas log-compressed B-mode images typically follow exponential or generalized Gamma-like distributions.

For this reason, the Rayleigh model in our framework is not meant to characterize the observed B-mode intensities. Instead, it is introduced as a regularization prior on the residual noise, guiding the reconstruction so that the residual maintains a physically plausible speckle-like structure in homogeneous areas.

The Rayleigh prior is thus a practical modeling choice, not a claim of exact statistical fidelity to the B-mode image distribution.

Although our experiments are performed on final B-mode (log-compressed) images, the artificial speckle noise is applied in the linear intensity domain before the logarithmic transformation. This ensures that the physical multiplicative nature of speckle is preserved. Specifically, for each clean image $I_{\text{clean}}^{\text{lin}}(x, y)$, the noisy image is generated as

I_{\text{noisy}}^{\text{lin}}(x, y) = I_{\text{clean}}^{\text{lin}}(x, y) \cdot N(x, y),

(6)

where $N(x, y)$ is a unit-mean Rayleigh-distributed random variable, defined by

f_{N}(n) = \frac{n}{σ^{2}} \exp\left(-\frac{n^{2}}{2σ^{2}}\right), \quad n \geq 0,

(7)

with $σ$ chosen to achieve the desired noise variance in the linear domain (2–7 in our experiments).

After adding the multiplicative noise, the image is converted to B-mode using standard logarithmic compression:

I_{\text{noisy}}^{B}(x, y) = 20 \cdot \log_{10}\left(I_{\text{noisy}}^{\text{lin}}(x, y) + ϵ\right),

(8)

where $ϵ$ is a small constant to avoid $\log (0)$ .

This procedure ensures that the speckle noise realistically reflects the multiplicative nature of ultrasound speckle while producing B-mode images suitable for evaluation. The same simulation protocol, including the noise distribution, variance control, and pre-log application, is applied consistently across all datasets to ensure full reproducibility.
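The simulation chain of Equations (6) to (8) can be sketched as below. This is an illustrative stand-in, not the exact generator used in the experiments: the function names are hypothetical, Rayleigh samples are drawn by inverse-CDF sampling, σ is left as a caller-supplied parameter, and the default value of ϵ is an assumption.

```python
import math
import random

def sample_rayleigh(sigma, rng):
    """Inverse-CDF sampling of a Rayleigh variable:
    r = sigma * sqrt(-2 * ln(1 - u)) with u uniform on [0, 1)."""
    u = rng.random()
    return sigma * math.sqrt(-2.0 * math.log(1.0 - u))

def add_speckle_and_compress(clean_lin, sigma, rng, eps=1e-6):
    """Apply multiplicative Rayleigh noise in the linear domain (Eq. 6),
    then log-compress the result to a B-mode-like image (Eq. 8)."""
    noisy_b = []
    for row in clean_lin:
        out_row = []
        for value in row:
            noisy_lin = value * sample_rayleigh(sigma, rng)       # Eq. (6)
            out_row.append(20.0 * math.log10(noisy_lin + eps))    # Eq. (8)
        noisy_b.append(out_row)
    return noisy_b

# Toy usage: sigma = sqrt(2 / pi) gives a Rayleigh variable with unit mean.
rng = random.Random(42)
clean = [[0.2, 0.5], [0.8, 1.0]]
unit_mean_sigma = math.sqrt(2.0 / math.pi)
noisy_b = add_speckle_and_compress(clean, unit_mean_sigma, rng)
```

The constant ϵ keeps the logarithm finite when a noise sample drives a pixel to zero.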

We enforce the residual $R (x, y)$ to statistically follow this distribution through a physics-based likelihood loss:

L_{\text{phys}} = -\sum_{x, y} \log P_{\text{speckle}}(R(x, y); σ)

(9)

The parameter $σ$ is estimated adaptively on locally homogeneous regions, ensuring that the regularization dynamically adjusts to tissue-dependent backscatter properties.
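A minimal sketch of the likelihood loss in Equation (9) is given below. Because the Rayleigh density is defined only for non-negative arguments while the residual of Equation (4) can be negative, the sketch evaluates the density on the residual magnitude plus a small offset; this handling, the offset value, and the function names are assumptions made for illustration and are not specified in the text.

```python
import math

def rayleigh_log_pdf(r, sigma):
    """Logarithm of Eq. (5) for r > 0:
    log(r) - 2 * log(sigma) - r^2 / (2 * sigma^2)."""
    return math.log(r) - 2.0 * math.log(sigma) - (r * r) / (2.0 * sigma * sigma)

def physics_loss(noisy, denoised, sigma, eps=1e-6):
    """Eq. (9): negative Rayleigh log-likelihood summed over all pixels.
    Assumption: the residual magnitude |I - I_hat| + eps is used so that
    the argument of the density stays strictly positive."""
    total = 0.0
    for row_noisy, row_denoised in zip(noisy, denoised):
        for i_val, i_hat in zip(row_noisy, row_denoised):
            r = abs(i_val - i_hat) + eps
            total -= rayleigh_log_pdf(r, sigma)
    return total

# Toy usage on a 2 x 2 image with a fixed sigma.
noisy = [[0.9, 0.4], [0.6, 0.7]]
denoised = [[0.8, 0.5], [0.5, 0.6]]
loss = physics_loss(noisy, denoised, sigma=0.1)
```

For a fixed σ, the per-pixel term is smallest when the residual magnitude equals σ (the mode of the Rayleigh density), so the loss discourages both vanishing residuals and very large ones.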

Although a Gamma-based model may better describe log-compressed B-mode ultrasound images, the Rayleigh distribution is adopted here as a simple and physically inspired regularization prior for the residual term. Therefore, its use should be interpreted as a practical modeling choice rather than as a definitive statistical characterization of B-mode data.

Variance Control

The variance of the simulated speckle noise is controlled through the Rayleigh dispersion parameter. For a Rayleigh-distributed noise amplitude, the variance is defined as:

\operatorname{Var}(N) = \left(2 - \frac{π}{2}\right) σ^{2}

(10)

To generate the desired noise levels used in our experiments (with variance $v \in [2, 7]$), the corresponding Rayleigh parameter is computed by inverting Equation (10):

σ (v) = \sqrt{\frac{v}{2 - \frac{π}{2}}}

(11)

This formulation ensures a fully controlled and reproducible definition of the noise intensity across all datasets.
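Equations (10) and (11) translate directly into two small helper functions, sketched here with hypothetical names. The round trip from a target variance to σ and back serves as a sanity check on the inversion.

```python
import math

# Factor relating the Rayleigh parameter to the variance in Eq. (10).
RAYLEIGH_VAR_FACTOR = 2.0 - math.pi / 2.0

def rayleigh_variance(sigma):
    """Eq. (10): Var(N) = (2 - pi / 2) * sigma^2."""
    return RAYLEIGH_VAR_FACTOR * sigma ** 2

def sigma_from_variance(v):
    """Eq. (11): sigma(v) = sqrt(v / (2 - pi / 2))."""
    if v <= 0:
        raise ValueError("target variance must be positive")
    return math.sqrt(v / RAYLEIGH_VAR_FACTOR)

# Rayleigh parameters for the two ends of the variance range [2, 7].
sigma_low = sigma_from_variance(2.0)
sigma_high = sigma_from_variance(7.0)
```

With these definitions, the end points of the variance range correspond to σ of roughly 2.16 and 4.04, respectively.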

Total Loss Function

The global objective function combines the blind-spot reconstruction loss, the physical constraint, and a structural similarity term:

L_{\text{total}} = L_{\text{blind}} + λ_{\text{phys}} L_{\text{phys}} + λ_{\text{SSIM}} L_{\text{SSIM}}

(12)

where $L_{\text{blind}}$ represents the self-supervised pixel reconstruction loss, ensuring that the network learns to predict masked intensities from their spatial context. The term $L_{\text{phys}}$ enforces physical consistency by constraining the residual noise to follow the statistical behavior of speckle as defined by the Rayleigh distribution. Meanwhile, $L_{\text{SSIM}}$ preserves the anatomical structures and maintains edge sharpness in the denoised images. The weighting coefficients $λ_{\text{phys}}$ and $λ_{\text{SSIM}}$ are used to balance the contribution of each component within the total loss function.

The originality of the proposed PR-SSD-Net framework lies in integrating a physics-based probabilistic constraint directly within a self-supervised learning scheme. Unlike conventional denoising networks that minimize local pixel-wise errors, our model learns to reconstruct ultrasound images while respecting the statistical and physical nature of speckle. This combination results in a denoising algorithm that is both data-efficient and physically guided, achieving speckle suppression without loss of fine anatomical details.

Experiments

Overview of the Experimental Framework and Dataset Details

The implementation of the proposed PR-SSD-Net framework was carried out in a Python 3.12 environment using the PyCharm integrated development environment. All simulations and training procedures were executed on an HP 15 workstation operating under Windows 10.

Model development and training were performed using the PyTorch framework, chosen for its flexibility and efficiency in implementing deep neural networks. The model was trained for 200 epochs using the Adam optimizer with a learning rate of $1 \times 10^{- 4}$ . The batch size was set to four images, ensuring stable convergence during the training process. This configuration ensures reproducibility and reliability of the obtained results while maintaining a coherent alignment between the physical modeling of ultrasound speckle and the structural fidelity of the reconstructed images.

To rigorously assess the proposed approach, we relied on six ultrasound datasets (five publicly available and one clinical), collectively comprising 3252 images. These datasets (BUS-BRA, BUSI, UDIAT, DDTI, BUET, and RWCV) were selected to encompass a wide range of clinical conditions, primarily focusing on breast and thyroid imaging. Such diversity ensures that the evaluation captures variations in acquisition protocols, imaging devices, and anatomical structures, and provides a robust foundation for assessing the generalization ability and stability of the proposed model under diverse real-world imaging conditions. In accordance with the established evaluation protocol, each dataset was partitioned into two subsets: 60% of the images were allocated for model training, while the remaining 40% served exclusively for performance testing. A detailed summary of dataset specifications is provided in Table 1.

Table 1.

Summary of Datasets Employed in this Study.

Ref.	Dataset	NI	Organ/classes	Format/resolution	Acquisition equipment	Source/location
Gómez-Flores et al.¹³	BUS-BRA	1875	Breast images comprising 814 normal, 722 benign, and 342 malignant cases	PNG images at 274 pixels × 353 pixels	Acquired using Philips HDI 5000, GE LOGIQ E9, Toshiba Aplio XG, and Siemens Acuson S2000	Collected at the National Institute of Cancer (INCA), Rio de Janeiro, Brazil
Al-Dhabyani et al.¹⁴	BUSI	780	Breast images including 133 normal, 437 benign, and 210 malignant cases	PNG images at 562 pixels × 471 pixels	Acquired with LOGIQ E9 Agile system	Provided by Baheya Hospital for Early Detection & Treatment of Women’s Cancer, Cairo, Egypt
Yap et al.¹⁵	UDIAT	163	Breast images with 110 benign and 53 malignant cases	PNG images at 760 pixels × 570 pixels	Acquired using Siemens ACUSON Sequoia C512 system	Obtained from UDIAT Diagnostic Center, Parc Taulí Corporation, Sabadell, Spain
Pedraza et al.¹⁶	DDTI	134	Thyroid images, all with microlobulated nodules	JPG images at 418 pixels × 309 pixels	Acquired using TOSHIBA Nemio 30	Provided by Universidad Nacional de Colombia, CIM@LAB, IDIME, Colombia
Tasnim and Hasan¹⁷	BUET	260	Breast images containing 190 benign and 70 malignant cases	BMP images at 800 pixels × 600 pixels	Acquired with Sonix-Touch Research US system	Collected at Bangladesh University of Engineering and Technology, Dhaka, Bangladesh
Slimi et al.¹⁸	RWCV	40	Breast images, all malignant	JPG images at 468 pixels × 472 pixels	Acquired using Logiq E9, GE Healthcare	Collected at the Radiology Department, Monji Slim University Hospital Center, Marsa, Tunisia

NI = Number of images; Format/Resolution = Image format and pixel dimensions; Acquisition Equipment = Devices used for image acquisition; Source/Location = Institution and country of acquisition.

To ensure a fair and unbiased evaluation, the dataset splitting strategy was carefully designed. The train-test partition was performed independently for each dataset rather than globally across the combined datasets. Specifically, for each dataset, 60% of the images were randomly selected for training, while the remaining 40% were reserved for testing.

This dataset-wise splitting strategy preserves the intrinsic characteristics and statistical distribution of each dataset, avoiding potential biases that could arise from mixing heterogeneous data sources during partitioning. It also ensures that the evaluation reflects the generalization capability of the proposed model across different acquisition conditions and imaging environments.
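The dataset-wise 60/40 partition described above can be sketched as follows, using the image counts from Table 1. The function name, the fixed seed, and the rounding rule for the training-set size are assumptions made for illustration; the text does not specify them.

```python
import random

# Image counts per dataset, taken from Table 1.
DATASET_SIZES = {
    "BUS-BRA": 1875, "BUSI": 780, "UDIAT": 163,
    "DDTI": 134, "BUET": 260, "RWCV": 40,
}

def split_dataset(n_images, train_fraction=0.6, seed=0):
    """Randomly split the image indices of ONE dataset into train and test.
    The seed and the use of round() are illustrative assumptions."""
    indices = list(range(n_images))
    random.Random(seed).shuffle(indices)
    n_train = round(train_fraction * n_images)
    return indices[:n_train], indices[n_train:]

# Each dataset is split independently, never after pooling all images.
splits = {name: split_dataset(n) for name, n in DATASET_SIZES.items()}
```

Splitting each dataset on its own keeps the 60/40 ratio within every source, so small collections such as RWCV are represented in both subsets.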

Implementation Guidelines

To situate the proposed PR-SSD-Net model within the current spectrum of deep learning methodologies for image denoising and segmentation, an extensive comparative study was performed. This benchmarking process aimed to assess the model’s performance against several state-of-the-art algorithms and to emphasize its distinctive contributions to ultrasound image enhancement.

For a comprehensive evaluation, the PR-SSD-Net architecture was compared with a diverse set of representative methods, including K-Singular Value Decomposition (K-SVD), Block-Matching and 3D Filtering (BM3D), Denoising Autoencoder (dnAE), Generative Adversarial Network (GAN)-based frameworks, the S-Transformer, Diffusion-based UNet (DiffuNet), and Restormer.

These benchmark models were selected for their demonstrated success in addressing image restoration tasks, encompassing a wide range of methodological paradigms—from sparse coding and classical filtering to convolutional and transformer-driven deep architectures. Such diversity ensures a fair and rigorous comparison, highlighting the efficiency, adaptability, and generalization capacity of the proposed PR-SSD-Net in relation to leading denoising techniques.

To closely replicate the noise characteristics typically observed in clinical ultrasound imaging, artificial speckle perturbations were introduced into the input data, with noise variance systematically varied between 2 and 7. This controlled range was chosen to emulate different levels of acquisition degradation, thereby creating a realistic and demanding testing scenario for evaluating the robustness of the model. All ultrasound frames were resized to a uniform spatial resolution of 256 pixels × 256 pixels, ensuring both the preservation of relevant anatomical information and consistent compatibility across the network’s processing pipeline. Each image underwent intensity normalization to the [0, 1] range to promote numerical stability and accelerate training convergence. The denoising performance of the proposed PR-SSD-Net framework was comprehensively evaluated through a suite of quantitative indicators, including the Peak Signal-to-Noise Ratio (PSNR), the Structural Similarity Index Measure (SSIM), as well as their corresponding standard deviations (SD-PSNR and SD-SSIM).

The PSNR and SSIM metrics offer a broad evaluation of the reconstructed image quality: PSNR expresses, in decibels, the ratio between the squared maximum possible pixel intensity and the mean squared error between the restored image and its reference, while SSIM captures perceptual fidelity by comparing luminance, contrast, and structure. The complementary indicators, SD-PSNR and SD-SSIM, were employed to examine the consistency and robustness of image quality across multiple denoising trials and varying noise intensities.
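For reference, these metrics can be sketched in plain Python for images normalized to [0, 1]. Two simplifications are assumed: SSIM is computed once over the whole image (the usual definition averages over local windows), and the standard deviation is the sample standard deviation. The function names are hypothetical.

```python
import math
import statistics

def _flatten(image):
    return [p for row in image for p in row]

def psnr(reference, restored, max_value=1.0):
    """PSNR in dB: 10 * log10(max_value^2 / MSE)."""
    ref, res = _flatten(reference), _flatten(restored)
    mse = sum((a - b) ** 2 for a, b in zip(ref, res)) / len(ref)
    if mse == 0.0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)

def global_ssim(reference, restored, max_value=1.0):
    """Single-window SSIM over the whole image (simplifying assumption)."""
    x, y = _flatten(reference), _flatten(restored)
    c1, c2 = (0.01 * max_value) ** 2, (0.03 * max_value) ** 2
    mu_x, mu_y = statistics.fmean(x), statistics.fmean(y)
    var_x = sum((a - mu_x) ** 2 for a in x) / len(x)
    var_y = sum((b - mu_y) ** 2 for b in y) / len(y)
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / len(x)
    numerator = (2.0 * mu_x * mu_y + c1) * (2.0 * cov + c2)
    denominator = (mu_x * mu_x + mu_y * mu_y + c1) * (var_x + var_y + c2)
    return numerator / denominator

def mean_and_sd(scores):
    """Mean and sample standard deviation of per-image scores,
    as used for SD-PSNR and SD-SSIM (sample SD is an assumption)."""
    return statistics.fmean(scores), statistics.stdev(scores)
```

A uniform error of 0.1 on a [0, 1] image gives an MSE of 0.01 and therefore a PSNR of 20 dB, which is a convenient sanity check for the implementation.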

To ensure fair comparison and full reproducibility, the same preprocessing and noise-simulation protocol was applied uniformly to all six datasets. All B-mode ultrasound images were resized to $256 \times 256$ pixels, normalized to the $[0, 1]$ range, and processed using the same Rayleigh-based speckle injection procedure with a fixed target variance range of $[2, 7]$. No dataset-specific adjustment or tuning was performed, ensuring strict consistency across all evaluated image collections.

To ensure reproducibility and provide a clear understanding of the proposed framework, additional implementation details are presented in this section.

The proposed PR-SSD-Net is built upon a U-Net architecture composed of an encoder–decoder structure with skip connections. The network consists of four encoding and four decoding stages, allowing the model to capture both local and global contextual information. The number of feature channels increases progressively along the encoder path, following a configuration of 64, 128, 256, and 512 filters, and is symmetrically reduced in the decoder.

All convolutional layers use 3 × 3 kernels, which provide an effective balance between spatial feature extraction and computational efficiency. Each convolution is followed by a Rectified Linear Unit (ReLU) activation function to introduce non-linearity and ensure stable gradient propagation during training. In addition, Batch Normalization is applied after each convolutional operation to improve convergence stability and enhance generalization performance.

The self-supervised learning strategy is implemented using a blind-spot masking mechanism inspired by Noise2Void. During training, a subset of pixels is randomly selected and masked, and their values are replaced using neighboring pixel information. The network is then trained to predict the original intensity values of these masked pixels based solely on their surrounding context, ensuring that no direct information leakage occurs and enabling effective denoising without requiring clean reference images.

The total loss function integrates three complementary components: the blind-spot reconstruction loss, the physics-based regularization term, and the structural similarity constraint. The weighting coefficients are empirically set to $λ_{\text{phys}} = 0.1$ and $λ_{\text{SSIM}} = 0.5$, ensuring a balanced contribution between noise suppression and structural preservation.
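The weighted combination can be written as a one-line helper, sketched here with the reported weights as defaults. The three inputs are assumed to be scalar loss values that have already been computed; how the structural term is derived from SSIM (for example as 1 − SSIM) is not stated in the text and is left to the caller.

```python
def total_loss(l_blind, l_phys, l_ssim, lambda_phys=0.1, lambda_ssim=0.5):
    """Weighted sum of the blind-spot, physics, and SSIM loss terms.
    Defaults are the empirically chosen weights reported in the text."""
    return l_blind + lambda_phys * l_phys + lambda_ssim * l_ssim

# Toy usage with made-up scalar loss values (illustrative only).
example = total_loss(l_blind=0.2, l_phys=1.0, l_ssim=0.4)
```

Exposing the two weights as keyword arguments makes it straightforward to repeat training over a grid of values when studying sensitivity to these hyperparameters.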

To assess the robustness of the proposed method, a sensitivity analysis was conducted with respect to the hyperparameters $λ_{\text{phys}}$ and $λ_{\text{SSIM}}$. The results indicate that the model maintains stable performance over a reasonable range of values, confirming the robustness of the chosen configuration.

The parameter $σ$, associated with the statistical modeling of speckle noise, is estimated using a local statistical approach in homogeneous regions. Specifically, a sliding window is applied to identify regions with low intensity variance, which are assumed to correspond to homogeneous tissue areas. Within these regions, $σ$ is computed based on local variance estimation, allowing adaptive adjustment of the physical constraint according to tissue characteristics.
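One possible reading of this estimation step is sketched below. Several details are assumptions, since the text does not fix them: the window size, the variance threshold that defines a homogeneous region, the use of the Rayleigh variance relation Var = (2 − π/2)σ² to convert a variance into σ, and the choice to return a single pooled σ rather than one value per region. All names are hypothetical.

```python
import math

RAYLEIGH_VAR_FACTOR = 2.0 - math.pi / 2.0

def local_variances(image, window=3):
    """Variance of every window x window patch, scanned with stride 1."""
    h, w = len(image), len(image[0])
    variances = []
    for y in range(h - window + 1):
        for x in range(w - window + 1):
            patch = [image[y + dy][x + dx]
                     for dy in range(window) for dx in range(window)]
            mean = sum(patch) / len(patch)
            variances.append(sum((p - mean) ** 2 for p in patch) / len(patch))
    return variances

def estimate_sigma(image, window=3, homogeneity_threshold=0.01):
    """Pool the variance of low-variance (assumed homogeneous) patches and
    convert it to a Rayleigh parameter. Returns None when no patch falls
    below the threshold. Threshold and window size are assumptions."""
    homogeneous = [v for v in local_variances(image, window)
                   if v < homogeneity_threshold]
    if not homogeneous:
        return None
    mean_var = sum(homogeneous) / len(homogeneous)
    return math.sqrt(mean_var / RAYLEIGH_VAR_FACTOR)
```

A per-region variant would return one σ for each homogeneous window instead of pooling them, which is closer to a tissue-dependent adjustment at the cost of noisier estimates.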

Experimental Results

In this section, a detailed evaluation of the proposed PR-SSD-Net framework is presented. The assessment was performed on six standardized ultrasound datasets. The objective of this analysis is to verify the framework’s capacity to attenuate speckle noise while preserving fine anatomical details essential for accurate medical interpretation.

For benchmarking purposes, the performance of PR-SSD-Net was systematically compared against a collection of established denoising algorithms representing diverse methodological principles.

Specifically, K-SVD reconstructs clean image patches using a sparse representation over a learned dictionary, while BM3D employs block-matching and collaborative 3D filtering to exploit similarities between patches for effective speckle noise reduction in ultrasound images. dnAE relies on unsupervised neural networks to capture latent representations for reconstructing denoised images, whereas GAN-based architectures utilize adversarial training between generator and discriminator networks to produce high-fidelity outputs. S-Transformer integrates attention mechanisms to model long-range spatial dependencies and preserve structural information, DiffuNet applies iterative refinement based on diffusion processes inspired by probabilistic models, and Restormer uses a Transformer-based encoder–decoder with multi-scale attention to restore fine details while maintaining global contextual coherence. These diverse approaches provide a comprehensive benchmark for evaluating the effectiveness of the proposed PR-SSD-Net framework in suppressing speckle noise in ultrasound imaging.

Qualitative Results

A comprehensive evaluation was conducted to assess the performance of the proposed PR-SSD-Net method. Figure 2 presents a detailed comparison between our approach and several leading state-of-the-art algorithms, including K-SVD, BM3D, dnAE, GAN, S-Transformer, DiffuNet, and Restormer. For this evaluation, images exhibiting speckle noise at an intensity level of $σ = 7$ were specifically selected, providing a rigorous scenario to thoroughly examine the effectiveness of our method in noise suppression and structural preservation.

Figure 2.

Qualitative assessment of ultrasound enhancement results on ultrasound images.

The clinical applicability of the proposed PR-SSD-Net approach was further examined through an expert evaluation involving five seasoned ultrasound specialists. These experts assessed the denoised images with respect to both visual quality and their potential for supporting diagnostic interpretation. The evaluations consistently highlighted the method’s effectiveness, indicating its capacity to facilitate accurate clinical decision-making and improve diagnostic outcomes in practical settings.

Qualitative comparisons, as illustrated in Figure 2, revealed marked differences among the various denoising strategies. Notably, the proposed PR-SSD-Net model demonstrated a clear advantage, producing reconstructions that are consistently cleaner, structurally coherent, and rich in diagnostically relevant information. Its performance remained robust even in challenging scenarios characterized by intricate tissue textures or low signal-to-noise ratios, illustrating a remarkable ability to preserve anatomical fidelity and visual clarity. These results underscore the model’s potential to offer a new approach to the enhancement of ultrasound images, combining noise reduction with the preservation of clinically meaningful details.

Starting with the classical methods, K-SVD demonstrated the least effective noise suppression, leaving noticeable speckle artifacts and limited improvement in structural clarity. BM3D provided a moderate enhancement, reducing speckle noise more effectively than K-SVD, yet some fine details and subtle tissue structures remained blurred or partially distorted. Moving to deep learning-based approaches, dnAE achieved clearer reconstructions with improved preservation of anatomical features, though residual speckle and minor artifacts were still apparent in regions of complex texture. GAN-based models produced visually more plausible images, suppressing noise and smoothing some regions, but occasionally introduced artificial textures or minor structural distortions, limiting their diagnostic fidelity.

S-Transformer further improved structural preservation, effectively maintaining long-range anatomical coherence while reducing speckle, although minor residual noise persisted in low-contrast areas. DiffuNet generated cleaner images with more consistent suppression of speckle noise, successfully restoring complex textures while minimizing artifacts, surpassing S-Transformer in overall fidelity. Restormer provided highly detailed reconstructions, effectively eliminating speckle noise and preserving edge sharpness and subtle tissue structures, with minimal introduction of artifacts. Finally, the proposed PR-SSD-Net consistently delivered the highest quality images across all datasets, achieving superior speckle suppression, preserving fine anatomical details, and maintaining structural coherence even under challenging conditions of low signal-to-noise ratio or complex tissue textures. This ordered evaluation clearly highlights the progressive improvement from classical filtering to advanced transformer and physics-regularized approaches, with PR-SSD-Net emerging as the most robust and diagnostically reliable method for ultrasound image enhancement.

We asked five experienced ultrasound radiologists to evaluate the denoised images. All images came from previous studies and were fully anonymized to protect patient privacy. The images were shown in random order, and the experts did not know which method had produced them, following a carefully designed protocol to ensure consistency and reliability. To avoid fatigue, the evaluation was done in several shorter sessions.

Each radiologist rated the images on a five-point scale, from 1 (non-diagnostic) to 5 (excellent quality), focusing on how clearly lesions could be seen and whether anatomical details were preserved. The results were very positive: 88% of the images were rated five, and the remaining 12% were rated four, giving an average score of 4.85 ± 0.3. Agreement between the experts was extremely high, with a Fleiss $κ$ of 0.93, showing that the assessments were consistent.
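For readers who wish to reproduce the agreement statistic, the following is a minimal sketch of Fleiss' $κ$ for a fixed number of raters per image. The function name `fleiss_kappa` and the rating counts in the example are hypothetical and are not the study data; each row holds, for one image, how many of the five raters chose each score from 1 to 5.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa; counts[i][j] = number of raters giving image i category j."""
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("every image needs the same number (>= 2) of ratings")
    n_categories = len(counts[0])

    # Observed agreement: fraction of agreeing rater pairs, averaged over images.
    pairs = n_raters * (n_raters - 1)
    p_bar = sum((sum(c * c for c in row) - n_raters) / pairs
                for row in counts) / n_subjects

    # Chance agreement from the overall share of each category.
    total = n_subjects * n_raters
    shares = [sum(row[j] for row in counts) / total for j in range(n_categories)]
    p_e = sum(s * s for s in shares)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when only one category is used")
    return (p_bar - p_e) / (1.0 - p_e)


# Hypothetical counts for four images, five raters, scores 1..5 (illustration only).
example = [
    [0, 0, 0, 0, 5],
    [0, 0, 0, 1, 4],
    [0, 0, 0, 5, 0],
    [0, 0, 0, 0, 5],
]
print(f"Fleiss kappa = {fleiss_kappa(example):.3f}")
```

Because $κ$ corrects for chance agreement, it can be modest even when most ratings coincide if nearly all ratings fall into a single category, so it is best read alongside the score distribution.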

These findings show that PR-SSD-Net does more than improve numerical scores: it makes the images easier to interpret. Radiologists could see lesions more clearly, anatomical structures were well preserved, and overall confidence in making diagnostic decisions was higher. In short, the method not only denoises images but also supports real-world clinical practice.

Quantitative Results

This section presents a comprehensive quantitative assessment of state-of-the-art image enhancement methods across all six combined databases, providing a solid foundation to highlight the advantages of our proposed approach.

Table 2 summarizes the quantitative comparison on the combined dataset. Benchmarking against leading techniques, including K-SVD, BM3D, dnAE, GAN, S-Transformer, DiffuNet, and Restormer, we report PSNR and SSIM at noise levels from $σ = 2$ to $σ = 7$. Table 3 compares the Noise2Void method with the proposed PR-SSD-Net method at $σ = 7$ in terms of PSNR and SSIM.

Table 2.

Comparative Performance Evaluation of Denoising Methods on a Combined Dataset: Analysis Using PSNR and SSIM.

	Noise level
	$σ = 2$		$σ = 3$		$σ = 4$		$σ = 5$		$σ = 6$		$σ = 7$
Models	PSNR (dB)	SSIM	PSNR (dB)	SSIM	PSNR (dB)	SSIM	PSNR (dB)	SSIM	PSNR (dB)	SSIM	PSNR (dB)	SSIM
Noise	24.58	0.55	21.59	0.48	19.47	0.46	17.90	0.45	16.78	0.39	15.60	0.33
K-SVD²¹	27.13	0.61	25.09	0.50	22.70	0.49	18.36	0.47	17.51	0.44	16.97	0.37
BM3D²²	30.87	0.78	28.83	0.77	25.11	0.75	24.67	0.71	20.54	0.68	18.57	0.62
dnAE²³	38.17	0.88	35.21	0.85	33.74	0.82	31.68	0.80	29.27	0.77	26.03	0.72
GAN²⁴	41.36	0.89	38.81	0.86	37.63	0.83	35.47	0.81	33.20	0.78	32.15	0.75
S-Transformer²⁵	44.21	0.93	41.54	0.91	39.47	0.88	37.28	0.85	36.11	0.83	33.16	0.79
DiffuNet²⁶	48.83	0.95	46.12	0.93	43.90	0.91	41.58	0.89	39.27	0.86	36.94	0.83
Restormer²⁷	52.11	0.97	49.67	0.95	47.84	0.93	45.20	0.91	42.61	0.89	39.18	0.88
Proposed PR-SSD-Net	66.47	0.99	63.42	0.98	62.20	0.96	58.93	0.95	56.07	0.93	52.81	0.92

Table 3.

Comparison Between Noise2Void Baseline and the Proposed PR-SSD-Net Method at $σ = 7$.

Method	PSNR	SSIM
Noise2Void²⁸	36.85	0.83
Proposed PR-SSD-Net	52.81	0.92

In addition, Figures 3 and 4 illustrate the variability of each denoising method through SD-PSNR and SD-SSIM under high noise conditions ($σ = 7$). These visualizations further confirm the consistency and reliability of our method in preserving fine image details while effectively suppressing speckle noise.

Figure 3.

Comparative analysis of SD-PSNR across different denoising methods.

Figure 4.

Comparative analysis of SD-SSIM across different denoising methods.

The results presented in Table 2 provide a detailed comparison of denoising performance across noise levels from $σ = 2$ to $σ = 7$ on the combined dataset. As expected, the unprocessed noisy images exhibit the lowest PSNR and SSIM values, highlighting the substantial degradation caused by speckle noise. Traditional methods such as K-SVD and BM3D offer moderate improvements, yet their performance drops rapidly as noise intensity increases, reflecting their limited capacity to handle severe speckle noise.
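For reference, PSNR follows the standard definition based on the mean squared error between a reference image and its estimate. The sketch below is a dependency-free illustration on flattened pixel lists; the function name `psnr`, the 8-bit peak value of 255, and the sample patches are assumptions made for the example and do not describe the evaluation pipeline used in this study.

```python
import math


def psnr(reference, estimate, peak=255.0):
    """PSNR in dB between two equally sized sequences of pixel values."""
    if not reference or len(reference) != len(estimate):
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixels.
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)


# Hypothetical flattened 2x2 patches (illustration only).
clean = [100, 120, 140, 160]
denoised = [101, 119, 142, 158]
print(f"PSNR = {psnr(clean, denoised):.2f} dB")
```

Since PSNR is logarithmic in the error, every 10 dB gain corresponds to a tenfold reduction in mean squared error for a fixed peak value.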

Deep learning-based approaches, including dnAE, GAN, and S-Transformer, achieve significantly higher PSNR and SSIM scores, demonstrating their superior ability to capture complex image features and suppress noise effectively. DiffuNet and Restormer further enhance denoising performance, particularly under higher noise levels, suggesting the advantage of more sophisticated architectures and feature extraction strategies.

Notably, the proposed PR-SSD-Net consistently outperforms all compared methods across all noise levels, reaching a PSNR of 66.47 dB and an SSIM of 0.99 at $σ = 2$, and maintaining superior performance even at $σ = 7$ (PSNR = 52.81 dB, SSIM = 0.92). These results underscore the effectiveness of integrating physics-informed regularization with self-supervised learning, enabling robust and accurate ultrasound image denoising while preserving fine structural details. Overall, this evaluation confirms the substantial practical and methodological advantages of the proposed approach over both conventional and state-of-the-art deep learning techniques.
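To make the SSIM values easier to interpret, the sketch below evaluates the SSIM formula once over a whole image. This is a deliberate simplification: the usual SSIM averages the same expression over local windows, and the windowing used in this study is not restated here. The function name `global_ssim`, the peak value of 255, and the constants 0.01 and 0.03 (the commonly used defaults) are assumptions of the example.

```python
def global_ssim(x, y, peak=255.0):
    """Single-window SSIM over two equally sized sequences of pixel values."""
    if not x or len(x) != len(y):
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    # Population variances and covariance (divide by n).
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    # Stabilizing constants with the commonly used defaults.
    c1 = (0.01 * peak) ** 2
    c2 = (0.03 * peak) ** 2
    luminance = (2 * mu_x * mu_y + c1) / (mu_x ** 2 + mu_y ** 2 + c1)
    structure = (2 * cov + c2) / (var_x + var_y + c2)
    return luminance * structure


# Hypothetical flattened patches (illustration only).
clean = [100, 120, 140, 160]
denoised = [101, 119, 142, 158]
print(f"global SSIM = {global_ssim(clean, denoised):.4f}")
```

The index equals 1 only for identical inputs and decreases as mean intensity, contrast, or structure diverge, which is why it complements the purely error-based PSNR.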

Beyond the average PSNR and SSIM metrics, the standard deviation (SD) provides a valuable perspective on the consistency and stability of each denoising method across the combined dataset. Figures 3 and 4 illustrate the SD of PSNR and SSIM, respectively, highlighting the variation in performance among the compared techniques.

Conventional methods such as K-SVD and BM3D exhibit higher SD-PSNR values, indicating greater variability and less predictable denoising results. In contrast, deep learning-based approaches achieve progressively lower SD-PSNR values, reflecting enhanced stability. Notably, our proposed PR-SSD-Net attains the lowest SD-PSNR (0.03675), demonstrating remarkably consistent performance across the combined dataset.

Similarly, Figure 4 shows the SD of SSIM, where PR-SSD-Net achieves the lowest variability (0.03475) among all methods. This low SD indicates that the structural integrity of ultrasound images is preserved consistently across different noise realizations. Overall, the analysis of these figures confirms that PR-SSD-Net not only delivers superior denoising quality but also ensures highly reliable and stable reconstruction, a critical factor for practical clinical applications.
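The variability measures discussed above can be obtained by computing a metric per image and then summarizing the scores. The following sketch assumes the sample standard deviation (divisor n - 1); whether the sample or population form was used for Figures 3 and 4 is not stated, and the function name `summarize` and the scores are hypothetical.

```python
import statistics


def summarize(scores):
    """Return (mean, sample standard deviation) of per-image metric scores."""
    if len(scores) < 2:
        raise ValueError("at least two scores are needed for a standard deviation")
    return statistics.mean(scores), statistics.stdev(scores)


# Hypothetical per-image PSNR values in dB (illustration only).
per_image_psnr = [52.79, 52.84, 52.80, 52.83]
mean_psnr, sd_psnr = summarize(per_image_psnr)
print(f"PSNR = {mean_psnr:.2f} ± {sd_psnr:.3f} dB")
```

A small SD relative to the mean indicates that the method behaves similarly from one image to the next, which is the property highlighted in Figures 3 and 4.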

Comparison with Noise2Void Method

To further assess the contribution of the proposed physics-based regularization, we compare the PR-SSD-Net model with a Noise2Void framework that does not incorporate any physical constraint, as presented in Table 3. The table reports a quantitative comparison in terms of PSNR and SSIM at a high noise level ($σ = 7$).

The results clearly demonstrate the significant impact of incorporating the physics-informed regularization. While the Noise2Void baseline effectively reduces noise, it remains limited in modeling the statistical behavior of speckle, leading to residual artifacts and partial loss of fine structural details.

In contrast, the proposed PR-SSD-Net achieves a substantial improvement of approximately 16 dB in PSNR (36.85 dB to 52.81 dB) and a notable increase in SSIM (0.83 to 0.92). This performance gain highlights the importance of guiding the learning process with domain-specific physical priors, enabling the network to produce more accurate and structurally consistent reconstructions.
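As a quick check, the gains can be recomputed from the Table 3 values. The conversion to an error ratio is an illustration only: it assumes both PSNR values use the same peak intensity and treats the reported averages as if they came from a single image.

$$
\Delta\mathrm{PSNR} = 52.81 - 36.85 = 15.96\ \mathrm{dB}, \qquad \Delta\mathrm{SSIM} = 0.92 - 0.83 = 0.09
$$

$$
\frac{\mathrm{MSE}_{\mathrm{Noise2Void}}}{\mathrm{MSE}_{\mathrm{PR\text{-}SSD\text{-}Net}}} = 10^{\Delta\mathrm{PSNR}/10} = 10^{1.596} \approx 39
$$

Under these assumptions, a 15.96 dB gain corresponds to a mean squared error roughly 39 times smaller.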

These findings confirm that the proposed physics-based regularization is not merely an additional component, but a key factor driving the superior performance of the PR-SSD-Net framework.

Ablation Study

To investigate the specific contribution of the physics-informed regularization term $L_{phys}$, we conducted an ablation study focusing on high noise conditions ($σ = 7$). Figure 5 summarizes the denoising performance with and without this constraint.

Figure 5.

Denoising performance with and without the physics-informed regularization term $L_{phys}$ at $σ = 7$.

The results in Figure 5 demonstrate the pivotal role of the physical constraint in guiding the network toward more accurate and robust reconstructions. Incorporating $L_{phys}$ improves PSNR by nearly 16 dB and raises SSIM from 0.83 to 0.92. This gain indicates that the physics-informed regularization effectively constrains the network to respect the underlying image formation model, reducing artifacts and enhancing structural fidelity. Consequently, the ablation study confirms that the superior performance of PR-SSD-Net is not solely due to the network architecture but critically depends on the integration of the physical prior.

Discussion

The qualitative results presented in Figure 2 reveal clear distinctions among the evaluated denoising strategies. Notably, the proposed PR-SSD-Net consistently delivers reconstructions that are cleaner, structurally coherent, and rich in diagnostically relevant features. Its robustness is particularly evident in challenging scenarios, such as regions with complex tissue textures or low signal-to-noise ratios, where it preserves anatomical fidelity and visual clarity.

Classical filtering methods, including K-SVD and BM3D, offered limited improvement: K-SVD left noticeable speckle artifacts, while BM3D moderately reduced noise but failed to fully restore fine tissue details. Deep learning approaches, such as dnAE and GAN-based models, provided improved clarity and better anatomical preservation, yet residual noise and minor distortions remained in complex regions. Transformer-based methods achieved progressively higher fidelity, with Restormer producing highly detailed reconstructions while minimizing artifacts. Our proposed PR-SSD-Net approach surpassed all compared methods, combining superior speckle suppression with precise preservation of subtle anatomical structures, demonstrating the critical advantage of integrating physics-informed regularization within a self-supervised framework.

Clinical evaluation by five experienced ultrasound radiologists confirmed these findings. Using a blinded and randomized protocol, images were rated based on lesion visibility and anatomical preservation. The ratings were overwhelmingly positive (mean score 4.85 ± 0.3), and inter-rater agreement was high (Fleiss $κ$ = 0.93), indicating consistent assessments. These outcomes illustrate that PR-SSD-Net does more than improve quantitative metrics: it enhances interpretability, ensures consistent structural preservation, and increases diagnostic confidence.

From a quantitative perspective, the superior PSNR and SSIM values reported in Table 2 confirm that the qualitative benefits are supported by measurable improvements. Importantly, as illustrated in Figures 3 and 4, PR-SSD-Net exhibits reduced variability and maintains consistent performance across different noise realizations, a crucial attribute in clinical settings, where random speckle patterns may compromise diagnostic reliability.

The results in Table 3 show that, at $σ = 7$, PR-SSD-Net guided by physics-based regularization outperforms Noise2Void in both PSNR and SSIM. The physics-informed prior enables better modeling of speckle statistics and better preservation of fine structural details, reducing residual noise artifacts. This consistent performance under high noise levels highlights its potential clinical relevance. Overall, the physics-based regularization is a key factor driving the superior denoising performance of PR-SSD-Net.

The ablation study in Figure 5 further reveals that these improvements are directly attributable to the physics-informed regularization term.

Guided by the underlying image formation model, the network produces reconstructions that are physically plausible, effectively suppressing artifacts while preserving structural details under high-noise conditions.

These findings have broader implications for the design of ultrasound denoising algorithms. They suggest that combining self-supervised learning with domain-specific priors is a highly effective strategy for handling complex noise while maintaining anatomical accuracy. The novelty of PR-SSD-Net lies not only in its architecture but in this principled integration of physics knowledge, which allows the model to generalize robustly across datasets and noise levels. In practice, this means radiologists can rely on enhanced images that consistently reveal lesions and tissue structures with minimal distortion, potentially improving diagnostic accuracy and confidence.

Taken together, these results indicate that the proposed PR-SSD-Net approach is more than a technical improvement: it represents a conceptual shift in ultrasound denoising. By combining physical priors with self-supervised learning, it provides a pathway toward high-quality, reproducible, and clinically meaningful image reconstructions.

By effectively addressing speckle noise, our proposed PR-SSD-Net approach opens a new path for medical imaging, showing the practical impact that physics-guided deep learning can have on both scientific research and patient care.

Conclusion

Speckle noise continues to challenge ultrasound imaging, often masking subtle anatomical details and complicating accurate diagnosis. In this work, we present a new PR-SSD-Net framework that strengthens the conventional U-Net by integrating a physics-informed probabilistic constraint with a self-supervised learning strategy. By embedding the fundamental principles of ultrasound image formation into the network, PR-SSD-Net effectively suppresses speckle noise while faithfully preserving intricate anatomical structures, enabling confident and reliable clinical interpretation.

The proposed PR-SSD-Net method consistently produces visually coherent images that retain diagnostically meaningful features, even in challenging conditions. Expert evaluations confirmed that these improvements translate directly into enhanced clinical confidence, demonstrating that PR-SSD-Net does more than refine images—it supports better, more reliable decision-making in real-world practice. Importantly, the framework achieves robust and stable performance across diverse datasets, highlighting the advantage of integrating domain-specific priors with modern deep learning. This combination ensures that reconstructions are both physically plausible and structurally faithful, providing a foundation for reproducible and clinically meaningful imaging. While the Rayleigh prior guides the network toward plausible speckle-like residuals, we did not explicitly verify the residual distribution. This represents a limitation of the current study and a potential direction for future work.
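One way the residual distribution could be examined in future work is a goodness-of-fit check against a Rayleigh model. The sketch below is a hedged illustration, not a procedure used in this study: it estimates the Rayleigh scale by maximum likelihood and reports the Kolmogorov-Smirnov distance between the empirical and fitted distributions. How the residual amplitudes are extracted from a noisy and denoised image pair is left open, and the function names and synthetic samples are hypothetical.

```python
import math
import random


def rayleigh_scale_mle(samples):
    """Maximum-likelihood estimate of the Rayleigh scale parameter."""
    return math.sqrt(sum(s * s for s in samples) / (2 * len(samples)))


def ks_distance_to_rayleigh(samples):
    """Return (fitted scale, KS distance) for non-negative amplitude samples."""
    if not samples or min(samples) < 0:
        raise ValueError("samples must be non-empty and non-negative")
    sigma = rayleigh_scale_mle(samples)
    if sigma == 0:
        raise ValueError("all samples are zero; the scale cannot be estimated")
    ordered = sorted(samples)
    n = len(ordered)
    distance = 0.0
    for i, x in enumerate(ordered):
        cdf = 1.0 - math.exp(-x * x / (2 * sigma * sigma))
        # Compare the model CDF with the empirical CDF just before and at x.
        distance = max(distance, abs(cdf - i / n), abs((i + 1) / n - cdf))
    return sigma, distance


# Synthetic Rayleigh amplitudes via inverse-transform sampling (illustration only).
rng = random.Random(42)
synthetic = [1.5 * math.sqrt(-2.0 * math.log(1.0 - rng.random())) for _ in range(2000)]
sigma_hat, d = ks_distance_to_rayleigh(synthetic)
print(f"fitted scale = {sigma_hat:.3f}, KS distance = {d:.4f}")
```

A small distance is consistent with Rayleigh-like residuals, whereas a large one suggests a mismatch. Because the scale is estimated from the same samples, standard KS critical values would be conservative, so the distance is best used for relative comparison.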

Our proposed PR-SSD-Net approach opens promising avenues for real-time applications, multi-modal imaging, and patient-specific adaptations, ultimately advancing the quality and reliability of medical imaging and fostering improved patient care.

Footnotes

Author Contributions

All authors contributed to the study conception and design. Material preparation, data collection and analysis were performed by Taher SLIMI, Nahla MAJDOUB BHIRI, Hajer CHTIOUI and Anouar BEN KHALIFA. The first draft of the manuscript was written by Taher SLIMI and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

• Taher SLIMI: Conceptualization, Methodology, Software, Data curation, Writing - original draft preparation.

• Nahla MAJDOUB BHIRI: Visualization, Investigation, Software.

• Hajer CHTIOUI: Visualization, Software.

• Anouar BEN KHALIFA: Validation, Supervision, Writing - reviewing and editing.

Declaration of Conflicting Interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The authors received no financial support for the research, authorship, and/or publication of this article.

Statement of Originality and Exclusivity

The authors affirm that the present manuscript has not been published elsewhere, either in whole or in part, and is not under consideration for publication elsewhere. All data, analyses, and interpretations presented herein are the result of original research conducted by the authors. We certify that all figures and tables included in this manuscript are original creations, requiring no reproduction permissions, and that the original electronic files (drawings, photographs, and article) will be retained until the completion of the publication process.

ORCID iD

Taher Slimi

Data Availability Statement

The datasets used in this study are publicly available.

References

1. Slimi, Ben Khalifa. Novel clinical hybrid deep framework for denoising and anatomical segmentation in challenging ultrasound conditions. Ultrason Imaging. 2026;48:171-200.

2. Bianconi, Khan, Jassim. Experimental assessment of conventional features, CNN-based features and ensemble schemes for discriminating benign versus malignant lesions on breast ultrasound images. Ultrason Imaging. 2025;47(6):256-69.

3. Slimi, Ferjaoui, Khalifa AB. Ultrasound imaging enhancement using denoising autoencoders. In: 2025 IEEE 22nd International Multi-Conference on Systems, Signals & Devices (SSD), Tunisia. IEEE; 2025. pp. 209-14.

4. Slimi, Djeha, Khalifa AB. Medical ultrasound image improvement based on denoising convolutional autoencoder. In: 2025 IEEE 22nd International Multi-Conference on Systems, Signals & Devices (SSD), Tunisia. IEEE; 2025. pp. 715-20.

5. Alblwi, Makkawy, Barner KE. D-ddpm: deep denoising diffusion probabilistic models for lesion segmentation and data generation in ultrasound imaging. IEEE Access. 2025;13:41194-209.

6. Oulmalme, Nakouri, Jaafar. A systematic review of generative AI approaches for medical image enhancement: comparing GANs, transformers, and diffusion models. Int J Med Inform. 2025;199:105903.

7. Huang, Yin, Zhang, Lok, DeRuiter, Jin, et al. Self-supervised deep learning for denoising in ultrasound microvascular imaging. arXiv preprint arXiv:2507.05451. 2025.

8. Pham, Luong, Kaaniche, Trinh, Chetouani. Ultrasound speckle denoising using ResUNet-based diffusion model. In: International Conference on Modelling, Computation and Optimization in Information Systems and Management Sciences, France. Springer; 2025. pp. 362-73.

9. Gornale, Kamat, Hiremath, Siddalingappa. A hybrid ensemble of denoising autoencoders and deep learning models for fetal image analysis. Cureus Journal of Computer Science. 2025;2:es44389-025-09506-x.

10. Sun, Chi, Huang. Self-supervised denoising of thyroid ultrasound images using SE-module enhanced U-Net with FPN. In: 2025 37th Chinese Control and Decision Conference (CCDC), China. IEEE; 2025. pp. 4212-7.

11. Zhu, Huang. An exploratory study on ultrasound image denoising using feature extraction and adversarial diffusion model. Med Phys. 2025;52(10):e70023.

12. Cui, Fei, Xiong, Pang, Liu, et al. Masked pretraining of U-net for ultrasound image segmentation. Sci Rep. 2025;15(1):31713.

13. Gómez-Flores, Gregorio-Calas, Coelho de Albuquerque Pereira. BUS-BRA: a breast ultrasound dataset for assessing computer-aided diagnosis systems. Med Phys. 2024;51(4):3110-23.

14. Al-Dhabyani, Gomaa, Khaled, Fahmy. Dataset of breast ultrasound images. Data Brief. 2020;28:104863.

15. Yap, Pons, Marti, Ganau, Sentis, Zwiggelaar, et al. Automated breast ultrasound lesions detection using convolutional neural networks. IEEE J Biomed Health Inform. 2018;22(4):1218-26.

16. Pedraza, Vargas, Narváez, Durán, Muñoz, Romero. An open access thyroid ultrasound image database. In: 10th International symposium on medical information processing and analysis. vol. 9287, Colombia. SPIE; 2015. pp. 188-93.

17. Tasnim, Hasan MK. CAM-QUS guided self-tuning modular CNNs with multi-loss functions for fully automated breast lesion classification in ultrasound images. Phys Med Biol. 2024;69(1):015018.

18. Slimi, Baoues, Khalifa AB. Innovative ultrasound image denoising using channel attention and variational autoencoders. Crit Rev Biomed Eng. 2025;53(3):47-76.

19. Slimi, Khalifa AB. Enhancing breast ultrasound diagnostics using GANs and guided filtering. In: 2025 IEEE 22nd International Multi-Conference on Systems, Signals & Devices (SSD), Tunisia. IEEE; 2025. pp. 368-73.

20. Lee, Yoon, Kim JY. Real-time self-supervised ultrasound image enhancement using test-time adaptation for sophisticated rotator cuff tear diagnosis. IEEE Signal Process Lett. 2025;32:1635-9.

21. Zhou. Automatic modulation recognition based on a new deep K-SVD denoising algorithm. Journal of Data Science and Intelligent Systems. 2023;3(1):18-26.

22. Dong, Sun. Research on medical image denoising using an improved dung beetle optimization algorithm based on a novel 3D chaotic system. Phys Scr. 2025;100(5):055224.

23. Soy, Prakash VV. Medical image denoising using deep convolutional autoencoders for ultrasound. In: 2025 International Conference on Automation and Computation (AUTOCOM), India. IEEE; 2025. pp. 262-7.

24. Minhaz, Murali, Örge, Wilson, Bayat. Improved biometric quantification in 3D ultrasound biomicroscopy via generative adversarial networks-based image enhancement. Journal of Imaging Informatics in Medicine. 2026;39(1):103-14.

25. Huang, Yang. Thyroid nodule ultrasound image segmentation based on improved swin transformer. IEEE Access. 2025;13:19788-95.

26. Zhang, Yan, Xing, Gao, Tao, Han, et al. HADiff: hierarchy aggregated diffusion model for pathology image segmentation. Vis Comput. 2025;41:5689-700.

27. Gautam, Pawar, Joshi, Tazi, Chaudhary, Hambarde, et al. Pureformer: transformer-based image denoising. In: Proceedings of the Computer Vision and Pattern Recognition Conference, Tennessee, United States; 2025. pp. 1441-9.

28. Priya. Deep learning-based noise removal from low-resolution medical images. Int J Adv Res Comput Sci Eng. 2025;1(4):1-8.