Enhancing white blood cell classification via fixed segmentation and learnable blending

Abstract

Accurate and timely classification of white blood cells (WBCs) is crucial for diagnosing a myriad of hematological disorders, including leukemia. While deep learning models, particularly Convolutional Neural Networks (CNNs), have shown promise in automating this task from microscopic blood smear images, their performance can be hindered by complex backgrounds and intra-class variations. This paper proposes a novel segmentation-enhanced classification framework that synergistically combines classical image processing with deep learning. Our approach first employs a fixed-parameter Canny edge detection and contour-based algorithm to segment the WBC foreground. Subsequently, a learnable blending layer intelligently fuses the segmented foreground with the original image, allowing the downstream CNN to leverage both focused object information and contextual cues. We meticulously document our experimental journey, including initial attempts to train Canny parameters which proved unstable. The proposed model, featuring fixed segmentation and a learnable blending factor ( $α$ ), was evaluated on a public blood cell image dataset for cancer detection. It achieved a test accuracy of 99.93%, outperforming a baseline CNN (99.07%). The blending factor $α$ converged to approximately 0.63, indicating an optimal balance between the segmented and original image content. Furthermore, the segmentation-enhanced model demonstrated faster inference times. This work underscores the potential of hybrid approaches, particularly the utility of a learnable blending mechanism to effectively integrate classical segmentation outputs into neural networks for improved classification.

Keywords

Image segmentation, Canny edge detection, learnable blending, medical image analysis, leukemia detection, hybrid AI

1. Introduction

White blood cells (WBCs), or leukocytes, are integral components of the human immune system, and their morphological characteristics and differential counts provide critical insights for diagnosing a wide range of diseases, including infections, inflammatory conditions, and hematological malignancies such as leukemia.^1,2 Elhassan et al.³ created a two-stage deep learning model to accurately detect atypical WBCs in acute myeloid leukemia (AML) by merging a convolutional autoencoder with a convolutional neural network.

Microscopic examination of peripheral blood smears by trained hematologists remains a gold standard for WBC analysis. However, this manual process is labor-intensive, time-consuming, and susceptible to inter-observer variability, highlighting the need for automated and reliable solutions.⁴ To improve robustness under varied imaging settings, a dual-attention feature fusion network (DAFFNet) that captures both high-level semantic features and low-level morphological cues has been proposed.⁵ Taken together, these developments highlight the potential of AI-driven microscopy to automate leukocyte analysis and reduce human error while maintaining the diagnostic accuracy associated with manual review.

In recent years, deep learning (DL), particularly CNNs, has emerged as a powerful tool for medical image analysis, demonstrating remarkable success in tasks like classification, segmentation, and detection.^6,7 Several studies have applied CNNs to WBC classification, achieving promising results.^8–10 These models learn hierarchical features directly from raw image data.

Despite their successes, CNNs can face challenges when dealing with microscopic WBC images due to factors such as variations in staining, illumination, cell morphology, overlapping cells, and complex background elements.¹¹ Pre-processing techniques, especially image segmentation to isolate the region of interest (i.e., the WBC), have been explored to enhance the focus of downstream classifiers.^12,13 Classical image processing techniques like thresholding, edge detection (e.g., Canny¹⁴), and watershed algorithms are often employed for this purpose.¹⁵ However, integrating such classical methods into end-to-end trainable DL pipelines can be non-trivial, as these methods are often non-differentiable or their parameters are difficult to optimize directly via backpropagation.

Early attempts in our research focused on making classical segmentation parameters, such as Canny edge detection thresholds, learnable within the CNN framework using finite-difference approximations for gradients. However, this approach proved highly unstable, often leading to degraded segmentation quality and volatile training behavior. This observation motivated a shift in strategy towards leveraging the strengths of a well-tuned, fixed classical segmentation method and introducing a learnable mechanism to integrate its output effectively with the original image data.

This study presents a novel hybrid WBC classification framework that combines fixed classical segmentation with a learnable blending layer. Its contributions are as follows:

We present a segmentation-enhanced CNN architecture in which WBCs are first segmented using Canny edge detection and a contour-based method with optimized, fixed parameters.

We introduce a learnable blending layer with a single trainable parameter, $α$ , which adaptively fuses the segmented foreground image with the original input image. This allows the network to find the optimal balance of information from both sources.

We demonstrate through extensive experiments on a public blood cell image dataset¹⁶ that our proposed approach achieves superior classification accuracy (99.07%) compared to a baseline CNN (98.80%), along with faster inference times. We also provide insights into the learning behavior of the blending parameter $α$ .

We document the evolution of our approach, including the challenges encountered with attempting to train classical segmentation parameters, thereby providing valuable lessons for designing hybrid AI systems.

2. Related work

The automated classification of WBCs is an active area of research, with significant advances driven by machine learning (ML) and deep learning (DL) techniques.

2.1. Deep learning for WBC classification

Recent research has demonstrated the efficacy of a dual-attention CNN supplemented with a Deep Convolutional Generative Adversarial Network (DCGAN) in leukocyte subtype recognition, with accuracies of 99.83% (PBC dataset), 99.35% (LISC), and 99.60% (Raabin-WBC).¹⁷

For image-based classification tasks, CNNs have become the standard. Early applications of CNNs to WBCs, often adapting architectures like LeNet¹⁸ or AlexNet,⁶ showed their potential. More recent works have utilized deeper architectures such as VGGNet, ResNet, DenseNet, and Inception models, often employing transfer learning from models pre-trained on large natural image datasets like ImageNet.^7,9 For instance, Toğaçar et al.⁸ proposed a hybrid CNN-SVM model with feature selection for WBC classification. Rehman et al.¹ and Shafique et al.¹⁰ focused on acute lymphoblastic leukemia (ALL) detection using CNNs. Bukhari et al.¹¹ introduced a CNN framework with squeeze and excitation learning for leukemia detection. Kumar et al.² used CNNs for automatic detection of WBC cancer from bone marrow images. These studies highlight the strong feature extraction capabilities of CNNs but also implicitly point to the need for robust handling of image variability.

2.2. Image segmentation in medical imaging

Image segmentation is a crucial preprocessing step in many medical image analysis pipelines, aiming to delineate objects of interest from the background or other structures.

2.2.1. Classical segmentation techniques

In one study,¹⁹ a Canny edge detector is incorporated into a U-Net design. After extracting boundary information from the input CT images using Canny, the authors employ a dual-path SENet block to fuse these edge characteristics with semantic features, and they use multiscale convolution to segment lesions of various sizes more effectively. Traditional methods include thresholding, region-growing, and edge-based techniques. A popular choice is the Canny edge detector,¹⁴ which is well known for detecting strong edges while remaining robust to noise. Morphological operations are often used in conjunction with these methods to refine segmentation masks.²⁰ Kausar et al.¹³ presented a framework for WBC segmentation using several digital image processing concepts, including morphological analysis. While computationally efficient, classical methods often require careful parameter tuning and may struggle with complex scenes or subtle boundaries.

2.2.2. Deep learning for segmentation

Deep learning has revolutionized medical image segmentation. Architectures like Fully Convolutional Networks (FCNs), U-Net,²¹ and Mask R-CNN²² have set new benchmarks. U-Net's encoder-decoder design with skip connections captures both contextual and localization information, which makes it especially attractive for biological image segmentation. Recurrent Residual U-Net (R2U-Net)²³ further enhances this by incorporating recurrent and residual units.

2.3. Hybrid approaches and learnable integration

Üzen et al.²⁴ introduced WBC-KICNet, a knowledge-infused CNN that merges domain-specific morphological descriptors with deep features, achieving 99.22% accuracy and 99.25% F1-score.

There is growing interest in hybrid models that combine the strengths of classical image processing and deep learning.²⁵ The rationale is often to use classical methods for tasks where they excel (e.g., well-defined edge detection) or to provide interpretable initial processing, and then use DL for complex feature learning and decision-making.

The concept of learnable blending or feature fusion is also gaining traction. Lee et al.²⁶ explored learning to blend photos for aesthetic purposes using deep learning. Gharbi et al.²⁷ proposed deep bilateral learning for real-time image enhancement, in which blending coefficients are learned. In the context of image reconstruction and enhancement in microscopy, Ozcan et al.²⁸ discuss data-driven designs blending microscopy and computing. Isola et al.²⁹ studied image-to-image translation with conditional adversarial networks, which implicitly learn how to fuse generated features. Our work aligns with this trend by proposing a simple yet effective learnable blending layer to integrate a classically segmented foreground with the original image.

Initial attempts to make classical segmentation parameters (like Canny thresholds) directly learnable within a deep network using finite-difference gradients have been explored in various contexts. However, the non-differentiable nature of many classical operators and the potentially rugged optimization landscape can make such approaches unstable, an experience echoed in our initial trials detailed later in this paper. This challenge motivates approaches like ours, where the classical part is fixed, and the integration is learnable. Attention mechanisms^30,31 also offer a way for networks to learn to weigh different parts of an input or different feature maps, which is conceptually related to blending, albeit typically operating on learned features rather than raw or preprocessed image inputs.

3. Methodology

This section details the dataset used, the architecture of our baseline CNN model, the proposed segmentation-enhanced framework including the fixed segmentation module and the learnable blending layer, and a brief overview of the experimental evolution that led to this design.

3.1. Dataset

The "Blood Cell Images for Cancer Detection" dataset from Kaggle was used in this study.¹⁶ This dataset comprises 5000 JPEG images of peripheral blood smears, captured using a microscope with 100x magnification. The images are categorized into five classes pertinent to white blood cell analysis and leukemia detection: "Basophil", "Erythroblast", "Monocyte", "Myeloblast", and "Segmented Neutrophil". For our experiments, the dataset was split into training, validation, and test sets in a 70%-15%-15% ratio, stratified by class to maintain proportional representation.
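A class-stratified 70%-15%-15% split of this kind can be sketched with the standard library alone. This is an illustrative sketch, not the authors' code: the function name `stratified_split`, the seed, and the toy label list are assumptions introduced here.

```python
import random
from collections import defaultdict

def stratified_split(labels, fractions=(0.70, 0.15, 0.15), seed=0):
    """Return (train, val, test) index lists, split class by class."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    train, val, test = [], [], []
    for label in sorted(by_class):
        idxs = by_class[label]
        rng.shuffle(idxs)
        n_train = round(fractions[0] * len(idxs))
        n_val = round(fractions[1] * len(idxs))
        train += idxs[:n_train]
        val += idxs[n_train:n_train + n_val]
        test += idxs[n_train + n_val:]  # remainder goes to the test set
    return train, val, test

# Hypothetical labels: 5 classes with 100 images each.
labels = [c for c in range(5) for _ in range(100)]
train, val, test = stratified_split(labels)
print(len(train), len(val), len(test))
```

Splitting within each class, rather than over the pooled images, is what keeps the class proportions identical across the three subsets.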

To ensure uniformity in model input, all images were resized to a standard dimension of $360 \times 363$ pixels. Figure 1 displays a selection of the dataset's images.

Figure 1.

Sample original microscopic images from the blood cell dataset, showcasing different WBC types and imaging conditions.

Figure 2.

Overview of the baseline CNN model (Normal Model).

3.2. Baseline CNN architecture

To establish a performance baseline, we built a standard CNN model, called the "Normal Model." Figure 2 shows the architecture, which consists of a sequence of convolutional blocks followed by a classification head. Each convolutional block contains:

Two 2D convolutional layers (Conv2D) with $3 \times 3$ kernels, 'same' padding, and ReLU activation.

Batch normalization (BN) after every convolutional layer, to stabilize training and enhance generalization.

A MaxPooling2D layer with a $2 \times 2$ pool size to reduce spatial dimensions.

A dropout layer with a rate of 0.25 to reduce overfitting.

The model comprises three such convolutional blocks with an increasing number of filters (32, 64, and 128). Following the convolutional base, the feature maps undergo batch normalization and dropout (0.5) and are then flattened and passed to a dense layer with 512 units (ReLU activation). The final output layer is a dense layer with a softmax activation that produces the multi-class probability distribution, with one unit per class (5 in this case). The model was trained with categorical cross-entropy loss and the Adam optimizer.
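As a rough consistency check on the model size, the parameter count of this architecture can be worked out by hand. The sketch below assumes filter widths of 32, 64, and 128, four parameters per batch-normalization channel (scale, shift, and the two moving statistics, as Keras counts them), and a hypothetical $224 \times 224$ RGB input; the text states only that images were downsized for the Table 4 runs, so the input size is an assumption, not a reported setting.

```python
def conv2d_params(k, c_in, c_out):
    return k * k * c_in * c_out + c_out  # kernel weights + biases

def batchnorm_params(channels):
    return 4 * channels  # gamma, beta, moving mean, moving variance

def dense_params(n_in, n_out):
    return n_in * n_out + n_out

def baseline_param_count(height, width, filters=(32, 64, 128), n_classes=5):
    total, c_in = 0, 3  # RGB input
    for c_out in filters:
        # Two Conv2D + BN pairs per block.
        total += conv2d_params(3, c_in, c_out) + batchnorm_params(c_out)
        total += conv2d_params(3, c_out, c_out) + batchnorm_params(c_out)
        height, width = height // 2, width // 2  # 2x2 max pooling
        c_in = c_out
    total += batchnorm_params(c_in)  # BN after the convolutional base
    flat = height * width * c_in     # size of the flattened feature vector
    total += dense_params(flat, 512) + dense_params(512, n_classes)
    return total

count = baseline_param_count(224, 224)
print(f"{count / 1e6:.2f} M parameters")
```

Under these assumptions the total is about 51.67 million, in line with the Params (M) column of Table 4, and almost all of it sits in the first dense layer. The blending layer adds a single scalar, which is why both models report the same rounded count.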

3.3. Study of proposed segmentation-enhanced architecture

The idea of our proposed model is to guide the CNN by emphasizing the foreground WBC. This is achieved through a two-stage process: fixed segmentation followed by learnable blending, as depicted in Figure 3. Figure 4 gives the flowchart of the fixed-parameter segmentation module used to extract the WBC foreground.

Figure 3.

System overview of the proposed segmentation-enhanced classification model with learnable blending.

Figure 4.

Flowchart of the fixed parameter segmentation module used to extract the white blood cell foreground.

3.3.1. Fixed segmentation module

The segmentation module aims to isolate the WBC foreground from the background, which often contains red blood cells and platelets. We adopted a classical image processing pipeline based on the Canny edge detector¹⁴ and contour analysis:

Preprocessing: The input RGB image is converted to grayscale. Gaussian blurring (kernel size determined by $σ$ ) is applied to reduce noise.

Canny Edge Detection: Edges are detected using the Canny algorithm with specific low ( $T_{1}$ ) and high ( $T_{2}$ ) thresholds.

Morphological Closing: A morphological closing operation with an elliptical kernel is applied to the binary edge map to close small gaps in the contours of the WBCs.

Contour Finding and Filtering: Contours are extracted from the processed edge map. We assume the WBC is typically the largest prominent object. Thus, contours are filtered based on a minimum area (e.g., 0.015% of total image area in our final setup) to remove noise, and the largest valid contour is selected as the primary WBC boundary.

Mask Generation: A binary mask is created by filling the largest contour. This mask represents the segmented foreground.

Foreground Extraction: The segmented foreground is extracted from the original RGB image by applying the binary mask, which sets the background pixels to black (zero intensity).

Crucially, after initial experimentation (detailed in Section 3.3.3), the parameters for this module (Canny thresholds $T_{low} = 0.04 \times 255$ and $T_{high} = 0.10 \times 255$; Gaussian sigma $σ = 1.5$; morphological kernel size; minimum contour area) were fixed to values that consistently produced visually good segmentations across a sample of images. This decision was pivotal to achieving stable training and improved performance.
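To make the fixed values concrete, the sketch below converts the quoted thresholds and the 0.015% minimum-area fraction into pixel units for a $360 \times 363$ image and applies a binary mask as in the foreground extraction step. It covers only the arithmetic and the masking; edge detection and contour finding are not reproduced, and the helper names and the toy image are assumptions made for illustration.

```python
import numpy as np

# Fixed values quoted in the text, expressed on a 0-255 intensity scale.
T_LOW = 0.04 * 255
T_HIGH = 0.10 * 255
MIN_AREA_FRACTION = 0.015 / 100  # 0.015% of the image area

def min_contour_area(height, width):
    """Smallest contour area (in pixels) that survives filtering."""
    return MIN_AREA_FRACTION * height * width

def extract_foreground(rgb, mask):
    """Apply a binary (H, W) mask to an (H, W, 3) image; background becomes 0."""
    return np.where(mask[..., None], rgb, 0).astype(rgb.dtype)

# Toy 4x4 image with a 2x2 "cell" in the centre (hypothetical data).
rgb = np.full((4, 4, 3), 200, dtype=np.uint8)
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True
foreground = extract_foreground(rgb, mask)

print(T_LOW, T_HIGH, min_contour_area(363, 360))
```

The thresholds come to roughly 10.2 and 25.5 on the 0-255 scale, and the area filter to about 20 pixels, so it removes only very small spurious contours.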

3.3.2. Learnable blending layer

The output of the fixed segmentation module is a foreground image ( $I_{seg}$ ) where the background is black. While this focuses on the WBC, it discards all background context. To allow the model to leverage both focused foreground information and potentially useful context from the original image ( $I_{orig}$ ), we introduced a learnable blending layer. This layer computes a blended image $I_{blend}$ as a weighted sum:

$I_{blend} = α_{clip} \cdot I_{seg} + (1 - α_{clip}) \cdot I_{orig}$ (1)

where $α$ is a single scalar trainable weight (initialized to 0.7 in our experiments) and $α_{clip} = \mathrm{clip}(α, 0.0, 1.0)$ ensures the blending factor remains in a meaningful range. The parameter $α$ is learned in an end-to-end fashion with the rest of the CNN classifier via backpropagation. It allows the network to dynamically determine the importance of the segmented foreground versus the original image. The conceptual operation is shown in Figure 5.

Figure 5.

Diagram of the learnable blending layer, combining the original image and the segmented foreground using a trainable weight $α$ .

The blended image $I_{blend}$ is then fed into the same baseline CNN architecture described in Section 3.2 for classification. The overall architecture is termed the "Segmentation-Enhanced Model."
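Because Equation (1) is linear in $α_{clip}$, the gradient that reaches $α$ during backpropagation has a simple closed form: the upstream gradient with respect to $I_{blend}$, multiplied elementwise by $(I_{seg} - I_{orig})$ and summed. Inside the mask the two images coincide, so only background pixels contribute. The NumPy sketch below illustrates this with a toy quadratic loss and checks the analytic gradient against a central difference. It is not the authors' Keras layer; the function names, the toy loss, and the random test image are assumptions.

```python
import numpy as np

def blend(i_seg, i_orig, alpha):
    """Equation (1): clip alpha to [0, 1], then mix the two images."""
    a = min(max(alpha, 0.0), 1.0)
    return a * i_seg + (1.0 - a) * i_orig

def alpha_grad(i_seg, i_orig, alpha, upstream):
    """dL/dalpha for a scalar loss L, given upstream = dL/dI_blend.

    Strictly outside [0, 1] the clip is flat, so the gradient is zero.
    """
    if alpha < 0.0 or alpha > 1.0:
        return 0.0
    return float(np.sum(upstream * (i_seg - i_orig)))

rng = np.random.default_rng(0)
i_orig = rng.random((8, 8, 3))
mask = np.zeros((8, 8, 1))
mask[2:6, 2:6] = 1.0
i_seg = i_orig * mask  # background set to zero

alpha = 0.7
# Toy loss L = 0.5 * sum(I_blend ** 2), so dL/dI_blend = I_blend.
upstream = blend(i_seg, i_orig, alpha)
analytic = alpha_grad(i_seg, i_orig, alpha, upstream)

eps = 1e-6
loss = lambda a: 0.5 * np.sum(blend(i_seg, i_orig, a) ** 2)
numeric = (loss(alpha + eps) - loss(alpha - eps)) / (2 * eps)
print(analytic, numeric)
```

One consequence of the clipping is that if $α$ ever moved strictly outside $[0, 1]$ it would receive no gradient through this path; an initial value of 0.7 keeps it well inside the active range.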

3.3.3. Experimental evolution and rationale

Our initial hypothesis was that a learnable segmentation stage would be optimal. We first implemented a 'ForegroundExtractionLayer' where Canny edge detector parameters (low threshold, high threshold, and Gaussian sigma) were defined as trainable weights. Gradients for these parameters were estimated using finite differences, as the Canny/contour pipeline is not directly differentiable.

However, this approach exhibited significant instability during training:

Degraded Segmentation Quality: The learned Canny parameters often drifted into regions that produced poor or noisy segmentations, significantly worse than manually tuned parameters (as evidenced by visualizing segmented outputs during training).

Volatile Validation Loss: The validation loss for the classification task became extremely erratic, with frequent large spikes. This suggested that minor changes in segmentation parameters led to drastically different masked inputs, confusing the classifier and destabilizing the learning process.

No Performance Gain: Despite the learnable parameters, this model consistently underperformed the baseline CNN, and the segmentation did not demonstrably aid classification.

These challenges are likely due to the highly non-linear and non-smooth relationship between Canny parameters and the final classification loss, making gradient estimation via finite differences unreliable for this complex, multi-step classical pipeline.
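The difficulty can be illustrated with a deliberately simplified toy problem in which a single hard threshold stands in for the whole Canny-contour pipeline. The loss is then piecewise constant in the threshold, so a central finite difference is either exactly zero or a spike whose size depends on the step. The pixel values, the loss, and the function names below are assumptions chosen for illustration, not the authors' setup.

```python
import numpy as np

pixels = np.array([0.1, 0.3, 0.5, 0.7])  # toy "edge strengths"

def loss(threshold):
    """Toy loss: squared error between the kept-pixel count and a target of 2."""
    kept = np.sum(pixels > threshold)
    return float((kept - 2) ** 2)

def central_difference(f, x, eps):
    return (f(x + eps) - f(x - eps)) / (2 * eps)

flat = central_difference(loss, 0.40, eps=0.05)     # no pixel crosses the threshold
jump_a = central_difference(loss, 0.50, eps=0.01)   # one pixel crosses
jump_b = central_difference(loss, 0.50, eps=0.001)  # same jump, smaller step
print(flat, jump_a, jump_b)
```

Away from a pixel value the estimate carries no signal, while at a pixel value it grows as the step shrinks. An optimizer fed such estimates receives either nothing or an arbitrarily scaled push, which is consistent with the parameter drift and loss spikes described above.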

Based on these observations, we revised our strategy:

Fix Segmentation Parameters: We reverted to using fixed, empirically optimized parameters for the Canny-contour segmentation module, ensuring a consistent and high-quality foreground extraction.

Introduce Learnable Integration: Recognizing that the optimal way to use the segmented foreground was still an open question, we introduced the learnable blending layer. This shifts the learning from the difficult task of optimizing classical segmentation parameters to the more tractable task of learning how to combine its output with the original image.

This iterative process led to the proposed architecture, which proved to be stable and effective.

4. Experiments and results

4.1. Experimental setup

All models were implemented using TensorFlow and Keras. Training was performed on an NVIDIA P100 GPU.

Optimizer: Adam optimizer with a starting learning rate of $10^{-3}$.

Learning Rate Schedule: If the validation loss did not improve for five epochs, the learning rate was reduced by a factor of 0.5, down to a minimum learning rate of $10^{-6}$.

Early Stopping: If the validation loss did not decrease for five epochs, training was terminated and the best weights were restored.

Batch Size: 32.

Epochs: Up to 50 epochs (though early stopping often occurred sooner).

Data Augmentation (Training only): Simple augmentations were applied to the training data, including random horizontal flips, slight random rotations ( $\pm 10^{\circ}$ ), and random brightness adjustments (factor 0.8 to 1.2).

We initialized the blending parameter $α$ at 0.7.
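The learning-rate and early-stopping rules listed above can be replayed on a list of validation losses in plain Python. In Keras these rules correspond to the `ReduceLROnPlateau` and `EarlyStopping` callbacks; the sketch below is a simplified stand-in with hypothetical function names and a made-up loss sequence, and it ignores details such as a minimum improvement delta, which the text does not specify.

```python
def reduce_on_plateau(val_losses, lr=1e-3, factor=0.5, patience=5, min_lr=1e-6):
    """Learning rate in effect after each epoch's validation loss is seen."""
    best, wait, history = float("inf"), 0, []
    for loss in val_losses:
        if loss < best:
            best, wait = loss, 0
        else:
            wait += 1
            if wait >= patience:
                lr = max(lr * factor, min_lr)  # halve, but never below the floor
                wait = 0
        history.append(lr)
    return history

def early_stop_epoch(val_losses, patience=5):
    """Return (stop_epoch, best_epoch); stop_epoch is None if never triggered."""
    best, best_epoch, wait = float("inf"), None, 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return None, best_epoch

# Hypothetical validation losses: three improving epochs, then a plateau.
losses = [0.9, 0.5, 0.4] + [0.45] * 5
lrs = reduce_on_plateau(losses)
stop, best = early_stop_epoch(losses)
print(lrs[-1], stop, best)
```

Restoring the best weights then amounts to reloading the model state saved at `best_epoch`.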

4.2. Study of evaluation metrics

Standard metrics for classification were used to assess the models:

Accuracy: The overall percentage of correctly classified images.

Loss: Categorical cross-entropy loss.

Precision, Recall, F1-Score: Computed per class, along with macro and weighted averages.

Confusion Matrix: To visualize class-wise performance.

Inference Time: Measured per batch and per image on the test set.

Training Time: Total time taken for model training.
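All of the classification metrics above can be derived from the confusion matrix. The following sketch shows one way to compute them with NumPy; the function name and the three-class matrix are hypothetical, and the code assumes that every class is predicted at least once so that no division by zero occurs.

```python
import numpy as np

def classification_summary(cm):
    """cm[i, j] = number of class-i images predicted as class j."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    support = cm.sum(axis=1)          # true images per class
    precision = tp / cm.sum(axis=0)   # column sums = predictions per class
    recall = tp / support
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": tp.sum() / cm.sum(),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "macro_f1": f1.mean(),
        "weighted_f1": np.average(f1, weights=support),
    }

# Hypothetical 3-class confusion matrix, for illustration only.
cm = [[48, 2, 0],
      [1, 47, 2],
      [0, 3, 47]]
summary = classification_summary(cm)
print(summary["accuracy"], summary["macro_f1"])
```

When every class has the same support, as in a class-balanced test set, the macro and weighted averages coincide.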

4.3. Quantitative results

Table 1 reports the key performance metrics of the Normal Model (baseline CNN) and our proposed Segmentation-Enhanced Model (fixed segmentation + learnable blending) on the test set.

The Segmentation-Enhanced Model achieved a higher test accuracy (99.07%) compared to the Normal Model (98.80%). Notably, the test loss for the enhanced model was significantly lower (0.0565 vs 4.0469), suggesting more confident and stable predictions. The inference time per image was comparable, with the enhanced model being marginally faster. The training time for the segmentation-enhanced model was higher due to the execution of the NumPy-based segmentation function within the TensorFlow graph for each batch during training.

The evolution of the learnable blending parameter $α$ during training is shown in Figure 6. It started at an initial value of 0.7 and converged to approximately 0.63.

Table 1.
Overall study of performance comparison on the test set.

Metric	Normal Model	Proposed Model
Test Accuracy (%)	98.80	99.07
Inference Time per Image (ms)	93.65	93.15
Training Time (s)	2083.81	6759.92

Note: Training time is higher for Seg.-Enhanced due to Python ops in TF graph for segmentation.

Figure 6.

Evolution of the learnable blending parameter $α$ over training epochs for the Segmentation-Enhanced Model. The parameter converged from an initial value of 0.7 to approximately 0.63.

Figures 7 and 8 show the training and validation accuracy and loss curves for both models. The Segmentation-Enhanced model, while exhibiting some initial volatility in validation loss (though much reduced compared to attempts with learnable Canny parameters), achieved stable and high accuracy.

Figure 7.

Model accuracy curve for training and validation sets for the Normal Model and the Segmentation-Enhanced Model.

Figure 8.

Model loss curve for training and validation sets for the Normal Model and the Segmentation-Enhanced Model.

4.4. Per-class performance

The detailed classification reports for both models are presented in Tables 2 and 3. Both models demonstrate high precision, recall, and F1-scores across all classes. The Segmentation-Enhanced Model shows slight improvements or maintains performance across most classes.

Table 2.
Study of classification report for the normal model.

Class	Precision	Recall	F1-Score	Support
Basophil	1.00	0.97	0.99	150
Erythroblast	0.97	0.99	0.98	150
Monocyte	0.98	1.00	0.99	150
Myeloblast	0.99	0.99	0.99	150
Seg_neutrophil	1.00	0.98	0.99	150
Accuracy			0.99	750
Macro Avg	0.99	0.99	0.99	750
Weighted Avg	0.99	0.99	0.99	750

Table 3.

Study of classification report for the segmentation-enhanced model.

Class	Precision	Recall	F1-Score	Support
Basophil	1.00	0.99	0.99	150
Erythroblast	0.97	0.99	0.98	150
Monocyte	0.99	0.99	0.99	150
Myeloblast	0.99	0.99	0.99	150
Seg_neutrophil	1.00	0.99	0.99	150
Accuracy			0.99	750
Macro Avg	0.99	0.99	0.99	750
Weighted Avg	0.99	0.99	0.99	750

Table 4 compares the performance of our Segmentation-Enhanced Model with state-of-the-art (SOTA) models. For this comparison the images were downsized for training, which increased the accuracy of both the Normal and the Segmentation-Enhanced CNN.

Table 4.

Model performance metrics.

Model	Test Acc	Train Time (s)	Inf Time (ms)	Params (M)
Normal	0.9907	531.96	21.47	51.67
Segmentation-Enhanced	0.9993	4312.42	13.36	51.67
VGG16	0.9351	859.54	25.07	14.98
VGG19	0.9463	976.85	16.19	20.29
ResNet50	0.9670	705.03	50.84	24.64
EfficientNetB0	0.9760	343.41	81.96	4.71
MobileNetV3Large	0.9693	561.05	64.64	3.49

Figure 9 compares the accuracy of our model with that of the state-of-the-art models.

Figure 9.

Comparison of Model Accuracy.

Figure 10 demonstrates the Average Inference Time for each image.

Figure 10.

Average Inference Time for each image.

4.5. Qualitative results and visualizations

Figure 11 illustrates the effect of the fixed segmentation and the learned blending. The ”Before Training” column shows the output of the fixed segmentation module with the initial blending factor ( $α = 0.7$ ). The ”After Training” column shows the blended images using the converged $α \approx 0.63$ . The segmentation effectively isolates the WBCs. In cases where segmentation might be imperfect (e.g., the second row where the cell is largely missed by segmentation), the blending with the original image ensures that cell information is not entirely lost, allowing the classifier to still perform well. The learned reduction in $α$ suggests the network benefits from retaining a slightly stronger component of the original image context than initially hypothesized.

Figure 11.

Qualitative examples: Original images (Column 1), output of the fixed segmentation module (Column 2), blended images with initial $α = 0.70$ (Column 3, representing ”Before Training” blend effect), and blended images with learned $α \approx 0.63$ (Column 4, representing ”After Training” blend effect). The fixed segmentation isolates the cell, and the learned blend adaptively combines it with the original image.

5. Discussion

The results demonstrate that the segmentation-enhanced model, incorporating a fixed classical segmentation stage and a learnable blending layer, achieves a modest but consistent improvement in WBC classification accuracy (99.93% vs. 99.07%) and a significantly lower test loss compared to a baseline CNN. This suggests that guiding the CNN's attention towards the primary object of interest, the WBC, while still allowing for contextual information through blending, is beneficial.

The critical insight from our experimental journey was the instability encountered when attempting to make the parameters of the classical Canny-contour segmentation module (thresholds, sigma) directly trainable via finite-difference gradients. The highly non-linear nature of this multi-step process resulted in erratic learning and degraded segmentation quality. By fixing these parameters to empirically validated ”good” values, we ensured a stable and high-quality foreground input to the blending stage.

The learnable blending parameter, $α$, converged from an initial value of 0.7 to approximately 0.63. This is a key finding: the network learned that a blend favoring the segmented image but retaining a substantial portion (about 37%) of the original image is optimal. This implies that while focusing on the segmented WBC is important, some contextual information from the original image (perhaps subtle background textures, color nuances, or even parts of nearby cells not fully captured by the primary contour) still provides valuable discriminative cues for the classifier. The blending layer acts as an adaptive gate, allowing the network to find this optimal balance automatically. The robustness provided by this blending is particularly evident in cases where the fixed segmentation might be imperfect (e.g., partially segmenting a cell or missing a very faint one). In such scenarios, the $(1 - α_{clip})$ component from the original image ensures that the information is not entirely lost.
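The effect of the learned value can be read off Equation (1) pixel by pixel. Since $I_{seg}$ equals $I_{orig}$ inside the mask and is zero outside it, the blend reduces to two cases for a pixel $p$. The following is a restatement of the definitions with $α_{clip} \approx 0.63$ substituted, not an additional result:

```latex
I_{blend}(p) =
\begin{cases}
α_{clip}\, I_{orig}(p) + (1 - α_{clip})\, I_{orig}(p) = I_{orig}(p), & p \in \text{mask} \\
(1 - α_{clip})\, I_{orig}(p) \approx 0.37\, I_{orig}(p), & p \notin \text{mask}
\end{cases}
```

Pixels inside the mask pass through unchanged, while the background, including any cell region the segmentation missed, is dimmed to roughly 37% of its original intensity rather than removed.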

The significantly lower test loss of the segmentation-enhanced model (0.0565) compared to the normal model (4.0469) is noteworthy. While both achieve high accuracy, the lower loss indicates that the enhanced model makes its predictions with higher confidence and is less uncertain, likely due to the cleaner, more focused input provided by the blended images.

The training time for the segmentation-enhanced model was considerably longer (3.24x) than the baseline. This is attributed to the 'tf.numpy_function' call for the segmentation, which involves executing Python/OpenCV code for each batch on the CPU during training, breaking the optimized GPU computation graph. While inference time was comparable and even slightly faster for the enhanced model (potentially due to simpler features in the blended input leading to quicker convergence in later CNN layers or optimized graph execution), the training overhead is a consideration. For deployment, where only inference is performed, this is less of an issue.

One limitation of this study is that it used only one WBC dataset. Future work could involve validating the approach on more diverse WBC datasets with varying imaging conditions.

To further explore the framework's potential beyond WBC classification, we applied it to a general Kaggle human emotion classification dataset comprising facial images across 5 emotion classes. Adapting the baseline CNN (with the final layer adjusted for 5 classes) and using the same fixed Canny-contour segmentation followed by the learnable blending layer, we trained for 10 epochs under a similar setup (70-15-15 split, Adam optimizer with a learning rate of $10^{-3}$, batch size 32, and basic augmentations like horizontal flips and $\pm 10^{\circ}$ rotations). The baseline CNN achieved 60% test accuracy, while the segmentation-enhanced model reached 65%, highlighting the approach's ability to handle intra-class variations in facial features and lighting conditions, much like staining and illumination challenges in microscopic images. This modest improvement mirrors the WBC results, underscoring the blending mechanism's role in focusing on salient regions while preserving contextual details.

Furthermore, while the fixed segmentation with learnable blending proved effective, exploring fully differentiable neural attention mechanisms^31,32 or lightweight neural segmentation sub-networks (like a mini-U-Net) to produce a soft mask could be a promising direction. This would allow the entire pipeline to be learned end-to-end more seamlessly, potentially capturing more nuanced segmentation boundaries, though at the cost of increased model complexity and potentially higher data requirements. Another avenue could be to make the blending factor $α$ itself input-dependent, allowing for dynamic blending based on image characteristics.

Our findings highlight that hybrid approaches, thoughtfully designed to leverage the strengths of both classical image processing and deep learning, can yield performance benefits. The key is not just to combine them, but to create stable and effective mechanisms for their integration, such as the learnable blending layer demonstrated here. The documented challenges with learning classical parameters directly also serve as a cautionary tale and guide for future research in hybrid AI systems.

6. Conclusion

This work presented a novel segmentation-enhanced neural network for white blood cell classification. Our approach successfully integrates a fixed Canny-contour based segmentation module with a downstream CNN classifier through a learnable blending layer. This layer adaptively fuses the segmented foreground with the original image, allowing the network to optimally balance focused object information with broader contextual cues.

Experiments on a publicly available dataset of blood cell images showed that the segmentation-enhanced model achieved a test accuracy of 99.93%, outperforming a baseline CNN (99.07%) and exhibiting a markedly lower test loss. The learnable blending parameter $α$ converged to approximately 0.63, indicating a learned preference for a mix that slightly favors the segmented foreground while retaining substantial original image context. We also detailed our initial unsuccessful attempts at making classical segmentation parameters directly learnable, highlighting the instability of such an approach and motivating our final design.
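As a brief worked illustration of what this value implies, assume the blending layer takes the convex form below, with $I_{\text{seg}}$ equal to the original image inside the detected contour and zero elsewhere. Both are assumptions made here for illustration, inferred from the description of the layer.

$$I_{\text{blend}} = \alpha \, I_{\text{seg}} + (1 - \alpha) \, I_{\text{orig}}$$

With $\alpha \approx 0.63$, foreground pixels pass through unchanged, since $0.63 \, I_{\text{orig}} + 0.37 \, I_{\text{orig}} = I_{\text{orig}}$, while background pixels are attenuated to roughly $0.37 \, I_{\text{orig}}$ rather than removed.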

The proposed method offers a practical way to enhance CNN performance by leveraging the interpretability and efficiency of classical segmentation techniques, combined with the adaptive power of a learnable integration mechanism. Future work will explore fully differentiable segmentation modules and evaluate the framework on larger, more diverse datasets. This work contributes to the growing field of hybrid AI models in image analysis, showcasing a pathway to effectively combine traditional and deep learning methodologies.

Footnotes

ORCID iD

Subhajit Adhikari

Funding

The authors received no financial support for the research, authorship, and/or publication of this article.

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

References

1. Rehman, Abbas, Saba, et al. Classification of acute lymphoblastic leukemia using deep learning. Microsc Res Tech 2018; 81: 1310–1317.

2. Kumar, Jain, Khurana, et al. Automatic detection of white blood cancer from bone marrow microscopic images using convolutional neural networks. IEEE Access 2020; 8: 142521.

3. Elhassan, Mohd Rahim, Siti Zaiton, et al. Classification of atypical white blood cells in acute myeloid leukemia using a two-stage hybrid model based on deep convolutional autoencoder and deep convolutional neural network. Diagnostics 2023; 13: 196.

4. Hegde, Prasad, Veena, et al. An automatic approach for leukocytes classification using convolutional neural networks. J Med Syst 2019; 43: 110.

5. Chen, et al. DAFFNet: A Dual-Attention feature fusion network for classification of white blood cells. arXiv, 2024. https://doi.org/10.48550/arXiv.2405.1622.

6. Krizhevsky, Sutskever, Hinton. ImageNet classification with deep convolutional neural networks. In: Pereira F, Burges CJC, Bottou L and Weinberger KQ (eds) Proceedings of the 25th annual conference on neural information processing systems, 2012, pp.1097–1105.

7. Rawat, Wang. Deep convolutional neural networks for image classification: a comprehensive review. Neural Comput 2017; 29: 2352–2449.

8. Toğaçar, Ergen, Cömert. Classification of white blood cells using deep features obtained from convolutional neural network models based on the combination of feature selection methods. Appl Soft Comput 2020; 97: 106810.

9. Mohamed, Nabil, Hamed HFA, et al. White blood cell classification system based on deep learning and support vector machine. IEEE Access 2022; 10: 32373–32386.

10. Shafique, Tehsin. Acute lymphoblastic leukemia detection and classification of its subtypes using pretrained deep convolutional neural networks. Technol Cancer Res Treat 2018; 17: 1533033818802789.

11. Bukhari, Yasmin, Sammad, et al. A deep learning framework for leukemia cancer detection in microscopic blood samples using squeeze and excitation learning. Math Probl Eng 2022; 2022: 2801227.

12. Joshi, Kulkarni. White blood cell segmentation and classification. In: Proceedings of an international conference on signal and image processing (ICSIP), 2010, pp.469–473.

13. Kausar, Abdullah, Malik, et al. A framework for white blood cell segmentation in microscopic blood images using digital image processing. J Med Syst 2018; 42: 165.

14. Canny. A computational approach to edge detection. IEEE Trans Pattern Anal Mach Intell 1986; PAMI-8: 679–698.

15. Phoulady, Goldgof, Hall, et al. An automated cell counting and segmentation algorithm for quantitative analysis of neuronal images. J Neurosci Methods 2017; 289: 31–43.

16. Hosseini, Eshraghi, Taami, et al. A mobile application based on efficient lightweight CNN model for classification of B-ALL cancer from non-cancerous cells: a design and implementation study. Informat Med Unlocked 2023; 39: 101244.

17. Abdulazeez, Shaker, Abdullah. Leukocyte classification using dual attention CNN with GAN-based augmentation. Bioengineering 2024; 11: 388.

18. LeCun, Bottou, Bengio, et al. Gradient-based learning applied to document recognition. Proc IEEE 1998; 86: 2278–2324.

19. Ding, Chang, Han, et al. CDSE-UNet: Enhancing COVID-19 CT Image Segmentation with Canny Edge Detection and Dual-Path SENet Feature Fusion. arXiv, 2024.

20. Gonzalez. Digital Image Processing. Pearson Education India, 2009.

21. Ronneberger, Fischer, Brox. U-Net: convolutional networks for biomedical image segmentation. In: Navab N, Hornegger J, Wells WM and Frangi AF (eds) Proceedings of the international conference on medical image computing and computer-assisted intervention (MICCAI), 2015, pp.234–241.

22. Gkioxari, Dollár, et al. Mask R-CNN. In: Proceedings of the IEEE International conference on computer vision (ICCV), 2017, pp.2961–2969.

23. Alom, Yakopcic, Hasan, et al. Recurrent residual U-net for medical image segmentation. J Med Imaging 2019; 6: 014006.

24. Üzen, Firat. WBC-KICNet: Knowledge-infused convolutional network for white blood cell classification. Biomed Signal Process Control 2024; 90: 105893. DOI: 10.1016/j.bspc.2024.105893.

25. Goceri. A hybrid deep learning model for medical image segmentation with learnable feature fusion. IEEE J Biomed Health Inform 2021; 25: 1545–1555.

26. Lee J-Y, Sunkavalli, Lin, et al. Learning to blend photos. ACM Trans Graph (TOG) 2017; 36: 245.

27. Gharbi, Chen, Barron, et al. Deep bilateral learning for real-time image enhancement. ACM Trans Graph (TOG) 2017; 36: 118.

28. Ozcan, Rivenson, de Haan. Deep learning-based image reconstruction and enhancement in optical microscopy. Proc IEEE 2020; 108: 30–56.

29. Isola, Zhu J-Y, Zhou, et al. Image-to-image translation with conditional adversarial networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), 2017, pp.1125–1134.

30. Vaswani, et al. Attention is all you need. In: von Luxburg U, Guyon I, Bengio S, Wallach H and Fergus R (eds) Advances in neural information processing systems 30, 2017, pp.5998–6008.

31. Woo, Park, Lee J-Y, et al. CBAM: convolutional block attention module. In: Proceedings of the European conference on computer vision (ECCV), 2018, pp.3–19.

32. Guo M-H, et al. Attention mechanisms in computer vision: a survey. IEEE Trans Pattern Anal Mach Intell 2022; 44: 6483–6501.