Bayesian Dynamic Ensemble Selection Method for Dynamic Optical Breast Imaging Classification

Abstract

Breast cancer poses a significant threat to women's health. Dynamic Optical Breast Imaging (DOBI) is a medical imaging method based on the theory of early neovascularization in breast tumors. The technique is fast, non-invasive, and radiation-free, and it has the potential to enable early diagnosis of breast cancer, thereby helping to improve patients' survival rates and treatment outcomes. However, because of limited data volume and class imbalance, existing medical image classification methods often suffer from low classification accuracy, poor generalization, and low sensitivity to malignant samples when applied to DOBI. To address these issues, this paper proposes the Bayesian Dynamic Ensemble Selection (BDES) method. In BDES, the K-Nearest Neighbor Dynamic Classifier Selection (KNND-CS) method constructs a specific classifier pool for each test sample from all available base classifiers. A simulated annealing algorithm then dynamically selects classifiers from this pool for inclusion in the ensemble. Finally, the selected classifiers are combined by a Bayesian probability fusion function to produce the final diagnosis of a breast tumor as benign or malignant. By dynamically selecting and integrating appropriate classifiers for each sample, BDES improves the accuracy of DOBI in diagnosing benign and malignant breast tumors while maintaining robustness and generalization. Extensive experiments were conducted to validate BDES. Cross-validation experiments demonstrated the generalization and robustness of BDES, and comparative experiments on the DOBI dataset show that BDES achieves an accuracy of 83% and a sensitivity of 78%, significantly outperforming many comparison methods and demonstrating the effectiveness of the new method for the early diagnosis of breast cancer.

Keywords

ensemble learning; Bayesian theory; breast tumor classification; dynamic optical imaging

1 Introduction

Breast cancer remains the leading cause of cancer-related mortality among women worldwide. Studies underscore the significant impact of early detection and appropriate treatment in reducing breast cancer mortality (Chen et al., 2017; Chhikara & Parang, 2023), which makes accurately differentiating between benign and malignant neoplasms at an early stage critically important. Breast imaging plays a pivotal role in the early diagnosis of breast cancer. Current breast imaging modalities mainly include mammography, breast ultrasound (BUS), magnetic resonance imaging (MRI), computed tomography (CT), positron emission tomography (PET), and elastography ultrasound (EUS). Mammography uses X-rays to analyze density differences between tumors and normal tissue, offering high resolution and repeatability (Michell et al., 2012), though its sensitivity is limited in dense breast tissue. BUS is a non-invasive, cost-effective modality, but it may yield false-positive results because early tumors and normal tissue have similar acoustic impedance (Guo et al., 2018). MRI provides high-resolution 3D breast images for assessing tumor characteristics, though its high cost and long examination time limit its routine use (Terreno et al., 2010). CT generates 3D images to evaluate breast structure but involves significant radiation exposure. PET detects 18F-fluorodeoxyglucose uptake by cancer cells to provide metabolic information (Vaquero & Kinahan, 2015), though it lacks sensitivity for early-stage diagnosis. EUS evaluates tissue elasticity non-invasively but has limited resolution for small or deep lesions (Sigrist et al., 2017). While these techniques provide valuable diagnostic and prognostic capabilities and contribute significantly to the management and treatment of breast cancer, persistent limitations such as high cost and radiation-related risks remain.

Dynamic Optical Breast Imaging (DOBI) is a technique based on the theory of tumor neovascularization. It uses the scattering and absorption of near-infrared light by blood components within tumors to differentiate between benign and malignant breast tissue (Fournier et al., 2008). Malignant tumors release angiogenic factors in their early stages, promoting endothelial cell proliferation and the development of new, highly permeable vasculature. During a DOBI examination, externally applied compression temporarily increases blood retention in tumor vessels, thereby increasing the concentration of deoxyhemoglobin. This amplifies the absorption of near-infrared light by deoxyhemoglobin, facilitating the detection of metabolic abnormalities in malignant tissue. Being non-ionizing, non-invasive, cost-effective, and easy to operate, DOBI has great potential for the early diagnosis of breast cancer.

To enhance the diagnostic capabilities of breast imaging, traditional machine learning (ML) and deep learning (DL) techniques have been increasingly applied to breast image processing. Among traditional ML methods, support vector machines (SVM), naive Bayes, k-nearest neighbors (KNN), and their variants are among the most commonly used. Jasti et al. (2022) used AlexNet and the Relief algorithm for feature extraction, followed by Least Squares Support Vector Machine (LS-SVM), KNN, Random Forest (RF), and Naive Bayes classifiers to classify MIAS mammographic images, with LS-SVM achieving the highest accuracy. Allugunti (2022) employed SVM and RF to classify breast thermography images, achieving high accuracy and low error rates. Mojrian et al. (2020) proposed the Fuzzy Extreme Learning Machine - Radial Basis Function (Fuzzy ELM-RBF) model, which handles complex features in digitized mammographic images by combining the fast learning of ELM with the nonlinear mapping ability of the RBF kernel. Although these traditional ML techniques can classify tumors effectively from extracted image features, they rely heavily on manual feature extraction and selection.

Deep learning, with its ability to automatically extract hierarchical features from raw data, has achieved remarkable results in breast image classification. Rathinam et al. (2024) proposed the AFCM-DCNN method, which combines Adaptive Fuzzy C-Means with Deep Convolutional Neural Networks (DCNN) for early breast cancer detection and outperforms other DCNN models in efficiency and performance. Liu et al. (2022) proposed an improved ResNet-101 convolutional neural network that enhances lesion classification specificity in breast MRI without requiring pixel-level annotations. Baccouche et al. (2022) used a stacked residual neural network with ResNetV2 models for breast tumor classification, achieving high accuracy on the CBIS-DDSM dataset at the cost of long training times. However, DL methods typically require large quantities of high-quality labeled data to perform well, and their generalization is often limited when annotated data are scarce.

To further improve classification accuracy and generalization, Ensemble Learning (EL), which integrates the strengths of multiple classifiers, has attracted significant attention. Zheng et al. (2023) applied a weighted voting mechanism to the outputs of pre-trained deep learning models, while Bashir et al. (2015) used a weighted approach to combine five heterogeneous classifiers, achieving average improvements of approximately 8% in accuracy and 5% in recall across four breast cancer datasets compared with individual models. Moreover, Wang et al. (2018) proposed a Weighted Area Under the Curve Ensemble (WAUCE) method, which significantly improved classification accuracy on mammographic images by leveraging the AUC of multiple SVM models. EL generally achieves higher accuracy than single models by balancing the strengths and weaknesses of its members. However, some EL methods introduce high complexity and carry a risk of overfitting, particularly on imbalanced datasets.

In this paper, we propose a new ensemble learning method, Bayesian Dynamic Ensemble Selection (BDES), to improve the diagnosis of benign and malignant breast tumors. The specific challenges of the DOBI dataset, including limited sample size and class imbalance, call for methods that can mitigate low sensitivity to malignant samples and poor generalization. To this end, BDES combines several components, each chosen to address a specific issue. First, KNND-CS dynamically selects the most relevant base classifiers for each test sample. This allows the framework to adapt the classification strategy to the characteristics of each sample, improving accuracy, and it eliminates the interference of irrelevant or redundant classifiers, reducing computational complexity. Then, simulated annealing chooses the most appropriate classifiers for the ensemble from the specific classifier pool. A key feature of this step is the use of the expected misclassification loss, which penalizes the misclassification of minority-class samples more heavily. This is particularly important for addressing class imbalance in the DOBI dataset and improving sensitivity to malignant samples. Finally, Bayes' formula combines the prior probability and the outputs of the selected classifiers probabilistically, so that the ensemble decision accounts for the uncertainty and variability of each classifier, thereby enhancing generalization. The main contributions of this work are as follows:

A Bayesian Dynamic Ensemble Selection (BDES) method is proposed. BDES selects the most appropriate combination of base classifiers for each test sample through two selection steps and integrates them through a Bayesian probability fusion function to improve breast cancer diagnosis.

A K-Nearest Neighbor Dynamic Classifier Selection (KNND-CS) method is established to construct a specific classifier pool from all available base classifiers. This step effectively narrows the search space for the subsequent selection and integration of base classifiers.

The simulated annealing algorithm and the Bayesian probability fusion function are combined to select and integrate the optimal base classifiers in a stable manner, which effectively improves the accuracy and generalization of breast cancer diagnosis.

The remainder of this paper is organized as follows: Section 2 details the BDES method. Section 3 outlines the experimental design and comparison methods. Section 4 presents experimental results and discusses the effectiveness of the proposed method. Section 5 discusses the performance, limitations, and future directions of the BDES approach. Section 6 concludes the paper and outlines directions for future research.

2 Method

To improve classification accuracy on the data-limited DOBI dataset, we propose the Bayesian Dynamic Ensemble Selection (BDES) method, whose architecture is shown in Figure 1. BDES operates on the predictions of multiple trained base classifiers.

Figure 1.

Bayesian Dynamic Ensemble Selection Architecture Diagram. The BDES Uses Training Data to Calculate Prior Probabilities and Construct a Classifier Pool Tailored to Each Test Sample Based on KNND-CS. It Then Applies Simulated Annealing and a Bayesian Function to Select Classifiers and Generate Final Predictions.

In the BDES method, as illustrated in Figure 1, the training data are used to determine the prior probability based on the distribution of benign and malignant samples across age groups. The feature space consists of features extracted with the pyradiomics method. KNND-CS then selects the k training samples closest to the test sample in this feature space, and the base classifiers that correctly classify at least one of these k neighbors are dynamically selected to form a specific classifier pool for that test sample. On this basis, simulated annealing and the Bayesian fusion function are employed to select the classifiers to integrate and to generate the final prediction. The following sections describe each step of the BDES method in detail.

2.1 Initialization of Prior Probability

Surveys show that breast cancer risk varies with age: it is estimated at 1 in 53 for women under 50, 1 in 43 for women aged 50 to 59, and 1 in 23 for women over 60 (Giaquinto et al., 2024; Kim et al., 2025; McGuire et al., 2015). These findings underscore the correlation between age and the probability that a breast tumor is malignant. We therefore initialize the prior probability of malignancy based on age. To calculate these age-dependent priors, the training samples are split at a 50-year threshold. The proportion of malignant cases among samples aged 50 or younger determines the malignancy prior for individuals in that group, and the proportion of malignant cases among samples older than 50 sets the prior for those over 50. This process is shown in Equations (1)-(3).

\begin{aligned} P (+ 1 | \leq λ) & = \frac{\sum_{i = 1}^{n} (y_{i} \cdot I (a_{i} \leq λ))}{\sum_{i = 1}^{n} I (a_{i} \leq λ)} \end{aligned}

(1)

\begin{aligned} P (+ 1 | > λ) & = \frac{\sum_{i = 1}^{n} (y_{i} \cdot I (a_{i} > λ))}{\sum_{i = 1}^{n} I (a_{i} > λ)} \end{aligned}

(2)

\begin{aligned} I (expression) & = \begin{cases} 1, & \text{if expression is true} \\ 0, & \text{otherwise} \end{cases} \end{aligned}

(3)

where $P(+1 | \leq λ)$ represents the proportion of malignant samples among samples aged less than or equal to $λ$, and $P(+1 | > λ)$ represents the proportion of malignant samples among samples older than $λ$. $+1$ denotes a malignant sample, $λ$ is the age threshold (50 years in this work), $n$ is the number of training samples, $a_{i}$ is the age of the $i^{th}$ sample, and $y_{i}$ is the label of the $i^{th}$ sample (1 for malignant and 0 for benign, so that the numerators count malignant cases). Equation (3) defines the indicator function used to filter samples by age range.

Thus, the initial malignant prior probability of a sample can be expressed as follows:

\begin{aligned} P_{0} (x_{i}) = \begin{cases} P (+ 1 | \leq λ), & \text{if } a_{x_{i}} \leq λ \\ P (+ 1 | > λ), & \text{if } a_{x_{i}} > λ \end{cases} \end{aligned}

(4)

where $P_{0}(x_{i})$ represents the initial malignant prior probability of sample $x_{i}$, and $a_{x_{i}}$ is the age of sample $x_{i}$.
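The age-dependent prior of Equations (1)-(4) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names `age_priors` and `initial_prior` are hypothetical, labels are assumed to be coded 1 (malignant) and 0 (benign) so that the label sum counts malignant cases, λ is set to the 50-year threshold, and the ages and labels are toy values.

```python
from typing import Sequence, Tuple

def age_priors(ages: Sequence[float], labels: Sequence[int],
               lam: float = 50.0) -> Tuple[float, float]:
    """Equations (1)-(2): malignancy rate for ages <= lam and for ages > lam.

    Assumes labels are 1 (malignant) or 0 (benign).
    """
    young = [y for a, y in zip(ages, labels) if a <= lam]  # samples with I(a_i <= lam) = 1
    old = [y for a, y in zip(ages, labels) if a > lam]     # samples with I(a_i > lam) = 1
    if not young or not old:
        raise ValueError("each age group needs at least one training sample")
    return sum(young) / len(young), sum(old) / len(old)

def initial_prior(age: float, priors: Tuple[float, float], lam: float = 50.0) -> float:
    """Equation (4): return the prior of the age group the sample belongs to."""
    p_young, p_old = priors
    return p_young if age <= lam else p_old

# Toy training data (hypothetical).
ages = [34, 45, 50, 52, 61, 70]
labels = [0, 1, 0, 1, 1, 0]
priors = age_priors(ages, labels)
print(priors)                      # roughly (0.333, 0.667)
print(initial_prior(65, priors))   # uses the > 50 group
```

Note that a sample aged exactly 50 falls into the "50 or younger" group, matching the ≤ in Equation (1).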

2.2 Construction of the Specific Classifier Pool

In this part, we propose the K-Nearest Neighbor Dynamic Classifier Selection (KNND-CS) method to construct a specific classifier pool for each test sample. During the ensemble process, classifiers are dynamically selected from this pool.

Within KNND-CS, we employ Mutual Information (MI) for feature selection. MI (Vergara and Estévez, 2014) provides a robust measure of the dependence between each feature and the target variable, enabling us to select representative features. This selection minimizes the impact of irrelevant features and reduces computational cost, so that neighbors are searched for in a low-dimensional space of features that are highly relevant to the label. Accordingly, after extracting radiomics features from each image sample with pyradiomics, we calculate the MI-score of each feature on the training set using Equation (5) and select the top v features with the highest MI-scores.

\begin{aligned} M I (F; Y) = \sum_{f \in F, y \in Y} p (f, y) \log \frac{p (f, y)}{p (f) p (y)} \end{aligned}

(5)

where $MI(F; Y)$ represents the MI-score between feature $F$ and label $Y$; $F$ is the set of values of a given radiomics feature over the training samples, and $Y$ is the set of labels of the training samples. $p(f)$ and $p(y)$ are the marginal probability distributions of $F$ and $Y$, respectively, and $p(f, y)$ is their joint probability distribution.
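Equation (5) is written for discrete variables, whereas radiomics features are continuous. The sketch below shows one simple way to apply it, assuming equal-width binning and plug-in empirical probabilities; the paper does not state how MI is estimated, so both choices are assumptions, and a library estimator such as scikit-learn's `mutual_info_classif` (which uses a nearest-neighbor estimator for continuous features) could be substituted. The helper names and the `n_bins` value are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence

def mutual_information(f: Sequence[int], y: Sequence[int]) -> float:
    """Equation (5) with empirical probabilities; zero-count pairs contribute 0."""
    n = len(f)
    p_f, p_y, p_fy = Counter(f), Counter(y), Counter(zip(f, y))
    mi = 0.0
    for (fv, yv), count in p_fy.items():
        p_joint = count / n
        mi += p_joint * math.log(p_joint / ((p_f[fv] / n) * (p_y[yv] / n)))
    return mi

def discretize(values: Sequence[float], n_bins: int = 4) -> List[int]:
    """Equal-width binning (assumed) so continuous features can be plugged into Eq. (5)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

def top_v_features(X: Sequence[Sequence[float]], y: Sequence[int], v: int,
                   n_bins: int = 4) -> List[int]:
    """Indices of the v feature columns with the highest MI-score."""
    scores = [mutual_information(discretize([row[j] for row in X], n_bins), y)
              for j in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:v]

# Toy data (hypothetical): column 0 tracks the label, column 1 is constant,
# column 2 is uninformative.
X = [[0.1, 5.0, 0.3], [0.2, 5.0, 0.7], [0.9, 5.0, 0.2], [1.0, 5.0, 0.8]]
y = [0, 0, 1, 1]
print(top_v_features(X, y, v=1))  # [0]
```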

Based on the v features selected by MI, we calculate the Euclidean distance between the test sample and every training sample and select the k training samples with the smallest distances as the nearest neighbors. The Euclidean distance is given in Equation (6).

\begin{aligned} d (x_{i}, x_{j}) = \sqrt{\sum_{z = 1}^{v} {(f_{z} (x_{i}) - f_{z} (x_{j}))}^{2}} \end{aligned}

(6)

where $d(x_{i}, x_{j})$ represents the Euclidean distance between the training sample $x_{i}$ and the test sample $x_{j}$, and $f_{z}(x_{i})$ and $f_{z}(x_{j})$ are the values of $x_{i}$ and $x_{j}$ on the $z^{th}$ radiomics feature, respectively.
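The neighbor search of Equation (6), restricted to the v MI-selected features, can be sketched as follows. `nearest_neighbors` is a hypothetical helper operating on plain Python lists; a practical implementation would more likely use a vectorized or tree-based search.

```python
import math
from typing import List, Sequence

def nearest_neighbors(test_x: Sequence[float], train_X: Sequence[Sequence[float]],
                      features: Sequence[int], k: int) -> List[int]:
    """Indices of the k training samples closest to test_x under Equation (6)."""
    def dist(a: Sequence[float], b: Sequence[float]) -> float:
        # Euclidean distance computed over the selected feature indices only.
        return math.sqrt(sum((a[z] - b[z]) ** 2 for z in features))
    order = sorted(range(len(train_X)), key=lambda i: dist(train_X[i], test_x))
    return order[:k]

# Feature 2 is not among the selected features, so its large values are ignored.
train_X = [[0.0, 0.0, 9.0], [1.0, 1.0, -5.0], [5.0, 5.0, 0.0], [0.5, 0.0, 100.0]]
print(nearest_neighbors([0.0, 0.0, 0.0], train_X, features=[0, 1], k=2))  # [0, 3]
```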

The specific classifier pool for each test sample is constructed by selecting, from the base classifiers, those that correctly predict at least one of the nearest neighbor samples, as shown in Equation (7).

\begin{aligned} S (x_{j}) = \{ m \in M \mid \sum_{x \in N (x_{j})} I_{m} (x) \geq 1 \} \end{aligned}

(7)

where $S(x_{j})$ represents the specific classifier pool for test sample $x_{j}$, $M = \{m_{0}, m_{1}, \dots, m_{p}\}$ is the base classifier pool, and $N(x_{j})$ represents the set of $k$ nearest neighbor samples of $x_{j}$. The indicator function $I_{m}(x)$ represents the correctness of classifier $m$'s prediction for sample $x$: it equals 1 if the prediction is correct and 0 otherwise.
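Given the neighbor indices, Equation (7) reduces to a filter over the base classifiers. In this sketch, `base_preds` is a hypothetical mapping from a classifier name to its hard (0/1) predictions on the training samples, and the classifier names are placeholders.

```python
from typing import Dict, List, Sequence

def specific_pool(neighbor_idx: Sequence[int], labels: Sequence[int],
                  base_preds: Dict[str, Sequence[int]]) -> List[str]:
    """Equation (7): keep classifier m if it is correct on at least one neighbor."""
    pool = []
    for name, preds in base_preds.items():
        n_correct = sum(1 for i in neighbor_idx if preds[i] == labels[i])  # sum of I_m(x)
        if n_correct >= 1:
            pool.append(name)
    return pool

labels = [1, 0, 1, 0, 1]
base_preds = {"A": [1, 1, 0, 0, 0], "B": [0, 0, 0, 0, 0], "C": [0, 1, 1, 1, 0]}
print(specific_pool([0, 2], labels, base_preds))  # ['A', 'C']
```

Classifier "B" is dropped because it misclassifies both neighbors; "A" and "C" each get one neighbor right and are kept.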

The KNND-CS method involves two parameters: the number of neighbors (k) and the number of features (v). The number of neighbors k is chosen based on the size of the training data. Specifically, when the training set has fewer than 500 samples, we set k = 5 to avoid overly complex calculations and reduce variance in classifier selection. When it exceeds 500 samples, k is set to 10% of the total number of samples. This choice ensures that enough neighbors are considered for reliable classifier selection while keeping the computation feasible as the dataset grows. For the number of features v, we did not set an explicit value, as the method is designed to work flexibly with a wide range of feature sets; instead, we ensure that the selected features are relevant and non-redundant.
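The rule for k can be written as a small function. Two details are assumptions because the text does not specify them: a training set of exactly 500 samples is treated like one exceeding 500, and 10% is rounded down. The function name `choose_k` is hypothetical.

```python
def choose_k(n_train: int) -> int:
    """Number of neighbors for KNND-CS as a function of training-set size."""
    if n_train < 500:
        return 5              # small training sets: fixed k = 5
    return n_train // 10      # otherwise: 10% of the training samples, rounded down (assumed)

print(choose_k(300), choose_k(533), choose_k(1079))  # 5 53 107
```

If the rule were applied to the ensemble-learning training sets in Table 2 (533 DOBI and 1079 mini-DDSM samples), it would give k = 53 and k = 107, respectively.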

In summary, the KNND-CS method retains, for each test sample, the classifiers most likely to benefit the ensemble, which helps to enhance the robustness and sample-specific relevance of the model.

2.3 Dynamic Ensemble Selection Based on Bayesian Fusion

After constructing the specific classifier pool, we use simulated annealing to search globally for the optimal ensemble of classifiers and use the Bayesian probability fusion function to combine the selected classifiers.

The procedure of Simulated Annealing Ensemble Selection is illustrated in Figure 2. First, we initialize the parameters: an initial temperature T of 1.0, a termination temperature $T_{end}$ of 0.1, and a cooling rate $ρ$ of 0.95. Then, the initial accuracy and the initial Expected Misclassification Loss (EML) are calculated from the initial malignant prior probability. EML is calculated as shown in Equation (8).

\begin{aligned} E M L = \sum_{i = 1}^{L} \sum_{j = 1, j \neq i}^{L} P (C_{i}) P (C_{j} | C_{i}) α_{i j} \end{aligned}

(8)

where $L$ is the number of classes ($L = 2$ in breast tumor diagnosis), $P(C_{i})$ is the proportion of samples belonging to class $C_{i}$, $P(C_{j} | C_{i})$ is the proportion of samples in class $C_{i}$ that are predicted as $C_{j}$, and $α_{ij}$ is the penalty for misclassifying a sample of class $C_{i}$ as class $C_{j}$.
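For two classes, Equation (8) simplifies because $P(C_{i}) P(C_{j} | C_{i})$ is just the fraction of all samples whose true class is $C_{i}$ and whose predicted class is $C_{j}$. The sketch below uses that identity. The penalty matrix `alpha` is a hypothetical example, since the paper does not report its values; it charges more for missing a malignant case than for a false alarm.

```python
from typing import Sequence

def expected_misclassification_loss(y_true: Sequence[int], y_pred: Sequence[int],
                                     alpha: Sequence[Sequence[float]]) -> float:
    """Equation (8) for labels 0 (benign) and 1 (malignant).

    alpha[i][j] is the penalty for predicting class j when the true class is i.
    P(C_i) * P(C_j | C_i) equals count(true i, predicted j) / n, so summing the
    penalty of each misclassified sample and dividing by n gives the same value.
    """
    n = len(y_true)
    return sum(alpha[t][p] for t, p in zip(y_true, y_pred) if t != p) / n

alpha = [[0.0, 1.0],   # benign predicted as malignant costs 1 (assumed)
         [3.0, 0.0]]   # malignant predicted as benign costs 3 (assumed)
y_true = [1, 1, 0, 0, 0, 0]
y_pred = [0, 1, 0, 0, 1, 0]
print(expected_misclassification_loss(y_true, y_pred, alpha))  # about 0.667
```

With these penalties, the single missed malignant case contributes three times as much loss as the single false positive, which is how EML steers the search toward higher sensitivity.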

Figure 2.

Dynamic Ensemble Selection. This Process Employs Simulated Annealing to Dynamically Select Suitable Base Learners from the Test Sample’s Specific Classifier Pool. The Predictions of these Learners are then Combined using Bayes’ Theorem and Prior Probabilities to Produce the Final Result.

Bayesian theory provides a robust framework for reasoning with empirical data by integrating prior knowledge with observed information to compute the posterior probability of an event (Balasubramanian et al., 2020; Xiong et al., 2021). In dynamic ensemble selection, as shown in Figure 2, the Bayesian probability fusion function and simulated annealing are combined to dynamically select suitable classifiers. As shown in Equation (9), the Bayesian probability fusion function repeatedly updates the estimated probability of an event as new evidence arrives. First, a classifier $m_{i}$ is randomly selected from the specific classifier pool; then, the posterior probability is calculated from the initial malignant prior probability $P_{0}$ and the Bayesian probability fusion function.

\begin{aligned} P_{post} (x_{i}) = \frac{P_{prior} (x_{i}) P_{m_{i}} (x_{i})}{P_{prior} (x_{i}) P_{m_{i}} (x_{i}) + (1 - P_{prior} (x_{i})) (1 - P_{m_{i}} (x_{i}))} \end{aligned}

(9)

where $P_{post}(x_{i})$ is the malignant posterior probability of sample $x_{i}$, $P_{prior}(x_{i})$ is the malignant prior probability of sample $x_{i}$, and $P_{m_{i}}(x_{i})$ is the malignancy probability predicted by classifier $m_{i}$ for sample $x_{i}$. The numerator represents the joint probability that the patient has a malignant tumor, combining the prior belief with the likelihood given by the new classifier's prediction of malignancy. The denominator is the normalization term, which sums the joint probabilities over both possible classes (malignant and benign) and ensures that the resulting posterior probabilities sum to 1.

At each iteration in Figure 2, the posterior probability obtained from the previous fusion step is used as the new prior. A new base classifier from the selected pool then provides updated likelihoods (i.e., predicted class probabilities), which are treated as new evidence to update the posterior. By continuously updating classification beliefs using all available evidence, the Bayesian probability fusion function can reduce the impact of any single classifier error, thereby improving the accuracy and stability of the final prediction, which is particularly valuable in uncertain or imbalanced datasets such as DOBI.
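Equation (9) and the iterative update described above can be sketched as follows. The function names are hypothetical, and classifier outputs are assumed to be malignancy probabilities strictly between 0 and 1 (a prior of 1 combined with an output of 0, or vice versa, would make the denominator zero).

```python
from functools import reduce
from typing import Iterable

def bayes_fuse(prior: float, p_m: float) -> float:
    """Equation (9): fuse a malignancy prior with one classifier's malignancy probability."""
    numerator = prior * p_m                                        # malignant term
    return numerator / (numerator + (1.0 - prior) * (1.0 - p_m))  # normalize over both classes

def sequential_fuse(prior: float, probs: Iterable[float]) -> float:
    """Each posterior becomes the prior for the next classifier's output."""
    return reduce(bayes_fuse, probs, prior)

print(bayes_fuse(0.4, 0.5))               # a 0.5 output leaves the prior unchanged
print(sequential_fuse(0.3, [0.8, 0.8]))   # two agreeing classifiers push the posterior up
```

Written in odds form, Equation (9) multiplies the prior odds by $p/(1-p)$ for each classifier, so the final posterior does not depend on the order in which classifiers are fused; it also means the classifier outputs are implicitly treated as independent pieces of evidence.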

After obtaining the malignant posterior probability, we calculate the new EML and accuracy and substitute the change in EML and the current temperature into the acceptance condition. If the acceptance condition is met, the classifier $m_{i}$ is added to the current ensemble, the posterior probability replaces the prior probability, and the EML and accuracy are updated; if the accuracy improves, the optimal ensemble is also updated. Otherwise, the classifier $m_{i}$ is rejected, the temperature is reduced, and the random selection of classifiers continues. The simulated annealing ensemble selection terminates when the temperature falls below the termination temperature or when the number of classifiers in the optimal ensemble exceeds 1/3 of the base classifier pool. The acceptance condition is defined in Equation (10), and the temperature reduction function is given in Equation (11).

\begin{aligned} e^{\frac{- Δ EML}{T}} > 0.9 \end{aligned}

(10)

\begin{aligned} T = T \cdot ρ \end{aligned}

(11)

where $Δ EML = EML_{prior \propto m} - EML_{prior}$ represents the change in EML, $EML_{prior}$ is the EML before the ensemble, $EML_{prior \propto m}$ is the EML after ensembling with classifier $m_{i}$, $T$ is the current temperature, and $ρ$ ($0 < ρ < 1$) is the cooling rate.

After the ensemble classifiers have been selected for a test sample, its prediction is obtained from the initial malignant prior probability and Equation (9).
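The selection loop of Equations (8)-(11) can be sketched as below. It is an illustration under stated assumptions, not the authors' code. The text does not say on which samples EML and accuracy are evaluated, so the sketch takes a generic evaluation set (for example, the test sample's k nearest neighbors) through the hypothetical arguments `probs`, `priors`, and `y`; classifiers are drawn at random without replacement; posteriors are thresholded at 0.5; and, following the description above, the temperature is lowered only after a rejection. Equation (10) compares against a fixed threshold of 0.9 rather than against a fresh uniform random number as the classical Metropolis rule does, and the sketch follows the paper on this point. All function names and toy values are hypothetical.

```python
import math
import random
from typing import Dict, List, Sequence

def bayes_fuse(prior: float, p_m: float) -> float:
    """Equation (9); assumes probabilities strictly between 0 and 1."""
    num = prior * p_m
    return num / (num + (1.0 - prior) * (1.0 - p_m))

def eml(y: Sequence[int], pred: Sequence[int], alpha) -> float:
    """Equation (8) for 0/1 labels: summed penalties of misclassified samples over n."""
    return sum(alpha[t][p] for t, p in zip(y, pred) if t != p) / len(y)

def accuracy(y: Sequence[int], pred: Sequence[int]) -> float:
    return sum(int(t == p) for t, p in zip(y, pred)) / len(y)

def sa_select(pool: Sequence[str], probs: Dict[str, Sequence[float]],
              priors: Sequence[float], y: Sequence[int], alpha, n_base: int,
              T: float = 1.0, T_end: float = 0.1, rho: float = 0.95,
              threshold: float = 0.9, seed: int = 0) -> List[str]:
    rng = random.Random(seed)
    post = list(priors)                               # start from P_0 (Equation (4))
    pred = [int(p >= 0.5) for p in post]
    cur_eml, best_acc = eml(y, pred, alpha), accuracy(y, pred)
    selected: List[str] = []                          # current ensemble
    best: List[str] = []                              # optimal ensemble so far
    remaining = list(pool)
    # Stop when T < T_end or the optimal ensemble exceeds 1/3 of the base pool.
    while T >= T_end and remaining and len(best) <= n_base / 3:
        m = remaining.pop(rng.randrange(len(remaining)))     # random candidate m_i
        cand = [bayes_fuse(p, q) for p, q in zip(post, probs[m])]
        cand_pred = [int(p >= 0.5) for p in cand]
        cand_eml = eml(y, cand_pred, alpha)
        if math.exp(-(cand_eml - cur_eml) / T) > threshold:   # Equation (10)
            selected.append(m)
            post, cur_eml = cand, cand_eml                     # posterior becomes the prior
            acc = accuracy(y, cand_pred)
            if acc > best_acc:
                best_acc, best = acc, list(selected)
        else:
            T *= rho                                           # Equation (11)
    return best

def predict(prior: float, test_probs: Dict[str, float], ensemble: Sequence[str]) -> int:
    """Fuse the test sample's prior with the selected classifiers' outputs (Equation (9))."""
    post = prior
    for m in ensemble:
        post = bayes_fuse(post, test_probs[m])
    return int(post >= 0.5)

# Toy example (hypothetical classifiers and values).
y = [1, 1, 0, 0]
priors = [0.4] * 4
probs = {"good": [0.9, 0.9, 0.1, 0.1], "bad": [0.1, 0.1, 0.9, 0.9], "neutral": [0.5] * 4}
alpha = [[0.0, 1.0], [2.0, 0.0]]
ensemble = sa_select(list(probs), probs, priors, y, alpha, n_base=3, seed=1)
print(ensemble, predict(0.4, {"good": 0.8, "bad": 0.3, "neutral": 0.5}, ensemble))
```

In the toy run, the classifier that contradicts the labels is always rejected because it raises EML far more than Equation (10) tolerates, and the final prediction fuses the test sample's prior only with the classifiers kept in the optimal ensemble.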

It should be noted that the performance of the simulated annealing (SA) algorithm is affected by parameters such as the initial temperature, cooling rate, termination temperature, and acceptance threshold. These parameters were set based on the following considerations.

Initial Temperature: We set the initial temperature to 1.0 to cover a reasonable range of potential misclassification errors, which helps the algorithm escape local minima during the early iterations.

Cooling Rate: We chose a cooling rate of 0.95, which provides a moderate decrease in temperature, ensuring a balance between exploration and exploitation during the optimization process.

Termination Temperature: The termination temperature, the threshold for stopping the algorithm, is set to 0.1. Starting from an initial temperature of 1.0 with a cooling rate of 0.95, the temperature falls below 0.1 after 45 cooling steps (0.95^45 ≈ 0.099), which provides a suitable stopping condition and gives the algorithm enough time to converge to a solution.

Acceptance Threshold: We use an acceptance threshold of 0.9 in Equation (10). Any classifier that does not increase EML satisfies the condition, so the threshold only affects classifiers that increase EML; the high value of 0.9 accepts such a classifier only when the increase is small relative to the current temperature, which keeps the search stable.
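The arithmetic behind these settings can be checked with a few lines of Python: how many cooling steps it takes to go from 1.0 to below 0.1, and, by rearranging Equation (10) to $Δ EML < -T \ln 0.9$, the largest EML increase that is still accepted at the initial and termination temperatures. The constant and function names are illustrative.

```python
import math

T0, T_END, RHO, THRESHOLD = 1.0, 0.1, 0.95, 0.9

# Smallest n with T0 * RHO**n < T_END, i.e. n > ln(T_END / T0) / ln(RHO).
n_steps = math.ceil(math.log(T_END / T0) / math.log(RHO))

def max_eml_increase(T: float) -> float:
    """Largest EML increase still accepted by Equation (10) at temperature T."""
    return -T * math.log(THRESHOLD)

print(n_steps)                             # 45
print(round(max_eml_increase(T0), 4))      # 0.1054
print(round(max_eml_increase(T_END), 4))   # 0.0105
```

As the temperature drops, the tolerated EML increase shrinks proportionally, so late in the search only classifiers that keep EML essentially unchanged or lower it are accepted.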

In summary, within the specific classifiers pool constructed by KNND-CS, BDES dynamically selects the optimal classifiers using the simulated annealing ensemble selection and computes the ensemble result through Bayesian probability fusion function.

3 Experimental Methods

3.1 Datasets

The experiments use two datasets: mini-DDSM and DOBI. The mini-DDSM dataset (Lekamlage et al., 2020) is a condensed version of the Digital Database for Screening Mammography (DDSM) that provides mammography data for academic research. It contains 5388 samples, including 2682 benign and 2706 malignant samples. The DOBI dataset consists of 2927 samples, including 1825 benign and 1102 malignant samples, and is therefore class-imbalanced.

The data splits for the base classifiers and for the ensemble model are shown in Tables 1 and 2, respectively. For the base classifiers, the mini-DDSM dataset has 3769 samples in the training set and 1619 samples in the test set, and the DOBI dataset has 2261 samples in the training set and 666 samples in the test set. For the ensemble model, the mini-DDSM dataset has 1079 samples in the training set and 540 samples in the test set, and the DOBI dataset has 533 samples in the training set and 133 samples in the test set.

Table 1.
Data Splitting of mini-DDSM and DOBI Datasets for Base Classifiers.

	Train		Test
	Malignant	Benign	Malignant	Benign
mini-DDSM	1901	1879	815	804
DOBI	848	1413	254	412

Table 2.

Data Splitting of mini-DDSM and DOBI Datasets for Ensemble Learning.

	Train		Test
	Malignant	Benign	Malignant	Benign
mini-DDSM	543	536	272	268
DOBI	203	330	51	82

3.2 Evaluation Metrics

In this paper, model performance is evaluated by accuracy, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and F1-score. These metrics are calculated from the confusion matrix in Table 3, as shown in Equations (12)-(17).

\begin{aligned} Accuracy & = \frac{T P + T N}{T P + T N + F P + F N} \end{aligned}

(12)

\begin{aligned} Sensitivity & = Recall = \frac{T P}{T P + F N} \end{aligned}

(13)

\begin{aligned} Specificity & = \frac{T N}{T N + F P} \end{aligned}

(14)

\begin{aligned} P P V & = precision = \frac{T P}{T P + F P} \end{aligned}

(15)

\begin{aligned} N P V & = \frac{T N}{T N + F N} \end{aligned}

(16)

\begin{aligned} F 1 s c o r e & = 2 \cdot \frac{P r e c i s i o n \cdot R e c a l l}{P r e c i s i o n + R e c a l l} \end{aligned}

(17)

Table 3.

Confusion Matrix for Binary Classification.

		Predicted Type
		P	N
True Type	P	TP (True Positive)	FN (False Negative)
	N	FP (False Positive)	TN (True Negative)

where, TP is the number of samples correctly predicted as malignant (true positive), FP is the number of samples incorrectly predicted as malignant (false positive), TN is the number of samples correctly predicted as benign (true negative), and FN is the number of samples incorrectly predicted as benign (false negative).

It should be noted that sensitivity and specificity are the core metrics for evaluating cancer diagnosis models, yet they are difficult to balance: raising specificity usually lowers sensitivity. Performance therefore cannot be judged by these two metrics alone. PPV is the proportion of patients predicted positive (cancer) who actually have cancer, and measures the model's ability to avoid false positives. NPV is the proportion of patients predicted negative (benign) who are actually benign, and measures the model's ability to avoid missed diagnoses. The F1-score is the harmonic mean of precision and recall and is used to evaluate the overall performance of the model.
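Equations (12)-(17) follow directly from the four confusion-matrix counts. The sketch below computes them with malignant as the positive class; returning 0.0 when a denominator is zero is an assumed convention, and the counts in the example are hypothetical.

```python
from typing import Dict

def _ratio(num: float, den: float) -> float:
    return num / den if den else 0.0   # assumed convention for empty denominators

def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Equations (12)-(17), with malignant as the positive class."""
    sensitivity = _ratio(tp, tp + fn)
    ppv = _ratio(tp, tp + fp)
    return {
        "accuracy": _ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": sensitivity,
        "specificity": _ratio(tn, tn + fp),
        "ppv": ppv,
        "npv": _ratio(tn, tn + fn),
        "f1": _ratio(2 * ppv * sensitivity, ppv + sensitivity),
    }

print(binary_metrics(tp=8, fp=4, tn=6, fn=2))
```

F1 equals 2TP / (2TP + FP + FN), so it ignores true negatives, which is one reason it complements accuracy on an imbalanced set such as DOBI.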

3.3 Base Classifiers

Before applying ensemble learning, we constructed 10 base classifiers on the mini-DDSM dataset using both traditional machine learning and deep learning models. Table 4 lists these base classifiers and their performance on the mini-DDSM test set. Among them, EfficientNet had the best overall performance, with accuracy, sensitivity, and specificity of 72%, 79%, and 65%, respectively. ResNet50 achieved the highest sensitivity (82%), followed by VGG16 (80%), and RF_WT achieved the highest specificity (76%).

Table 4.
Base Classifiers and Their Performance on the mini-DDSM Test Dataset.

Classifiers	Method introduction	Accuracy	Sensitivity	Specificity
VGG16	Visual Geometry Group 16-layer network (Mascarenhas and Agarwal, 2021) was used for image classification.	0.65	0.80	0.50
ResNet18	Residual Network 18 (Zhang et al., 2017) was used for image classification.	0.72	0.78	0.65
ResNet50	Residual Network 50 (Zhang et al., 2017) was used for image classification.	0.66	0.82	0.50
EfficientNet	Efficient Network B0 (Tan and Le, 2019) was used for image classification.	0.72	0.79	0.65
SVM	SVM (Huang et al., 2018) classifier based on radiomics features of the images.	0.65	0.67	0.62
SVM_WT	SVM classifier based on radiomics features of wavelet-transformed images.	0.66	0.67	0.65
LR_WT	Logistic Regression based on radiomics features of wavelet-transformed images.	0.66	0.67	0.65
RF	Random Forest (Biau and Scornet, 2016) based on radiomics features of the images.	0.66	0.66	0.67
RF_WT	Random Forest based on radiomics features of wavelet-transformed images.	0.71	0.66	0.76
LDA_WT	Linear Discriminant Analysis (Tharwat et al., 2017) based on radiomics features of wavelet-transformed images.	0.67	0.65	0.70

* The best-performing method is shown in bold, and the second-best method is underlined.

For the DOBI dataset, we designed 10 base classifiers. Table 5 lists them and their performance on the DOBI test set. As shown in Table 5, the NIRCL model has the best overall performance, with accuracy, sensitivity, and specificity of 72%, 74%, and 71%, respectively. ResLRA matches NIRCL for the highest sensitivity (74%), and VQRes18 achieves the highest specificity (78%).

Table 5.

Base Classifiers and Their Performance on DOBI Test Dataset.

Classifiers	Method introduction	Accuracy	Sensitivity	Specificity
PML	Multi-layer perceptron (Popescu et al., 2009) based on radiomics features.	0.64	0.69	0.61
NIRCL	Contrastive learning method based on DOBI images.	0.72	0.74	0.71
LR	Logistic Regression (Boateng and Abaye, 2019) based on DOBI images.	0.71	0.71	0.72
PCL	Convolutional network (Li et al., 2021) combined with long short-term memory neural network (Van Houdt et al., 2020) based on radiomics features.	0.69	0.68	0.70
ResMLP	A Residual Network (Zhang et al., 2017) extracts image features, while a multi-layer perceptron extracts features such as age, cup, and band; a linear layer completes the classification.	0.64	0.44	0.77
ResNL	Neural network model combining deep residual network and non-local operation.	0.66	0.68	0.65
ResLRA	ResNet performance was optimized using a learning rate adjustment algorithm.	0.68	0.74	0.64
ResSimAM	The similarity-based attention module is integrated into ResNet.	0.67	0.68	0.67
VQRes18	ResNet combined with vector quantization autoencoder.	0.65	0.42	0.78
ResUp	ResNet combined with upsampled images.	0.67	0.66	0.69

* The best-performing method is shown in bold, and the second-best method is underlined.

3.4 Comparison Methods

We compare BDES with several advanced ensemble methods to evaluate its effectiveness in classifying breast tumors as benign or malignant.

Voting: Each base classifier casts a vote for the predicted class, and the class with the most votes is the classification result.

Stacking: The predicted probabilities of the base classifiers are fed into a meta-learner, for which XGBoost is used.

KNORA-ELIMINATE (Ko et al., 2008): This dynamic ensemble selection method identifies the k nearest neighbors of each test sample in the validation set and allows only the classifiers that correctly classify all of these neighbors to vote; if no classifier does so, k is reduced until at least one classifier qualifies.

KNORA-UNION (Ko et al., 2008): Similar to KNORA-ELIMINATE, but any classifier that correctly classifies at least one of the nearest neighbors is included in the ensemble, and each classifier receives one vote for every neighbor it classifies correctly.

Snapshot Ensemble (Huang et al., 2017): This deep ensemble method trains a single network with a cyclic learning-rate schedule. Within each cycle, the learning rate starts high and gradually decreases, and the model weights reached at the end of the cycle are saved as a "snapshot". During testing, the predictions of all snapshot models are averaged.

Ensemble Graph Neural Networks (EGNN) (Singh et al., 2023): This deep-learning-based ensemble method classifies lung medical images by integrating three graph neural networks.
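To make the two KNORA selection rules above concrete, the sketch below implements them on a boolean matrix `correct` (rows: the k nearest validation neighbors of one test sample, nearest first; columns: base classifiers). The function names, the toy matrix, and the fallback to the whole pool when no classifier is correct even on the nearest neighbor are assumptions of this illustration, not details of the BDES implementation.

```python
import numpy as np


def knora_eliminate(correct: np.ndarray) -> np.ndarray:
    """Indices of the classifiers that KNORA-ELIMINATE lets vote for one test sample.

    correct[j, m] is True if base classifier m correctly classifies the j-th
    nearest validation neighbor; rows are ordered from nearest to farthest.
    """
    k = correct.shape[0]
    while k > 0:
        # Classifiers that are correct on every one of the k nearest neighbors.
        oracles = np.flatnonzero(correct[:k].all(axis=0))
        if oracles.size > 0:
            return oracles
        k -= 1  # shrink the neighborhood and try again
    # Assumption of this sketch: if no classifier is correct even on the
    # nearest neighbor, fall back to the whole pool.
    return np.arange(correct.shape[1])


def knora_union_predict(correct: np.ndarray, preds: np.ndarray, n_classes: int = 2) -> int:
    """KNORA-UNION: each classifier gets one vote per neighbor it classifies correctly."""
    weights = correct.sum(axis=0)  # number of correctly classified neighbors
    votes = np.bincount(preds, weights=weights, minlength=n_classes)
    return int(np.argmax(votes))


# Toy example: 3 neighbors x 3 classifiers, plus each classifier's label for the test sample.
correct = np.array([[1, 1, 0],
                    [1, 0, 1],
                    [0, 0, 1]], dtype=bool)
print(knora_eliminate(correct))                           # [0]
print(knora_union_predict(correct, np.array([1, 0, 1])))  # 1
```

In the toy example, no classifier is correct on all three neighbors, so KNORA-ELIMINATE shrinks the neighborhood to two and keeps only classifier 0, whereas KNORA-UNION keeps all three with weights 2, 1, and 2.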

4 Results

To evaluate the effectiveness of BDES, we conducted comprehensive experiments. First, BDES integrates the mini-DDSM base classifiers listed in Table 4, and its performance is compared with the advanced ensemble methods introduced in Section 3.4; the robustness of the approach is analyzed through five-fold cross-validation. To further validate its generalization under data-scarce conditions, BDES is then applied to the Dynamic Optical Breast Imaging (DOBI) dataset. Additionally, ablation experiments evaluate the contribution of each BDES component.

4.1 Comparison Experiment on mini-DDSM Dataset

In this section, we use the proposed BDES method to integrate the mini-DDSM base classifiers listed in Table 4 and compare it with the advanced ensemble methods introduced in Section 3.4. The experimental results, with the comparative performance metrics highlighted, are shown in Figure 3.

Figure 3.

Results of Cross-Validation and Comparative Experiments for mini-DDSM Dataset. The BDES Method Predicts Results with Higher Performance in Multiple Indicators and Has Less Fluctuation During the Cross-Validation Process.

The results in Figure 3 show that BDES achieves the highest accuracy, sensitivity, NPV, and F1 score among all compared methods. Although KNORA-ELIMINATE exhibits the highest specificity (0.84), its sensitivity is comparatively low (0.46). In breast tumor diagnosis, sensitivity is critical because misdiagnosing a malignant tumor as benign can have severe consequences. BDES addresses this by using a modified expected misclassification loss (EML) function that imposes a higher penalty for misclassifying malignant cases as benign, which improves the detection of malignant tumors and therefore the sensitivity. Regarding generalization, the five-fold cross-validation results show that, although most comparison methods fluctuate within a small range on most metrics, the sensitivity and specificity of some comparison methods vary considerably across folds. In contrast, BDES shows small fluctuations in all performance indicators, suggesting good generalization ability.
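As an illustration of how an asymmetric penalty favors sensitivity, the following minimal sketch computes a cost-weighted expected misclassification loss for one classifier over a test sample's neighbors. The functional form and the costs `c_fn=2.0` and `c_fp=1.0` are assumptions chosen for illustration; the modified EML used in BDES may be defined differently.

```python
import numpy as np


def cost_weighted_eml(p_malignant, y_true, c_fn=2.0, c_fp=1.0):
    """Cost-weighted expected misclassification loss of one classifier (sketch).

    p_malignant: the classifier's predicted P(malignant) for each neighbor.
    y_true: true labels of the neighbors (1 = malignant, 0 = benign).
    c_fn, c_fp: hypothetical costs; c_fn > c_fp makes a missed malignancy
    (false negative) more expensive than a false alarm (false positive).
    """
    p = np.asarray(p_malignant, dtype=float)
    y = np.asarray(y_true, dtype=float)
    # Expected cost per neighbor: malignant called benign, or benign called malignant.
    loss = c_fn * y * (1.0 - p) + c_fp * (1.0 - y) * p
    return float(loss.mean())


y = [1, 0]  # one malignant and one benign neighbor
print(cost_weighted_eml([0.3, 0.0], y))  # 0.7: misses the malignant case
print(cost_weighted_eml([1.0, 0.7], y))  # 0.35: false alarm on the benign case
```

Both toy classifiers are wrong by 0.7 in probability on one neighbor, but the one that misses the malignant case receives twice the loss, so a selection step that minimizes this quantity prefers classifiers with fewer false negatives.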

4.2 Comparison and Cross-Validation Experiments on DOBI Dataset

To verify the effectiveness and robustness of the BDES method on the DOBI dataset, we designed a five-fold cross-validation experiment on this dataset and compared BDES with multiple ensemble learning methods. Each fold contains 533 training samples and 133 test samples.

Figure 4 compares the performance of six models on different metrics. Each bar shows the mean value of a metric for the corresponding model over the five folds, with taller bars indicating better results. The error bars show the fluctuation range: the red upper edge marks the maximum value and the green lower edge the minimum value, so a shorter error bar indicates a more stable result. As shown in Figure 4, BDES outperforms the best base classifier in Table 5, NIRCL, by 11%, 4%, and 15% in accuracy, sensitivity, and specificity, respectively. These results indicate that BDES provides more accurate classification decisions, suggesting that simulated annealing ensemble selection and the Bayesian probability fusion function can select and combine effective base classifiers.
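The quantities plotted in Figure 4 can be computed for each model and metric from the five fold-level scores. The minimal sketch below returns the bar height (mean), the error-bar edges (minimum and maximum), and the range; it also returns the variance, computed with `ddof=0` as an assumption, since the variance convention is not stated. The fold scores in the usage line are illustrative values, not results from this study.

```python
import numpy as np


def summarize_folds(fold_scores):
    """Mean, min/max error-bar edges, range, and variance of one metric across CV folds."""
    s = np.asarray(fold_scores, dtype=float)
    return {
        "mean": float(s.mean()),            # bar height
        "min": float(s.min()),              # lower (green) edge of the error bar
        "max": float(s.max()),              # upper (red) edge of the error bar
        "range": float(s.max() - s.min()),  # error-bar length: shorter means more stable
        "var": float(s.var(ddof=0)),        # population variance (assumed convention)
    }


# Illustrative fold accuracies (made up for this example).
print(summarize_folds([0.80, 0.84, 0.82, 0.83, 0.81]))
```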

Figure 4.

Results of Cross-Validation and Comparative Experiments for DOBI Dataset. The BDES Method Predicts Results with Higher Performance in Multiple Indicators and Has Less Fluctuation During the Cross-Validation Process.

Compared with the other methods, BDES achieves a higher mean accuracy (0.83), an improvement of 4 and 5 percentage points over Stacking (0.79) and Voting (0.78), respectively. BDES also clearly outperforms EGNN, an ensemble method designed for medical image classification. Moreover, BDES shows smaller differences between the maximum and minimum values across all metrics, indicating greater model stability. We attribute this to two factors: the initialization of the prior probability, which provides more accurate priors for Bayesian probability fusion, and the global optimization performed by simulated annealing during classifier selection, which stabilizes the ensemble results. Therefore, BDES achieves better stability and generalization on the small DOBI dataset.
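For readers unfamiliar with Bayesian probability fusion, the sketch below shows one standard form, the naive-Bayes product rule, in which a class prior (for example, an age-based prior) is combined with the posteriors of the selected classifiers. This is an assumed textbook formulation offered for illustration; the exact fusion function used in BDES may differ. Class index 0 denotes benign and 1 malignant.

```python
import numpy as np


def product_rule_fusion(posteriors, prior):
    """Fuse per-classifier posteriors with the naive-Bayes product rule (sketch).

    posteriors: shape [M, C]; row m is P_m(class | x) from selected classifier m.
    prior: shape [C]; e.g. an age-dependent P(class) (hypothetical input here).
    Assumes the classifiers are conditionally independent given the class and
    that each posterior already contains the same prior, so the prior is
    counted once:  P(c | x) is proportional to P(c)^(1 - M) * prod_m P_m(c | x).
    """
    post = np.clip(np.asarray(posteriors, dtype=float), 1e-12, 1.0)
    prior = np.asarray(prior, dtype=float)
    m = post.shape[0]
    log_score = (1 - m) * np.log(prior) + np.log(post).sum(axis=0)
    log_score -= log_score.max()  # avoid underflow before exponentiating
    fused = np.exp(log_score)
    return fused / fused.sum()


# Two classifiers that both lean towards benign under a uniform prior.
print(product_rule_fusion([[0.6, 0.4], [0.6, 0.4]], [0.5, 0.5]))  # approx. [0.692, 0.308]
```

Working in log space keeps the product numerically stable when many classifiers are fused; with a single classifier the rule simply returns that classifier's posterior.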

4.3 Comparison of Calculation Time

In this study, the BDES method incorporates KNND-CS and simulated annealing, which increases its computational complexity. To evaluate the real-time performance of the proposed approach, this section compares the training time, the total inference time on the 133 test samples, and the average inference time per sample of BDES and the ensemble learning methods presented in Section 3.4. The results are summarized in Table 6.

Table 6.
Comparison of Computational Time of Different Ensemble Learning Methods (s).

Model	Training time	Total test time	Average inference time
Voting	/	0.16	0.0012
Stacking	0.35	0.02	0.00015
KNORA-ELIMINATE	/	0.34	0.0026
KNORA-UNION	/	0.31	0.0023
Snapshot Ensemble	1092.67	12.74	0.0958
EGNN	34.85	0.14	0.0011
BDES	/	60.12	0.4520

As shown in Table 6, the inference time per sample of BDES is 0.4520 s, which is substantially higher than that of the compared methods, confirming that the additional computational complexity of BDES leads to longer inference times. However, for the application considered in this study, the diagnosis of benign and malignant breast tumors, an inference time of less than 0.5 s per sample is acceptable for clinical diagnosis.
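The per-sample column of Table 6 equals the total test time divided by the 133 test samples (for BDES, 60.12 s / 133 ≈ 0.452 s). A minimal timing harness of this kind is sketched below; `predict_one` is a hypothetical per-sample prediction function, and wall-clock times measured with `time.perf_counter` will vary with hardware.

```python
import time


def time_inference(predict_one, samples):
    """Total and mean per-sample wall-clock time (seconds) of sequential inference."""
    samples = list(samples)
    start = time.perf_counter()
    for x in samples:
        predict_one(x)  # one test sample at a time, as in per-patient diagnosis
    total = time.perf_counter() - start
    return total, total / len(samples)


# Reproducing the averaging used in Table 6 from the reported BDES total:
print(round(60.12 / 133, 4))  # 0.452
```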

It is also important to note that when the BDES method is extended to higher-dimensional feature spaces or incorporates a larger number of base classifiers, the computational time will inevitably increase further. In such cases, it is recommended to perform dimensionality reduction and base classifier selection in advance, in order to avoid unnecessary computational overhead caused by redundant features and classifiers.

4.4 Ablation Experiment

The main components of the BDES method are dynamic ensemble selection, the simulated annealing algorithm, and the Bayesian probability fusion function. To validate the contribution of each component, we conducted ablation experiments on the DOBI and mini-DDSM datasets.

Here, BDES-noDynamic uses a fixed set of classifiers for all test samples instead of constructing a sample-specific classifier pool and ensemble. BDES-noSA replaces simulated annealing with a greedy procedure: at each layer, the classifier with the smallest EML is selected and added to the ensemble if it improves accuracy; otherwise, the process halts. BDES-noBayesian omits the Bayesian probability fusion function and averages the predictions of the selected classifiers instead. Random Priors replaces the age-based prior probability with a random value. To reduce the influence of randomness, each cross-validation experiment is repeated 10 times, and the mean of the repeated results is reported.
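The greedy selection used in BDES-noSA can be sketched as follows. Assuming each classifier's EML does not depend on which classifiers are already selected, stepping through candidates in increasing EML order is equivalent to picking the smallest-EML classifier at each layer. `accuracy_of` is a hypothetical callable standing in for the accuracy estimate that decides whether to continue, and the toy numbers are illustrative only.

```python
def greedy_select(candidates, eml, accuracy_of):
    """Greedy forward selection as described for the BDES-noSA ablation (sketch).

    candidates: ids of the classifiers in the sample-specific pool.
    eml: dict mapping id -> expected misclassification loss (lower is better).
    accuracy_of: hypothetical callable(list of ids) -> accuracy of that ensemble.
    """
    selected, best_acc = [], float("-inf")
    for c in sorted(candidates, key=lambda cid: eml[cid]):  # smallest EML first
        acc = accuracy_of(selected + [c])
        if acc <= best_acc:
            break  # no accuracy gain: halt instead of exploring further
        selected.append(c)
        best_acc = acc
    return selected


eml = {"a": 0.10, "b": 0.20, "c": 0.30}
acc = {("a",): 0.70, ("a", "b"): 0.75, ("a", "b", "c"): 0.74}
print(greedy_select(list(eml), eml, lambda ids: acc[tuple(ids)]))  # ['a', 'b']
```

Because the loop stops at the first non-improving candidate, it can halt at a locally optimal ensemble, which is the behavior the simulated annealing search in the full BDES method is intended to avoid.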

As shown in Table 7, BDES performs best among the ablation variants on both the DOBI and mini-DDSM datasets. Removing dynamic selection (BDES-noDynamic) degrades performance on both datasets, most noticeably on DOBI. This suggests that dynamic classifier selection provides a stable foundation for the subsequent fusion step and alleviates the model instability caused by severe class imbalance. The accuracy of BDES-noSA is 1 percentage point lower than that of BDES, and its sensitivity and F1 score also decrease, indicating that simulated annealing explores the search space globally to find near-optimal classifier combinations and thereby improves overall performance. On the DOBI dataset, BDES-noBayesian shows a 2-percentage-point drop in accuracy and a 3-percentage-point drop in F1 score, indicating that the Bayesian probability fusion function effectively combines prior information with the outputs of the base classifiers to improve accuracy. Finally, the Random Priors variant performs worse than BDES on both datasets, supporting the use of the age-based prior probability.

Table 7.
Results of Ablation Experiments.

Dataset	Model	Accuracy	Sensitivity	Specificity	PPV	NPV	F1 score
mini-DDSM	BDES	0.76	0.83	0.70	0.73	0.80	0.78
	noDynamic	0.75	0.81	0.69	0.72	0.78	0.76
	noSA	0.75	0.82	0.68	0.72	0.79	0.77
	noBayesian	0.74	0.81	0.67	0.71	0.77	0.76
	Random Priors	0.75	0.83	0.66	0.71	0.79	0.76
DOBI	BDES	0.83	0.78	0.85	0.77	0.86	0.78
	noDynamic	0.80	0.75	0.83	0.73	0.84	0.74
	noSA	0.82	0.76	0.85	0.76	0.85	0.76
	noBayesian	0.81	0.76	0.84	0.75	0.85	0.75
	Random Priors	0.77	0.78	0.76	0.67	0.85	0.72

We further analyze the cross-validation results of the ablation experiments. As shown in Figure 5, BDES achieves an accuracy variance of only 0.42 × 10⁻⁴ across folds, markedly lower than that of the ablated models, indicating the most stable classification performance. This suggests that simulated annealing and Bayesian probability fusion help optimize and adjust the base classifier ensemble, thereby improving classification stability.
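For scale, the reported variance can be converted into a standard deviation (simple arithmetic on the value above, not an additional result):

$$\sigma_{\mathrm{acc}} = \sqrt{0.42 \times 10^{-4}} = \sqrt{4.2 \times 10^{-5}} \approx 6.5 \times 10^{-3}$$

That is, the fold-to-fold accuracy of BDES typically deviates from its mean by about 0.65 percentage points.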

Figure 5.

Variance of the Cross-Validation Accuracy on the DOBI Dataset. The Variance of BDES Without Ablation Is Significantly Smaller Than That of the Ablated Models.

5 Discussion

The accurate diagnosis of breast tumor benignancy and malignancy is critical for early detection and timely treatment. In this study, we aimed to address the challenges posed by limited and imbalanced data in dynamic optical breast imaging (DOBI) datasets, which hinder the development of robust diagnostic models. By designing an ensemble learning method, Bayesian Dynamic Ensemble Selection (BDES), we achieved significant improvements in classification accuracy, robustness, and generalization for early-stage breast tumor diagnosis.

(1) Discussion of BDES Performance

The BDES framework demonstrated high accuracy and sensitivity on both the mini-DDSM and DOBI datasets, highlighting its effectiveness in medical image classification, and five-fold cross-validation further supported its robustness. However, the specificity of BDES was suboptimal on both datasets. This reflects the inherent trade-off between sensitivity and specificity: the Expected Misclassification Loss (EML) function in BDES deliberately imposes a higher penalty on misclassifying malignant samples, reflecting the clinical priority of minimizing false negatives. As a result, sensitivity is prioritized at the cost of a slight reduction in specificity for benign samples, which is consistent with our clinical focus on reducing false negatives.
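The link between a larger penalty on missed malignancies and lower specificity can be made explicit with the standard minimum-expected-cost decision rule. Write $c_{FN}$ and $c_{FP}$ for the costs of a false negative and a false positive (symbols introduced here for illustration; the specific EML weights used in BDES are not restated in this section) and $p = P(\text{malignant} \mid x)$. Predicting "malignant" has a lower expected cost than predicting "benign" when

$$c_{FN}\, p > c_{FP}\,(1 - p) \iff p > \frac{c_{FP}}{c_{FP} + c_{FN}}.$$

With equal costs the threshold is 0.5; with $c_{FN} = 2\,c_{FP}$ it drops to 1/3, so more borderline cases are labeled malignant, raising sensitivity at the expense of specificity.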

Moreover, the performance of BDES, like most ensemble learning methods, is contingent on the quality of the base classifiers. The primary objective of BDES is to harness the complementary strengths of multiple base classifiers to enhance overall classification performance, with the aim of exceeding the performance of the best individual base learner. This strategy, grounded in the principles of ensemble learning, underscores the importance of diversity and accuracy among base classifiers. However, it is important to acknowledge that if the base classifiers are weak or inadequately trained, the ensemble's performance may be suboptimal, as the model would lack sufficient useful information to exploit. This limitation is a general characteristic of ensemble-based approaches and is not unique to BDES. In this study, we assume that the base classifiers are reasonably well-trained and exhibit a degree of diversity, which is a standard assumption in ensemble learning research.

(2) Discussion on Prior Information

In BDES, we use patient age to initialize the prior probability of malignancy because multiple studies have reported differences in breast cancer between younger (< 50 years) and older (≥ 50 years) populations (McGuire et al., 2015; Giaquinto et al., 2024; Kim et al., 2025). In the current study, the prior probability is estimated from the distribution of the training dataset. Although this assumption may not be entirely accurate, it is statistically grounded and thus provides a reasonable basis for the benign and malignant classification of breast tumors. The ablation experiments in this study also support the effectiveness of this approach.

However, this prior probability may not generalize well to populations with different age distributions. Therefore, when applying the BDES method to other domains or datasets, it is necessary to recalculate the prior probability based on the corresponding training data. In future work, we plan to incorporate additional demographic and clinical variables to further enhance the model's performance and adaptability.
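A minimal sketch of estimating such an age-dependent prior from training data, which can simply be rerun on a new dataset, is shown below. The 50-year cutoff follows the studies cited above; the Laplace smoothing term `alpha` and the function name are assumptions of this illustration, and the toy ages and labels are made up.

```python
import numpy as np


def age_group_priors(ages, labels, cutoff=50, alpha=1.0):
    """Estimate P(malignant) separately for patients below and at/above `cutoff` (sketch).

    labels: 1 = malignant, 0 = benign (training data only).
    alpha: Laplace smoothing, an assumption of this sketch, which keeps the
    prior away from exactly 0 or 1 when an age group is small or empty.
    """
    ages = np.asarray(ages)
    labels = np.asarray(labels)
    priors = {}
    for name, mask in ((f"<{cutoff}", ages < cutoff), (f">={cutoff}", ages >= cutoff)):
        n_malignant = labels[mask].sum()
        priors[name] = float((n_malignant + alpha) / (mask.sum() + 2 * alpha))
    return priors


print(age_group_priors([30, 40, 45, 55, 60, 70], [0, 0, 1, 1, 1, 0]))
# {'<50': 0.4, '>=50': 0.6}
```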

(3) Discussion on Computational Complexity

First, while KNND-CS is effective in capturing local patterns in the data, its computational cost grows with both the size of the dataset and the dimensionality of the feature space: a brute-force k-NN query costs O(n⋅d), where n is the number of samples and d is the number of features. The numbers of features and neighbors we currently use are small and impose little computational burden, but higher-dimensional data could still lead to substantial computational challenges. To mitigate this, we plan to apply dimensionality reduction techniques to higher-dimensional features, reducing the feature space while retaining key information and thereby improving the efficiency of k-NN without substantial loss of performance.
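The following sketch shows the kind of preprocessing described above: principal component analysis (PCA) computed with NumPy's SVD, followed by a brute-force k-NN search in the reduced space, where each query costs O(n·d') with d' < d. PCA is only one possible dimensionality reduction technique; the 95% retained-variance threshold, the random stand-in features, and the function names are assumptions for illustration.

```python
import numpy as np


def fit_pca(X, var_keep=0.95):
    """Mean and principal axes retaining at least `var_keep` of the variance."""
    mean = X.mean(axis=0)
    _, s, vt = np.linalg.svd(X - mean, full_matrices=False)
    explained = np.cumsum(s**2) / np.sum(s**2)  # cumulative explained-variance ratio
    n = min(int(np.searchsorted(explained, var_keep)) + 1, len(s))
    return mean, vt[:n]  # rows of vt are the principal axes


def knn_indices(Z_ref, z, k):
    """Brute-force k-NN: one distance per reference sample, O(n * d') work."""
    dist = np.linalg.norm(Z_ref - z, axis=1)
    return np.argsort(dist, kind="stable")[:k]


rng = np.random.default_rng(0)
X_val = rng.normal(size=(200, 64))   # stand-in for validation features
mean, W = fit_pca(X_val)
Z_val = (X_val - mean) @ W.T         # reduced validation features
x_test = rng.normal(size=64)         # stand-in for one test sample
print(W.shape, knn_indices(Z_val, (x_test - mean) @ W.T, k=7))
```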

Second, to reduce the computational burden of simulated annealing, we apply an early-stopping rule: the algorithm terminates once the number of classifiers in the optimal ensemble exceeds one-third of the base classifier pool. This avoids unnecessary iterations, especially when convergence is achieved quickly, improving computational efficiency while preserving the integrity of model selection. To further address the limitations of simulated annealing in BDES, future research will consider distributed computing and adaptive optimization algorithms.
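The early-stopping rule can be illustrated with a generic simulated annealing loop over classifier subsets. In this minimal sketch, the subset move (toggling one classifier), the Metropolis acceptance rule, the geometric cooling schedule, and the hypothetical `loss` callable are assumptions; only the one-third stopping criterion comes from the text.

```python
import math
import random


def sa_select(pool, loss, t0=1.0, cooling=0.95, max_iters=200, seed=0):
    """Simulated-annealing search for a low-loss ensemble drawn from `pool` (sketch).

    loss: hypothetical callable(frozenset of classifier ids) -> float, lower is better.
    Stops early once the best ensemble holds more than one-third of the pool.
    """
    rng = random.Random(seed)
    pool = list(pool)
    current = frozenset([rng.choice(pool)])  # start from one random classifier
    cur_loss = loss(current)
    best, best_loss = current, cur_loss
    t = t0
    for _ in range(max_iters):
        cand = current ^ {rng.choice(pool)}  # toggle one classifier in or out
        if cand:  # never evaluate an empty ensemble
            cand_loss = loss(cand)
            delta = cand_loss - cur_loss
            # Metropolis rule: accept improvements; accept worse moves with prob. exp(-delta / t).
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                current, cur_loss = cand, cand_loss
                if cur_loss < best_loss:
                    best, best_loss = current, cur_loss
                    if len(best) > len(pool) / 3:
                        break  # early-stopping rule described in the text
        t *= cooling  # geometric cooling (an assumption of this sketch)
    return best, best_loss


target = frozenset({0, 1, 2})  # toy objective: distance to a known "ideal" ensemble
best, best_loss = sa_select(range(10), lambda s: len(s ^ target))
print(sorted(best), best_loss)
```

Accepting occasional worse moves while the temperature is high is what lets the search leave a locally optimal subset, in contrast to the greedy procedure of the BDES-noSA ablation.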

In addition, BDES involves several hyperparameters, which are currently set empirically based on the characteristics of the data; this makes tuning somewhat laborious. In future work, we plan to explore automated hyperparameter tuning techniques to simplify this process and determine optimal parameter values more efficiently.

(4) Discussion on Transparency of Decision-Making Process

The key motivation of BDES is to improve diagnostic performance while addressing challenges such as limited data and class imbalance. While ensemble methods like BDES inherently involve multiple models, we have taken specific steps to enhance interpretability and transparency:

Interpretability of Component Methods: Although BDES utilizes several algorithms, each component method—such as KNND-CS, simulated annealing, and Bayesian fusion—has a clear, well-defined role. We have ensured that the individual steps in the process are transparent, and the influence of each classifier selection is explainable. For example, KNND-CS dynamically selects base classifiers based on their relevance to the specific test sample, which can be traced and understood by clinicians.

Classifier Selection Process: The simulated annealing algorithm employed for classifier selection works by evaluating classifier performance in a way that can be monitored and interpreted, ensuring that the ensemble reflects a balance between accuracy and efficiency. Although the final ensemble model may appear complex, clinicians can review the chosen base classifiers and their contributions to the final decision.
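As an illustration of the traceability claimed for KNND-CS, the sketch below builds a sample-specific pool from the k nearest validation samples and returns the neighbor indices and per-classifier local accuracies alongside the selected pool, so each selection can be inspected. The local-accuracy threshold `min_local_acc` and the use of Euclidean distance are assumptions; the actual KNND-CS selection criterion is not restated here and may differ.

```python
import numpy as np


def knnd_cs_trace(X_val, correct_val, x, k=7, min_local_acc=0.6):
    """Sample-specific classifier pool plus a record a reviewer can inspect (sketch).

    correct_val[i, m] is True if base classifier m classifies validation
    sample i correctly. k and min_local_acc are hypothetical settings.
    """
    dist = np.linalg.norm(X_val - x, axis=1)
    neighbors = np.argsort(dist, kind="stable")[:k]  # region of competence
    local_acc = correct_val[neighbors].mean(axis=0)  # per-classifier accuracy there
    pool = np.flatnonzero(local_acc >= min_local_acc)
    return {
        "neighbors": neighbors.tolist(),
        "local_accuracy": local_acc.round(3).tolist(),
        "pool": pool.tolist(),
    }


X_val = np.array([[0.0], [1.0], [2.0], [10.0]])
correct_val = np.array([[1, 0], [1, 1], [0, 0], [0, 1]], dtype=bool)
print(knnd_cs_trace(X_val, correct_val, np.array([0.9]), k=2))
# {'neighbors': [1, 0], 'local_accuracy': [1.0, 0.5], 'pool': [0]}
```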

It should be noted that the goal of BDES is not to replace clinical decision-making but to assist clinicians by providing a more accurate, data-driven diagnostic tool. In future iterations of the model, integrating clinician feedback could make the ensemble's decision process more transparent and help practitioners understand the reasoning behind each prediction.

(5) Limitations and Future Directions

In summary, although the Bayesian dynamic ensemble selection (BDES) method performs well in the diagnosis of benign and malignant breast tumors, there are still some limitations in practical applications. First, the KNND-CS and simulated annealing algorithms in the BDES method significantly increase the computational complexity when facing high-dimensional data sets, which may lead to performance bottlenecks. Second, BDES involves the selection of multiple hyperparameters, and the current empirical settings may affect the model stability and tuning efficiency. Finally, the interpretability of the model still needs to be further improved.

Future research will focus on several key directions. First, we will explore dimensionality reduction techniques and distributed computing to address the challenges posed by high-dimensional and large-scale datasets, with automated hyperparameter optimization as another core focus. Second, new optimization algorithms will be studied to alleviate the computational complexity of simulated annealing. Third, to improve the interpretability and transparency of the model, explainable artificial intelligence (XAI) methods will be incorporated to enhance clinicians' understanding of the model's decision-making process. In addition, we will investigate multimodal data integration, especially the fusion of imaging, clinical, and genetic data, to strengthen diagnostic support and promote the real-time deployment of BDES models on edge computing and mobile platforms.

6 Conclusion

In this study, we proposed the BDES (Bayesian Dynamic Ensemble Selection) method to address the challenges of limited and imbalanced data in the DOBI dataset and to improve breast cancer diagnosis. BDES dynamically selects an optimal combination of classifiers for each test sample using KNND-CS (K-Nearest Neighbor Dynamic Classifier Selection) and simulated annealing, and fuses the selected classifiers' predictions with a Bayesian probability fusion function to obtain accurate diagnoses of breast tumors. Experiments showed that the accuracy and sensitivity of BDES are 76% and 83% on the mini-DDSM dataset and 83% and 78% on the DOBI dataset. These results are clearly better than those of many comparative methods, underscoring the high sensitivity of BDES in identifying malignant tumors. In addition, cross-validation and ablation experiments further validated the generalization and robustness of BDES and the contributions of its components.

This research contributes valuable insights into breast cancer diagnosis. By combining Bayesian probability fusion with an adaptive dynamic classifier selection mechanism, BDES achieves improved robustness and generalizability, while simulated annealing helps the selection escape local optima and thus improves overall performance. These advances provide an accurate and rapid breast cancer diagnosis tool that can enhance early breast cancer detection and improve patient outcomes.

However, the utilization of the simulated annealing method in BDES introduces computational complexity. Future research will focus on developing more efficient optimization algorithms to reduce computational overhead while maintaining or improving the model's performance.

Footnotes

Acknowledgments

This work was supported by the Joint Funds of the Zhejiang Provincial Natural Science Foundation of China under Grant No. LHZY24F05001. We thank the DOBI Medical Technology Co. for access to the data.

ORCID iDs

Xue Li

Yaoyao Li

Authors’ Contributions

Xue Li: Conceptualization, Investigation, Formal analysis, Methodology, Validation, Writing – original draft, Writing – review and editing. Pengyue Liu: Conceptualization, Investigation, Validation, Writing – original draft. Xiguo Yuan: Data curation, Project administration, Supervision, Writing – original draft, Writing – review and editing, Visualization. Ruowen Rong: Data curation, Supervision, Formal analysis, Visualization. Yaoyao Li: Funding acquisition, Validation, Visualization. Rong Luan: Data curation, Validation, Visualization.

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Joint Funds of the Zhejiang Provincial Natural Science Foundation of China under Grant No. LHZY24F05001.

Declaration of Conflicting Interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Data Statement

The data used in this paper cannot be made publicly available because they include sensitive information.

References

Allugunti

V. R.

(2022). Breast cancer detection based on thermographic images using machine learning and deep learning algorithms[J]. International Journal of Engineering in Computer Science, 4(1), 49–56. https://doi.org/10.33545/26633582.2022.v4.i1a.68

Baccouche

Garcia-Zapirain

Elmaghraby

A. S.

(2022). An integrated framework for breast mass classification and diagnosis using stacked ensemble of residual neural networks[J]. Scientific Reports, 12(1), 12259. https://doi.org/10.1038/s41598-022-15632-6

Balasubramanian

J. B.

Boes

R. D.

Gopalakrishnan

(2020). A novel approach to modeling multifactorial diseases using ensemble Bayesian rule classifiers[J]. Journal of Biomedical Informatics, 107(1), 103455. https://doi.org/10.1016/j.jbi.2020.103455

Bashir

Qamar

Khan

F. H.

(2015). Heterogeneous classifiers fusion for dynamic breast cancer diagnosis using weighted vote based ensemble[J]. Quality & Quantity, 49(5), 2061–2076. https://doi.org/10.1007/s11135-014-0090-z

Biau

Scornet

(2016). A random forest guided tour[J]. Test, 25(2), 197–227. https://doi.org/10.1007/s11749-016-0481-7

Boateng

E. Y.

Abaye

D. A.

(2019). A review of the logistic regression model with emphasis on medical research[J]. Journal of Data Analysis and Information Processing, 7(04), 190. https://doi.org/10.4236/jdaip.2019.74012

Chen

Shachter

R. D.

Kurian

A. W.

, et al. (2017). Dynamic strategy for personalized medicine: An application to metastatic breast cancer[J]. Journal of Biomedical Informatics, 68(1), 50–57. https://doi.org/10.1016/j.jbi.2017.02.012

Chhikara

B. S.

Parang

(2023). Global cancer statistics 2022: The trends projection analysis[J]. Chemical Biology Letters, 10(1), 451–451. https://pubs.thesciencein.org/journal/index.php/cbl/article/view/451

Fournier

L. S.

, et al. (2008). Dynamic optical breast imaging: A novel technique to detect and characterize tumor vessels[J]. European Journal of Radiology, 69(1), 43–49. https://doi.org/10.1016/j.ejrad.2008.07.038

10.

Giaquinto

A. N.

Sung

Newman

L. A.

, et al. (2024). Breast cancer statistics 2024[J]. CA: a Cancer Journal for Clinicians, 74(6), 477–495. https://doi.org/10.3322/caac.21863

11.

Guo

Qin

, et al. (2018). Ultrasound imaging technologies for breast cancer detection and management: A review[J]. Ultrasound in Medicine & Biology, 44(1), 37–70. https://doi.org/10.1016/j.ultrasmedbio.2017.09.012

12.

Huang

Pleiss

, et al. (2017). Snapshot ensembles: Train 1, get m for free[J]. arXiv preprint arXiv:1704.00109.

13.

Huang

Cai

Pacheco

P. P.

, et al. (2018). Applications of support vector machine (SVM) learning in cancer genomics[J]. Cancer Genomics & Proteomics, 15(1), 41–51. https://doi.org/10.21873/cgp.20063

14.

Jasti

V. D. P.

Zamani

A. S.

Arumugam

, et al. (2022). Computational technique based on machine learning and image processing for medical image analysis of breast cancer diagnosis[J]. Security and Communication Networks, 2022(1), 1918379. https://doi.org/10.1155/2022/1918379

15.

Kim

Harper

McCormack

, et al. (2025). Global patterns and trends in breast cancer incidence and mortality across 185 countries[J]. Nature Medicine, 31(1), 1–9. https://doi.org/10.1038/s41591-025-03502-3

16.

A. H. R.

Sabourin

Britto

A. S.,

Jr (2008). From dynamic classifier selection to dynamic ensemble selection[J]. Pattern Recognition, 41(5), 1718–1731. https://doi.org/10.1016/j.patcog.2007.10.015

17.

Lekamlage

C. D.

Afzal

Westerberg

, et al. (2020). Mini-DDSM: Mammography-based automatic age estimation[C]. 2020 3rd International Conference on Digital Medicine and Image Processing. 1–6.

18.

Liu

Yang

, et al. (2021). A survey of convolutional neural networks: Analysis, applications, and prospects[J]. IEEE transactions on Neural Networks and Learning Systems, 33(12), 6999–7019. https://doi.org/10.1109/TNNLS.2021.3084827

19.

Liu

M. Z.

Swintelski

Sun

, et al. (2022). Weakly supervised deep learning approach to breast MRI assessment[J]. Academic Radiology, 29(Supplement 1), S166–S172. https://doi.org/10.1016/j.acra.2021.03.032

20.

Mascarenhas

Agarwal

(2021). A comparison between VGG16, VGG19 and ResNet50 architecture frameworks for Image Classification[C]. 2021 International conference on disruptive technologies for multi-disciplinary research and applications (CENTCON). IEEE, 1: 96–99.

21.

McGuire

Brown

J. A. L.

Malone

, et al. (2015). Effects of age on the detection and management of breast cancer[J]. Cancers, 7(2), 908–929. https://doi.org/10.3390/cancers7020815

22.

Michell

M. J.

Iqbal

Wasan

R. K.

, et al. (2012). A comparison of the accuracy of film-screen mammography, full-field digital mammography, and digital breast tomosynthesis[J]. Clinical Radiology, 67(10), 976–981. https://doi.org/10.1016/j.crad.2012.03.009

23.

Mojrian

Pinter

Joloudari

J. H.

, et al. (2020). Hybrid machine learning model of extreme learning machine radial basis function for breast cancer detection and diagnosis; a multilayer fuzzy expert system[C]. 2020 RIVF International Conference on Computing and Communication Technologies (RIVF). IEEE, 1–7.

24.

Popescu

M. C.

Balas

V. E.

Perescu-Popescu

, et al. (2009). Multilayer perceptron and neural networks[J]. WSEAS Transactions on Circuits and Systems, 8(7), 579–588.

25.

Rathinam

Sasireka

Valarmathi

(2024). An adaptive fuzzy C-means segmentation and deep learning model for efficient mammogram classification using VGG-net[J]. Biomedical Signal Processing and Control, 88(Part B), 105617. https://doi.org/10.1016/j.bspc.2023.105617

26.

Sigrist

R. M. S.

Liau

El Kaffas

, et al. (2017). Ultrasound elastography: Review of techniques and clinical applications[J]. Theranostics, 7(5), 1303. https://doi.org/10.7150/thno.18650

27.

Singh

Van de Ven

Eising

, et al. (2023). Connecting the Dots: Graph Neural Network Powered Ensemble and Classification of Medical Images[C]. 2023 31st Irish Conference on Artificial Intelligence and Cognitive Science (AICS). IEEE, 1–8.

28. Tan (2019). EfficientNet: Rethinking model scaling for convolutional neural networks[C]. International Conference on Machine Learning. PMLR, 6105–6114.

29. Terreno, Castelli, D. D., Viale, et al. (2010). Challenges for molecular magnetic resonance imaging[J]. Chemical Reviews, 110(5), 3019–3042. https://doi.org/10.1021/cr100025t

30. Tharwat, Gaber, Ibrahim, et al. (2017). Linear discriminant analysis: A detailed tutorial[J]. AI Communications, 30(2), 169–190. https://doi.org/10.3233/AIC-170729

31. Van Houdt, Mosquera, Nápoles (2020). A review on the long short-term memory model[J]. Artificial Intelligence Review, 53(8), 5929–5955. https://doi.org/10.1007/s10462-020-09838-1

32. Vaquero, J. J., Kinahan (2015). Positron emission tomography: Current challenges and opportunities for technological advances in clinical and preclinical imaging systems[J]. Annual Review of Biomedical Engineering, 17(1), 385–414. https://doi.org/10.1146/annurev-bioeng-071114-040723

33. Vergara, J. R., Estévez, P. A. (2014). A review of feature selection methods based on mutual information[J]. Neural Computing and Applications, 24(1), 175–186. https://doi.org/10.1007/s00521-013-1368-0

34. Wang, Zheng, Yoon, S. W., et al. (2018). A support vector machine-based ensemble algorithm for breast cancer diagnosis[J]. European Journal of Operational Research, 267(2), 687–699. https://doi.org/10.1016/j.ejor.2017.12.001

35. Xiong, Berkovsky, Romano, et al. (2021). Prediction of anxiety disorders using a feature ensemble based Bayesian neural network[J]. Journal of Biomedical Informatics, 123(1), 103921. https://doi.org/10.1016/j.jbi.2021.103921

36. Zhang, Sun, Han, T. X., et al. (2017). Residual networks of residual networks: Multilevel residual networks[J]. IEEE Transactions on Circuits and Systems for Video Technology, 28(6), 1303–1314. https://doi.org/10.1109/TCSVT.2017.2654543

37. Zheng, Zhou, et al. (2023). Application of transfer learning and ensemble learning in image-level classification for breast histopathology[J]. Intelligent Medicine, 3(2), 115–128. https://doi.org/10.1016/j.imed.2022.05.004


Table 1. Data Splitting of mini-DDSM and DOBI Datasets for Base Classifiers.

Dataset      Train: Malignant   Train: Benign   Test: Malignant   Test: Benign
mini-DDSM    1901               1879            815               804
DOBI         848                1413            254               412
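Table 1 also makes the class imbalance of the DOBI data concrete. The short Python sketch below, offered only as an illustrative check, recomputes each split's size, its malignant fraction, and the share of samples held out for testing from the counts in the table. The `TABLE_1` dictionary and the `split_summary` helper are hypothetical names introduced here; they are not part of the BDES code.

```python
# Counts copied from Table 1 as (malignant, benign) pairs per split.
TABLE_1 = {
    "mini-DDSM": {"train": (1901, 1879), "test": (815, 804)},
    "DOBI": {"train": (848, 1413), "test": (254, 412)},
}


def split_summary(counts):
    """Return size and malignant fraction per split, plus the test share.

    Hypothetical helper for illustration only.
    """
    summary = {}
    for split, (malignant, benign) in counts.items():
        total = malignant + benign
        summary[split] = {"total": total, "malignant_frac": malignant / total}
    # Fraction of all samples (train + test) that were held out for testing.
    all_samples = sum(s["total"] for s in summary.values())
    summary["test_share"] = summary["test"]["total"] / all_samples
    return summary


if __name__ == "__main__":
    for name, counts in TABLE_1.items():
        s = split_summary(counts)
        print(
            f"{name}: train={s['train']['total']} "
            f"(malignant {s['train']['malignant_frac']:.1%}), "
            f"test={s['test']['total']} "
            f"(malignant {s['test']['malignant_frac']:.1%}), "
            f"test share {s['test_share']:.1%}"
        )
```

With these counts, both mini-DDSM splits are close to 50% malignant, whereas DOBI is about 37.5% malignant in training and 38.1% in testing (roughly 1.7 benign samples per malignant one in the training split). About 30.0% of the mini-DDSM samples and 22.8% of the DOBI samples fall in the test split.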