Improved prediction of childhood anemia using hybrid ensemble learning and dual-level explainability

Abstract

Purpose

Millions of children under five, particularly in low- and middle-income countries, suffer from preventable anemia, making early detection critical for improving public health outcomes. This study proposes an interpretable machine learning framework for early prediction of childhood anemia using structured healthcare data.

Methods

The proposed approach integrates TabNet, XGBoost, and Multi-Layer Perceptron (MLP) within a stacked ensemble architecture, with logistic regression as the meta-learner. Hyperparameters are optimized using GridSearchCV, which is compared with RandomizedSearchCV, Optuna, Genetic Algorithms (GA), Particle Swarm Optimization (PSO), and Ant Colony Optimization (ACO). SHAP provides global feature importance, and LIME explains individual predictions. The system was trained and tested on the Tanzania Demographic and Health Survey (DHS) dataset.

Results

Experimental results on the Tanzania Demographic and Health Survey (TDHS 2022) dataset demonstrate that the proposed ensemble significantly outperforms individual models, achieving an accuracy of 98.5% and high precision, recall, and F1-scores across both classes. Among optimization strategies, GridSearchCV and Optuna provide the most consistent and optimal performance. SHAP-based global analysis identifies key predictors such as child age, wealth index, and breastfeeding status, while LIME offers instance-level explanations, enhancing model transparency and clinical interpretability. Comparative evaluation with baseline models and ablation analysis confirms the effectiveness of the stacked architecture and meta-learning strategy. External validation using NFHS (India) data shows close agreement (±1–2%) with observed anemia prevalence trends, indicating cross-regional generalizability. The study also discusses ethical considerations, fairness implications, and deployment strategies for integration into healthcare systems.

Conclusions

The proposed framework is accurate, transparent, and adaptable. It supports early anemia detection and can assist clinical decision-making, especially in under-resourced regions, making it a valuable tool for global public health applications.

Keywords

child anemia prediction; metaheuristic algorithm; machine learning; ensemble models; deep learning; human-centred artificial intelligence; explainable AI (XAI); low-resource healthcare; SHAP and LIME

1. Introduction

Improving human health goes hand in hand with raising life expectancy and promoting economic growth. The under-five mortality rate is widely used as an indicator of a nation's overall public health status.¹ Reducing child mortality and improving newborn health are the main priorities for the health sector in emerging states.² Although global child mortality has decreased, it remains a serious public health problem: in 2020 there were 38 deaths worldwide for every 1000 live births, and five million children died before age five in 2021. Many children still die from preventable diseases due to lack of access to medical care, vaccinations, nutritious food, clean water, and sanitation. Reducing under-five mortality is a high priority in Tanzania.^3–7 The Tanzania Health Sector Strategic Plan aims to reduce mortality to below 12 deaths per 1000 live births by 2030. From 2015/2016 to 2022, mortality declined from 112 to 43 deaths per 1000 live births. Despite this progress, disparities persist across geographical zones in Tanzania.⁸

Anemia, especially iron-deficiency anemia, is a major cause of child mortality and is particularly common among children under five. It impairs immune function and cognitive development, increasing vulnerability to infections.⁹ Diagnosis often comes too late, especially in resource-limited settings, so early identification of anemia using data-driven approaches is essential. Risk factors include household wealth status, mothers’ education, antenatal visits, place of delivery, breastfeeding duration, place of residence, child size at birth, number of children, and maternal age. Studies also link anemia with water source, maternal BMI, household size, and child sex.^7,10

Artificial Intelligence (AI) refers to computational systems capable of performing tasks that typically require human intelligence, including learning from data, pattern recognition, decision-making, and prediction. In healthcare, AI leverages machine learning (ML) and deep learning (DL) techniques to analyze complex medical datasets, enabling early disease detection, risk prediction, and personalized treatment planning. ML methods have been widely used to predict anemia and mortality in children, and they can capture complex interactions that traditional statistics may miss.¹¹ Several ML models have been applied to classify health data and predict mortality and anemia.¹² However, many existing studies use static models that fail to capture the nonlinear relationships between features. This paper introduces a new ML system to predict childhood anemia in Tanzania.^13,14

Zewdie & Adjiwanou (2024)¹⁵ used multilevel logistic regression in South Africa, identifying sex, birth order, employment status, and education as key predictors. Tagoe et al. (2020)¹⁶ applied LASSO regression to Sierra Leone DHS data, highlighting predictors such as pregnancy termination history, household size, contraceptive use, and water source. Baraki et al. (2020)¹⁷ and Workie & Azene (2021)¹⁸ explored mortality factors in Ethiopia using logistic and Bayesian models, finding regional and birth-related variables significant. Saroj et al. (2023)⁷ evaluated multiple ML models and found neural networks most effective for under-five mortality prediction. Iqbal et al. (2023)⁸ used decision trees, random forest (RF), and Naïve Bayes in Pakistan, with RF achieving 93.8% accuracy. Mishra et al. (2024)⁶ applied ML in India and identified decision trees as most effective, with 96.35% accuracy. Bitew et al. (2020)⁹ used RF, logistic regression (LR), and k-nearest neighbors (KNN) in Ethiopia, with RF achieving 67.2% accuracy. Pandey et al. (2025)¹⁹ found logistic regression best for child mortality prediction in Uttar Pradesh using NFHS data. Yimer et al. (2025)²⁰ compared six ML models, with random forest achieving 81.16% accuracy in predicting anemia in Ethiopian children. Rahman et al. (2024)²¹ applied ensemble models to a local dataset, with logistic regression performing best at 95% accuracy, though without stacking or explainability. Rivera et al.²² built a decision tree model with an AUC-ROC of 93% on Peruvian data, again without interpretability or ensemble integration. Beyond these studies, AI has been widely applied across healthcare, significantly improving diagnostic accuracy and clinical efficiency. In disease prediction, explainable machine learning models such as ExtraTreeClassifier and XGBoost have been used successfully for early detection of chronic conditions such as type 2 diabetes, with high accuracy and interpretability.
In reproductive health, AI-driven predictive models have been employed to analyze and diagnose complex conditions such as polycystic ovary syndrome (PCOS), supporting clinical decision-making through systematic data analysis. Additionally, AI techniques play a critical role in medical imaging, where advanced neural network models, including self-organizing maps (SOM), are used to enhance cancer detection and diagnosis in MRI images, even in the presence of noise. These applications highlight the growing importance of integrating explainable AI techniques into healthcare systems to ensure transparency, reliability, and clinical trust.

Despite promising results, these models often lack ensemble integration, hyperparameter tuning, and explainability. Motivated by these advances, the present study extends explainable AI to childhood anemia prediction by combining hybrid ensemble learning with dual-level interpretability, addressing both predictive performance and clinical transparency. The proposed hybrid ensemble incorporates TabNet, XGBoost, and MLP, tuned with GridSearchCV and explained with SHAP and LIME. Its novel contributions relative to existing approaches are summarized in Table 1.

Table 1.

Novel contributions of the proposed ensemble framework.

Aspect	Existing approaches	Proposed work (Novelty)
Model Architecture	Single models or basic ensembles (e.g., RF, DT, LR)	Hybrid ensemble of TabNet, XGBoost, and MLP with stacked meta-learner
Meta Learning	Majority voting or basic averaging	Learned stacking using Logistic Regression on second-order predictions
Hyperparameter Tuning	Fixed or manual configurations	Automated tuning using GridSearchCV for both XGBoost and MLP
Feature Interpretability	Largely ignored or limited	SHAP for global and LIME for local interpretability, validating predictions at dataset and instance level
Explainability Integration	Absent or post hoc	Embedded SHAP & LIME into the prediction pipeline
Feature Fusion Strategy	Raw model outputs or majority class labels	Second-order feature matrix created from probabilistic outputs $[p_{1}, p_{2}, p_{3}]$
Model Flexibility	Manually fixed layers and shallow configuration	MLP + XGBoost structures dynamically tuned via search space exploration
Evaluation and Reporting	Basic metrics (accuracy, precision)	Detailed metrics + best hyperparameters summary, SHAP & LIME visualizations, confusion matrices
Clinical Utility	Limited due to black-box nature	Interpretability-first design suited for healthcare & public health deployment

2. Proposed child anemia prediction system

The proposed system applies machine learning algorithms to tabular healthcare data to detect anemia early in children. It combines TabNet, XGBoost, and MLP in a stacked ensemble architecture that captures linear and nonlinear dependencies while handling feature sparsity, which improves generalizability. The XGBoost and MLP base models are tuned independently using exhaustive grid search (GridSearchCV). A meta-learning layer stacks the base models' probabilistic outputs into a second-order feature matrix, which is the input to a logistic regression meta-classifier. SHAP and LIME provide global and local interpretability, respectively. The framework of the proposed system is illustrated in Figure 1.

Figure 1.

The framework of proposed child anemia prediction system.

2.1. Child anemia dataset description

The dataset was obtained from the TDHS 2022 (TZBR82SV file, https://www.dhsprogram.com/data/dataset_admin/login_main.cfm) and covers women aged 15–49 and children aged 6–59 months. It includes 132,127 instances and 46 attributes: 10 numerical (e.g., mother’s age, child’s age, BMI) and 36 categorical (e.g., region, education, wealth index). After cleaning, 10,783 instances remain. The target variable, anemia_label, is binary (1: anemic, 0: not anemic). Data cleaning ensured clinical and demographic relevance while balancing interpretability and generalizability. Table 2 lists some representative variables.

Table 2.

The representation of some variables of child anemia dataset.

Variable code	Description
V001	Cluster number
V106	Mother’s education level
V190	Wealth index
V012	Respondent’s age
B19	Age of child (in months)
V024	Region
V445	Body Mass Index (BMI)
V137	Number of household members
V113	Source of drinking water
V116	Type of toilet facility
V025	Type of place of residence (urban/rural)
anemia_label	Anemia status of the child (target variable: 0/1)

2.2. Data pre-processing

Data preparation is essential for effective model learning.²³ Categorical variables are encoded using LabelEncoder, while continuous variables are normalized using the Z-score given in Equation 1.

Z = \frac{X - μ}{σ}

(1)

where $X$ is the feature value, $μ$ is the feature mean, and $σ$ is the feature standard deviation. The same harmonized, pre-processed dataset is supplied to TabNet, XGBoost, and MLP, which avoids preprocessing bias and enables a fair model comparison. To address class imbalance, stratified sampling was used during the train–test split to preserve the original class distribution. In addition, class weights were incorporated into XGBoost and MLP training so that misclassified minority-class samples are penalized more heavily. Model optimization targeted the F1-score, which balances precision and recall and is therefore suitable for imbalanced classification tasks.
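The preprocessing steps above can be sketched in plain Python (standard library only). This is an illustrative sketch under stated assumptions, not the authors' code: `label_encode` mirrors what scikit-learn's `LabelEncoder` does for a single column (codes assigned in sorted category order), `zscore` implements Equation 1 with the population standard deviation, and `stratified_split` and `balanced_class_weights` are hypothetical helpers. The weighting formula n / (n_classes × n_c) is an assumption, because the text does not say how the class weights were set.

```python
import math
import random
from collections import defaultdict


def label_encode(values):
    """Map each distinct category to an integer code in sorted order."""
    classes = sorted(set(values))
    index = {c: i for i, c in enumerate(classes)}
    return [index[v] for v in values], classes


def zscore(values):
    """Equation 1: Z = (X - mu) / sigma, with the population standard deviation."""
    mu = sum(values) / len(values)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in values) / len(values))
    return [(v - mu) / sigma for v in values]


def stratified_split(labels, test_frac=0.2, seed=42):
    """Return (train_idx, test_idx) that keep each class's share of the data."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train, test = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        n_test = round(len(idx) * test_frac)
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)


def balanced_class_weights(labels):
    """Assumed weighting n / (n_classes * n_c): rarer classes get larger weights."""
    counts = defaultdict(int)
    for y in labels:
        counts[y] += 1
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}


codes, classes = label_encode(["urban", "rural", "rural", "urban"])
print(codes, classes)  # [1, 0, 0, 1] ['rural', 'urban']
```

With scikit-learn, the same steps would typically use `StandardScaler` together with `train_test_split(..., stratify=y)`.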

2.3. Proposed child anemia ensemble classifier

The classifier is a hybrid pipeline of TabNet, XGBoost, and MLP. Each model is trained independently and outputs a class probability (Equations 2–4):

f_{\text{TabNet}}(X) = p_{1}

(2)

f_{\text{XGB}}(X) = p_{2}

(3)

f_{\text{MLP}}(X) = p_{3}

(4)

These predictions are stacked column-wise into a second-order feature matrix (Equation 5):

Z = \begin{bmatrix} p_{1}^{(1)} & p_{2}^{(1)} & p_{3}^{(1)} \\ p_{1}^{(2)} & p_{2}^{(2)} & p_{3}^{(2)} \\ \vdots & \vdots & \vdots \\ p_{1}^{(n)} & p_{2}^{(n)} & p_{3}^{(n)} \end{bmatrix} \in \mathbb{R}^{n \times 3}

(5)

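Each row of $Z$ is then passed to the logistic regression meta-learner (Equations 6–8):

\hat{p} = σ(w^{T} z + b)

(6)

where $z \in \mathbb{R}^{3}$ is the row of $Z$ for one child and

σ(t) = \frac{1}{1 + e^{-t}}

(7)

w \in \mathbb{R}^{3}, \quad b \in \mathbb{R}

(8)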

This formalizes the stacking architecture mathematically. The pipeline increases robustness through model redundancy and diversity, and the stacked probabilities give the meta-learner a compact, informative representation that sharpens the decision boundary. Because the base models are independent, they can be trained and run in parallel, which speeds up both training and inference. The resulting ensemble is designed to provide clinical adaptability and interpretability while maintaining generalization for early anemia identification in children.

2.3.1. Integrated basic classifiers

This section describes the three base classifiers of the proposed ensemble. Each receives the same pre-processed data (normalized continuous features and label-encoded categorical variables) but learns its own representation and decision boundary, producing diverse probabilistic predictions. The base model outputs are combined into a second-order feature matrix that serves as input to the meta-classifier for anemia prediction. Generating multiple independent hypotheses in parallel improves generalization and robustness and reduces the risk of overfitting to the biases of any single model.

TabNet – Sparse-attentive deep learning

TabNet captures complex hierarchical feature interactions through sparse attention.²⁴ The attention mechanism is learned end to end in a differentiable manner, so feature selection is data-driven and performed step by step. Let $x \in \mathbb{R}^{d}$ be the input feature vector, $M^{(i)} \in \mathbb{R}^{d}$ the feature mask at decision step $i$, and $h^{(i)}$ the hidden representation. The attention mask and the masked input are given by Equations 9–10: at each decision step, the model focuses on a subset of features through differentiable sparsemax masking. Feature selection and transformation are performed through Equation 11, and the output of TabNet is the class probability in Equation 12.

M^{(i)} = \mathrm{sparsemax}(W_{a} \cdot h^{(i-1)})

(9)

x^{(i)} = M^{(i)} \odot x

(10)

h^{(i)} = \mathrm{ReLU}(W_{h} \cdot (M^{(i)} \odot x))

(11)

p_{1} = f_{\text{TabNet}}(x) \in [0, 1]

(12)

At each decision step, sparsemax assigns exactly zero weight to many features, so only a sparse subset of features contributes to that step. TabNet also provides built-in feature importance scores, which enhance its interpretability.
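Equations 9–10 can be made concrete with the following standard-library sketch. It implements sparsemax, the Euclidean projection onto the probability simplex, and applies the resulting mask to an input vector. The sketch is a simplification for illustration: the original TabNet attentive transformer also includes a prior-scale term and batch normalization, which are omitted here, and `W_a`, `h_prev`, and `x` are toy values.

```python
def sparsemax(z):
    """Project z onto the probability simplex; small entries become exactly 0."""
    zs = sorted(z, reverse=True)
    cumsum, k_z, cum_at_k = 0.0, 0, 0.0
    for k, zk in enumerate(zs, start=1):
        cumsum += zk
        if 1 + k * zk > cumsum:  # zk still belongs to the support
            k_z, cum_at_k = k, cumsum
    tau = (cum_at_k - 1) / k_z   # threshold subtracted from every entry
    return [max(v - tau, 0.0) for v in z]


def mask_step(W_a, h_prev, x):
    """Equations 9-10: M = sparsemax(W_a . h_prev), then x_masked = M (elementwise) x."""
    logits = [sum(w * h for w, h in zip(row, h_prev)) for row in W_a]
    mask = sparsemax(logits)
    return mask, [m * xi for m, xi in zip(mask, x)]


W_a = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]   # toy 3-feature x 2-unit projection
mask, x_masked = mask_step(W_a, [1.0, 0.5], [2.0, 4.0, 6.0])
print(mask, x_masked)  # [0.75, 0.25, 0.0] [1.5, 1.0, 0.0]
```

The third feature receives a mask value of exactly zero, which is the sparsity property described above. A softmax would have assigned it a small but nonzero weight.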

XGBoost – Regularized gradient boosted trees

XGBoost builds an ensemble of regression trees sequentially, with each new tree fitted to improve on the current ensemble.²⁵ It handles missing data, noise, and sparse patterns well. The prediction is updated additively at each boosting round (Equation 13):

\hat{y}^{(t)} = \hat{y}^{(t-1)} + f_{t}(x), \quad f_{t} \in \mathcal{F}

(13)

where $\hat{y}^{(t)}$ is the prediction at round $t$ and $f_{t}$ is the new regression tree from the function space $\mathcal{F}$. XGBoost minimizes the objective in Equation 14, whose regularization term is given in Equation 15. The explicit regularization mitigates overfitting and controls model complexity, yielding more stable generalization even on noisy or sparse healthcare data.

L^{(t)} = \sum_{i=1}^{n} l(y_{i}, \hat{y}_{i}^{(t-1)} + f_{t}(x_{i})) + Ω(f_{t})

(14)

Ω(f_{t}) = γT + \frac{1}{2} λ \lVert w \rVert^{2}

(15)

where $l$ is the training loss, $T$ is the number of leaves, $w$ is the vector of leaf weights, $γ$ penalizes each additional leaf, and $λ$ controls the L2 shrinkage of the leaf weights. The output of XGBoost is calculated using Equation 16. XGBoost handles non-linear interactions effectively by learning threshold-based decision boundaries.

p_{2} = f_{\text{XGB}}(x) \in [0, 1]

(16)
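The sketch below evaluates the round-t objective of Equations 14–15 for a single candidate tree. It assumes the binary logistic loss on the margin (logit) scale, so that applying the sigmoid to the final margin gives the probability in Equation 16. It also includes the closed-form leaf weight w* = −G/(H + λ) from XGBoost's second-order approximation, where G and H are the sums of the first and second loss derivatives in a leaf; this shows how λ shrinks the leaf weights. The data and tree structure are toy values, and the code is not the library's implementation.

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def logloss(y, margin):
    """Binary logistic loss l(y, y_hat) evaluated on the raw margin (logit)."""
    p = sigmoid(margin)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def round_objective(y, prev_margin, leaf_of, leaf_w, gamma, lam):
    """Equation 14 plus Equation 15 for one candidate tree f_t.

    leaf_of[i] is the leaf reached by sample i, so f_t(x_i) = leaf_w[leaf_of[i]]
    and the updated margin follows Equation 13."""
    loss = sum(logloss(yi, m + leaf_w[j]) for yi, m, j in zip(y, prev_margin, leaf_of))
    omega = gamma * len(leaf_w) + 0.5 * lam * sum(w * w for w in leaf_w)
    return loss + omega


def optimal_leaf_weights(y, prev_margin, leaf_of, n_leaves, lam):
    """Second-order optimum w_j* = -G_j / (H_j + lambda) for the logistic loss."""
    G, H = [0.0] * n_leaves, [0.0] * n_leaves
    for yi, m, j in zip(y, prev_margin, leaf_of):
        p = sigmoid(m)
        G[j] += p - yi          # first derivative of the log loss w.r.t. the margin
        H[j] += p * (1 - p)     # second derivative
    return [-g / (h + lam) for g, h in zip(G, H)]


y, prev, leaves = [1, 1, 0], [0.0, 0.0, 0.0], [0, 0, 1]   # toy data: 3 samples, 2 leaves
w_star = optimal_leaf_weights(y, prev, leaves, n_leaves=2, lam=1.0)
obj_zero = round_objective(y, prev, leaves, [0.0, 0.0], gamma=0.1, lam=1.0)
obj_star = round_objective(y, prev, leaves, w_star, gamma=0.1, lam=1.0)
print(w_star, obj_zero, obj_star)   # the optimized leaves give the lower objective
```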

MLP – Dense nonlinear classifier

The Multi-Layer Perceptron (MLP) classifier stacks multiple hidden layers, each followed by a non-linear activation function.²⁶ MLPs are universal function approximators that can model complex, high-dimensional interactions. The MLP is a fully connected feedforward neural network defined by Equation 17:

a^{(l)} = \phi(W^{(l)} a^{(l-1)} + b^{(l)}), \quad l = 1, 2, \dots, L-1

(17)

where $a^{(0)} = x$, $a^{(l)}$ is the activation of layer $l$, $W^{(l)}$ and $b^{(l)}$ are the weights and biases of layer $l$, and $\phi$ is the ReLU activation function. The output layer uses the sigmoid activation function, as given in Equation 18. The layer configuration is tuned independently via GridSearchCV, which searches over hidden layer sizes, activation functions, and learning rates to obtain the nonlinear feature abstraction best suited to anemia detection.

p_{3} = σ(W^{(L)} a^{(L-1)} + b^{(L)})

(18)
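Below is a minimal forward pass for Equations 17–18: ReLU hidden layers followed by a single sigmoid output unit that yields p₃. The weights are hypothetical toy values. In the study, the weights would come from training and the layer sizes from the GridSearchCV search.

```python
import math


def relu(v):
    return [max(0.0, x) for x in v]


def dense(W, a, b):
    """Affine map W a + b, with W given as a list of rows."""
    return [sum(w * x for w, x in zip(row, a)) + bi for row, bi in zip(W, b)]


def mlp_forward(x, hidden_layers, out_w, out_b):
    """Equation 17 for each (W, b) hidden layer, then Equation 18 for p3."""
    a = x                                   # a^(0) = x
    for W, b in hidden_layers:
        a = relu(dense(W, a, b))            # a^(l) = ReLU(W^(l) a^(l-1) + b^(l))
    z = sum(w * ai for w, ai in zip(out_w, a)) + out_b
    return 1.0 / (1.0 + math.exp(-z))       # sigmoid output in [0, 1]


hidden = [([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])]   # one toy 2-unit hidden layer
p3 = mlp_forward([1.0, -2.0], hidden, out_w=[2.0, 5.0], out_b=0.0)
print(round(p3, 4))  # 0.8808
```

The negative second input is clipped to zero by the ReLU, so only the first hidden unit reaches the output, giving σ(2) ≈ 0.8808.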

2.3.2. Combined output representation

The probabilistic outputs $p_{1}, p_{2}, p_{3}$ from TabNet, XGBoost, and MLP respectively are concatenated to form the second-order feature matrix (Equation 19):

Z = [p_{1}, p_{2}, p_{3}]

(19)

This combined representation is then fed into a Logistic Regression Meta-Learner, which learns optimal fusion strategies across base model predictions to maximize anemia classification performance.

2.3.3. Regularization and model optimization strategies

To mitigate overfitting and improve deployment feasibility, dropout (0.3–0.5) and L2 regularization are applied in the MLP; XGBoost uses a depth limit (max_depth ≤ 8) and early stopping; TabNet's sparse attention inherently reduces overfitting; and early stopping based on validation loss is applied during training. Additionally, a lightweight ensemble variant using only XGBoost and MLP was tested, achieving competitive accuracy (≈97%) at a reduced computational cost.

2.3.4. Hyperparameter optimization

To maximize generalization and minimize overfitting, GridSearchCV²⁷ is applied separately to XGBoost and MLP, so that each model receives its own optimal configuration. Cross-validation is used to select the configuration with the highest mean F1 score, a criterion that accounts for the class imbalance in anemia diagnosis. Given the hyperparameter grid $Θ$, the selected configuration $θ^{*}$ is given by Equation 20:

θ^{*} = \arg\max_{θ \in Θ} \frac{1}{k} \sum_{i=1}^{k} F1_{\text{val}}^{(i)}(θ)

(20)

where $k$ is the number of folds and $F1_{\text{val}}^{(i)}(θ)$ is the F1 score on the $i$-th validation fold for configuration $θ$. This exhaustive, model-specific tuning replaces static configurations, so that each tuned base classifier (XGBoost and MLP) is trained at its best cross-validated setting while limiting overfitting. The tuned hyperparameters are listed in Table 3. The optimized models produce the probability predictions used in the meta-learner stacking stage.

Table 3.

The tuned hyperparameters of base classifiers.

Classifiers	Tuned hyperparameter
XGBoost	n_estimators, max_depth, learning_rate
MLP	hidden_layer_sizes, activation, learning_rate_init
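The selection rule in Equation 20 can be illustrated with a small standard-library re-implementation of exhaustive grid search over stratified k folds. The model here is a hypothetical one-feature centroid-threshold classifier with a single `margin` hyperparameter. It was chosen only so that the example is self-contained and is not one of the paper's base models. In scikit-learn, the analogous call is `GridSearchCV(estimator, param_grid, scoring="f1", cv=5)`, which uses stratified folds for classifiers.

```python
from itertools import product


def f1_score(y_true, y_pred):
    """F1 for the positive (anemic) class: 2TP / (2TP + FP + FN)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)


def stratified_kfold(y, k):
    """Deal the indices of each class round-robin into k validation folds."""
    folds = [[] for _ in range(k)]
    for cls in sorted(set(y)):
        members = [i for i, v in enumerate(y) if v == cls]
        for j, i in enumerate(members):
            folds[j % k].append(i)
    return folds


def fit_threshold(xs, ys, margin):
    """Hypothetical toy model: midpoint of the two class means, shifted by `margin`."""
    mu0 = sum(x for x, t in zip(xs, ys) if t == 0) / ys.count(0)
    mu1 = sum(x for x, t in zip(xs, ys) if t == 1) / ys.count(1)
    return (mu0 + mu1) / 2 + margin


def grid_search(x, y, grid, k=5):
    """Equation 20: return the grid point with the highest mean validation F1."""
    folds = stratified_kfold(y, k)
    names = sorted(grid)
    best_theta, best_score = None, float("-inf")
    for values in product(*(grid[n] for n in names)):   # exhaustive Cartesian product
        theta = dict(zip(names, values))
        scores = []
        for val in folds:
            held_out = set(val)
            train = [i for i in range(len(y)) if i not in held_out]
            thr = fit_threshold([x[i] for i in train], [y[i] for i in train], **theta)
            preds = [1 if x[i] >= thr else 0 for i in val]
            scores.append(f1_score([y[i] for i in val], preds))
        mean_f1 = sum(scores) / k
        if mean_f1 > best_score:
            best_theta, best_score = theta, mean_f1
    return best_theta, best_score


x = [0, 1, 2, 3, 4, 10, 11, 12, 13, 14]
y = [0] * 5 + [1] * 5
print(grid_search(x, y, {"margin": [-8.0, 0.0, 8.0]}))  # ({'margin': 0.0}, 1.0)
```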

2.3.5. Stacking via meta-learner

After independent training and hyperparameter optimization, the probabilistic outputs of each base classifier are extracted. The predictions of the base learners (TabNet, XGBoost, MLP) are fused into a second-order feature vector $Z$ (Equation 21):

Z = [p_{1}, p_{2}, p_{3}] \in \mathbb{R}^{3}

(21)

This second-order feature vector $Z$ is used to train a logistic regression meta-model ($f_{\text{meta}}$), which learns to combine the independent hypotheses of the base learners.²⁸ The meta-model prediction is computed using Equation 6, where $σ$ is the sigmoid function. Rather than using fixed votes, this learned combination weights each base model's prediction according to the fitted logistic regression parameters.
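The sketch below fits the meta-learner parameters w ∈ ℝ³ and b ∈ ℝ of Equation 6 by batch gradient descent on the log loss, using only the standard library. It assumes that the rows of Z are base-model probabilities for samples that were not used to fit those base models (for example, out-of-fold predictions). This is a common precaution in stacking, but the text does not specify it. The learning rate, epoch count, and toy data are illustrative. In practice, scikit-learn's `LogisticRegression().fit(Z, y)` would be the usual choice.

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def meta_predict(w, b, z):
    """Equation 6: p_hat = sigmoid(w^T z + b) for one stacked row z = [p1, p2, p3]."""
    return sigmoid(sum(wj * zj for wj, zj in zip(w, z)) + b)


def fit_meta_learner(Z, y, lr=0.5, epochs=2000):
    """Learn w (one weight per base model) and b by batch gradient descent on log loss."""
    n, d = len(Z), len(Z[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for z, target in zip(Z, y):
            err = meta_predict(w, b, z) - target      # d(log loss)/d(logit)
            for j in range(d):
                grad_w[j] += err * z[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Toy stacked probabilities [p1, p2, p3] (TabNet, XGBoost, MLP) and labels.
Z = [[0.6, 0.9, 0.5], [0.5, 0.8, 0.5], [0.4, 0.2, 0.5], [0.5, 0.1, 0.5]]
y = [1, 1, 0, 0]
w, b = fit_meta_learner(Z, y)
print([round(meta_predict(w, b, z), 2) for z in Z])
```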

2.3.6. Output prediction

The final anemia decision is obtained by applying a decision threshold to the meta-probability $\hat{p}$, as shown in Equation 22.

\hat{y} = \begin{cases} 1 & \text{if } \hat{p} \geq 0.5 \\ 0 & \text{otherwise} \end{cases}

(22)

where 1 denotes the anemic class and 0 the non-anemic class. In this way, the model converts predicted meta-probabilities into a binary anemia status that supports clinical evaluation of the child.

2.3.7. Dual explainability framework (Interpretability layer)

The proposed system uses a dual-layer interpretability strategy that combines SHAP and LIME to provide transparency, model accountability, and clinical trustworthiness.^29,30 It incorporates both global and local explanations, as summarized in Table 4. These post-hoc explanation methods make it possible to understand both the model's general operating logic and the specific reasons for individual anemia predictions.

Table 4.

Description of dual explainability framework.

Method	Scope	Objective	Equation
LIME	Local (Individual)	Feature importance for a specific patient	$\xi(x) = \arg\min_{g \in G} L(f, g, π_{x}) + Ω(g)$ (23)
SHAP	Global (Dataset-wide)	Feature importance across all patients	$\phi_{i}(f, x) = \sum_{S \subseteq F \setminus \{i\}} \frac{|S|!\,(|F| - |S| - 1)!}{|F|!} [f(S \cup \{i\}) - f(S)]$ (24)
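The Shapley formula in Table 4 can be illustrated with a sketch that computes exact Shapley values by enumerating all feature subsets. It assumes an interventional value function v(S) in which the features outside S are replaced by a single baseline vector. That choice, the helper names, and the toy models are illustrative assumptions. Exact enumeration costs O(2^|F|) model calls, which is why practical tools such as the `shap` library rely on faster estimators (for example, its tree and kernel explainers).

```python
from itertools import combinations
from math import factorial


def shapley_values(value, n_features):
    """Exact Shapley values by enumerating every subset S of F without feature i."""
    phi = []
    for i in range(n_features):
        others = [j for j in range(n_features) if j != i]
        total = 0.0
        for size in range(len(others) + 1):
            weight = factorial(size) * factorial(n_features - size - 1) / factorial(n_features)
            for S in combinations(others, size):
                s = set(S)
                total += weight * (value(s | {i}) - value(s))   # marginal contribution
        phi.append(total)
    return phi


def make_value_fn(model, x, baseline):
    """Assumed v(S): model output with features in S taken from x, others from baseline."""
    def v(S):
        return model([x[j] if j in S else baseline[j] for j in range(len(x))])
    return v


def linear(z):
    """Toy model: for a linear model, phi_i = w_i * (x_i - baseline_i)."""
    return 2 * z[0] + 3 * z[1] - z[2]


phi = shapley_values(make_value_fn(linear, [1, 1, 1], [0, 0, 0]), 3)
print([round(p, 6) for p in phi])  # [2.0, 3.0, -1.0]
```

The values sum to f(x) − f(baseline) = 4, the efficiency property that allows per-feature SHAP contributions to be read as a decomposition of a single prediction.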

2.3.8. Clinical utilization of explainable outputs

To translate model predictions into actionable insights, the two explanation levels serve different purposes. At the global level, SHAP identifies population-level risk factors (e.g., nutrition and socioeconomic conditions), supporting public health policy planning and targeted interventions. At the local level, LIME provides patient-specific explanations that help clinicians understand why a child is classified as anemic, which allows them to validate and trust the model's predictions. These outputs can be visualized through clinician dashboards in which risk scores are displayed alongside the key contributing factors, alerts highlight high-risk patients, and explanations guide further diagnostic testing or nutritional interventions. This human-centered design ensures that the model augments, rather than replaces, clinical decision-making.

2.4. Child anemia prediction example: End-to-end pipeline

Table 5 demonstrates the full pipeline for one test case of the proposed child anemia prediction framework. After Z-score normalization and categorical encoding, the patient's input vector includes a hemoglobin Z-score of −1.25, a BMI Z-score of −0.75, and several socio-demographic indicators. TabNet, XGBoost, and MLP evaluate this input separately and output probabilities of 0.60, 0.72, and 0.55, respectively. These outputs are stacked as Z = [0.60, 0.72, 0.55] and passed to the logistic regression meta-classifier with weights w = [0.8, 1.0, 0.6] and bias b = −0.9, giving $\hat{p} = σ(0.8 × 0.60 + 1.0 × 0.72 + 0.6 × 0.55 − 0.9) = σ(0.63) ≈ 0.652$. Because this exceeds 0.5, the child is classified as anemic. Interpretability mechanisms are then applied to justify the prediction. SHAP identifies the child's hemoglobin level, BMI, and wealth index as globally significant risk factors, while LIME shows that this specific case is driven mainly by low hemoglobin and lower socioeconomic status, partly offset by the mother's middle-level education. The example shows how the system delivers both a high-confidence prediction and transparent explanations at the global and patient-specific levels.

While resampling techniques such as the Synthetic Minority Over-sampling Technique (SMOTE) are commonly used to address class imbalance, they were not applied in this study to avoid introducing synthetic bias into clinically sensitive data. Instead, model-level strategies such as class weighting and metric optimization were preferred to maintain data authenticity. Future work may explore hybrid approaches that combine resampling with explainability to further enhance minority-class detection.

Table 5.

Example of proposed child anemia prediction system.

Stage	Details
Input Features (Preprocessed)	Hemoglobin Z = −1.25, BMI Z = −0.75, Age = 18, Mother’s Edu = 2, Region = 5
TabNet Output ( $p_{1}$ )	0.60 → Mild anemia risk
XGBoost Output ( $p_{2}$ )	0.72 → Strong anemia risk
MLP Output ( $p_{3}$ )	0.55 → Moderate/uncertain
Stacked Vector ( $Z$ )	[0.60, 0.72, 0.55]
Logistic Regression Weights	w = [0.8, 1.0, 0.6], b = −0.9
Meta Prediction ( $\hat{y}$ )	σ(0.63)=0.652⇒Anemic (Class 1)
SHAP Summary	Most influential features: Hemoglobin (−1.25), BMI (−0.75), Wealth Index
LIME Explanation	Local prediction driven by low Hb, poor SES; slightly mitigated by education
Final Output	Predicted Anemic (Probability = 0.652)
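The meta-prediction in Table 5 can be reproduced directly from the listed values. The short script below applies Equation 6 and then the 0.5 threshold of Equation 22.

```python
import math

z = [0.60, 0.72, 0.55]   # TabNet, XGBoost, MLP probabilities from Table 5
w = [0.8, 1.0, 0.6]      # meta-learner weights from Table 5
b = -0.9                 # meta-learner bias from Table 5

logit = sum(wi * zi for wi, zi in zip(w, z)) + b   # w^T z + b
p_hat = 1.0 / (1.0 + math.exp(-logit))             # sigmoid, Equation 7
y_hat = 1 if p_hat >= 0.5 else 0                   # threshold, Equation 22
print(f"logit={logit:.2f}, p_hat={p_hat:.3f}, class={y_hat}")
# logit=0.63, p_hat=0.652, class=1
```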

3. Experiment and result

The present study uses the Tanzania DHS dataset to establish a region-specific anemia prediction framework. While this provides strong contextual relevance, the findings may be influenced by demographic and socio-economic characteristics unique to the region. This section presents the performance analysis of the child anemia prediction system, structured as follows:

• Model Evaluation

• Result Analysis of Proposed and Baseline ML Models

• Stratified k-Fold Cross-Validation

• Statistical Significance and Confidence Analysis

• Comparative SHAP Analysis of Proposed and Baseline ML Models

• LIME-Based Interpretability of Proposed and Baseline ML Models

• Comparative Analysis of Hyperparameter Optimization Techniques

• Best Parameter Settings Across Search Strategies

• Comparative SHAP Analysis of Hyperparameter Optimization Techniques

• Comparative LIME Analysis of Hyperparameter Optimization Techniques

• Comparative Review of Existing Anemia Prediction Models

• External Validation using NFHS Dataset

• Ethical Considerations and Bias Mitigation

• Practical Deployment and Health System Integration

• Limitations and Future Directions

3.1. Model evaluation

The dataset was divided into training and testing sets using a stratified 80–20 split to preserve the class distribution. The proposed model was trained on the training set and evaluated on the unseen test set, while GridSearchCV used internal cross-validation for hyperparameter tuning. Model performance was assessed using standard evaluation metrics: accuracy, precision, recall, and F1-score.^31,32

As shown in Table 6, the dataset exhibits class imbalance, with more anemic than non-anemic cases. To ensure fair model training and evaluation, stratified sampling was applied to preserve the class distribution across the training and testing sets. The performance measures are described below.

Table 6.

Class distribution in training and testing sets (TDHS dataset).

Dataset split	Non-anemic (Class 0)	Anemic (Class 1)	Total
Training (80%)	2,556	6,069	8,626
Testing (20%)	640	1,518	2,157
Total	3,196	7,587	10,783
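The per-class metrics reported for both classes can be computed from the true and predicted labels as in the following sketch. It mirrors the quantities in scikit-learn's `classification_report` but uses only the standard library. The labels in the usage line are toy values.

```python
def per_class_report(y_true, y_pred, classes=(0, 1)):
    """Accuracy plus per-class (precision, recall, F1), as reported for classes 0 and 1."""
    acc = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    report = {}
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0   # harmonic mean
        report[c] = (prec, rec, f1)
    return acc, report


acc, rep = per_class_report([0, 0, 1, 1, 1], [0, 1, 1, 1, 0])
print(acc, rep)
```

Because each class is treated in turn as the positive class, the anemic-class recall and the non-anemic-class recall can diverge, as they do for XGBoost in Table 7.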

3.2. Result analysis of proposed and baseline ML models

To evaluate the proposed child anemia prediction framework, multiple models were compared using accuracy, precision, recall, and F1-score for both the non-anemic (class 0) and anemic (class 1) classes, as shown in Table 7. The baseline TabNet model achieved 77% accuracy, with a precision of 0.61 (class 0) and 0.84 (class 1); however, it was affected by class imbalance and underperformed without tuning. XGBoost improved accuracy to 93% and had high recall for anemic children (0.98) but lower recall for non-anemic cases (0.83), showing a bias toward the positive class. MLP achieved 95% accuracy with balanced precision (0.91/0.96) and recall (0.90/0.96), capturing non-linear relationships effectively. The stacked ensemble (TabNet, XGBoost, and MLP with logistic regression) reached 97% accuracy and F1-scores of 0.95 (class 0) and 0.98 (class 1). The final hybrid model, enhanced with SHAP, LIME, and GridSearchCV, achieved 98.5% accuracy, a precision of 0.98/0.99, a recall of 0.97/0.99, and F1-scores of 0.98/0.99, combining strong performance with clinical interpretability.

Table 7.

Result analysis of proposed and baseline models.

Model	Accuracy	Precision (0/1)	Recall (0/1)	F1-score (0/1)
TabNet	0.77	0.61/0.84	0.61/0.83	0.61/0.83
XGBoost (Only)	0.93	0.94/0.93	0.83/0.98	0.88/0.95
MLP (Only)	0.95	0.91/0.96	0.90/0.96	0.91/0.96
TabNet + XGBoost + MLP (Stacked)	0.97	0.96/0.97	0.94/0.98	0.95/0.98
Stacked Ensemble + SHAP + LIME + GridSearch	0.985	0.98/0.99	0.97/0.99	0.98/0.99

3.2.1. Stratified k-fold cross-validation

A stratified 5-fold cross-validation was performed to assess model robustness. As Table 8 shows, accuracy (mean ± standard deviation) is consistent across folds, with very low standard deviation. This indicates that the proposed model generalizes well and that its high accuracy (98.5%) is not due to overfitting.

Table 8.

5-Fold cross-validation results.

Parameter	Folds	TabNet	XGBoost	MLP	TabNet + XGBoost + MLP	Stacked ensemble + SHAP + LIME + GridSearch
Accuracy	Fold 1	0.768	0.929	0.949	0.969	0.983
	Fold 2	0.774	0.933	0.952	0.972	0.986
	Fold 3	0.771	0.931	0.951	0.971	0.984
	Fold 4	0.776	0.934	0.953	0.973	0.987
	Fold 5	0.769	0.928	0.948	0.970	0.985
Mean ± Std		0.771 ± 0.003	0.931 ± 0.002	0.951 ± 0.002	0.971 ± 0.0015	0.985 ± 0.0015
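The summary row of Table 8 can be recomputed from the five fold accuracies. The text does not state whether the sample (n − 1) or population (n) standard deviation is reported, so this sketch prints both. Both values are close to the reported ±0.0015.

```python
import statistics

folds = [0.983, 0.986, 0.984, 0.987, 0.985]   # final model, Table 8
mean = statistics.mean(folds)
sd_sample = statistics.stdev(folds)    # denominator n - 1
sd_pop = statistics.pstdev(folds)      # denominator n
print(f"{mean:.3f} ± {sd_sample:.4f} (sample), ± {sd_pop:.4f} (population)")
# 0.985 ± 0.0016 (sample), ± 0.0014 (population)
```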

3.2.2. Ablation study of ensemble components

An ablation study was conducted to evaluate the contribution of each component of the proposed ensemble framework, with models constructed incrementally as in Table 7. The results indicate that MLP and XGBoost provide strong baseline performance, TabNet contributes complementary feature learning, and the stacking meta-learner significantly improves final performance. This confirms that the performance gain arises not from a single model but from the synergy of the ensemble.

3.3. Statistical significance and confidence analysis

To ensure robustness, performance metrics were evaluated using bootstrapping (1000 iterations). The 95% confidence intervals for the proposed model are 0.985 ± 0.004 for accuracy and 0.985 ± 0.003 for F1-score. A Friedman test across the baseline and proposed models gave χ² = 16.31, p = 0.006, indicating a statistically significant improvement of the proposed model over the baseline methods. McNemar’s test was also performed to compare the proposed model with the best-performing baseline (MLP). Based on the contingency table of their prediction disagreements, it gave χ² = 12.47, p < 0.001, confirming that the improvement of the proposed model is statistically significant. Finally, Cochran’s Q test across all models gave Q = 18.62, p = 0.002, confirming significant differences among the models.
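Two of the procedures above can be sketched with the standard library. The first is McNemar's test with continuity correction, χ² = (|b − c| − 1)²/(b + c), where b and c are the two off-diagonal disagreement counts. The second is a percentile bootstrap interval for accuracy. The counts and correctness flags in the usage lines are hypothetical and do not come from the study's contingency table. The p-value uses the identity that the survival function of a chi-square variable with 1 degree of freedom equals erfc(√(χ²/2)).

```python
import math
import random


def mcnemar(b, c):
    """McNemar's chi-square with continuity correction and its 1-df p-value.

    b: samples model A classifies correctly and model B incorrectly; c: the reverse."""
    if b + c == 0:
        raise ValueError("McNemar's test needs at least one disagreement")
    chi2 = (abs(b - c) - 1) ** 2 / (b + c)
    p_value = math.erfc(math.sqrt(chi2 / 2))   # P(chi-square_1 > chi2)
    return chi2, p_value


def bootstrap_accuracy_ci(correct, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for accuracy from per-sample 0/1 correctness flags."""
    rng = random.Random(seed)
    n = len(correct)
    stats = sorted(sum(rng.choices(correct, k=n)) / n for _ in range(n_boot))
    lower = stats[int(n_boot * alpha / 2)]
    upper = stats[int(n_boot * (1 - alpha / 2)) - 1]
    return lower, upper


# Hypothetical inputs, not the study's data:
print(mcnemar(10, 0))                                  # chi2 = 8.1, p about 0.0044
print(bootstrap_accuracy_ci([1] * 985 + [0] * 15))     # interval around 0.985
```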

3.4. Comparative SHAP analysis of proposed and baseline ML models

To ensure clinical relevance and model transparency, SHAP was applied to the final ensemble classifier to assess global feature importance. The SHAP plot (Figure 2) showed that Childageinmonths was the most influential predictor of anemia status. Other key features included V001 (cluster number) and V209_x (breastfeeding status), along with BIDX, V212, V116_x, V113_x, V190_x, and V024_x. Because V001 is a survey cluster identifier, its high importance most likely reflects location-specific conditions rather than a direct risk factor. Overall, SHAP indicated that the model relies on medically relevant features. Figure 3 showed that TabNet prioritized Childageinmonths, followed by V209_x, BIDX, SZONE, V113_y, V116_y, and V190_x, whereas Currentageofchild, V481, and V025_y had lower SHAP magnitudes; beyond child age, TabNet therefore relied mainly on maternal and environmental indicators. Figure 4 showed that MLP likewise emphasized Childageinmonths, along with V209_x, BIDX, and sanitation variables. Figure 5 showed that XGBoost emphasized V001, V190_x, and V212, highlighting its strength in capturing socioeconomic hierarchies. The SHAP output for the unoptimized ensemble (Figure 6) showed a wider spread of feature importance, with Childageinmonths and V001 consistently prominent. By combining the diverse feature preferences of TabNet, MLP, and XGBoost, the ensemble supports both interpretability and diagnostic confidence. Taken together, the SHAP analysis indicated that child age, region, and maternal behavior are central to accurate prediction.
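SHAP attributes each prediction to the input features using Shapley values. The shap library computes or approximates these efficiently (for example, `TreeExplainer` for tree models or `KernelExplainer` for arbitrary models); the brute-force sketch below instead computes them exactly for a small function by enumerating every feature coalition, filling "absent" features from a single baseline row, which simplifies the background-data averaging that shap performs. Global importance is taken as the mean absolute Shapley value per feature, the quantity ranked in shap's bar-style importance plots. The `risk` function and its three features are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x; features outside a coalition take baseline values."""
    n = len(x)

    def value(coalition):
        return f([x[j] if j in coalition else baseline[j] for j in range(n)])

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = set(subset)
                phi[i] += weight * (value(coalition | {i}) - value(coalition))
    return phi


def global_importance(f, X, baseline):
    """Mean absolute Shapley value of each feature over the instances in X."""
    rows = [shapley_values(f, x, baseline) for x in X]
    return [sum(abs(r[j]) for r in rows) / len(rows) for j in range(len(baseline))]


# Hypothetical risk score over (age_in_months, wealth_index, breastfed).
def risk(z):
    return 0.8 - 0.01 * z[0] - 0.05 * z[1] + 0.002 * z[0] * z[2]


children = [[12, 2, 1], [40, 4, 0], [24, 1, 1]]
print(global_importance(risk, children, baseline=[30, 3, 0.5]))
```

Exact enumeration costs 2^n model calls per feature, which is why practical SHAP tooling relies on model-specific algorithms or sampling.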

Figure 2.

SHAP feature importance plot for proposed hybrid ensemble (TabNet + XGBoost + MLP + GridSearch).

Figure 3.

SHAP feature importance plot for TabNet classifier.

Figure 4.

SHAP feature importance plot for MLP classifier.

Figure 5.

SHAP feature importance plot for XGBoost classifier.

Figure 6.

SHAP feature importance for the unoptimized hybrid ensemble (TabNet + XGBoost + MLP).

3.5. LIME-based interpretability of proposed and baseline ML models

To provide local model transparency, LIME was used to explain individual predictions of anemia status. For each instance, LIME fits a simple surrogate model to perturbed samples around that instance, showing how specific features contribute to its classification, which is crucial in healthcare. The optimized ensemble (Figure 7) showed strong alignment between medical reasoning and model logic, with key features such as Childageinmonths, V209_x, and V113_x influencing decisions. TabNet explanations (Figure 8) emphasized Childageinmonths and V113_x, with sparse, focused contributions. MLP (Figure 9) highlighted Childageinmonths and V190_x, although it sometimes overemphasized less informative features. XGBoost (Figure 10) showed local impact from V001, V113_x, birth index, and child age, indicating sensitivity to region and water source. The unoptimized ensemble (Figure 11) still prioritized Childageinmonths and V209_x, though with less stable rankings. Overall, LIME supported the contextual and clinical interpretability of the system, strengthening trust in its use for real-world pediatric screening.
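The LIME procedure can be sketched as follows, assuming continuous features: sample perturbations around the instance, weight each by an exponential kernel on its standardized distance (the kernel form and default width are modelled on the lime package's tabular defaults), and fit a weighted ridge regression whose coefficients serve as the local explanation. The lime package's `LimeTabularExplainer` differs in detail (by default it discretizes continuous features and samples from training-data statistics), so this illustrates the idea rather than replicating it; `predict_proba` is any function returning the anemia probability for one feature vector.

```python
import math
import random


def _solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [v] for row, v in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= factor * M[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][j] * x[j] for j in range(r + 1, n))) / M[r][r]
    return x


def lime_explain(predict_proba, x, scale, n_samples=500, kernel_width=None,
                 ridge=1e-3, seed=0):
    """Local feature weights of predict_proba around x (per standardized unit).

    scale holds one spread per feature (e.g. training standard deviations).
    """
    rng = random.Random(seed)
    d = len(x)
    if kernel_width is None:
        kernel_width = 0.75 * math.sqrt(d)
    rows, targets, weights = [], [], []
    for _ in range(n_samples):
        u = [rng.gauss(0.0, 1.0) for _ in range(d)]            # standardized offset
        z = [xi + ui * si for xi, ui, si in zip(x, u, scale)]  # perturbed instance
        dist2 = sum(ui * ui for ui in u)
        weights.append(math.sqrt(math.exp(-dist2 / kernel_width ** 2)))
        rows.append([1.0] + u)                                 # intercept + offsets
        targets.append(predict_proba(z))
    p = d + 1
    # Normal equations of weighted ridge regression; the intercept is not penalized.
    A = [[sum(w * r[i] * r[j] for w, r in zip(weights, rows))
          + (ridge if i == j and i > 0 else 0.0) for j in range(p)] for i in range(p)]
    b = [sum(w * r[i] * t for w, r, t in zip(weights, rows, targets)) for i in range(p)]
    return _solve(A, b)[1:]  # drop the intercept
```

Positive weights push the local prediction toward the anemic class and negative weights away from it, which is what the green and red bars in a LIME plot encode.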

Figure 7.

LIME explanation for the proposed optimized ensemble model.

Figure 8.

LIME explanation for the TabNet model.

Figure 9.

LIME explanation for the MLP model.

Figure 10.

LIME explanation for the XGBoost model.

Figure 11.

LIME explanation for the unoptimized hybrid ensemble (TabNet + XGBoost + MLP).

3.6. Comparative analysis of hyperparameter optimization techniques

To evaluate the effect of hyperparameter optimization on the proposed ensemble (TabNet + XGBoost + MLP), six methods were tested: GridSearchCV, RandomizedSearchCV, Optuna, PSO, ACO, and GA,^33–37 with results shown in Table 9. GridSearchCV achieved the most consistent results and the highest average F1 score (0.985). Optuna matched this F1 score, reached a slightly higher accuracy (0.987), and ran faster, making it suitable for larger parameter spaces. PSO followed closely (0.984 accuracy, 0.980 F1), indicating that it is effective for tuning deep and boosted models. GA and ACO performed well (F1: 0.970 and 0.965) but showed more variance. RandomizedSearchCV underperformed (0.920 F1), likely because its random sampling missed the best-performing regions of the search space. Overall, systematic tuning substantially improves ensemble performance, with Optuna and PSO offering strong, efficient alternatives to exhaustive grid search.

Table 9.

Results of comparative analysis of hyperparameter optimization techniques.

Method	Accuracy	Avg precision	Avg recall	Avg F1 score
GridSearchCV	0.985	0.985	0.98	0.985
RandomizedSearchCV	0.93	0.92	0.925	0.92
Optuna	0.987	0.98	0.985	0.985
PSO	0.984	0.98	0.975	0.98
ACO	0.968	0.96	0.97	0.965
Genetic Algorithm	0.975	0.97	0.96	0.97
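The difference between exhaustive and randomized search can be illustrated with scikit-learn. In this sketch, `GradientBoostingClassifier` stands in for XGBoost, the data are synthetic, and the grid is built around the XGBoost values in Table 10; `GridSearchCV` evaluates all 18 combinations, whereas `RandomizedSearchCV` with `n_iter=6` evaluates only a sample, which is how it can miss the best region. An Optuna study (`optuna.create_study(direction="maximize")` followed by `study.optimize(objective, n_trials=...)`) would replace the fixed grid with adaptive sampling.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV, StratifiedKFold

# Synthetic placeholder data; GradientBoosting is a stand-in for XGBoost.
X, y = make_classification(n_samples=300, n_features=10, random_state=1)
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=1)  # same splits for both

# Grid centred on the XGBoost settings reported in Table 10.
param_grid = {
    "learning_rate": [0.085, 0.1, 0.115],
    "max_depth": [7, 8, 10],
    "n_estimators": [100, 118],
}

grid = GridSearchCV(GradientBoostingClassifier(random_state=0), param_grid,
                    scoring="f1_macro", cv=cv)
grid.fit(X, y)  # evaluates all 3 * 3 * 2 = 18 combinations

rand = RandomizedSearchCV(GradientBoostingClassifier(random_state=0), param_grid,
                          n_iter=6, scoring="f1_macro", cv=cv, random_state=0)
rand.fit(X, y)  # samples 6 of the 18 combinations without replacement

print("grid  ", grid.best_params_, round(grid.best_score_, 3))
print("random", rand.best_params_, round(rand.best_score_, 3))
```

Because both searches share the same folds and a deterministic estimator, the randomized search can at best tie the grid search here; its advantage is cost, not accuracy.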

3.7. Best parameter settings across search strategies

Various hyperparameter optimization techniques were applied to improve the proposed hybrid model (TabNet + XGBoost + MLP), each exploring the parameter space differently to optimize accuracy, generalization, and stability. The optimal settings for XGBoost and MLP under each method are listed in Table 10. Every method that reported MLP settings selected the ‘tanh’ activation, and three of them (RandomizedSearchCV, Optuna, and GridSearchCV) selected hidden layers of (128, 64). XGBoost was sensitive to the learning rate (0.085–0.115) and tree depth (7–10). The metaheuristic methods (GA, PSO, and ACO) used fitness-driven search and reached high fitness values; for GA, only the maximum fitness and generation count are reported, not the corresponding hyperparameter values. GridSearchCV offered the most balanced configuration, with strong performance and interpretability. The consistency of the optimal MLP settings suggests that the defined search space was appropriate. Optuna and PSO achieved results similar to GridSearchCV at lower computational cost, making them suitable for real-world, resource-constrained settings.

Table 10.

Best parameter settings across search strategies.

Search method	Model	Best parameters
RandomizedSearchCV	XGBoost	{‘learning_rate’: 0.11495, ‘max_depth’: 9, ‘n_estimators’: 118}
RandomizedSearchCV	MLP	{‘learning_rate_init’: 0.001, ‘hidden_layer_sizes’: (128, 64), ‘activation’: ‘tanh’}
Optuna	XGBoost	{‘n_estimators’: 114, ‘max_depth’: 10, ‘learning_rate’: 0.0981}
Optuna	MLP	{‘hidden_layer_sizes’: (128, 64), ‘activation’: ‘tanh’, ‘learning_rate_init’: 0.00125}
GA	XGBoost	{‘fitness_max’: 0.9935, ‘generation’: 5}
GA	MLP	{‘fitness_max’: 0.9962, ‘generation’: 5}
PSO	XGBoost	{‘learning_rate’: 0.095, ‘max_depth’: 7, ‘n_estimators’: 110}
PSO	MLP	{‘activation’: ‘tanh’, ‘hidden_layer_sizes’: (128, 32), ‘learning_rate_init’: 0.0015}
ACO	XGBoost	{‘learning_rate’: 0.085, ‘max_depth’: 8, ‘n_estimators’: 105}
ACO	MLP	{‘activation’: ‘tanh’, ‘hidden_layer_sizes’: (64, 32), ‘learning_rate_init’: 0.0022}
GridSearchCV	XGBoost	{‘learning_rate’: 0.1, ‘max_depth’: 8, ‘n_estimators’: 100}
GridSearchCV	MLP	{‘activation’: ‘tanh’, ‘hidden_layer_sizes’: (128, 64), ‘learning_rate_init’: 0.002}
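PSO, one of the metaheuristics in Table 10, can be sketched in a few lines: each particle is a candidate (learning_rate, max_depth) pair whose velocity is pulled toward its own best position and toward the swarm's best. The objective below is a hypothetical smooth surrogate peaking at learning_rate = 0.1 and max_depth = 8, chosen only for illustration; in practice it would be the cross-validated F1 score of the ensemble, with max_depth rounded to an integer before training. The inertia and acceleration constants are common textbook defaults, not values reported by the authors.

```python
import random


def pso(objective, bounds, n_particles=15, n_iter=40, w=0.729, c1=1.494, c2=1.494, seed=0):
    """Maximize objective over a box using a basic global-best particle swarm."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])   # pull to own best
                             + c2 * r2 * (gbest[d] - pos[i][d]))     # pull to swarm best
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)  # stay inside bounds
            val = objective(pos[i])
            if val > pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val > gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def surrogate(params):
    """Hypothetical smooth stand-in for cross-validated F1, peaking at (0.1, 8)."""
    learning_rate, max_depth = params
    return -((learning_rate - 0.1) / 0.05) ** 2 - ((max_depth - 8) / 3) ** 2


best, best_val = pso(surrogate, bounds=[(0.01, 0.3), (3, 12)])
print(best, best_val)
```

GA and ACO differ in how new candidates are proposed (crossover and mutation, or pheromone-weighted sampling), but they plug into the same objective in the same way.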

3.8. Comparative SHAP analysis of hyperparameter optimization techniques

SHAP was applied to the stacked TabNet + XGBoost + MLP model tuned with each optimizer; Figures 12–16 show the Optuna, RandomizedSearchCV, GA, ACO, and PSO variants, ranking features by their average importance. The top contributors were Childageinmonths (older children were less likely to be anemic), V113_x (water source), V209_x (breastfeeding status), V190_x (wealth index), and V212 (number of children). These features influenced predictions consistently across optimizers; for example, a higher wealth index reduced predicted anemia risk. These results align with clinical and epidemiological evidence. To the best of our knowledge, this is the first use of SHAP to explain a stacked ensemble combining deep, tree-based, and dense neural models for anemia prediction, enhancing interpretability without sacrificing accuracy.

Figure 12.

SHAP feature importance plot for TabNet + XGBoost + MLP + Optuna model.

Figure 13.

SHAP feature importance plot for TabNet + XGBoost + MLP + RandomizedSearchCV model.

Figure 14.

SHAP feature importance plot for TabNet + XGBoost + MLP + Genetic Algorithm model.

Figure 15.

SHAP feature importance plot for TabNet + XGBoost + MLP + ACO model.

Figure 16.

SHAP feature importance plot for TabNet + XGBoost + MLP + PSO model.

3.9. Comparative LIME analysis of hyperparameter optimization techniques

LIME was used to analyze the models tuned with Optuna, RandomizedSearchCV, GA, PSO, and ACO (Figures 17–21). For the Optuna-tuned model, LIME showed that lower wealth and poor water access pushed the prediction toward the anemic class. For the PSO-tuned model, Childageinmonths and V209_x (breastfeeding) were the main features pushing predictions toward the non-anemic class. In each LIME plot, green bars mark features that support the prediction, red bars mark features that oppose it, and bar length indicates the size of the contribution. These insights aided debugging, understanding of patient-level risk, and communication with clinicians. Because LIME produces patient-specific explanations, it supports medical decisions that require individualized reasoning.

Figure 17.

LIME explanation for TabNet + XGBoost + MLP + Optuna model.

Figure 18.

LIME explanation for TabNet + XGBoost + MLP + RandomizedSearchCV model.

Figure 19.

LIME explanation for TabNet + XGBoost + MLP + Genetic Algorithm model.

Figure 20.

LIME explanation for TabNet + XGBoost + MLP + PSO model.

Figure 21.

LIME explanation for TabNet + XGBoost + MLP + ACO model.

3.10. Comparative review of existing anemia prediction models

Recent studies on childhood anemia prediction were reviewed to position the proposed framework; Table 11 compares their algorithms, datasets, and classification accuracy. Most past studies used classical ML models with mixed performance. Yimer et al. achieved a best accuracy of 81.16% (Random Forest), while Rahman et al. compared eleven classifiers, including ensembles such as AdaBoost and Bagging, with the best accuracy (95%) obtained by logistic regression, but did not model complex feature interactions. Rivera et al. used a single Decision Tree model and reached 97% accuracy, highlighting its suitability for structured data. The proposed hybrid ensemble achieved 98.5% accuracy, the highest among the studies in Table 11, although the datasets differ and the figures are therefore not directly comparable. It combines attention-based deep (TabNet), gradient-boosted (XGBoost), and neural (MLP) models with interpretability and optimized hyperparameters, offering a novel solution for medical use and advancing AI-driven diagnosis in low-resource settings.

Table 11.

Comparative results of existing anemia prediction models with proposed system.

Researcher	Methods	Dataset	Results
Yimer et al. (2025)²⁰	Random Forest, Decision Tree, Support Vector Machines, Naive Bayes, K-Nearest Neighbors, and Classical Logistic Regression	2016 Ethiopian Demographic and Health Survey	Accuracy = 81.16%, 68.40%, 59.94%, 53.06%, 69.96%, and 54.79%, respectively
Rahman et al. (2024)²¹	Logistic Regression, NB, KNN, DT, SVM, RF, Adaboost, SGD, XGBoost, Ridge and Bagging Classifier	Local Pathology Center	Accuracy = 95%, 89%, 93%, 93%, 93%, 93.5%, 93.5%, 94.5%, 93.5%, 89.5% and 93.5%, respectively
Rivera et al. (2024)²²	Decision Tree (DT)	Records related to the medical history of children from a healthcare institution in Peru	Accuracy =97%
Our Proposed Model	Proposed Stacked Ensemble + SHAP + LIME + GridSearch	2022 Tanzania DHS child mortality data	Accuracy= 98.5%

3.11. External validation using NFHS dataset

To evaluate the generalizability and cross-regional robustness of the proposed child anemia prediction framework, an external validation was conducted using data from the National Family Health Survey (NFHS-5, India, 2021). The NFHS dataset provides state-wise anemia prevalence statistics for men and women across India, offering a complementary perspective to the Tanzania DHS dataset used for model development. Since the NFHS dataset is aggregated at the state level, a direct one-to-one mapping with individual-level TDHS data is not feasible. Therefore, a comparative validation approach was adopted. The proposed model, trained on the TDHS dataset, was used to generate predicted anemia trends, which were then compared with NFHS-reported prevalence patterns to assess consistency in regional behaviour. This analysis serves as an external validation of the proposed model beyond the training region (Tanzania), enabling assessment of its generalizability in a different demographic and epidemiological context (India) as shown in Table 12.

Table 12.

NFHS vs Model predicted anemia prevalence.

State	NFHS (%)	Predicted (%)	Absolute difference (percentage points)
Uttar Pradesh	67.1	65.8	1.3
Bihar	69.4	70.2	0.8
Madhya Pradesh	62.7	61.5	1.2
Rajasthan	65.3	66.0	0.7
Maharashtra	54.2	53.8	0.4
Tamil Nadu	50.1	49.6	0.5

The results show close agreement between observed and predicted values (absolute differences of 0.4–1.3 percentage points), indicating that the model captures consistent anemia patterns across regions. This provides evidence of cross-regional generalizability and suggests that the proposed framework learns transferable health patterns. However, because the NFHS dataset is aggregated at the state level, these findings should be interpreted as indicative rather than definitive.
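The agreement in Table 12 can be summarized by the mean and maximum absolute difference. The short sketch below recomputes both from the tabulated values; since these are state-level aggregates, the summary says nothing about individual-level accuracy.

```python
# Observed (NFHS) and model-predicted prevalence (%) copied from Table 12.
table12 = {
    "Uttar Pradesh": (67.1, 65.8),
    "Bihar": (69.4, 70.2),
    "Madhya Pradesh": (62.7, 61.5),
    "Rajasthan": (65.3, 66.0),
    "Maharashtra": (54.2, 53.8),
    "Tamil Nadu": (50.1, 49.6),
}


def agreement(pairs):
    """Absolute errors (percentage points), their mean, and their maximum."""
    errors = {state: abs(obs - pred) for state, (obs, pred) in pairs.items()}
    mae = sum(errors.values()) / len(errors)
    return errors, mae, max(errors.values())


errors, mae, worst = agreement(table12)
print(f"MAE = {mae:.2f} pp, max = {worst:.1f} pp")
```

Run as written, this prints `MAE = 0.82 pp, max = 1.3 pp`.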

3.12. Ethical considerations and bias mitigation

The use of demographic and socioeconomic features in predictive healthcare models introduces potential risks of algorithmic bias. Variables such as wealth index, region, and maternal education may inadvertently encode structural inequalities, leading to biased predictions if not carefully monitored. To address this, the proposed framework incorporates the following safeguards:

• Feature Sensitivity Analysis: SHAP-based evaluation is used to check that predictions are not disproportionately driven by sensitive demographic attributes.

• Subgroup Performance Evaluation: Model performance is analyzed across population groups (e.g., urban vs rural, socioeconomic strata) to detect disparities.

• Bias Mitigation Strategy: Features with high bias potential are interpreted cautiously, and model decisions are validated against clinical knowledge rather than purely statistical importance.

The findings indicate that medically relevant features such as child age, BMI, and nutritional indicators consistently dominate predictions, reducing reliance on sensitive attributes. However, fairness monitoring remains essential for real-world deployment.
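A minimal sketch of the subgroup evaluation described above, in plain Python: per-group accuracy, recall (true-positive rate), and positive-prediction rate. The largest between-group gap in positive-prediction rate is the demographic parity difference, and the gap in recall is the equal opportunity difference, two of the metrics named as future work in Section 3.14. The group labels and toy predictions are illustrative only.

```python
import math
from collections import defaultdict


def subgroup_report(y_true, y_pred, groups):
    """Per-group accuracy, recall (true-positive rate) and positive-prediction rate."""
    buckets = defaultdict(list)
    for t, p, g in zip(y_true, y_pred, groups):
        buckets[g].append((t, p))
    report = {}
    for g, pairs in buckets.items():
        preds_on_positives = [p for t, p in pairs if t == 1]
        report[g] = {
            "n": len(pairs),
            "accuracy": sum(t == p for t, p in pairs) / len(pairs),
            "recall": (sum(preds_on_positives) / len(preds_on_positives)
                       if preds_on_positives else float("nan")),
            "positive_rate": sum(p for _, p in pairs) / len(pairs),
        }
    return report


def gap(report, metric):
    """Largest between-group difference for one metric, ignoring undefined (NaN) groups."""
    values = [r[metric] for r in report.values() if not math.isnan(r[metric])]
    return max(values) - min(values)


rep = subgroup_report(
    y_true=[1, 1, 0, 0, 1, 1, 0, 0],
    y_pred=[1, 1, 0, 0, 1, 0, 0, 1],
    groups=["urban"] * 4 + ["rural"] * 4,
)
print(gap(rep, "positive_rate"), gap(rep, "recall"))  # parity gap, opportunity gap
```

In this toy case both groups are flagged at the same rate, yet recall differs by 0.5, which is why parity of positive rates alone is not sufficient evidence of fairness.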

3.13. Practical deployment and health system integration

For real-world applicability, the proposed framework can be integrated into existing Health Information Systems (HIS) and Electronic Health Records (EHR) through a modular deployment architecture:

• API-Based Integration: The trained model can be deployed as a REST API, allowing real-time predictions from clinical data inputs.

• Data Pipeline Compatibility: The preprocessing pipeline aligns with standard healthcare data formats, enabling seamless integration with structured hospital datasets.

• Edge Deployment: Lightweight variants of the model can be deployed on low-resource devices in rural clinics.

• Decision Support System (DSS): The model functions as a screening tool, assisting clinicians in identifying high-risk children for further testing.

This integration supports scalable, real-time anemia screening in resource-constrained environments.
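As an illustration of the API-based option, the sketch below exposes a `/predict` endpoint with Python's standard-library `http.server`. Everything here is hypothetical: the three feature names, the JSON schema, the 0.5 threshold, and `dummy_model`, which merely stands in for the trained ensemble. A production service would more likely use a web framework with authentication and logging. `handle_predict` is kept separate from the HTTP plumbing so that it can be tested without starting a server.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical input schema; a deployed model would expect its full feature set.
FEATURES = ["child_age_months", "wealth_index", "breastfeeding"]


def dummy_model(features):
    """Placeholder risk score standing in for the trained stacked ensemble."""
    score = 0.6 - 0.005 * features["child_age_months"] - 0.05 * features["wealth_index"]
    return min(max(score, 0.0), 1.0)


def handle_predict(body, model=dummy_model, threshold=0.5):
    """Validate a JSON request body and return (HTTP status, response dict)."""
    try:
        payload = json.loads(body)
        features = {name: float(payload[name]) for name in FEATURES}
    except (ValueError, KeyError, TypeError) as exc:
        return 400, {"error": f"invalid input: {exc!r}"}
    prob = model(features)
    # Screening output only: flags the child for confirmatory testing.
    return 200, {"anemia_probability": round(prob, 4),
                 "flag_for_testing": prob >= threshold}


class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        status, response = handle_predict(self.rfile.read(length))
        data = json.dumps(response).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def serve(host="127.0.0.1", port=8000):
    """Start a blocking development server (not called on import)."""
    HTTPServer((host, port), PredictHandler).serve_forever()
```

Returning a probability plus a screening flag, rather than a diagnosis, matches the decision-support role described above.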

3.14. Limitations and future directions

The proposed system has several limitations. First, combining TabNet, XGBoost, and MLP with extensive GridSearchCV tuning leads to high computational complexity, making scalability difficult in low-resource or real-time settings. Second, the model is trained on TDHS data, which, while suitable for Tanzania, limits its applicability to other regions without retraining. Third, although SHAP and LIME provide global and local interpretability, they require expert interpretation, and SHAP is computationally expensive on large datasets. Fourth, strong statistical correlations may cause features such as region or wealth index to dominate predictions, potentially overshadowing less frequent yet clinically meaningful features. Fifth, although the NFHS-based external validation suggests promising generalizability, it relies on state-level aggregates; future work will include individual-level validation using DHS datasets from other countries (e.g., Ethiopia, India) to further strengthen cross-population robustness. Finally, the model uses a static design that analyzes each case independently and cannot capture temporal trends in health status. From a fairness perspective, the model is designed to support early screening rather than final diagnosis, which reduces the risk of harmful decisions. Its explainability mechanisms provide transparency, allowing clinicians to verify whether predictions are clinically justified rather than demographically biased.^38–46

Future work will incorporate formal fairness metrics, such as demographic parity, equal opportunity, and calibration across subgroups, to further strengthen ethical reliability. Although class imbalance was addressed using stratified sampling and class weighting, advanced resampling techniques such as SMOTE or adaptive synthetic sampling (ADASYN) were not explored and may further improve minority-class sensitivity. Further validation using repeated k-fold cross-validation and additional external datasets, such as the EDHS, is also planned to evaluate robustness and adaptability. Incorporating temporal models such as LSTMs or Transformers could help capture time-based changes in child health. To improve deployment feasibility, lightweight ensemble architectures and model compression techniques such as pruning and knowledge distillation will be explored to support deployment on mobile devices and in rural clinics. Future work could also integrate causal inference to study the root causes of anemia and connect the model with EHR systems to support real-time clinical decision-making. In addition, automated dashboards visualizing SHAP and LIME results should be built to help healthcare professionals and non-technical users understand predictions. A key limitation of this study is the lack of publicly accessible code at the time of submission, which may limit reproducibility; the full implementation will be released in an open-source repository to ensure transparency and facilitate further research.
Additionally, although cross-regional validation using NFHS data has been conducted, further validation using standardized individual-level clinical datasets from multiple countries is necessary to fully establish the robustness and generalizability of the proposed model.

4. Conclusion

This study develops an interpretable hybrid ensemble system for early prediction of anemia in children from structured healthcare data. Using TabNet, XGBoost, and MLP as complementary classifiers, the system captures attention-based feature interactions, tree-based decision rules, and nonlinear relationships. GridSearchCV tunes the hyperparameters, while SHAP and LIME provide global and local explanations. The Stacked Ensemble + SHAP + LIME + GridSearch model outperformed the individual models, the unoptimized ensemble, and the previously reported models in Table 11, achieving 98.5% accuracy and high F1-scores. The system can support clinical decisions in real-world settings, especially in resource-limited areas, and offers understandable visualizations for healthcare experts. The study also compared hyperparameter optimization methods (GridSearchCV, RandomizedSearchCV, Optuna, GA, PSO, and ACO), with GridSearchCV and Optuna providing the most consistent and best-performing results. This work presents an accurate, interpretable, and robust ensemble model for early anemia detection, aiding efforts to reduce childhood illness and mortality. The integration of fairness safeguards and interpretable outputs enhances the ethical reliability and clinical usability of the proposed system, supporting its deployment in diverse healthcare settings. Future steps include temporal modeling, real-time clinical integration, and cross-region support for wider public health impact.

Footnotes

ORCID iD

Silvia Gaftandzhieva

Ethical considerations

The study used publicly available, anonymized data from the Tanzania Demographic and Health Survey (TDHS) 2022, obtained with permission from the DHS Program. As per the DHS data use agreement, ethical approval is not required since no human subjects were directly involved and no personally identifiable information is used. The DHS datasets are collected with informed consent and are approved by national ethics review boards in the respective countries.

Consent to participate

The study utilized secondary data that is anonymized and publicly accessible through the DHS Program. No direct involvement of human participants occurred during this research.

Consent for publication

This study does not include any identifiable personal data or images requiring consent for publication.

Funding

This research was funded by the Programme “Research, Innovation and Digitalisation for Smart Transformation” 2021–2027, co-financed by the European Union, under Project No. BG16RFPR002-1.014-0007 “Center of Competence “PERIMED-2””.

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Data Availability Statement

The dataset used in this study is the most recent available TDHS (2022), which Julius Moinget Loibor requested through the DHS Program website, with permission granted to download it. The study focuses on the births recode (TZBR82SV) file, extracting data relevant to child mortality. The TDHS 2022 collected information from a nationally representative sample. The dataset link for this study is obtained through the DHS program website ().*

Code availability

The code used in this study is currently available upon request. To improve transparency and reproducibility, we plan to release the complete implementation, including preprocessing and model training pipelines, in a public repository (e.g., GitHub or Zenodo) in future work.

References

1. Sazedur, Saidur, Ashfikur, et al. Determinants of death among under-5 children in Bangladesh. Munich Personal RePEc Archive, 2019; 93511.

2. Iddrisu, Tawiah, Bukari, et al. Frequentist and Bayesian Regression Approaches for Determining Risk Factors of Child Mortality in Ghana. BioMed Research International 2020; 2020: 8168479. https://doi.org/10.1155/2020/8168479

3. Satti, Ali, Irshad, et al. Studying infant mortality: A demographic analysis based on data mining models. Open Life Sciences 2023; 18(1): 1–10. https://doi.org/10.1515/biol-2022-0643

4. Sharrow, Hug, You, et al. Global, regional, and national trends in under-5 mortality between 1990 and 2019 with scenario-based projections until 2030: a systematic analysis by the UN Inter-agency Group for Child Mortality Estimation. The Lancet Global Health 2022; 10(2): e195–e206. https://doi.org/10.1016/S2214-109X(21)00515-5

5. Ministry of Health (MoH) [Tanzania Mainland], Ministry of Health (MoH) [Zanzibar], National Bureau of Statistics (NBS), Office of the Chief Government Statistician (OCGS), ICF. Tanzania Demographic and Health Survey and Malaria Indicator Survey 2022 Final Report, 2022, vol 473.

6. Mishra, Vasishtha, Maiti. Predicting factors associated with under-5 mortality in India using machine learning algorithms: evidence from National Family Health Survey 2019-. 2024; 1–12.

7. Saroj, Yadav, Visi. Predictive Modelling of Under-Five Mortality Determinants Using Machine Learning Techniques. 2023; 1–27. https://doi.org/10.21203/rs.3.rs-3344538/v1

8. Iqbal, Satti, Irshad, et al. Predictive analytics in smart healthcare for child mortality prediction using a machine learning approach. Open Life Sciences 2023; 18(1): 20220609. https://doi.org/10.1515/biol-2022-0609

9. Bitew, Nyarko, Potter, et al. Machine learning approach for predicting under-five mortality determinants in Ethiopia: evidence from the 2016 Ethiopian Demographic and Health Survey. Genus 2020; 76(1): 37. https://doi.org/10.1186/s41118-020-00106-2

10. Uwizera. Anemia prediction among Rwandan children aged 6 to 59 months using machine learning techniques. University of Rwanda, 2022.

11. Kebede Kassaw, Yimer, Abey, et al. The application of machine learning approaches to determine the predictors of anemia among under five children in Ethiopia. Scientific Reports 2023; 13(1): 22919. https://doi.org/10.1038/s41598-023-50128-x

12. Zemariam, Yimer, Abebe, et al. Employing supervised machine learning algorithms for classification and prediction of anemia among youth girls in Ethiopia. Scientific Reports 2024; 14(1): 9080. https://doi.org/10.1038/s41598-024-60027-4

13. Ayalew, Dejene, Belew, et al. Predictors of Anemia in Ethiopia: A systematic Review of Machine Learning Approaches. medRxiv 2025; 2025-05.

14. Osisanwo, Akinsola JE, Hinmikaiye. Supervised Machine Learning Algorithms: Classification and Comparison. International Journal of Computer Trends and Technology 2017; 48(3): 128–138.

15. Zewdie, Adjiwanou. Multilevel analysis of infant mortality and its risk factors in South Africa. International Journal of Population Studies 2024; 3(2): 43. https://doi.org/10.18063/ijps.v3i2.330

16. Tagoe, Agbadi, Nakua, et al. A predictive model and socioeconomic and demographic determinants of under-five mortality in Sierra Leone. Heliyon 2020; 6(3): e03508. https://doi.org/10.1016/j.heliyon.2020.e03508

17. Baraki, Akalu, Wolde, et al. Factors affecting infant mortality in the general population: evidence from the 2016 Ethiopian demographic and health survey (EDHS); a multilevel analysis. BMC Pregnancy and Childbirth 2020; 20(1): 299. https://doi.org/10.1186/s12884-020-03002-x

18. Workie, Azene. Bayesian zero-inflated regression model with application to under-five child mortality. Journal of Big Data 2021; 8(1): 4. https://doi.org/10.1186/s40537-020-00389-4

19. Pandey, Shukla, Singh, et al. Predicting child mortality determinants in Uttar Pradesh using Machine Learning: Insights from the National Family and Health Survey (2019–21). Clinical Epidemiology and Global Health 2025; 32: 101949. https://doi.org/10.1016/j.cegh.2025.101949

20. Yimer, Yesuf, Ahmed, et al. Optimizing machine learning models for predicting anemia among under-five children in Ethiopia: insights from Ethiopian demographic and health survey data. BMC Pediatrics 2025; 25(1): 311. https://doi.org/10.1186/s12887-025-05659-9

21. Rahman, Mojumdar, Shifa, et al. Anemia disease prediction using machine learning techniques and performance analysis. 2024 11th International Conference on Computing for Sustainable Global Development (INDIACom). IEEE, 2024, pp. 1276–1282.

22. Rivera, Cardenas, Castillo-Sequera, et al. Early Prediction Model for Anemia in Infants Using Clinical Data from Perú Applying Supervised Machine Learning Algorithms. 2024 10th International Conference on Optimization and Applications (ICOA). 2024, pp. 1–7.

23. Ali, Mohammed. A comprehensive review of artificial intelligence approaches in omics data processing: evaluating progress and challenges. International Journal of Mathematics, Statistics, and Computer Science 2024; 2: 114–167.

24. Arik SÖ, Pfister. TabNet: Attentive interpretable tabular learning. Proceedings of the AAAI Conference on Artificial Intelligence 2021; 35(8): 6679–6687. https://doi.org/10.1609/aaai.v35i8.16826

25. Niazkar, Menapace, Brentan, et al. Applications of XGBoost in water resources engineering: A systematic literature review (2018–2023). Environmental Modelling & Software 2024; 174: 105971. https://doi.org/10.1016/j.envsoft.2024.105971

26. Lin, Lyu, Liu, et al. MLP can be a good transformer learner. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 19489–19498.

27. Ahmad, Fatima, Ullah, et al. Efficient medical diagnosis of human heart diseases using machine learning techniques with and without GridSearchCV. IEEE Access 2022; 10: 80151–80173. https://doi.org/10.1109/access.2022.3165792

28. Boakye, O'Toole, Jalali, et al. Comparing logistic regression and machine learning for obesity risk prediction: A systematic review and meta-analysis. International Journal of Medical Informatics 2025; 199: 105887. https://doi.org/10.1016/j.ijmedinf.2025.105887

29. Garreau, Luxburg. Explaining the explainer: A first theoretical analysis of LIME. International Conference on Artificial Intelligence and Statistics. PMLR, 2020, pp. 1287–1296.

30. Van den Broeck, Lykov, Schleich, et al. On the tractability of SHAP explanations. Journal of Artificial Intelligence Research 2022; 74: 851–886. https://doi.org/10.1613/jair.1.13283

31. Kukkar, Kaur. AEC: A novel adaptive ensemble classifier with LIME and SHAP-Based interpretability for fake news detection. Expert Systems with Applications 2025; 281: 127751. https://doi.org/10.1016/j.eswa.2025.127751

32. Kukkar, Mohana, Sharma, et al. Prediction of student academic performance based on their emotional wellbeing and interaction on various e-learning platforms. Education and Information Technologies 2023; 28(8): 9655–9684. https://doi.org/10.1007/s10639-022-11573-9

33. Vishnu, Rupak, Vedhapriyaa, et al. Recurrent gastric cancer prediction using randomized search cv optimizer. 2023 International Conference on Computer Communication and Informatics (ICCCI), 2023, pp. 1–5.

34. Akiba, Sano, Yanase, et al. Optuna: A next-generation hyperparameter optimization framework. Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2019, pp. 2623–2631.

35. Utama ABP, Wibawa, Muladi, et al. PSO based hyperparameter tuning of CNN multivariate time-series analysis. Jurnal Online Informatika 2022; 7(2): 193–202. https://doi.org/10.15575/join.v7i2.858

36. Ali, Awwad, Al-Razgan, et al. Hyperparameter search for machine learning algorithms for optimizing the computational complexity. Processes 2023; 11(2): 349. https://doi.org/10.3390/pr11020349

37. Shanthi, Chethan. Genetic algorithm based hyper-parameter tuning to improve the performance of machine learning models. SN Computer Science 2022; 4(2): 119. https://doi.org/10.1007/s42979-022-01537-8

38. Alhussan, Abdelhamid, El-Kenawy ESM, et al. A Binary Waterwheel Plant Optimization Algorithm for Feature Selection. IEEE Access 2023; 11: 94227–94251. https://doi.org/10.1109/ACCESS.2023.3312022

39. El-Kenawy ESM, Mirjalili, Khodadadi, et al. Feature selection in wind speed forecasting systems based on meta-heuristic optimization. PLoS One 2023; 18(2): e0278491. https://doi.org/10.1371/journal.pone.0278491

40. Takieldeen, El-Kenawy ESM, Hadwan, et al. Dipper throated optimization algorithm for unconstrained function and feature selection. Comput. Mater. Contin 2022; 72: 1465–1481. https://doi.org/10.32604/cmc.2022.026026

41. Khafaga, Ibrahim, El-Kenawy ESM, et al. An Al-Biruni earth radius optimization-based deep convolutional neural network for classifying monkeypox disease. Diagnostics 2022; 12(11): 2892. https://doi.org/10.3390/diagnostics12112892

42. El-Kenawy ESM, Ibrahim, Alhussan, et al. Smart city electricity load forecasting using greylag goose optimization-enhanced time series analysis. Arabian Journal for Science and Engineering 2025; 51: 1–19. https://doi.org/10.1007/s13369-025-10647-3

43. Ghaderzadeh, Rafie, Salehnasab. Explainable extratreeclassifier model for early detection of type 2 diabetes: evidence from the PERSIAN Dena Cohort. BMC Medical Informatics and Decision Making 2026; 26(1): 36. https://doi.org/10.1186/s12911-025-03333-9

44. Ghaderzadeh, Garavand, Salehnasab. Artificial intelligence in polycystic ovary syndrome: A systematic review of diagnostic and predictive applications. BMC Medical Informatics and Decision Making 2025; 25(1): 427. https://doi.org/10.1186/s12911-025-03255-6

45. Rafie, Talab, Koor BEZ, et al. Leveraging XGBoost and explainable AI for accurate prediction of type 2 diabetes. BMC Public Health 2025; 25(1): 3688. https://doi.org/10.1186/s12889-025-24953-w

46. Porkar, Mehrabipour, Pourasad, et al. Enhancing cancer zone diagnosis in MRI images: A novel SOM neural network approach with block processing in the presence of noise. Iranian Journal of Blood and Cancer 2025; 17(2): 34–45.