Prediction of dementia-related mortality in metabolic dysfunction-associated steatotic liver disease using survival machine-learning models

Abstract

Background

Metabolic dysfunction-associated steatotic liver disease (MASLD) is increasingly recognized as a multisystem condition with extrahepatic complications, including cognitive decline, Alzheimer's disease, and related dementias. However, conventional clinical tools have limited ability to predict dementia-related mortality in this population.

Objective

To develop and internally validate survival-based machine-learning models for predicting dementia-related mortality among adults with MASLD and identify key clinical predictors.

Methods

Adults with MASLD were identified from NHANES III with linked mortality data. Dementia-related mortality was defined using National Death Index codes for Alzheimer's disease and other dementias. Cox proportional hazards models, random survival forests, gradient-boosted survival models, and logistic regression were developed and compared with exploratory comparator models. Performance was evaluated using concordance indices, area under the receiver-operating-characteristic curve, Brier scores, and reclassification metrics relative to the FIB-4 score.

Results

Among 1774 adults with MASLD, 115 dementia-related deaths occurred during a median follow-up of 193 months. Overall discrimination was modest. Logistic regression showed the highest discrimination in 5-fold cross-validation, whereas the penalized Cox model achieved the highest bootstrap C-index. The gradient-boosted survival model demonstrated the greatest improvement in reclassification compared with FIB-4. Exploratory risk stratification classified all dementia-related deaths within the high-risk cohort, although these findings warrant cautious interpretation due to possible overfitting. Waist circumference, diabetes, body mass index, and age were the most influential predictors.

Conclusions

Survival-based models may improve identification of MASLD patients at elevated risk of dementia-related mortality. Metabolic and anthropometric factors appeared more informative than liver fibrosis scores; these findings warrant external validation in contemporary cohorts.

Keywords

Alzheimer's disease; dementia-related mortality; machine learning; MASLD; metabolic dysfunction-associated steatotic liver disease; risk prediction; survival analysis

Introduction

Metabolic dysfunction-associated steatotic liver disease (MASLD) affects a growing proportion of adults worldwide and is increasingly recognized as a hepatic disease with multisystem effects.^1,2 The systemic metabolic and inflammatory milieu of MASLD may also contribute to neurodegenerative outcomes, including dementia-related mortality.^3–5

Previous observational data have underscored this risk; in large-scale matched-cohort studies, patients with steatotic liver disease demonstrated significantly higher rates of dementia compared to reference individuals (adjusted hazard ratio [aHR], 1.38; 95% confidence interval [CI], 1.10 to 1.72), with particularly elevated risks observed in those with comorbid heart disease (aHR, 1.50; 95% CI, 1.08 to 2.05) or prior stroke (aHR, 2.60; 95% CI, 1.95 to 3.47).⁶ Despite these associations, identifying which specific patients are at the highest risk for dementia-related mortality remains a critical, yet unmet, clinical priority.

Current risk-stratification tools for dementia-related outcomes in MASLD remain limited. Traditional prognostic instruments, such as the FIB-4 score, were designed to assess hepatic fibrosis and offer little utility for predicting neurological endpoints.^7–9 As MASLD prevalence rises and its phenotypes diversify, there is an urgent need for more accurate, individualized prognostic approaches capable of identifying patients who may benefit from targeted surveillance or early preventive interventions.

Advances in survival machine-learning methods offer an opportunity to refine risk prediction for complex, low-frequency outcomes by leveraging high-dimensional clinical data. Such models may detect nonlinear interactions and latent risk patterns that conventional statistical approaches are unable to capture.^10,11 However, their application to dementia mortality in MASLD has not been previously evaluated at scale.

In this study, we aimed to determine whether modern survival machine-learning methods can improve risk prediction for Alzheimer's disease and related dementia mortality among adults with MASLD, and to identify the clinical features most strongly associated with neurological death in this population. To accomplish these aims, we developed and internally validated multiple survival machine-learning models using nationally representative data, compared model performance, assessed clinical utility through decision-curve analysis, and examined reclassification metrics relative to FIB-4.

Methods

Study design, participants, and reporting

We conducted a retrospective cohort study using publicly available data from the National Health and Nutrition Examination Survey (NHANES) III, identifying adults with MASLD who had linked mortality follow-up. Dementia-related mortality was defined using the National Death Index cause-of-death codes corresponding to Alzheimer's disease and other dementias. Follow-up time was calculated from the NHANES examination date to the date of death or censoring. This manuscript follows the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis-Artificial Intelligence (TRIPOD+AI) reporting guideline.¹²

Predictors

Twenty-three clinical, metabolic, laboratory, and demographic features were included as candidate predictors. These included age, sex, race/ethnicity, body mass index (BMI), waist circumference, smoking status, physical activity relative to peers, metabolic markers (e.g., triglycerides, HDL cholesterol, fasting glucose), indicators of liver health (e.g., FIB-4 score, albumin, platelet count), and socioeconomic indicators (e.g., poverty-income ratio). Data preprocessing followed NHANES analytic guidelines.

Missingness was assessed for all candidate predictors before model development. The proportion of missing values was quantified for each variable; most predictors had low rates of missingness (<5%), although selected laboratory values (e.g., fasting glucose, triglycerides) had higher rates of missingness (up to approximately 10–15%) reflecting the fasting subsample design of NHANES III. The overall pattern of missingness was assessed and considered to be consistent with a missing-at-random (MAR) assumption, given that missingness was predominantly attributable to the survey sampling design rather than to unmeasured patient-level factors.

Participants with missing outcome data (mortality status) or missing follow-up time were excluded from the analytic cohort prior to any modeling. For predictor variables with missing values, simple imputation was performed within the training data during cross-validation to avoid information leakage between training and validation folds. Specifically, continuous variables were imputed using the median value computed from the training fold, and categorical variables were imputed using the most frequent category (mode) from the training fold. The same imputation parameters (i.e., training-fold medians and modes) were then applied to the corresponding held-out validation fold without recalculation. This procedure was repeated independently within each of the five cross-validation splits. For the final full-dataset refit models, imputation was performed once using median and mode values from the entire analytic cohort. Given the relatively low rates of missingness for most predictors and the exploratory nature of this study, this approach was considered a reasonable and reproducible strategy. Continuous predictors were standardized (zero mean, unit variance) where required for regression-based models (i.e., penalized Cox proportional hazards and logistic regression); tree-based models did not require standardization.
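The fold-specific procedure above can be sketched in plain NumPy. This is a minimal illustration rather than the authors' code (the Software section states that scikit-learn's SimpleImputer was used); the function names and toy arrays are hypothetical, and categorical predictors are assumed to be integer-coded with NaN marking missing values.

```python
import numpy as np


def fit_preprocessor(X_train, categorical_cols):
    """Learn fill values and scaling parameters from the TRAINING fold only.

    Continuous columns: median fill, then mean/SD of the filled column.
    Categorical columns (assumed integer-coded): mode fill, never scaled.
    """
    params = {}
    for j in range(X_train.shape[1]):
        col = X_train[:, j]
        observed = col[~np.isnan(col)]
        if j in categorical_cols:
            values, counts = np.unique(observed, return_counts=True)
            params[j] = ("cat", values[np.argmax(counts)], 0.0, 1.0)
        else:
            fill = np.median(observed)
            filled = np.where(np.isnan(col), fill, col)
            sd = filled.std()
            params[j] = ("num", fill, filled.mean(), sd if sd > 0 else 1.0)
    return params


def apply_preprocessor(X, params, standardize=True):
    """Apply stored training-fold parameters to any partition, without refitting.

    standardize=True for penalized Cox / logistic regression,
    standardize=False for tree-based models.
    """
    out = np.array(X, dtype=float)  # copy, so the caller's array is untouched
    for j, (kind, fill, mu, sd) in params.items():
        col = out[:, j]              # view into `out`
        col[np.isnan(col)] = fill    # impute with the training-fold value
        if kind == "num" and standardize:
            out[:, j] = (col - mu) / sd
    return out


# Toy example: column 0 is continuous, column 1 is categorical.
X_train = np.array([[1.0, 0.0], [np.nan, 1.0], [3.0, 1.0], [5.0, np.nan]])
X_valid = np.array([[np.nan, np.nan], [7.0, 0.0]])

params = fit_preprocessor(X_train, categorical_cols={1})
Z_valid = apply_preprocessor(X_valid, params)
```

The missing validation value is filled with the training median (3.0), not with anything computed from the validation rows, so no information crosses the split.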

Outcome

The primary outcome was time to dementia-related death, defined by the National Death Index underlying cause-of-death code for dementia (code 7). Secondary neurological mortality (dementia and stroke) was catalogued but was not modeled as a separate endpoint.
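A minimal sketch of how the outcome could be coded from the public-use linked mortality file follows. The column names (MORTSTAT, UCOD_LEADING, PERMTH_EXM) are our assumption about the NCHS file layout and should be checked against its documentation; only the dementia code (7) comes from the text. The three-level status it returns (0 = alive at end of follow-up, 1 = dementia-related death, 2 = other death) mirrors the censoring scheme described under Censoring assumptions and competing risks.

```python
import numpy as np

DEMENTIA_CODE = 7  # underlying cause-of-death code stated in the text


def build_outcome(mortstat, ucod_leading, permth_exm):
    """Return follow-up months and a 3-level status, dropping missing outcome/time.

    status: 0 = alive at end of follow-up (administratively censored)
            1 = dementia-related death (the modeled event)
            2 = death from any other cause (censored in cause-specific models)
    Argument names mirror the ASSUMED NCHS linked-mortality variables.
    """
    mortstat = np.asarray(mortstat, dtype=float)
    ucod = np.asarray(ucod_leading, dtype=float)   # NaN for survivors
    months = np.asarray(permth_exm, dtype=float)   # months since examination

    dementia = (mortstat == 1) & (ucod == DEMENTIA_CODE)
    other = (mortstat == 1) & ~dementia
    status = np.where(dementia, 1, np.where(other, 2, 0))

    keep = ~np.isnan(months) & ~np.isnan(mortstat)  # exclude missing outcome/time
    return months[keep], status[keep]


time_months, status = build_outcome(
    mortstat=[0, 1, 1, 1],
    ucod_leading=[np.nan, 7, 1, 4],
    permth_exm=[300, 120, 200, np.nan],
)
event = status == 1  # cause-specific event indicator used for model fitting
```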

Model development and hyperparameter optimization

Four primary models were developed: (1) a Cox proportional hazards (CoxPH) model with both unpenalized and elastic-net-penalized (L1/L2) configurations; (2) a random survival forest (RSF); (3) a gradient-boosted survival model (GBS); and (4) a logistic regression model included as a baseline binary-outcome comparator. In addition, we evaluated exploratory comparator models, including a latent profile survival model inspired by mixed-membership approaches used in Alzheimer's disease research and a neural network–based survival model (DeepSurv architecture). Model selection was prespecified based on established utility in clinical prediction and the ability to accommodate nonlinear or high-dimensional relationships.

Cox proportional hazards models: The unpenalized CoxPH model was fitted using all 23 candidate predictors, with the Breslow method for handling ties in event times. The penalized CoxPH model used elastic-net regularization (scikit-survival or lifelines implementation) with the regularization strength (alpha) selected via cross-validated partial log-likelihood. A mixing parameter (L1 ratio) of 0.5 was used to balance variable selection (L1) and coefficient shrinkage (L2).
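For reference, a standard glmnet-style form of the elastic-net penalized Cox objective is shown below, using the Breslow partial log-likelihood, in which every event tied at time t_i shares the full risk set {j : t_j ≥ t_i}. Here δ_i is the event indicator, x_i the predictor vector, α the penalty strength tuned by cross-validation, and r the L1 ratio (0.5 in this study); α = 0 recovers the unpenalized model. The 1/n scaling and the factor on the L2 term are assumptions of this sketch: scikit-survival and lifelines use slightly different conventions, so α values are not directly comparable across libraries.

```latex
\hat{\beta} = \arg\min_{\beta}\;
  -\frac{1}{n}\,\ell(\beta)
  + \alpha\Big( r\,\lVert\beta\rVert_1 + \tfrac{1-r}{2}\,\lVert\beta\rVert_2^2 \Big),
\qquad
\ell(\beta) = \sum_{i=1}^{n} \delta_i \Big[ x_i^{\top}\beta
  - \log \sum_{j:\, t_j \ge t_i} \exp\big(x_j^{\top}\beta\big) \Big]
```

With r = 0.5, the penalty reduces to α(0.5‖β‖₁ + 0.25‖β‖₂²).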

Random survival forest: The RSF was implemented using the scikit-survival library. Hyperparameters tuned included the number of trees (candidates: 100, 300, 500, 1000), maximum tree depth (candidates: 3, 5, 10), and minimum number of samples required at a terminal node (candidates: 5, 10, 20, 50). Log-rank splitting was used as the split criterion.

Gradient-boosted survival model: The GBS model was implemented using scikit-survival's GradientBoostingSurvivalAnalysis. Hyperparameters tuned included the number of boosting iterations (candidates: 100, 300, 500), learning rate (candidates: 0.01, 0.05, 0.1), maximum tree depth (candidates: 1, 2, 3, 5), and minimum samples per terminal node (candidates: 5, 10, 20). The loss function was set to the Cox partial-likelihood (coxph) loss.

Logistic regression: A binary logistic regression model was fit to predict the occurrence of dementia-related death (binary outcome), rather than time-to-event, and was included as a non-survival baseline comparator. L2 regularization was applied with the regularization parameter (C) selected via cross-validation.

Hyperparameter optimization procedure: For each machine-learning model, hyperparameters were optimized using an inner 3-fold cross-validation loop nested within the outer 5-fold cross-validation used for performance evaluation. The combination of hyperparameters yielding the highest cross-validated concordance index (C-index) was selected for each model. When C-index values were similar (within 0.005), the configuration with the lower Brier score (lower overall prediction error) was preferred. For regression-based models (CoxPH and logistic regression), penalization parameters were selected using cross-validated likelihood-based criteria where applicable.
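The candidate grids and the tie-breaking rule can be written out as follows. This is a sketch: the keyword names are intended to match the constructor arguments of scikit-survival's RandomSurvivalForest and GradientBoostingSurvivalAnalysis, the result records are hypothetical, and we read "within 0.005" as within 0.005 of the best mean C-index.

```python
from itertools import product

RSF_GRID = {
    "n_estimators": [100, 300, 500, 1000],
    "max_depth": [3, 5, 10],
    "min_samples_leaf": [5, 10, 20, 50],
}
GBS_GRID = {
    "n_estimators": [100, 300, 500],
    "learning_rate": [0.01, 0.05, 0.1],
    "max_depth": [1, 2, 3, 5],
    "min_samples_leaf": [5, 10, 20],
}


def expand(grid):
    """All combinations of a parameter grid, as a list of keyword dicts."""
    keys = list(grid)
    return [dict(zip(keys, values)) for values in product(*grid.values())]


def select_config(results, tol=0.005):
    """Pick the highest mean C-index; among near-ties, the lowest Brier score.

    results: list of dicts with keys "params", "cindex", "brier", each
    averaged over the inner 3-fold CV of one outer training fold.
    """
    best_c = max(r["cindex"] for r in results)
    near_best = [r for r in results if best_c - r["cindex"] <= tol]
    return min(near_best, key=lambda r: r["brier"])


rsf_candidates = expand(RSF_GRID)  # 4 * 3 * 4 = 48 configurations
gbs_candidates = expand(GBS_GRID)  # 3 * 3 * 4 * 3 = 108 configurations

toy_results = [
    {"params": "A", "cindex": 0.700, "brier": 0.060},
    {"params": "B", "cindex": 0.697, "brier": 0.055},  # within 0.005 of A
    {"params": "C", "cindex": 0.690, "brier": 0.050},  # too far below A
]
chosen = select_config(toy_results)
```

Because every configuration is refit in each of 3 inner folds within each of 5 outer folds, the search requires 48 × 15 = 720 RSF fits and 108 × 15 = 1620 GBS fits.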

Exploratory comparator models: The latent profile survival model used a two-class mixture specification, and the neural survival model used a single hidden layer with 64 nodes, dropout of 0.3, and the Adam optimizer with a learning rate of 0.001. These models are reported for completeness but were not the focus of the primary analysis.

Non-linearity

Tree-based models incorporated nonlinear effects without transformation. Global feature importance was extracted for machine-learning models.

Validation and calibration assessment

Discrimination: Model discrimination was evaluated using the Harrell concordance index (C-index) and time-dependent area under the receiver-operating-characteristic curve (AUC) at 10- and 15-year horizons. The C-index quantifies the probability that, for a randomly selected comparable pair of participants, the individual with the higher predicted risk experiences the event first. A value of 0.5 indicates chance-level discrimination, and 1.0 indicates perfect discrimination.
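A minimal, from-scratch version of Harrell's C-index makes the pairwise definition concrete. It assumes that a higher score means higher risk and, as a simplification, skips pairs with tied follow-up times; library implementations such as scikit-survival's concordance_index_censored handle ties more carefully.

```python
def harrell_c_index(time, event, risk):
    """Harrell's concordance index for right-censored data.

    A pair (i, j) is comparable when i has an observed event and j is
    still under follow-up afterwards (time[j] > time[i]). The pair is
    concordant when i, who failed first, has the higher predicted risk;
    tied risk scores count as half-concordant.
    """
    concordant = 0.0
    comparable = 0
    for i in range(len(time)):
        if not event[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(len(time)):
            if time[j] > time[i]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable


time = [1, 2, 3, 4]
event = [1, 1, 0, 1]
c_perfect = harrell_c_index(time, event, risk=[4, 3, 2, 1])  # earliest deaths ranked riskiest
c_reversed = harrell_c_index(time, event, risk=[1, 2, 3, 4])
c_uninformative = harrell_c_index(time, event, risk=[1, 1, 1, 1])
```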

Calibration: Calibration was assessed using multiple complementary approaches, including integrated Brier scores, risk-group calibration comparisons, and fixed-time survival probability comparisons where applicable. First, the integrated Brier score (IBS) was computed over the observed follow-up period; it reflects both discrimination and calibration in a single measure of overall prediction error, with lower values indicating better agreement between predicted survival probabilities and observed outcomes. Second, for each model, participants were grouped into risk deciles (or quintiles, when event counts were insufficient for decile-level assessment) based on predicted risk, and the mean predicted risk within each group was compared with the observed event proportion (i.e., a grouped calibration assessment). For a perfectly calibrated model, mean predicted probabilities plotted against observed proportions would fall on the 45-degree line. Third, where survival function estimates were available (CoxPH, RSF, and GBS models), calibration was additionally examined by comparing predicted survival probabilities at fixed time horizons (e.g., 10 and 15 years) with Kaplan–Meier observed survival within risk strata. Formal calibration slope and intercept estimation was not performed because the low event count would have limited the precision and stability of such estimates. Accordingly, calibration findings are presented descriptively and should be interpreted with caution.
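The grouped calibration check can be sketched as follows; the function name and toy data are hypothetical. Like the comparison described above, it uses the raw observed event proportion and therefore does not adjust for censoring.

```python
import numpy as np


def grouped_calibration(pred_risk, observed_event, n_groups=10):
    """Mean predicted risk vs observed event proportion within risk groups.

    Participants are sorted by predicted risk and split into n_groups
    groups of (nearly) equal size: deciles by default, or quintiles with
    n_groups=5 when events are sparse. The observed proportion is a simple
    binary mean and ignores censoring.
    """
    pred_risk = np.asarray(pred_risk, dtype=float)
    observed_event = np.asarray(observed_event, dtype=float)
    order = np.argsort(pred_risk, kind="stable")
    rows = []
    for idx in np.array_split(order, n_groups):
        rows.append((pred_risk[idx].mean(), observed_event[idx].mean(), len(idx)))
    return rows  # (mean predicted, observed proportion, group size)


pred = [0.1] * 10 + [0.3] * 10
obs = [0] * 9 + [1] + [1] * 3 + [0] * 7
table = grouped_calibration(pred, obs, n_groups=2)
```

In this toy case each group's mean prediction equals its observed proportion, so both points would lie on the 45-degree line.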

Cross-validation: Model performance was evaluated using stratified 5-fold cross-validation, with stratification based on the outcome (dementia-related death) to ensure approximately equal event rates across folds given the low prevalence of the outcome. Within each fold, imputation parameters and standardization parameters were derived exclusively from the training partition and applied to the validation partition to prevent information leakage. All performance metrics (C-index, time-dependent AUC, Brier score) were computed on the held-out validation fold, and the mean and standard deviation across the five folds are reported.
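The stratified split can be sketched without any modeling library; scikit-learn's StratifiedKFold, listed in the Software section, provides equivalent behavior. The function name is hypothetical; the seed matches the value reported in the Software section, and the cohort sizes are taken from the Results.

```python
import numpy as np


def stratified_fold_ids(event, n_splits=5, seed=42):
    """Assign each participant to one of n_splits folds, stratified on event status.

    Events and non-events are shuffled separately and dealt round-robin
    into folds, so every fold receives (nearly) the same number of events.
    """
    rng = np.random.default_rng(seed)
    event = np.asarray(event).astype(bool)
    fold = np.empty(event.size, dtype=int)
    for stratum in (True, False):
        idx = np.flatnonzero(event == stratum)
        rng.shuffle(idx)
        fold[idx] = np.arange(idx.size) % n_splits
    return fold


# 115 dementia-related deaths among 1774 participants.
event = np.r_[np.ones(115), np.zeros(1659)]
fold = stratified_fold_ids(event)
events_per_fold = [int(event[fold == k].sum()) for k in range(5)]
sizes = [int((fold == k).sum()) for k in range(5)]

# Outer loop: training partition = fold != k, validation partition = fold == k.
# Imputation, scaling, and model fitting use only the training indices.
splits = [(np.flatnonzero(fold != k), np.flatnonzero(fold == k)) for k in range(5)]
```

With 115 events, each validation fold contains exactly 23 dementia-related deaths.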

Bootstrap validation: To further evaluate model robustness and quantify optimism, we performed 200-iteration bootstrap internal validation. In each iteration, a bootstrap sample (with replacement) of equal size to the original cohort was drawn, the model was fit on the bootstrap sample, and performance was evaluated both on the bootstrap sample (apparent performance) and on the original full cohort (test performance). The optimism was estimated as the mean difference between bootstrap apparent and test performance across iterations, and the optimism-corrected C-index was obtained by subtracting the estimated optimism from the apparent performance on the full dataset. Bootstrap 95% confidence intervals for the C-index were derived using the percentile method.
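The optimism correction described above (Harrell's bootstrap procedure) is sketched below with a deliberately simple stand-in: an ordinary least-squares linear score evaluated by AUC on simulated binary data. The model, metric, data, and reduced iteration count are illustrative assumptions; in the study, the fitted model would be one of the survival models and the metric the C-index.

```python
import numpy as np


def auc(scores, y):
    """Mann-Whitney AUC: P(score of a case > score of a non-case), ties = 1/2."""
    pos, neg = scores[y == 1], scores[y == 0]
    greater = (pos[:, None] > neg[None, :]).mean()
    ties = (pos[:, None] == neg[None, :]).mean()
    return greater + 0.5 * ties


def fit_linear(X, y):
    """Stand-in model: least-squares coefficients for a linear risk score."""
    beta, *_ = np.linalg.lstsq(X, y - y.mean(), rcond=None)
    return beta


def optimism_corrected(X, y, fit, score, n_boot=200, seed=42):
    """Apparent performance, mean bootstrap optimism, and their difference."""
    rng = np.random.default_rng(seed)
    n = len(y)
    apparent = score(X @ fit(X, y), y)  # fit and evaluate on the full cohort
    optimism = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)               # resample with replacement
        beta = fit(X[idx], y[idx])
        boot_apparent = score(X[idx] @ beta, y[idx])   # on the bootstrap sample
        boot_test = score(X @ beta, y)                 # on the original cohort
        optimism.append(boot_apparent - boot_test)
    mean_optimism = float(np.mean(optimism))
    return apparent, mean_optimism, apparent - mean_optimism


rng = np.random.default_rng(0)
X = rng.normal(size=(300, 10))
y = (X[:, 0] + rng.normal(size=300) > 1).astype(int)
apparent, optimism, corrected = optimism_corrected(X, y, fit_linear, auc, n_boot=50)
```

The corrected estimate is the full-cohort apparent AUC minus the mean optimism, mirroring the C-index correction described in the text.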

Censoring assumptions and competing risks

All time-to-event analyses assumed non-informative right censoring, whereby the probability of being censored at any time point was assumed to be independent of the future risk of dementia-related mortality, conditional on the observed covariates. In practice, participants were censored under two scenarios: (1) survival to the end of available mortality follow-up without experiencing dementia-related death (administrative censoring), or (2) death from a non-dementia cause prior to the end of follow-up. Follow-up time was calculated as the interval (in months) from the date of the NHANES III examination to the date of death or the end of the mortality follow-up period (December 31, 2019, corresponding to the most recent NHANES III linked mortality file), whichever occurred first.

We recognize that death from non-dementia causes (e.g., cardiovascular disease, cancer) constitutes a competing risk that precludes the subsequent occurrence of dementia-related death. Under the cause-specific hazard framework employed in the present study, the hazard of dementia-related mortality is estimated by treating deaths from other causes as censored observations. This approach estimates the instantaneous rate of dementia-related death among individuals who remain alive and at risk, but converting that rate into an absolute probability implicitly assumes that individuals who died from competing causes could still have experienced dementia-related death later. Consequently, predicted probabilities from cause-specific models may overestimate the cumulative incidence of dementia-related death in the presence of substantial competing mortality, particularly over long follow-up periods.
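The overestimation can be shown on a toy dataset. The sketch compares the naive cumulative risk 1 − KM (other-cause deaths treated as censored, as in a cause-specific analysis) with the Aalen–Johansen cumulative incidence, which accounts for competing deaths. The status codes and data are hypothetical: 0 = censored, 1 = dementia-related death, 2 = other-cause death.

```python
import numpy as np


def one_minus_km(time, status, cause=1):
    """Naive cumulative risk at end of follow-up: other-cause deaths censored."""
    surv = 1.0
    for t in np.unique(time[status == cause]):
        at_risk = np.sum(time >= t)
        d = np.sum((time == t) & (status == cause))
        surv *= 1.0 - d / at_risk
    return 1.0 - surv


def aalen_johansen_cif(time, status, cause=1):
    """Cumulative incidence of `cause` at end of follow-up, with competing deaths.

    Each cause-specific jump d/at_risk is weighted by the probability of
    having survived ALL causes just before time t.
    """
    event_free = 1.0  # overall event-free survival just before t
    cif = 0.0
    for t in np.unique(time[status > 0]):
        at_risk = np.sum(time >= t)
        d_cause = np.sum((time == t) & (status == cause))
        d_any = np.sum((time == t) & (status > 0))
        cif += event_free * d_cause / at_risk
        event_free *= 1.0 - d_any / at_risk
    return cif


time = np.array([1, 2, 3, 4, 5, 6, 7, 8])
status = np.array([2, 2, 1, 0, 2, 1, 0, 1])
naive = one_minus_km(time, status)
cif = aalen_johansen_cif(time, status)
```

Here the naive estimate reaches 1.0, whereas the cumulative incidence is about 0.594, because roughly 41% of the cohort died of other causes first.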

A competing-risk framework using Fine–Gray subdistribution hazard models or cause-specific cumulative incidence functions would provide complementary inference by estimating the probability of dementia-related death while accounting for the competing risk of death from other causes. However, given the exploratory nature of this study and the low event count, formal competing-risk models were not implemented. The present findings should therefore be interpreted as cause-specific prediction of dementia-related mortality rather than cumulative incidence prediction. Future studies with larger event counts should evaluate competing-risk approaches to better characterize absolute dementia-related mortality risk, particularly in older MASLD cohorts where non-dementia mortality is prevalent.

Feature importance analysis

For models using tree-based methods, feature importance was estimated using gain-based metrics. We ranked predictors within each model to identify variables contributing most to discrimination.

Model training

After validation, each model was refitted on the full dataset. Performance metrics from these refits reflect apparent (in-sample) performance; they are presented descriptively and were not used for clinical inference.

Risk stratification

Patients were stratified into low-, intermediate-, and high-risk groups based on predicted risk percentiles. Kaplan–Meier curves were generated for each risk tier.
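The percentile cut-offs are not stated in the text. The sketch below assumes tertiles (consistent with the high-risk group containing 592 of 1774 participants, about one third) and pairs them with a from-scratch Kaplan–Meier estimator; lifelines' KaplanMeierFitter, listed in the Software section, serves the same purpose. Function names and data are hypothetical.

```python
import numpy as np


def risk_tiers(pred_risk, cut_quantiles=(1 / 3, 2 / 3)):
    """0 = low, 1 = intermediate, 2 = high, using predicted-risk percentiles."""
    pred_risk = np.asarray(pred_risk, dtype=float)
    cuts = np.quantile(pred_risk, cut_quantiles)
    return np.digitize(pred_risk, cuts)


def kaplan_meier(time, event):
    """Product-limit survival estimate at each distinct event time."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    event_times = np.unique(time[event])
    surv, s = [], 1.0
    for t in event_times:
        at_risk = np.sum(time >= t)
        deaths = np.sum((time == t) & event)
        s *= 1.0 - deaths / at_risk
        surv.append(s)
    return event_times, np.array(surv)


tiers = risk_tiers(np.arange(1, 10) / 10)        # nine toy risk scores
t, s = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 0])  # in practice, one curve per tier
```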

Clinical utility

Clinical net benefit was assessed using decision-curve analysis (DCA), and improvements in reclassification were quantified using the net reclassification improvement (NRI) and integrated discrimination improvement (IDI), with FIB-4 serving as a clinical comparator.
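Category-free (continuous) NRI, IDI, and decision-curve net benefit can be sketched as below. Table 3 reports NRI as the sum of its event and non-event components (e.g., 0.91 + 0.80 = 1.71), which matches the continuous form. The sketch assumes that FIB-4 and each model are expressed as predicted probabilities and treats the outcome as binary (ignoring censoring); how FIB-4 was mapped to a probability and how dcurves handled censoring are not stated, so both are simplifications. Function names and data are hypothetical.

```python
import numpy as np


def continuous_nri(p_old, p_new, event):
    """Category-free NRI = NRI(events) + NRI(non-events)."""
    p_old, p_new = np.asarray(p_old, float), np.asarray(p_new, float)
    event = np.asarray(event, bool)
    up, down = p_new > p_old, p_new < p_old
    nri_events = up[event].mean() - down[event].mean()       # events should move up
    nri_nonevents = down[~event].mean() - up[~event].mean()  # non-events should move down
    return nri_events + nri_nonevents, nri_events, nri_nonevents


def idi(p_old, p_new, event):
    """Change in discrimination slope (mean risk in events minus non-events)."""
    p_old, p_new = np.asarray(p_old, float), np.asarray(p_new, float)
    event = np.asarray(event, bool)
    slope_new = p_new[event].mean() - p_new[~event].mean()
    slope_old = p_old[event].mean() - p_old[~event].mean()
    return slope_new - slope_old


def net_benefit(p, event, threshold):
    """Net benefit of treating everyone with predicted risk >= threshold."""
    p, event = np.asarray(p, float), np.asarray(event, bool)
    treat = p >= threshold
    n = event.size
    tp = np.sum(treat & event)
    fp = np.sum(treat & ~event)
    return tp / n - fp / n * threshold / (1 - threshold)


event = [1, 1, 0, 0]
p_fib4 = [0.2, 0.2, 0.2, 0.2]    # hypothetical FIB-4-based probabilities
p_model = [0.6, 0.3, 0.1, 0.25]  # hypothetical model probabilities
nri, nri_ev, nri_ne = continuous_nri(p_fib4, p_model, event)
idi_value = idi(p_fib4, p_model, event)
nb_model = net_benefit(p_model, event, threshold=0.25)
nb_treat_all = net_benefit(np.ones(4), event, threshold=0.25)
```

Because each component lies between −1 and 1, the continuous NRI ranges from −2 to 2; a component of 1.00 means every participant in that outcome group moved in the favorable direction.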

Software and reproducibility

All analyses were conducted in Python (version 3.14). Survival models were implemented using the scikit-survival library (version 0.22) for random survival forests, gradient-boosted survival analysis, and penalized Cox models, and the lifelines library (version 0.27) for standard Cox proportional hazards models and Kaplan–Meier estimation. Logistic regression and cross-validation procedures used scikit-learn (version 1.3). Missing-data imputation used scikit-learn's SimpleImputer class with the median and most_frequent strategies. Decision-curve analysis was performed using the dcurves package. All random seeds were set to a fixed value (seed = 42) to support reproducibility. Final model-specific hyperparameters were selected through the cross-validation procedure described above.

Ethical and governance considerations

Data were de-identified, and the study followed best-practice guidance for transparent reporting of AI-enabled prediction models.

Results

Baseline demographics

The study population included 1850 adults categorized as having obese MASLD (n = 817) or non-obese MASLD (n = 1033). Compared with the obese MASLD group, participants with non-obese MASLD were older (mean ± SD age, 60 ± 12 versus 56 ± 13 years; p < 0.001) and more likely to be male (59% versus 46%; p < 0.001). BMI was markedly higher in the obese MASLD phenotype (mean, 35.1 ± 4.8 kg/m²) than in the non-obese phenotype (mean, 25.7 ± 2.9 kg/m²; p < 0.001).

Participants with non-obese MASLD were more likely than those with obese MASLD to be current smokers (28% versus 23%) or former smokers (38% versus 34%) (p < 0.001). Physical activity levels were also higher among non-obese participants: 32% reported being more active than their peers, compared with 22% of participants with obese MASLD (p < 0.001). Conversely, a larger proportion of the obese group reported being less active than their peers (34% versus 21%) (Table 1). Table 1 describes the broader MASLD phenotype cohort (N = 1850), whereas the modeling cohort was restricted to participants with MASLD and linked mortality follow-up (N = 1774).

Table 1.

Baseline characteristics of NHANES III adults by MASLD phenotype.

Characteristic	Obese MASLD (N = 817)	Non-obese MASLD (N = 1033)	p	SMD
Age (years), mean (SD)	56 (12.82)	60 (12.18)	<0.001	−0.3212
Sex, N (%)			<0.001	−0.2687
Male	372 (46%)	609 (59%)
Female	445 (54%)	424 (41%)
Race/ethnicity, N (%)			<0.001	0.2335
Non-Hispanic White	301 (37%)	477 (46%)
Non-Hispanic Black	219 (27%)	241 (23%)
Mexican American	271 (33%)	290 (28%)
Other	26 (3.2%)	25 (2.4%)
Region, N (%)			<0.001	0.1081
Northeast	105 (13%)	136 (13%)
Midwest	156 (19%)	181 (18%)
South	358 (44%)	504 (49%)
West	198 (24%)	212 (21%)
Marital status, N (%)			<0.001	0.1595
Married/partnered	498 (61%)	698 (68%)
Widowed	120 (15%)	139 (13%)
Divorced/separated	123 (15%)	130 (13%)
Never married	74 (9.1%)	63 (6.1%)
Rural/urban, N (%)			0.2	0.0655
Metro	353 (43%)	413 (40%)
Non-metro	464 (57%)	620 (60%)
Poverty-income ratio, N (%)			0.003	0.1406
<1 (below poverty)	205 (25%)	199 (19%)
1–3	361 (44%)	461 (45%)
≥3	251 (31%)	373 (36%)
Body mass index (kg/m²), mean (SD)	35.1 (4.8)	25.7 (2.9)	<0.001	−0.3212
Smoking status, N (%)			<0.001	−0.2687
Never smoker	348 (43%)	353 (34%)
Former smoker	280 (34%)	393 (38%)
Current smoker	189 (23%)	287 (28%)		0.2335
Physical activity versus peers, N (%)			<0.001
About the same	351 (43%)	476 (47%)
Less active than peers	278 (34%)	207 (21%)
More active than peers	178 (22%)	324 (32%)
Club/organization membership (yes), N (%)	241 (29%)	346 (33%)	0.2	0.1081

There were 1774 adults with MASLD and linked mortality follow-up. Among them, 115 deaths (6.48%) were attributed to Alzheimer's disease and other dementias, and the median follow-up was 193 months. Baseline characteristics stratified by dementia-related mortality are presented in Table 2. Participants with subsequent dementia-related mortality were younger at the baseline NHANES assessment than those without dementia-related mortality (56.1 ± 11.9 versus 58.9 ± 12.2 years; p = 0.019). They also had higher BMI, waist circumference, fasting glucose, triglyceride, and ALT levels and a higher prevalence of diabetes. For most other variables, standardized mean differences were small, indicating generally modest baseline differences between groups.

Table 2.

Baseline characteristics of MASLD participants with and without subsequent dementia-related death.

Characteristic	Overall (N = 1774)	No dementia-related death (n = 1659)	Dementia-related death (n = 115)	SMD	p
Age, years	58.6956 ± 12.1757	58.8740 ± 12.1763	56.1217 ± 11.9238	−0.2263	0.0190
Female sex	0.4177	0.4141	0.4696	0.1119	0.2853
BMI, kg/m²	30.6103 ± 6.4605	30.4524 ± 6.3609	32.8980 ± 7.4288	0.3801	0.0002
Waist circumference, cm	104.9361 ± 13.8116	104.5851 ± 13.6135	110.0547 ± 15.6356	0.3978	0.0002
Diabetes mellitus	0.3636	0.3430	0.6609	0.6706	<0.0001
Hypertension	0.7210	0.7233	0.6870	−0.0798	0.4633
Dyslipidemia	0.2221	0.2224	0.2174	−0.0121	0.9924
Total cholesterol, mg/dL	222.0397 ± 51.4725	220.7188 ± 45.4057	241.1923 ± 105.7692	0.3992	0.0497
Triglycerides, mg/dL	194.8908 ± 153.4054	189.5889 ± 120.2691	271.7692 ± 392.8041	0.5397	0.0081
HDL cholesterol, mg/dL	47.5436 ± 14.6284	47.6516 ± 14.7799	45.9200 ± 12.2608	−0.1183	0.5672
Fasting glucose, mg/dL	123.7963 ± 60.3305	120.0908 ± 52.7737	178.0962 ± 116.1944	0.9880	<0.0001
AST, U/L	25.7268 ± 22.7704	25.2567 ± 18.9756	32.7600 ± 54.2880	0.3302	0.1108
ALT, U/L	19.8371 ± 24.8268	18.7380 ± 15.4130	36.2800 ± 78.9232	0.7163	0.0006
GGT, U/L	49.8176 ± 76.2983	49.8598 ± 77.5383	48.6667 ± 27.6598	−0.0156	0.9701
Platelet count, x10³/µL	279.7531 ± 78.3816	279.5411 ± 78.3160	283.0833 ± 81.0344	0.0451	0.8303
Albumin, g/dL	4.2045 ± 0.3704	4.2075 ± 0.3585	4.1600 ± 0.5252	−0.1281	0.5356
FIB-4 score	1.4154 ± 1.1549	1.4207 ± 1.1613	1.3355 ± 1.0736	−0.0737	0.7269
NFS score	−0.8333 ± 1.7011	−0.8344 ± 1.6783	−0.8174 ± 2.0512	0.0100	0.9622
High-risk FIB-4 (>2.67)	0.0924	0.0934	0.0783	−0.0542	0.7064
High-risk NFS (>0.676)	0.0366	0.0374	0.0261	−0.0644	0.7142
Follow-up, years	15.6493 ± 7.7177	15.6203 ± 7.7453	16.0681 ± 7.3275	0.0580	0.5475

Continuous variables are shown as mean ± SD; binary variables are shown as proportions. BMI: body mass index; HDL: high-density lipoprotein; AST: aspartate aminotransferase; ALT: alanine aminotransferase; GGT: gamma-glutamyl transferase; FIB-4: Fibrosis-4 index; NFS: non-alcoholic fatty liver disease (NAFLD) fibrosis score; SMD: standardized mean difference.
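The SMD column can be checked by hand. The text does not state the formula; by our arithmetic, the Age SMD is reproduced by the difference in means divided by the n-weighted pooled SD, and the Female-sex SMD by the difference in proportions divided by the square root of the average Bernoulli variance. Treat both conventions as inferred, not confirmed; the function names are hypothetical.

```python
import math


def smd_continuous(m1, s1, n1, m0, s0, n0):
    """(mean1 - mean0) / pooled SD, pooling variances with n - 1 weights."""
    pooled_var = ((n1 - 1) * s1**2 + (n0 - 1) * s0**2) / (n1 + n0 - 2)
    return (m1 - m0) / math.sqrt(pooled_var)


def smd_binary(p1, p0):
    """(p1 - p0) / sqrt of the average Bernoulli variance of the two groups."""
    return (p1 - p0) / math.sqrt((p1 * (1 - p1) + p0 * (1 - p0)) / 2)


# Dementia-related death (n = 115) vs no dementia-related death (n = 1659), Table 2.
smd_age = smd_continuous(56.1217, 11.9238, 115, 58.8740, 12.1763, 1659)
smd_female = smd_binary(0.4696, 0.4141)
```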

Cross validation

In 5-fold cross-validation, discriminative performance was modest across models. Logistic regression showed the highest overall discrimination, with a C-index of 0.69 (±0.03) and an AUC of 0.69 (±0.04). Among the survival-based models, the random survival forest demonstrated the highest discriminative performance, whereas the gradient-boosted survival model showed the lowest Brier score. Overall, cross-validated performance differences between models were modest.

Bootstrap validation

The penalized Cox proportional-hazards model demonstrated the highest discriminative ability, achieving a C-index of 0.71 (95% confidence interval [CI], 0.63 to 0.77). The logistic regression model showed comparable performance, with a C-index of 0.67 (95% CI, 0.60 to 0.76). The random survival forest and gradient-boosted survival models yielded lower point estimates, with bootstrap C-indices of 0.65 (95% CI, 0.56 to 0.73) and 0.61 (95% CI, 0.52 to 0.70), respectively. Across all evaluated models, bootstrapped AUC values ranged from 0.59 to 0.66.

Final model

After internal validation, each model was refit on the full analytic cohort to generate final model outputs for feature importance, risk stratification, ROC visualization, and clinical utility analyses. These full-dataset refit outputs are reported separately from the cross-validation and bootstrap validation results.

Feature importance

Waist circumference was the most influential predictor (importance, 0.15), followed closely by diabetes (0.15), BMI (0.14), and age (0.14) (Figure 1).

Figure 1.

Clinical features ranked by model importance.

Risk stratification

In exploratory risk stratification using the final refitted model, all observed dementia-related deaths were classified within the high-risk group (115 deaths among 592 participants; 19.4%).

Clinical utility

The gradient-boosted survival model had the largest NRI (1.71) and a substantial IDI (0.54) relative to the FIB-4 score (Table 3). In decision-curve analysis, this model showed higher net benefit across the evaluated threshold probabilities than the other models and the treat-all strategy. The apparent ROC curves from the final full-dataset refit models are shown in Supplemental Figure 1.

Table 3.

Reclassification metrics and discriminative improvement compared with FIB-4.

Model	NRI	IDI	NRI (Events)	NRI (Non-events)
Gradient Boosted Survival	1.71	0.54	0.91	0.80
Random Survival Forest	0.37	0.36	1.00	−0.63
Logistic Regression	0.03	0.18	1.00	−0.97
Cox PH (Penalized)	0.00	0.68	1.00	−1.00

Discussion

Summary of findings

In this longitudinal study of adults with MASLD, we identified distinct demographic and clinical profiles between obese and non-obese phenotypes. Internal validation showed modest discrimination across models. In exploratory downstream analyses based on the full-dataset refits, the gradient-boosted survival model showed favorable apparent calibration and clinical utility. Metabolic and anthropometric features, particularly waist circumference, diabetes, BMI, and age, emerged consistently as the most important predictors of dementia-related mortality risk among the NHANES III population with MASLD. Exploratory risk stratification using the final refitted gradient-boosted model classified all observed dementia-related deaths within the high-risk group. Decision-curve and reclassification analyses suggested potential clinical utility beyond FIB-4; however, these downstream findings should be interpreted cautiously in light of the modest internally validated discrimination, low event count, and potential overfitting.

Liver-brain axis

Our findings align with existing literature showing that metabolic and vascular derangements carry cognitive consequences and that composite models can capture additive risk beyond single biomarkers.¹³ Neuroinflammation has been proposed as one potential mechanism linking liver dysfunction and neurodegenerative outcomes; it is characterized by microglial activation and the in situ synthesis of proinflammatory cytokines such as TNF, IL-1β, and IL-6.¹⁴ In MASLD, proinflammatory cytokines may contribute to blood-brain barrier dysfunction, allowing brain exposure to circulating metabolites such as ammonia, lactate, and manganese, which have been implicated in neuroinflammatory responses and neuronal injury.¹⁴ Specifically, elevated ammonia levels promote glutamine accumulation in astrocytes, leading to cerebral edema and Alzheimer type II astrocyte morphology.¹⁴ High brain lactate concentrations further correlate with the severity of clinical symptoms and with the release of TNF and IL-6 from microglia.¹⁴ In addition, because the liver is the primary site of peripheral amyloid-β (Aβ) metabolism, hepatic dysfunction may impair Aβ clearance, contributing to Alzheimer's disease progression.¹⁴

Obesity

The identification of waist circumference, diabetes, and BMI as primary predictors suggests that these metabolic drivers carry more prognostic information than liver-specific fibrosis markers such as FIB-4. In obesity, expanded visceral fat depots secrete adipokines and proinflammatory cytokines that promote neuroinflammation.¹⁵ Insulin resistance, common in MASLD and type 2 diabetes, interferes with neuronal insulin signaling critical for synaptic plasticity and memory, amplifies amyloid-β deposition and tau hyperphosphorylation through competition for insulin-degrading enzyme (IDE) and dysregulation of GSK-3β, and heightens oxidative stress.¹⁶

Implications

MASLD may contribute to neuroinflammatory and vascular pathways through hepatic steatosis, fibrosis, and systemic metabolic dysfunction. Although our predictive models cannot establish causal pathways, the prominence of metabolic and anthropometric predictors is consistent with a broader hypothesis linking visceral adiposity, insulin resistance, dyslipidemia, vascular injury, and neuroinflammatory processes with dementia-related mortality. While strategies targeting ammonia metabolism or systemic inflammation have been explored in related neurological or hepatic contexts, the present findings do not establish whether such interventions would reduce dementia-related mortality in MASLD. Instead, our results suggest that individualized risk stratification may help identify metabolically vulnerable patients who warrant further study in future mechanistic and interventional research.

Limitations and strengths

This study has several important limitations. First, its reliance on NHANES III restricts generalizability to contemporary MASLD populations. Second, the relatively small number of dementia-related deaths compared with the number of candidate predictors and evaluated models raises concerns regarding statistical power, model stability, and overfitting. The discrepancy between modest internally validated performance and substantially higher apparent performance in the final full-dataset refit models further supports cautious interpretation. The ROC curves from the final refit models represent apparent in-sample performance and may overestimate generalizable discrimination, particularly for flexible machine-learning approaches in the setting of a low event count. Accordingly, the cross-validation and bootstrap estimates should be considered the more conservative assessment of model performance. Although internal validation was performed using cross-validation and bootstrap resampling, these procedures cannot fully overcome the limitations imposed by a low event count. The findings should therefore be interpreted as exploratory and hypothesis-generating, and external validation in larger cohorts with more dementia-related mortality events is needed before clinical implementation. Third, although missingness was assessed for all candidate predictors before model development and participants with missing outcome or follow-up data were excluded, missing predictor values were handled by simple median or mode imputation, and residual bias from missing data cannot be ruled out given the retrospective design. Fourth, the analysis was based on existing NHANES III and mortality data, so the observed associations between MASLD-related factors and dementia-related mortality cannot establish causation. Finally, models were developed using variables available in NHANES III; emerging biomarkers, neurocognitive testing, imaging markers, and genetic data were not available and could potentially improve predictive performance.

The observation that participants with subsequent dementia-related mortality were younger at baseline than those without dementia-related mortality should also be interpreted cautiously. This comparison reflects age at NHANES assessment rather than age at dementia onset or age at death, and therefore does not imply that dementia-related deaths occurred at younger ages. One possible explanation is that participants who later experienced dementia-related mortality had a higher burden of adverse metabolic features at baseline, including diabetes, higher BMI, greater waist circumference, higher fasting glucose, and higher triglyceride levels. Prior studies have linked midlife metabolic dysfunction, obesity, and diabetes with later dementia risk, supporting the possibility that an earlier metabolic risk burden may contribute to subsequent neurodegenerative vulnerability.^17–19 However, given the small number of dementia-related deaths and the potential influence of competing mortality and selective survival, this finding should be considered exploratory. Finally, although NHANES is nationally representative of the United States, the findings may not generalize to non-U.S. populations or to healthcare systems with different demographic and metabolic risk profiles.

Notwithstanding these limitations, the study has several notable strengths. It leverages a large, nationally representative U.S. cohort with long-term mortality follow-up, enabling examination of dementia-related mortality over nearly two decades. The use of multiple survival-modeling frameworks offers methodological triangulation, and these frameworks consistently identified metabolic predictors. Internal validation through both 5-fold cross-validation and 200-iteration bootstrapping helped mitigate, although not eliminate, concerns about overfitting. The study also evaluated clinical utility using decision-curve analysis and systematically quantified reclassification metrics, offering insight into potential improvements over existing clinical markers such as FIB-4. Together, these elements provide a foundation for future work, particularly external validation in modern MASLD cohorts and refinement of model calibration and generalizability.
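Decision-curve analysis and reclassification metrics are named here without formulas. The sketch below shows both in their simplest form, assuming a binary outcome defined at a fixed horizon (it ignores censoring, which the study's survival setting would need to handle) and hypothetical ordered risk categories. The functions `net_benefit` and `categorical_nri` are illustrative names, not the authors' code, and no category cut-points are taken from the study. Net benefit at threshold probability p_t is TP/n − (FP/n) · p_t/(1 − p_t); the categorical net reclassification improvement (NRI) sums the net proportion of events moved up and of non-events moved down.

```python
def net_benefit(y_true, risk, threshold):
    """Net benefit of acting on participants whose predicted risk >= threshold.

    y_true: 1 if the outcome occurred by the chosen horizon, else 0
    (assumes no censoring before the horizon, a simplification).
    """
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, r in zip(y_true, risk) if r >= threshold and y == 1)
    fp = sum(1 for y, r in zip(y_true, risk) if r >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - (fp / n) * threshold / (1 - threshold)


def categorical_nri(y_true, old_category, new_category):
    """Categorical NRI of a new model over an old one (e.g., a FIB-4-based rule).

    Categories are ordered integers (e.g., 0 = low risk, 1 = high risk).
    Returns (nri_events, nri_nonevents, total_nri).
    """
    up_e = down_e = n_e = 0
    up_ne = down_ne = n_ne = 0
    for y, old, new in zip(y_true, old_category, new_category):
        if y == 1:
            n_e += 1
            up_e += int(new > old)      # events moved to a higher category
            down_e += int(new < old)
        else:
            n_ne += 1
            up_ne += int(new > old)
            down_ne += int(new < old)   # non-events moved to a lower category
    if n_e == 0 or n_ne == 0:
        raise ValueError("need at least one event and one non-event")
    nri_events = (up_e - down_e) / n_e
    nri_nonevents = (down_ne - up_ne) / n_ne
    return nri_events, nri_nonevents, nri_events + nri_nonevents


print(net_benefit([1, 0, 1, 0, 0], [0.8, 0.6, 0.4, 0.2, 0.1], 0.25))
print(categorical_nri([1, 1, 0, 0, 0, 0], [0, 1, 1, 0, 0, 1], [1, 1, 0, 0, 1, 1]))
```

Plotting `net_benefit` across a range of thresholds, alongside "treat all" and "treat none" strategies, yields a decision curve; a positive total NRI indicates that, on balance, the new model moves participants toward the correct category relative to the comparator.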

Conclusion

In this nationally representative MASLD cohort, survival-based machine-learning models demonstrated the potential to identify individuals at elevated risk for dementia-related mortality, with metabolic factors such as waist circumference, diabetes, BMI, and age emerging as the strongest predictors. Although internal validation supported modest discriminative performance, the low event rate, reliance on historical NHANES III data, and evidence of overfitting underscore the need for cautious interpretation. The study nonetheless highlights the promise of modern survival-modeling approaches for dementia-risk estimation in metabolically vulnerable populations. Future research should focus on external validation in contemporary cohorts, improved calibration, and integration of neurological, genetic, and social determinants to develop clinically reliable prediction tools.

Supplemental Material

Supplemental material, sj-docx-1-alr-10.1177_25424823261465039 for Prediction of dementia-related mortality in metabolic dysfunction-associated steatotic liver disease using survival machine-learning models by Basile Njei, Sarpong Boateng, Solomon Gyabaah, Guy Loic Nguefang Tchoukeu, Yazan Al-Ajlouni and Ulrick Sidney Kanmounye in Journal of Alzheimer's Disease Reports

Footnotes

Acknowledgements

No external assistance was sought or received for the design, execution, analysis, or reporting of this study.

ORCID iDs

Basile Njei

Solomon Gyabaah

Guy Loic Nguefang Tchoukeu

Yazan Al-Ajlouni

Ulrick Sidney Kanmounye

Ethical considerations

This study used publicly available, de-identified data from NHANES III with linked mortality follow-up. In accordance with applicable policies for publicly available de-identified datasets, formal institutional review board approval was not required for this secondary analysis.

Author contribution(s)

Basile Njei: Conceptualization; Data curation; Formal analysis; Investigation; Methodology; Software; Supervision; Validation; Writing – original draft; Writing – review & editing.

Sarpong Boateng: Data curation; Formal analysis; Investigation; Software; Writing – original draft; Writing – review & editing.

Solomon Gyabaah: Data curation; Investigation; Methodology; Project administration; Validation; Writing – original draft; Writing – review & editing.

Guy Loic Nguefang Tchoukeu: Investigation; Software; Validation; Writing – original draft; Writing – review & editing.

Yazan Al-Ajlouni: Data curation; Investigation; Methodology; Validation; Writing – original draft; Writing – review & editing.

Ulrick Sidney Kanmounye: Data curation; Investigation; Methodology; Supervision; Validation; Writing – original draft; Writing – review & editing.

Funding

The authors received no financial support for the research, authorship, and/or publication of this article.

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Data availability statement

The data underlying this study are publicly available through NHANES, a nationally representative survey of the non-institutionalized U.S. civilian population conducted by the National Center for Health Statistics using a complex, multistage, stratified probability sampling design. NHANES data are available at: .

Supplemental material

Supplemental material for this article is available online.

References

1. Feng, Targher, Byrne. Global burden of metabolic dysfunction-associated steatotic liver disease, 2010 to 2021. JHEP Reports 2025; 7: 101271.

2. Targher, Byrne, Tilg. MASLD: a systemic metabolic disorder with cardiovascular and malignant complications. Gut 2024; 73: 691–702.

3. Basu, Mehta, Zhang. Association of chronic liver disease with cognition and brain volumes in two randomized controlled trial populations. J Neurol Sci 2022; 434: 120117.

4. George, Sood, Daly, et al. Is there an association between non-alcoholic fatty liver disease and cognitive function? A systematic review. BMC Geriatr 2022; 22: 47.

5. Weinstein, Zelber-Sagi, Preis. Association of nonalcoholic fatty liver disease with lower brain volume in healthy middle-aged adults in the Framingham Study. JAMA Neurol 2018; 75: 97–104.

6. Shang, Widman, Hagström. Nonalcoholic fatty liver disease and risk of dementia. Neurology 2022; 99: e574–e582.

7. Sterling, Duarte-Rojo, Patel. AASLD Practice guideline on imaging-based noninvasive liver disease assessment of hepatic fibrosis and steatosis. Hepatology 2025; 81: 672–724.

8. Meng, Zheng, Zhang. Noninvasive evaluation of liver fibrosis using real-time tissue elastography and transient elastography (FibroScan). J Ultrasound Med 2015; 34: 403–410.

9. van Katwyk, Coyle, Cooper. Transient elastography for the diagnosis of liver fibrosis: a systematic review of economic evaluations. Liver Int 2017; 37: 851–861.

10. Wang, Reddy. Machine learning for survival analysis: a survey. ACM Comput Surv 2019; 51: 1–36.

11. Huang, et al. Application of machine learning in predicting survival outcomes involving real-world data: a scoping review. BMC Med Res Methodol 2023; 23: 68.

12. Collins, Moons KGM, Dhiman. TRIPOD+AI statement: updated guidance for reporting clinical prediction models that use regression or machine learning methods. Br Med J 2024; 385: e078378.

13. Gong, Harris, Peters SAE, et al. Serum lipid traits and the risk of dementia: a cohort study of 254,575 women and 214,891 men in the UK biobank. eClinicalMed 2022; 54: 101695.

14. Yan, Man, Sun. Gut liver brain axis in diseases: the implications for therapeutic interventions. Signal Transduct Target Ther 2023; 8: 43.

15. Kiliaan, Arnoldussen IAC, Gustafson. Adipokines: a link between obesity and dementia? Lancet Neurol 2014; 13: 913–923.

16. Wei, Koya, Reznik. Insulin resistance exacerbates Alzheimer disease via multiple mechanisms. Front Neurosci 2021; 15: 687157.

17. Zhu, Luo, et al. Age at diagnosis of diabetes, obesity, and the risk of dementia among adult patients with type 2 diabetes. PLoS One 2024; 19: e0310964.

18. Whitmer, Gunderson, Barrett-Connor, et al. Obesity in middle age and future risk of dementia: a 27 year longitudinal population based study. Br Med J 2005; 330: 1360.

19. Atti A-R, Gatz, et al. Midlife overweight and obesity increase late-life dementia risk: a population-based twin study. Neurology 2011; 76: 1568–1574.
