Abstract
Background
Benzo[a]pyrene (BaP), a common environmental neurotoxicant, has been linked to neurodegenerative diseases, yet its role in Alzheimer's disease (AD) remains unclear.
Objective
This study integrated network toxicology, machine learning, single-cell transcriptomics, and bibliometric analysis to explore BaP's mechanistic role in AD.
Methods
A total of 253 BaP-AD common targets were identified and analyzed via GO/KEGG enrichment and PPI network construction. Key genes were screened using GEO-based AD differential expression data and machine learning (LASSO and SVM). Molecular docking assessed BaP–target interactions. Cell-type-specific expression was analyzed using single-cell RNA-seq (GSE157827). ROC curves evaluated diagnostic value, and bibliometrics explored research trends.
Results
Targets were enriched in oxidative stress and MAPK/PI3K-Akt pathways. EGFR and HSP90AB1 were identified as core targets, with strong BaP binding affinities (−8.4 and −11.7 kcal/mol, respectively). EGFR was highly expressed in astrocytes and OPCs; HSP90AB1 in astrocytes and neurons. EGFR had better diagnostic performance (AUC = 0.781). Bibliometric analysis showed increasing attention on EGFR's role in AD.
Conclusions
BaP may promote AD by targeting EGFR and HSP90AB1, affecting inflammation, proteostasis, and survival pathways. Notably, EGFR may emerge as a promising biomarker for early diagnosis and therapeutic intervention in AD. This study suggests potential molecular mechanisms linking environmental toxicants to AD pathogenesis.
Keywords
Introduction
Alzheimer's disease (AD), the leading cause of dementia, shows a steadily rising incidence in parallel with global population aging. 1 As a major and escalating public health challenge, AD imposes a profound burden on both individuals and society. 2 An increasing number of studies have highlighted the role of modifiable risk factors in AD pathogenesis, including cardiovascular comorbidities, auditory impairment, infectious pathogens, and traumatic brain injury. 3 In addition to these established factors, emerging evidence has emphasized the potential role of environmental exposures in modulating AD risk. Notably, chronic exposure to air pollutants such as NO2 and PM2.5 has been increasingly linked to a heightened risk of AD.4,5 Conversely, sustained exposure to natural green environments has been proposed to confer a protective effect, potentially mitigating AD risk over the long term. 6
Benzo[a]pyrene (BaP), a neurotoxic polycyclic aromatic hydrocarbon and ubiquitous environmental pollutant, has been shown to disrupt neuronal function 7 and promote apoptosis. 8 Accumulating evidence also links BaP exposure to cognitive impairment. 9 Furthermore, animal studies have demonstrated that chronic low-level exposure to BaP can induce neurodegenerative-like phenotypes in zebrafish models. 10 Although emerging evidence suggests a potential link, it remains unclear whether BaP contributes to the development of AD, and the molecular mechanisms underlying its neurotoxic effects are yet to be clarified.
Network toxicology is an emerging interdisciplinary field that integrates computational tools with multi-omics biological data to elucidate the mechanistic links between environmental toxicants and human disease. 11 In this study, we employed a comprehensive systems-level approach that combines machine learning, network toxicology, single-cell transcriptomics, molecular docking, and bibliometric analysis to explore the potential role of BaP in the progression of AD. Our goal is to identify key molecular targets and pathways through which BaP may influence AD pathogenesis.
Methods
Acquisition of chemical component and targets of BaP
The SMILES structure of BaP was retrieved from the PubChem database. Toxicity profiles were subsequently predicted using the ProTox and ADMETlab platforms. To identify potential target genes of BaP, we queried three independent databases: PharmMapper, SEA, and ChEMBL. The resulting target gene sets were integrated using R software for downstream analyses.
Acquisition of AD-related targets
AD-related target genes were retrieved from the TTD, OMIM, and GeneCards databases (relevance score threshold >10). The gene lists obtained from the three sources were then aggregated and deduplicated using R software for subsequent analysis.
Establishment of the protein-protein interaction (PPI) network and screening of hub targets
A Venn diagram was constructed using R software to identify overlapping targets between BaP and AD, yielding 253 shared genes. PPI analysis of these common targets was subsequently conducted utilizing the STRING database. The analysis parameters were set as follows: organism species was Homo sapiens, and the minimum required interaction score was defined as “high confidence” (>0.7).
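The overlap step described above amounts to a set intersection over gene symbols. The following is a minimal sketch of that operation in Python rather than the R workflow used in the study; the function name `shared_targets` and the toy gene lists are hypothetical and are not the study's data.

```python
def shared_targets(toxicant_targets, disease_targets):
    """Return the sorted overlap between two collections of gene symbols.

    Symbols are stripped and upper-cased first so that formatting
    differences between databases do not hide a true overlap.
    """
    def normalize(genes):
        return {g.strip().upper() for g in genes if g.strip()}

    return sorted(normalize(toxicant_targets) & normalize(disease_targets))


# Toy inputs for illustration only (not the 473 / 3724 genes of the study).
bap_targets = ["EGFR", "HSP90AB1", "CYP1A1", "AHR", "egfr "]
ad_targets = ["APP", "EGFR", "HSP90AB1", "MAPT"]

overlap = shared_targets(bap_targets, ad_targets)
print(overlap)  # ['EGFR', 'HSP90AB1']
```

Normalizing symbols before intersecting is a design choice of this sketch; whether the original analysis harmonized symbols in this way is not stated.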
Interaction data from the STRING database were visualized and analyzed for network topology using Cytoscape (v3.9.1). The CytoHubba plugin was utilized to evaluate node importance according to the Maximal Clique Centrality (MCC) algorithm, and the top 10 hub genes were determined according to node degree. In addition, modular clustering of the PPI network was performed utilizing the MCODE plugin to detect highly interconnected sub-networks, which may represent functionally significant protein complexes.
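Ranking nodes by degree after applying a confidence cutoff can be sketched as follows. This is a simplified illustration, not the CytoHubba implementation: it computes plain node degree rather than MCC, the function name `top_hubs` is hypothetical, and the edge scores are invented toy values.

```python
from collections import Counter


def top_hubs(edges, min_score=0.7, k=10):
    """Rank nodes by degree using only edges at or above min_score.

    edges: iterable of (protein_a, protein_b, combined_score) tuples.
    Self-loops and duplicate pairs (in either direction) are ignored.
    """
    degree = Counter()
    seen = set()
    for a, b, score in edges:
        pair = frozenset((a, b))
        if a == b or score < min_score or pair in seen:
            continue
        seen.add(pair)
        degree[a] += 1
        degree[b] += 1
    # Highest degree first; ties are broken alphabetically for stable output.
    return sorted(degree.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


# Toy edge list with made-up scores (not STRING output).
edges = [
    ("EGFR", "SRC", 0.99),
    ("EGFR", "GRB2", 0.98),
    ("EGFR", "HSP90AB1", 0.85),
    ("SRC", "GRB2", 0.90),
    ("TP53", "HSP90AB1", 0.75),
    ("TP53", "ALB", 0.45),   # below the cutoff, dropped
    ("SRC", "EGFR", 0.99),   # duplicate of the first edge, ignored
]

hubs = top_hubs(edges, k=3)
print(hubs)
```

In this toy network EGFR has the most retained neighbors, and ALB is absent from the ranking because its only edge falls below the cutoff.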
Gene pathway enrichment analysis
To elucidate the potential biological functions and pathways related to the 253 shared target genes, Kyoto Encyclopedia of Genes and Genomes (KEGG) and Gene Ontology (GO) enrichment analyses were performed. The results were then visualized to highlight significantly enriched cellular components (CC), molecular functions (MF), biological processes (BP), and signaling pathways.
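GO and KEGG over-representation analyses are commonly based on a one-sided hypergeometric test. The sketch below shows that calculation for a single term under stated assumptions: the function name `enrichment_p`, the background size of 20000 genes, and the term size of 300 are hypothetical values chosen for illustration, and the enrichment tool actually used in the study is not specified in the text.

```python
from math import comb


def enrichment_p(background, annotated, selected, hits):
    """One-sided hypergeometric p-value P(X >= hits).

    background: number of genes in the universe
    annotated:  genes in the universe annotated to the term
    selected:   size of the query gene list
    hits:       query genes annotated to the term
    """
    upper = min(annotated, selected)
    total = comb(background, selected)
    tail = sum(
        comb(annotated, k) * comb(background - annotated, selected - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Hypothetical example: 20 of 253 query genes fall in a 300-gene term.
p_toy = enrichment_p(background=20000, annotated=300, selected=253, hits=20)
print(f"p = {p_toy:.3e}")
```

Because many terms are tested at once, the resulting p-values would normally be adjusted for multiple testing (for example with the Benjamini-Hochberg procedure) before terms are reported as enriched.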
Expression analysis of key genes in the AD dataset in GEO
Publicly available gene expression datasets relevant to AD were retrieved from the GEO database using the search term “Alzheimer's disease”. Five datasets (GSE5281,12–15 GSE29378,16,17 GSE36980, 18 GSE37263, 19 and GSE138260 20) were selected based on relevance and data quality. After downloading and pre-processing the raw data from the selected GEO datasets, normalization was performed. Genes with p-values <0.05 in the differential expression analysis were considered differentially expressed, which yielded a total of 1111 differentially expressed genes (DEGs). Subsequently, the 1111 DEGs were intersected with the 253 previously identified BaP–AD common target genes using R software, resulting in the identification of 21 overlapping core genes. The intersecting gene set was visualized using a Venn diagram to facilitate identification of shared targets.
Core target selection by machine learning algorithms
To further identify key biomarkers with diagnostic potential among the 21 core genes, machine learning algorithms were employed for feature selection. Specifically, two commonly used algorithms, least absolute shrinkage and selection operator (LASSO) regression and support vector machine (SVM) with recursive feature elimination (RFE), were implemented in R software. RFE was applied to refine feature selection and identify the optimal variable subset for maximizing model performance. LASSO regression was performed utilizing the “glmnet” R package 21 with the following parameters: normalized = TRUE, α = 1, family = “gaussian”, and nfolds = 3. SVM-RFE analysis was performed utilizing the “mlbench” and “caret” R packages. The 17 genes selected by both methods were considered robust candidate biomarkers.
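To illustrate how an L1 penalty performs feature selection, the sketch below fits a LASSO model (squared-error loss, α = 1) by cyclic coordinate descent with soft-thresholding. It is a didactic stand-in, not the glmnet implementation: it assumes centered and scaled columns, fits no intercept, uses a fixed penalty instead of cross-validation, and the names `soft_threshold` and `lasso_cd` as well as the toy data are hypothetical.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_cd(X, y, lam, n_iter=100):
    """Minimize (1/2n) * ||y - X b||^2 + lam * ||b||_1 by coordinate descent.

    X: list of rows (samples x features), columns assumed centered.
    y: list of centered responses.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0   # correlation of feature j with the partial residual
            norm = 0.0  # squared norm of feature j
            for i in range(n):
                others = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - others)
                norm += X[i][j] ** 2
            beta[j] = soft_threshold(rho / n, lam) / (norm / n)
    return beta


# Toy data: y depends strongly on feature 0 and only weakly on feature 1.
X = [[1, 1], [1, -1], [-1, 1], [-1, -1]]
y = [2.1, 1.9, -1.9, -2.1]

beta = lasso_cd(X, y, lam=0.5)
selected = [j for j, b in enumerate(beta) if b != 0.0]
print(beta, selected)
```

Features whose coefficients are shrunk exactly to zero are discarded; in this toy example only feature 0 is retained. In practice the penalty strength is chosen by cross-validation, as the nfolds parameter above indicates.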
Analysis of differential expression and diagnostic performance of 17 core genes
To further assess the diagnostic accuracy of the 17 identified core genes, receiver operating characteristic (ROC) curve analyses were performed using mRNA expression data from five pre-processed datasets containing both AD and normal control samples. To evaluate diagnostic accuracy, the area under the curve (AUC) was determined for each gene.
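For a single gene, the AUC equals the probability that a randomly chosen AD sample has a higher expression value than a randomly chosen control, with ties counted as one half. The sketch below computes it directly from that definition; the function name `roc_auc` and the expression values are hypothetical, and the R package used for the ROC analysis in the study is not stated.

```python
def roc_auc(scores_pos, scores_neg):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties contribute 0.5, which matches the Mann-Whitney U formulation.
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Toy expression values for one gene (not taken from the GEO datasets).
ad_expr = [2.1, 3.4, 2.9, 3.8]
nc_expr = [1.9, 2.5, 2.0, 3.0]

auc = roc_auc(ad_expr, nc_expr)
print(auc)  # 0.8125
```

A gene that is lower in AD than in controls gives an AUC below 0.5 under this orientation; swapping the two groups returns 1 − AUC.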
Molecular docking and single-cell level expression and distribution analysis of key genes
To identify the most critical targets, we intersected the top 10 hub genes with the 17 machine learning–derived core genes and obtained two overlapping key genes: EGFR and HSP90AB1. Subsequently, molecular docking studies were performed to explore the specific binding interactions between BaP and these two target proteins. The three-dimensional structures of EGFR and HSP90AB1 were retrieved from the Protein Data Bank, followed by docking simulations performed via the CB-Dock2 platform. This enabled structure optimization and estimation of binding affinities, providing insights into the potential binding mechanisms between BaP and the target proteins.
To characterize the cell-type-specific expression of key targets, snRNA-seq datasets for AD were sourced from the GEO database. The dataset GSE157827,22,23 generated using 10x Genomics technology, includes snRNA-seq profiles from 12 AD brain tissue samples and 9 normal controls (NC). Preprocessing and quality control of the data were conducted with the Seurat package in R. PCA was performed, and statistically significant principal components were identified utilizing the JackStraw function. The Harmony package was used for batch effect correction, and clustering was conducted via the FindClusters function. Samples were grouped according to disease status and visualized using UMAP, followed by manual annotation of cell types. Finally, the expression patterns of EGFR and HSP90AB1 were examined and compared between AD and control groups to characterize their distribution across distinct cell populations.
Bibliometric analysis
Given that EGFR exhibited a higher AUC compared to HSP90AB1, it is considered to have greater diagnostic potential in AD. To gain deeper insights into current research directions on EGFR in AD, we carried out a bibliometric analysis. We retrieved relevant literature from the Web of Science Core Collection (WoSCC)24,25 on July 17, 2025, using a predefined search strategy as detailed below: [TS = (“Alzheimer Syndrome” OR “Alzheimer Disease” OR “Alzheimer Type Dementia (ATD)” OR “Alzheimer-Type Dementia (ATD)” OR “Alzheimer's Diseases” OR “Dementia, Alzheimer-Type (ATD)” OR “Alzheimers Diseases” OR “Alzheimer Diseases” OR “Dementia, Alzheimer” OR “Alzheimer's Disease” OR “Alzheimer Dementia” OR “Alzheimer Dementias” OR “Senile Dementia” OR “Dementia, Senile” OR “Alzheimer Type Dementia” OR “Senile Dementia, Alzheimer Type” OR “Dementia, Alzheimer Type” OR “Primary Senile Degenerative Dementia” OR “Alzheimer Type Senile Dementia” OR “Sclerosis, Alzheimer” OR “Alzheimer Sclerosis” OR “Dementia, Primary Senile Degenerative” OR “Senile Dementia, Acute Confusional” OR “Presenile Dementia” OR “Alzheimer Disease, Early Onset” OR “Dementia, Presenile” OR “Acute Confusional Senile Dementia” OR “Presenile Alzheimer Dementia” OR “Early Onset Alzheimer Disease” OR “Late Onset Alzheimer Disease” OR “Alzheimer Disease, Late Onset” OR “Alzheimer's Disease, Focal Onset” OR “Focal Onset Alzheimer's Disease” OR “Familial Alzheimer Disease (FAD)” OR “Alzheimer Disease, Familial (FAD)” OR “Familial Alzheimer Diseases (FAD)”) AND TS = (“ErbB Receptors” OR “Receptors, ErbB” OR “Receptor, Transforming Growth Factor alpha” OR “Receptor, Transforming-Growth Factor alpha” OR “Transforming Growth Factor alpha Receptor” OR “Urogastrone Receptor” OR “Receptor, Urogastrone” OR “Receptor, TGF-alpha” OR “TGF-alpha Receptor” OR “Receptor, TGF alpha” OR “ErbB Receptor” OR “Epidermal Growth Factor Receptor Kinase” OR “Epidermal Growth Factor Receptor Protein-Tyrosine Kinase” OR “Receptor, ErbB” OR “HER Family Receptors” OR “Epidermal Growth Factor Receptor Protein Tyrosine Kinase” OR “Receptors, HER Family” OR “Family Receptors, HER” OR “Family Receptor, HER” OR “HER Family Receptor” OR “EGF Receptors” OR “Receptor, HER Family” OR “Epidermal Growth Factor Receptor” OR “Receptors, EGF” OR “Receptors, Epidermal Growth Factor Urogastrone” OR “Receptors, Epidermal Growth Factor-Urogastrone” OR “EGF Receptor” OR “Receptor, EGF” OR “Epidermal Growth Factor Receptor Family Protein” OR “Epidermal Growth Factor Receptor Family Proteins” OR “Receptors, Epidermal Growth Factor” OR “Receptor, ErbB-1” OR “Receptor, Epidermal Growth Factor” OR “Receptor, ErbB 1” OR “ErbB-1 Receptor” OR “Receptor Tyrosine protein Kinase erbB 1” OR “Receptor Tyrosine-protein Kinase erbB-1” OR “c-ErbB-1 Protein, Proto-oncogene” OR “Proto-oncogene c-ErbB-1 Protein” OR “erbB-1 Proto-Oncogene Protein” OR “Proto oncogene c ErbB 1 Protein” OR “Proto-Oncogene Protein, erbB-1” OR “erbB 1 Proto Oncogene Protein” OR “c-erbB-1 Protein” OR “c erbB 1 Protein”)]. To ensure relevance and quality, the literature search was restricted to English-language publications of two document types: “Articles” and “Reviews”. A total of 124 eligible records were retrieved, including 98 original research articles (79.03%) and 26 review papers (20.97%). These records were downloaded from the WoSCC and subsequently analyzed using bibliometric and visualization software, including CiteSpace 26 and VOSviewer, 27 to explore research trends, hotspots, and collaboration networks in the field.
Results
Exploration to identify potential targets and biological effects of BaP in AD
A total of 473 BaP-related target genes were identified by integrating results from the PharmMapper, SEA, and ChEMBL databases (Figure 1A). In parallel, 3724 AD-associated genes were retrieved from the GeneCards, TTD, and OMIM databases (Figure 1B). By intersecting these two gene sets, 253 overlapping genes potentially linking BaP and AD were identified for further analysis (Figure 1C).

Identification of potential targets and functional enrichment analysis of benzo[a]pyrene (BaP)-induced Alzheimer's disease (AD). (A) Venn diagram illustrating BaP-related target genes shared between ChEMBL, PharmMapper and SEA databases. (B) Venn diagram illustrating AD-related target genes shared between GeneCards, OMIM and TTD databases. (C) Venn diagram analysis identifying potential target genes associated with both BaP exposure and AD. (D) The network of potential target genes associated with both BaP exposure and AD. (E) Protein-protein interaction (PPI) network of BaP-AD-related targets constructed using STRING database. (F) The PPI network was further visualized and analyzed using Cytoscape 3.9.3. Nodes are colored and sized according to their degree values, with darker colors and larger circles indicating stronger interactions.
To visualize the potential interactions between the 253 overlapping targets of BaP and AD, a gene interaction network was constructed using Cytoscape (Figure 1D). Additionally, a PPI network was generated utilizing the STRING database, resulting in a network comprising 258 nodes and 1091 edges (Figure 1E). The topological characteristics of the PPI network were further analyzed using Cytoscape, and node importance was ranked based on degree centrality. The top 10 hub genes identified were TP53, EGFR, SRC, HSP90AA1, INS, TNF, HSP90AB1, PIK3R1, ALB, and GRB2 (Figure 1F).
GO enrichment analysis of the 253 overlapping genes revealed their involvement in multiple biological functions (Figure 2A-C). In the BP category, these genes were mainly associated with response to xenobiotic stimulus, intracellular receptor signaling pathway, response to oxygen levels, and response to peptide hormones. Regarding CC, enrichment was observed in the secretory granule lumen, vesicle lumen, membrane microdomain, membrane raft, and lysosomal lumen. Regarding MF, the genes were predominantly related to heat shock protein binding, protein tyrosine kinase activity, ligand-activated transcription factor activity, and nuclear receptor activity. Further KEGG pathway analysis demonstrated that these overlapping targets were significantly enriched in pathways including the MAPK signaling pathway, lipid and atherosclerosis, Ras signaling pathway, T cell receptor signaling pathway, and PI3K-Akt signaling pathway, suggesting potential mechanisms through which BaP may influence AD pathogenesis (Figure 2D-F).

(A-C) GO enrichment analysis of BaP-AD targets. (D-F) KEGG enrichment analysis of BaP-AD targets.
Differential analysis of GEO data and machine learning screening for target genes
Differential expression analysis was performed utilizing the AD gene expression datasets obtained from the GEO database, leading to the identification of 1111 DEGs. These DEGs were then intersected with the 253 previously identified overlapping targets between BaP and AD. This analysis yielded 21 shared core genes (Figure 3A, B).

(A) Venn diagram displaying the intersection of differentially expressed genes (DEGs) from AD datasets in the GEO database and the common target genes associated with both BaP and AD. (B) Sankey diagram of gene number interactions for DEG_geneList and IntersectionGenes. (C) Least Absolute Shrinkage and Selection Operator (LASSO) logistic regression technique. The abscissa represents the number of model genes corresponding to different (λ) values, with 21 genes identified at the minimum λ value. (D) Core genes were identified using the Support Vector Machine (SVM)-Recursive Feature Elimination (RFE) algorithm, where 19 candidate genes were selected. (E) Diagnostic model differential analysis of 17 genes obtained by the intersection of two algorithms. (F) Diagnostic effectiveness of the 21 core targets in distinguishing normal and AD samples assessed using receiver operating characteristic (ROC) curves. Molecular docking analysis of BaP with 5 core target proteins.
To further identify the core target genes among the 21 BaP-induced AD-related candidates, we applied both SVM and LASSO logistic regression algorithms. The LASSO regression model selected 18 genes as potential diagnostic markers (Figure 3C), while the SVM algorithm identified 19 candidate genes (Figure 3D). By integrating the results of both methods, a total of 17 overlapping genes were ultimately determined as robust core targets associated with BaP-induced AD (Figure 3E). To evaluate the diagnostic efficacy of these 17 core targets, ROC curves were plotted and the area under the curve (AUC) was calculated. Among them, GABRG2, EGFR, PGF, EEF1A2, and PON2 showed relatively high diagnostic performance, with AUC values of 0.791, 0.781, 0.778, 0.776, and 0.761, respectively (Figure 3F). These results suggest that the identified core genes, particularly those with higher AUC values, may serve as promising biomarkers for the diagnosis of AD and may be involved in its pathogenesis.
Molecular docking
To further refine the identification of key diagnostic targets, we intersected the top 10 hub genes with the 17 previously identified core targets. This analysis revealed that EGFR and HSP90AB1 were the only genes shared by both sets. Considering its superior AUC (0.781) relative to HSP90AB1 (0.697), EGFR was prioritized as a more promising diagnostic biomarker for AD (Figure 4A-D). Subsequently, molecular docking analyses were performed to evaluate the binding interactions between BaP and these two targets. The results demonstrated that BaP exhibited strong binding affinities to both EGFR and HSP90AB1, with docking scores of −8.4 kcal/mol and −11.7 kcal/mol, respectively (Table 1; Figure 4E, F). These results suggest that EGFR and HSP90AB1 may play pivotal roles in mediating the neurotoxic effects of BaP in AD and may serve as valuable targets for elucidating underlying mechanisms and developing therapeutic interventions.

(A) Box plot of EGFR gene expression in the AD group and normal control group. (B) Box plot of HSP90AB1 gene expression in the AD group and normal control group. (C) ROC curve for EGFR gene diagnosis of AD. (D) ROC curve for HSP90AB1 gene diagnosis of AD. (E) Molecular docking outcomes showing the binding affinity of BaP with EGFR. (F) Molecular docking outcomes showing the binding affinity of BaP with HSP90AB1.
Molecular docking information of core target proteins with Benzo[a]pyrene (BaP).
Single-cell expression profiles of core target genes in AD tissues
Single-nucleus RNA-seq analysis was performed on the GSE157827 10x Genomics dataset, and all samples were classified into AD and NC groups. A total of seven distinct cell populations were annotated, including endothelial cells, astrocytes, microglia, oligodendrocyte progenitor cells, excitatory neurons, inhibitory neurons, and oligodendrocytes.
In the UMAP clustering visualization, brain tissues from AD patients (Figure 5A) exhibited dispersed and heterogeneous cell clusters, with astrocyte populations showing a fragmented morphology and neuronal subtypes, particularly excitatory neurons, displaying increased spatial dispersion. In contrast, the NC samples (Figure 5B) demonstrated compact cell clustering, clear boundaries between cell types, and an orderly distribution pattern. UMAP-based gene expression mapping revealed that EGFR was markedly upregulated in astrocytes and oligodendrocyte progenitor cells (OPCs) within the AD group, whereas HSP90AB1 showed elevated expression in astrocytes and neurons (Figure 5C). In NC tissues, EGFR expression appeared more uniform, and HSP90AB1 expression was low and sparsely distributed (Figure 5D). These findings suggest aberrant EGFR-dependent glial cell activation in AD pathology. In parallel, upregulation of HSP90AB1 may reflect impaired protein homeostasis, potentially contributing to tau misfolding and other neurodegenerative processes. Quantitative analyses using violin and dot plots further supported these observations. In the AD group (Figure 5E, G), EGFR expression in astrocytes and OPCs was markedly elevated compared to the NC group (Figure 5F, H), indicating that these glial populations may serve as key effectors in EGFR-driven AD progression. Similarly, HSP90AB1 expression was higher in excitatory neurons and astrocytes in AD samples, supporting the notion that neuronal proteostasis disruption and glial activation may act synergistically to drive AD pathogenesis.

Single-cell sequencing analysis. (A) The cell type annotation for AD brain tissue in the GSE157827 dataset shows seven identified cell types. (B) The cell type annotation for NC brain tissue in the GSE157827 dataset shows seven identified cell types. (C) Heterogeneous expression of the 2 target genes in different cell subtypes of AD brain tissue. (D) Heterogeneous expression of the 2 target genes in different cell subtypes of NC brain tissue. (E) Violin plots of the heterogeneous expression of the 2 target genes in different cellular subtypes of AD brain tissue. (F) Violin plots of the heterogeneous expression of the 2 target genes in different cellular subtypes of NC brain tissue. (G) Bubble plots of the heterogeneous expression of the 2 target genes in different cellular subtypes of AD brain tissue. (H) Bubble plots of the heterogeneous expression of the 2 target genes in different cellular subtypes of NC brain tissue.
Bibliometric analysis of EGFR and AD
Figure 6A shows that from 1997 to 2025, a total of 124 publications related to EGFR and AD were indexed, with an overall upward trend. Specifically, the period before 2010 was characterized by a slow increase in publication volume, largely reflecting foundational studies focused on basic mechanisms. After 2010, however, there was a marked acceleration in publication output, coinciding with the integration of emerging technologies such as multi-omics and network pharmacology, and a broadening of research perspectives. Figure 6B and 6C depict the clustering of research hotspots. Core clusters include “#0 network pharmacology”, “#1 tumor suppressor”, “#2 epidermal growth factor”, “#4 blood-based protein biomarker”, and “#6 secretase-dependent processing”. These clusters are interconnected through themes of AD pathogenesis, diagnosis, and therapeutic strategies, collectively illustrating a translational research trajectory from basic science to clinical application. Figure 6D presents the keyword co-occurrence network. Central nodes include “Alzheimer's disease” and “amyloid precursor protein”, with major associations centering around “epidermal growth factor receptor”. Notably, the green cluster is centered on terms such as “expression”, “EGFR”, and “cancer”, while the blue cluster is enriched in keywords like “neuregulin-1” and “schizophrenia”, indicating interdisciplinary connections. Figure 6E shows the temporal evolution of keywords in this field. Earlier research focused on terms like “tyrosine phosphorylation” and “amyloid precursor protein”, whereas more recent studies emphasize keywords such as “biomarker” and “network pharmacology”, reflecting a shift from mechanistic investigation to translational and application-oriented research. Figure 6F highlights the top 25 emerging keywords. Among them, “activation” and “cancer” are expected to continue their high growth momentum through 2025. The keyword with the highest burst intensity is “amyloid precursor protein”, indicating its sustained prominence in the field.

(A) The annual and cumulative numbers of publications. (B) Keyword topic cluster view by CiteSpace. (C) Keyword timeline by CiteSpace. (D) Keyword clustering by VOSviewer. (E) Time-overlay visualization of keywords by VOSviewer. (F) The top 25 keywords with the highest burst intensity.
Discussion
BaP is a prevalent environmental pollutant with well-established carcinogenic properties and potent neurotoxic effects. 28 Accumulating evidence suggests a strong association between BaP exposure and an increased risk of neuropsychiatric disorders. Animal studies have demonstrated that BaP can induce anxiety-like behaviors in mice, likely mediated by BaP-induced alterations in brain metabolism. 29 Moreover, chronic BaP exposure has been reported to increase DNA methylation at the promoter region of the NR2B gene, resulting in downregulation of NR2B expression and contributing to impaired behavioral performance. 30 These findings indicate that BaP is a potent neurotoxicant and warrant further mechanistic exploration of its role in AD.
Chronic exposure to environmental metals and biotoxins produced by bacteria, molds, and viruses has been implicated in cognitive decline and may contribute to AD-related pathophysiological changes. 31 The underlying mechanisms are thought to involve toxin-induced apoptosis, neuroinflammation, and oxidative stress. For instance, elevated exposure to air pollutants including PM2.5, NO2/NOx, and CO has been implicated in a heightened risk of dementia. 32 Furthermore, increased levels of toxic elements—including vanadium, lead, cadmium, arsenic, and strontium—have shown positive correlations with cognitive impairment.33–35 These findings collectively underscored the significant role of environmental toxicants in the pathogenesis of AD.
We began by identifying 253 potential toxicological targets implicated in the synergistic interaction between BaP and AD, and subsequently constructed a PPI network to explore their interrelationships. Functional enrichment analysis showed that these targets were enriched in BP such as response to xenobiotic stimuli, localized to CC like the vesicle lumen, and involved in MF including nuclear receptor activity. These genes were involved in key signaling pathways, such as MAPK and PI3K-Akt, which are closely associated with the pathological mechanisms of BaP-induced AD. Subsequently, two key genes—EGFR and HSP90AB1—were identified through differential gene expression analysis using GEO datasets, combined with machine learning-based feature selection. Molecular docking analyses were performed to evaluate the binding affinities of BaP to these target proteins. Single-cell RNA sequencing was employed to elucidate the cellular distribution of EGFR and HSP90AB1 in brain tissues. ROC curve analysis demonstrated that EGFR exhibited superior diagnostic potential compared to HSP90AB1. Furthermore, bibliometric analysis of EGFR-related studies in AD revealed a temporal shift from basic mechanistic research toward translational and clinical applications. Taken together, this study employed a multifaceted and integrative approach to comprehensively investigate the molecular mechanisms by which BaP contributes to AD. Our findings highlight EGFR as a central target, with significant implications for the prevention, early diagnosis, and therapeutic intervention in AD.
EGFR and HSP90AB1 were identified as potential target genes for AD through our integrative multi-omics analysis. Notably, large-scale plasma proteomics studies have demonstrated that EGFR is a key protein dysregulated in the AD population. 36 Mechanistically, EGFR may contribute to AD pathogenesis by promoting aberrant peptide deposition, which in turn leads to sustained phosphorylation of downstream signaling pathways, ultimately resulting in excessive production of Aβ1−42 and hyperphosphorylation of tau proteins. 37 Proteomic analyses have revealed that HSP90AB1 is differentially expressed in the hippocampus of individuals with AD. 38 The regional and temporal distribution of chaperone proteins, including HSP90AB1, varies across different brain areas and disease stages, suggesting their potential as predictive biomarkers or therapeutic targets in AD. Dysregulation of HSP90AB1 may contribute to impaired protein clearance and increased accumulation of misfolded pathological proteins, both hallmarks of AD pathogenesis. 39 Furthermore, HSP90AB1 expression is strongly associated with astrocytes, implicating a role in glial-mediated neurodegenerative processes. 40
GO enrichment analysis showed that the overlapping targets of BaP and AD were significantly enriched in biological processes such as “response to xenobiotic stimulus” and “reactive oxygen species metabolic process”. These findings are consistent with the classical toxicological mechanisms of BaP as an environmental pollutant, which include the induction of oxidative stress and activation of the aryl hydrocarbon receptor pathway. 41 In terms of molecular function, enrichment in categories such as “nuclear receptor activity” and “heat shock protein binding” may reflect disturbances in proteostasis, an established pathological hallmark of AD.42,43 This suggests that BaP may exacerbate AD progression by disrupting protein quality control mechanisms. KEGG pathway enrichment analysis further demonstrated that these shared targets participate in several signaling cascades relevant to AD pathogenesis, including the MAPK, PI3K-Akt, and Ras pathways. The MAPK pathway serves as a central node in inflammatory signaling and is known to regulate neuronal apoptosis and synaptic plasticity.41,44 Likewise, dysregulation of the PI3K-Akt pathway has been directly linked to impaired Aβ clearance and aberrant tau phosphorylation.45,46 These findings collectively suggest that BaP may contribute to AD neurodegeneration through a dual mechanism of inflammatory amplification and survival pathway inhibition.
Single-cell transcriptomic analysis revealed distinct cellular alterations in AD brain tissue compared with NC. AD samples exhibited disorganized cellular clustering, indistinct cell boundaries, and particularly fragmented astrocytic morphology. Additionally, there was increased spatial dispersion among neuronal subtypes, including excitatory neurons. These features are consistent with the hallmark pathological processes of AD, characterized by glial cell activation and disintegration of neuronal networks.47,48 These findings suggest that BaP exposure may aggravate the dysregulation of the brain microenvironment by disrupting intercellular communication. Single-cell gene expression mapping further demonstrated that EGFR was markedly upregulated in astrocytes and oligodendrocyte progenitor cells, while HSP90AB1 was highly expressed in astrocytes and neurons in AD brain tissue. In contrast, both genes were expressed at uniformly low levels in the NC group. These expression patterns are in line with our findings from molecular docking and pathway enrichment analyses, highlighting EGFR-mediated glial activation and HSP90AB1-associated proteostasis imbalance as candidate cellular events in BaP-induced AD pathology.
EGFR exhibited greater diagnostic potential than HSP90AB1, as indicated by its higher AUC value (AUC = 0.781). To further investigate the relevance and translational potential of EGFR in AD, we performed a bibliometric analysis to systematically map research trends and hotspots in studies of EGFR in AD. The number of publications in this area has shown a marked increase since 2010, accompanied by a growing emphasis on multi-omics approaches and translational medicine. Keyword clustering revealed several emerging research focuses, such as “network pharmacology” and “blood-based protein biomarker”, which directly align with the mechanistic and diagnostic framework of this study. Additionally, frequent keywords including “activation” and “biomarker” reflect a notable shift in the research landscape from investigating EGFR's basic molecular functions toward its clinical applications. These trends underscore that the role of EGFR in our study is not limited to fundamental mechanisms but extends to practical diagnostic utility. Furthermore, the continued burst of keywords such as “activation” and “cancer” projected through 2025 suggests sustained interest in the intersection between cancer and AD. This reinforces the hypothesis that EGFR may serve as a key molecular link between the two diseases, highlighting its broader relevance and potential as a therapeutic and diagnostic target.
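For readers less familiar with the AUC used above to compare EGFR and HSP90AB1, it can be read as the probability that a randomly chosen AD sample has a higher marker value than a randomly chosen control. The following is a minimal sketch of that pairwise definition, not the ROC procedure used in this study; the function name and the expression values are hypothetical.

```python
from typing import Sequence


def roc_auc(case_scores: Sequence[float], control_scores: Sequence[float]) -> float:
    """AUC as the fraction of (case, control) pairs in which the case scores higher.

    Tied pairs count as half a win, which matches the Mann-Whitney formulation.
    """
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


if __name__ == "__main__":
    # Hypothetical normalized expression values, for illustration only.
    ad_samples = [7.9, 8.4, 8.1, 7.2, 8.8]
    nc_samples = [7.0, 7.5, 8.0, 6.9]
    print(f"AUC = {roc_auc(ad_samples, nc_samples):.3f}")
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation, so a marker with the higher AUC separates the two groups better on the data at hand.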
This study employed an integrated approach combining molecular docking, network toxicology, machine learning, single-cell transcriptomic analysis, and bibliometric methods to comprehensively elucidate the molecular mechanisms through which BaP may contribute to the development of AD. These multi-dimensional strategies provided novel insights into BaP-induced AD pathogenesis from both a systems-level and cellular-level perspective. However, several limitations should be acknowledged. First, the study lacked experimental validation through in vitro or in vivo models, which restricts the biological interpretation and translatability of our findings. Second, the mechanistic associations identified were correlative in nature; no interventional experiments were conducted to establish a causal relationship between the candidate targets and AD pathology. Third, although EGFR emerged as a promising diagnostic marker, its evaluation was based on a relatively small sample size and lacked validation in multi-center, ethnically diverse cohorts. Moreover, its diagnostic performance has not been systematically compared with established AD biomarkers, leaving its clinical applicability to be further clarified. Future research should address these gaps by expanding clinical cohorts, performing rigorous in vivo and in vitro experiments, and conducting causal inference studies. These efforts will be essential for fully elucidating the mechanistic link between BaP exposure and AD, as well as for advancing candidate targets such as EGFR toward clinical translation.
Conclusion
In this study, we systematically investigated the relationship between BaP exposure and AD pathogenesis by integrating multiple complementary approaches, including network toxicology, molecular docking, single-cell transcriptomics, and bibliometric analysis. Our findings identified EGFR and HSP90AB1 as core molecular targets implicated in BaP-induced AD. Single-cell transcriptomic analysis revealed cell-type-specific expression patterns, with EGFR significantly upregulated in astrocytes and oligodendrocyte precursor cells, and HSP90AB1 highly expressed in astrocytes and neurons, underscoring their distinct roles in AD pathophysiology. Furthermore, ROC curve analysis demonstrated that EGFR exhibited greater diagnostic potential than HSP90AB1. In addition, bibliometric analysis revealed an evolving research focus on EGFR in the context of AD, transitioning from basic mechanistic studies toward translational applications. This trend highlighted the growing interest in EGFR as a clinically relevant target for early diagnosis, prevention, and therapeutic intervention in AD.
Footnotes
Acknowledgements
The authors have no acknowledgments to report.
Ethical considerations
Not applicable
Consent to participate
Not applicable
Consent for publication
Not applicable
Author contribution(s)
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by grants from the Natural Science Foundation of Heilongjiang Province (grant number QC2018096).
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data availability statement
Data will be made available on request.
