Intelligence model for Alzheimer’s disease detection with optimal trained deep hybrid model

Abstract

Alzheimer’s disease (AD), a neurodegenerative disorder, is the most common cause of dementia and of persistent cognitive deficits. Because the number of cases grows each year, AD has become a serious social and public health issue. Early diagnosis of Alzheimer’s disease and dementia is crucial, as is giving patients the right care, and the importance of early AD diagnosis has recently received a lot of attention. However, current methods of diagnosing AD are so slow and expensive that patients often do not receive a timely diagnosis. We therefore propose a new AD detection method with four stages: pre-processing, feature extraction, feature selection, and AD detection. In the pre-processing stage, the input data are normalized using an improved data normalization method. The pre-processed data then go through feature extraction, where statistical, improved entropy-based, and mutual information-based features are extracted. The most relevant of these features are selected using an improved Chi-square technique. Based on the selected features, a hybrid model detects AD. This hybrid model combines a Long Short-Term Memory (LSTM) network and a Deep Maxout neural network, whose weights are optimized by the Self Updated Shuffled Shepherd Optimization Algorithm (SUSSOA). In the statistical analysis of best values, the proposed SUSSOA-based method outperforms SSO, BMO, HGS, BRO, BES, and ISSO by 57%, 53%, 28%, 25%, and 21%, respectively.

Keywords

Improved data normalization; mutual information-based features; improved Chi-square; long short-term memory; deep maxout neural network; self updated shuffled shepherd optimization algorithm

1. Introduction

Alzheimer’s disease (AD) is a neurodegenerative illness in which β-amyloid peptide (Aβ) and neurofibrillary tangles accumulate in the brain tissue over time, accompanied by neuronal degeneration [7]. It also affects mental and memory processes [6,33]. There are currently no effective treatments for AD; those that are available only slow its progression. Managing and halting the progression of AD therefore requires early detection. Numerous genes and pathways play crucial roles in AD treatment and in biomarker-based detection. The gene that has undergone the biggest shift is PDHA1, which is a target gene for AD treatment. The main morphological differences in brain structure examined for a prompt and precise AD diagnosis include the size of the ventricles, the shape of the hippocampus, the cortical thickness, and the brain volume [19,24,36]. The hippocampus, a small medial subcortical brain region, appears to be important for both short- and long-term memory [13].

Mild Cognitive Impairment (MCI) describes people in the early stages of Alzheimer’s disease (AD), although not every MCI patient goes on to develop AD. MCI is a stage between normal cognition and AD in which an individual has subtle cognitive abnormalities that are noticeable to them and their relatives but can still carry out daily tasks. The presence of relevant genes in a person’s genome and their personal history are the two biggest risk indicators for AD. An AD diagnosis is based on a medical assessment and an in-depth discussion with patients and their relatives. However, a “ground truth” diagnosis of AD has required an autopsy, which is not clinically useful [8,15].

Detecting AD is often challenging because clear manifestations do not appear until several years after onset. Current forms of AD diagnosis are both time-consuming and expensive, which might explain why almost half of those living with AD do not receive a timely diagnosis [5,23]. Many neuroimaging techniques have been developed to examine brain function and structure, such as diffusion tensor imaging, electroencephalography, magnetic resonance spectroscopy, and magnetic resonance imaging (MRI) [17,35].

Recently, MRI has become increasingly popular for studying brain nerve connections. As a well-developed brain imaging technology, MRI has shown tremendous promise in providing detailed information for diagnosing high-level neurological disorders such as depression and schizophrenia. Rapid developments in neuroscience and machine learning (ML) have led to ML being widely used for automatic pattern recognition in clinical image data [25,27,30].

Detecting AD is difficult, and successful classification requires a strong ability to discriminate specific features among similar brain image patterns [14]. Several deep neural strategies for diagnosing AD have recently been developed, and they perform better than traditional machine learning techniques. However, most existing computer-aided diagnostic algorithms ignore the patients’ medical and biological data and rely on neuroimaging characteristics alone, which has made their diagnoses less accurate. Conventional AD diagnosis methods have further issues: AD is not diagnosed in a timely manner, and although implementing these methods appears straightforward, tuning them involves a large number of parameters. The traditional methods are therefore not very effective. To overcome these issues and speed up execution, this research proposes an optimization strategy based on feature selection that targets reduced error, lower computational time (complexity), and accurate AD detection. The resulting AD detection technique offers the following contributions.

To provide efficient and accurate AD detection, we develop a novel method that hybridizes two classifiers: LSTM and a Deep Maxout neural network.

To speed up and improve data processing, we develop an improved data normalization method.

In addition to statistical and mutual information-based features, we apply improved entropy-based features to achieve accurate classification.

We develop an improved chi-square test that reduces computational time through effective feature selection.

To maximize detection accuracy and reduce the error rate by tuning the weights of the neural networks, we develop a Self Updated Shuffled Shepherd Optimization Algorithm (SUSSOA).

The rest of this paper is structured as follows: Section 2 reviews related work on AD detection, Section 3 details the operation of the proposed AD detection approach, Section 4 presents the implementation results, and Section 5 concludes the work, followed by the references.

2. Literature survey

This section describes several recent studies on AD diagnosis.

Janani Venugopalan et al. [31] used deep learning (DL) to jointly analyze imaging, genetic, and clinical test data and to classify patients into MCI, AD, and control (CN) groups. Features were extracted from the clinical and genomic data, whereas the imaging data were processed with 3D convolutional neural networks (CNNs) and stacked denoising auto-encoders. A new data interpretation technique based on clustering and perturbation analysis was also developed to identify the best-performing features learned by the networks.

Junghyun Koo et al. [21] used a neural network and a small dataset to recognize Alzheimer’s disease from several multi-modal features extracted with pre-trained networks. These features were fed into a redesigned CRNN-based architecture that handles classification and regression tasks simultaneously and can process recordings of varying length. The test results beat the baseline by 18.75%, and the validation results for the regression problem show the potential of classifying four types of cognitive impairment with an accuracy of 78.70%.

Weiming Lin et al. [22] created a framework for multi-label AD assessment that uses a linear discriminant analysis (LDA) scoring technique to integrate multimodal information efficiently. Positron emission tomography, MR imaging, cerebrospinal fluid biomarker, and genetic data were first pre-processed through age adjustment, feature extraction, and feature reduction. Each modality was then scored separately with LDA to obtain ratings that reflect AD pathological progression in that modality. Finally, using these scores, an extreme learning machine-based decision tree was built for multi-label diagnosis.

Qi Ying et al. [37] proposed a novel multi-modal DNN for Alzheimer’s disease assessment that combines MRI and single nucleotide polymorphism (SNP) data. The SNP branch strengthens the imaging branch whenever the prediction confidence is low. This design follows clinical practice: when a healthcare professional is unsure about a decision, additional tests are typically requested to support it. When every patient has both MRI and SNP data, the proposed method raises diagnostic performance to 93.5% AUC and 96.1% AP, respectively.

To recognize Alzheimer’s disease, Ning Wang et al. [32] built a modular multimodal approach with four architectures that use CNNs and multi-head attention, trained on the ADReSSo Challenge training set. The approach uses three distinct feature sets: acoustic features, linguistic features, and embeddings. Some architectures use only the acoustic features, some only the linguistic features, some only the embeddings, and others combine all three.

Michal Golovanevsky et al. [18] published a Multimodal Alzheimer’s Disease Diagnosis framework (MADDi) to reliably detect AD and mild cognitive impairment (MCI) from neuroimaging, genetic, and clinical data. A novel feature of MADDi is cross-modal attention, which captures interactions between modalities. Multi-class classification was performed, a difficult task given the close similarity between MCI and AD.

Silvia Basaia et al. [10] built a deep learning scheme that uses MRI data to diagnose Alzheimer’s disease (AD) and mild cognitive impairment. A CNN was applied to MRI scans, and its performance in differentiating AD, converting MCI (c-MCI), and stable MCI (s-MCI) was evaluated. All classes showed high accuracy, with the best rates achieved in the AD vs. healthy control (HC) classification tests on both the ADNI and the combined ADNI + non-ADNI datasets.

Abol Basher et al. [11] proposed a method that combines a CNN with a deep neural network (DNN) architecture. The left and right hippocampi were localized independently with a two-step ensemble Hough-CNN. Volumetric data were then extracted from the pre-processed 2-D patches of each slice with a discrete volume estimation CNN (DVE-CNN), and the resulting volumetric features were used to train and evaluate the classification network. Table 1 summarizes recent literature on the diagnosis of AD.

Sharma et al. [29] proposed a hybrid AI-based model that combines a permutation-based machine learning (ML) voting classifier with transfer learning (TL) in two stages. In the first stage, two TL-based models, DenseNet-121 and DenseNet-201, extract features; in the second stage, three ML classifiers, SVM, Naïve Bayes, and XGBoost, perform classification.

Balaji et al. [9] proposed a hybrid deep learning approach for early AD identification that combines magnetic resonance imaging (MRI), positron emission tomography (PET), and conventional neuropsychological test results, using multimodal imaging with a Convolutional Neural Network and Long Short-Term Memory (CNN-LSTM).

Table 1
A review of recent literature on the diagnosis of AD

Citation	Method	Advantages	Challenges
Janani Venugopalan et al. [31]	Multimodal DL	Outperforms single-modality DL	Applied only to a small dataset
Junghyun Koo et al. [21]	Convolutional Recurrent Neural Network (CRNN)	Model does not need any metadata	Only obtains 81% accuracy
Weiming Lin et al. [22]	LDA-based scoring strategy	Multimodal data improve AD diagnosis performance	Practical implementation is constrained by the need for a large number of modalities
Qi Ying et al. [37]	Multi-modal deep neural network	Needs less training time	SNP model has lower performance
Ning Wang et al. [32]	C-Attention network	Attains 80.28% accuracy	C-Attention-Embedding limits system performance
Michal Golovanevsky et al. [18]	Multimodal Alzheimer’s Disease Diagnosis framework (MADDi)	Provides higher performance	Specific parts of the brain were not visible to the model
Silvia Basaia et al. [10]	3D-CNN	Applicable to previously unseen patient data	Cannot exclude future conversion to c-MCI among s-MCI patients
Abol Basher et al. [11]	CNN + DNN	Fully automatic; achieves greater accuracy	Performance depends heavily on the accuracy of the earlier Hough-CNN and DVE-CNN stages
Sharma et al. [29]	Hybrid AI-based model	Increased accuracy and specificity	Testing and training are similar
Balaji et al. [9]	Hybrid deep learning approach	Identifies the illness accurately; higher accuracy	Less effort; results are not enhanced

According to this review, existing AD diagnosis methods have shortcomings such as reduced accuracy, less effort, complicated simulation environments, the inability to exclude future conversion among stable MCI patients, and brain regions that are not visible to the model. This article aims to address these issues. We therefore develop a Self Updated Shuffled Shepherd Optimization Algorithm (SUSSOA) that tunes the weights of the neural networks to increase detection accuracy and decrease the error rate. The proposed strategy is described in detail in the next section.

3. Alzheimer’s disease detection with optimal trained deep hybrid model

Alzheimer’s disease (AD) is a neurological condition that can lead to dementia and other mental health issues. According to reports, there are no recognized drugs or therapies that can stop the spread of AD. It is therefore crucial to identify AD early and create a treatment strategy to halt its development. The proposed AD detection method has four stages. In the initial pre-processing stage, the input data are normalized with an improved data normalization method for faster and better detection. The pre-processed data then undergo feature extraction, where statistical, improved entropy-based, and mutual information-based features are extracted. An improved chi-square technique is then used for effective feature selection. In the final stage, a hybrid of LSTM and Deep Maxout network classifiers performs AD detection, and the parameters of these classifiers are tuned with SUSSOA, an improved form of the Shuffled Shepherd Optimization Algorithm (SSOA). Figure 1 shows the architecture of the proposed SUSSOA-based AD detection method.

Fig. 1.

The architecture of the proposed SUSSOA-based AD detection.

3.1. Pre-processing

In this initial stage, the input data are pre-processed. Data preprocessing generally refers to manipulating or removing data before use in order to ensure or improve performance. In our work, we apply an improved data normalization technique to transform the data into a format suited to the AD detection process.

3.1.1. Improved data normalization

Normalization scales an attribute’s data to fall within a narrower range. It is usually necessary when attributes are measured on different scales; otherwise, an important attribute on a smaller scale can be outweighed by attributes whose values lie on a larger scale. Three common data normalization methods are z-score, decimal scaling, and min-max normalization. In our research, we use an improved min-max normalization.

Like z-score normalization, min-max scaling uses a formula to replace each value in a column with a new value [1]: $\begin{matrix} (1) & M = (z - z_{min}) / (z_{max} - z_{min}) \end{matrix}$

Here $M$ represents the new value, $z$ the original cell value, $z_{min}$ the column’s minimum value, and $z_{max}$ the column’s maximum value, so each column is mapped onto $[0, 1]$.

Instead of the conventional Eq. (1), our improved data normalization uses Eq. (2): $\begin{matrix} (2) & M = \frac{(z - z_{min})}{(z_{max} - z_{min})} \times ω_{a} \end{matrix}$

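Here $ω_{a}$ denotes a weight that is computed using the logistic map $\begin{matrix} (3) & C_{b + 1} = 4 C_{b} (1 - C_{b}) \end{matrix}$ With the coefficient 4, the logistic map is chaotic, and for $C_{0} \in [0, 1]$ every iterate $C_{b}$ stays in $[0, 1]$.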
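The following is a minimal sketch of Eqs. (1) to (3), not the authors’ implementation. The text does not say how the logistic-map iterates become the weights $ω_{a}$, so the sketch assumes one iterate per attribute, starting from a hypothetical seed `c0 = 0.7`; the function names are also illustrative.

```python
def logistic_map_weights(n, c0=0.7):
    """Return n successive iterates of the logistic map, Eq. (3).

    Hypothetical choices: the seed c0 and the rule "one iterate per attribute".
    c0 should lie in (0, 1) and avoid 0.25 and 0.75 (both land on the fixed
    point 0.75) and 0.5 (which collapses to 0 via 0.5 -> 1 -> 0).
    """
    weights = []
    c = c0
    for _ in range(n):
        c = 4.0 * c * (1.0 - c)  # C_{b+1} = 4 C_b (1 - C_b)
        weights.append(c)
    return weights


def improved_min_max(columns, c0=0.7):
    """Apply Eq. (2) to each attribute; `columns` is a list of attribute columns."""
    weights = logistic_map_weights(len(columns), c0)
    scaled = []
    for col, w_a in zip(columns, weights):
        z_min, z_max = min(col), max(col)
        span = z_max - z_min
        if span == 0:
            # Eq. (1) divides by zero for a constant column; map it to 0 instead.
            scaled.append([0.0] * len(col))
        else:
            scaled.append([(z - z_min) / span * w_a for z in col])
    return scaled, weights


cols = [[2.0, 4.0, 6.0], [10.0, 10.0, 30.0]]
scaled, w = improved_min_max(cols)
print(w)       # first weight is 4 * 0.7 * 0.3 = 0.84 (up to float rounding)
print(scaled)  # each column now lies in [0, w_a]
```

Dropping the `* w_a` factor recovers the conventional min-max scaling of Eq. (1).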

3.2. Feature extraction

Feature extraction, the second stage of the approach, transforms patterns into features that serve as a compressed representation. The pre-processed data are passed through this stage, where statistical, improved entropy-based, and mutual information-based features are extracted, as described below [16].

Statistical features

The following are the statistical features used in this work.

Arithmetic mean

The arithmetic mean μ is the average of the values $\{c_{1}, c_{2}, \dots, c_{d}\}$ observed within a time interval, computed with Eq. (4): $\begin{matrix} (4) & μ = \frac{1}{d} \sum_{i = 1}^{d} c_{i} \end{matrix}$

Standard Deviation

Eq. (5) gives the standard deviation σ, which measures how widely the values $\{c_{1}, c_{2}, \dots, c_{d}\}$ are spread out: $\begin{matrix} (5) & σ = \sqrt{\frac{1}{d} \sum_{i = 1}^{d} {(c_{i} - μ)}^{2}} \end{matrix}$

Standardized moment

The K-th standardized moment is the K-th moment about the mean normalized by the K-th power of the standard deviation: $\begin{matrix} (6) & \frac{μ_{K}}{σ^{K}} \end{matrix}$

$μ_{K}$ denotes the K-th moment about the mean. The third standardized moment (skewness) and the fourth standardized moment (kurtosis) were computed and used as features.

Kurtosis

Kurtosis, computed with Eq. (7), measures how heavy-tailed the data’s probability distribution is: $\begin{matrix} (7) & Kur = \frac{μ_{4}}{σ^{4}} \end{matrix}$ where $μ_{4}$ is the fourth moment about the mean, given by: $\begin{matrix} (8) & μ_{4} = \frac{1}{d} \sum_{i = 1}^{d} {(c_{i} - μ)}^{4} \end{matrix}$

Skewness

Skewness quantifies the asymmetry of the data and is computed with Eq. (9): $\begin{matrix} (9) & Skew = \frac{μ_{3}}{σ^{3}} \end{matrix}$ where $μ_{3}$ is the third moment about the mean, given by $\begin{matrix} (10) & μ_{3} = \frac{1}{d} \sum_{i = 1}^{d} {(c_{i} - μ)}^{3} \end{matrix}$

The extracted statistical features are denoted $F_{s} = \{μ, σ, Kur, Skew\}$.
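As an illustration of Eqs. (4) to (10), the sketch below computes $F_{s}$ for one sequence using the population (1/d) moments written in those equations. It is a minimal standalone example; the function name is hypothetical. Note that Eq. (7) gives non-excess kurtosis (3 for a normal distribution), whereas some libraries report excess kurtosis by default.

```python
import math


def statistical_features(values):
    """Return F_s = (mu, sigma, Kur, Skew) using the population (1/d) moments of Eqs. (4)-(10)."""
    d = len(values)
    mu = sum(values) / d  # Eq. (4)

    def central_moment(k):
        # k-th moment about the mean: Eq. (8) for k = 4, Eq. (10) for k = 3
        return sum((c - mu) ** k for c in values) / d

    sigma = math.sqrt(central_moment(2))  # Eq. (5)
    if sigma == 0:
        raise ValueError("standardized moments are undefined for constant input")
    kurt = central_moment(4) / sigma ** 4  # Eq. (7): fourth standardized moment
    skew = central_moment(3) / sigma ** 3  # Eq. (9): third standardized moment
    return mu, sigma, kurt, skew


print(statistical_features([2, 4, 4, 4, 5, 5, 7, 9]))
# -> (5.0, 2.0, 2.78125, 0.65625)
```

The positive skewness reflects the long right tail contributed by the values 7 and 9.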

3.2.1. Improved entropy-based features

Entropy quantifies the impurity (uncertainty) associated with a random variable. We use Shannon entropy, which measures the degree of randomness and expresses the expected information content of a message, usually in bits.

The Shannon entropy of a random variable Z is given by Eqs. (11) and (12), where $x_{k}$ is the k-th of the n possible values of Z and $A_{k}$ is the probability that Z equals $x_{k}$ [34]. $\begin{aligned} (11) & H (Z) = H (A_{1}, \dots, A_{n}) = - \sum_{k = 1}^{n} A_{k} {log}_{2} A_{k} \\ (12) & A_{k} = Pr (Z = x_{k}) \end{aligned}$

Instead of Eq. (11), our improved entropy uses Eq. (13): $\begin{matrix} (13) & H (Z) = - \sum_{k = 1}^{n} \frac{A_{k} {log}_{2} A_{k}}{| B |} \end{matrix}$

Here |B| is the cardinality of B, i.e., the number of elements in the given dataset. Because |B| does not depend on k, Eq. (13) is simply Eq. (11) divided by |B|.

To obtain accurate AD detection, we use these improved entropy-based features together with the conventional features.
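A minimal sketch of Eqs. (11) to (13) on a discrete sample follows. Probabilities $A_{k}$ are estimated from value counts; how continuous features would be discretized is not stated in the text, so the input is assumed to be already discrete. The `dataset_size` argument stands in for |B| and is a hypothetical parameter name.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Eqs. (11)-(12): H(Z) = -sum_k A_k log2 A_k with empirical A_k = Pr(Z = x_k)."""
    n = len(values)
    return -sum((cnt / n) * math.log2(cnt / n) for cnt in Counter(values).values())


def improved_entropy(values, dataset_size):
    """Eq. (13): every term of Eq. (11) is divided by |B| (the hypothetical `dataset_size`)."""
    # |B| is constant across k, so it can be factored out of the sum.
    return shannon_entropy(values) / dataset_size


print(shannon_entropy([0, 0, 1, 1]))      # two equally likely values: 1.0 bit
print(improved_entropy([0, 0, 1, 1], 4))  # 1 bit / |B| = 0.25
```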

3.2.2. Mutual information-based features

Mutual information measures the mutual dependence between two variables. The mutual information “ $I (D : E)$ ” quantifies the information about D that is shared by E.

If D and E are independent, knowing D gives no information about E and vice versa, so their mutual information is 0. If D and E are identical, all information conveyed by D is shared by E: knowing D determines E and vice versa, so the mutual information equals the information carried by D (or E) alone, namely the entropy of D. Mutual information thus measures the Kullback-Leibler divergence between the joint distribution of D and E and the product of their marginal distributions [2]. $\begin{aligned} (14) & I (D : E) = H (D) + H (E) - H (D, E) \\ (15) & I (D : E) = \sum_{β \in E} \sum_{α \in D} p (α, β) log \frac{p (α, β)}{f (α) g (β)} \end{aligned}$ where p is the joint probability distribution of D and E, and f and g are the marginal distributions of D and E, respectively.
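The sketch below estimates Eq. (15) from paired discrete samples and checks it against the entropy identity of Eq. (14). It is an illustrative sketch, not the authors’ code; the log base is not fixed in Eq. (15), so base 2 (bits) is assumed here, and the sample data are made up.

```python
import math
from collections import Counter


def _entropy(counts, n):
    # Shannon entropy in bits from a Counter of occurrence counts
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mutual_information(d, e):
    """Eq. (15) with empirical probabilities from paired samples (log base 2, so bits)."""
    n = len(d)
    joint = Counter(zip(d, e))  # p(alpha, beta)
    f = Counter(d)              # marginal of D
    g = Counter(e)              # marginal of E
    return sum(
        (c / n) * math.log2((c / n) / ((f[a] / n) * (g[b] / n)))
        for (a, b), c in joint.items()
    )


def mutual_information_via_entropies(d, e):
    """Eq. (14): I(D : E) = H(D) + H(E) - H(D, E)."""
    n = len(d)
    return _entropy(Counter(d), n) + _entropy(Counter(e), n) - _entropy(Counter(zip(d, e)), n)


d = [0, 0, 1, 1, 2, 2]
e = [0, 1, 1, 1, 0, 0]
print(mutual_information(d, e), mutual_information_via_entropies(d, e))  # agree up to rounding
print(mutual_information([0, 0, 1, 1], [0, 1, 0, 1]))  # independent pair: 0.0
```

The two limiting cases described above appear directly: an independent pair gives 0, and a variable paired with itself gives its own entropy.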

The extracted features are denoted $R_{F} = \{F_{s}, H (Z), I (D : E)\}$.

3.3. Feature selection

The extracted features are then passed to feature selection, which reduces the number of input variables used to build the detection model. Fewer input variables lower the computational cost of modelling and can improve performance. We use an improved Chi-square technique for effective feature selection.

3.3.1. Improved chi-square technique

Chi-square analysis measures the gap between observed and expected values. It tests the association between two categorical variables using the observed and expected frequencies [3].

Chi-square formula $\begin{matrix} (16) & χ^{2} = \sum \frac{{(O V_{i t} - E V_{i t})}^{2}}{E V_{i t}} \end{matrix}$

Where

$O V_{i t}$ = Observed value

$E V_{i t}$ = Expected value
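Eq. (16) does not state how the expected values are obtained. For a contingency table, the standard Pearson choice assumes the two variables are independent, so $E V_{i t}$ = (row total i) × (column total t) / (grand total). The sketch below follows that standard convention; the example table (a binarized feature against two diagnostic classes) is hypothetical.

```python
def chi_square(observed):
    """Pearson chi-square statistic, Eq. (16), for a contingency table (list of rows).

    Expected values assume the row and column variables are independent:
    EV_it = (row total i) * (column total t) / (grand total).
    Assumes no row or column sums to zero (otherwise an expected value is 0).
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for t, ov in enumerate(row):
            ev = row_totals[i] * col_totals[t] / total
            stat += (ov - ev) ** 2 / ev
    return stat


# Rows: a binarized feature (low / high); columns: two diagnostic classes.
print(chi_square([[10, 20], [20, 10]]))  # every EV is 15, statistic = 4 * 25 / 15
print(chi_square([[10, 20], [20, 40]]))  # proportional rows: observed == expected, 0.0
```

A larger statistic indicates a stronger association between the feature and the class label, which is what makes it usable as a feature-ranking score.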

Instead of Eq. (16), our improved chi-square technique uses Eq. (17), which scales the statistic by the Jaccard index $J (G, L)$ of two sets G and L: $\begin{aligned} (17) & χ^{2} = \sum \frac{{(O V_{i t} - E V_{i t})}^{2}}{E V_{i t}} \times J (G, L) \\ (18) & J (G, L) = \frac{| G \cap L |}{| G \cup L |} \end{aligned}$

Here $| G \cap L |$ is the number of elements common to sets G and L, and $| G \cup L |$ is the number of elements in their union.

The selected feature set is denoted $R_{F}^{'}$.
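The sketch below applies the Jaccard weighting of Eqs. (17) and (18) to per-feature chi-square scores and keeps the top-k features. The text does not specify what the sets G and L contain or how many features are kept, so both the sets and `k` are hypothetical inputs, as are the feature names and scores in the example.

```python
def jaccard(set_g, set_l):
    """Eq. (18): J(G, L) = |G ∩ L| / |G ∪ L|; returns 0.0 if both sets are empty (a convention chosen here)."""
    union = set_g | set_l
    return len(set_g & set_l) / len(union) if union else 0.0


def improved_chi_square(chi2_score, set_g, set_l):
    """Eq. (17): a feature's chi-square score scaled by the Jaccard factor J(G, L)."""
    return chi2_score * jaccard(set_g, set_l)


def select_top_k(scores, k):
    """Names of the k highest-scoring features; k is a hypothetical parameter."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical per-feature chi-square scores and (G, L) sets.
scores = {
    "mean": improved_chi_square(6.0, {1, 2, 3}, {2, 3, 4}),  # J = 2/4 -> 3.0
    "entropy": improved_chi_square(4.0, {1, 2}, {1, 2}),     # J = 1   -> 4.0
    "mutual_info": improved_chi_square(10.0, {1}, {2}),      # J = 0   -> 0.0
}
print(select_top_k(scores, 2))  # ['entropy', 'mean']
```

Because J lies in [0, 1], the weighting can only shrink a feature’s score, so a feature with a high raw chi-square value but no set overlap drops to the bottom of the ranking.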

3.4. Disease detection

The final disease detection uses the selected features with a hybrid classifier made of a Long Short-Term Memory (LSTM) network and a Deep Maxout neural network, whose weights are optimized by the Self Updated Shuffled Shepherd Optimization Algorithm (SUSSOA). The detection process is described below.

3.4.1. Optimized LSTM

The optimized LSTM network receives the selected features as input. In the usual LSTM chain diagram, the cell on the left represents the preceding time step, the cell on the right the following time step, and the middle cell the present time step. Three inputs enter the cell. At the bottom left, the cell receives the current input $χ_{t}$ together with the output of the preceding time step, known in RNNs as the hidden state and denoted $λ_{t - 1}$ [26].

Before entering the four gates, the input $χ_{t}$ and the hidden state $λ_{t - 1}$ are concatenated. The third input, which the cell receives from the preceding cell, travels along the top of the cell like a straight arrow. This is the cell state, which allows the LSTM to remember long-term dependencies with a much lower risk of the vanishing and exploding gradient problems encountered in standard RNNs.

The cell state can be viewed as a data highway that runs through the entire chain of cells with only a few linear interactions, even though mathematically it is simply a vector. It is among the most essential elements, as it enables the LSTM to retain long-term input dependencies. Data can be read from, written to, and deleted from this internal memory. New data are added to the cell state rather than multiplied into it, which counters the vanishing gradient problem: during backpropagation, addition passes the gradient unchanged to each of its inputs instead of repeatedly scaling it through the chain rule. The LSTM cell contains four neural network layers, each with a distinct purpose. Three of them use the sigmoid function and output values between 0 and 1. In Keras, the cell state is reset between independent sequences.

The first gate is the forget gate. It takes the output of the previous time step, $λ_{t - 1}$, and the current input $χ_{t}$. The layer multiplies its weights by these inputs and passes the result through the sigmoid function, producing a vector of values between 0 and 1. This vector is then multiplied element-wise with the cell state from the preceding cell. The forget gate can be viewed as a filter that removes or attenuates the values of the prior cell state (memory) that should be discarded or degraded.

The forget gate produces a vector with values between 0 and 1: $\begin{matrix} (19) & F G (t) = δ (W_{F G} \times [λ_{t - 1}, χ_{t}] + l_{F G}) \end{matrix}$ where δ denotes the sigmoid function, $W_{F G}$ the forget-gate weights, and $l_{F G}$ its bias.

This vector is then passed to the cell state for the update (the multiplication sign above the arrow in the LSTM diagram). For example, if the preceding cell state is $P_{t - 1}$ [19,24,33] and the computation above yields $F G (t) = [1, 0, 1]$, component-wise multiplication keeps the first and third entries of $P_{t - 1}$ and removes the second from storage.

Writing new data to the cell state: the next steps are the sigmoid gate and the tanh gate, i.e., the first and second input gates. Like the forget gate, the input gate acts as a filter, here on the output of the tanh layer; its values lie between 0 and 1 and determine how much of the new data is kept. $\begin{matrix} (20) & i n_{t} = δ (W_{i n} \times [λ_{t - 1}, χ_{t}] + l_{i n}) \end{matrix}$

The second gate, which uses the tanh activation function, creates candidate values $P_{t}^{'}$ for the new cell state. Because tanh outputs values between −1 and 1, the candidates can both add information to and subtract it from the cell state. On closer inspection, this formula is exactly the core of a simple recurrent network: $\begin{matrix} (21) & P_{t}^{'} = tanh (W_{P} \times [λ_{t - 1}, χ_{t}] + l_{P}) \end{matrix}$

The candidate values are then multiplied component-wise by this input “filter”: $\begin{matrix} (22) & P_{t}^{i n} = P_{t}^{'} * i n_{t} \end{matrix}$

So far, the computations apply several neural layers to the same inputs: we have chosen which values to erase from memory (the previous cell state), chosen which values to write to memory, and filtered these values with a sigmoid. The cell state can now be updated by adding the old state scaled by the forget gate, $P_{t}^{F G} = F G_{t} * P_{t - 1}$, element-wise to the new candidate values $P_{t}^{i n}$: $\begin{matrix} (23) & P_{t} = P_{t}^{F G} + P_{t}^{i n} \end{matrix}$

Combining these steps, the full cell-state update is: $\begin{matrix} (24) & P_{t} = F G_{t} * P_{t - 1} + i n_{t} * P_{t}^{'} \end{matrix}$

Finally, the cell decides what to output. A standard sigmoid layer (the output gate) receives the current input and the previous hidden state. The cell state is passed through a tanh layer, which compresses it to values between −1 and 1, and is multiplied point-wise by the output gate to produce the new hidden state; the output gate thus filters the cell state. The previous output and the current input are the most important factors, but the cell state can change the final output by multiplying it by positive or negative values. The network finally classifies the subject as cognitively normal (CN), mild cognitive impairment (MCI), or dementia.
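A minimal NumPy sketch of one LSTM time step, following Eqs. (19) to (24), is given below. It is not the authors’ implementation. Two assumptions are labeled: each gate gets its own weight matrix and bias (the standard LSTM formulation), and the output gate uses the standard form $o_{t} = δ(W_{out} [λ_{t-1}, χ_{t}] + l_{out})$, $λ_{t} = o_{t} * tanh(P_{t})$, since the text describes it only in words. The initialization scheme is also an assumption, and the CN/MCI/dementia classification head and the coupling to the maxout network are omitted because they are not specified.

```python
import numpy as np


def sigmoid(x):
    # δ in Eqs. (19)-(20)
    return 1.0 / (1.0 + np.exp(-x))


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step: Eqs. (19)-(24) plus the standard output gate.

    params holds, for each gate name in ('fg', 'in', 'p', 'out'), a weight matrix
    W_<gate> of shape (hidden, hidden + input) and a bias l_<gate> of shape (hidden,).
    """
    z = np.concatenate([h_prev, x_t])                     # [λ_{t-1}, χ_t]
    fg = sigmoid(params["W_fg"] @ z + params["l_fg"])     # Eq. (19) forget gate
    i_t = sigmoid(params["W_in"] @ z + params["l_in"])    # Eq. (20) input gate
    p_cand = np.tanh(params["W_p"] @ z + params["l_p"])   # Eq. (21) candidates
    c_t = fg * c_prev + i_t * p_cand                      # Eqs. (22)-(24) cell state
    o_t = sigmoid(params["W_out"] @ z + params["l_out"])  # output gate (standard form)
    h_t = o_t * np.tanh(c_t)                              # new hidden state in (-1, 1)
    return h_t, c_t


def init_params(input_size, hidden_size, seed=0):
    # Small random weights and zero biases; this initialization is an assumption.
    rng = np.random.default_rng(seed)
    params = {}
    for gate in ("fg", "in", "p", "out"):
        params["W_" + gate] = rng.normal(0.0, 0.1, size=(hidden_size, hidden_size + input_size))
        params["l_" + gate] = np.zeros(hidden_size)
    return params


# Usage: feed a sequence of five 4-dimensional feature vectors through a 3-unit cell.
params = init_params(input_size=4, hidden_size=3)
h, c = np.zeros(3), np.zeros(3)
for x_t in np.random.default_rng(1).normal(size=(5, 4)):
    h, c = lstm_step(x_t, h, c, params)
print(h.shape, c.shape)  # (3,) (3,)
```

Setting the forget-gate bias very high and the input-gate bias very low makes the cell copy its previous state unchanged, which is the additive "data highway" behavior described above.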

3.4.2. Optimized deep maxout neural network

The output of the feature selection stage is the input to this network. Each neuron in a maxout network has a group of r candidate pieces, and the neuron’s activation is the highest value among its r pieces. Denoting the v-th node of the ℓ-th hidden layer by $ℏ_{ℓ}^{v}$ and its pieces by $y_{ℓ}^{v j}$, their relationship is [12]: $\begin{matrix} (25) & ℏ_{ℓ}^{v} = \max_{j \in \{1, \dots, r\}} y_{ℓ}^{v j} \end{matrix}$ where $y_{ℓ}^{v j}$ is obtained by forward propagation from the layer below, i.e., $\begin{matrix} (26) & y_{ℓ} = w_{ℓ - 1}^{T} ℏ_{ℓ - 1} + B V_{ℓ} \end{matrix}$

Here $y_{ℓ}^{v j}$ are the components of the ℓ-th layer’s vector to be max-pooled, $w_{ℓ - 1}$ is the weight matrix, and $B V_{ℓ}$ is the ℓ-th layer’s bias vector.

During training, the gradient of each maxout neuron with respect to its winning piece is 1 and is 0 for the other pieces, so only the weights of the piece with the highest activation in each group, $y_{ℓ}^{v j}$, are updated.

The maxout neuron is a generalization of the ReLU neuron, and the maxout nonlinearity is a universal approximator; together with the max-pooling operation, this makes the network well suited to AD detection.

The max-pooling operation, first used in convolutional networks, resembles a winner-take-all operation: the most active of the r candidate neurons is chosen to represent the group, and the remaining candidates are discarded. The candidate pieces of a maxout network can be viewed as several feature maps, each carrying a different component of the information from the layer below, and by selecting the most useful one the maxout neuron makes the classifier more robust. A ReLU, by comparison, is max-pooling between a single feature map and 0, i.e., a maxout neuron with two pieces, one of which is always 0. Whereas ReLU simply discards negative values, maxout units select among learned features; in effect, each neuron learns its own activation function during training. The network output indicates whether the subject is cognitively normal (CN), has mild cognitive impairment (MCI), or has dementia.
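The relationship between ReLU and maxout described above can be checked numerically. In this short sketch the function names are hypothetical; it shows that a two-piece maxout unit with one piece fixed at 0 reproduces ReLU, and that freeing the second piece lets the unit represent a different activation, here |z| = max(z, −z).

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def maxout_two_pieces(z1, z2):
    # A maxout unit with r = 2 pieces: its activation is the larger piece.
    return np.maximum(z1, z2)

z = np.linspace(-3.0, 3.0, 7)
# ReLU is the special case in which the second piece is pinned at 0.
same = np.allclose(relu(z), maxout_two_pieces(z, np.zeros_like(z)))
# With both pieces free, maxout can also represent |z| = max(z, -z),
# which a single feature map compared against 0 cannot.
abs_like = maxout_two_pieces(z, -z)
```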

In this work, the network weights are optimized with SUSSOA, which aims to minimize the detection error; the weight optimization process is described below.

3.4.3. Self-updated shuffled shepherd optimization algorithm (SUSSOA)

During training, the parameters are tuned with SUSSOA, an enhanced form of the shuffled shepherd optimization algorithm (SSOA). SSOA is inspired by the herding behavior of shepherds. People have long harnessed animal traits to their advantage: shepherds try to lead their flocks in the right direction, typically using horses or herding dogs whose natural herding instincts help guide the herd and protect it from theft and predators. SSOA is built on this behavior, and its mathematical model of the herd and the shepherd is given below [20,28].

The flock is split into HE herds, each containing SH sheep, giving $n = HE \times SH$ sheep (agents) in total. To assign the sheep to herds, they are first sorted by their objective function values in ascending order. The first HE sheep are then placed at random, one in each herd; the next HE sheep are placed in the same way, and this procedure is repeated until every sheep belongs to a herd.
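The herd-construction procedure can be sketched as follows. This is an illustrative reading of the description above, assuming a minimization problem (a lower objective value is better) and that each herd keeps its members in rank order; `build_herds` is a hypothetical helper name.

```python
import numpy as np

def build_herds(fitness, n_herds, rng):
    """Split n = HE * SH sheep into HE herds (SSOA shuffling step).

    Sheep are ranked by objective value (ascending, i.e. best first for
    minimisation); each consecutive block of n_herds ranked sheep is dealt
    out at random, one sheep per herd.
    """
    n = len(fitness)
    if n % n_herds != 0:
        raise ValueError("population size must equal HE * SH")
    order = np.argsort(fitness)                       # best -> worst
    herds = [[] for _ in range(n_herds)]
    for start in range(0, n, n_herds):
        block = order[start:start + n_herds]
        for herd_idx, sheep in zip(rng.permutation(n_herds), block):
            herds[herd_idx].append(int(sheep))
    return herds  # each herd lists sheep indices from best to worst

fit = np.array([5.0, 1.0, 3.0, 2.0, 4.0, 0.0])
herds = build_herds(fit, n_herds=2, rng=np.random.default_rng(1))
```

Because the blocks are dealt in rank order, every herd receives one sheep from each quality band, which keeps the herds comparable in quality.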

Within every herd, the shepherds guide the sheep toward the horse. The members of each herd are sorted in ascending order of their objective function values and are selected one by one, from the first to the last. The selected member, indexed by p, acts as the shepherd whose step size is being determined. The herd contains members that are better and members that are worse than the selected one; the better members are the horses and the worse members are the sheep, so each shepherd has some horses and some sheep. Following the natural behavior, in which the shepherd first approaches a sheep and then moves toward a horse to guide the flock, one horse and one sheep are chosen at random, and the motion vector is obtained as:

$$stepsize_{p} = \phi \times rand \circ (Y_{e} - X_{p}) + \gamma \times rand \circ (Y_{q} - X_{p}) \tag{27}$$

where $X_{p}$, $Y_{e}$, and $Y_{q}$ are the solution vectors of the shepherd, the selected horse, and the selected sheep in the m-dimensional search space, respectively; rand is a random vector whose elements lie in $[0, 1]$ and whose length equals the number of decision variables; and $\circ$ denotes element-wise multiplication. The first term is zero for the best member of a herd, since no member is better than it, and the second term is zero for the worst member, since no member is worse than it. The parameter γ equals $\gamma_{0}$ at the start of the search and decreases linearly to zero, while ϕ increases linearly from $\phi_{0}$ to $\phi_{max}$:

$$\gamma = \gamma_{0} - \frac{\gamma_{0}}{max\,iteration} \times iteration \tag{28}$$

$$\phi = \phi_{0} + \frac{\phi_{max} - \phi_{0}}{max\,iteration} \times iteration \tag{29}$$

Decreasing γ and increasing ϕ progressively limit exploration and strengthen exploitation.
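Eqs. (27)–(29) can be sketched as below. The sketch assumes that sheep p is the shepherd $X_{p}$, that any better-ranked member of the herd may serve as the horse and any worse-ranked member as the selected sheep, and that the herd array is sorted from best to worst. The function names and the parameter values in the usage lines are hypothetical.

```python
import numpy as np

def gamma_phi(iteration, max_iter, gamma0, phi0, phi_max):
    gamma = gamma0 - gamma0 / max_iter * iteration          # Eq. (28)
    phi = phi0 + (phi_max - phi0) / max_iter * iteration    # Eq. (29)
    return gamma, phi

def step_size(herd_positions, p, gamma, phi, rng):
    """Eq. (27) for the member at rank p of a herd sorted best -> worst.

    A horse (better member) is drawn from ranks 0..p-1 and a sheep (worse
    member) from ranks p+1..end; a term is skipped, i.e. zero, when no such
    member exists.
    """
    X_p = herd_positions[p]
    dim = X_p.shape[0]
    step = np.zeros(dim)
    if p > 0:                                       # a better member exists
        Y_e = herd_positions[rng.integers(0, p)]
        step += phi * rng.random(dim) * (Y_e - X_p)
    if p < len(herd_positions) - 1:                 # a worse member exists
        Y_q = herd_positions[rng.integers(p + 1, len(herd_positions))]
        step += gamma * rng.random(dim) * (Y_q - X_p)
    return step

# Usage: mid-run values of gamma and phi applied to a 3-member herd in 2-D.
rng = np.random.default_rng(0)
gamma, phi = gamma_phi(50, 100, gamma0=1.0, phi0=0.5, phi_max=2.0)
herd = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])   # sorted best -> worst
step = step_size(herd, 1, gamma, phi, rng)
```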

After the step size of every sheep in the herd has been determined, the temporary solution vector of each sheep is computed as:

$$Q_{p}^{temple} = Q_{p}^{old} + stepsize_{p} \tag{30}$$

The sheep's position is changed only if the objective value of the temporary solution is not worse than that of the old solution, in which case $Q_{p}^{new} = Q_{p}^{temple}$; otherwise, $Q_{p}^{new} = Q_{p}^{old}$.

Instead of Eq. (30), SUSSOA computes the temporary solution with Eq. (31), which divides the updated position by the weighted standard deviation $Sd_{wt}$:

$$Q_{p}^{temple} = \frac{Q_{p}^{old} + stepsize_{p}}{Sd_{wt}} \tag{31}$$

$$Sd_{wt} = \sqrt{\frac{\sum_{p=1}^{N} \varpi_{p} \left(Q_{p} - \bar{Q}_{\varpi}\right)^{2}}{\frac{N' - 1}{N'} \sum_{p=1}^{N} \varpi_{p}}} \tag{32}$$

where $\varpi_{p}$ is the weight of each observation of the sheep's updated position, $Q_{p}$ is the observation value, $\bar{Q}_{\varpi}$ is the weighted average, N is the number of observations, and $N'$ is the number of observations with a non-zero weight.
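The SUSSOA modification in Eqs. (31)–(32), together with the acceptance rule stated after Eq. (30), can be sketched as below. Several details are not specified in the text and are assumptions here: Eq. (32) is implemented as the standard weighted standard deviation (with $N'$ the number of non-zero weights), the observations are taken to be the entries of the moved position $Q_{p}^{old} + stepsize_{p}$, the weights are supplied by the caller, and a guard for zero spread is added. All function names are hypothetical.

```python
import numpy as np

def weighted_std(values, weights):
    """Weighted standard deviation (the assumed reading of Eq. (32))."""
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    n_nonzero = np.count_nonzero(weights)                   # N'
    mean_w = np.sum(weights * values) / np.sum(weights)     # weighted average
    num = np.sum(weights * (values - mean_w) ** 2)
    den = (n_nonzero - 1) / n_nonzero * np.sum(weights)
    return np.sqrt(num / den)

def sussoa_candidate(Q_old, step, weights):
    """Eq. (31): the SSOA move divided by Sd_wt of the moved position."""
    moved = Q_old + step
    sd = weighted_std(moved, weights)
    # Guard against zero spread (a case the text does not cover).
    return moved / sd if sd > 0 else moved

def accept(Q_old, Q_temp, objective):
    """Greedy rule: keep Q_temp only if it is not worse (minimisation)."""
    return Q_temp if objective(Q_temp) <= objective(Q_old) else Q_old
```

With uniform weights, `weighted_std` reduces to the ordinary sample standard deviation, which is a convenient sanity check on the formula.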

The steps of SUSSOA are as follows.

Step 1 (Initialization): The starting position of the p-th sheep is chosen at random in the m-dimensional search space, within the bounds of the model parameters:

$$Q_{p}^{0} = Q_{min} + rand \circ (Q_{max} - Q_{min}), \quad p = 1, 2, \dots, n \tag{33}$$

where n is the number of sheep, $Q_{p}^{0}$ is the initial solution vector of the p-th sheep, $Q_{max}$ and $Q_{min}$ are the upper and lower bounds of the model parameters, and rand is a random vector whose elements are drawn from $[0, 1]$.

Step 2 (Evaluation): The objective function value of every sheep is evaluated.

Step 3 (Herd construction): The sheep are organized into herds using the procedure described above.

Step 4 (Step size): The step size of every sheep is computed with Eq. (27).

Step 5 (Temporary solution): The temporary solution vector is computed with Eq. (31), and its objective function value is evaluated.

Step 6 (Update and merge): If the objective value of the temporary solution is not worse than the old one, the sheep's position is updated; the herds are then merged.

Step 7 (Parameter update): The values of γ and ϕ are updated with Eqs. (28) and (29).

Step 8 (Termination): Steps 3 to 7 are repeated until the maximum number of iterations is reached.
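The SUSSOA steps can be strung together into a compact loop. The sketch below minimizes a toy sphere function rather than the detection error, and it makes several assumptions not stated in the text: uniform observation weights in $Sd_{wt}$, clipping of candidates to $[Q_{min}, Q_{max}]$, in-place (asynchronous) updates within a herd, and the default values of γ₀, ϕ₀, and ϕ_max. It illustrates the described steps and is not the authors' implementation; `sussoa` and its parameters are hypothetical names, and `dim` should be at least 2 so that Eq. (32) is defined.

```python
import numpy as np

def weighted_std(v, w):
    """Eq. (32) read as the standard weighted standard deviation."""
    n_nz = np.count_nonzero(w)
    mean_w = np.sum(w * v) / np.sum(w)
    return np.sqrt(np.sum(w * (v - mean_w) ** 2) / ((n_nz - 1) / n_nz * np.sum(w)))

def sussoa(objective, dim, q_min, q_max, n_herds=4, per_herd=5, max_iter=100,
           gamma0=1.0, phi0=0.5, phi_max=2.0, seed=0):
    """Toy SUSSOA loop for minimisation (dim >= 2)."""
    rng = np.random.default_rng(seed)
    n = n_herds * per_herd
    Q = q_min + rng.random((n, dim)) * (q_max - q_min)       # Step 1, Eq. (33)
    f = np.array([objective(q) for q in Q])                   # Step 2
    for it in range(max_iter):
        gamma = gamma0 - gamma0 / max_iter * it               # Step 7, Eq. (28)
        phi = phi0 + (phi_max - phi0) / max_iter * it         # Step 7, Eq. (29)
        order = np.argsort(f)                                 # Step 3: rank, then
        herds = [[] for _ in range(n_herds)]                  # deal blocks at random
        for s in range(0, n, n_herds):
            for h, idx in zip(rng.permutation(n_herds), order[s:s + n_herds]):
                herds[h].append(idx)
        for herd in herds:
            for p, idx in enumerate(herd):
                step = np.zeros(dim)                          # Step 4, Eq. (27)
                if p > 0:                                     # horse: better member
                    horse = herd[rng.integers(0, p)]
                    step += phi * rng.random(dim) * (Q[horse] - Q[idx])
                if p < len(herd) - 1:                         # sheep: worse member
                    sheep = herd[rng.integers(p + 1, len(herd))]
                    step += gamma * rng.random(dim) * (Q[sheep] - Q[idx])
                moved = Q[idx] + step                         # Step 5, Eq. (31)
                sd = weighted_std(moved, np.ones(dim))
                temp = np.clip(moved / sd if sd > 0 else moved, q_min, q_max)
                f_temp = objective(temp)
                if f_temp <= f[idx]:                          # Step 6: greedy update;
                    Q[idx], f[idx] = temp, f_temp             # herds merge via shared Q
    best = int(np.argmin(f))
    return Q[best], f[best]

q_best, f_best = sussoa(lambda q: float(np.sum(q ** 2)), dim=3,
                        q_min=-5.0, q_max=5.0, max_iter=20)
```

Because of the greedy acceptance in Step 6, the best objective value can never be worse than the best initial value; whether the division by $Sd_{wt}$ helps on a particular problem is an empirical question that this toy example does not settle.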

3.4.4. Objective function as well as solution encoding

The objective of the SUSSOA-based AD detection is to minimize the detection error, as given in Eq. (34):

$$F_{Objective} = \min(E) \tag{34}$$

Here $F_{Objective}$ denotes the objective function, and E signifies the error.

The solution encoding is shown in Fig. 2, where $W_{1}, W_{2}, \dots, W_{n}$ are the weight parameters of the LSTM and $w_{1}, w_{2}, \dots, w_{n}$ are the weight parameters of the Deep Maxout neural network.

Fig. 2.

Solution encoding of proposed SUSSOA-based AD detection.
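The encoding in Fig. 2 can be sketched as flattening both networks' weight arrays into a single vector that SUSSOA treats as one sheep, with a decoder that restores the arrays before the error is evaluated. The array shapes, the helper names `encode`, `decode`, and `detection_error`, and the use of the misclassification rate as E in Eq. (34) are assumptions for illustration; the text does not define E precisely.

```python
import numpy as np

def encode(lstm_weights, maxout_weights):
    """Concatenate LSTM weights (W_1..W_n) and maxout weights (w_1..w_n)
    into one solution vector, keeping the shapes needed for decoding."""
    arrays = list(lstm_weights) + list(maxout_weights)
    shapes = [a.shape for a in arrays]
    vector = np.concatenate([a.ravel() for a in arrays])
    return vector, shapes, len(lstm_weights)

def decode(vector, shapes, n_lstm):
    """Inverse of encode: split the vector back into the two weight lists."""
    arrays, start = [], 0
    for shape in shapes:
        size = int(np.prod(shape))
        arrays.append(vector[start:start + size].reshape(shape))
        start += size
    return arrays[:n_lstm], arrays[n_lstm:]

def detection_error(y_true, y_pred):
    """One possible E for Eq. (34): the fraction of misclassified subjects."""
    return float(np.mean(np.asarray(y_true) != np.asarray(y_pred)))

# Usage with toy weight shapes (hypothetical).
lstm = [np.arange(6.0).reshape(2, 3), np.ones(3)]
maxout = [np.full((3, 2), 2.0)]
vec, shapes, n_lstm = encode(lstm, maxout)
lstm_back, maxout_back = decode(vec, shapes, n_lstm)
```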

4. Results

The proposed method was implemented in Python 3.7.9 (64-bit, Windows) using PyCharm 2021.2.3, and the dataset was obtained from the IDA (Image and Data Archive) [4]. Table 2 lists the hardware details. Three AD classes are considered: MCI, CN, and Dementia. The results are compared with SSO, BMO, HGS, BRO, BES, Bi-GRU, DBN, CNN, RF, 3-D CNN, and DNN+CNN to assess the effectiveness of the proposed method.

Table 2
Hardware details

Device	Specification
Device Name	DESKTOP-9FPAHOE
Processor	AMD Ryzen 5 3450U with Radeon Vega Mobile Gfx 2.10 GHz
Installed RAM	16.0 GB (13.9 GB usable)
Device ID	OFBA11F4-A5BO-44E7-B3F8-EBB51E779A20
Product Id	00327-36322-31369-AAOEM
System type	64-bit operating system, x64-based processor
Pen and touch	No pen or touch input is available for this display

4.1. Convergence analysis

The convergence of SUSSOA was analyzed over 0–50 iterations, and the results are shown in Fig. 3. During the first 30 iterations, HGS, BMO, and BRO attain the lowest cost values, between 1.04 and 1.05. The cost of SUSSOA starts higher, at approximately 1.07 during iterations 0–10, and falls to about 1.055 by iteration 30, while SSO and BES still have the highest costs, 1.088 and 1.075, respectively. Between iterations 30 and 50, SUSSOA reaches the lowest cost of all the methods, approximately 1.043, indicating that it converges to a better solution for the AD detection task.

Fig. 3.

Convergence analysis.

4.2. Performance analysis

Performance was evaluated with three categories of measures: positive, negative, and miscellaneous. The positive measures are precision, specificity, accuracy, and sensitivity, for which higher values are better. The results of SUSSOA are shown in Fig. 4. Among these metrics, detection accuracy is the most important, and SUSSOA achieves a higher detection accuracy than the existing techniques at every learning percentage (LP). Its accuracy exceeds 90% at all LPs: approximately 91%, 95%, 93%, and 94% at LPs of 60, 70, 80, and 90, respectively. Its specificity and sensitivity are likewise above 90%. While SSO, HGS, BES, BMO, and BRO have precision values below 0.8 at all LPs, SUSSOA reaches 0.91, 0.94, 0.92, and 0.93, demonstrating its higher performance.

Fig. 4.

Proposed work’s performance compared with other algorithms.

The miscellaneous metrics are F-measure, NPV, and MCC, for which higher values are also better. As Fig. 5 shows, SUSSOA achieves its highest F-measure, 96%, at an LP of 70, where the F-measure of the existing systems is below 90%. At LPs of 60, 80, and 90, the F-measure of SUSSOA is 0.9, 0.93, and 0.92, respectively, while the other algorithms reach only 0.6–0.85. Similarly, SUSSOA attains recall values of approximately 0.9, 0.95, 0.92, and 0.94 (Fig. 5), whereas the other techniques stay below 0.9. SUSSOA also has higher NPV and MCC than the earlier models, with values of approximately 90% or more at all LPs.

Fig. 5.

Proposed work’s metrics comparison: F-measure, MCC, NPV, and recall.

FPR and FNR are negative measures and should be as low as possible. Fig. 6 shows that SUSSOA obtains the lowest FNR and FPR at every LP. At an LP of 70, the FNR of SUSSOA is 0.04, the lowest value, compared with SSO = 0.15, HGS = 0.10, BES = 0.13, BMO = 0.32, and BRO = 0.20. The FPR of SUSSOA is likewise lower than that of the earlier models at all LPs.

Fig. 6.

Proposed work’s negative measure comparison.
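The positive, negative, and miscellaneous measures used in Figs. 4–6 can be computed from confusion-matrix counts as sketched below. The text does not state how the three classes (CN, MCI, dementia) are reduced to these measures; the sketch assumes a single one-vs-rest split with counts tp, fp, tn, and fn (hypothetical values in the usage line), and a multi-class evaluation would repeat it per class and average. Note that FNR = 1 − sensitivity and FPR = 1 − specificity, which matches the values reported in Table 4 (e.g., 0.0369 = 1 − 0.9631).

```python
import math

def binary_metrics(tp, fp, tn, fn):
    """Positive, negative and miscellaneous measures for one one-vs-rest split."""
    sensitivity = tp / (tp + fn)                    # identical to recall
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    npv = tn / (tn + fn)
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": accuracy, "sensitivity": sensitivity, "recall": sensitivity,
        "specificity": specificity, "precision": precision,
        "f_measure": f_measure, "npv": npv, "mcc": mcc,
        "fpr": 1 - specificity, "fnr": 1 - sensitivity,
    }

m = binary_metrics(tp=40, fp=10, tn=45, fn=5)   # hypothetical counts
```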

4.3. Statistical analysis

A statistical analysis was conducted to evaluate the effectiveness of SUSSOA against SSO, BMO, HGS, BRO, and BES; the results are given in Table 3, where the proposed method is listed as ISSO. SUSSOA obtains the highest best and worst values (above 0.9), whereas the other methods range between approximately 0.5 and 0.88. It also has the highest mean (0.9563) and median (0.9578) and the lowest standard deviation (0.0113), indicating consistently high accuracy in the AD detection task. In terms of the best value, SUSSOA is 57%, 53%, 28%, 25%, and 21% higher than SSO, BMO, HGS, BRO, and BES, respectively. The proposed model is therefore suggested as an effective approach for Alzheimer’s disease detection.

Table 3
Proposed SUSSOA-based method’s statistical analysis comparison

Algorithm	BEST	WORST	MEAN	MEDIAN	STND DEV
SSO	0.5980	0.8805	0.7538	0.7684	0.1037
BMO	0.6124	0.7549	0.6867	0.6898	0.0620
HGS	0.7336	0.8666	0.8037	0.8074	0.0590
BRO	0.7498	0.8173	0.7869	0.7903	0.0243
BES	0.7742	0.8468	0.8216	0.8327	0.0281
ISSO	0.9393	0.9704	0.9563	0.9578	0.0113
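The percentages quoted for the statistical analysis can be reproduced from the BEST column of Table 3 if each gain is measured relative to the baseline's own best value, i.e. 100 × (best of the proposed method − best of the baseline) / best of the baseline. This formula is inferred from the numbers rather than stated in the text.

```python
# BEST column of Table 3; the ISSO row is the proposed method.
best = {"SSO": 0.5980, "BMO": 0.6124, "HGS": 0.7336, "BRO": 0.7498,
        "BES": 0.7742, "ISSO": 0.9393}

def relative_gain(proposed, baseline):
    """Percentage by which `proposed` exceeds `baseline`, relative to baseline."""
    return 100.0 * (proposed - baseline) / baseline

gains = {name: round(relative_gain(best["ISSO"], value))
         for name, value in best.items() if name != "ISSO"}
# gains -> {'SSO': 57, 'BMO': 53, 'HGS': 28, 'BRO': 25, 'BES': 21}
```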

4.4. Classifier comparison

Table 4 compares the SUSSOA-based AD detection with other classifiers: Enhanced Math Optimizer Accelerated Arithmetic with Ensemble Classifier (EMOAOA+EC), Bi-GRU, DBN, CNN, RF, 3D-CNN, and CNN+DNN. The SUSSOA-based detection yields the highest positive measures (accuracy 0.9704, sensitivity 0.9631, specificity 0.9800, and precision 0.9843), with EMOAOA+EC slightly lower, and it also outperforms the remaining strategies on these measures. It likewise yields the highest recall (0.9631) and F-measure (0.9736). Bi-GRU, DBN, CNN, RF, 3D-CNN, and CNN+DNN give the lowest MCC (0.39–0.68) and NPV (0.68–0.86) values. For the negative measures, the SUSSOA-based detection again gives the best results, with an FPR of 0.02 and an FNR of 0.0369.

Table 4
Proposed SUSSOA-based AD detection method’s performance comparison with distinct classifiers

	SUSSOA-based AD detection	EMOAOA+EC	Bi-GRU	DBN	CNN	RF	3D-CNN	CNN+DNN
accuracy	0.9704	0.9542	0.7352	0.8296	0.7480	0.6819	0.8375	0.7380
sensitivity	0.9631	0.9612	0.7685	0.7671	0.7680	0.8372	0.8497	0.6987
specificity	0.9800	0.9440	0.7081	0.9056	0.7212	0.5336	0.8263	0.7862
precision	0.9843	0.9612	0.6815	0.9081	0.7870	0.6315	0.8194	0.8007
recall	0.9631	0.9612	0.7685	0.7671	0.7680	0.8372	0.8497	0.6987
f_measure	0.9736	0.9612	0.7224	0.8317	0.7774	0.7200	0.8342	0.7462
mcc	0.9403	0.9052	0.4741	0.6712	0.4874	0.3879	0.6755	0.4827
npv	0.9532	0.9440	0.7902	0.7617	0.6986	0.7744	0.8556	0.6798
fpr	0.0200	0.0560	0.2919	0.0944	0.2788	0.4664	0.1737	0.2138
fnr	0.0369	0.0388	0.2315	0.2329	0.2320	0.1628	0.1503	0.3013

4.5. Ablation study

To demonstrate the efficacy of the SUSSOA-based AD detection approach, an ablation test with three variants of the proposed method was performed (Table 5). Without optimization, the method reaches an accuracy of only 0.8438, compared with 0.9704 for the full SUSSOA-based method. With conventional normalization, the accuracy and sensitivity are 0.9306 and 0.9334, respectively, versus 0.9704 and 0.9631 for the proposed method, demonstrating the benefit of the improved data normalization. To assess the improved entropy-based feature extraction, the method was also tested with conventional entropy, which yields lower positive measures: accuracy 0.7620, sensitivity 0.7281, specificity 0.8006, and precision 0.8060. On recall, precision, MCC, NPV, and F-measure, the full SUSSOA-based method scores above 90%. For the negative measures, it yields the lowest values, an FPR of 0.02 and an FNR of 0.0369, whereas the other variants have higher values.

Table 5
Proposed SUSSOA-based AD detection method’s ablation analysis

	SUSSOA-based AD detection	Proposed without optimization	Proposed with conventional normalization	Proposed with conventional entropy
accuracy	0.9704	0.8438	0.9306	0.7620
sensitivity	0.9631	0.8718	0.9334	0.7281
specificity	0.9800	0.8107	0.9267	0.8006
precision	0.9843	0.8452	0.9458	0.8060
recall	0.9631	0.8718	0.9334	0.7281
f_measure	0.9736	0.8583	0.9396	0.7651
mcc	0.9403	0.6849	0.8581	0.5280
npv	0.9532	0.8422	0.9103	0.7213
fpr	0.0200	0.1893	0.0733	0.1994
fnr	0.0369	0.1282	0.0666	0.2719

A comparative study was performed to demonstrate the performance of the improved chi-square feature selection; the results are shown in Table 6. With the conventional chi-square, the proposed approach reaches positive measures of 0.7894, 0.8303, 0.7451, and 0.7787 (accuracy, sensitivity, specificity, and precision), whereas the improved chi-square yields 0.9704, 0.9631, 0.9800, and 0.9843. The method was also run with LDA and PCA in place of the improved chi-square, but both gave lower positive measures (accuracy of 0.9118 and 0.7396, respectively). The improved chi-square also gives the best recall, F-measure, MCC, and NPV, as well as the lowest negative measures, an FPR of 0.02 and an FNR of 0.0369.

Table 6

Performance comparison of the proposed method with diverse feature selection techniques

	Proposed with improved Chi-square	Proposed with conventional Chi-square	Linear Discriminant Analysis (LDA)	Principal Component Analysis (PCA)
accuracy	0.9704	0.7894	0.9118	0.7396
sensitivity	0.9631	0.8303	0.9317	0.7034
specificity	0.9800	0.7451	0.8847	0.7812
precision	0.9843	0.7787	0.9164	0.7863
recall	0.9631	0.8303	0.9317	0.7034
f_measure	0.9736	0.8037	0.9240	0.7425
mcc	0.9403	0.5784	0.8191	0.4839
npv	0.9532	0.8026	0.9053	0.6970
fpr	0.0200	0.2549	0.1153	0.2188
fnr	0.0369	0.1697	0.0683	0.2966

5. Computational time analysis

The computational time of SUSSOA is compared with that of SSO, BMO, HGS, BRO, and BES in Table 7. SUSSOA has the lowest computational time, approximately 55.37, which is lower than that of all the conventional schemes, indicating that SUSSOA is computationally efficient.

Table 7
Time analysis

Methods	Time
SSO	69.534
BMO	78.572
HGS	100.578
BRO	106.972
BES	91.324
SUSSOA	55.367

6. Practical implications

In recent years, enormous advances in electronics, biocompatible materials, and nanomaterials have produced wearable devices, built from small sensors and biomedical components, that support diagnosis and prognosis. These devices have significantly improved the quality and effectiveness of healthcare services. Effective and economical wearable solutions are expected to underpin future patient monitoring and clinical care, enabling long-term remote monitoring of patients in homes and communities, which was previously unattainable. Wearable technology is also expected to have a considerable impact on individualized medical treatment and on the cost of healthcare for the aging population.

7. Conclusion

This work developed a novel AD detection method with four stages: pre-processing, feature extraction, feature selection, and AD detection. In the pre-processing stage, the input data were processed with an improved data normalization approach. Statistical, improved entropy-based, and mutual information-based features were then extracted from the pre-processed data, and suitable features were selected with the improved chi-square approach. A hybrid model combining an LSTM and a Deep Maxout neural network was trained on the selected features for AD detection, with the weight parameters of both networks optimized by SUSSOA. The experimental evaluation showed that the proposed method outperforms the compared methods across the performance measures considered. Because medical data are scarce, unsupervised and self-supervised methods are emerging research areas for medical images. Although most problems in AD classification remain unresolved, the success of deep learning cannot be discounted, and its AD detection performance can sometimes exceed that of medical professionals. Future work will continue to investigate deep learning-based AD diagnosis techniques.

References

1. https://www.oreilly.com/library/view/feature-engineeringmade/9781787287600/aa5580ee-6fb7-4ac2-a1fe-369d95b70168.xhtml.

2. https://www.quantiki.org/wiki/mutual-information.

3. https://www.cuemath.com/chi-square-formula.

4. https://ida.loni.usc.edu/pages/access/studyData.jsp?categoryId=43&subCategoryId=94#.

5. Al-Shoukry, T.H. Rassem and N.M. Makbol, Alzheimer’s diseases detection by using deep learning algorithms: A mini-review, IEEE Access 8 (2020), 77131–77141. doi:10.1109/ACCESS.2020.2989396.

6. Altinkaya, Polat and Barakli, Detection of Alzheimer’s disease and dementia states based on deep learning from MRI images: A comprehensive review, Journal of the Institute of Electronics and Computer 1(1) (2020), 39–53.

7. Amini, M.M. Pedram, Moradi, Jamshidi and Ouchani, Single and combined neuroimaging techniques for Alzheimer’s disease detection, Computational Intelligence and Neuroscience (2021).

8. Balagopalan, Eyre, Rudzicz and Novikova, To BERT or not to BERT: Comparing speech and language-based approaches for Alzheimer’s disease detection, 2020, arXiv preprint arXiv:2008.01551.

9. Balaji, M.A. Chaurasia, S.M. Bilfaqih, Muniasamy and L.E.G. Alsid, Hybridized deep learning approach for detecting Alzheimer’s disease, Biomedicines 11(1) (2023), 149. doi:10.3390/biomedicines11010149.

10. Basaia, Agosta, Wagner, Canu, Magnani, Santangelo, Filippi and Disease Neuroimaging Initiative, Automated classification of Alzheimer’s disease and mild cognitive impairment using a single MRI and deep neural networks, NeuroImage: Clinical 21 (2019), 101645. doi:10.1016/j.nicl.2018.101645.

11. Basher, B.C. Kim, K.H. Lee and H.Y. Jung, Volumetric feature-based Alzheimer’s disease diagnosis from sMRI data using a convolutional neural network and a deep neural network, IEEE Access 9 (2021), 29870–29882. doi:10.1109/ACCESS.2021.3059658.

12. Cai, Shi and Liu, Deep maxout neural networks for speech recognition, in: 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, IEEE, pp. 291–296.

13. Carmo, Silva, Yasuda, Rittner and Lotufo, Alzheimer’s disease neuroimaging initiative, Heliyon 7(2) (2021), e06226. doi:10.1016/j.heliyon.2021.e06226.

14. Ebrahimi, Luo, Chiong and Disease Neuroimaging Initiative, Deep sequence modelling for Alzheimer’s disease detection using MRI, Computers in Biology and Medicine 134 (2021), 104537. doi:10.1016/j.compbiomed.2021.104537.

15. M.A. Ebrahimighahnavieh, Luo and Chiong, Deep learning to detect Alzheimer’s disease from neuroimaging: A systematic literature review, Computer methods and programs in biomedicine 187 (2020), 105242. doi:10.1016/j.cmpb.2019.105242.

16. Esmael, Arnaout, Fruhwirth and Thonhauser, A statistical feature-based approach for operations recognition in drilling time series, International Journal of Computer Information Systems and Industrial Management Applications 4(6) (2012), 100–108.

17. Folego, Weiler, R.F. Casseb, Pires and Rocha, Alzheimer’s disease detection through whole-brain 3D-CNN MRI, Frontiers in bioengineering and biotechnology 8 (2020), 534592. doi:10.3389/fbioe.2020.534592.

18. Golovanevsky, Eickhoff and Singh, Multimodal Attention-based Deep Learning for Alzheimer’s Disease Diagnosis, 2022, arXiv preprint arXiv:2206.08826.

19. H.A. Helaly, Badawy and A.Y. Haikal, Toward deep mri segmentation for Alzheimer’s disease detection, Neural Computing and Applications 34(2) (2022), 1047–1063. doi:10.1007/s00521-021-06430-8.

20. Kaveh and Zaerreza, Shuffled shepherd optimization method: A new meta-heuristic algorithm, Engineering Computations (2020).

21. Koo, J.H. Lee, Pyo, Jo and Lee, Exploiting multi-modal features from pre-trained networks for Alzheimer’s dementia recognition, 2020, arXiv preprint arXiv:2009.04070.

22. Lin, Gao, Du, Chen and Tong, Multiclass diagnosis of stages of Alzheimer’s disease using linear discriminant analysis scoring for multimodal data, Computers in Biology and Medicine 134 (2021), 104478. doi:10.1016/j.compbiomed.2021.104478.

23. Liu, Li, Luo, Yang, Li and Bi, Alzheimer’s disease detection using depthwise separable convolutional neural networks, Computer Methods and Programs in Biomedicine 203 (2021), 106032. doi:10.1016/j.cmpb.2021.106032.

24. Nadal, Coupé, Helmer, J.V. Manjon, Amieva, Tison, J.F. Dartigues, Catheline and Planche, Differential annualized rates of hippocampal subfields atrophy in aging and future Alzheimer’s clinical syndrome, Neurobiology of Aging 90 (2020), 75–83. doi:10.1016/j.neurobiolaging.2020.01.011.

25. Odusami, Maskeliūnas, Damaševičius and Krilavičius, Analysis of features of Alzheimer’s disease: Detection of early stage from functional brain changes in magnetic resonance images using a finetuned ResNet18 network, Diagnostics 11(6) (2021), 1071. doi:10.3390/diagnostics11061071.

26. Øyri, Long Short-term Memory (LSTM) recurrent neural networks for urban hydrological modelling, 2020, Master’s thesis.

27. Petti, Baker and Korhonen, A systematic literature review of automatic Alzheimer’s disease detection from speech and language, Journal of the American Medical Informatics Association 27(11) (2020), 1784–1797. doi:10.1093/jamia/ocaa174.

28. RS, Early detection of Alzheimer’s disease with Ensemble-of-Classifiers, In communication.

29. Sharma, Gupta, Altameem, A.K.J. Saudagar, R.C. Poonia and S.R. Nayak, HTLML: Hybrid AI based model for detection of Alzheimer’s disease, Diagnostics 12(8) (2022), 1833. doi:10.3390/diagnostics12081833.

30. S.P. Singh, Wang, Gupta, Goli, Padmanabhan and Gulyás, 3D deep learning on medical images: A review, Sensors 20(18) (2020), 5097. doi:10.3390/s20185097.

31. Venugopalan, Tong, H.R. Hassanzadeh and M.D. Wang, Multimodal deep learning models for early detection of Alzheimer’s disease stage, Scientific reports 11(1) (2021), 1–13. doi:10.1038/s41598-020-79139-8.

32. Wang, Cao, Hao, Shao and K.P. Subbalakshmi, Modular multi-modal attention network for Alzheimer’s disease detection using patient audio and language data, in: Interspeech, 2021, pp. 3835–3839.

33. Wen, Thibeau-Sutre, Diaz-Melo, Samper-González, Routier, Bottani, Dormont, Durrleman, Burgos and Colliot, Convolutional neural networks for classification of Alzheimer’s disease: Overview and reproducible evaluation, Medical image analysis 63 (2020), 101694.

34. Wu, J.P. Noonan and Agaian, Shannon entropy based randomness measurement and test for image encryption, 2011, arXiv preprint arXiv:1103.5520.

35. Yamanakkanavar, J.Y. Choi and Lee, MRI segmentation and classification of human brain using deep learning for diagnosis of Alzheimer’s disease: A survey, Sensors 20(11) (2020), 3243. doi:10.3390/s20113243.

36. Yang, Diao, Wang, Sun, Zhou and Xie, Identification of key regulatory genes and pathways in prefrontal cortex of Alzheimer’s disease, Interdisciplinary Sciences: Computational Life Sciences 12(1) (2020), 90–98.

37. Ying, Xing, Liu, A.L. Lin, Jacobs and Liang, Multi-modal data analysis for Alzheimer’s disease diagnosis: An ensemble model using imagery and genetic features, in: 2021 43rd Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC), IEEE, pp. 3586–3591.
