Abstract

Dear Colleague:
Welcome to volume 29(6) of the Intelligent Data Analysis (IDA) Journal.
This is the sixth and final issue of IDA's 29th year. In this issue, we have selected papers that present both theoretical research and practical contributions to the field of Intelligent Data Analysis. The first part of this issue features theoretical manuscripts, while the second part concentrates on applied topics.
The first part, which includes the theoretical contributions, begins with three separate papers on clustering techniques. First, Yang et al. introduce ML-CCHDB, a new framework for multi-level co-location pattern mining, which enhances the detection of global and local co-location patterns by optimising column calculations and using HDBSCAN clustering. Experimental results confirm that ML-CCHDB improves efficiency and effectiveness in identifying common sub-areas and patterns in spatial datasets. The second paper, by Jiao et al., introduces LMSCAW, a low-rank multi-view subspace clustering model that improves stability and efficiency by decomposing self-expression matrices into low-rank products and adaptively weighting multiple views. Experimental results on benchmark datasets demonstrate that LMSCAW outperforms existing methods, showcasing its effectiveness in multi-source data partitioning. In the last paper on clustering, Zhang et al. introduce GGLSP, a graph-guided local structure propagation method for tensorial multi-view subspace clustering that improves local detail capture and incorporates prior tensor information. Experimental results across multiple datasets show that GGLSP outperforms existing approaches, confirming its effectiveness and efficiency.
The next group of theoretical contributions covers new methods for time series forecasting. In the first of these papers, Cho et al. present TSDNet, a two-stage hybrid deep neural network that improves long-term time series forecasting by separately managing trend and seasonal components with specialised modules. Experiments demonstrate that TSDNet consistently improves forecast accuracy, particularly for long-term predictions, confirming its effectiveness over existing methods. In the second paper on this topic, Wang et al. propose DPL-Net, a deep learning framework that simultaneously develops a classification model and class-discriminative prototypes using an automatic heuristic-based initialisation method. Experiments demonstrate that DPL-Net achieves competitive accuracy and improved interpretability compared to existing models across multiple time series datasets.
The third group of papers proposes new methods for classification and supervised learning. First, Utukuru & Krishna propose a Resilient Decision Tree classifier that effectively manages incomplete datasets by utilising subspace classifiers on different feature subsets without imputation, aligning with Responsible AI principles. Experimental results show that this weighted ensemble method improves prediction accuracy on datasets with missing values, thereby increasing the robustness of classification models. The second paper, by Li et al., presents FedEP, a federated learning approach that utilises ensemble class prototypes to address issues arising from long-tailed data and class imbalance across clients. Experimental results demonstrate that FedEP effectively enhances model performance and stability, surpassing other baseline methods.
The last group of theoretical contributions covers the areas of language models, natural language processing, and text mining. We begin this group with two papers on sentiment analysis. In the first, Pande & Vishwakarma introduce VyAnG-Net, a multimodal model that integrates textual, visual, and acoustic features with attention mechanisms to enhance sarcasm recognition in conversations. Extensive testing demonstrates that it achieves higher accuracy than current methods and confirms its versatility across various datasets. The second paper on this topic, by Malik et al., presents an interpretable framework for detecting threatening speech on Twitter, utilising Pattern Structures and Abstract Meaning Representations to support classification and provide explanations. Experimental results on a newly created English Twitter corpus demonstrate that the framework maintains consistent accuracy and offers meaningful, trustworthy interpretations, thereby aiding social media regulation efforts. We conclude this group of papers with a final contribution on entity recognition. The research by Belbekri et al. introduces a synonym-based oversampling technique for named entity recognition that employs pre-trained Word2Vec embeddings to produce semantically consistent synthetic examples for underrepresented classes. Experimental results demonstrate enhanced recognition accuracy across various categories, showcasing the effectiveness of semantic oversampling in tackling data imbalance in NLP tasks.
The second part of this issue addresses applied problems and use cases. The first of them is a computer vision problem. Liang et al. introduce ADKD, a novel knowledge distillation framework incorporating modules for adaptive generative guidance and reused head distillation to enhance object detection models. Extensive experiments on two well-known datasets demonstrate that ADKD outperforms existing methods, effectively improving student model performance while lowering complexity.
The next group of papers covers health applications of intelligent data analysis techniques. First, Swetha et al. present a deep learning ensemble approach to enhance the accuracy of melanoma skin cancer detection, incorporating modifications to pre-trained models and various ensemble techniques. The results show that the weighted average ensemble attains a notable peak melanoma classification accuracy, demonstrating the effectiveness of ensemble methods in skin cancer image classification. The second study, by Yu et al., develops a multipath deep learning model that combines multiple MRI sequences and synthesised samples to accurately distinguish between muscle-invasive and non-muscle-invasive bladder cancer. The model outperforms medical professionals in internal tests and approaches the level of expert radiologists in external evaluations, facilitating improved preoperative diagnosis. The following medical application, by Ndiaye et al., presents the Inner-Inter City Learner (IIL), a model that combines inter-city mobility graphs and intra-city transmission dynamics to forecast COVID-19 cases at the city level. Experimental results show that IIL surpasses existing methods across various forecast horizons, offering a valuable tool for localised epidemic response. The last paper from this group, by Shirodkar et al., is not a medical application per se but a neuroscience use case. This study enhances motor imagery EEG signal classification by combining optimised frequency filtering, wavelet-based time-frequency representations, and a vision transformer deep learning model, achieving high accuracy on benchmark datasets. The approach significantly outperforms existing methods, advancing brain-computer interface (BCI) applications for communication in individuals with mobility issues.
We conclude this applied part of the issue with an industrial application. Aksoy & Genc develop and compare various deep learning models, such as BiLSTM and ConvLSTM2D, for accurately forecasting renewable energy generation at a national scale, and show that these models outperform traditional machine learning methods. The findings indicate that memory-based deep learning architectures, particularly BiLSTM, significantly enhance prediction accuracy and reliability, supporting better energy management and grid integration.
With our best wishes,
Dr. J.M. Peña
Editor-in-Chief
