Abstract

Dear Colleague:
Welcome to volume 30 (5) of the Intelligent Data Analysis (IDA) Journal.
This issue contains a selection of both theoretical and applied contributions to data analysis. The first part covers theoretical contributions, while the second part includes papers on various real-world applications.
The first group of theoretical contributions covers generic methods, starting with clustering techniques. Yang et al. (a) introduce SDCG, a parameter-free, semi-supervised density-based clustering algorithm that handles datasets with varying densities and complex shapes. Using granular balls for initial data segmentation and an adaptive mutual nearest-neighbour voting mechanism for label propagation, the method assigns clusters adaptively and significantly outperforms existing approaches in both accuracy and efficiency. The second theoretical paper, by Bobbu et al., proposes a deception-detection model that reduces ensemble complexity and improves accuracy by combining a Quantum-Inspired Genetic Algorithm (QGA) with Integrated K-means clustering. Clustering is used to diversify the data bags across 40 base learners, and QGA selects the optimal classifiers; in criminal analysis tests, the model outperforms state-of-the-art approaches.
The third paper on methods and theory focuses on reinforcement learning. Cao et al. tackle the low exploration efficiency and sparse early rewards caused by high-dimensional spaces in multi-agent reinforcement learning. They propose DE-QMIX, an optimisation method that integrates Differential Evolution with the QMIX algorithm. By combining evolutionary operations with gradient descent to optimise the hyperparameters of the mixing hypernetwork, the framework improves the accuracy of the joint action-value function and outperforms established baselines such as MAVEN and VDN in win rate and global reward on the StarCraft Multi-Agent Challenge platform.
A second group of theoretical contributions covers graph and association analysis. We start with an interesting contribution on recommendation problems. Wu et al. introduce the Tridimensional Sequential Fusion Network (TriSeqNet) to address a limitation of traditional recommender systems, which overlook multidimensional relationships. The method models user-item, user-user, and item-item interactions simultaneously; by capturing interest evolution, item behaviour sequences, and interaction feature embeddings, it significantly improves click-through rate prediction accuracy and outperforms several baseline models on three public datasets. The next paper in this group addresses document analysis: He et al. introduce DRAIN, a document-level relation extraction model that improves cross-sentence reasoning by combining graph aggregation with an attention mechanism. Using Relational Graph Convolutional Networks (R-GCN) to construct a document structure graph, together with a key-sentence extraction method, the framework reduces noise and enhances entity interactions, outperforming existing models on the DocRED and CDR datasets.
We conclude the theoretical contributions with another graph-related problem, namely temporal graph analysis. Yang et al. (b) identify a new problem, periodic (α, β)-biclique enumeration, which captures recurrent group behaviours in temporal bipartite graphs using a θ-periodic biclique model. To address its computational complexity, the authors propose an edge-based reduction framework supported by two optimised data structures (a linked edge list and a Trie-based detector), which together achieve up to a 10× speedup and better scalability than baseline methods.
In the applied contributions section, we start with several use cases in the healthcare and medical domain. First, Huang et al. propose an intelligent hospital financial management system that combines Long Short-Term Memory (LSTM) networks with IoT devices to overcome challenges such as data silos and delayed queries in traditional property management. By analysing real-time data to track and optimise equipment status, resource consumption, and financial expenditure, the system demonstrates high stability and achieves 92.5% prediction accuracy, confirming its practical value for hospital operations. The second paper, by Malarvizhi Kumara et al., addresses a medical diagnosis problem: detecting multiple seizure occurrences and seizure clusters. The authors propose a robust epilepsy detection framework that combines a Fractional integro-differential Duffing Oscillator (Fid-DO) with a Meta-step Rootsig Long Short-Term Memory (MsRs-LSTM) network. Applying advanced signal decomposition and feature extraction techniques to EEG data, the system identifies complex seizure patterns with high classification accuracy. We conclude the clinical applications with a problem addressed using language models. Fan & Jiyu develop a generative dialogue model for traditional Chinese medicine by applying continued pretraining, supervised fine-tuning, and Direct Preference Optimisation (DPO) to a general-purpose open LLM, using the extensive Chinese Medical Code database. Evaluated through both automatic metrics and manual review, the model demonstrates robust generative capabilities that outperform existing methods, highlighting its potential for digital hospital management and patient engagement tools.
The next application paper switches domains but also uses language models. Lin et al. present an interesting IDA application in the legal domain. To address the often overlooked interpretability of Similar Case Matching (SCM), the study proposes an integrated four-module pipeline that identifies, matches, and aligns crucial judicial feature sentences between legal cases while resolving conflicting data. This transparent approach not only provides clear evidence of case similarity but also establishes a new benchmark, achieving a final score of 76.91% and outperforming baseline models by 30%.
The following applied contribution comes from software engineering. Malhorta & Pandey demonstrate that the Bat Algorithm (BA), a search technique inspired by bat echolocation, can be successfully applied to automatic test suite generation, using mutants and mathematical constraints to ensure correctness. Experimental results show that BA significantly outperforms other established search-based algorithms: it substantially reduces test suite size compared to Genetic Algorithms and shows clear advantages over Particle Swarm Optimisation (PSO) and Artificial Bee Colony (ABC) methods.
The next group of applied papers addresses anomaly detection. First, Wang et al. tackle a limitation of traditional methods, which only detect anomalies over entire trajectories. They introduce a method for sub-trajectory anomaly detection that segments, merges, and groups trajectory data. Using cosine distance clustering, minimum bounding rectangles, and density clustering, the approach accurately identifies localised anomalous segments and outperforms existing methods in detection accuracy. The second paper in this group, by Kadiyala et al., addresses security and privacy challenges in rapidly growing IoT networks. The authors propose a decentralised intrusion detection system that integrates blockchain technology, homomorphic encryption, and deep learning models such as ConvLSTM and GRU. By combining edge preprocessing, SHA-256 hashing, and L-Diversity privacy protection, the framework ensures data integrity and anonymity, achieving high anomaly-detection accuracy while optimising energy efficiency and system stability. Also in the domain of cybersecurity, Kareem et al. introduce a robust anomaly-based intrusion detection system for network and sensor platforms that combines Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) networks with fuzzy weighting algorithms and attention mechanisms. Optimised using Chaos Game Optimisation (CGO) and the Gradient-Based Optimiser (GBO), the system achieves high accuracy on the BoT-IoT dataset while maintaining the low resource consumption required for real-time threat mitigation in dynamic networks.
The next applied paper concerns the estimation of human energy expenditure. Xiao et al. address gaps in estimating energy expenditure (EE) across age groups by introducing a deep learning model that accounts for differences between elderly and young populations. Using a multi-branch network with attention-enhanced spatial and temporal streams, the method efficiently captures complex dependencies in sensor data and achieves state-of-the-art accuracy with strategically placed wearable devices.
We conclude the issue with a computer vision application by Akar et al., who evaluate real-time personal protective equipment (PPE) detection by comparing baseline YOLOv13 models with optimised YOLOv8 variants on two custom datasets. Optuna-based hyperparameter tuning achieved absolute gains of up to 2.5 percentage points in mAP50. The analysis showed that these gains remain stable across training runs but depend strongly on dataset scale and variability.
With my best regards,
