Abstract
With the rapid development of artificial intelligence and sensor technology, action recognition is being applied increasingly widely in football. However, because football movements change rapidly, have complex dynamics, and vary widely in form, traditional recognition methods face significant challenges in real-time performance and accuracy. Against this background, a football action recognition model combining networked wearable sensors with an improved support vector machine classification algorithm is proposed. First, attitude data from accelerometers and gyroscopes are fused to compute real-time dynamic features such as the pitch, roll, and yaw angles, and principal component analysis is applied for dimensionality reduction. Next, the support vector machine is optimized with a Gaussian kernel function and a dynamic weighting strategy to improve classification accuracy and stability. Finally, a football action recognition model is constructed by combining the wearable sensors with the improved support vector machine classification algorithm. The experimental results show that the improved support vector machine algorithm achieves an action recognition accuracy of 93.8%, a recall of 92.5%, an F1 score of 0.92, and an inference time of 10.2 ms, significantly outperforming the comparison algorithms. In practical applications, the model achieves an accuracy above 90% in recognizing four types of actions (standing, running, passing, and shooting), with an average recognition time as low as 9.4 ms. The study provides an efficient solution for intelligent football action recognition and lays a foundation for the practical application of multi-modal data fusion.
Introduction
With the deep integration of artificial intelligence (AI) and sports science, action recognition based on computer vision and multi-sensor data has become a focus of research in motion analysis.1,2 Football action recognition, an important branch of this field, extracts athletes’ movement characteristics to classify and recognize common actions such as shooting, passing, running, dribbling, tackling, and heading. These complex dynamic behaviors often involve rapid changes in posture and direction as well as collisions between players, which place high demands on real-time sensing and classification accuracy. The technology has important applications in intelligent referee assistance, tactical analysis, and athlete training evaluation. However, existing football action recognition methods still suffer from insufficient multi-modal data fusion, inadequate dynamic feature extraction, and difficulty in balancing real-time performance with accuracy.3 Traditional action recognition models such as the Support Vector Machine (SVM) are typically applied in static or low-dimensional settings and rely on fixed feature inputs. In high-frequency dynamic scenarios such as football, a traditional SVM has difficulty adapting to temporal changes, which results in low robustness, sensitivity to noise, and limited generalization when handling time-series signals. Furthermore, most traditional SVM models cannot capture latent temporal dependencies across multi-sensor input streams, which makes them unsuitable for real-time, complex motion contexts.4 In recent years, significant progress has been made in multi-modal action recognition. State-of-the-art approaches integrate sensor fusion, pose estimation, and attention mechanisms to capture spatiotemporal relationships more effectively.
For example, hybrid deep learning frameworks that combine inertial sensor data with convolutional architectures have achieved high accuracy under controlled conditions. Similarly, Convolutional Neural Networks (CNNs) can extract spatial features, while Recurrent Neural Networks (RNNs) capture dynamic features through time-series modeling.5 However, these methods often require large-scale training data, incur high computational costs, and adapt poorly to edge devices; they also suffer from insufficient feature representation and high training complexity when handling high-dimensional, multi-source data. To address these problems, multi-modal fusion has been introduced into football action recognition. For example, fusing video data with wearable sensor data can improve the stability and accuracy of recognition in dynamic scenes.6 In addition, combining pose estimation algorithms with machine learning models provides a new approach to recognizing complex actions. Even so, there remains a need for lightweight, interpretable models that improve dynamic adaptability and robustness without relying on deep neural networks.
Compared with vision-based pose estimation methods such as OpenPose, which extract joint coordinates from RGB video frames, wearable sensors offer distinct advantages in real-time performance and robustness to occlusion. Although OpenPose and similar frameworks can provide detailed skeletal information, they are highly sensitive to lighting conditions, camera angles, and occlusion by other players or objects. Moreover, video-based methods often require substantial computing power and GPU acceleration, which makes them difficult to deploy on edge devices during real-time training or competition. In contrast, wearable inertial sensors such as accelerometers and gyroscopes directly capture body movement signals with high temporal resolution, regardless of visual conditions. These sensors are lightweight, cost-effective, and unaffected by visual occlusion, allowing continuous and stable action tracking even during intense motion or in crowded scenes. Integrating wearable sensors into football action recognition systems therefore provides a reliable and scalable way to achieve high-accuracy, low-latency motion analysis in real-world scenarios. Based on this background, an improved SVM classification algorithm combining networked wearable sensors and attitude angle calculation is proposed to achieve efficient and accurate football action recognition. The novelty of this study lies in two aspects: it improves the dynamic adaptability of the traditional SVM classification algorithm by integrating multi-source sensor data and temporal features, and it uses Principal Component Analysis (PCA) for dimensionality reduction to improve the accuracy and computational efficiency of the feature representation, providing a new solution for football action recognition.
Related work
Football action recognition combines sensors, machine vision, and deep learning to automatically recognize and classify different actions by analyzing the movement characteristics of athletes during football matches or training. Many researchers have studied this field. X. Zhao proposed a shooting action recognition method based on Bayesian classification to address the poor precision and poor real-time performance of existing methods for recognizing football players’ shooting actions. The method extracted shooting features with a Gaussian mixture model, estimated the Gaussian parameters to obtain the optimal state sequence, and used Bayesian classification to recognize the action. The experimental results showed that the method recognized shooting actions accurately and had good real-time performance.7 M. M. Hassan et al. proposed a football event recognition strategy based on object detection and robot perception for handball incidents in football matches. The method first detected the two main objects, the foot and the ball, with a trained model, and then used a gaze recognition model to determine whether the handball was intentional. The experimental results showed that the model achieved high gaze recognition accuracy across different football movements.8 M. Amsaprabhaa proposed a hybrid-optimized multi-modal spatiotemporal feature fusion model based on skeleton information. The model extracted features with multi-channel one-dimensional and two-dimensional convolutional neural networks, fused them by concatenation, and fed them into a dual gated recurrent unit model for temporal feature extraction. Finally, the bald eagle search optimizer was used to optimize the network weights. Experiments on three football video datasets yielded accuracies of 0.9813, 0.9506, and 0.9733, respectively.9
In college football, coaches can use data analysis to assess the condition of their own players and opponents and develop tactics accordingly. However, most such data are currently recorded and tallied manually, either on-site or after the game, which inevitably leads to omissions and other errors. To address this issue, C. Yang et al. proposed a soccer motion recognition approach based on spatiotemporal graph convolution, which combined machine vision and action recognition techniques to extract the joint movements of soccer players through pose estimation and obtain motion recognition results. The recognition accuracy reached 98%, nearly 5% higher than that of existing methods.10
D. Weber et al. proposed a neural network-based inertial attitude estimation method that requires no application-specific parameter tuning and performs attitude estimation efficiently in real time across different motion dynamics, environments, and sampling rates. The results indicated that the method generalized better than existing methods under various motion conditions; even when the existing methods were tuned individually on each test dataset, they still did not perform as well.11 M. Shalaby et al. proposed a three-dimensional relative position estimation method based on range measurements to solve the localization problem caused by insufficient observability in multi-agent systems. Using ultra-wideband radio and attitude angle algorithms to obtain low-cost inter-agent distance measurements, combined with accelerometers, gyroscopes, and magnetometers, the authors ultimately derived a sufficient condition for observability. The simulation results indicated that the method achieved high positioning accuracy.12 Traditional football training motion recognition methods suffer from insufficient data collection and poor accuracy in motion capture and recognition. To address this issue, S. Wang designed an SVM-based training action recognition method and constructed a corresponding machine learning model, which achieved a recognition accuracy of 90% in experiments.13 S. U. Yunas and K. B. Ozanyan proposed a deep learning-based multi-modal fusion method to improve gait classification accuracy using lower-limb wearable sensors. The method combined data from ground reaction force sensors and mobile inertial sensors to automatically extract features of gait activity. By fusing the spatiotemporal information from the two sensor types through deep learning networks, it mitigated the loss of spatiotemporal accuracy inherent to each individual modality and notably improved classification performance, and the experimental results validated the effectiveness of the model.14
In summary, existing football action recognition methods have made progress in specific areas, for example achieving efficient recognition of shooting, handball events, and complex actions through Bayesian classification, Gaussian mixture models, deep learning, and multi-modal fusion. However, limitations such as low recognition accuracy and long recognition time remain in complex scenarios. To address these issues, an improved SVM algorithm based on networked wearable sensors and attitude angle calculation is proposed. It combines multi-modal sensor data with dynamic feature extraction to construct an efficient, real-time football action recognition model, providing a new solution for football action recognition tasks.
Football action recognition based on networked wearable sensors and improved SVM
To address insufficient multi-modal data fusion and low action classification accuracy in existing football action recognition, this study proposes an improved SVM algorithm that combines wearable sensors and attitude angle calculation, developed in two parts: attitude angle calculation and feature classification. First, the SVM algorithm is improved by combining attitude angle calculation with PCA. Second, the final football action recognition model is constructed by combining networked wearable sensor technology with the improved SVM algorithm.
Improved SVM algorithm design combined with attitude angle calculation
With the popularization of smart wearable devices and the development of sports science, football action recognition has become an important research field in sports analysis and training. Traditional football analysis relies on professional coaches and manual observation, which is subjective and inefficient. In recent years, wearable sensor technology and AI algorithms have made it possible to monitor and analyze athletes’ movements in real time and with high precision, greatly advancing sports science, training, and competition analysis. Wearable sensors can accurately capture changes in athletes’ body posture, movement speed, joint angles, and other data, providing detailed information on their sports behavior. After appropriate processing and classification, these data can give coaches and athletes objective feedback that helps improve performance and reduce injury risk. Football action recognition refers to identifying and classifying different actions by analyzing athletes’ posture changes during matches or training. At present, it mainly relies on data collected by wearable sensors and infers specific action types by computing the angle changes of various parts of the athlete’s body.15,16 Common football postures and movements are shown in Figure 1.
Action diagram of football posture.
Figure 1 shows common ball-contact points and action types in football, including touches with the inside (medial side) of the foot, the outside (lateral side) of the foot, and the instep (dorsum), as well as action types such as inward and outward pitch and the lateral step. These postures reflect the common forms of movement of football players during training and competition. Attitude angle calculation extracts changes in an object’s orientation from sensor data: accelerometers and gyroscopes measure the object’s acceleration and angular velocity, and a kinematic model is then used to calculate the relative angles of each joint or body part.17–19 The rotations that define the attitude angles are shown in Figure 2.
The attitude angle rotation process diagram.
Figure 2 shows the three angles commonly used in attitude angle calculation, namely the pitch, roll, and yaw angles, which together describe the rotational state of an object in three-dimensional space. In football action recognition, attitude angle calculation combines the acceleration and angular velocity data collected by the sensors to compute changes in these angles and reconstruct the object’s motion. The initial attitude angles are derived from the static gravity-projection principle of the triaxial accelerometer. Assuming that the system is quasi-static, the only acceleration detected by the sensor is gravity, so the measured components along the three axes can be used to estimate the device’s tilt relative to the gravity vector. The initial attitude angle formula is given in equation (1).20–22
Flow chart of PCA.
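Equation (1) is not reproduced in this section, so the following is only a sketch of the widely used quasi-static tilt formulas that the text describes, plus a complementary filter as one common way (assumed here, not confirmed by the source) of fusing gyroscope rates with those accelerometer angles. The axis convention (z up when the device lies flat), the blending factor `alpha`, and both function names are illustrative assumptions. Gravity alone cannot determine yaw, which is why a magnetometer or integrated gyroscope rate is needed for the third angle.

```python
import math


def tilt_from_accel(ax: float, ay: float, az: float) -> tuple[float, float]:
    """Roll and pitch (radians) from a quasi-static accelerometer sample.

    Assumes gravity is the only sensed acceleration and a right-handed body
    frame whose z axis points up when the device lies flat.
    """
    # Roll: rotation about x, from the y-z projection of gravity.
    roll = math.atan2(ay, az)
    # Pitch: rotation about y, the x component against the y-z magnitude.
    pitch = math.atan2(-ax, math.hypot(ay, az))
    return roll, pitch


def complementary_filter(angle_prev: float, gyro_rate: float,
                         accel_angle: float, dt: float,
                         alpha: float = 0.98) -> float:
    """One fusion step for a single angle (radians).

    The integrated gyroscope rate tracks fast changes, while the
    accelerometer angle slowly corrects the drift that integration accumulates.
    """
    return alpha * (angle_prev + gyro_rate * dt) + (1.0 - alpha) * accel_angle


if __name__ == "__main__":
    # Device pitched 30 degrees: gravity split between -x and z.
    g = 9.81
    roll, pitch = tilt_from_accel(-g * 0.5, 0.0, g * math.sqrt(3) / 2)
    print(round(math.degrees(pitch), 1))  # 30.0
```

A larger `alpha` trusts the gyroscope more between accelerometer corrections; the best value depends on sensor noise and sampling rate, which the section does not report.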
The PCA dimensionality reduction process in Figure 3 consists of five steps: data standardization, covariance matrix calculation, eigendecomposition, principal component selection, and feature mapping. First, the feature matrix is standardized to eliminate the influence of differing units and magnitudes. Second, the covariance matrix is calculated to measure the correlation between features. Third, eigenvalue decomposition of the covariance matrix yields the principal component directions and their relative importance. Next, the principal components whose cumulative variance contribution ratio reaches a threshold are retained to complete feature selection. Finally, the data are projected onto the principal component directions to generate the reduced feature matrix. The dimensionality reduction process is expressed mathematically in equation (6).25
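The five PCA steps can be sketched with NumPy as follows. This is an illustrative sketch rather than the paper’s equation (6): the function name `pca_reduce` and the default 95% cumulative-variance threshold are assumptions, since the section does not state the threshold used.

```python
import numpy as np


def pca_reduce(X: np.ndarray, threshold: float = 0.95) -> tuple[np.ndarray, np.ndarray]:
    """Project the rows of X (samples x features) onto the leading principal components.

    Returns the reduced matrix and the cumulative explained-variance ratios
    of the retained components.
    """
    # 1. Standardize each feature to zero mean and unit variance.
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # leave constant features unscaled instead of dividing by zero
    Z = (X - mean) / std
    # 2. Covariance matrix of the standardized features (features as columns).
    cov = np.cov(Z, rowvar=False)
    # 3. Eigendecomposition; eigh suits symmetric matrices but returns
    #    eigenvalues in ascending order, so re-sort them in descending order.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # 4. Keep the fewest components whose cumulative variance ratio reaches the threshold.
    ratios = np.cumsum(eigvals) / eigvals.sum()
    k = min(int(np.searchsorted(ratios, threshold)) + 1, len(eigvals))
    # 5. Map the standardized data onto the retained component directions.
    return Z @ eigvecs[:, :k], ratios[:k]
```

In practice, scikit-learn’s `PCA` with a fractional `n_components` (for example 0.95) performs the same variance-based selection, provided the features are standardized first, for instance with `StandardScaler`.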
Schematic diagram of SVM optimal hyperplane.
In Figure 4, the SVM seeks an optimal hyperplane that separates samples of different categories. Maximizing the margin between the hyperplane and the support vectors improves the robustness and accuracy of the classification model. The mathematical model for football action classification using the SVM is given in equation (7).
Flow chart of the AAF-SVOA.
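Equation (7) is not reproduced here. For orientation, the sketch below evaluates the standard dual-form decision function of a binary soft-margin SVM with a Gaussian (RBF) kernel, f(x) = sign(Σᵢ αᵢ yᵢ K(xᵢ, x) + b) with K(x, x′) = exp(−γ‖x − x′‖²). The support vectors, coefficients αᵢ, γ, and b are assumed to come from a training solver that is not shown, and the function names are hypothetical.

```python
import math
from collections.abc import Sequence


def rbf_kernel(x: Sequence[float], z: Sequence[float], gamma: float) -> float:
    """Gaussian kernel exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-gamma * sq_dist)


def svm_decision(x, support_vectors, alphas, labels, bias, gamma) -> float:
    """Raw score sum_i alpha_i * y_i * K(x_i, x) + b, with labels y_i in {+1, -1}."""
    return sum(
        a * y * rbf_kernel(sv, x, gamma)
        for sv, a, y in zip(support_vectors, alphas, labels)
    ) + bias


def svm_predict(x, support_vectors, alphas, labels, bias, gamma) -> int:
    """Predicted class: the sign of the decision score (ties go to +1)."""
    score = svm_decision(x, support_vectors, alphas, labels, bias, gamma)
    return 1 if score >= 0 else -1
```

For the multi-class football actions, several such binary classifiers would be combined, for example one-vs-rest or one-vs-one; the section does not state which scheme is used.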
In Figure 5, AAF-SVOA first fuses accelerometer and gyroscope data to calculate the attitude angles of football movements in real time, extracts key features such as the pitch, roll, and yaw angles, and constructs a feature matrix that describes the dynamic changes of the movements. Next, PCA is used to reduce the dimensionality of the feature matrix, removing redundant information and retaining the main features. An SVM classification model is then constructed on the reduced feature data, and optimization strategies such as the Gaussian kernel and category weights are introduced to improve classification accuracy and robustness. Finally, dynamic confidence evaluation is used to output the classification result for each football action, completing the recognition task.
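The text does not specify how the category weights are computed. One common scheme, assumed here for illustration, is inverse-frequency (“balanced”) weighting, w_c = n / (K · n_c), where n is the number of training samples, K the number of classes, and n_c the number of samples in class c. Each class then receives its own soft-margin penalty C_c = C · w_c, so that errors on rare actions cost more. Both function names are hypothetical.

```python
from collections import Counter
from collections.abc import Hashable, Sequence


def balanced_class_weights(labels: Sequence[Hashable]) -> dict:
    """Inverse-frequency weights w_c = n / (K * n_c) for each class c."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * n_c) for c, n_c in counts.items()}


def per_class_penalty(C: float, labels: Sequence[Hashable]) -> dict:
    """Per-class soft-margin penalties C_c = C * w_c for a weighted SVM."""
    return {c: C * w for c, w in balanced_class_weights(labels).items()}


if __name__ == "__main__":
    # A toy imbalanced label set: shooting is rarer than running.
    labels = ["run"] * 6 + ["shoot"] * 2
    print(per_class_penalty(1.0, labels))
```

This is the same heuristic that scikit-learn’s `SVC` applies with `class_weight="balanced"`, where each class’s penalty becomes `C` multiplied by its weight.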
Construction of a football action recognition model integrating wearable sensors with improved SVM algorithm
Traditional wearable sensor devices rely on a single sensor node to collect data, which limits the data collection range and transmission distance and makes real-time fusion of multi-sensor data difficult, so they struggle to meet the needs of complex dynamic action recognition. This is especially true in football action recognition, where athletes’ movements change rapidly and have high-frequency dynamic characteristics. Traditional sensor devices perform poorly in sampling rate, transmission rate, and data synchronization, which can easily reduce action recognition accuracy.28
In addition, most existing sensor devices capture data in only a single dimension, such as acceleration or angular velocity. When multi-dimensional motion information is fused, recognition is often degraded by inconsistent data or processing delays. Current sensor technology is mainly applied to football recognition in training and competition, where wearable devices capture athletes’ dynamic information, analyze motion features, and provide motion feedback. However, these devices typically rely on offline computing, have poor real-time performance, and offer limited connectivity and data sharing when used by multiple athletes simultaneously.29,30 In response, combining Internet of Things (IoT) technology with wearable sensors provides a new solution for soccer motion recognition. Networked sensor devices not only improve the efficiency of data transmission but also enable real-time collection and fusion of multi-dimensional data, supporting more comprehensive and accurate data-driven decisions. The structure of the networked wearable sensor device combined with the IoT is shown in Figure 6.
Structure of the network wearable sensor device.
As shown in Figure 6, the IoT-enabled wearable sensor device consists of several cooperating modules that together form a complete intelligent football data acquisition and sensing system. Wireless wearable devices are the core data acquisition unit; they contain multi-modal sensors that integrate accelerometers, gyroscopes, and magnetometers to capture athletes’ three-dimensional dynamic information. In this module, each device sends real-time data to the gateway through a low-power wireless communication module such as Wi-Fi or Bluetooth. The data transmission equipment consists of edge computing devices or data gateways, which collect data from multiple sensor nodes and perform preliminary processing such as denoising, synchronization, and compression. The processed data are then transmitted to the cloud platform over wireless networks to ensure a real-time, stable data flow.31 The cloud platform is the core computing and storage component of the system and is mainly responsible for high-performance computing tasks; it also stores historical data for long-term action analysis and pattern mining. User interaction devices provide intuitive interfaces for displaying action classification results and real-time feedback, allowing users to view the recognized action types and their accuracy in real time through mobile applications, tablets, or computer interfaces.
Through the collaboration of these network devices and sensors, IoT and wearable sensors can be deeply integrated to form a closed-loop system for collecting, transmitting, processing, and displaying football motion data. However, data collection and transmission modules alone cannot complete action recognition.32 The collected multi-modal data must also be combined with the optimized classification algorithm of the AAF-SVOA model to build an efficient, intelligent football action recognition platform. The final framework of the football action recognition model is shown in Figure 7.
Flowchart of football action recognition.
The overall process of the football action recognition and classification system is shown in Figure 7. The platform covers the complete process from data collection to action classification through five modules. First, the data acquisition module collects raw data and action labels from the wireless wearable devices; the sensors capture athletes’ dynamic information in real time, including acceleration, angular velocity, and magnetic field data, which form the raw input. Second, the data preprocessing module denoises, synchronizes, and normalizes the data to ensure data quality. Key features such as the pitch, roll, and yaw angles are then extracted with the attitude angle calculation model, and PCA further extracts the main features to provide compact input for subsequent classification. Next, the data partitioning module divides the preprocessed data into a training set and a test set; the training set is used to build the model, and the test set is used to evaluate its generalization performance. The model operation module then processes the input data with the AAF-SVOA classification model, which runs the SVM algorithm together with the improved feature weights and the dynamic confidence evaluation model to improve action classification accuracy. Finally, the classifier module outputs the classification results and stores the action labels together with the corresponding data. The recognition results are also displayed in real time through an interactive interface, giving users intuitive action classification feedback.
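The preprocessing steps named above (denoising, synchronization, normalization) are not specified further. The sketch below shows one simple, assumed choice for each: a centered moving average for denoising, linear interpolation of each sensor stream onto a shared timeline for synchronization, and z-score normalization. The window size and all function names are illustrative assumptions.

```python
from bisect import bisect_left


def moving_average(signal: list[float], window: int = 5) -> list[float]:
    """Centered moving average for denoising; the window shrinks at the edges."""
    half = window // 2
    out = []
    for i in range(len(signal)):
        lo, hi = max(0, i - half), min(len(signal), i + half + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out


def resample(timestamps: list[float], values: list[float],
             target_times: list[float]) -> list[float]:
    """Linearly interpolate one sensor stream onto a shared timeline.

    Assumes strictly increasing timestamps; targets outside the recorded
    range are clamped to the first or last sample.
    """
    out = []
    for t in target_times:
        j = bisect_left(timestamps, t)
        if j == 0:
            out.append(values[0])
        elif j == len(timestamps):
            out.append(values[-1])
        else:
            t0, t1 = timestamps[j - 1], timestamps[j]
            v0, v1 = values[j - 1], values[j]
            out.append(v0 + (v1 - v0) * (t - t0) / (t1 - t0))
    return out


def zscore(signal: list[float]) -> list[float]:
    """Normalize to zero mean and unit (population) standard deviation."""
    n = len(signal)
    mean = sum(signal) / n
    std = (sum((x - mean) ** 2 for x in signal) / n) ** 0.5
    return [(x - mean) / std if std > 0 else 0.0 for x in signal]
```

A centered window needs `window // 2` future samples, so a real-time system might prefer a trailing window at the cost of a small phase lag.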
In the data preprocessing module, the data from accelerometers, gyroscopes, and magnetometers are integrated through weighted averaging to ensure effective fusion of multi-modal sensor data. The data fusion formula is shown in equation (14).
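Equation (14) is not shown here. A minimal sketch of weighted-average fusion follows, under the assumption that each sensor’s data has first been converted into an estimate of the same quantity (for example, a yaw angle from the magnetometer and another from integrated gyroscope rates), since raw acceleration, angular rate, and magnetic field values cannot be averaged directly. The inverse-variance weighting is one common choice, not necessarily the paper’s, and the function names are hypothetical.

```python
from collections.abc import Sequence


def weighted_fusion(estimates: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted average of per-sensor estimates of one quantity.

    Weights need not sum to 1; they are normalized here.
    """
    if not estimates or len(estimates) != len(weights):
        raise ValueError("need one weight per estimate")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return sum(w * e for w, e in zip(weights, estimates)) / total


def inverse_variance_weights(variances: Sequence[float]) -> list[float]:
    """One common choice: w_i = 1 / sigma_i**2, so noisier sensors count less."""
    return [1.0 / v for v in variances]
```

When fusing angles this way, estimates near ±180° must be unwrapped first; a plain arithmetic mean does not handle wrap-around.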
Results
To verify the performance of AAF-SVOA, an experimental environment suited to football scenarios was first established, the collected football action data were preprocessed, and part of the data was used for model training. Next, the model’s action classification accuracy, stability, and real-time performance were comprehensively tested through comparative experiments, error analysis, and multi-metric evaluation. Finally, simulation tests in actual football game scenarios were conducted to verify the model’s applicability and effectiveness in real sports environments.
AAF-SVOA benchmark performance test
Experimental environment and equipment parameters table.
Table 1 presents the environment configuration and model parameters for this experiment. To evaluate the AAF-SVOA model, a standard SVM, a kernel SVM with nonlinear kernel functions (Kernel-SVM), and the EfficientNet-B3 neural network were selected for comparison. First, the mean average precision (mAP) of the four algorithms on the training and test sets was measured, as shown in Figure 8.
The mAP values of the four algorithms. (a) Training set. (b) Test set.
Figure 8(a) and (b) show the mAP values of SVM, Kernel-SVM, EfficientNet-B3, and AAF-SVOA on the training and test sets, respectively. On the training set (Figure 8(a)), the mAP of AAF-SVOA rose markedly as the number of iterations increased and reached 0.94 after 250 iterations, higher than Kernel-SVM (0.89), EfficientNet-B3 (0.82), and SVM (0.77). On the test set (Figure 8(b)), AAF-SVOA again performed best, with an mAP of 0.97, compared with 0.91, 0.80, and 0.78 for Kernel-SVM, EfficientNet-B3, and SVM, respectively. These results show that AAF-SVOA outperformed the compared algorithms in both accuracy and convergence speed, confirming its advantage in recognition performance. Next, the loss values of the four algorithms on the two datasets were measured, as shown in Figure 9.
Loss curves of SVM, Kernel-SVM, EfficientNet-B3, and AAF-SVOA models on training and testing datasets over iterative optimization. (a) Loss curves of the four algorithms on the training set. (b) Loss curves of the four algorithms on the test set.
To evaluate the loss trends, the loss values of SVM, Kernel-SVM, EfficientNet-B3, and AAF-SVOA on the training and test sets were compared as a function of the number of iterations. According to Figure 9(a), the loss of AAF-SVOA decreased fastest in the initial stage and stabilized at 0.04 after 40 iterations. In contrast, the stabilized losses of Kernel-SVM and EfficientNet-B3 were 0.10 and about 0.08, respectively, while SVM had the highest loss, stabilizing at 0.15. As shown in Figure 9(b), AAF-SVOA also converged best on the test set, with its loss falling to 0.02 after 52 iterations. The results in Figure 9 indicate that AAF-SVOA converges quickly and achieves the lowest loss on both the training and test sets, outperforming the compared algorithms. The recognition errors of the four algorithms were then compared, as shown in Figure 10.
Action recognition error fluctuation results for the four algorithms. (a) Training set. (b) Test set.
Benchmark performance of the different algorithms.
According to Table 2, AAF-SVOA had a clear advantage in classification performance, with the highest classification accuracy among all algorithms at 93.8%. Its recall and F1 score were 92.5% and 0.92, respectively, indicating good accuracy and balance across categories. EfficientNet-B3 performed slightly worse: although its accuracy and F1 score were high, at 91.5% and 0.89, respectively, its inference time was longer and its memory usage reached 450 MB, indicating a high demand for hardware resources. The traditional SVM and Kernel-SVM were limited in classification performance. Although SVM had the fastest inference time, only 5.6 ms/sample, both its accuracy and recall were below 85%. Kernel-SVM performed better, with accuracy and F1 score of 85.9% and 0.84, respectively, but its inference time increased substantially to 12.8 ms/sample. Overall, AAF-SVOA maintained high classification accuracy while balancing real-time performance and memory use, giving it the most comprehensive advantages.
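For reference, the accuracy, recall, and F1 figures reported in Table 2 can be computed from true and predicted labels as sketched below. The section does not say how per-class scores are averaged for the multi-action task; macro averaging (an unweighted mean over classes) is assumed here, and the function name is hypothetical.

```python
def classification_metrics(y_true: list, y_pred: list) -> dict:
    """Accuracy plus macro-averaged precision, recall, and F1 over all classes."""
    classes = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precisions, recalls, f1s = [], [], []
    for c in classes:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    k = len(classes)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / k,
        "recall": sum(recalls) / k,
        "f1": sum(f1s) / k,
    }
```

Under macro averaging, a rare action that is often misclassified lowers recall and F1 as much as a common one would, which is why these metrics can sit below overall accuracy.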
To evaluate the stability of the proposed model under dynamic motion changes in football scenarios, a temporal consistency analysis was conducted on continuous motion sequences such as dribbling, sudden changes of direction, sprint-then-stop transitions, and jump-heading combinations. AAF-SVOA was tested on sequences composed of multiple chained actions with varying dynamics, and a sliding-window, frame-wise prediction consistency check was used to detect output fluctuations. The results showed that AAF-SVOA maintained more than 92.4% label continuity across frames, with misclassification bursts shorter than three consecutive frames that were effectively corrected by confidence smoothing. In contrast, Kernel-SVM was less stable during high-motion transitions, especially in tackling and turning-back actions, with label fluctuation rates exceeding 11.6%. These results indicate that AAF-SVOA can track dynamic changes in motion patterns with high temporal stability, which can be attributed to the fusion of high-frequency inertial features with dynamic confidence evaluation. The system avoided the action jitter and transient mislabeling that often occur in edge-based classification systems without temporal integration.
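The section reports label continuity and label fluctuation rates without formal definitions. The sketch below uses one plausible reading, which is an assumption: continuity as the share of adjacent frames whose label is unchanged (so the fluctuation rate is one minus continuity), and confidence smoothing as a confidence-weighted majority vote in a sliding window. Both function names are hypothetical.

```python
from collections import defaultdict


def label_continuity(labels: list) -> float:
    """Share of adjacent frame pairs whose predicted label is unchanged.

    The label fluctuation rate is then 1 - label_continuity(labels).
    """
    if len(labels) < 2:
        return 1.0
    same = sum(a == b for a, b in zip(labels, labels[1:]))
    return same / (len(labels) - 1)


def confidence_smoothing(labels: list, confidences: list, window: int = 5) -> list:
    """Replace each frame's label with the confidence-weighted vote in a centered window."""
    half = window // 2
    smoothed = []
    for i in range(len(labels)):
        votes = defaultdict(float)
        for j in range(max(0, i - half), min(len(labels), i + half + 1)):
            votes[labels[j]] += confidences[j]
        # Highest summed confidence wins; ties go to the label seen first in the window.
        smoothed.append(max(votes, key=votes.get))
    return smoothed
```

With a five-frame window, an isolated low-confidence mislabel is voted away while a genuine transition between two sustained actions is preserved, which matches the behavior the text describes for bursts shorter than three frames.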
Analysis of the application effect of football action recognition model based on AAF-SVOA
Performance of the four models to identify different soccer movements.
According to Table 3, AAF-SVOA performed best in all action categories. Its recognition accuracies for passing, tackling, and dribbling reached 94.1%, 93.0%, and 92.8%, respectively, significantly higher than those of the other models. For complex actions such as heading and dribbling, the accuracy of the traditional SVM was relatively low, only 73.8% and 75.6%, respectively, whereas AAF-SVOA reached 91.3% and 92.8%, indicating that it is better suited to dynamic and complex scenarios. EfficientNet-B3 came close to AAF-SVOA in recognition accuracy, but its average recognition time generally exceeded 14 ms, which is unfavorable for real-time applications. AAF-SVOA combined high accuracy with fast response: its average recognition time stayed within 10 ms for all actions, for example 9.4 ms for standing, 9.9 ms for dribbling, and 9.6 ms for heading. The overall performance of Kernel-SVM lay between that of SVM and the deep learning model; its accuracy improved, but its inference time was relatively long. Overall, AAF-SVOA achieved the best balance between recognition accuracy and real-time performance, with clear advantages for high-frequency, highly complex football actions, which supports its practical value in real football training and match environments. The classification performance of the four models on four types of football action data (standing, running, passing, and shooting) was then compared, as shown in Figure 11.
Action classification effects of the four models. (a) SVM. (b) Kernel-SVM. (c) EfficientNet-B3. (d) AAF-SVOA.
Figure 11 shows the recognition distributions of SVM, Kernel-SVM, EfficientNet-B3, and AAF-SVOA for four soccer actions: standing, running, passing, and shooting. As shown in Figure 11(d), AAF-SVOA had the best classification performance among the four methods, with each category forming a more compact and clearly separated distribution. Kernel-SVM (Figure 11(b)) ranked below EfficientNet-B3 (Figure 11(c)), and some overlap between categories remained. The SVM model (Figure 11(a)) performed worst, with substantial mixing among the category distributions. Overall, AAF-SVOA showed higher accuracy and better class separation in the action classification task. The recognition of actual football movements is shown in Figure 12.
Actual recognition performance of the four models on different football movements. (a) SVM. (b) Kernel-SVM. (c) EfficientNet-B3. (d) AAF-SVOA.
Model performance under lighting variation and multi-target interference.
As shown in Table 4, compared with traditional SVM and Kernel-SVM, AAF-SVOA maintained high classification accuracy under strong lighting variations, dropping by only 3.2%, whereas EfficientNet-B3 dropped by 5.9% and SVM by more than 8%. Moreover, the error fluctuation of AAF-SVOA remained within ±0.03 even when multiple players performed overlapping actions. These findings suggest that integrating attitude-angle modeling with sensor fusion provides better resilience to visual and sensor noise. Although the model was not tested in large-scale multiplayer field matches, these preliminary results indicate that it can sustain robust performance in dynamic and visually complex environments. Future studies will further investigate the model's generalization under uncontrolled open-field conditions and introduce dynamic interference modeling as part of training augmentation.
To further verify the generalization ability of AAF-SVOA beyond the SoccerNet dataset, additional public datasets were introduced, including the Football Action Dataset (FAD) and the Stanford-IMU SoccerSet. FAD contains video and inertial sensor data from semi-professional training environments, while SoccerSet includes multi-player IMU recordings collected during various tactical drills. The same model configuration was applied without retraining, and performance was evaluated directly to assess transferability.
Model accuracy on multiple datasets.
In practical deployment, football recognition systems are expected to run on embedded or edge computing platforms such as the NVIDIA Jetson Nano, Raspberry Pi 4, or ARM Cortex-based microcontrollers. To evaluate the feasibility of deploying AAF-SVOA in such constrained environments, a resource consumption test was conducted on a Jetson Nano (4 GB RAM). During real-time operation, average memory usage was 178 MB, CPU utilization was about 38%, and power consumption stayed below 7.5 W. These results indicate that the model maintained efficient recognition within acceptable resource limits, outperforming EfficientNet-B3, which required more than 430 MB and triggered thermal throttling during continuous inference. In addition, a 24-h stability test with continuous multi-class recognition was performed. AAF-SVOA produced consistent output with no memory leaks or cumulative delay, confirming its stability under sustained use. Nevertheless, for long-term outdoor or mobile deployment, factors such as sensor drift, battery degradation, and network instability may still affect performance. Future research will explore lightweight pruning strategies and hardware-specific optimization (e.g., TensorRT and quantization) to further reduce computational overhead. Incorporating self-diagnostic modules and adaptive recalibration mechanisms is also expected to improve the model's reliability and robustness over extended periods.
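The latency and leak checks described above can be approximated on any platform with a small timing harness. The following is a stdlib-only Python sketch, not the study's test setup. `latency_profile` and `heap_growth_bytes` are hypothetical names, and the early-versus-late drift ratio is one assumed way to detect cumulative delay. Note that `tracemalloc` only sees memory allocated through Python's allocator, so process-level memory, native or GPU buffers, and power draw on a Jetson board need OS or vendor tools (for example, NVIDIA's `tegrastats`).

```python
import statistics
import time
import tracemalloc
from typing import Callable, Sequence

Sample = Sequence[float]


def latency_profile(infer: Callable[[Sample], object],
                    samples: Sequence[Sample], warmup: int = 10) -> dict:
    """Per-sample latency statistics plus an early-vs-late drift ratio."""
    if len(samples) < 8:
        raise ValueError("need at least 8 samples")
    for s in samples[:warmup]:
        infer(s)                       # warm caches; these calls are not timed
    lat_ms = []
    for s in samples:
        t0 = time.perf_counter_ns()
        infer(s)
        lat_ms.append((time.perf_counter_ns() - t0) / 1e6)
    q = len(lat_ms) // 4               # compare the first and last quarter of the run
    early, late = statistics.mean(lat_ms[:q]), statistics.mean(lat_ms[-q:])
    return {
        "mean_ms": statistics.mean(lat_ms),
        "p95_ms": statistics.quantiles(lat_ms, n=20)[-1],  # 95th-percentile cut point
        "drift_ratio": late / early if early > 0 else float("inf"),
    }


def heap_growth_bytes(infer: Callable[[Sample], object], sample: Sample,
                      calls: int = 1000) -> int:
    """Python-heap growth over repeated calls; a large positive value hints at a leak."""
    tracemalloc.start()
    try:
        for _ in range(max(1, calls // 10)):
            infer(sample)              # let lazily created objects settle first
        before, _peak = tracemalloc.get_traced_memory()
        for _ in range(calls):
            infer(sample)
        after, _peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return after - before


if __name__ == "__main__":
    def dummy_infer(x: Sample) -> float:  # stand-in for a real classifier call
        return sum(v * v for v in x)

    data = [[0.1 * i, 0.2, 0.3] for i in range(200)]
    print(latency_profile(dummy_infer, data))
    print(heap_growth_bytes(dummy_infer, data[0]))
```

A drift ratio that stays near 1.0 and a heap growth near zero over long runs correspond to the "no cumulative delay" and "no memory leaks" criteria; the thresholds for flagging a problem are left to the deployment.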
Conclusion
To address insufficient accuracy, slow recognition speed, and limited multi-modal data fusion in football action recognition, a recognition model combining network wearable sensors and AAF-SVOA was proposed. In the benchmark tests, AAF-SVOA achieved a high mPA: at 250 iterations its mPA reached 0.97, well above SVM (0.78), Kernel-SVM (0.80), and EfficientNet-B3 (0.91). On the training set, its loss curve stabilized after only 40 iterations at a loss value of 0.04. AAF-SVOA also had the smallest recognition error fluctuation, only ±0.02. Its classification accuracy, recall, and F1 value reached 93.8%, 92.5%, and 0.92, respectively, with a single-sample inference time as low as 10.2 ms. In practical applications, the model combining network wearable sensors and AAF-SVOA achieved recognition accuracies of 93.2%, 92.5%, 94.1%, and 91.7% for standing, running, passing, and shooting, respectively, and the lowest average recognition time, 9.4 ms, was obtained for standing actions. AAF-SVOA also achieved better results than the compared models in the actual classification and recognition of these four action types. Despite these results, certain limitations remain in applying the proposed model to real-world football competition. In high-intensity matches, sweat, physical collisions, or sensor displacement may affect the stability and accuracy of wearable devices. Real-time data transmission also faces network latency and packet loss, especially in outdoor stadiums with heavy interference or limited connectivity.
Moreover, when deployed in multi-player and multi-target scenarios, the current system may encounter synchronization difficulties and increased computational load, which could compromise real-time responsiveness. Therefore, future work will focus on improving the physical robustness and wearability of sensor devices, enhancing the fault-tolerant design of the data transmission module, and introducing lightweight edge computing models to support faster on-device inference. In addition, incorporating adaptive learning strategies to fine-tune recognition performance based on individual player patterns may further enhance personalization and generalization of the recognition framework.
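Fixed-point quantization, listed earlier as a hardware-specific optimization, is one common route to the lightweight on-device models mentioned above. The sketch below shows generic symmetric per-tensor int8 quantization with integer multiply-accumulate. It is an illustration under assumptions, not the AAF-SVOA pipeline, and the function names are hypothetical. For a Gaussian-kernel SVM, the same idea could be applied to stored support vectors and coefficients, but the exponential in the kernel would still need floating-point evaluation or an approximation.

```python
from typing import List, Sequence, Tuple


def quantize_symmetric(values: Sequence[float], bits: int = 8) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantization: v is approximated by q * scale."""
    qmax = 2 ** (bits - 1) - 1            # 127 for int8
    max_abs = max((abs(v) for v in values), default=0.0)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp into [-qmax, qmax].
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale


def dequantize(q: Sequence[int], scale: float) -> List[float]:
    """Map integers back to approximate real values."""
    return [v * scale for v in q]


def int_dot(qa: Sequence[int], sa: float, qb: Sequence[int], sb: float) -> float:
    """Dot product with integer multiply-accumulate and one rescale at the end."""
    acc = sum(a * b for a, b in zip(qa, qb))  # integer products summed as integers
    return acc * sa * sb


if __name__ == "__main__":
    w = [0.5, -1.27, 0.02, 0.9]           # hypothetical weights
    x = [1.0, 0.3, -2.0, 0.4]             # hypothetical feature vector
    qw, sw = quantize_symmetric(w)
    qx, sx = quantize_symmetric(x)
    exact = sum(a * b for a, b in zip(w, x))
    print(qw, round(sw, 5), exact, int_dot(qw, sw, qx, sx))
```

The integer accumulator is what makes this attractive on microcontrollers without a floating-point unit: only the final rescale involves the combined scale, and that step can itself be approximated with a fixed-point multiplier and shift.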
Although AAF-SVOA demonstrated excellent recognition performance and real-time responsiveness, its algorithmic complexity and time cost in resource-constrained settings require further evaluation. The current model includes multi-stage feature construction, PCA dimensionality reduction, and dynamic SVM optimization. It is suitable for mid-level edge devices, but its inference latency on ultra-low-power microcontrollers remains a challenge. The number of floating-point operations (FLOPs) per inference was estimated at 8.7 × 10⁶ on average, and the compiled model size was 10.2 MB. For large-scale deployment or integration into wearable microchips, further lightweight strategies such as feature compression, kernel simplification, and fixed-point quantization should be investigated. Exploring online adaptive learning that adjusts to different users' motion styles without full retraining could also substantially reduce retraining overhead.
Beyond computational considerations, collecting and processing personal motion data, particularly from athletes and amateur users, raises substantial privacy and security concerns. These include possible exposure of identity-linked patterns, inference of biometric information, and unauthorized behavioral profiling. To address these risks, it is crucial to enforce end-to-end encryption during data transmission (e.g., TLS/DTLS), adopt secure storage and training strategies (e.g., encrypted cloud storage, or on-device federated learning that keeps raw data local), and anonymize identifiable attributes before any third-party processing. User-informed consent protocols and compliance with regulations and standards such as GDPR and ISO/IEC 27001 should also be embedded in the system design. Future work will also explore decentralized privacy-preserving computing frameworks, including homomorphic encryption and differential privacy, to strengthen trustworthiness and compliance in real-world deployments.
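As an illustration of anonymizing identifiable attributes before third-party processing, the sketch below drops direct identifiers and replaces a player ID with a keyed HMAC-SHA256 pseudonym, using only the Python standard library. The record schema (`player_id`, `player_name`, `team`) and the function name are hypothetical. A keyed hash is used instead of a plain hash so that a third party without the key cannot recompute pseudonyms by hashing guessed IDs, while the same player still maps to the same token across sessions. This is pseudonymization rather than full anonymization: under GDPR, pseudonymized data is still personal data, and the motion signals themselves may remain identifying.

```python
import hashlib
import hmac
from typing import Dict

# Hypothetical schema: direct identifiers that are not needed downstream.
DROP_FIELDS = {"player_name", "team"}
# Hypothetical field replaced by a stable pseudonym so per-player analysis still works.
ID_FIELD = "player_id"


def pseudonymize(record: Dict[str, object], key: bytes) -> Dict[str, object]:
    """Return a copy of `record` with direct identifiers removed or pseudonymized."""
    out: Dict[str, object] = {}
    for field, value in record.items():
        if field in DROP_FIELDS:
            continue
        if field == ID_FIELD:
            # Keyed hash: without `key`, the mapping cannot be rebuilt by hashing
            # guessed IDs, but equal IDs still produce equal tokens.
            out["subject"] = hmac.new(key, str(value).encode("utf-8"),
                                      hashlib.sha256).hexdigest()
        else:
            out[field] = value
    return out


if __name__ == "__main__":
    key = b"replace-with-a-secret-key-from-a-key-store"  # placeholder only
    sample = {"player_id": 7, "player_name": "A. Player", "team": "U19",
              "pitch": 12.5, "roll": -3.1, "yaw": 88.0}
    print(pseudonymize(sample, key))
```

The key should be stored and rotated separately from the pseudonymized data (for example in a key management service); anyone holding both can re-link tokens to players.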
