Autonomous real-time drone-to-drone visual detection by onboard hardware platform

Abstract

The proliferation of unmanned aerial vehicles (UAVs) in both civilian and military domains has intensified the need for autonomous counter-drone systems capable of operating without reliance on ground infrastructure. Existing ground-based and hybrid approaches suffer from high latency and fail completely under communication jamming or denial. This paper proposes a fully onboard (OB) architecture for autonomous drone-to-drone detection in both the visible (RGB) and thermal (IR) domains, where all perception and decision-making tasks are executed exclusively using the embedded computational resources of the UAV. In particular, this work analyzes the features of the physical components of the architecture (i.e., the Single-Board Computers, or SBCs, the available sensors and the UAV platforms) and their performance in the experimental settings. Various computational platforms are tested to assess their impact on the detection pipeline, evaluating specific parameters such as inference speed (fps), inference time (ms), power consumption (W) and operational autonomy. To enable a comprehensive evaluation, a ground-based (GB) counterpart was also implemented, in which real-time video streams are transmitted from the drone to a ground station for processing and control commands are subsequently sent back. The onboard architecture offers significantly lower latency and complete independence from radio links and controllers, making it particularly suitable for applications requiring high robustness in communication-denied or contested environments. The findings highlight the advantages of the Jetson Orin Nano platform, which achieves inference speeds of up to 80.93 fps (12.36 ms per frame) with quantized YOLOv8n models, surpassing state-of-the-art performance. To the best of our knowledge, this is the first fully onboard RGB-IR drone-to-drone visual detection architecture in the literature.

Keywords

onboard architecture; drone-to-drone detection; unmanned aerial vehicles (UAVs) detection; ground control station; real-time; edge-computing; autonomous flight

1. Introduction

The rapid proliferation of Unmanned Aerial Vehicles (UAVs) in both civilian and military sectors has significantly increased the risk of intrusions and accidents, highlighting the need for reliable drone-to-drone detection systems.^1,2 While ground-based processing of UAV video streams is common and easier to implement, thanks to its access to virtually unlimited computational resources, it suffers from high latency, single points of failure in communication links and a complete loss of functionality when communication is jammed or denied.³ Hence, the main novelty of this paper is the proposal of a fully onboard architecture for real-time drone-to-drone visual detection, where all perception, detection, tracking and decision-making tasks are executed autonomously using embedded hardware on board the UAV. The proposed system thus achieves fully independent operational capability, without the need for external communications, ensuring continuous operation even when the link with the ground station is interrupted or actively jammed.

In order to rigorously validate the performance of the onboard approach, different state-of-the-art onboard platforms for UAVs have been tested, each impacting the performance in terms of detection speed, inference time and operational autonomy. In addition, a ground-based version has been implemented for comparison, with the master drone (i.e., the in-air detector in this configuration) streaming real-time video to a ground control station (GCS) for processing. This is a configuration widely used in semi-autonomous UAV systems today. Both the onboard and ground-based systems share the same detection and tracking pipelines, to ensure a fair comparison of their respective strengths and weaknesses.

The experimental results, obtained from extensive practical tests, show that the onboard system significantly outperforms the ground-based approach in responsiveness and robustness, while maintaining high real-time detection performance (especially in the thermal domain) thanks to the optimized models adopted. This makes it particularly suitable for applications in contested environments.

Synopsis

This paper proposes and evaluates a fully onboard architecture for real-time drone-to-drone visual detection. All perception and decision-making tasks are performed autonomously on the UAV without relying on any ground infrastructure or real-time radio link, except for the startup phase of the UAV.

In order to rigorously quantify its benefits and limitations, the onboard solution is compared with a conventional ground-based system that offloads processing to a ground control station, an approach widely used in current systems that involve UAV navigation and interception.

The rest of the paper is organized as follows. Section 2 reviews the state of the art in drone-based detection systems, with particular emphasis on fully autonomous onboard approaches. Section 3 briefly presents the proposed system architecture with its main components. Section 3.1 describes the onboard physical architecture in detail, covering hardware selection and physical implementation, while Section 3.2 presents the logical design of the system, covering the software pipeline, real-time processing and decision-making steps. Section 4 presents the experimental setup and the results obtained from extensive tests using different onboard SBCs and computer vision models. Beyond the implementation of a specific onboard detection platform, this work aims to distill broader architectural principles for autonomous UAV perception in communication-constrained environments. In particular, the study identifies transferable design guidelines concerning computation placement, modality integration, hardware selection and model optimization under Size–Weight–Power (SWaP) constraints. These principles are intended to support future development of communication-independent aerial perception systems beyond the specific platform evaluated here (see Section 4.5).

The concluding Section 5 summarizes the evidence obtained from the comparison of the different paradigms, discussing their respective strengths and weaknesses.

Overall, this paper provides clear and actionable insights into the fundamental trade-offs between onboard and ground-based perception strategies in the development of truly autonomous UAV systems for contested and communication-denied environments.

2. Related work

The rapid proliferation of UAVs has dramatically increased the need for reliable drone detection systems capable of operating in shared and contested airspace. In particular, autonomous drone-to-drone approaches are currently being studied, since they could offer solutions for dangerous areas where humans cannot operate. More broadly, recent research in robotics has explored the integration of learning-based techniques to improve autonomy in articulated robotic systems, including the optimization of locomotion patterns and the learning of joint motion constraints in complex mechanisms. Although these studies primarily address motion generation rather than perception tasks, they highlight the growing role of machine learning methods in enabling adaptive and intelligent robotic platforms.^4–6

2.1. UAV detection system architectures

This section reviews the state-of-the-art in airborne UAV detection, with particular emphasis on architectural choices and their implications for real-time performance, autonomy and deployment in communication-denied environments.

2.1.1. Ground-based

Counter-UAV and air-traffic awareness systems of this kind (e.g., Dedrone,⁷ Robin Radar,⁸ Aaronia AARTOS⁹) rely on radar, radio frequency (RF) and camera-based networks installed on towers or fixed sites. They achieve good range and accuracy at the expense of high cost, limited coverage in remote or dynamic scenarios and an inability to protect the detecting UAV itself when it operates far from the protected perimeter.¹⁰ In ground-based processing, the UAV is merely a flying camera, transmitting compressed video (usually H.264/H.265 over 4G/5G or dedicated RF links) to a ground control station (GCS) equipped with high-end GPUs.^11,12 Detection and tracking are performed on the GCS and commands are sent back via MAVLink or similar protocols. This enables the use of heavy models such as YOLOv8x¹³ or Faster R-CNN, achieving mAP@0.5 above 93%. However, such systems also introduce latencies of 200–800 ms and fail completely when the communication link is lost or jammed.^14,15

2.1.2. Hybrid and dual-stage

In hybrid systems, computation is split between the drone and the ground station. Ntousis et al.¹⁴ proposed a dual-stage pipeline: lightweight detection onboard (YOLOv5n) for low-latency candidate generation, followed by heavy re-identification and tracking on the ground server when bandwidth permits. Similar ideas appear in Kwon and Lee¹⁶ and Liu et al.¹⁷ These systems still collapse to zero capability when the link is broken, and they add architectural complexity.

2.1.3. Fully onboard

Recently, embedded neural processors and lightweight computer vision models^18,19 have favored moving processing onto the UAV itself. Early works focused on general obstacle avoidance rather than drone-to-drone detection.^20–22 Vrba et al.²³ and subsequent works from the CTU Prague MRS group²⁴ shifted to LiDAR-based flying-object detection on the Intel Up Squared board with an Ouster OS1 sensor. Despite being robust at long ranges, these solutions add weight, power draw and cost, making them impractical for small-to-medium commercial platforms.

Chen et al.²⁵ demonstrated real-time object detection on a Snapdragon 855 platform with embedded NPU, achieving 28 fps with MobileNetV2-SSD, but only on static or slowly moving objects. Vision-only onboard systems using the Jetson Nano appeared in recent studies.^26–28 Typical performance ranged from 8 to 25 fps, with mAP@0.5 of 0.64–0.79 and power consumption of 10–25 W, at the expense of reduced flight time or reliance on ROS2-to-ground telemetry for decision-making, thus giving up communication independence.

Finally, some recent research integrates both audio and video sensors to improve UAV detection, localization and tracking in open environments.²⁹ Specifically, the cited work relies on audio detection of the target for a preliminary estimate of the direction of arrival, which is used to command a rotation of the drone/camera towards the target (which is then detected by a CNN model). Prediction confidence is thus enhanced by the complementary information provided by audio and video.

To the best of our knowledge, no prior work has demonstrated a fully onboard drone-to-drone detection system combining both RGB and IR modalities without any ground link dependency.

2.2. The importance of domain-specific datasets

The broader literature suggests that model initialization and adaptation strategies should not be selected solely on the basis of benchmark popularity. Evidence from adjacent computer vision domains shows that domain-specific transfer learning can outperform standard ImageNet-based pretraining when visual characteristics differ substantially from natural-image distributions. This observation is particularly relevant for aerial RGB–IR detection, where operational imagery often diverges from canonical datasets and may benefit from specialized fine-tuning protocols.^30,31 This is the main reason behind the choice of the dataset used in this work (see Section 4.1).

3. Proposed drone-to-drone detection system

The proposed system architecture integrates several key components, which are highlighted in Figure 1. The flow of information starts from the UAV platform itself, which integrates a plug board to which the main onboard controller (the Linux Single Board Computer) is connected through the power connector and the two USB ports. Power is supplied to the whole system by the main UAV batteries via a dedicated DC-DC converter, ensuring stable operation even during high-thrust maneuvers. Data exchange, on the other hand, relies on the two USB connections. One is exclusively dedicated to the exchange of information between the software development kit (SDK) and the drone, through a USB-to-TTL (Transistor-Transistor Logic) adapter. The other is used for managing advanced sensors (such as RGB-IR multispectral cameras) and for capturing live videos and photos. These can then be saved in the SBC’s memory or shown directly in the SBC’s operating system, if it also integrates a graphical user interface (GUI).

Figure 1.

Overall proposed architecture, describing how the autonomous system interacts with the external environment.

3.1. The physical architecture

This section presents the fully onboard drone-to-drone detection architecture, also visible in Figure 2. The system is specifically designed to operate autonomously in communication-denied or contested environments, eliminating any real-time dependency on ground infrastructure for perception, detection, tracking or decision-making. All processing stages, from sensor data acquisition to adversary drone detection, are executed exclusively on embedded hardware carried by the UAV, ensuring continuous functionality even under active RF jamming or beyond-radio-line-of-sight conditions.

Figure 2.

Physical architecture of the system, in which the main components are highlighted and summarized, together with their connections.

The proposed solution has been implemented and evaluated on a robust, industrial-grade DJI M200 platform. Its modular payload interface and high payload capacity (more than 2 kg) make it an ideal testbed for real-world deployment of autonomous onboard perception systems on UAV platforms.

The architecture is deliberately sensor-agnostic, in order to maximize flexibility and applicability across operational scenarios. It supports seamless integration with a variety of payloads, including those mentioned above. However, our selection has focused on multimodal optical sensors, such as the one presented in Section 3.1.1. The choice of a single, proprietary camera model stems from the high cost and limited flexibility of mounting multiple modules on the same UAV. Nevertheless, our dataset,³² used for model training, includes images captured by a variety of sensors, ensuring a broader and more generalized data representation. It comprises over 20,000 RGB and IR image pairs collected across 20 distinct outdoor scenarios, with an 80/20 train/test split.

Moreover, the NVIDIA Jetson Orin Nano has been selected as the main onboard processing unit, instead of the Raspberry Pi 4B and Pi 5, due to its significantly higher computational performance while maintaining a similar form factor. As will be shown in Section 4, the Jetson Orin Nano provides onboard computing capabilities comparable to those of a ground station, making it a powerful solution for onboard real-time processing. The differences in computational performance and power consumption are quantitatively assessed in the same section.

In addition, the modular design ensures that the same software stack can be deployed on both SBCs with minimal reconfiguration, facilitating comparative studies and technology transfer to lower-SWaP platforms. (SWaP stands for Size Weight and Power, a standard acronym used in aerospace engineering.)

3.1.1. Visual sensors

Modern UAV sensors can be broadly classified according to their hardware architecture and software ecosystem. A fundamental distinction exists between open-source payload sensors and closed-source (proprietary) payload sensors, each offering specific advantages and limitations depending on the intended application and use case. In the following, we briefly introduce some popular ones with respect to the goal of drone-to-drone detection.

Closed Source:

– DJI: optical sensors such as the DJI Zenmuse XT2 and DJI Zenmuse Z30 are considered closed source, since they have been developed to give optimal performance in specific working environments. In addition, it is quite complicated to develop and integrate software on these cameras, as DJI drops support a few years after its products first appear on the market.
– YUNEEC: standard RGB cameras with high zoom factors, such as the E30Z/X (30× optical zoom camera with Sony sensor for Yuneec H520/H520e) and the E90/X (20 MP camera with Sony sensor for Yuneec H520/H520e), but also hybrid modules that integrate both visible and infrared capabilities, such as the E10T/X (dual thermal and RGB camera with high-resolution FLIR sensor) and the CGOET/X (dual thermal and RGB camera with FLIR sensor).
– FLIR: the most relevant unmanned payloads are the Teledyne FLIR Vue TV128 (dual thermal and visible camera payload) and the EO/IR MK-II (multi-sensor imaging payload).

Open Source:

– SIYI: open gimbal payloads such as the SIYI A8 mini (3-axis gimbal camera) and the A2 mini (ultra-wide-angle FPV gimbal camera), which allow smooth movements through a dedicated gimbal structure and are programmable thanks to their open-source ecosystem.
– Pi Camera: one of the best-known cameras on the market, due to its low cost and compatibility with many boards. However, due to its limited size, it does not guarantee high resolutions.

Furthermore, additional advanced payload modules enable drone detection through alternative sensing techniques, such as multi-channel image analysis. These include multispectral cameras, such as the MicaSense RedEdge-MX and Altum-PT, which support combined visible and near-infrared (VNIR) sensing. Such capabilities are particularly effective in cluttered or densely vegetated environments, where single-band optical sensors may suffer from reduced detection performance.

Beyond multispectral systems, there also exist hyperspectral cameras, such as those produced by Resonon. These sensors provide even finer spectral resolution by capturing narrow spectral bands across a continuous wavelength range. In addition, they allow detailed material discrimination and advanced classification techniques. However, Resonon hyperspectral payloads cannot be considered fully open source. While the manufacturer provides an internal software development kit (SDK) to access data and control the modules, the underlying hardware design, firmware and schematics are proprietary and not publicly available. Finally, all sensors have been summarized in Table 3 of Appendix A, in order to complete the comparison.

3.1.2. Onboard hardware comparison

In order to develop an onboard architecture, it is also essential to evaluate and select the appropriate hardware components. In particular, the main boards for data processing are a crucial element of an autonomous system, as they must offer good computational capabilities in a small and energy-efficient form factor.

For UAV onboard computing, ARM-based SBCs have a relatively low power draw compared to x86 systems, making them suitable for battery-constrained platforms, while offering adequate computing resources for control and perception tasks. Moreover, they feature rich I/O interfaces for sensors (USB 3.0, CSI camera interface, GPIO and networking), simplifying integration with cameras, IMUs, GPS and radios, along with an extensive community, ROS support and libraries for robotics and UAV development.

Conversely, x86 systems provide significantly higher CPU throughput for tasks such as mapping, planning or multi-sensor processing, while still featuring broad compatibility. Finally, NVIDIA platforms open the possibility of dedicated, energy-efficient edge AI acceleration, balancing performance with UAV battery limits.

Thus, exploring the market, the single board computers (SBCs) reported in Table 4 in Appendix A have been identified as the main and most relevant solutions feasible for onboard systems. Although some of them have already been tested in previous research,²⁸ our comparison has been expanded over more and diverse architectures, while focusing on specific parameters for comparing the boards.

Raspberry Pi 4B: a cost-effective and low-power solution suitable for lightweight and optimized computer vision models (mostly nano and darknet versions of YOLO³³), which has been our first choice.

Raspberry Pi 5: an evolution of the previous board, more suitable for AI tasks. The 8GB RAM version has been selected instead of the 16GB one, since the models mostly load the GPU and CPU rather than the RAM. In addition, this choice makes the test results more directly comparable across boards.

Intel NUC 7: a small form-factor x64 PC built around an Intel processor with integrated graphics, which guarantees higher performance but lacks autonomy and power-saving features.

NVIDIA Jetson Orin Nano: usually selected as the primary platform due to its significantly higher performance, thanks to its CUDA-capable graphics module. For this reason, the board supports advanced tracking and prediction algorithms, with a power consumption comparable to that of the Raspberry Pi 5 (10 W to 15 W under load).

Although these solutions are quite promising for developing an autonomous system, many competitors still rely on ground-based systems for drone monitoring, which often include very powerful stations for running heavy AI models. Most of them feature NVIDIA RTX GPUs, which are specifically designed with large and fast memories. An example is the x64 Dell workstation used in the experimental phase of this research, which features an Intel i9-11900 processor and an RTX 3090 GPU. It outperforms the other systems presented above, while sacrificing power consumption, form factor and portability.

3.2. The logical architecture

Figure 3 illustrates the end-to-end logical architecture of the proposed onboard processing pipeline. All stages delimited by the green area are executed entirely on the UAV, eliminating the need for any ground-based computation or communication links.

Figure 3.

Logical architecture and real-time processing pipeline. Once the UAV startup command is given by the user, the whole system operates autonomously booting up the OS and initiating the drone detection pipeline. However, due to EASA regulations the pilot remains ready to intervene in case of need.

Figure 4.

Comparison between raw frames (before) and aligned frames (after).

Furthermore, in order to clearly illustrate the whole process, the individual steps are described in detail in the subsequent paragraphs.

3.2.1. Startup and bootup

The pipeline begins after the UAV is started up by the operator. In this phase the onboard camera is also automatically calibrated. During take-off, the SBC’s operating system boots up, enabling connections between its ports and the UAV control unit. Only after this step do camera streaming and video acquisition start through the UAV software development kit (SDK).

3.2.2. Stream acquisition and alignment

At this point the SBC is able to acquire the video stream and process each frame through its internal computational unit, which enables GPU-accelerated processing. The live video feed is then captured by the software pipeline, which must also align the RGB and infrared streams. The alignment operation is crucial and depends on the sensor characteristics. In particular, it is based on the following steps:

The focal ratio is calculated from the focal lengths $f$ of the two cameras. It determines the difference in dimensions between the two frames and is needed to perform a proper scaling.

$$\text{focal\_ratio} = \frac{f_{IR}}{f_{RGB}} \tag{1}$$

The focal ratio is then used to obtain the scaling factors $s$ along the $x$ and $y$ axes, together with the ratio between the field-of-view (FoV) angles $\theta$ of the two cameras.

$$\begin{aligned} s_{x} & = \text{focal\_ratio} \times \frac{\theta_{RGB, x}}{\theta_{IR, x}} \\ s_{y} & = \text{focal\_ratio} \times \frac{\theta_{RGB, y}}{\theta_{IR, y}} \end{aligned} \tag{2}$$

After the previous step, the RGB image is rescaled with respect to the IR one using bicubic interpolation. $W'$ and $H'$ represent the new dimensions of the RGB image.

$$\begin{aligned} W_{RGB}^{'} & = W_{RGB} \times s_{x} \\ H_{RGB}^{'} & = H_{RGB} \times s_{y} \end{aligned} \tag{3}$$

Since the two camera lenses are physically separated, the distance between them (baseline distance $d_{baseline}$, in millimeters) must be taken into account and converted into a number of pixels. For this, the pixel pitch $p_{IR}$ of the IR sensor is crucial, since it defines the size of a single pixel in the IR frame; it is expressed in micrometers, so that the factor 0.001 converts it to millimeters. The following equation defines the number of pixels by which the RGB image must be shifted in order to align it with the IR one.

$$\text{pixel\_shift} = \frac{d_{baseline}}{p_{IR} \times 0.001} \tag{4}$$

Before cropping the RGB image with respect to the IR one, the position at which to crop is computed as follows.

$$\begin{aligned} x_{start} & = \max (0, x_{RGB,center} - x_{IR,center}) \\ y_{start} & = \max (0, y_{RGB,center} - y_{IR,center}) \end{aligned} \tag{5}$$

Finally, the RGB frame is rescaled to the same resolution as the IR one, so that the two frames are aligned (Figure 4).
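The parameter computation behind Equations (1) to (5) can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation used onboard: the function name, the `AlignmentParams` container and all numeric values in the usage example are hypothetical, and the frame centres of Equation (5) are assumed to be half of each frame dimension.

```python
from dataclasses import dataclass


@dataclass
class AlignmentParams:
    scale_x: float
    scale_y: float
    scaled_width: int
    scaled_height: int
    pixel_shift: float
    x_start: int
    y_start: int


def alignment_params(f_ir_mm, f_rgb_mm, fov_rgb_deg, fov_ir_deg,
                     rgb_size, ir_size, baseline_mm, ir_pitch_um):
    """Evaluate Eqs. (1)-(5) for one RGB/IR camera pair.

    fov_* are (horizontal, vertical) angles; *_size are (width, height) in pixels.
    """
    focal_ratio = f_ir_mm / f_rgb_mm                            # Eq. (1)
    scale_x = focal_ratio * fov_rgb_deg[0] / fov_ir_deg[0]      # Eq. (2)
    scale_y = focal_ratio * fov_rgb_deg[1] / fov_ir_deg[1]
    scaled_w = round(rgb_size[0] * scale_x)                     # Eq. (3)
    scaled_h = round(rgb_size[1] * scale_y)
    pixel_shift = baseline_mm / (ir_pitch_um * 0.001)           # Eq. (4)
    # Assumption: each frame centre is half of the frame dimension.
    x_start = max(0, scaled_w // 2 - ir_size[0] // 2)           # Eq. (5)
    y_start = max(0, scaled_h // 2 - ir_size[1] // 2)
    return AlignmentParams(scale_x, scale_y, scaled_w, scaled_h,
                           pixel_shift, x_start, y_start)


# Hypothetical camera parameters, chosen only to exercise the formulas.
params = alignment_params(
    f_ir_mm=20.0, f_rgb_mm=10.0,
    fov_rgb_deg=(60.0, 40.0), fov_ir_deg=(30.0, 20.0),
    rgb_size=(400, 300), ir_size=(640, 512),
    baseline_mm=17.0, ir_pitch_um=17.0,
)
print(params)
```

The actual rescaling and cropping would then be applied to the image data, for instance with OpenCV's `cv2.resize` and the `cv2.INTER_CUBIC` interpolation flag, followed by array slicing starting at (`x_start`, `y_start`).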

3.2.3. Detection and prediction

Finally, the Ultralytics YOLO detector performs inference on each input frame received.

After detection, non-maximum suppression and tracking modules refine the results, while Kalman filters predict object trajectories. In particular, these filters estimate the state the UAV will be in after a movement, taking into account the system model and its uncertainty. More details on the process, including the Kalman filter equations, can be found in Section D of the Appendix.
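As an illustration of the prediction step, the following minimal sketch implements a generic constant-velocity Kalman filter for a single image axis in plain Python. It does not reproduce the equations of Appendix D: the class name, the state layout (position and velocity) and the noise values are assumptions made for this example only.

```python
class ConstantVelocityKalman1D:
    """Kalman filter for one image axis with state [position, velocity]."""

    def __init__(self, z0, meas_var=1.0, process_var=0.01, init_vel_var=1000.0):
        self.pos, self.vel = float(z0), 0.0
        # Entries of the symmetric 2x2 covariance matrix P.
        self.p00, self.p01, self.p11 = meas_var, 0.0, init_vel_var
        self.r, self.q = meas_var, process_var

    def predict(self, dt=1.0):
        """Propagate the state: x = F x, P = F P F^T + Q, with F = [[1, dt], [0, 1]]."""
        self.pos += self.vel * dt
        p00 = self.p00 + dt * (2.0 * self.p01 + dt * self.p11) + self.q
        p01 = self.p01 + dt * self.p11
        p11 = self.p11 + self.q
        self.p00, self.p01, self.p11 = p00, p01, p11
        return self.pos

    def update(self, z):
        """Correct the prediction with a measured position z (H = [1, 0])."""
        innovation = z - self.pos
        s = self.p00 + self.r                  # innovation variance
        k0, k1 = self.p00 / s, self.p01 / s    # Kalman gain
        self.pos += k0 * innovation
        self.vel += k1 * innovation
        p00 = (1.0 - k0) * self.p00
        p01 = (1.0 - k0) * self.p01
        p11 = self.p11 - k1 * self.p01
        self.p00, self.p01, self.p11 = p00, p01, p11


# Track the x coordinate of a bounding-box centre moving at 2 px per frame.
kf = ConstantVelocityKalman1D(z0=0.0)
for k in range(1, 21):
    kf.predict()
    kf.update(2.0 * k)
next_x = kf.predict()  # predicted position for the next frame
```

One such filter per coordinate (or a single filter with a larger state vector) allows the tracker to keep predicting the target position for a few frames when the detector misses it.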

3.3. Technical implementation

The onboard software environment was established on an NVIDIA Jetson Orin Nano, selected as the embedded computer for the UAV platform, running a dedicated version of Ubuntu. All integration steps between software and UAV sensors closely followed the official UAV SDK documentation, which is quite standard and can be applied to many other drone platforms in the market.

In order to capture video from the drone camera, the legacy FFmpeg 2.8.15 library was chosen. Since it was no longer available in the standard Debian repositories but was still compatible and stable for our purposes, it was compiled from source.

Moreover, UART port access was granted by adding the main user of the SBC to the dialout group and by creating the necessary udev rule in /etc/udev/rules.d/. The serial interface was subsequently enabled while keeping Bluetooth disabled, since it might occupy the communication bus reserved for UART.

The advanced sensing feature of our UAV SDK provides direct low-latency access to live video streams from both the FPV camera and the main payload camera via the USB-Serial link, bypassing the delays inherent in streaming through the UAV Radio Controller. Such capture delays have to be taken into account when building a new autonomous system from scratch.

Finally, particular attention was given to the optimization of the drone detection models, in order to reach the highest possible performance on the board. Starting from standard PyTorch models saved in .pt format, they have been exported to lighter and optimized versions, according to the hardware on which they were supposed to run. The Ultralytics framework offers a dedicated export function to perform this conversion.
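A minimal sketch of such a conversion is given below, assuming the standard `YOLO.export` method of the Ultralytics Python package. The helper names and the weights file name `drone_yolov8n.pt` are hypothetical, and the export options actually used for the tested models are not shown here.

```python
def pick_export_format(has_nvidia_gpu: bool) -> str:
    """Return 'engine' (TensorRT) for CUDA-capable boards, 'onnx' otherwise."""
    return "engine" if has_nvidia_gpu else "onnx"


def export_model(weights_path: str, has_nvidia_gpu: bool) -> str:
    """Export PyTorch weights with Ultralytics and return the exported file path."""
    # Imported lazily: the package is only needed on the board doing the export.
    from ultralytics import YOLO

    model = YOLO(weights_path)
    return model.export(format=pick_export_format(has_nvidia_gpu))


if __name__ == "__main__":
    # Hypothetical weights file; a TensorRT export must run on the target GPU board.
    print(export_model("drone_yolov8n.pt", has_nvidia_gpu=True))
```

Reduced-precision variants can be requested through additional export arguments such as `half=True` or `int8=True`, where the target hardware supports them.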

The standard ONNX format is suitable for CPU-based computation on SBCs that do not integrate dedicated graphics processing units, whereas the Engine format enables GPU-accelerated execution using the NVIDIA TensorRT libraries on compatible boards. Although PyTorch natively supports GPU inference through the model.to("cuda") call, converting models to these formats makes it possible to reach state-of-the-art performance.

4. Experimental results

This section reports the experimental evaluation of all the embedded boards, focusing on evaluation procedure, system integration, power consumption, computational performance and detection accuracy.

4.1. Evaluation procedure and setup

First of all, several Ultralytics YOLO models have been trained using our own multimodal dataset for drone detection,³² which is available for public download together with the models tested here. These models have then been adopted for measuring the performance of the boards on UAV detection.

In order to evaluate the differences between the platforms, several tests have been conducted by running all our pretrained models and performing inference on a 3-min MPEG4-compressed H264 video at 30 fps, with a resolution of 640 × 512 pixels, for a total of 5,400 frames. The target drone (in our case a DJI Mavic Mini 4K) is not present in every part of the test video, in order to verify the behavior of the models when the target UAV leaves the frame.

It is important to clarify that this benchmarking phase and the real onboard deployment correspond to two different but complementary stages of the evaluation. For the comparison of inference speed, latency and power consumption across five highly heterogeneous hardware platforms, the input stimulus had to be strictly identical; therefore, all boards were tested on the same prerecorded video stream. Running independent live flights for each board would have introduced uncontrolled environmental variability, such as different target trajectories, wind-induced payload oscillations and lighting changes, thus compromising the fairness of the comparison. At the same time, as described in Sections 3.1 and 3.2, the full system is effectively deployed onboard the DJI Matrice 210, powered directly by the UAV batteries and processing the live stream in real time through the DJI SDK.

The apparent size of the target in pixels has also been calculated, to provide an additional reference relating the pixel dimensions of the target UAV to its distance in meters from the camera. The following formulas were used to compute the size in pixels at different distances:

$$\text{Width} = 2 \times \text{Distance} \times \tan\left(\frac{\phi_{horizontal}}{2}\right) \tag{6}$$

$$\text{Height} = 2 \times \text{Distance} \times \tan\left(\frac{\phi_{vertical}}{2}\right) \tag{7}$$

$$\text{Spatial Resolution}_{x} = \frac{\text{Width}}{\text{Frame}_{x}} \tag{8}$$

$$\text{Spatial Resolution}_{y} = \frac{\text{Height}}{\text{Frame}_{y}} \tag{9}$$

$$\text{Pixel Length} = \frac{\text{UAV Length}}{\text{Spatial Resolution}_{x}} \tag{10}$$

$$\text{Pixel Height} = \frac{\text{UAV Height}}{\text{Spatial Resolution}_{y}} \tag{11}$$

Moreover, the pixel resolution of the target UAV at different distances (5, 25, 60 m) was calculated. The UAV’s real-world dimensions are 28.9 cm in length and 5.6 cm in height. The table showing the calculated results can be found in Section C of the Appendix.
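Equations (6) to (11) can be evaluated with a short Python helper, sketched below. The function name and the field-of-view values are assumptions introduced for illustration (the camera FoV is not stated in this section), while the frame size and the UAV dimensions are those reported above.

```python
import math


def target_size_px(distance_m, fov_deg, frame_px, uav_size_m):
    """Apparent target size in pixels along one image axis (Eqs. 6-11)."""
    # Eqs. (6)/(7): extent of the scene covered by the frame at this distance.
    footprint_m = 2.0 * distance_m * math.tan(math.radians(fov_deg) / 2.0)
    # Eqs. (8)/(9): metres of scene covered by one pixel.
    spatial_resolution = footprint_m / frame_px
    # Eqs. (10)/(11): number of pixels spanned by the target.
    return uav_size_m / spatial_resolution


# Hypothetical field of view in degrees; the real value depends on the camera.
FOV_H_DEG, FOV_V_DEG = 45.0, 37.0

for distance in (5.0, 25.0, 60.0):
    length_px = target_size_px(distance, FOV_H_DEG, 640, 0.289)
    height_px = target_size_px(distance, FOV_V_DEG, 512, 0.056)
    print(f"{distance:>4.0f} m: {length_px:.1f} x {height_px:.1f} px")
```

Since the footprint grows linearly with distance, the apparent size of the target is inversely proportional to it: doubling the distance halves the number of pixels along each axis.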

As measurements are derived from a single full-pass evaluation over the 5,400-frame test sequence, run-to-run variance is not separately quantified; this is acknowledged as a limitation of the current evaluation protocol.

4.2. System integration and power consumption

Initial tests were conducted using a Raspberry Pi 4B as the main onboard SBC. However, due to its limited processing capabilities, the system achieved only 2–3 frames per second, highlighting the need for more powerful hardware in real-time applications. Therefore, other boards and a whole GCS have been tested, in order to have a complete setup for comparisons.

A critical aspect of the pipeline concerns frame alignment. In fact, our camera integrates two separate optical channels (an RGB sensor and an infrared sensor) which are physically displaced and operate at different zoom levels. As shown in Figure 2, the lenses are not co-aligned, requiring real-time geometric alignment of the RGB and IR frames. This step introduces a significant computational overhead, further reducing the effective throughput of the system.

Another important consideration is the additional power demand introduced by the detection pipeline. The Python script must simultaneously manage the real-time video stream provided by the UAV SDK, perform frame alignment between the RGB and IR channels and execute object detection across both spectra. This workload results in a measurable increase in current consumption. For example, on the Raspberry Pi 4B, idle operation draws approximately 0.59 A, while the full detection pipeline raises the draw to 1.09 A (an increase of about 85%), confirming the impact of onboard processing on energy efficiency.

Differences in performance and power consumption across the tested systems are also linked to the underlying hardware architectures. The Intel NUC 7 and the workstation employed in our tests run on X64, an architecture following the CISC (Complex Instruction Set Computing) paradigm, which generally leads to higher power consumption. On the other hand, the Raspberry Pi boards and the NVIDIA Jetson Orin Nano are based on ARM, a RISC (Reduced Instruction Set Computing) architecture that is simpler and more power-efficient. This contributes to the lower consumption obtained with them. As seen in our tests, the Intel NUC 7 outperforms the Raspberry Pi 5 (e.g., 13.30 fps versus 7.09 fps with YOLOv8n) at the cost of higher consumption, drawing over 38 W compared to approximately 15 W for the Raspberry Pi 5.

4.3. Computational performance

Furthermore, the mean inference speed in frames per second and the mean inference time per frame were measured, while monitoring each board’s power consumption. All results have been summarized in Table 1, which also reports the corresponding detection accuracy on both RGB and IR data in terms of mAP@50 and mAP@50–95.

Table 1.
Mean inference speeds, per-frame times, power consumption and detection accuracy for YOLO models across different hardware platforms.

Raspberry Pi 4B 8GB
Raspbian OS ARM64
Model	Model size	Inference speed	Inference time	Consumption	RGB mAP@50 / mAP@50–95	IR mAP@50 / mAP@50–95
YOLOv8n ONNX (.onnx)	11.7 MB	2.35 fps	424.98 ms	8.4 W	64.6%/37.3%	98.2%/75.1%
YOLOv8s ONNX (.onnx)	42.7 MB	0.83 fps	1208.97 ms	8.5 W	66.8%/38.2%	98.1%/74.3%
YOLOv11n ONNX (.onnx)	10.1 MB	2.55 fps	392.65 ms	8.3 W	64.5%/36.9%	98.2%/74.0%
YOLOv11s ONNX (.onnx)	36.2 MB	0.98 fps	1017.98 ms	8.4 W	67.8%/38.1%	98.2%/75.5%

Raspberry Pi 5 8GB
Raspbian OS ARM64
Model	Model size	Inference speed	Inference time	Consumption	RGB mAP@50 / mAP@50–95	IR mAP@50 / mAP@50–95
YOLOv8n ONNX (.onnx)	11.7 MB	7.09 fps	141.01 ms	14.9 W	64.6%/37.3%	98.2%/75.1%
YOLOv8s ONNX (.onnx)	42.7 MB	2.56 fps	389.99 ms	15 W	66.8%/38.2%	98.1%/74.3%
YOLOv11n ONNX (.onnx)	10.1 MB	7.33 fps	136.28 ms	14.6 W	64.5%/36.9%	98.2%/74.0%
YOLOv11s ONNX (.onnx)	36.2 MB	2.87 fps	348.44 ms	14.7 W	67.8%/38.1%	98.2%/75.5%

Intel NUC 7 (Next Unit of Computing)
Windows 10 Pro X64
Model	Model size	Inference speed	Inference time	Consumption	RGB mAP@50 / mAP@50–95	IR mAP@50 / mAP@50–95
YOLOv8n ONNX (.onnx)	11.7 MB	13.30 fps	75.19 ms	38.46 W	64.6%/37.3%	98.2%/75.1%
YOLOv8s ONNX (.onnx)	42.7 MB	4.54 fps	220.15 ms	38.59 W	66.8%/38.2%	98.1%/74.3%
YOLOv11n ONNX (.onnx)	10.1 MB	13.59 fps	73.57 ms	38.44 W	64.5%/36.9%	98.2%/74.0%
YOLOv11s ONNX (.onnx)	36.2 MB	4.40 fps	227.07 ms	38.63 W	67.8%/38.1%	98.2%/75.5%

NVIDIA Jetson Orin Nano 8GB
Ubuntu 20.04 ARM64
Model	Model size	Inference speed	Inference time	Consumption	RGB mAP@50 / mAP@50–95	IR mAP@50 / mAP@50–95
YOLOv8n CUDA (.pt)	6 MB	56.34 fps	17.75 ms	8.489–9.105 W	64.3%/37.0%	98.4%/75.9%
YOLOv8n TensorRT (.engine)	8.7 MB	80.93 fps	12.36 ms	4.519–5.112 W	64.5%/37.3%	98.2%/75.1%
YOLOv8s CUDA (.pt)	21.5 MB	52.28 fps	19.13 ms	9.939–16.058 W	66.5%/38.1%	98.2%/75.2%
YOLOv8s TensorRT (.engine)	24.1 MB	56.69 fps	17.64 ms	4.949–6.300 W	66.8%/38.2%	98.1%/74.3%
YOLOv11n CUDA (.pt)	5.2 MB	42.74 fps	23.40 ms	7.752–8.120 W	65.2%/37.0%	98.4%/74.6%
YOLOv11n TensorRT (.engine)	8.4 MB	73.56 fps	13.59 ms	4.841–5.072 W	64.4%/36.9%	98.2%/74.0%
YOLOv11s CUDA (.pt)	18.3 MB	42.88 fps	23.32 ms	7.565–13.491 W	68.1%/38.3%	98.4%/76.6%
YOLOv11s TensorRT (.engine)	21.5 MB	50.64 fps	19.75 ms	4.906–6.259 W	67.9%/38.2%	98.2%/75.6%

NVIDIA RTX 3090 24GB
Windows 11 Pro X64
Model	Model size	Inference speed	Inference time	Consumption	RGB mAP@50 / mAP@50–95	IR mAP@50 / mAP@50–95
YOLOv8n CUDA (.pt)	6 MB	153.06 fps	6.53 ms	83–127 W	64.3%/37.0%	98.4%/75.9%
YOLOv8n TensorRT (.engine)	7.7 MB	621.67 fps	1.61 ms	72–123 W	64.5%/37.3%	98.2%/75.1%
YOLOv8s CUDA (.pt)	21.5 MB	149.75 fps	6.68 ms	124–149 W	66.5%/38.1%	98.2%/75.2%
YOLOv8s TensorRT (.engine)	23.8 MB	612.47 fps	1.63 ms	99–143 W	66.8%/38.2%	98.1%/74.3%
YOLOv11n CUDA (.pt)	5.2 MB	115.42 fps	8.66 ms	75–123 W	65.2%/37.0%	98.4%/74.6%
YOLOv11n TensorRT (.engine)	7.2 MB	505.80 fps	1.98 ms	81–125 W	64.4%/36.9%	98.2%/74.0%
YOLOv11s CUDA (.pt)	18.3 MB	114.68 fps	8.72 ms	98–140 W	68.1%/38.3%	98.4%/76.6%
YOLOv11s TensorRT (.engine)	20.6 MB	494.75 fps	2.02 ms	95–138 W	67.9%/38.2%	98.2%/75.6%

On GPU-equipped platforms, CUDA execution yields considerably higher performance: Ultralytics allows the model to be moved to a CUDA device, when one is available, so that operations are performed directly on the GPU. The last two columns report, for each model, mAP@50 and mAP@50–95 on the RGB and IR datasets. All TensorRT models tested were quantized to half-precision floating point (FP16). Power consumption was measured with software tools integrated in the operating systems, such as tegrastats on the Jetson Orin Nano.

The disparity between Raspberry Pi and Jetson Orin Nano performance is evident. The latter offers much higher throughput in terms of processed frames per second, while maintaining a compact form factor suitable for onboard deployment. Nevertheless, the Raspberry Pi 5 results are still promising, since the pretrained models were exported in ONNX format, which enables substantially better performance than standard PyTorch execution on resource-constrained platforms. Clear differences also remain when comparing the Jetson Orin Nano board with an X64 machine equipped with a dedicated CUDA GPU. However, this study specifically concerns the development of an onboard architecture, which necessarily implies compromises with respect to a ground-based machine.

A key factor behind the large FPS increase observed for the TensorRT models is the adoption of half-precision floating point (FP16) inference. FP16 reduces the amount of data moved between memory and compute units, lowers bandwidth pressure and allows the NVIDIA GPU to exploit hardware-optimized execution paths, including Tensor Cores on supported devices. As a consequence, more operations can be processed in parallel and each frame requires less latency to traverse the pipeline. In our experiments, this effect is especially evident on the Jetson Orin Nano, where TensorRT FP16 versions consistently outperform the corresponding CUDA models while preserving nearly unchanged detection accuracy on both RGB and IR benchmarks.
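The storage side of this argument can be checked with the Python standard library alone. The sketch below does not involve TensorRT or any GPU: it only shows that an IEEE 754 half-precision value occupies half the bytes of a single-precision one, and how much rounding error the conversion introduces. The sample values and the helper name `roundtrip_fp16` are arbitrary illustrations and are not taken from the evaluated models.

```python
import struct


def roundtrip_fp16(x):
    """Store x as an IEEE 754 half-precision value and read it back."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


# Arbitrary sample values standing in for model weights (illustrative only).
weights = [0.1, -1.5, 3.14159, 1e-3]

# Bytes needed to store the same values in single and half precision.
bytes_fp32 = len(struct.pack(f"<{len(weights)}f", *weights))
bytes_fp16 = len(struct.pack(f"<{len(weights)}e", *weights))
print(f"FP32: {bytes_fp32} bytes, FP16: {bytes_fp16} bytes")

# Rounding error introduced by the half-precision representation.
for w in weights:
    w16 = roundtrip_fp16(w)
    print(f"{w!r:>10} -> {w16!r} (abs. error {abs(w - w16):.2e})")
```

The halved footprint is what reduces memory traffic, while the small rounding errors are in line with the nearly unchanged mAP values reported in Table 1 for the FP16 engines.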

4.4. Detection accuracy and comparison

Despite the evident differences across the computing platforms we tested, the detection accuracy of the models remained substantially consistent across the considered deployment scenarios. In particular, the $n$ and $s$ versions of YOLOv8 and YOLOv11 were selected because they proved to be the most balanced in terms of weight and performance, in both RGB and IR spectra, among the models validated in Tavaris et al.³² The integrated view in Table 1 makes clear that the fastest configurations also preserve comparable accuracy, which is essential for selecting the most appropriate deployment profile. In addition, a summary table that also reports precision and recall for each quantized model is provided in Section E of the Appendix.

Table 2 summarizes the comparison of our proposal with the related work cited in Section 2. Notice that the IR channel performance is the one driving the detection, because the system fuses both modalities and IR provides the primary detection signal in the evaluated scenarios.

Table 2.
Comparison of the proposed system with state-of-the-art onboard UAV detection approaches.

Method	Platform	Modality	fps	Inference (ms)	Power (W)	mAP@0.5	Comm.-indep.
Ntousis et al.¹⁴ (2023)	Ground server + UAV	RGB	N/A	200–800	N/A	>93%	No
Chen et al.²⁵ (2021)	Snapdragon 855 (NPU)	RGB	28	$\sim$ 36 $^{‡}$	N/A	N/A	Yes
Vrba et al.²³ (2025)	Intel Up Squared	LiDAR	N/A	N/A	N/A	N/A	Yes
Rey et al.²⁸ (2025)	Jetson Nano	RGB	8–25	N/A	10–25	64%–79%	Partial
Ours (RPi 5, YOLOv11n ONNX)	Raspberry Pi 5	RGB+IR	7.33	136.28	14.6	64.5% (RGB)/98.2% (IR)	Yes
Ours (Jetson, YOLOv8n CUDA)	Jetson Orin Nano	RGB+IR	56.34	17.75	9.1	64.3% (RGB)/98.4% (IR)	Yes
Ours (Jetson, YOLOv8n TensorRT)	Jetson Orin Nano	RGB+IR	80.93	12.36	5.1	64.5% (RGB)/98.2% (IR)	Yes
Ours $^{†}$ (GCS, YOLOv8n TensorRT)	RTX 3090	RGB+IR	621.67	1.61	72–127	64.5% (RGB)/98.2% (IR)	No

N/A indicates that the metric was not reported in the original work. $^{†}$ Ground-based system included for reference only. $^{‡}$ Computed as 1000/fps; not directly reported.

The results in Tables 1 and 2 allow a practitioner to navigate the configuration space across four key dimensions: inference speed, latency, power consumption and detection accuracy. For energy-constrained deployments, the Jetson Orin Nano with YOLOv8n TensorRT offers the best speed–power trade-off (80.93 fps, 5.1 W, mAP@0.5 of 98.2% on IR). When detection reliability is the primary constraint, YOLOv11s TensorRT provides marginally higher accuracy at a moderate cost in throughput. A full Pareto-front analysis across these dimensions remains an interesting direction for future work.
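One simple way to explore this configuration space is to filter the onboard rows of Table 2 by mission constraints and rank the survivors by energy per processed frame (power divided by fps). The sketch below is only an illustration: energy per frame is a derived metric that is not reported in the tables, and the constraint values passed to the hypothetical helper `pick` (at least 25 fps, at most 10 W) are assumptions chosen for the example.

```python
# Onboard "Ours" rows copied from Table 2 (power is the value listed there, in W).
configs = [
    {"name": "RPi 5, YOLOv11n ONNX", "fps": 7.33, "power_w": 14.6, "ir_map50": 98.2},
    {"name": "Jetson, YOLOv8n CUDA", "fps": 56.34, "power_w": 9.1, "ir_map50": 98.4},
    {"name": "Jetson, YOLOv8n TensorRT", "fps": 80.93, "power_w": 5.1, "ir_map50": 98.2},
]


def energy_per_frame_j(cfg):
    """Derived metric: joules spent per processed frame (W divided by frames per second)."""
    return cfg["power_w"] / cfg["fps"]


def pick(candidates, min_fps, max_power_w):
    """Return the feasible configuration with the lowest energy per frame, or None."""
    feasible = [c for c in candidates
                if c["fps"] >= min_fps and c["power_w"] <= max_power_w]
    return min(feasible, key=energy_per_frame_j) if feasible else None


for cfg in configs:
    print(f"{cfg['name']:<26} {energy_per_frame_j(cfg):.3f} J/frame")

# Hypothetical mission constraints: at least 25 fps within a 10 W budget.
best = pick(configs, min_fps=25.0, max_power_w=10.0)
print("Selected:", best["name"] if best else "no feasible configuration")
```

Under these assumed constraints the Raspberry Pi 5 row is excluded on both counts, and the TensorRT configuration is preferred over the CUDA one because it delivers more frames for less power.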

As to the reproducibility of the experiments, in Appendices A and B, we report all the details of the hardware components and configuration settings. The models and all the code scripts for the video pipeline and camera calibration will be provided on request, and the dataset is publicly available.³²

Finally, several tests were conducted under a range of outdoor conditions, including different times of day, background types (trees, ground, sky) and zoom levels (1$\times$, 2$\times$, 4$\times$); some example detections are shown in Figure 5. Extreme weather conditions such as heavy rain or snow were outside the scope of the current evaluation, as the drone platforms used are not certified for flight under such conditions; this remains an important direction for future validation.

Figure 5.

Comparison of detections with different zoom levels and backgrounds, in both RGB and IR spectra.

4.5. Design principles emerging from the study

Abstracting from the experience gained in implementing and experimenting with our onboard architecture, we propose some general design principles that highlight good practices and design choices beyond the specific implementation described.

Principle 1—Compute locality should match mission criticality

Tasks requiring immediate response under contested or degraded communication conditions should be executed at the sensing edge, minimizing dependency on remote infrastructure.

Principle 2—SWaP-constrained optimization is a system-level problem

Hardware selection should not be based solely on inference speed, but on joint optimization across computational throughput, power draw, payload mass, and mission endurance.

Principle 3—Model optimization can outweigh hardware scaling

Software-level acceleration strategies (e.g., TensorRT conversion, FP16 quantization) can yield larger operational gains than upgrading to higher-power platforms.

Principle 4—Multimodal sensing improves robustness only if alignment overhead is controlled

The benefit of RGB–IR fusion depends on efficient cross-modal registration; otherwise, preprocessing costs may offset detection gains.

Principle 5—Architectural resilience requires communication independence

In adversarial or infrastructure-limited environments, operational continuity depends more on autonomy than on peak computational performance.

5. Conclusions

The proposed system architecture has been successfully implemented and validated, demonstrating promising performance in real-world counter-UAV scenarios. The overall detection pipeline, sensor fusion framework and onboard decision-making logic operate reliably under operational conditions, confirming the feasibility of a fully autonomous, vision-based interception system deployed on an aerial platform.

Nevertheless, real-time computational performance was constrained by some of the SBCs during the tests. Although the Raspberry Pi 4B and Pi 5 are capable single-board computers for many embedded applications, their processing capabilities proved insufficient for sustaining high-frame-rate deep-learning inference and multi-object tracking: 15 fps is the bare minimum for less critical monitoring, while the ideal range is 25–30 fps, and neither board exceeded 7.33 fps in our tests. For this reason, the Raspberry Pi boards are not suited to the most demanding operational configurations. On the other hand, the NVIDIA Jetson Orin Nano remains the most capable of the boards analyzed and tested. It achieved promising performance, with detection accuracy comparable to that of our tested ground-based solution. In addition, thanks to model optimization with the TensorRT libraries, it was possible to surpass current state-of-the-art performance, as seen in the results presented.

Overall, with a distributed detection architecture, radio traffic can be reduced to the minimum needed for telemetry and command communication, since video streams are not necessary. Even in heavily jammed environments, drones can continue their pre-programmed missions, logging detection events on board. Such events can then be downloaded to the GCS when communication is restored, e.g., when flying back towards the GCS. All the SBCs used weigh less than 200 g, and with an additional 4S LiPo battery (14.8 V) of 4000–5000 mAh a Jetson Orin Nano can operate for one hour, beyond the average flight time of most drones. Hence, the impact on payload and battery is limited.
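The endurance figure can be sanity-checked with a back-of-the-envelope estimate. The sketch below is an idealized calculation, not a measurement: it assumes that the battery feeds only the SBC, ignores voltage-conversion losses and the draw of cameras or other peripherals, and uses the highest Jetson Orin Nano consumption reported in Table 1 (16.058 W) as the load. The function name `endurance_hours` and the 80% usable-capacity derating are assumptions introduced for illustration.

```python
def endurance_hours(voltage_v, capacity_mah, load_w, usable_fraction=1.0):
    """Idealized runtime in hours: usable battery energy (Wh) divided by a constant load (W)."""
    energy_wh = voltage_v * capacity_mah / 1000.0 * usable_fraction
    return energy_wh / load_w


# 4S LiPo battery from the text: 14.8 V nominal, 4000 mAh (lower end of the stated range).
VOLTAGE_V = 14.8
CAPACITY_MAH = 4000
# Highest Jetson Orin Nano draw listed in Table 1 (YOLOv8s CUDA), in watts.
PEAK_LOAD_W = 16.058

ideal_h = endurance_hours(VOLTAGE_V, CAPACITY_MAH, PEAK_LOAD_W)
# Assumed derating: only 80% of the nominal capacity is treated as usable.
derated_h = endurance_hours(VOLTAGE_V, CAPACITY_MAH, PEAK_LOAD_W, usable_fraction=0.8)

print(f"Ideal runtime at peak load:   {ideal_h:.2f} h")
print(f"Derated runtime at peak load: {derated_h:.2f} h")
```

Even with the derating, the estimate stays above one hour, so the one-hour figure quoted above appears conservative with respect to the SBC consumption alone.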

Future development will therefore focus on further optimizing the entire software stack on the Jetson Orin Nano, a modern and high-performance edge-AI platform. For instance, a useful addition to the onboard detection software could be the capability of switching between different models, depending on the available computing resources. In this way, a small UAV with a Raspberry Pi 5 could run a smaller, simpler model to monitor the external perimeter of a sensitive area, while more powerful drones could actively patrol the inner, more critical parts with more precise models. Moreover, implementing a policy to identify false positives and false negatives, giving feedback to the base system, could also be a worthwhile future development.

Finally, a more statistically grounded validation with repeated runs and variability analysis will be necessary to strengthen the robustness of the reported results.

Footnotes

Acknowledgements

This work was partially supported by the FSE fund, by the Departmental Strategic Plan (PSD) of the University of Udine – Interdepartmental Project on Artificial Intelligence (2020–25) and by the Italian Minister of Defence PNRM project “ARGOS” (2023–2025).

Ethical considerations

All the research meets the ethical guidelines and legal requirements specified by the Integrated Computer-Aided Engineering journal.

Consent to participate

Not applicable.

Consent for publication

Not applicable.

Author contributions

All authors have had an active part in the study and the manuscript preparation. All authors have approved the manuscript, and agree with its submission to the Integrated Computer-Aided Engineering journal.

Funding

Declaration of conflicting interest

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

ORCID iDs

Alberto De Zan

Gian Luca Foresti

Ivan Scagnetto

Niki Martinel

Andrea Toma

Appendix A. Hardware

The following tables summarize the most common UAV sensors together with their features and capabilities. An additional table summarizes the hardware platforms adopted in our tests.

Appendix B. Configuration details

Our proposed architecture has been developed on the DJI Matrice 210 RTK v1.4 UAV platform together with a Zenmuse XT2 RGB-IR camera, whose specifications are listed in the following table.

Appendix C. Pixel resolution

The following table contains all the values obtained from the pixel resolution calculation steps.

Appendix D. Kalman filters

The Kalman filter predicts the new state of the autonomous UAV system by applying the following steps and equations:

Appendix E. Validation metrics

The following table summarizes the validation metrics of the UAV quantized detection models that have been tested over our computer modules.

References

1. Sangam, Dave, Sultani, et al. Transvisdrone: spatio-temporal transformer for vision-based drone-to-drone detection in aerial videos. In: IEEE international conference on robotics and automation (ICRA), pp. 6006–6013. DOI: 10.1109/ICRA48891.2023.10160345.

2. Saranovic, Pavlovski, Power, et al. Interception of automated adversarial drone swarms in partially observed environments. Integr Comput Aided Eng 2021; 28: 335–348.

3. Abdelsamad, Abdelteef, Elsheikh, et al. Vision-based support for the detection and recognition of drones with small radar cross sections. Electronics 2023; 12: 2235.

4. Song, Yun. Optimizing snake robot locomotion with decomposed gait pattern representation. Integr Comput Aided Eng 2025; 32: 196–225.

5. Naya-Varela, Faiña, Romero, et al. Understanding joint range of motion development in robotic learning. Integr Comput Aided Eng 2025; 32: 346–365.

6. Liu, Rong, Neri, et al. Deep deterministic policy gradient with constraints for gait optimisation of biped robots. Integr Comput Aided Eng 2024; 31: 139–156.

7. Dedrone Inc. DedroneTracker.AI. https://www.dedrone.com/products/drone-detectionsoftware , 2026.

8. Robin Radar Systems. IRIS Drone Radar. https://www.robinradar.com/products/iris-radar , 2026.

9. Ritchings, et al. Drone detection and defense systems: survey and a software-defined radio-based solution. Sensors 2022; 22: 1453.

10. Al-Emadi, Al-Muhtadi. RF-based drone detection and identification using deep learning approaches: a survey. IEEE Access 2023; 11: 11312–11331.

11. Opromolla, Inchingolo, Fasano. Airborne visual detection and tracking of cooperative UAVs exploiting deep learning. Sensors 2022; 22: 1072.

12. Carrió, Tordesillas, Vian, et al. Onboard detection and localization of drones using depth maps. IEEE Access 2020; 8: 10131–10143.

13. Jocher, Chaurasia, Qiu. Ultralytics yolov8, 2023.

14. Ntousis, Makris, Tsanakas, et al. A dual-stage processing architecture for unmanned aerial vehicle object detection and tracking using lightweight onboard and ground server computations. Technologies 2023; 11: 35.

15. Lane, Zhang, Kim. Latency characterization of ground-based UAV video processing pipelines in contested environments. IEEE Trans Aerosp Electron Syst 2024; 60: 4123–4135.

16. Kwon, Lee. Edge-cloud collaborative UAV object detection: edge-embedded lightweight algorithm design and task offloading using fuzzy neural network. IEEE Internet Things J 2024; 11: 22178–22193.

17. Liu, Lin, Cao, et al. Computation offloading in delay-sensitive multi-satellite cooperative edge computing systems. IEEE Trans Mobile Comput 2024; 23: 10234–10249.

18. Huang, Leu, et al. Enhancing UAV visual landing recognition with YOLO’s object detection by onboard edge computing. Sensors 2023; 23: 8999.

19. Zhang, Wang. RTUAV-YOLO: a family of efficient and lightweight models for real-time object detection in UAV aerial imagery. Drones 2024; 8: 643.

20.

Loquercio

Maqueda

del Blanco