Abstract
Introduction
Detecting small marine targets using radar systems presents significant challenges due to the complex and dynamic nature of maritime environments, where clutter frequently obscures true targets. Clutter, in the context of marine surveillance, arises from wave reflections and other natural formations, often blending targets into the background, especially small, hard-to-detect ones. Similarly, clutter spikes (hereinafter, spikes), caused by brief reflections from small, reflective objects, appear as sharp signal anomalies that can mimic actual targets. The simultaneous occurrence of clutter and spikes further complicates target detection, as radar systems must distinguish between persistent background noise and transient anomalies. This interplay creates a particularly challenging scenario, making it difficult for radar systems to accurately differentiate between true targets and spikes.
Traditional methods for mitigating the impact of spikes in radar data primarily rely on signal processing techniques such as thresholding, averaging, filtering, and time-frequency analysis.1 Thresholding techniques like Constant False Alarm Rate (CFAR) adjust detection thresholds to distinguish between noise and legitimate targets,2 while Range-Doppler (RD) analysis separates returns according to their relative velocities. Despite these efforts, traditional radar processing techniques are limited in their ability to reliably differentiate between true targets and spikes, underscoring the need for more sophisticated approaches.
Machine learning (ML) techniques have increasingly been employed to enhance marine radar target detection, particularly in addressing challenges posed by noisy environments and clutter.5 Existing literature reports the application of various ML models to radar target detection (RTD), with a significant focus on replacing traditional methods with ML approaches. Recent advancements in ML present promising avenues for improving the accuracy and reliability of radar target detection, even in the presence of significant noise and interference.6,7 These methods perform remarkably well when it comes to identifying large targets in a noisy setting. Nevertheless, identifying small targets poses a unique difficulty, because returns from the noisy surroundings may closely resemble those of the targets themselves.
In this paper, we address the challenge of distinguishing and classifying small marine targets and spikes in scenarios where traditional RTD methods prove insufficient. Specifically, we focus on the use of airborne radar, a context that introduces unique challenges due to the movement of the radar platform.
To differentiate between spikes and small targets, we introduce two deep-learning classification models designed to classify targets and spikes using Range-Doppler (RD) representations.
To evaluate our method, we collected raw radar signal data, specifically in-phase and quadrature (I/Q) samples, during a flight over the Mediterranean Sea using an airborne maritime radar.
Our research makes two key contributions: First, from a physical perspective, we improve the capability to distinguish between small targets and spikes in noisy marine environments tracked by airborne radar, by integrating machine learning techniques. Second, from a machine learning perspective, we developed and tailored a CNN model specifically designed to handle low-height, highly asymmetrical dimensions (AsymCNN), and introduced a novel Rings-Based CNN model optimized for cylinder-shaped inputs (RbCNN).
Related work
This section reviews existing literature on various ML models employed for radar target detection (RTD).
Traditional machine learning techniques have been instrumental in early advancements in radar target detection. Callaghan et al. 9 explored the application of machine learning to suppress sea clutter, comparing k-nearest neighbors (k-NN) and Support Vector Machine (SVM). Their study concluded that k-NN performs better than SVM in distinguishing between targets and clutter. Similarly, Li et al. 6 focused on detecting small targets within sea clutter by extracting discriminative features from time and frequency domains. They employed a binary SVM algorithm for classification, which showed significant improvements in detection probability over classical detectors using the IPIX database.
The use of deep learning, particularly Convolutional Neural Networks (CNNs), has significantly advanced radar target detection. Su et al. 10 proposed a maritime target detection method based on CNNs using IPIX-measured sea clutter and target signal data. They trained LeNet and GoogLeNet models, demonstrating their high precision and effectiveness in feature extraction and recognition tasks.
Pan et al. 11 introduced a novel approach using Faster Region-based Convolutional Neural Network (Faster R-CNN) to extract features from pulse-distance two-dimensional images. Their results indicated that this method achieves higher detection probabilities than traditional CFAR methods. Mou et al. 12 further improved Faster R-CNN by incorporating advanced techniques such as soft-NMS and Precise ROI Pooling, achieving better accuracy and reliability in marine target detection.
CNNs can directly process raw radar data and have been shown to significantly improve detection accuracy. O’Shea and Clancy 13 applied CNNs to radar classification tasks, demonstrating their capability to handle complex radar data.
Combining different machine learning techniques has proven to be a robust approach in radar target detection. Guo et al. 14 proposed a method using deep convolutional auto-encoders (DCAEs) for filtering sea clutter and logistic regression for classification, achieving higher detection accuracy with IPIX radar data. Exploring ensemble methods, Zhang et al. 15 presented a hybrid model combining CNNs and SVMs, leveraging the strengths of both methods for enhanced performance in clutter-rich environments.
Recent studies have introduced innovative approaches to enhance radar target detection in marine environments. Ma et al. 7 developed a multi-source input neural network (MSINN) and utilized the yolov3-tiny model for sea clutter pre-processing, significantly improving detection efficiency. Linghu et al. 16 proposed a deep neural network model to study sea clutter characteristics and parameter inversion, improving predictions by incorporating environmental features such as wind speed and wave height. Zhao et al. 17 introduced a machine learning-based processor combining artificial neural networks (ANN) and DBSCAN clustering, named DBSCAN-CFAR, which showed robust performance under varying clutter conditions.
Time-frequency analysis has been another significant approach. Tang et al. 18 proposed a method using deep learning with time-frequency characteristics for sea clutter suppression, employing discrete wavelet transform (DWT) and LeNet-5 neural networks to classify and identify sub-band signals, achieving higher recognition accuracy.
Generative Adversarial Networks (GANs) have been explored for their ability to generate synthetic radar signals, augmenting training datasets. Pei et al. 19 designed a clutter suppression network based on CycleGAN, demonstrating superior performance in both simulated and measured marine radar data. Wu et al. 20 also utilized CycleGAN for sea clutter suppression, showing improvements in signal-to-clutter ratio (SCR) and stability. In a related context, Chun et al. 21 combined StyleGAN2-ADA and YOLOv5 to boost buried pipe detection in GPR imagery by generating hard examples, achieving higher accuracy with fewer labeled samples. Similarly, Huang et al. 22 used SA-DenseCL and Mask R-CNN to improve GPR tunnel inspection, achieving higher precision than conventional pretraining methods. Beyond ground-based radar, Mohan and Simske 23 developed a cross-sensor maritime detection system using CNNs trained on optical and SAR imagery, demonstrating effective generalization across sensor types. In material inspection, Dao et al. 24 evaluated GPR and ANN models for predicting concrete strength, showing that nonparametric methods like GPR improve uncertainty-aware estimation based on radar-derived features.
In the context of these advancements, Rahman et al. 25 explored a machine learning-based approach for maritime target classification and anomaly detection using millimeter-wave radar Doppler signatures. Utilizing experimental data from W-band and G-band radars, the study targeted eight classes of maritime objects, including boats, swimmers, and buoys. By extracting features from Doppler spectra and spectrograms, the authors achieved validation and test accuracy of up to 93.3% and 88.7%, respectively. Their use of a one-class support vector machine (OCSVM) for anomaly detection also proved effective in identifying outliers with high accuracy. The research addresses the complexities posed by sea clutter and spikes, offering a machine-learning solution that complements and enhances traditional radar detection methods, highlighting the potential of millimeter-wave radars in maritime surveillance.
In conclusion, many attempts have been made to apply machine learning models to radar target detection (RTD). However, to the best of our knowledge, previous research has primarily focused on stationary radar systems, whereas our study explores the application of airborne radar in this context. Moreover, most existing studies have focused on replacing traditional RTD methods with ML techniques. Instead, our approach leverages ML as a secondary stage specifically to address scenarios where traditional RTD methods are insufficient. By focusing on these challenging cases, even small improvements in our model’s performance yield significant value, enhancing detection capabilities in complex environments.
Problem definition
Radar images are constructed from Range-Doppler (RD) matrices derived from the received and processed echo signals.
The detection of small marine targets using radar systems poses significant challenges, primarily due to the complex and dynamic nature of maritime environments. The presence of clutter and spikes in the marine environment exacerbates this issue because the nature of sea spikes returns can be very similar to that of returns from a desired target.
Clutter, in the context of marine surveillance, manifests in various forms, such as wave reflections or other natural formations. Clutter can obscure targets entirely, blending them into the background and thwarting detection efforts. This phenomenon is particularly problematic for small targets, which are already challenging to distinguish in the marine landscape. Similarly, spikes, in the context of marine surveillance, typically appear as sudden, sharp signal anomalies caused by brief reflections from small, highly reflective objects. These spikes can temporarily mimic the appearance of actual targets, leading to confusion in target identification. This phenomenon becomes especially problematic in situations where rapid and accurate detection is crucial, as it can lead to delays in response or the misclassification of spikes as valid targets.
Using airborne radar for maritime target detection presents unique challenges not encountered with static radar systems located in fixed areas. These challenges arise primarily from the rapid movement of the aircraft, which induces fast changes in the observed sea state.
Dynamic sea conditions, such as wave height and roughness, change rapidly during flight. These changes can significantly alter the radar signature of the surface, making it difficult to distinguish between genuine targets and environmental noise. The behavior and characteristics of the sea significantly differ between the open ocean and near-coastal areas, further complicating target detection.
Additionally, the radar’s angle of incidence changes continuously with the aircraft’s flight path. These variations affect the strength and quality of the radar return, leading to inconsistent detection and classification of small targets. This inconsistency poses a significant challenge in maintaining reliable target identification in a dynamic maritime environment.
Traditional methods for mitigating the impact of spikes in radar data primarily rely on signal processing techniques such as thresholding, averaging, filtering, and time-frequency analysis.1 Fixed and adaptive thresholding techniques, like Constant False Alarm Rate (CFAR), which will be explained in Section 4.2, adjust detection thresholds to differentiate between noise and legitimate targets. However, these approaches can still be susceptible to false alarms caused by spikes that temporarily exceed the threshold, as can be seen in Figure 1, which presents data from our experimental evaluation.
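To make the CFAR idea concrete, the following is a minimal one-dimensional cell-averaging CFAR sketch. The function name, window sizes, and scale factor are illustrative choices for this sketch, not the parameters of the radar system described here.

```python
import numpy as np

def ca_cfar(power, num_train=8, num_guard=2, scale=4.0):
    """Cell-averaging CFAR over a 1-D power profile (illustrative sketch).

    For each cell under test, the local noise level is estimated from
    `num_train` training cells on each side, skipping `num_guard` guard
    cells, and a detection is declared when the cell exceeds `scale`
    times that estimate.
    """
    n = len(power)
    detections = np.zeros(n, dtype=bool)
    half = num_train + num_guard
    for i in range(half, n - half):
        # Training cells on both sides of the cell under test.
        left = power[i - half : i - num_guard]
        right = power[i + num_guard + 1 : i + half + 1]
        noise = np.mean(np.concatenate([left, right]))
        detections[i] = power[i] > scale * noise
    return detections

# A flat noise floor with one strong return: only the peak is flagged.
profile = np.ones(64)
profile[32] = 20.0
hits = ca_cfar(profile)
```

Because the threshold adapts to the local noise estimate, a spike that briefly exceeds it is flagged exactly like a genuine target, which is the failure mode discussed above.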

Traditional Radar Target Detection by CFAR.
To address the challenge of spikes crossing the CFAR threshold, traditional methods suggest comparing Range-Doppler (RD) maps across a sequence of consecutive sweeps, retaining only detections that persist over multiple sweeps.

Traditional Radar Target Detection by a Sequence of Range-Doppler (RD) Maps.
The figure presents a sequence of Range-Doppler (RD) maps captured over consecutive sweeps; a detection is confirmed as a target only if it persists across the sequence, whereas a spike appears briefly and is discarded.
This approach effectively reduces false alarms by filtering out brief anomalies, thereby enhancing detection accuracy in cluttered environments. However, it has some limitations. First, this approach may miss the detection of fast-moving or small targets. Also, it increases the computational load and the time required to process multiple sweeps. Finally, its sensitivity to environmental changes may cause variability in radar returns.
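The sweep-to-sweep persistence test described above can be sketched as a simple vote across sweeps. This is an illustrative reconstruction of the idea, not the actual implementation of any fielded system.

```python
import numpy as np

def persistent_detections(sweeps, min_hits=3):
    """Keep only cells detected in at least `min_hits` of the sweeps.

    `sweeps` is a boolean array of shape (n_sweeps, n_cells); transient
    spikes that appear in a single sweep are discarded, while returns
    that persist across sweeps are retained as targets.
    """
    sweeps = np.asarray(sweeps, dtype=bool)
    return sweeps.sum(axis=0) >= min_hits

# A target persists in all four sweeps at cell 5; a spike flashes once at cell 9.
sweeps = np.zeros((4, 16), dtype=bool)
sweeps[:, 5] = True
sweeps[1, 9] = True
kept = persistent_detections(sweeps, min_hits=3)
```

The sketch also exposes the limitations noted above: a fast-moving target that changes cells between sweeps never accumulates `min_hits` votes in any single cell, and the method requires buffering several sweeps before a decision can be made.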
These limitations underscore the need for more advanced approaches, such as machine learning, to effectively manage spike detection in radar systems.
Our main goal is to rapidly detect marine radar targets in a noisy environment where traditional RTD methods are insufficient. As mentioned earlier, the main challenge lies in differentiating small targets from spikes, as sea spike returns can closely resemble those of desired targets. Therefore, our research focuses on the challenging task of accurately classifying small targets and spikes.
We propose a method, as shown in Figure 3, consisting of four steps: (1) mathematical processing of the radar signal; (2) extraction and generation of spikes and small targets; (3) data pre-processing; and (4) classification by deep learning models.

The proposed method takes raw I/Q radar signals as input and outputs a classification of each candidate detection as either a small target or a spike.
The mathematical process of the radar signal
The initial processing steps of this stage are detailed below.
Pulse compression
Pulse compression enhances radar range resolution and signal-to-noise ratio (SNR) by transmitting a long pulse for high energy and then compressing it in time after reception. This approach allows the radar to distinguish closely spaced targets despite the longer pulse, achieving high SNR and fine resolution simultaneously. The range resolution, defined by ΔR = c / (2B), where c is the speed of light and B is the transmitted bandwidth, therefore depends on the bandwidth rather than on the pulse duration.
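Pulse compression amounts to matched filtering: correlating the received signal with the transmitted waveform. The following sketch uses a linear FM (chirp) pulse; the waveform parameters and function names are illustrative, not those of the radar used in this work.

```python
import numpy as np

def pulse_compress(rx, tx):
    """Matched-filter pulse compression: correlate the received signal
    with the conjugated, time-reversed transmit pulse (illustrative)."""
    return np.convolve(rx, np.conj(tx[::-1]), mode="valid")

# Linear FM (chirp) pulse: long in time, wide in bandwidth.
n = 128
t = np.arange(n)
tx = np.exp(1j * np.pi * 0.25 * t**2 / n)

# Received signal: the chirp delayed to sample 40, noise-free for clarity.
rx = np.zeros(300, dtype=complex)
rx[40:40 + n] = tx

compressed = pulse_compress(rx, tx)
peak = int(np.argmax(np.abs(compressed)))
```

The long echo collapses into a narrow peak at the true delay, with peak magnitude equal to the pulse energy, which is exactly the simultaneous gain in resolution and SNR described above.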
Partition to distance segments
After the pulse compression process, dividing the radar range into multiple distance segments becomes essential. This partitioning allows us to account for the varying characteristics of radar returns across different distances. Each sub-range may differ in noise levels or interference, enabling focused analysis and targeted filtering of less reliable segments. Specifically, the radar range is divided into ten sub-range intervals.
Mathematical correction for radar’s kinesis
When a radar system is mounted on a moving platform, such as an aircraft or a ship, the motion of the platform introduces additional errors and complexities in the radar measurements. These include Doppler shifts, changes in the radar’s position and orientation, and platform-induced biases, all of which must be corrected to ensure accurate target detection and tracking. Mathematical correction techniques, such as Kalman filtering, are essential for compensating these motion-induced errors by continuously updating the radar’s position and velocity estimates. This correction ensures that the radar can maintain precise geolocation of targets and reliable tracking, despite the platform’s movement. Without such corrections, the accuracy and reliability of the radar system would be significantly degraded, leading to erroneous target information.
Fast Fourier transform process
After partitioning the radar range into ten sub-range intervals and applying mathematical corrections, the Fast Fourier Transform (FFT) is used to convert the time-domain radar signals into the frequency domain. This transformation allows the extraction of both range and Doppler information, resulting in the creation of Range-Doppler maps. These maps display the detected targets in terms of their range and relative velocity, providing a two-dimensional view that helps to distinguish between multiple targets and accurately estimate their speeds. The FFT process enhances the radar's ability to analyze the frequency components, thereby improving the resolution and accuracy of target detection within each sub-range.
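The construction of a Range-Doppler map from a stack of compressed echoes can be sketched as an FFT along the slow-time (pulse) axis. The array shapes and the synthetic scatterer below are illustrative, not taken from the experimental data.

```python
import numpy as np

def range_doppler_map(echoes):
    """Build a Range-Doppler map from a pulse-by-range matrix of
    compressed echoes (illustrative sketch).

    `echoes` has shape (n_pulses, n_range_bins); an FFT along the
    slow-time (pulse) axis converts the phase progression across
    pulses into Doppler bins, shifted so zero Doppler sits centrally.
    """
    doppler = np.fft.fftshift(np.fft.fft(echoes, axis=0), axes=0)
    return np.abs(doppler)

# A single scatterer at range bin 12 whose phase advances by a fixed
# step each pulse, i.e. a constant radial velocity.
n_pulses, n_bins = 32, 64
echoes = np.zeros((n_pulses, n_bins), dtype=complex)
doppler_bin = 8  # cycles across the coherent processing interval
echoes[:, 12] = np.exp(2j * np.pi * doppler_bin * np.arange(n_pulses) / n_pulses)

rd = range_doppler_map(echoes)
p, r = np.unravel_index(np.argmax(rd), rd.shape)
```

The scatterer appears as a single bright cell whose row encodes its relative velocity and whose column encodes its range, which is the two-dimensional view the text describes.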
At the end of this stage, we obtain matrices representing Range-Doppler (RD) information for each sub-range.
Extraction and generation of spikes and small targets
In this step, we undertake the transformation of the matrices representing Range-Doppler (RD) data into strips.
After the transformation process, each strip contains either a single small target or a single spike.
This transformation process is performed for two primary reasons: first, to exclude targets or spikes that do not require complex classification from the ML model; second, to ensure that the performance of the ML model, as measured by the metrics presented in subsection 7.2, is rigorously evaluated, avoiding artificially high scores from inputs that are easily detectable by the physical model.
The data production process for our research, which we will elaborate on in the evaluation section, involves a flight over the Mediterranean Sea using a maritime radar. This process yielded a considerable number of spikes essential for training the deep learning (DL) model, which, like any data-driven model, requires substantial amounts of data. However, the dataset lacks a significant number of small targets crucial for effective model training. The scarcity of labeled data is a well-known difficulty in applying machine learning-based methods to radar target detection (RTD).
To address the challenge of the limited number of small targets, we injected simulated targets onto an authentic sea background, resulting in the creation of semi-simulative strips of targets (the specifics of these simulative targets are discussed further in the evaluation section). On the one hand, this approach facilitates effective training of the DL model; on the other hand, it preserves the authentic sea background, including clutter with an unknown and complex noise distribution.
As mentioned above, the first step in processing the RD matrices is the extraction of candidate detections. To ensure reliable separation between small targets and spikes, it is essential to filter out returns that traditional methods can already resolve reliably.

The first step employs the traditional CFAR approach: a threshold is applied, defined separately for each Range-Doppler (RD) map. In an ideal scenario without deviations, the desired matrix size would be one-dimensional. Figure 4 presents examples demonstrating how each parameter affects the appearance of the strip. For instance, the bottom-right strip is defined by a dB-over-threshold value of 2 (mid-level) and a velocity of 1.918 m/s (mid-range). As a result, the target's position on the strip is centered, corresponding to the mid-range velocity, and its brightness is distinctly visible. In contrast, the upper-left strip is characterized by a dB-over-threshold value of 0 (low level) and a velocity of 3.577 m/s (high range). Consequently, the target's position shifts to the right, reflecting the high-range velocity, while its brightness is significantly reduced. Upon completion of this stage, we obtain a set of strips, each containing a single candidate detection.

At this stage, we pre-process the strips before feeding them to the classification models. This stage involves two basic and well-known operations: matrix size verification, which confirms that every strip shares identical dimensions, and data normalization, which scales all values to the range between 0 and 1.

This ensures that the ML model learns the data accurately and efficiently.
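The two pre-processing operations can be sketched as follows; this assumes min-max normalization and the 3 × 256 strip dimensions used by the models, and the function name is illustrative.

```python
import numpy as np

STRIP_SHAPE = (3, 256)  # height x width of each RD strip

def preprocess(strip):
    """Verify the strip's size, then min-max normalize it to [0, 1]."""
    strip = np.asarray(strip, dtype=float)
    if strip.shape != STRIP_SHAPE:
        raise ValueError(f"expected {STRIP_SHAPE}, got {strip.shape}")
    lo, hi = strip.min(), strip.max()
    if hi == lo:  # constant strip: map everything to 0 to avoid division by zero
        return np.zeros_like(strip)
    return (strip - lo) / (hi - lo)

strip = np.random.default_rng(0).normal(size=STRIP_SHAPE)
out = preprocess(strip)
```

Normalizing each strip independently keeps the learning problem focused on the spatial pattern of the return rather than its absolute power, which varies with range and sea state.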
Deep learning classification models
In this section, we present two deep learning classification models designed to distinguish between targets and spikes using Range-Doppler (RD) strips.
Convolutional neural networks
Convolutional Neural Networks (CNNs) are a class of deep learning models particularly effective for processing structured grid-like data such as images. They operate by applying learnable convolutional filters to input data to automatically extract hierarchical spatial features, often followed by pooling operations for dimensionality reduction and dense layers for final classification or regression.
CNNs are trained end-to-end using backpropagation and gradient-based optimization methods such as Adam. Their architecture, combining local feature extraction, parameter sharing, and non-linear activation functions, has made them a foundational tool in computer vision and related fields. For a comprehensive overview of CNNs, see references 28 and 29.
Asymmetrical CNN-based model for radar target detection
CNNs have evolved into highly flexible architectures that can be adapted to a wide range of data types and tasks. Recent studies demonstrate how CNN-based models are being tailored for complex, domain-specific challenges beyond traditional image classification. For example, Urdiales et al. 30 proposed a hybrid architecture combining convolutional LSTM, Siamese, and recurrent networks for improved multi-object tracking. Ruiz et al. 31 developed a lightweight SSD-based model with precision refinement, achieving over 99% accuracy in detecting circular fixation elements in aircraft manufacturing. These recent advancements demonstrate the potential of CNN-based models to be extended and optimized for specialized and domain-specific applications.
The deep learning model we have developed is fundamentally an image recognition model and is, therefore, based on a basic image recognition architecture such as CNN. However, adjustments to the model's architecture are required to address the challenge we are facing, which differs from classic image recognition. Our model aims to accurately classify between small targets and spikes within an input strip whose dimensions are highly asymmetrical (3 × 256 pixels).
In addition to the difficulty involved in recognizing such small objects, the shape of the input itself poses a challenge. Given the extended width of 256 pixels, it is essential to capture long-range dependencies and contextual information across this entire dimension. However, with a limited height of only 3 pixels, standard symmetric convolutional filters (e.g., 3 × 3 kernels) are poorly suited to the input. In the model tailored for this purpose, we applied wide, low-height convolutional kernels that aggregate context along the horizontal axis while preserving the vertical dimension. Additionally, we avoid using pooling in the vertical axis because it may lead to information loss and accelerate the transition from a 2D to a 1D CNN, which is undesirable.
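The mismatch between symmetric and asymmetric kernels on a 3 × 256 strip can be seen directly in the output shapes of a valid convolution. This numpy sketch is illustrative only; the actual model layers and kernel sizes are specified later in this section.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv2d_valid(x, kernel):
    """'Valid' 2-D cross-correlation, supporting asymmetric kernels."""
    windows = sliding_window_view(x, kernel.shape)  # (H', W', kh, kw)
    return np.einsum("ijkl,kl->ij", windows, kernel)

strip = np.random.default_rng(1).normal(size=(3, 256))

# A wide, low-height kernel (1 x 9) preserves the 3-pixel height while
# aggregating context along the 256-pixel Doppler axis.
wide = conv2d_valid(strip, np.ones((1, 9)) / 9.0)

# A symmetric 3 x 3 kernel collapses the height to 1 after a single
# layer, immediately forcing the network from 2D to 1D.
square = conv2d_valid(strip, np.ones((3, 3)) / 9.0)
```

A single symmetric layer already exhausts the vertical dimension, whereas the asymmetric kernel leaves room for several stacked layers, which is the motivation behind AsymCNN's design.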
Network Architecture:
The idea behind designing the AsymCNN architecture is to address the aforementioned challenges and achieve optimal classification. This is accomplished by gradually modifying each axis independently, resulting in an integrated 2D and 1D CNN model, as outlined below:
The proposed model was developed through a structured three-stage methodology designed to systematically optimize both the architecture and training process. All architectural choices are supported by ablation and sensitivity analyses, as detailed in Section 5.
Figure 5 illustrates the construction of the AsymCNN architecture.

AsymCNN for
Loss function
We adopt binary cross-entropy as the loss function. Binary cross-entropy loss, often used in binary classification tasks, measures the discrepancy between predicted probabilities and true binary labels. For a single example with true label y ∈ {0, 1} and predicted probability p, the loss is L = −[y log(p) + (1 − y) log(1 − p)].
For a dataset with N examples, the total loss is the mean of the per-example losses: L = −(1/N) Σᵢ [yᵢ log(pᵢ) + (1 − yᵢ) log(1 − pᵢ)].
Activation functions
In our model, we implement the Rectified Linear Unit (ReLU) activation function across all hidden layers of our neural network architecture. Additionally, we utilize the sigmoid activation function specifically in the final output layer. The ReLU activation function is a widely used non-linear activation function in deep learning. It introduces non-linearity to the model by thresholding the input at zero, resulting in output values that are either zero for negative inputs or linearly proportional to positive inputs. Mathematically, the ReLU activation function can be defined as ReLU(x) = max(0, x). The sigmoid function, defined as σ(x) = 1 / (1 + e^(−x)), maps the network's final output to a probability in (0, 1), as required for binary classification.
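The activation functions and the binary cross-entropy loss above can be written out directly; this is a minimal numpy sketch of the standard definitions, not the training code itself.

```python
import numpy as np

def relu(x):
    """ReLU(x) = max(0, x), applied elementwise."""
    return np.maximum(0.0, x)

def sigmoid(x):
    """sigma(x) = 1 / (1 + exp(-x)), squashing logits into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Mean binary cross-entropy over a batch of predictions."""
    p = np.clip(p_pred, eps, 1.0 - eps)  # guard against log(0)
    return -np.mean(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p))

y = np.array([1.0, 0.0, 1.0])   # true labels: target / spike / target
p = np.array([0.9, 0.1, 0.8])   # predicted probabilities of "target"
loss = binary_cross_entropy(y, p)
```

Note the clipping of predicted probabilities: without it, a confidently wrong prediction of exactly 0 or 1 would make the logarithm diverge.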
The conventional flat, two-dimensional representation of the Range-Doppler (RD) strip requires choosing an arbitrary intersection point at which to cut the cyclic Doppler axis.

The two-dimensional representation of the Range-Doppler (RD) strip.
The Doppler axis of an RD map is inherently cyclic: velocities that exceed the unambiguous range wrap around to the opposite edge of the map. Analyzing the RD strip as a cylinder therefore preserves a continuity that the flat representation breaks at its edges.
The novel model we have coined RbCNN is essentially a transformation of the AsymCNN model introduced earlier. However, rather than adhering to the conventional, sprawling architecture of a CNN model, we have reimagined it into a ring-based framework, as shown in Figure 7. This adaptation enables the data to maintain its original format without the need for defining an intersection point for the cyclic Doppler axis.

Transforming the AsymCNN model into a rings-based CNN model enables the input data to retain its original format without the need to define an intersection point for the cyclic Doppler axis.
The RbCNN architecture introduces an innovative approach to handling the cylindrical nature of RD data.
Implementation of RbCNN
The RbCNN model builds upon the classical Convolutional Neural Network (CNN) architecture by introducing a custom mechanism designed to better handle the circular structure of Range-Doppler (RD) strips.
Standard CNNs often face challenges with edge effects, but these can be addressed when the data exhibits a wrap-around nature, as in the case of RD strips, whose Doppler axis is cyclic.
To implement this cylindrical architecture simply and efficiently, each convolutional layer augments its input by copying a number of columns from one edge of the matrix and appending them to the opposite edge, so that the convolution sees the wrap-around continuity of the cylinder.
Figure 8 illustrates this process for an example matrix.

An example of the ring-padding operation applied to a single matrix in the RbCNN model. In this case, 3 columns are copied from one edge of the matrix and appended to the opposite edge.
This process is applied across all convolutional layers, but is not needed in the dense layers. Dense layers treat the input as a whole and do not encounter the same edge-related issues as convolutional layers. Since dense layers integrate the entire input, regardless of its position or shape, there is no need to copy columns to preserve continuity.
One of the key ideas and advantages of the RbCNN model compared to the baseline CNN is that, instead of padding the edges with zeros, which carry no meaningful information, RbCNN pads the edges with valuable data. This approach enables improved performance without introducing additional weights to the CNN layers or increasing computational complexity, as detailed in Section 6.
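The wrap-around padding idea can be sketched in a few lines of numpy; the function name and the column count below are illustrative, not the model's exact configuration.

```python
import numpy as np

def ring_pad(strip, k):
    """Circularly pad the Doppler (width) axis by copying `k` columns
    from each edge to the opposite side, emulating the cylinder that
    RbCNN assumes instead of zero padding."""
    left = strip[:, -k:]   # last k columns wrap around to the front
    right = strip[:, :k]   # first k columns wrap around to the back
    return np.concatenate([left, strip, right], axis=1)

strip = np.arange(12).reshape(3, 4)
padded = ring_pad(strip, 1)
```

The same effect can be obtained with `np.pad(strip, ((0, 0), (k, k)), mode="wrap")`. Because only existing columns are duplicated, no learnable parameters are added and a subsequent "valid" convolution behaves as if it slid around a closed ring.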
To justify the proposed architecture and evaluate the impact of design choices, we employed a structured three-stage methodology, progressing from coarse to fine granularity and systematically refining the architecture and training parameters to maximize classification performance. First, an ablation study was conducted to determine the optimal architectural configuration. This was followed by sensitivity analyses of convolutional kernel widths and stride values, which are critical due to the asymmetric nature of the input data. Finally, comprehensive hyperparameter optimization was performed to further enhance model performance. To improve the reliability of model performance estimates without incurring prohibitive computational overhead, all three stages employed five-fold cross-validation.
Baseline architecture ablation and evaluation strategy
The first stage of the model development process focused on identifying an effective baseline architecture. Different combinations of convolutional and dense layers were evaluated to determine their impact on classification performance. The goal was to establish a robust initial model structure before proceeding to detailed architectural refinements.
To determine the optimal architectural design, configurations with up to four convolutional layers and three fully connected (dense) layers were explored. Following the principle of progressive complexity, architectural depth and spatial representation were incrementally increased to assess their influence on model performance. The convolutional architectures are summarized in Table 1, while the dense layers followed a fixed configuration of 128, 32, and 16 units.
Convolutional layer configuration used in the architecture search.
To identify the optimal configuration, we varied the number of convolutional layers (one to four) and dense layers (zero to three). Configurations were constructed sequentially, such that each configuration incorporated all preceding layers in order; non-sequential combinations (e.g., using the first and third layers while skipping the second) were not considered. The convolutional and dense structures were incrementally expanded to evaluate the effect of increasing architectural depth on model performance.
Each model configuration was trained independently on all eight radar modes using five-fold cross-validation. Performance metrics were computed as the mean across folds and modes to ensure robustness and generalizability. The ablation study was conducted using the designated training and validation dataset.
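The evaluation protocol, five folds with the score averaged across folds, can be sketched as follows. The toy "majority class" model below merely stands in for training a network; it is illustrative, not part of the actual experiments.

```python
import numpy as np

def kfold_indices(n_samples, k=5, seed=0):
    """Split sample indices into k shuffled, near-equal folds."""
    idx = np.random.default_rng(seed).permutation(n_samples)
    return np.array_split(idx, k)

def cross_val_accuracy(X, y, train_and_score, k=5):
    """Mean accuracy over k folds: each fold serves once as validation."""
    folds = kfold_indices(len(y), k)
    scores = []
    for i in range(k):
        val = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        scores.append(train_and_score(X[train], y[train], X[val], y[val]))
    return float(np.mean(scores))

# Toy stand-in model: predict the majority class of the training fold.
def majority_score(Xtr, ytr, Xva, yva):
    pred = 1.0 if ytr.mean() >= 0.5 else 0.0
    return float(np.mean(yva == pred))

X = np.zeros((100, 3))
y = np.array([1.0] * 70 + [0.0] * 30)
acc = cross_val_accuracy(X, y, majority_score)
```

Averaging across folds (and, in the paper's setting, across the eight radar modes) reduces the variance of the accuracy estimate without requiring a larger held-out set.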
Figure 9 presents the results of the ablation study evaluating the impact of varying the number of convolutional and dense layers. Each curve corresponds to a different number of convolutional layers (ranging from one to four), while the x-axis represents the number of dense layers (zero to three), and the y-axis shows the mean test accuracy across all radar modes using five-fold cross-validation.

Mean test accuracy across all radar modes for each CNN and dense layer configuration. Each curve corresponds to a different number of convolutional layers; the x-axis indicates the number of dense layers, and the y-axis shows the mean test accuracy.
The results indicate a clear improvement in performance with the addition of convolutional layers, particularly when progressing from one to three layers. Notably, the configuration with three convolutional layers and two dense layers achieves the highest accuracy of 94.79%, representing an optimal balance between architectural complexity and generalization. Beyond three convolutional layers, accuracy gains plateaued or slightly declined, suggesting diminishing returns with additional depth. Similarly, increasing the number of dense layers beyond two did not consistently enhance performance and may have introduced overfitting.
Based on the results in Figure 9, the configuration comprising three convolutional layers and two dense layers, which achieved the highest mean accuracy, was selected as the baseline architecture for subsequent experiments. Building on this structure, the next stage involves optimizing the convolutional kernel widths and stride widths to further improve model performance.
With the baseline architecture finalized, the second stage focuses on optimizing kernel and stride widths for the convolutional layers. Four candidate configurations (Conf1-Conf4) were defined, each specifying distinct kernel and stride settings across the three convolutional layers. Detailed specifications and corresponding mean test accuracies are provided in Table 2. The kernel and stride sizes were designed to progressively reduce the spatial resolution by approximately half at each layer within a configuration, and across configurations. Minor adjustments were made in Conf1 Layer 1 to prevent excessive early compression, and in Conf4 Layer 3, where further stride reduction was not feasible due to size constraints.
Mean test accuracies for asymCNN configurations (conf1-conf4) with different kernel widths and stride widths (reported as kernel width / stride width) across three convolutional layers.
To evaluate the effect of kernel and stride selection, the performance of each configuration was assessed based on the mean test accuracy across all radar modes.
As shown in Table 2, Conf2 achieves the highest mean test accuracy, demonstrating the effectiveness of a moderate reduction in spatial resolution across layers. Conf1 and Conf4, applying more aggressive or minimal reductions, performed worse. Accordingly, Conf2 was adopted for subsequent experiments.
Following the selection of the optimal kernel and stride settings, hyperparameter tuning was conducted to improve training efficiency and generalization.
Building on the finalized baseline architecture, hyperparameter optimization was performed to further enhance model performance. The learning rate, batch size, and dropout rate were tuned using a grid search strategy. Evaluation employed five-fold cross-validation, and the mean test accuracy across folds and radar modes was used to select the optimal settings.
The hyperparameters explored include:
Learning rate: Dropout rate: Batch size:
Each configuration was trained independently, and the mean of the results was used to ensure robustness and generalizability. This tuning process led to the best model performance, achieving a mean test accuracy of 94.93% with a learning rate of
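The grid search with five-fold cross-validation described above can be sketched as follows. The candidate value grids shown are hypothetical placeholders (the tuned values are given in the text), and `evaluate_fold` stands in for the actual model training and validation routine:

```python
from itertools import product
from statistics import mean

# Hypothetical candidate grids -- not the exact values tuned in the study.
learning_rates = [1e-2, 1e-3, 1e-4]
batch_sizes = [16, 32, 64]
dropout_rates = [0.3, 0.5, 0.7]

def grid_search(evaluate_fold, n_folds=5):
    """Return (best_mean_accuracy, lr, batch_size, dropout_rate).

    `evaluate_fold(lr, bs, dr, fold)` must train the model on the training
    portion of `fold` and return its validation accuracy.
    """
    best = None
    for lr, bs, dr in product(learning_rates, batch_sizes, dropout_rates):
        # Mean accuracy across the five folds for this configuration.
        acc = mean(evaluate_fold(lr, bs, dr, fold) for fold in range(n_folds))
        if best is None or acc > best[0]:
            best = (acc, lr, bs, dr)
    return best
```

Each of the 27 configurations is trained independently per fold, and the fold-averaged accuracy selects the winner.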
Figure 10 illustrates the effect of varying the learning rate on mean validation accuracy, grouped by batch size. Across all batch sizes, a learning rate of

Mean validation accuracy as a function of learning rate, grouped by batch size (Dropout Rate = 0.7).
Figure 11 similarly illustrates the effect of varying the dropout rate on mean validation accuracy. A dropout rate of 0.7 yields the highest accuracy for batch sizes 16 and 32, while batch size 64 performs slightly better at a dropout rate of 0.5. Overall, the model shows low sensitivity to the dropout rate, with only minor variations in accuracy across the tested range.

Mean validation accuracy as a function of dropout rate, grouped by batch size (Learning Rate =
The sensitivity analyses confirm that the learning rate has the strongest impact on model performance, followed by batch size with moderate sensitivity, while the dropout rate exhibits weak sensitivity. Accordingly, the selected hyperparameter settings were a learning rate of
Final training and evaluation were conducted based on the optimized model configuration. Detailed experimental results and performance analysis are presented in Section 7.
The computational complexity of the proposed models was evaluated based on
Both models were trained using a batch size of 16 and a learning rate of
Training and inference time comparison for AsymCNN and RbCNN.
This section outlines the experimental setup, datasets, baseline algorithms, and evaluation metrics used to evaluate the proposed models across four research questions, followed by a presentation and discussion of the results.
We aim to address the following research questions:
RQ1 examines how the performance of the proposed AsymCNN and RbCNN models compares with two widely adopted deep learning baselines: ResNet and DenseNet.
In aerial radar applications, the sea background changes rapidly during flight, as discussed in Section 3. Thus, RQ2 examines the ability to detect small targets when the sea background varies.
RQ3 examines the impact of a target’s velocity, acceleration, and signal intensity on the ability to detect it in a noisy marine environment.
Experimental setup
Datasets
To collect
Radar characteristics for evaluation.
This radar was mounted on an aircraft, which conducted flights over the eastern Mediterranean Sea. The aircraft maintained a speed of 150 to 200 knots at an altitude of 3,000 feet above sea level, while performing scans in the designated operational mode. The sea was scanned eight times across eight different areas and flight directions to detect naval targets, with each scan lasting between 14 and 38 seconds, as detailed in Table 5.
Characteristics of each flight mode, including aircraft azimuth and radar elevation, which influence the sea background, and flight time, which affects the number of
Following the mathematical process described in Section 4.1, and considering the radar’s 75 km range and the received sample rate of 250 Hz, we obtained 10 Range-Doppler (
At the end of this stage, we obtained a total count of
For each
Parameters for targets implementation.
After extracting spikes and implementing small targets, we obtained the complete data strips, as described in Table 7. The number of strips in each mode is primarily determined by the number of real spikes detected within that mode. In general, a higher number of
Number of strips for each mode.
For our research, we randomly selected 9,728 strips for each mode, equally divided between target and spike samples. We chose 9,728 because we aimed for an equal number of samples across all modes, ensuring reliable comparisons between the machine learning models of the different modes, which are presented later. Mode 7 has the fewest samples, at 9,728, so this is the maximum count that maintains equal sample sizes across all modes. For each mode, we randomly selected 25% of the data (2,432 strips) for testing, equally divided between targets and spikes. The remaining 75% of the data (7,296 samples) were used for training and validation.
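The per-mode sampling and class-balanced 75/25 split described above can be sketched as follows. This is a minimal illustration, not the authors' code; the identifiers and sampling routine are assumptions:

```python
import random

def stratified_split(target_ids, spike_ids, n_total=9728, test_frac=0.25,
                     seed=0):
    """Sample n_total strips (half targets, half spikes) from one mode and
    split them 75/25, keeping each split equally divided between classes."""
    rng = random.Random(seed)
    half = n_total // 2
    targets = rng.sample(target_ids, half)
    spikes = rng.sample(spike_ids, half)
    n_test = int(half * test_frac)                 # per-class test count
    test = targets[:n_test] + spikes[:n_test]      # 2,432 strips, balanced
    train = targets[n_test:] + spikes[n_test:]     # 7,296 strips, balanced
    return train, test
```

With the defaults, each mode yields a 2,432-strip test set and a 7,296-strip training/validation set, both half targets and half spikes.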
To address RQ1, we compared the performance of the proposed AsymCNN and RbCNN architectures against two widely adopted deep learning architectures: ResNet and DenseNet. Each architecture was trained separately using an identical training procedure to ensure a fair and consistent comparison.
For the ResNet and DenseNet architectures, additional adaptations were required to accommodate the unique characteristics of the radar data. Given the highly non-standard radar input dimensions (
Due to the limited dataset size, DenseNet121 and ResNet50 were selected as baseline architectures, as they are the smallest widely adopted backbones available with ImageNet-pretrained weights. Each model incorporated a lightweight classification head consisting of a Dense layer with 128 units followed by a binary output neuron. To mitigate overfitting, the backbone networks were initialized with pretrained weights and kept frozen throughout the entire training process.
To address RQ2, we adopted the AsymCNN architecture and, similar to RQ1, created and trained a separate model for each mode. However, in contrast to the procedure used in RQ1, training and validation were conducted using data from the other seven modes, while the target mode was reserved exclusively for testing. We then compared these results to the model trained and tested on the same test mode samples, as described in RQ1.
To ensure an equitable comparison, we used the same number of training and validation samples (7,296) for RQ2 as in RQ1. This was achieved by randomly selecting 1,042 samples from each of the seven modes, resulting in a total of 7,294 (
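The leave-one-mode-out pooling used for RQ2 can be sketched as below (an illustrative helper under assumed data structures, not the authors' code):

```python
import random

def leave_one_mode_out(mode_samples, held_out_mode, per_mode=1042, seed=0):
    """Build a training pool from all modes except `held_out_mode`,
    drawing `per_mode` random samples from each remaining mode.

    `mode_samples` maps a mode identifier to its list of sample ids.
    """
    rng = random.Random(seed)
    pool = []
    for mode, samples in mode_samples.items():
        if mode == held_out_mode:
            continue                     # the target mode is test-only
        pool.extend(rng.sample(samples, per_mode))
    return pool
```

With eight modes and 1,042 samples from each of the seven non-held-out modes, the pool contains 7,294 samples, matching the count reported above to within rounding of the 7,296 used in RQ1.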
For testing the AsymCNN models, we used the same 2,432 test samples as described previously. We finally computed the mean over the tests of the different modes.
Finally, to address RQ3, we analyzed the performance of the AsymCNN and RbCNN models with respect to each target parameter separately. For the discrete parameter, dB over threshold, we analyzed the performance for each radar reception power individually. For the continuous parameters, acceleration and velocity, we first divided each into five and six intervals, respectively, and then analyzed the performance for each interval separately.
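The interval-based analysis of the continuous parameters can be sketched as a per-bin recall computation. The binning scheme here (equal-width bins) is an assumption for illustration; the paper's exact interval boundaries are not reproduced:

```python
def per_bin_recall(values, y_true, y_pred, n_bins):
    """Split a continuous target parameter into equal-width bins and
    compute recall (detected targets / all true targets) per bin."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    hits = [0] * n_bins
    totals = [0] * n_bins
    for v, t, p in zip(values, y_true, y_pred):
        if not t:                        # recall counts true targets only
            continue
        b = min(int((v - lo) / width), n_bins - 1)   # clamp max value
        totals[b] += 1
        hits[b] += int(p)
    return [h / n if n else None for h, n in zip(hits, totals)]
```

For velocity, `n_bins=6` and for acceleration `n_bins=5` would mirror the interval counts described above.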
Evaluation metrics and statistical significance testing
Model performance was evaluated using standard classification metrics: Accuracy, Precision, Recall, F1-score, and AUC. To assess the statistical significance of differences between models evaluated on the same test set, we followed the methodology proposed by Dietterich. 33 Specifically, we applied McNemar's test to compare classification accuracy, as it is appropriate for paired binary outcomes. For Precision, Recall, F1-score, and AUC, which are computed over the entire test set and do not permit per-sample testing, we employed a bootstrap resampling approach with 1,000 iterations. In each iteration, metric differences between models were computed across resampled datasets, and p-values were estimated based on the proportion of sign reversals.
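A minimal sketch of the exact McNemar test on paired predictions is shown below; it is illustrative rather than the authors' code, and the bootstrap procedure for the remaining metrics is omitted for brevity:

```python
from math import comb

def mcnemar_exact(y_true, pred_a, pred_b):
    """Exact (binomial) McNemar test on paired classifier outputs.

    n01: samples model A classifies correctly and model B does not;
    n10: the reverse. Under H0 the discordant counts follow
    Binomial(n01 + n10, 0.5); the two-sided tail gives the p-value.
    """
    n01 = sum(1 for t, a, b in zip(y_true, pred_a, pred_b)
              if a == t and b != t)
    n10 = sum(1 for t, a, b in zip(y_true, pred_a, pred_b)
              if a != t and b == t)
    n = n01 + n10
    if n == 0:
        return 1.0                       # no discordant pairs
    k = min(n01, n10)
    p = 2 * sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(p, 1.0)
```

Only the discordant pairs enter the test, which is what makes it appropriate for comparing two models on the same samples.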
Results
In this section, we present the results addressing the research questions posed in the previous section.
RQ1: Comparison with Established Architectures
Figure 12 presents a comparison of the mean evaluation metrics of the two architectures we proposed: AsymCNN and RbCNN, with two competitive architectures: ResNet and DenseNet.

Comparison between ResNet, DenseNet, AsymCNN, and RbCNN models across evaluation metrics. RbCNN consistently outperformed the other models under separate training.
The results show that the RbCNN architecture consistently achieves the highest performance, with scores of 90.9% accuracy, 88.6% precision, 93.9% recall, 91.2% F1-score, and 97.4% AUC. AsymCNN follows closely, significantly outperforming both DenseNet and ResNet across all metrics. While DenseNet shows moderate performance, ResNet yields the lowest scores across the board. These results emphasize the effectiveness of our proposed AsymCNN and RbCNN architectures over existing models, particularly for the classification of complex-valued raw
To assess the significance of the observed differences, McNemar's test was applied for accuracy comparisons, and a bootstrap resampling procedure was used for the remaining metrics, as described in Section 7.2. The statistical analysis revealed that the AsymCNN model significantly outperforms both DenseNet and ResNet across all evaluation metrics, with
Figure 13 presents a comparison between mixed and separately trained models based on the AsymCNN architecture, representing variable and constant sea backgrounds, respectively, across five metrics. The results indicate that neither the separately trained models nor the mixed trained models consistently outperform each other across all metrics.

Comparison of Mean Scores Across Metrics for Mixed and Separately Trained AsymCNN Models.
It can be observed that accuracy is higher in models trained on a specific mode representing a constant sea background. This suggests that, in general, AsymCNN models capture patterns more accurately when the sea background is constant rather than variable, resulting in a higher overall accuracy.
Precision is also higher in the constant sea background setting, which implies that training with consistent backgrounds reduces false positives. In this context, fewer spikes are misclassified as targets, enabling more accurate spike recognition in constant conditions.
In contrast, recall is higher with a variable sea background, indicating an improved capacity to capture true positives. This suggests that training in a variable sea environment enhances the model’s ability to detect targets.
While the mixed model achieves higher recall, its lower precision detracts from the F1-score, suggesting that the separately trained model provides a more balanced performance by effectively capturing true positives and minimizing false positives.
The separately trained models demonstrate a higher AUC score, indicating an improved ability to distinguish between targets and spikes across various decision thresholds.
A similar trend is observed for the RbCNN architecture. All differences were found to be statistically significant at the 99.9% confidence level, as confirmed by McNemar's test for accuracy and bootstrap resampling for Precision, Recall, F1-score, and AUC.
The statistical analyses further indicate that sea background variations influence the ability to detect small targets.
In summary, each sea background condition has unique strengths: training and testing within a consistent sea background generally improve accuracy and offer a more balanced performance, enabling more precise spike recognition and improving the model’s capacity to distinguish between targets and spikes across different decision thresholds. Conversely, a variable sea background enhances the model’s ability to detect real targets. It is evident that model performance is affected when training and testing are conducted on different modes compared to using the same mode for both. This observation addresses
To understand the characteristics of targets that influence their detectability using the ML model, we examine the impact of a target’s velocity, acceleration, and signal strength on its detection capability in a noisy marine environment. To evaluate how each target characteristic influences detection, we analyzed the classification outcomes for all targets. Certain targets were accurately classified by the machine learning model and were thus identified as
Figure 14 presents a comparison of the mean recall scores across different characteristics for the AsymCNN and RbCNN models; this analysis constitutes the third contribution of this paper.

Comparison of Mean Recall Scores across Different Characteristics for Separately Trained AsymCNN and RbCNN: (a) dB Over Threshold, (b) Acceleration Intervals, (c) Velocity Intervals.
An interesting result is observed for the ‘velocity’ characteristic. For mid-range values, the behavior resembles that of the acceleration characteristic, where recall values remain relatively stable. However, at the extreme ends of the range, a notable decrease in recall is observed for the AsymCNN model only. This finding suggests that the AsymCNN model has greater difficulty classifying targets with either very low or very high velocities. This phenomenon is influenced by the target’s position along the strip. A direct relationship exists between the target’s velocity and its location on the strip, where targets with extreme velocity values are positioned near the strip’s edges, as illustrated in Figure 15. Targets located at the edges of the strip appear to be more challenging for the AsymCNN model to detect. Unlike the AsymCNN, the RbCNN model effectively mitigates the edge effect due to its cylindrical structure, achieving an approximately constant trend at the edges, where the classification improvement compared to the AsymCNN model is most pronounced.
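The cylindrical idea behind the edge-effect mitigation can be illustrated with wrap-around padding: before convolving along the ring (azimuth) axis, each end of the strip is padded with samples from the opposite end, so a filter at the edge sees its true circular neighbours. This is a conceptual sketch of the general technique, not the RbCNN internals:

```python
def circular_pad(ring, pad):
    """Wrap-pad a 1-D ring of samples so a convolution sees the two ends
    as neighbours, removing the artificial edge discontinuity.

    A 2-D version would apply this padding along the ring axis only,
    while the range axis keeps ordinary zero padding.
    """
    return ring[-pad:] + ring + ring[:pad]
```

After padding, a standard 'valid' convolution over the padded ring produces one output per original position, with no special-cased borders.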

Examples of
In this research, we introduced an innovative approach to detect small marine targets in a noisy environment. This approach combines novel deep learning techniques – the Rings-based Convolutional Neural Network (RbCNN), specifically designed for processing cylindrical
Evaluation was performed by collecting
The results demonstrate that the proposed AsymCNN and RbCNN architectures achieve superior performance in detecting small targets within noisy environments compared to existing established architectures. Furthermore, they indicate that variations in sea background conditions influence the model’s ability to detect such targets. In addition, we identified which target characteristics have the greatest influence on detectability by the ML models, highlighting signal strength as the most influential factor. Finally, our findings show that the cylindrical structure of the RbCNN model effectively mitigates edge effects, leading to improved performance relative to the AsymCNN.
Despite its valuable contributions, the research is subject to several limitations in its execution. It was conducted under specific laboratory conditions and limitations, some of which enabled precise and reliable analysis of the results. For example, each data strip contained either a target or a spike, but not both, ensuring a clear evaluation of individual cases. Although the research environment was relatively homogeneous, with constant-altitude flights over the Mediterranean Sea, the variations in sea backgrounds were sufficient to ensure the reliability of the research.
One limitation relates to the noisy environments represented in the dataset. These reflect only the conditions present on the specific day of data collection and do not capture the full spectrum of possible noise scenarios encountered at sea. Naturally, increasing the number of operational days and example cases would introduce a broader variety of environmental noise, which could further enhance model robustness and generalizability.
Additionally, while the targets were semi-simulative and not fully representative of real-world scenarios, the study relied primarily on real data and, where real data was missing, on a simulation approach validated by domain experts. The spikes, which are particularly challenging to simulate given the unknown and complex noise distribution, are entirely authentic. The targets, simulated by experts using an authentic sea background, were designed to replicate the key optical and spatial characteristics of real marine targets. By simulating targets using parameters reviewed and approved by experts, we ensured that the research findings are both realistic and reliable.
In future work, we plan to evaluate our method using fully real data targets, which will allow for a more comprehensive validation of the proposed methods in operational settings. Our ultimate goal is to train the ML model exclusively on semi-simulated data and achieve robust performance on real data. Additionally, we aim to assess the effectiveness of the RbCNN model in various scenarios and validate its overall efficacy on a broader scale.
Beyond the current architecture, there is significant potential to extend the RbCNN model by integrating advanced classification techniques. For instance, the Neural Dynamic Classification (NDC) algorithm 34 could be used on RbCNN-extracted features to enhance class separation by discovering optimal transformation spaces. Likewise, the Dynamic Ensemble Learning (DEL) algorithm 35 could enable the training of multiple RbCNN variants under different settings, allowing for a more diverse and robust ensemble. Alternatively, the Finite Element Machine (FEMa) 36 offers a fast, parameterless post-classification method that could further reduce computational overhead in real-time deployment. Moreover, integrating self-supervised learning (SSL) approaches,37,38 such as SimCLR-based contrastive pretraining, would allow the model to benefit from a large volume of unlabeled radar data by learning generalized representations prior to fine-tuning. These hybrid strategies would not only improve performance but also increase robustness to new environments and reduce reliance on labeled data, aligning with operational needs in marine radar systems.
Footnotes
Funding
The author(s) received no financial support for the research, authorship and/or publication of this article.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
