Sage Journals: Discover world-class research

Abstract

The detection of small maritime targets by radar demands particular attention due to the challenge of distinguishing them within noisy environments. With advancements in machine learning (ML), several studies have explored integrating ML models to replace or enhance traditional radar target detection methods. In this research, we explore the integration of ML with airborne radar for maritime target detection, a domain that, to the best of our knowledge, has not been addressed in existing studies. We introduce an approach that uniquely applies ML as a secondary stage to enhance detection in scenarios where traditional methods prove insufficient. In this paper, we propose two deep-learning models aimed at improving real-time detection of small marine targets using airborne radar systems. The first model is a Convolutional Neural Network (CNN) designed to handle low-height, highly asymmetrical input dimensions (AsymCNN) adapted to classify small targets and spikes from Range-Doppler ( $R / D$ ) maps. The second, a novel Rings-based Convolutional Neural Network (RbCNN) developed for this research, extends the AsymCNN architecture, optimizing it for cylindrical input data such as $R / D$ maps. To evaluate our method, we used raw radar signal data collected from airborne radar systems operating over the sea. Each signal is represented using two components, in-phase and quadrature ( $I / Q$ ) samples, which together capture both the amplitude and phase characteristics of the signal. We demonstrate that our architectures, AsymCNN and RbCNN, outperform traditional detection methods, Resnet and Densenet, in distinguishing small marine targets from spikes.

Keywords

Radar deep learning marine machine learning range-Doppler

1. Introduction

Detecting small marine targets using radar systems presents significant challenges due to the complex and dynamic nature of maritime environments, where clutter frequently obscure true targets. Clutter, in the context of marine surveillance, arises from wave reflections and other natural formations, often obscuring targets by blending them into the background, especially small, hard-to-detect ones. Similarly, clutter spikes (hereinafter, spikes), caused by brief reflections from small, reflective objects, appear as sharp signal anomalies that can mimic actual targets. The simultaneous occurrence of clutter further complicates target detection, as radar systems must distinguish between persistent background noise and transient anomalies. This interplay creates a particularly challenging scenario, making it difficult for radar systems to accurately differentiate between true targets and spikes.

Traditional methods for mitigating the impact of spikes in radar data primarily rely on signal processing techniques such as thresholding, averaging, filtering, and time-frequency analysis.¹ Thresholding techniques like Constant False Alarm Rate (CFAR) are employed to adjust thresholds to distinguish between noise and legitimate targets,² while Range-Doppler ( $R / D$ ) maps - two-dimensional representations widely used in radar signal processing are used to enhance target detection accuracy.³ However, these methods are still susceptible to false alarms triggered by spikes that temporarily exceed the threshold. To overcome some of these limitations, researchers have explored more advanced signal processing architectures. For example, Mullen et al.⁴ demonstrated that embedding a microwave signal within a LIDAR beam reduces clutter and suppresses false alarms through multimodal sensing.

Despite these efforts, traditional radar processing techniques are limited in their ability to reliably differentiate between true targets and spikes, underscoring the need for more sophisticated approaches.

Machine learning (ML) techniques have increasingly been employed to enhance the detection of marine radar target detection, particularly in addressing challenges posed by noisy environments and clutter.⁵ Existing literature reports the application of various ML models to radar target detection (RTD), with a significant focus on replacing traditional methods with ML approaches. Recent advancements in ML present promising avenues for improving the accuracy and reliability of radar target detection, even in the presence of significant noise and interference.^6,7 These methods perform remarkably well when it comes to identifying large targets in a noisy setting. Nevertheless, identifying small targets poses a unique difficulty because the noisy surroundings may show up in the $R / D$ map very similar to the targets. A similar challenge arises in drone-based maritime applications, where drone imagery was applied to detect small floating objects in water.⁸ We propose the remarkable advancements in machine learning (ML), and especially deep learning, over the last several years could be a useful strategy for enhancing such target recognition in radar applications.

In this paper, we address the challenge of distinguishing and classifying small marine targets and spikes in scenarios where traditional radar target detection (RTD) methods prove insufficient. Specifically, we focus on the use of airborne radar, a context that introduces unique challenges due to the movement of the radar platform.

To differentiate between spikes and small targets, we introduce two deep-learning classification models designed to classify targets and spikes using Range-Doppler ( $R / D$ ) maps as inputs. The first model is a Convolutional Neural Network (CNN) designed to handle low-height, highly asymmetrical input dimensions and classify $R / D$ radar stripes as small targets or spikes, referred to hereafter as the Asymmetrical CNN (AsymCNN). The second model, a novel approach developed for this research, is named Rings-based Convolutional Neural Network (RbCNN). This innovative model extends the AsymCNN architecture, optimizing it specifically for cylindrical inputs such as $R / D$ maps, rather than for data represented as arrays or tensors, as is typical in conventional CNN models.

To evaluate our method, we collected raw radar signal data, specifically in-phase and quadrature ( $I / Q$ ) samples, from airborne radar systems operating over the Mediterranean Sea. Using traditional radar signal processing techniques, we extracted real spikes and generated semi-simulated small targets as $R / D$ maps. Our results show that AsymCNN and RbCNN effectively classify targets in scenarios where traditional physical models fail. Both models outperform leading CNN architectures such as ResNet and DenseNet, with RbCNN demonstrating superior performance over AsymCNN.

Our research makes two key contributions: First, from a physical perspective, we improve the capability to distinguish between small targets and spikes in noisy marine environments tracked by airborne radar, by integrating machine learning techniques. Second, from a machine learning perspective, we developed and tailored a CNN model specifically designed to handle low-height, highly asymmetrical dimensions (AsymCNN), and introduced a novel Rings-Based CNN model optimized for cylinder-shaped inputs (RbCNN).

2. Related work

This section reviews existing literature on various ML models employed for radar target detection. The discussion follows a methodological progression from traditional ML techniques to deep learning, hybrid models, and recent advances like generative and cross-sensor methods.

Traditional machine learning techniques have been instrumental in early advancements in radar target detection. Callaghan et al.⁹ explored the application of machine learning to suppress sea clutter, comparing k-nearest neighbors (k-NN) and Support Vector Machine (SVM). Their study concluded that k-NN performs better than SVM in distinguishing between targets and clutter. Similarly, Li et al.⁶ focused on detecting small targets within sea clutter by extracting discriminative features from time and frequency domains. They employed a binary SVM algorithm for classification, which showed significant improvements in detection probability over classical detectors using the IPIX database.

The use of deep learning, particularly Convolutional Neural Networks (CNNs), has significantly advanced radar target detection. Su et al.¹⁰ proposed a maritime target detection method based on CNNs using IPIX-measured sea clutter and target signal data. They trained LeNet and GoogLeNet models, demonstrating their high precision and effectiveness in feature extraction and recognition tasks.

Pan et al.¹¹ introduced a novel approach using Faster Region-based Convolutional Neural Network (Faster R-CNN) to extract features from pulse-distance two-dimensional images. Their results indicated that this method achieves higher detection probabilities than traditional CFAR methods. Mou et al.¹² further improved Faster R-CNN by incorporating advanced techniques such as soft-NMS and Precise ROI Pooling, achieving better accuracy and reliability in marine target detection.

CNNs can directly process raw radar data and have been shown to significantly improve detection accuracy. O’Shea and Clancy¹³ applied CNNs to radar classification tasks, demonstrating their capability to handle complex radar data.

Combining different machine learning techniques have been proven to be robust approaches in radar target detection. Guo et al.¹⁴ proposed a method using deep convolutional auto-encoders (DCAEs) for filtering sea clutter and logistic regression for classification, achieving higher detection accuracy with IPIX radar data. Ensemble methods were explored by Zhang et al.¹⁵ presented a hybrid model combining CNNs and SVMs, leveraging the strengths of both methods for enhanced performance in clutter-rich environments.

Recent studies have introduced innovative approaches to enhance radar target detection in marine environments. Ma et al.⁷ developed a multi-source input neural network (MSINN) and utilized the yolov3-tiny model for sea clutter pre-processing, significantly improving detection efficiency. Linghu et al.¹⁶ proposed a deep neural network model to study sea clutter characteristics and parameter inversion, improving predictions by incorporating environmental features such as wind speed and wave height. Zhao et al.¹⁷ introduced a machine learning-based processor combining artificial neural networks (ANN) and DBSCAN clustering, named DBSCAN-CFAR, which showed robust performance under varying clutter conditions.

Time-frequency analysis has been another significant approach. Tang et al.¹⁸ proposed a method using deep learning with time-frequency characteristics for sea clutter suppression, employing discrete wavelet transform (DWT) and LeNet-5 neural networks to classify and identify sub-band signals, achieving higher recognition accuracy.

Generative Adversarial Networks (GANs) have been explored for their ability to generate synthetic radar signals, augmenting training datasets. Pei et al.¹⁹ designed a clutter suppression network based on CycleGAN, demonstrating superior performance in both simulated and measured marine radar data. Wu et al.²⁰ also utilized CycleGAN for sea clutter suppression, showing improvements in signal-to-clutter ratio (SCR) and stability. In a related context, Chun et al.²¹ combined StyleGAN2-ADA and YOLOv5 to boost buried pipe detection in GPR imagery by generating hard examples, achieving higher accuracy with fewer labeled samples. Similarly, Huang et al.²² used SA-DenseCL and Mask R-CNN to improve GPR tunnel inspection, achieving higher precision than conventional pretraining methods. Beyond ground-based radar, Mohan and Simske²³ developed a cross-sensor maritime detection system using CNNs trained on optical and SAR imagery, demonstrating effective generalization across sensor types. In material inspection, Dao et al.²⁴ evaluated GPR and ANN models for predicting concrete strength, showing that nonparametric methods like GPR improve uncertainty-aware estimation based on radar-derived features.

In the context of these advancements, Rahman et al.²⁵ explored a machine learning-based approach for maritime target classification and anomaly detection using millimeter-wave radar Doppler signatures. Utilizing experimental data from W-band and G-band radars, the study targeted eight classes of maritime objects, including boats, swimmers, and buoys. By extracting features from Doppler spectra and spectrograms, the authors achieved validation and test accuracy of up to 93.3% and 88.7%, respectively. Their use of a one-class support vector machine (OCSVM) for anomaly detection also proved effective in identifying outliers with high accuracy. The research addresses the complexities posed by sea clutter and spikes, offering a machine-learning solution that complements and enhances traditional radar detection methods, highlighting the potential of millimeter-wave radars in maritime surveillance.

In conclusion, many attempts have been made to apply machine learning models to radar target detection (RTD). However, to the best of our knowledge, previous research has primarily focused on stationary radar systems, whereas our study explores the application of airborne radar in this context. Moreover, most existing studies have focused on replacing traditional RTD methods with ML techniques. Instead, our approach leverages ML as a secondary stage specifically to address scenarios where traditional RTD methods are insufficient. By focusing on these challenging cases, even small improvements in our model’s performance yield significant value, enhancing detection capabilities in complex environments.

3. Problem definition

Radar images are constructed from $I / Q$ data, a complex-valued representation of radar signals, where the in-phase $(I)$ component captures the real part of the signal and the quadrature $(Q)$ component captures the imaginary part. This representation enables the extraction of both amplitude and phase information from radar returns, providing richer insights into the target’s characteristics. Analyzing $I / Q$ radar data is essential in radar signal processing tasks such as target detection, tracking, and classification. In this paper, our objective is to classify small marine targets and spikes from radar images constructed using complex-valued raw data signals in the form of $I / Q$ radar data.

The detection of small marine targets using radar systems poses significant challenges, primarily due to the complex and dynamic nature of maritime environments. The presence of clutter and spikes in the marine environment exacerbates this issue because the nature of sea spikes returns can be very similar to that of returns from a desired target.

Clutter, in the context of marine surveillance, manifests in various forms, such as wave reflections or other natural formations. Clutter can obscure targets entirely, blending them into the background and thwarting detection efforts. This phenomenon is particularly problematic for small targets, which are already challenging to distinguish in the marine landscape. Similarly, spikes, in the context of marine surveillance, typically appear as sudden, sharp signal anomalies caused by brief reflections from small, highly reflective objects. These spikes can temporarily mimic the appearance of actual targets, leading to confusion in target identification. This phenomenon becomes especially problematic in situations where rapid and accurate detection is crucial, as it can lead to delays in response or the misclassification of spikes as valid targets.

Using airborne radar for maritime target detection presents unique challenges not encountered with static radar systems located in fixed areas. These challenges arise primarily due to the rapid movement of the aircraft, which induces fast changes in sea state variability.

Dynamic sea conditions, such as wave height and roughness, change rapidly during flight. These changes can significantly alter the radar signature of the surface, making it difficult to distinguish between genuine targets and environmental noise. The behavior and characteristics of the sea significantly differ between the open ocean and near-coastal areas, further complicating target detection.

Additionally, the radar’s angle of incidence changes continuously with the aircraft’s flight path. These variations affect the strength and quality of the radar return, leading to inconsistent detection and classification of small targets. This inconsistency poses a significant challenge in maintaining reliable target identification in a dynamic maritime environment.

Traditional methods for mitigating the impact of spikes in radar data primarily rely on signal processing techniques such as thresholding, averaging, filtering, and time-frequency analysis.¹ Fixed and adaptive thresholding techniques, like Constant False Alarm Rate (CFAR), which will be explained in Step 4.2 in Section 4.2, adjust detecting thresholds to differentiate between noise and legitimate targets. However, these approaches can still be susceptible to false alarms caused by spikes that temporarily exceed the threshold as can be seen in Figure 1, which presents data from our experimental evaluation.

Figure 1.

Traditional Radar Target Detection by CFAR.

To address the challenge of spikes crossing the CFAR threshold, traditional methods suggest comparing Range-Doppler ( $R / D$ ) maps across adjacent sweeps to utilize the temporal consistency of true targets, as explained in Step 4.2 of Section 4.2 Signals that persist across multiple $R / D$ maps are classified as targets, while transient signals are identified as spikes as demonstrated in Figure 2, also based on our experimental data.

Figure 2.

Traditional Radar Target Detection by Sequence of Range-Doppler ( $R / D$ ) Maps. Signals circled in red represent targets, while signals circled in blue indicate spikes. The sequence shows three signals consistently present across all three maps, which are classified as targets. In contrast, the remaining three signals appear only in the first map and are classified as spikes.

The figure presents a sequence of Range-Doppler ( $R / D$ ) maps. In the first map (time=1), six signals exceed the CFAR threshold. Signals circled in red represent targets, while signals circled in blue indicate spikes. To classify these signals as either targets or spikes, we compare the first $R / D$ map with the two adjacent maps. If a signal is present in all three consecutive maps, it is classified as a target; otherwise, it is considered a spike. In this specific case, three signals appear in all three maps and are classified as targets, while the remaining three signals are observed only in the first map and are classified as spikes.

This approach effectively reduces false alarms by filtering out brief anomalies, thereby enhancing detection accuracy in cluttered environments. However, it has some limitations. First, this approach may miss the detection of fast-moving or small targets. Also, it increases the computational load and the time required to process multiple sweeps. Finally, its sensitivity to environmental changes may cause variability in radar returns.

These limitations underscore the need for more advanced approaches, such as machine learning, to effectively manage spike detection in radar systems.

4. Proposed method description

Our main goal is to rapidly detect marine radar targets in a noisy environment where traditional RTD methods are insufficient. As mentioned earlier, the main challenge lies in differentiating small targets from spikes, as sea spike returns can closely resemble those of desired targets. Therefore, our research focuses on the challenging task of accurately classifying small targets and spikes.

We proposes a method as shown in Figure 3: (1) Mathematical process of radar signal processing; (2) Extraction and generation of spikes and small targets; (3) Data pre-processing; (4) Classification by Deep Learning models.

Figure 3.

The proposed method takes $I / Q$ radar signals as input and outputs a classification of the target or spike.

4.1. Mathematical process of radar signal processing

The mathematical process of the radar $I / Q$ signals includes four steps. At the end of this process, the $I / Q$ radar signals are converted to Range-Doppler maps ( $R / D$ maps).

$R a n g e - D o p p l e r (R / D) m a p s$ are two-dimensional representations commonly used in radar signal processing. They depict the distribution of radar echoes over both the range and the Doppler frequency. Essentially, they provide a range-frequency view of radar returns, allowing for the detection and characterization of targets within the radar’s surveillance area.

The initial three sub-processes, as presented in Figure 3, of the ‘Mathematical Process of Radar Signal Processing,’ involve corrections to the aerial radar, followed by the final stage, which includes the FFT (Fast Fourier Transform) process. These are well-established mathematical procedures documented in the literature.^26,27 Next, we describe each sub-process separately.

4.1.1. Pulse compression

Pulse compression enhances radar range resolution and signal-to-noise ratio (SNR) by transmitting a long pulse for high energy and then compressing it in time after reception. This approach allows the radar to distinguish closely spaced targets despite the longer pulse, achieving high SNR and fine resolution simultaneously. The range resolution, defined by $R_{res} = \frac{c}{2 B}$ improves as bandwidth $B$ increases, effectively enabling precise target detection in noisy environments.

4.1.2. Partition to distance segments

After the pulse compression process, dividing the radar range into multiple distance segments becomes essential. This partitioning allows us to account for the varying characteristics of radar returns across different distances. Each sub-range may differ in noise levels or interference, enabling focused analysis and targeted filtering of less reliable segments. Specifically, the radar range is divided into ten sub-range intervals.

4.1.3. Mathematical correction for radar’s kinesis

When a radar system is mounted on a moving platform, such as an aircraft or a ship, the motion of the platform introduces additional errors and complexities in the radar measurements. These include Doppler shifts, changes in the radar’s position and orientation, and platform-induced biases, all of which must be corrected to ensure accurate target detection and tracking. Mathematical correction techniques, such as Kalman filtering, are essential for compensating these motion-induced errors by continuously updating the radar’s position and velocity estimates. This correction ensures that the radar can maintain precise geolocation of targets and reliable tracking, despite the platform’s movement. Without such corrections, the accuracy and reliability of the radar system would be significantly degraded, leading to erroneous target information.

4.1.4. Fast fourier transform process

After partitioning the radar range into ten periods of sub-ranges and applying mathematical corrections, the Fast Fourier Transform (FFT) is used to convert the time-domain radar signals into the frequency domain. This transformation allows the extraction of both range and Doppler information, resulting in the creation of Range-Doppler maps. These maps display the detected targets in terms of their range and relative velocity, providing a two-dimensional view that helps to distinguish between multiple targets and accurately estimate their speeds. The FFT process improves the radars ability to analyze the frequency components, thereby improving the resolution and accuracy of target detection within each sub-range.

At the end of this stage, we obtain matrices representing Range-Doppler ( $R / D$ ) maps, sized at 1,000 range gates by 256 Doppler frequencies, containing a variety of targets and spikes of different sizes and intensities. The range gate size and the number of Doppler frequencies are determined by the radars characteristics, which will be detailed later in Subsection 7.1.1.

4.2 Extraction and generation of spikes and small targets

In this step, we undertake the transformation of matrices representing Range-Doppler ( $R / D$ ) maps, initially sized at 1,000 Range-Gates by 256 Doppler frequencies, and containing a multitude of targets and spikes of diverse sizes and intensities, into strip matrices of dimensionality 3 Range-Gates by 256 Doppler frequencies.

After the transformation process, each strip either contains a single small target or a spike which, according to the traditional physical model, might erroneously be classified as a small target. These strips are then fed as input data to our Machine-Learning (ML) model in subsequent phases, enabling it to tackle the challenging classification task that the physical model is unable to classify successfully.

This transformation process is performed for two primary reasons: First, to exclude targets or spikes from the ML model that do not require complex classification. Second, to ensure that the performance of the ML model , as measured by the metrics presented in the 7.2 subsection, is rigorously evaluated, avoiding artificially high scores from inputs that are easily detectable by the physical model.

The data production process for our research, which we will elaborate on in the evaluation section, involves a flight over the Mediterranean Sea using a maritime radar. This process yielded a considerable amount of spikes essential for training the Deep-Learning (DL) model, which, like any data-driven model, exhibits a data-starved nature. However, the dataset lacks a significant number of small targets crucial for effective model training. The lack of available labeled data is a well-known difficulty in applying machine learning-based methods to Radar Target Detection (RTD).

To address the challenge of limited amount of small targets, we injected simulated targets onto an authentic sea background, resulting in the creation of semi-simulative strips of targets (the specifics of these simulative targets are discussed further in the evaluation section). On the one hand, this approach facilitates effective training of the DL model; on the other hand, it preserves the authentic sea background, including clutter with an unknown and complex noise distribution.

As mentioned above, the first step in processing the $R / D$ maps is to split their size from $1, 000 \times 256$ into strip matrices of dimensionality $3 \times 256$ . This process includes six sub-stages, which we describe as follows:

Filtering Out Range-Doppler Maps:

To ensure reliable separation between small targets and spikes, it is essential to filter out $R / D$ maps that do not meet defined quality thresholds. This filtering process focuses on excluding maps generated from radar returns at ranges that are either too high or too low relative to the radar position. Such extreme ranges may introduce inaccuracies due to the radars limitations, as they are more likely to produce unreliable or noisy data. Specifically, the closest $R / D$ map is excluded because the radars antenna does not fully cover close ranges, resulting in incomplete data collection. At the farthest ranges, the antennas shallow angle allows thermal noise to dominate, creating noise levels that exceed the desired spike threshold and could obscure target detection. As part of this selective filtering, we retain only sub-ranges 2, 3, and 4 from an initial set of 10 segments. Within these selected sub-ranges, additional filtering is applied to ensure that the $R / D$ maps contain significant spikes, specifically originating from sea reflections rather than thermal noise. To achieve this, we selected $R / D$ maps where the mean spike intensity exceeds the mean noise level by at least 10 dB.

Define Threshold by CFAR Approach:

The first step employs the traditional CFAR approach. This step involves applying a threshold, which is defined separately for each range-Doppler ( $R / D$ ) map, to the radar signals to identify possible targets.

The threshold $T$ is defined as: $T = α \cdot Noise Level$ where $α$ is a scaling factor determined by the desired false alarm rate.

Detection and Removal of Large Targets: The next step involves checking the size of the signals. If a signal exceeds the threshold defined by the CFAR and extends across more than two range cells, it is classified as a large target and removed from the range-Doppler ( $R / D$ ) map before being inputted into the ML model. Since full ground-truth labeling is not available in all cases, we assume with high probability that signals extending across more than two range cells are not spikes.

Spikes Extraction: If a signal exceeds the threshold defined by the CFAR and extends across only one range cell, it is suspected to be either a small target or a spike. To ensure that a single range cell detection is a spike rather than a small target, an additional procedure is required: comparing the tested range-Doppler ( $R / D$ ) map with the next two adjacent maps. If it is observed in both adjacent $R / D$ maps, it could be classified as either a small target or spike. Since there is uncertainty, it is removed from the $R / D$ map. Otherwise, the detection is classified as spike and is saved along with the range cell above and the range cell below, forming a spike strip sized $3 \times 256$ . To ensure the reliability of the extracted spikes, quality control and sampling-based verification were performed through experiments conducted by radar engineers.

In an ideal scenario without deviations, the desired matrix size would be one-dimensional, specifically $1 \times 256$ . However, due to the potential spillover of targets and spikes into the pixels above and below, we utilize a matrix size of $3 \times 256$ . After running this process, we have a data set of $3 \times 256$ $R / D$ map strips, each containing spikes across all strips.

Identifying Segments Depicting Smooth Sea Background: After extracting spike strips, our aim is to create semi-simulative targets. To ensure balanced and effective model training, we generate an equal number of targets and spikes, producing one semi-simulative target for each extracted spike. The first step involves identifying and preserving strips of $3 \times 256$ size in the $R / D$ map that do not contain spikes or targets. The number of selected strips corresponds to the number of spikes found in the specific $R / D$ map. In the second step, targets are injected as described in the following section.

Injecting Simulated Targets onto the Sea Background: The process of simulating small maritime targets and embedding them in the $R / D$ maps is guided by specific target parameters and radar processing techniques. In this modeling, we generate point targets by selecting a set of three key parameters that shape the radar signature: velocity, acceleration and dB over threshold intensity. This enables us to construct a diverse target dataset. Each parameter uniquely influences the appearance of the $R / D$ target strips. Velocity determines the target’s position along the horizontal axis, with higher velocity placing the target further to the right. Acceleration affects the target’s spread across the horizontal axis, with greater acceleration results in broader spread, while intensity dictates the target’s brightness.

Figure 4 presents examples demonstrating how each parameter affects the appearance of the strip. For instance, the bottom-right strip is defined by a dB over a threshold value of 2 (mid-level) and a velocity of 1.918 m/s (mid-range). As a result, the target’s position on the strip is centered, corresponding to the mid-range velocity, and its brightness is distinctly visible. In contrast, the upper-left strip is characterized by a dB over the threshold value of 0 (low level) and a velocity of 3.577 m/s (high range). Consequently, the target’s position shifts to the right, reflecting the high-range velocity, while its brightness is significantly reduced.

Figure 4.

Examples of $R / D$ strips with targets. Each strip is labeled with characters along its top. The horizontal position of the target corresponds to velocity, with higher velocities shifting the target further to the right. Acceleration affects the horizontal spread of the target, with greater acceleration resulting in a broader distribution. Intensity determines the target’s brightness.

Upon completion of this stage, we obtain a set of $3 \times 256$ strips, balanced between real spikes and semi-simulated targets. These strips will be utilized as input data for our ML model in subsequent phases.

4.3. Data preprocessing

At this stage, we process the $3 \times 256$ matrix strips of the $R / D$ maps before feeding them into the ML model to ensure optimal performance and interpretability of the model. We employ well-known established methods from the literature in both radar technology and the field of ML:

Let $R D$ represent the set of all $R / D$ strip matrices, where each matrix corresponds to a $R / D$ map strip. Let $x$ denote a pixel within any of these strips. The first stage, derived from radar literature, involves applying the following mathematical operation to each pixel $x$ :

$\forall x \in R D : x = 10 \log_{10} (x)$ , where $R D \in R D$ . This method is widely used to convert pixel values to a logarithmic scale, or decibels (dB). This transformation compresses the dynamic range, making weak and strong signals more distinguishable, and enhances visualization by aligning with human perception.

The second stage involves two basic and well-known elements:

Matrix size verification: $size (R D) = 3 \times 256$

Data normalization between 0 and 1: $x^{'} = \frac{x - min (X)}{max (X) - min (X)}, where x \in R D$

This ensures that the ML model learns the data accurately and efficiently.

4.4. Deep learning classification models

In this section, we present two deep learning classification models designed to distinguish between targets and spikes using ( $R / D$ ) map inputs. The first model is a CNN specifically tailored to classify $R / D$ radar stripes as small targets or spikes, referred to hereafter as the Asymmetrical CNN (AsymCNN). Adapting the CNN model for this task presents several challenges, which will be discussed in detail later in this section. The second model, a novel approach we developed for this research, is called the Rings-based Convolutional Neural Networks (RbCNN). This innovative model builds on the AsymCNN architecture, optimized specifically for cylinder-shaped inputs such as $R / D$ maps.

4.4.1. Convolutional neural networks

Convolutional Neural Networks (CNNs) are a class of deep learning models particularly effective for processing structured grid-like data such as images. They operate by applying learnable convolutional filters to input data to automatically extract hierarchical spatial features, often followed by pooling operations for dimensionality reduction and dense layers for final classification or regression.

CNNs are trained end-to-end using backpropagation and gradient-based optimization methods such as Adam. Their architecture, combining local feature extraction, parameter sharing, and non-linear activation functions, has made them a foundational tool in computer vision and related fields. For a comprehensive overview of CNNs, see.^28,29

4.4.2 Asymmetrical CNN-Based Model for Radar Target Detection

CNNs have evolved into highly flexible architectures that can be adapted to a wide range of data types and tasks. Recent studies demonstrate how CNN-based models are being tailored for complex, domain-specific challenges beyond traditional image classification. For example, Urdiales et al.³⁰ proposed a hybrid architecture combining convolutional LSTM, Siamese, and recurrent networks for improved multi-object tracking. Ruiz et al.³¹ developed a lightweight SSD-based model with precision refinement, achieving over 99% accuracy in detecting circular fixation elements in aircraft manufacturing. These recent advancements demonstrate the potential of CNN-based models to be extended and optimized for specialized and domain-specific applications.

The deep learning model we have developed is fundamentally an image recognition model and is, therefore, based on a basic image recognition model such as CNN. However, adjustments to the model’s architecture are required to address the challenge we are facing, which is different from classic image recognition. Our model aims to accurately classify between small targets and spikes within an $R / D$ image obtained from radar.

In addition to the difficulty involved in recognizing $R / D$ images using CNNs, we have addressed inherent challenges arising from the input shape. As mentioned in section 4.2, the input matrix size is $3 \times 256$ . This specific size presents unique challenges for CNNs due to the low height and extremely asymmetrical dimensions. These challenges can be categorized into those related to the horizontal axis, the vertical axis, and the combination of both:

Horizontal Axis Challenges: The primary challenge on the horizontal axis for a matrix input size of $3 \times 256$ is to achieve a sufficiently large receptive field. Araujo et al.³² have observed a logarithmic relationship between classification accuracy and receptive field size, highlighting the importance of adequate receptive field coverage. The formula used to calculate the receptive field size after L layers is given by: $r_{0} = \sum_{l = 1}^{L} ((k_{l} - 1) \prod_{i = 1}^{l - 1} s_{i}) + 1$ Where:

–
$r_{0}$ : The receptive field size after all layers.
–
$L$ : The total number of layers in the CNN.
–
$k_{l}$ : The filter size (or kernel size) in the $l$ -th layer.
–
$s_{i}$ : The stride in the $i$ -th layer.
–
$\prod_{i = 1}^{l - 1} s_{i}$ : The product of the strides from the first layer up to the $(l - 1)$ -th layer, representing the accumulated stride effect.

Given the extended width of 256 pixels, it is essential to capture long-range dependencies and contextual information across this entire dimension. However, with a limited height of only 3 pixels, standard symmetric convolutional filters (e.g., $3 \times 3$ ) are less effective at expanding the receptive field quickly enough to cover the horizontal axis adequately. For example, to achieve a receptive field of 68 pixels using $3 \times 3$ filters with a stride of one, we would need 34 layers, as each layer increases the receptive field by only 2 pixels. The receptive field $r_{0}$ after $L$ layers, when the filter size $k$ is consistent across all layers and the stride $s$ is constant and equal to 1, can be calculated using the simplified formula: $r_{0} = (k - 1) \cdot L + 1$ For a $3 \times 256$ input matrix using $3 \times 3$ filters with a stride of 1, the number of layers $L$ required to achieve a receptive field of 68 pixels in the horizontal axis can be calculated as follows: $\begin{aligned} 68 & = (3 - 1) \cdot L + 1 \\ L & = 33.5 \Rightarrow L = 34 layers \end{aligned}$ Of course, we prefer to avoid using 34 layers because having too many layers introduces several challenges that can impact performance. One major issue is the vanishing and exploding gradient problem, where gradients become extremely small or large during backpropagation, hindering effective learning. Additionally, gradient instability, where small perturbations can cause large output variations, poses another challenge, making the network difficult to train reliably. Similarly, overly large filters increase the number of parameters, raising the risk of overfitting. We aim to achieve an optimal balance between model depth and filter size to enhance generalization and training stability. Specifically, we defined non-standard filter sizes and strides that gradually converge toward a symmetrical shape with a stride of one.

In the model tailored for this purpose, we applied a $3 \times 16$ filter with a stride of 4 in the first layer, a $3 \times 8$ filter with a stride of 2 in the second layer, and a $1 \times 4$ filter with a stride of 1 in the third layer. This approach significantly expands the receptive field more efficiently, achieving a receptive field of 68 pixels with just three layers.

Vertical Axis Challenges: The vertical axis poses a different challenge due to the low height (3 pixels). To connect all the pixels in the vertical dimension, we need to use filter sizes larger than one. However, this requires careful handling of padding. Without padding, the vertical dimension is reduced, effectively transforming the network into a 1D CNN, which we aim to avoid for as long as possible. On the other hand, excessive padding in subsequent layers can lead to inefficiencies and unnecessary computations. For the reasons discussed earlier, padding was applied only in the first layer. From the second layer onward, padding was omitted because the edge pixels no longer contained sufficient useful information (in fact, one-third of the information in each edge pixel was redundant). This approach effectively transformed the CNN model into a combination of 2D and 1D structures. It is acceptable to transition from a 2D to a 1D structure because, as discussed in Step 4.2, our physical analysis indicates that the three vertically adjacent pixels represent a single spike or target, with potential spillover into the adjacent pixels above and below.

Additionally, we avoid using pooling in the vertical axis because it may lead to information loss and accelerate the transition from 2D to 1D CNN, which is undesirable.

The depth axis input begins as 1D and gradually increases in volume with each subsequent layer, a common characteristic in CNNs.

Network Architecture:

The idea behind designing the AsymCNN architecture is to address the aforementioned challenges and achieve optimal classification. This is accomplished by gradually modifying each axis independently, resulting in an integrated 2D and 1D CNN model, as outlined below:

First Layer: Eight $3 \times 16$ filters with a stride of 4 and padding to maintain the spatial dimensions and ensure initial feature extraction.

Second Layer: Sixteen $3 \times 8$ filters with a stride of 2, designed to progressively reduce the horizontal dimension and transitioning the model from 2D to 1D convolution to capture sequential dependencies.

Third Layer: A 1D convolutional layer with thirty-two $1 \times 4$ filters and a stride of 1.

Flatten Layer: This layer flattens the 3D feature maps (or 2D in our case) into a 1D vector, preparing the data for fully connected layers.

Dropout Layer: Applies a dropout rate of 0.7 to prevent overfitting by randomly setting 70% of the input units to zero during training.

First Fully Connected Layer: Contains 128 units to enable complex feature interactions.

Second Fully Connected Layer: Contains 32 units, further refining the feature interactions.

Output Layer: Contains a single unit with a sigmoid activation function for binary classification between target and spike.

The proposed model was developed through a structured three-stage methodology designed to systematically optimize both the architecture and training process. All architectural choices are supported by ablation and sensitivity analyses, as detailed in Section 5.

Figure 5 illustrates the construction of the AsymCNN architecture.

Figure 5.
AsymCNN for $R / D$ maps classification. The AsymCNN structure enables gradual convergence along the vertical and horizontal axes while progressively increasing the depth axis, resulting in a robust and efficient architecture tailored for the $3 \times 256$ $R / D$ maps input matrix.

Loss function

We adopt binary cross-entropy as the loss function. Binary cross-entropy loss, often used in binary classification tasks, measures the discrepancy between predicted probabilities and true binary labels. For a single example with true label $y$ and predicted probability $\hat{y}$ , the binary cross-entropy loss $L (y, \hat{y})$ is given by: $L (y, \hat{y}) = - (y \log (\hat{y}) + (1 - y) \log (1 - \hat{y}))$ Where $y$ is the true label (either 0 or 1), $\hat{y}$ is the predicted probability of the positive class (usually obtained from the output of the sigmoid activation function), $\log$ represents the natural logarithm.

For a dataset with $n$ samples, the overall binary cross-entropy loss $L$ is the mean of the losses for all samples: $L = \frac{1}{n} \sum_{i = 1}^{n} L (y_{i}, {\hat{y}}_{i})$

Activation functions

In our model, we implement the Rectified Linear Unit (ReLU) activation function across all hidden layers of our neural network architecture. Additionally, we utilize the sigmoid activation function specifically in the final output layer. The ReLU activation function is a widely used non-linear activation function in deep learning. It introduces non-linearity to the model by thresholding the input at zero, resulting in output values that are either zero for negative inputs or linearly proportional to positive inputs. Mathematically, the ReLU activation function can be defined as: $f (x) = max (0, x)$ where $x$ represents the input to the neuron or layer. This function is piecewise linear and computationally efficient, making it a popular choice in deep neural networks. The sigmoid activation function is a known choice in neural networks for binary classification tasks. It maps input values to a range between 0 and 1, making it suitable for modeling probabilities. Mathematically, the sigmoid function is defined as: $f (x) = \frac{1}{1 + e^{- x}}$ where $x$ represents the input to the neuron or layer.
4.4.3. Rings-Based convolutional neural network

The conventional flat and two-dimensional representation of the $R / D$ map is merely a partial depiction. The accurate representation of the $R / D$ map is indeed cylindrical as described in Figure 6.

Figure 6.

The two-dimensional representation of the $R / D$ map shown above is a flattened version of the cylindrical shape illustrated below.

The $R / D$ map exhibits a cylindrical shape due to the inherent periodicity of the Doppler frequency axis, a result of applying a Fourier transform to radar signals. This transform converts time-domain data into the frequency domain, creating a Doppler spectrum that repeats at intervals defined by the sampling frequency $F_{s}$ . As the Doppler frequency axis is cyclic, frequencies exceeding the Nyquist limit, given by $\frac{F_{s}}{2}$ , wrap around and reappear at the start of the frequency range. Conceptually, this cyclic behavior can be visualized as the Doppler axis forming the circular part of a cylinder, with the range axis extending linearly along its length. The relationship between the Doppler frequency $f_{d}$ and the target’s velocity $v$ is expressed as: $f_{d} = \frac{2 \cdot v}{λ}$ where $λ$ is the wavelength of the radar signal. Consequently, as targets increase in velocity, their corresponding Doppler frequencies will loop back on themselves when $f_{d}$ exceeds the Nyquist limit, giving the $R / D$ map its characteristic cylindrical representation.

Analyzing the $R / D$ map in a two-dimensional, non-cylindrical manner risks losing valuable information located at the map’s edges. Hence, we aim to adapt the CNN model to seamlessly merge the cylindrical shape of the map, ensuring the preservation of data. This innovative model, developed by us for this purpose, signifies a novel approach to our research.

The novel model we have coined RbCNN is essentially a transformation of the AsymCNN model introduced earlier. However, rather than adhering to the conventional, sprawling architecture of a CNN model, we have reimagined it into a ring-based framework as shown in Figure 7. This adaptation enables the data to maintain its original format without the need for defining an intersection point for the $R / D$ map, thus mitigating potential information loss at the edges. By preserving data continuity at the edges, RbCNN ensures better feature extraction and more accurate target detection, making it particularly well-suited for $R / D$ maps.

Figure 7.

Transforming the AsymCNN model into a Rings-based CNN model enables the input data to retain its original format without the need to define an intersection point for the $R / D$ map, thereby reducing potential information loss at the edges. (a) AsymCNN for Two-dimensional $R / D$ Map classification; (b) RbCNN for Cylindrical $R / D$ Map Classification.

The RbCNN architecture introduces an innovative approach to handling the cylindrical nature of $R / D$ maps, enhancing the performance of convolutional layers in situations where boundary effects could otherwise degrade feature extraction. By enabling seamless convolution across matrix boundaries, RbCNN offers a novel solution for improving radar-based target detection.

Implementation of RbCNN

The RbCNN model builds upon the classical Convolutional Neural Network (CNN) architecture by introducing a custom mechanism designed to better handle the circular structure of Range-Doppler ( $R / D$ ) maps.

Standard CNNs often face challenges with edge effects, but these can be addressed when the data exhibits a wrap-around nature, as in the case of $R / D$ maps. The RbCNN overcomes these challenges by simulating a cylindrical structure, which better preserves continuity across matrix boundaries.

To implement this cylindrical architecture simply and efficiently, each convolutional layer includes matrices of size $n \times m$ , a number of columns from the left side of the matrix are copied and appended to the right side, or vice versa. The number of copied columns is determined by the width of the convolutional filter, $F_{w}$ . Specifically, $F_{w} - 1$ columns are copied, ensuring that the convolutional filter can operate seamlessly across matrix boundaries without edge effects.

Figure 8 illustrates this process for a matrix of size $n \times m$ and a filter width of $F_{w} = 4$ . The matrix demonstrates the RbCNN operation, where the first three columns on the left ( $X_{1}, X_{2}, X_{3}$ ) are copied and appended to the right, with the copied cells highlighted in gray. This creates a cylindrical structure, allowing the convolutional filters to seamlessly wrap around the matrix, preserving information at the edges.

Figure 8.

An example of the RbCNN applied to a single matrix in the RbCNN model. In this case, 3 columns ( $F_{w} - 1 = 3$ ) are copied from the left side of the matrix to the right, creating a cylindrical structure for further convolution operations. (The copied columns are highlighted in gray).

This process is applied across all convolutional layers, but is not needed in the dense layers. Dense layers treat the input as a whole and do not encounter the same edge-related issues as convolutional layers. Since dense layers integrate the entire input, regardless of its position or shape, there is no need to copy columns to preserve continuity.

One of the key ideas and advantages of the RbCNN model compared to the baseline CNN is that, instead of padding the edges with zeros which carry no meaningful information RbCNN pads the edges with valuable data. This approach enables improved performance without introducing additional weights to the CNN layers or increasing computational complexity, as detailed in Section 6.

5. Model architecture design and optimization

The model development process was structured into three main stages, systematically refining the architecture and training parameters to maximize classification performance.

To justify the proposed architecture and evaluate the impact of design choices, we employed a structured three-stage methodology, progressing from coarse to fine granularity. First, an ablation study was conducted to determine the optimal architectural configuration. This was followed by sensitivity analyses of convolutional kernel widths and stride values, which are critical due to the asymmetric nature of the input data. Finally, comprehensive hyperparameter optimization was performed to further enhance model performance. To improve the reliability of model performance estimates without incurring prohibitive computational overhead, all three stages employed five-fold cross-validation.

5.1. Baseline architecture ablation and evaluation strategy

The first stage of the model development process focused on identifying an effective baseline architecture. Different combinations of convolutional and dense layers were evaluated to determine their impact on classification performance. The goal is to establish a robust initial model structure before proceeding to detailed architectural refinements.

To determine the optimal architectural design, configurations with up to four convolutional layers and three fully connected (dense) layers were explored. Following the principle of progressive complexity, architectural depth and spatial representation were incrementally increased to assess their influence on model performance. The convolutional architectures are summarized in Table 1, while the dense layers followed a fixed configuration of 128, 32, 16.

Table 1.
Convolutional layer configuration used in the architecture search.

Layer Type Filters Kernel/Stride

Layer 1 2D CNN 8 (3,16)/(1,4)

Layer 2 2D CNN 16 (3,8)/(1,2)

Layer 3 1D CNN 32 4 / 1

Layer 4 1D CNN 64 2 / 1

Layer	Type	Filters	Kernel/Stride
Layer 1	2D CNN	8	(3,16)/(1,4)
Layer 2	2D CNN	16	(3,8)/(1,2)
Layer 3	1D CNN	32	4 / 1
Layer 4	1D CNN	64	2 / 1

To identify the optimal configuration, we varied the number of convolutional layers (one to four) and dense layers (zero to three). Configurations were constructed sequentially, such that each configuration incorporated all preceding layers in order; non-sequential combinations (e.g., using the first and third layers while skipping the second) were not considered. The convolutional and dense structures were incrementally expanded to evaluate the effect of increasing architectural depth on model performance.

Each model configuration was trained independently on all eight radar modes using five-fold cross-validation. Performance metrics were computed as the mean across folds and modes to ensure robustness and generalizability. The ablation study was conducted using the designated training and validation dataset.

Figure 9 presents the results of the ablation study evaluating the impact of varying the number of convolutional and dense layers. Each curve corresponds to a different number of convolutional layers (ranging from one to four), while the x-axis represents the number of dense layers (zero to three), and the y-axis shows the mean test accuracy across all radar modes using five-fold cross-validation.

Figure 9.

Mean test accuracy across all radar modes for each CNN and dense layer configuration. Each curve corresponds to a different number of convolutional layers; the x-axis indicates the number of dense layers, and the y-axis shows the mean test accuracy.

The results indicate a clear improvement in performance with the addition of convolutional layers, particularly when progressing from one to three layers. Notably, the configuration with three convolutional layers and two dense layers achieves the highest accuracy of 94.79%, representing an optimal balance between architectural complexity and generalization. Beyond three convolutional layers, accuracy gains plateaued or slightly declined, suggesting diminishing returns with additional depth. Similarly, increasing the number of dense layers beyond two does not consistently enhance performance and may have introduced overfitting.

Based on the results in Figure 9, the configuration comprising three convolutional layers and two dense layers, which achieved the highest mean accuracy, was selected as the baseline architecture for subsequent experiments. Building on this structure, the next stage involves optimizing the convolutional kernel widths and stride widths to further improve model performance.

5.2. Kernel width and stride width optimization

With the baseline architecture finalized, the second stage focuses on optimizing kernel and stride widths for the convolutional layers. Four candidate configurations (Conf1-Conf4) were defined, each specifying distinct kernel and stride settings across the three convolutional layers. Detailed specifications and corresponding mean test accuracies are provided in Table 2. The kernel and stride sizes were designed to progressively reduce the spatial resolution by approximately half at each layer within a configuration, and across configurations. Minor adjustments were made in Conf1 Layer 1 to prevent excessive early compression, and in Conf4 Layer 3, where further stride reduction was not feasible due to size constraints.

Table 2.
Mean test accuracies for asymCNN configurations (conf1-conf4) with different kernel widths and stride widths (reported as kernel width / stride width) across three convolutional layers.

Layer Conf 1 Conf 2 Conf 3 Conf 4

Kernel Width / Stride Width

Layer 1 24 / 6 16 / 4 8 / 2 4 / 1

Layer 2 16 / 4 8 / 2 4 / 1 2 / 1

Layer 3 4 / 2 4 / 1 2 / 1 2 / 1

Accuracy 84.92% 93.72% 92.46% 83.83%

Layer	Conf 1	Conf 2	Conf 3	Conf 4
Layer 1	24 / 6	16 / 4	8 / 2	4 / 1
Layer 2	16 / 4	8 / 2	4 / 1	2 / 1
Layer 3	4 / 2	4 / 1	2 / 1	2 / 1
Accuracy	84.92%	93.72%	92.46%	83.83%

To evaluate the effect of kernel and stride selection, the performance of each configuration was assessed based on the mean test accuracy across all radar modes.

As shown in Table 2, Conf2 achieves the highest mean test accuracy, demonstrating the effectiveness of a moderate reduction in spatial resolution across layers. Conf1 and Conf4, applying more aggressive or minimal reductions, performed worse. Accordingly, Conf2 was adopted for subsequent experiments.

Following the selection of the optimal kernel and stride settings, hyperparameter tuning was conducted to improve training efficiency and generalization.

5.3. Hyperparameter optimization

Building on the finalized baseline architecture, hyperparameter optimization was performed to further enhance model performance. The learning rate, batch size, and dropout rate were tuned using a grid search strategy. Evaluation employed five-fold cross-validation, and the mean test accuracy across folds and radar modes was used to select the optimal settings.

Hyperparameter optimization employed five-fold cross-validation, with mean test accuracy across folds and radar modes guiding the selection of optimal settings.

The hyperparameters explored include:

Learning rate: $10^{- 4}, 5 \times 10^{- 5}, 10^{- 5}$

Dropout rate: $0.3, 0.5, 0.7$

Batch size: $16, 32, 64$

Each configuration was trained independently, and the mean of the results was used to ensure robustness and generalizability. This tuning process led to the best model performance, achieving a mean test accuracy of 94.93% with a learning rate of $10^{- 4}$ , a dropout rate of 0.7, and a batch size of 16. These hyperparameters were adopted for all subsequent experiments, providing a robust foundation for further architectural analysis.

Figure 10 illustrates the effect of varying the learning rate on mean validation accuracy, grouped by batch size. Across all batch sizes, a learning rate of $10^{- 4}$ consistently achieved the highest performance, with batch size 16 yielding the best results. The results demonstrate a clear sensitivity to the learning rate: between $10^{- 5}$ and $5 \times 10^{- 5}$ , validation accuracy improved sharply, particularly for larger batch sizes, indicating that very low learning rates hinder convergence. In contrast, the sensitivity between $5 \times 10^{- 5}$ and $10^{- 4}$ was less pronounced, with performance improvements becoming more moderate. These findings highlight the importance of selecting a sufficiently high learning rate to ensure stable convergence and optimal model performance.

Figure 10.

Mean validation accuracy as a function of learning rate, grouped by batch size (Dropout Rate = 0.7).

Figure 11 similarly illustrates the effect of varying the dropout rate on mean validation accuracy. A dropout rate of 0.7 yields the highest accuracy for batch sizes 16 and 32, while batch size 64 performs slightly better at a dropout rate of 0.5. Overall, the model shows low sensitivity to the dropout rate, with only minor variations in accuracy across the tested range.

Figure 11.

Mean validation accuracy as a function of dropout rate, grouped by batch size (Learning Rate = $10^{- 4}$ ).

The sensitivity analyses confirm that the learning rate has the strongest impact on model performance, followed by batch size with moderate sensitivity, while dropout rate exhibits weak sensitivity. Accordingly, the hyperparameter settings a learning rate of $10^{- 4}$ , dropout rate of 0.7, and batch size of 16 were fixed for the final training phase.

Final training and evaluation were conducted based on the optimized model configuration. Detailed experimental results and performance analysis are presented in Section 7.

6. Computational complexity analysis

The computational complexity of the proposed models was evaluated based on training time per epoch and inference time per sample, with particular focus on inference efficiency for real-time target classification. Both the AsymCNN and RbCNN models were tested across all eight operational modes. Experiments were conducted on a workstation equipped with an NVIDIA RTX 4090 GPU, 60 GB RAM, and an AMD EPYC 7302 (32) processor.

Both models were trained using a batch size of 16 and a learning rate of $10^{- 4}$ . As shown in Table 3, the Mean training time per epoch was approximately 0.59 seconds for the AsymCNN model and 0.60 seconds for the RbCNN model, with the slight increase primarily attributed to the ring-padding operation introduced in RbCNN to preserve the cylindrical continuity of the Range-Doppler ( $R / D$ ) maps. During inference, the mean test time per sample was evaluated across 19,456 samples. The AsymCNN model achieved a mean prediction time of 0.15 milliseconds per sample, while the RbCNN achieved 0.14 milliseconds per sample, indicating a negligible reduction in computational overhead. To summarize, the computational cost of RbCNN is effectively equivalent to that of the AsymCNN model. The minor increase in training time and negligible improvement in inference time confirm that RbCNN achieves enhanced performance without introducing significant computational overhead.

Table 3.
Training and inference time comparison for AsymCNN and RbCNN.

Training Time Inference Time

Model per Epoch per Sample

AsymCNN $\sim$ 0.59 sec 0.15 ms

RbCNN $\sim$ 0.60 sec 0.14 ms

	Training Time	Inference Time
AsymCNN	$\sim$ 0.59 sec	0.15 ms
RbCNN	$\sim$ 0.60 sec	0.14 ms

7. Experimental results

This section outlines the experimental setup, datasets, baseline algorithms, and evaluation metrics used to evaluate the proposed models across four research questions, followed by a presentation and discussion of the results.

We aim to address the following research questions: RQ1. How does the performance of the proposed CNN-based architectures compare with that of existing CNN architectures?

RQ1 examines how the performance of the proposed AsymCNN and RbCNN models compares with two widely adopted deep learning baselines: ResNet and DenseNet.

RQ2. How do differences in sea background affect the ability to detect small targets?

In aerial radar applications, the sea background changes rapidly during flight, as discussed in Section 3. Thus, RQ2 examines the ability to detect small targets when the sea background varies.

RQ3. What characteristics of targets influence their detectability using a machine learning model?

RQ3 examines the impact of a target’s velocity, acceleration, and signal intensity on the ability to detect it in a noisy marine environment.

7.1. Experimental setup

7.1.1. Datasets

To collect $I / Q$ raw data, we utilized a radar maritime patrol system. The radar’s characteristics are detailed in Table 4.

Table 4.
Radar characteristics for evaluation.

Parameter Specification

Range Up to 150 km

Frequency X-band (8–12 GHz)

Type Pulsed Doppler Radar

Resolution 7.5 m (range), $2^{\circ}$ (angle), 1 m/s (velocity)

Power 4 kW

Antenna Slot Array, 32 dB gain

Pulse Repetition Frequency 1000 Hz

Beamwidth 2.2°

Target Detection Capability Ships

Parameter	Specification
Range	Up to 150 km
Frequency	X-band (8–12 GHz)
Type	Pulsed Doppler Radar
Resolution	7.5 m (range), $2^{\circ}$ (angle), 1 m/s (velocity)
Power	4 kW
Antenna	Slot Array, 32 dB gain
Pulse Repetition Frequency	1000 Hz
Beamwidth	2.2°
Target Detection Capability	Ships

This radar was mounted on an aircraft, which conducted flights over the eastern Mediterranean Sea. The aircraft maintained a speed of 150 to 200 knots at an altitude of 3,000 feet above sea level, while performing scans in the designated operational mode. The sea was scanned eight times across eight different areas and flight directions to detect naval targets, with each scan lasted between 14 and 38 seconds as detailed in Table 5.

Table 5.

Characteristics of each flight mode, including aircraft azimuth and radar elevation, which influence the sea background, and flight time, which affects the number of $R / D$ maps available per mode.

	Aircraft	Radar	Flight	Num of
Mode	Azimuth	Elevation	Time (sec)	$R / D$ maps
1	291.984	$-$ 5.189	16.384	192
2	290.732	$-$ 6.234	37.376	438
3	287.580	$-$ 8.224	23.552	276
4	345.743	$-$ 3.361	14.336	168
5	355.906	$-$ 4.560	34.816	408
6	250.684	$-$ 2.748	15.872	186
7	163.543	$-$ 5.506	16.384	192
8	259.191	$-$ 2.944	27.904	327

Following the mathematical process described in Section 4.1, and considering the radar’s 75 km range and the received sample rate of 250Hz, we obtained 10 Range-Doppler ( $R / D$ ) matrices, each with dimensions of 1,000 range bins by 256 Doppler bins, for every 256 transmitted pulses. The range resolution (or Range Gate Size) is 7.5 meters. As detailed in Step 4.2 in Section 4.2, we then filter seven out of the ten $R / D$ maps, retaining only maps (or sub-ranges) 2, 3 and 4.

At the end of this stage, we obtained a total count of $1, 000 \times 256$ Range-Doppler ( $R / D$ ) maps for each mode, as detailed in Table 5.

7.1.2. Extraction and generation of spikes and small targets

For each $R / D$ map, we follow the process described in Section 4.2 This involves transforming each initial $1, 000 \times 256$ $R / D$ map, which contains numerous targets and spikes of varying sizes and intensities, into strip matrices with dimensions of $3 \times 256$ . For target implementation, we define a set of three distinct parameters: velocity, acceleration, and dB over threshold, as detailed in Section 4.2, item 4.2. The values and characteristics of these parameters are described in Table 6.

Table 6.
Parameters for targets implementation.

Parameter Range Distribution Continuous/Discrete

Velocity 0–3.6886 m/s Uniform Continuous

Acceleration 0–0.5 m/s $^{2}$ Uniform Continuous

dB over threshold [0,1,2,3,4,5] over 3.7 $\times$ 10⁻⁴ Uniform Discrete

Parameter	Range	Distribution	Continuous/Discrete
Velocity	0–3.6886 m/s	Uniform	Continuous
Acceleration	0–0.5 m/s $^{2}$	Uniform	Continuous
dB over threshold	[0,1,2,3,4,5] over 3.7 $\times$ 10⁻⁴	Uniform	Discrete

After extracting spikes and implementing small targets, we obtained the complete data strips, as described in Table 7. The number of strips in each mode is primarily determined by the number of real spikes detected within that mode. In general, a higher number of $R / D$ maps, listed in Table 5, tends to yield more spikes, as more maps provide more opportunities for spike detection. However, this relationship is not strictly proportional, as it ultimately depends on the actual number of spikes identified in each mode.

Table 7.

Number of strips for each mode.

Mode	Spike Strips	Target Strips	Total Strips
1	5,084	5,084	10,168
2	10,713	10,713	21,426
3	7,888	7,888	15,776
4	5,304	5,304	10,608
5	10,978	10,978	21,956
6	4,865	4,865	9,730
7	4,864	4,864	9,728
8	9,750	9,750	19,500
Total	59,446	59,446	118,892

7.1.3. Samples used for the models

For our research, we randomly selected 9,728 strips for each mode, equally divided between target and spike samples. The reasons for selecting 9,728 samples is that we aim to have an equal number of samples across all modes to ensure reliable comparisons between the machine learning models of the different modes, which will be presented later. Mode 7 has the fewest samples, with 9,728 samples. Therefore, 9,728 is the maximum number we can select to maintain equal sample sizes across all modes. For each mode, we randomly selected 25% of the data (2,432 strips) for testing, equally divided between targets and spikes. The remaining 75% of the data (7,296 samples) were used for training and validation.

To address RQ1, we compared the performance of the proposed AsymCNN and RbCNN architectures against two widely adopted deep learning architectures: ResNet and DenseNet. Each architecture was trained separately using an identical training procedure to ensure a fair and consistent comparison.

For each architecture, we created and trained a separate model for each mode, following the methodology described in Sections 4.3 and 4.4. As previously mentioned, 75% of the data (7,296 samples) from each mode were allocated for training and validation. For each model, 20% of this subset (1,459 samples) were randomly selected for validation, while the remaining 80% (5,837 samples) were used for training. The optimal batch size and learning rate were fixed at 16 and $10^{- 4}$ , respectively, while the number of training epochs was optimized individually for each model.

For the ResNet and DenseNet architectures, additional adaptations were required to accommodate the unique characteristics of the radar data. Given the highly non-standard radar input dimensions ( $3 \times 256 \times 1$ ), a preprocessing step was applied to meet the spatial input requirements of the backbone architectures. Two strategies were evaluated to address the height mismatch: zero-padding and upsampling. Based on empirical evaluation, upsampling was selected, as it provided superior classification performance for both DenseNet- and ResNet-based models. To handle the single-channel input, the upsampled images were expanded to three channels using a $1 \times 1$ convolution.

Due to the limited dataset size, DenseNet121 and ResNet50 were selected as baseline architectures, as they are the smallest widely adopted backbones available with ImageNet-pretrained weights. Each model incorporated a lightweight classification head consisting of a Dense layer with 128 units followed by a binary output neuron. To mitigate overfitting, the backbone networks were initialized with pretrained weights and kept frozen throughout the entire training process.

To address RQ2, we adopted the AsymCNN architecture and, similar to RQ1, created and trained a separate model for each mode. However, in contrast to the procedure used in RQ1, training and validation were conducted using data from the other seven modes, while the target mode was reserved exclusively for testing. We then compared these results to the model trained and tested on the same test mode samples, as described in RQ1.

To ensure an equitable comparison, we used the same number of training and validation samples (7,296) for RQ2 as in RQ1. This was achieved by randomly selecting 1,042 samples from each of the seven modes, resulting in a total of 7,294 ( $\sim$ 7,296) samples for training and validation.

For testing the AsymCNN models, we used the same 2,432 test samples as described previously. We finally computed the mean over the tests of the different modes.

Finally, to address RQ3, we analyzed the performance of the AsymCNN and RbCNN models with respect to each target parameter separately. For the discrete parameter, dB over threshold, we analyzed the performance for each radar reception power individually. For the continuous parameters, acceleration and velocity, we first divided each into five and six intervals, respectively, and then analyzed the performance for each interval separately.

7.2. Evaluation metrics and statistical significance testing

Model performance was evaluated using standard classification metrics: Accuracy, Precision, Recall, F1-score, and AUC. To assess the statistical significance of differences between models evaluated on the same test set, we followed the methodology proposed by Dietterich.³³ Specifically, we applied McNemars test to compare classification accuracy, as it is appropriate for paired binary outcomes. For Precision, Recall, F1-score, and AUC, which are computed over the entire test set and do not permit per-sample testing, we employed a bootstrap resampling approach with 1,000 iterations. In each iteration, metric differences between models were computed across resampled datasets, and p-values were estimated based on the proportion of sign reversals.

7.3. Results

In this section, we present the results addressing the research questions presented in the last section.

7.3.1. RQ1: Comparison with Established Architectures

Figure 12 presents a comparison of the mean evaluation metrics of the two architectures we proposed: AsymCNN and RbCNN, with two competitive architectures: ResNet and DenseNet.

Figure 12.

Comparison between ResNet, DenseNet, AsymCNN, and RbCNN models across evaluation metrics. RbCNN consistently outperformed the other models under separate training.

The results show that the RbCNN architecture consistently achieves the highest performance, with scores of 90.9% accuracy, 88.6% precision, 93.9% recall, 91.2% F1-score, and 97.4% AUC. AsymCNN follows closely, significantly outperforming both DenseNet and ResNet across all metrics. While DenseNet shows moderate performance, ResNet yields the lowest scores across the board. These results emphasize the effectiveness of our proposed AsymCNN and RbCNN architectures over existing models, particularly for the classification of complex-valued raw $I / Q$ radar data into small maritime targets or spikes.

To assess the significance of the observed differences, McNemars test was applied for accuracy comparisons, and a bootstrap resampling procedure was used for the remaining metrics, as described in Section 7.2. The statistical analysis revealed that the AsymCNN model significantly outperforms both DenseNet and ResNet across all evaluation metrics, with $p$ -values below 0.001, corresponding to a confidence level greater than 99.9%. Furthermore, the RbCNN model achieves statistically significant improvements over the AsymCNN model across all metrics, also with $p$ -values below 0.001.

7.3.2. RQ2: Impact of Sea Backgrounds on Target Classification

Figure 13 presents a comparison between mixed and separately trained models based on the AsymCNN architecture, representing variable and constant sea backgrounds, respectively, across five metrics. The results indicate that neither the separately trained models nor the mixed trained models consistently outperform each other across all metrics.

Figure 13.

Comparison of Mean Scores Across Metrics for Mixed and Separately Trained AsymCNN Models.

It can be observed that accuracy is higher in models trained on a specific mode representing a constant sea background. This suggests that, in general, AsymCNN models capture patterns more accurately when the sea background is constant rather than variable, resulting in a higher overall accuracy.

Precision is also higher in the constant sea background setting, which implies that training with consistent backgrounds reduces false positives. In this context, fewer spikes are misclassified as targets, enabling more accurate spike recognition in constant conditions.

In contrast, recall is higher with a variable sea background, indicating an improved capacity to capture true positives. This suggests that training in a variable sea environment enhances the model’s ability to detect targets.

While the mixed model achieves higher recall, its lower precision detracts from the F1-score, suggesting that the separately trained model provides a more balanced performance by effectively capturing true positives and minimizing false positives.

The separately trained models demonstrate a higher AUC score, indicating an improved ability to distinguish between targets and spikes across various decision thresholds.

A similar trend is observed for the RbCNN architecture. All differences were found to be statistically significant at the 99.9% confidence level, as confirmed by McNemars test for accuracy and bootstrap resampling for Precision, Recall, F1-score, and AUC.

The statistical analyses further indicate that sea background variations influence the ability to detect small targets.

In summary, each sea background condition has unique strengths: training and testing within a consistent sea background generally improve accuracy and offer a more balanced performance, enabling more precise spike recognition and improves its capacity to distinguish between targets and spikes across different decision thresholds. Conversely, a variable sea background enhances the model’s ability to detect real targets. It is evident that model performance is affected when training and testing are conducted on different modes compared to using the same mode for both. This observation addresses RQ2.

7.3.3. RQ3: Target Characteristics Influence on Detectability

To understand the characteristics of targets that influence their detectability using the ML model, we examine the impact of a targets velocity, acceleration, and signal strength on its detection capability in a noisy marine environment. To evaluate how each target characteristic influences detection, we analyzed the classification outcomes for all targets. Certain targets were accurately classified by the machine learning model, thus identified as True Positives (TP), while others were misclassified and labeled as False Negatives (FN). Recall, defined mathematically as $TP / (TP + FN)$ , is therefore the most appropriate metric for this analysis. Metrics such as accuracy or precision are not suitable in this context because their mathematical definitions incorporate True Negatives (TN) or False Positives (FP), which correspond to real spikes that are either correctly classified or misclassified, respectively. Since real spikes lack the defined characteristics, recall is the only appropriate metric for evaluating the influence of target characteristics on detectability. To further analyze this influence, we categorized the Recall metric by each characteristic separately and present the mean Recall score for each characteristic.

Figure 14 presents a comparison of the mean recall scores across different characteristics for the AsymCNN and RbCNN models. This shows the third contribution of this paper (RQ3). The results reveal two main insights: first, the RbCNN model outperforms the AsymCNN model in detecting small targets, consistent with the findings reported in RQ1. Second, for the ”dB Over Threshold” and ”acceleration” characteristics, both models exhibit a similar trend. Specifically, an approximately linear trend is observed for the ‘dB Over Threshold’ characteristic, indicating a clear relationship between dB levels and recall performance: as radar return strength increases, so does the recall score. Figure 14 also illustrates that the acceleration characteristic maintains a consistent recall value across various intervals, indicating that acceleration magnitude does not significantly impact the models target detection capability.

Figure 14.

Comparison of Mean Recall Scores across Different Characteristics for Separately Trained AsymCNN and RbCNN: (a) dB Over Threshold, (b) Acceleration Intervals, (c) Velocity Intervals.

An interesting result is observed for the ‘velocity’ characteristic. For mid-range values, the behavior resembles that of the acceleration characteristic, where recall values remain relatively stable. However, at the extreme ends of the range, a notable decrease in recall is observed for the AsymCNN model only. This finding suggests that the AsymCNN model has greater difficulty classifying targets with either very low or very high velocities. This phenomenon is influenced by the targets position along the strip. A direct relationship exists between the targets velocity and its location on the strip, where targets with extreme velocity values are positioned near the strip’s edges, as illustrated in Figure 15. Targets located at the edges of the strip appear to be more challenging for the AsymCNN model to detect. Unlike the AsymCNN, the RbCNN model effectively mitigates the edge effect due to its cylindrical structure, achieving an approximately constant trend at the edges, where the classification improvement compared to the AsymCNN model is most pronounced.

Figure 15.

Examples of $R / D$ strips featuring targets with varying velocities. The target’s velocity determines its position along the horizontal axis, with higher velocity placing the target further to the right.

8. Conclusions, discussion and future work

In this research, we introduced an innovative approach to detect small marine targets in a noisy environment. This approach combines novel deep learning techniques – the Rings-based Convolutional Neural Network (RbCNN), specifically designed for processing cylindrical $R / D$ inputs. Our approach involved initial mathematical processing of the $I / Q$ data to generate $R / D$ maps. Then, we transformed the $R / D$ maps into strip matrices containing real spikes and semi-simulative targets. These strips were fed as input data to the RbCNN model to classify between targets and spikes.

Evaluation was performed by collecting $I / Q$ data using a radar mounted on an aircraft, which conducted flights over the Mediterranean Sea.

The results demonstrate that the proposed AsymCNN and RbCNN architectures achieve superior performance in detecting small targets within noisy environments compared to existing established architectures. Furthermore, they indicate that variations in sea background conditions influence the models ability to detect such targets. In addition, we identified which target characteristics have the greatest influence on detectability by the ML models, highlighting signal strength as the most influential factor. Finally, our findings show that the cylindrical structure of the RbCNN model effectively mitigates edge effects, leading to improved performance relative to the AsymCNN.

Despite its valuable contributions, the research is subject to several limitations in its execution. It was conducted under specific laboratory conditions and limitations, some of which enabled precise and reliable analysis of the results. For example, each data strip contained either a target or a spike, but not both, ensuring a clear evaluation of individual cases. Although the research environment was relatively homogeneous, with constant-altitude flights over the Mediterranean Sea, the variations in sea backgrounds were sufficient to ensure the reliability of the research.

One limitation relates to the noisy environments represented in the dataset. These reflect only the conditions present on the specific day of data collection and do not capture the full spectrum of possible noise scenarios encountered at sea. Naturally, increasing the number of operational days and example cases would introduce a broader variety of environmental noise, which could further enhance model robustness and generalizability.

Additionally, while the targets were semi-simulative and not fully representative of real-world scenarios, the study relied primarily on real data and, where real data is missing, on a simulation approach validated by domain experts. The spikes, which are particularly challenging to simulate given the unknown and complex noise distribution, are entirely authentic. The targets, simulated by experts using an authentic sea background, were designed to replicate the key optical and spatial characteristics of real marine targets. By simulating targets using parameters reviewed and approved by experts, we ensured that the research findings are both realistic and reliable.

In future work, we plan to evaluate our method using fully real data targets, which will allow for a more comprehensive validation of the proposed methods in operational settings. Our ultimate goal is to train the ML model exclusively on semi-simulated data and achieve robust performance on real data. Additionally, we aim to assess the effectiveness of the RbCNN model in various scenarios and validate its overall efficacy on a broader scale.

Beyond the current architecture, there is significant potential to extend the RbCNN model by integrating advanced classification techniques. For instance, the Neural Dynamic Classification (NDC) algorithm³⁴ could be used on RbCNN-extracted features to enhance class separation by discovering optimal transformation spaces. Likewise, the Dynamic Ensemble Learning (DEL) algorithm³⁵ could enable the training of multiple RbCNN variants under different settings, allowing for a more diverse and robust ensemble. Alternatively, the Finite Element Machine (FEMa)³⁶ offers a fast, parameterless post-classification method that could further reduce computational overhead in real-time deployment. Moreover, integrating self-supervised learning (SSL) approaches,^37,38 such as SimCLR-based contrastive pretraining, would allow the model to benefit from a large volume of unlabeled radar data by learning generalized representations prior to fine-tuning. These hybrid strategies would not only improve performance but also increase robustness to new environments and reduce reliance on labeled data, aligning with operational needs in marine radar systems.

Footnotes

ORCID iDs

Ido Tam

Meir Kalech

Funding

The author(s) received no financial support for the research, authorship and/or publication of this article.

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

References

Yang

. Overview of radar detection methods for low altitude targets in marine environments. J Syst Eng Electron 2024; 35: 1–13.

Barkat

. On adaptive cell-averaging CFAR radar signal detection. Syracuse: Syracuse University, 1987.

Sevimli

Tofighi

Cetin

. Range-doppler radar target detection using denoising within the compressive sensing framework. In: 2014 22nd European Signal Processing Conference (EUSIPCO). IEEE, 2014, pp. 1950–1954.

Mullen

Vieira

Herezfeld

, et al. Application of radar technology to aerial lidar systems for enhancement of shallow underwater target detection. IEEE Trans Microw Theory Tech 1995; 43: 2370–2377.

Jiang

Ren

Liu

, et al. Artificial neural networks and deep learning techniques applied to radar target detection: a review. Electronics 2022; 11: 156.

Xie

Tang

, et al. Svm-based sea-surface small target detection: A false-alarm-rate-controllable approach. IEEE Geosci Remote Sens Lett 2019; 16: 1225–1229.

Zhang

, et al. Research on sea clutter reflectivity using deep learning model in industry 4.0. IEEE Trans Indus Inform 2019; 16: 5929–5937.

Gasienica-Jozkowy

Knapik

Cyganek

. An ensemble deep learning method with optimized weights for drone-based water rescue and surveillance. Integr Comput Aided Eng 2021; 28: 221–235.

Callaghan

Burger

Mishra

. A machine learning approach to radar sea clutter suppression. In: 2017 IEEE Radar Conference (RadarConf). IEEE, 2017, pp. 1222–1227.

10.

Chen

Guan

, et al. Deep cnn-based radar detection for real maritime target under different sea states and polarizations. In: International Conference on Cognitive Systems and Signal Processing. Springer, 2018, pp. 321–331.

11.

Pan

Chen

Wang

, et al. A novel approach for marine small target detection based on deep learning. In: 2019 IEEE 4th International Conference on Signal and Image Processing (ICSIP). IEEE, 2019, pp. 395–399.

12.

Mou

Chen

Guan

, et al. Marine target detection based on improved faster R-CNN for navigation radar PPI images. In: 2019 International Conference on Control, Automation and Information Sciences (ICCAIS). IEEE, 2019, pp. 1–5.

13.

O’Shea

Clancy

. Convolutional radio modulation recognition networks. Deep Learn Wireless Phys Layer Commun 2016; 629: 213–226.

14.

Guo

Zhang

Shao

, et al. Sea clutter and target detection with deep neural networks. In: Proceedings of the 2nd international conference artificial intelligence for engineering applications. 2017, pp. 316–326.

15.

Zhang

You

, et al. Deep learning-based automatic clutter/interference detection for hfswr. Remote Sens (Basel) 2018; 10: 1517.

16.

Linghu

Jeon

, et al. Sea clutter feature prediction and parameters inversion using deep learning model. IEEE Trans Indus Inform 2022; 19: 8374–8383.

17.

Zhao

Jiang

Wang

, et al. Robust cfar detection for multiple targets in k-distributed sea clutter based on machine learning. Symmetry 2019; 11: 1482.

18.

Tang

Cheng

, et al. A novel sea clutter suppression method based on deep learning with exploiting time-frequency features. In: 2021 IEEE 5th Advanced Information Technology, Electronic and Automation Control Conference (IAEAC), volume 5. IEEE, 2021, pp. 2548–2552.

19.

Pei

Yang

, et al. A sea clutter suppression method based on machine learning approach for marine surveillance radar. IEEE J Select Topics Appl Earth Observa Remote Sens 2022; 15: 3120–3130.

20.

Pei

Huo

, et al. A machine learning approach to clutter suppression for marine surveillance radar. In: 2021 IEEE International Geoscience and Remote Sensing Symposium IGARSS. IEEE, 2021, pp. 3137–3140.

21.

Chun

Suzuki

Kato

. Iterative application of generative adversarial networks for improved buried pipe detection from images obtained by ground-penetrating radar. Comput-Aided Civil Infrastru Eng 2023; 38: 2472–2490.

22.

Huang

Yang

Zhou

, et al. A deep learning framework based on improved self-supervised learning for ground-penetrating radar tunnel lining inspection. Computer-Aided Civil Infrastru Eng 2024; 39: 814–833.

23.

Mohan

Simske

. Cross-sensor vision system for maritime object detection. Front Marine Sci 2023; 10: 1112955.

24.

Dao

Adeli

, et al. A sensitivity and robustness analysis of gpr and ann for high-performance concrete compressive strength prediction using a monte carlo simulation. Sustainability 2020; 12: 830.

25.

Rahman

Vattulainen

Robertson

. Machine learning-based approach for maritime target classification and anomaly detection using millimetre wave radar doppler signatures. IET Radar, Sonar Naviga 2024; 18: 344–360.

26.

Farnett

Stevens

Skolnik

. Pulse compression radar. Radar Handbook 1990; 2: 10–11.

27.

Chen

. The micro-Doppler effect in radar. Boston: Artech house, 2019.

28.

Goodfellow

Bengio

Courville

, et al. Deep learning Vol. 1. Cambridge: MIT press Cambridge, 2016.

29.

LeCun

Bottou

Bengio

, et al. Gradient-based learning applied to document recognition. Procee IEEE 1998; 86: 2278–2324.

30.

Urdiales

Martín

Armingol

. An improved deep learning architecture for multi-object tracking systems. Integr Comput Aided Eng 2023; 30: 121–134.

31.

Ruiz

Díaz

González

, et al. Improving the competitiveness of aircraft manufacturing automated processes by a deep neural network. Integr Comput Aided Eng 2023; 30: 341–352.

32.

Araujo

Norris

Sim

. Computing receptive fields of convolutional neural networks. Distill 2019; 4: e21.

33.

Dietterich

. Approximate statistical tests for comparing supervised classification learning algorithms. Neural Comput 1998; 10: 1895–1923.

34.

Rafiei

Adeli

. A new neural dynamic classification algorithm. IEEE Trans Neural Netw Learn Syst 2017; 28: 3074–3083.

35.

Alam

KMR

Siddique

Adeli

. A dynamic ensemble learning algorithm for neural networks. Neural Comput Appl 2020; 32: 8675–8690.

36.

Pereira

Piteri

Souza

, et al. Fema: A finite element machine for fast learning. Neural Comput Appl 2020; 32: 6393–6404.

37.

Rafiei

Gauthier

Adeli

, et al. Self-supervised learning for electroencephalography. IEEE Trans Neural Netw Learn Syst 2022; 35: 1457–1471.

38.

Rafiei

Gauthier

Adeli

, et al. Self-supervised learning for near-wild cognitive workload estimation. J Med Syst 2024; 48: 107.

Layer	Conf 1	Conf 2	Conf 3	Conf 4
Kernel Width / Stride Width
Layer 1	24 / 6	16 / 4	8 / 2	4 / 1
Layer 2	16 / 4	8 / 2	4 / 1	2 / 1
Layer 3	4 / 2	4 / 1	2 / 1	2 / 1
Accuracy	84.92%	93.72%	92.46%	83.83%

	Training Time	Inference Time
Model	per Epoch	per Sample
AsymCNN	$\sim$ 0.59 sec	0.15 ms
RbCNN	$\sim$ 0.60 sec	0.14 ms

Marine radar target detection in a noisy environment using rings-based convolutional neural network

Abstract

Keywords

1. Introduction

2. Related work

3. Problem definition

4.1.1. Pulse compression

4.1.2. Partition to distance segments

4.1.3. Mathematical correction for radar’s kinesis

4.1.4. Fast fourier transform process

4.2 Extraction and generation of spikes and small targets

4.4. Deep learning classification models

4.4.1. Convolutional neural networks

4.4.2 Asymmetrical CNN-Based Model for Radar Target Detection

Network Architecture:

Loss function

Activation functions

Implementation of RbCNN

5.1. Baseline architecture ablation and evaluation strategy

Table 1. Convolutional layer configuration used in the architecture search. Layer Type Filters Kernel/Stride Layer 1 2D CNN 8 (3,16)/(1,4) Layer 2 2D CNN 16 (3,8)/(1,2) Layer 3 1D CNN 32 4 / 1 Layer 4 1D CNN 64 2 / 1

Table 3. Training and inference time comparison for AsymCNN and RbCNN. Training Time Inference Time Model per Epoch per Sample AsymCNN ∼ 0.59 sec 0.15 ms RbCNN ∼ 0.60 sec 0.14 ms

7.1. Experimental setup

7.1.1. Datasets

Table 6. Parameters for targets implementation. Parameter Range Distribution Continuous/Discrete Velocity 0–3.6886 m/s Uniform Continuous Acceleration 0–0.5 m/s 2 Uniform Continuous dB over threshold [0,1,2,3,4,5] over 3.7 × 10−4 Uniform Discrete

7.2. Evaluation metrics and statistical significance testing

7.3. Results

7.3.1. RQ1: Comparison with Established Architectures

Footnotes

ORCID iDs

Funding

Declaration of conflicting interests

References

Table 1.
Convolutional layer configuration used in the architecture search.

Layer Type Filters Kernel/Stride

Layer 1 2D CNN 8 (3,16)/(1,4)

Layer 2 2D CNN 16 (3,8)/(1,2)

Layer 3 1D CNN 32 4 / 1

Layer 4 1D CNN 64 2 / 1

Table 3.
Training and inference time comparison for AsymCNN and RbCNN.

Training Time Inference Time

Model per Epoch per Sample

AsymCNN $\sim$ 0.59 sec 0.15 ms

RbCNN $\sim$ 0.60 sec 0.14 ms

Table 6.
Parameters for targets implementation.

Parameter Range Distribution Continuous/Discrete

Velocity 0–3.6886 m/s Uniform Continuous

Acceleration 0–0.5 m/s $^{2}$ Uniform Continuous

dB over threshold [0,1,2,3,4,5] over 3.7 $\times$ 10⁻⁴ Uniform Discrete