Introduction
Autonomous driving technology enhances the safety and reliability of engineering vehicles and has garnered significant attention in the domains of resource extraction and transportation. Environmental sensing is crucial for the autonomous operation of engineering vehicles, necessitating access to extensive, accurate environmental data to facilitate obstacle avoidance and local path planning. Currently, widely used environmental sensors include monocular vision, binocular vision, depth cameras, radar, and LiDAR. LiDAR's most significant advantage lies in its capability to rapidly acquire highly accurate point clouds over extensive areas, independent of lighting conditions. However, classifying and identifying objects based on features like non-uniform density and unstructured distribution remains a challenge in using LiDAR point clouds for environmental perception. The storage and real-time computation of large amounts of point cloud data are additional challenges. Three characteristics of unstructured environments make these tasks particularly difficult:
(a) High dispersion of data: Point clouds in unstructured environments often exhibit high levels of dispersion due to the variability and unpredictability of terrain features, complicating traditional filtering and segmentation tasks.
(b) Noise and clutter: Unstructured settings are prone to noise and clutter from natural elements such as leaves, uneven ground, or non-static objects, which can mislead detection and classification algorithms.
(c) Lack of consistency: Unlike structured environments, objects in unstructured environments lack consistency in shape, size, and distribution, making the direct application of traditional methods challenging.
Ding et al. 1 developed a dual-frequency continuous-wave radar method for target detection that leverages differential directional radar to integrate target information from various detection angles. This approach enhances the richness of the target data, facilitates the accurate identification of potential moving targets, and effectively suppresses mid-frequency interference, which is crucial for broad-spectrum detection applications. Yuan et al. 2 introduced a novel obstacle detection and path tracking strategy that combines monocular camera data with ranging radar input. By fusing laser and vision data, this method enables mobile robots to track targets effectively while concurrently addressing the challenges of robot localization, target tracking, and map construction. Yoon and Park 3 proposed an ultrasonic localization technique using a genetic algorithm in structured environments, designed to prevent ultrasonic signal collision and thus maximize localization accuracy. Wenzl et al. 4 investigated a LiDAR sensor network tailored for the autonomous tracking of pedestrians within a surveillance area, introducing a sophisticated decentralized track fusion architecture for enhanced multi-target detection and tracking. Vakulya and Simon 5 developed a neural network-based model for acoustic sensor-target detection, designed to compensate for co-measurement and systematic errors, thereby ensuring reliable results even when traditional consistency function-based algorithms fail. Despite these advancements, these target detection and localization techniques have yet to be validated in unstructured environments.
In traditional point cloud analysis within unstructured environments, the disorganized characteristics of LIDAR point clouds pose significant challenges to both the accuracy and efficiency of data processing. Moreover, managing the storage and real-time operation of substantial point cloud volumes continues to be a formidable challenge.
Due to significant vibrations generated when engineering vehicles traverse unstructured surfaces (as evidenced by RTK data showing the amplitude of vibrations on such surfaces), the collected point cloud data are highly dispersed. Simple coordinate threshold methods and traditional filtering techniques struggle to filter these data reliably. In response to the challenge of rapidly removing complex noise in point clouds under unstructured conditions, this paper introduces an improved statistical filtering algorithm for point clouds. Within this algorithm, parameters control the neighborhood size, the standard-deviation multiplier, and the stringency of the selection criteria. These settings greatly influence the dispersion and spatial distribution of the processed point cloud, as demonstrated in the experimental section.
Machine learning is widely applied in sensor-based object classification and recognition to enhance accuracy.6–9 Accordingly, this study presents a method that utilizes a convolutional neural network (CNN) to segment original LiDAR point clouds into individual targets, followed by feature extraction and classification of these targets. In most environments, objects can be assumed to be perpendicular to the ground, allowing for the separation of ground and non-ground points in the LiDAR data. The remaining non-ground points are then projected onto a plane and clustered into distinct targets; subsequently, this plane is rasterized into neatly arranged cells containing the corresponding scatter points. Connected components are then consolidated, and an inverse mapping from the plane segments unorganized three-dimensional points into distinct objects with unique labels. This method eliminates redundant iterations in the target segmentation process, thereby improving computational efficiency. This study examines three types of objects: trees, pedestrians, and material piles. The proposed feature extraction and classification method is applicable in most unstructured environments and supports decision-making for autonomous vehicles, thus enabling autonomous driving in unknown environments.
The contributions of this paper are as follows:
(1) In response to the challenge of rapidly removing complex noise in point clouds under unstructured conditions, this study introduces an improved statistical filtering algorithm, enabling rapid filtering of point clouds in unstructured environments;
(2) To address the difficulty of rapidly and accurately recognizing three-dimensional point clouds in unstructured environments using traditional methods, this paper proposes a method of projecting three-dimensional point cloud data onto a two-dimensional plane for target cloud detection. Furthermore, a convolutional neural network model specifically designed for target cloud detection is introduced, which demonstrates superior performance compared to other network models;
(3) The proposed methods have been systematically validated and tested through experimentation.
The rest of this paper is organized as follows: Section II presents the related existing research. Section III comprehensively describes the data filtering method for LiDAR point cloud and the construction of a neural network model. Section IV carries out the experimental design and data collection. Section V verifies the effectiveness of the filtering algorithm proposed in this paper by comparing it with other filtering methods and discusses the pros and cons of the proposed neural network-based classification method. Section VI concludes the paper by summarizing the advantages and disadvantages of the proposed method.
Related works
Point cloud filtering
Research has demonstrated that morphological filters are capable of eliminating target points, and the application of morphological operations with small windows effectively removes minor ground objects, such as individual trees, thereby producing a surface that more closely approximates the ground level. Huang 10 introduced a filtering algorithm that adapts to the inherent properties of point clouds gathered by airborne LiDAR. This method adjusts the filtering window based on point cloud density and gradient differences above and below the surface. However, its reliance on the specific characteristics of 3D point clouds limits its general applicability. Sithole and Vosselman 11 developed a filtering algorithm based on altitude difference segmentation that performs well in structured environments. Yet, in unstructured settings, the significant variance in altitude differences within the 3D point clouds leads to suboptimal outcomes. Furthermore, a progressive morphology filter was introduced in 12 for isolating non-terrestrial LIDAR signals by incrementally increasing the filter window size. This technique, which applies a threshold of altitude difference to exclude vehicles, vegetation, and buildings while preserving the ground, was evaluated using datasets from both mountainous and urban environments. Despite its advancements, this method still suffers from certain inaccuracies, including false positives and omissions.
Point cloud segmentation
Chen et al. 13 divided a complete LiDAR point cloud scene into uniformly distributed 3D voxels and applied feature coding to elucidate the characteristics of each voxel. Yang et al. 14 encoded each voxel as a placeholder and projected oriented 2D bounding boxes in aerial views of LiDAR data. Hao and Wang 15 developed an object classification algorithm tailored for complex environmental scenes, employing Gaussian mapping to segment the point cloud and reconstruct scene topology. However, this method was limited to objects composed solely of planes and overlooked the occlusion challenges inherent in 3D objects. Yang et al. 16 introduced a semantic feature point alignment method, utilizing intersections of feature lines with the ground as semantic points and combining geometric constraints with semantic information for feature point matching to achieve alignment. Despite its sophistication, this approach lacks a consistent method for global alignment due to the complexity of semantic feature point extraction. Hamraz et al. 17 proposed a tree segmentation strategy using digital surface models to differentiate the point cloud into upper and multiple understory layers by analyzing the vertical distribution of overlapping LiDAR points. Broggi 18 estimated the absolute velocity of targets through visual range estimation, constructed a voxel-based comprehensive 3D map from sampled point clouds, segmented it into clusters using voxel filling, and labeled these clusters as stationary or moving targets. However, vision-based 3D engine implementations are hampered by visibility constraints, resulting in sparse parallax maps. Zhao et al. 19 implemented a geometric segmentation algorithm to distinguish between target and ground areas in LiDAR data, then deeply classified corresponding images captured by the camera using a fuzzy logic inference framework to integrate LiDAR data with imagery for frame-by-frame analysis. Zeng et al. 20 identified candidate keypoints with high local heteroskedasticity values by computing shape indices and dual Gaussian weighted metrics for each 3D point, facilitating 3D model identification and alignment. This approach was effectively employed by numerous teams during the 2007 DARPA Urban Challenge to segment point clouds and detect vehicles on the track.21–23 Himmelsbach et al. 24 devised a rapid segmentation method for extensive long-range 3D point clouds, enabling local ground plane estimation and swift 2D connected component labeling by splitting the problem into two simpler subproblems and projecting 3D points onto a 2.5D mesh anchored to the ground.
CNN
In recent years, the landscape of feature extraction in machine vision has been significantly enriched by the advent of deep learning technologies, including autoencoders, convolutional neural networks (CNNs), restricted Boltzmann machines, and deep networks.25–27 These methods have become pivotal in enhancing the accuracy of sensor-based object classification and identification within the field of object recognition.
Zeng et al. 28 demonstrated the effectiveness of a CNN-based multi-feature fusion learning method specifically tailored for the retrieval of nonrigid 3D models. Wang et al. 29 developed a novel graphical convolutional kernel that selectively concentrates on the most pertinent segments of a point cloud, capturing essential structural features for semantic segmentation. Pang and Ulr 30 advanced 2D classification techniques for point clouds using CNNs, standardizing the size of training samples to ensure uniform processing across populated boundaries, thereby enabling the classifier to scan for all object classes within a consistently sized window.
Further, Song et al. 31 transformed target point clouds into a Hough space using a Hough transform algorithm, subsequently rasterizing them into a series of uniform meshes. They then quantified the accumulator in each mesh, employing CNNs for the classification of 3D objects. Rangel et al. 32 leveraged spatial information within 3D data to segment targets in images and conducted semi-supervised learning on each targeted image. This approach utilized the robust classification capabilities of CNNs to effectively generalize categories characterized by high intra-class variation.
Additionally, a novel nonrigid CNN-based multi-feature fusion learning model was introduced, 33 highlighting a growing trend in 3D object classification that integrates various point cloud representations and CNN models to generate comprehensive identifying information about objects. 34
Methods
The operating environments of construction vehicles are typically complex and unstructured, where targets often lack regular shapes, colors, textures, and other distinct features, complicating the task of target recognition. Additionally, rigorous driving practices induce significant wide-band vibrations in construction vehicles, which introduce substantial noise into the perceptual data collected by sensors such as LiDAR, cameras, and range sensors. This noise substantially interferes with the accuracy of target classification.
To address these challenges, this study first implements a denoising process on the collected data to enhance data quality. Subsequently, the cleaned data is used to identify and classify targets. Specifically, this paper applies convolutional neural networks (CNN) to classify point cloud data representing trees, pedestrians, and piles, which are common elements within construction sites. The effectiveness of this approach is validated through experimental methods that are designed to test the robustness and accuracy of the CNN model under the challenging conditions typical of construction environments.
Proposed framework overview
In response to the challenges of target recognition in unstructured environments characterized by significant noise, this study introduces a hybrid model that integrates an enhanced filtering algorithm with convolutional neural networks (CNNs), as detailed in Figure 1. The model comprises four primary components:
1) Data acquisition. A 16-line LiDAR is used for point cloud data acquisition. Because this paper aims to classify point clouds in unstructured environments, an RTK system is also employed to record the variation of the vehicle's three-axis attitude angles, both to illustrate the complexity of the environment and to prepare for the next processing step.
2) Point cloud filtering. A reasonable filtering process is needed to prepare for the subsequent segmentation because of the high dispersion of the point cloud collected in unstructured environments. The traditional statistical filtering algorithm cannot achieve satisfactory results for discrete point clouds; therefore, this paper proposes an improved filtering method.
3) Point cloud segmentation. Firstly, the ground is separated from the filtered point cloud. Then, the point cloud is projected into the x–z plane to obtain target clusters and their 2D geometry features. Finally, the targets are labeled.
4) Classification. CNNs consist of multiple convolutional and pooling layers with a deep architecture that can automatically extract key features of the data and reveal the characteristics of the source data. In this paper, we will take advantage of this capability of CNNs to accomplish the classification requirements by training the neural network based on the processed point cloud.

Framework of the proposed method.
Each section is discussed in detail below.
Data acquisition
The data acquisition system employed in this study comprises a loader, a LiDAR, a Real-Time Kinematic (RTK) system, an acquisition card, and a computer. The LiDAR is directly interfaced with the computer, while the RTK system connects through the acquisition card. To assess the robustness and applicability of the proposed algorithm, experiments were designed under six distinct conditions in an unstructured environment, each varying in velocity and route; detailed descriptions of these conditions are provided in the Experimental Section.
LiDAR point clouds serve as the primary data source for training the neural network and validating the algorithm’s performance. Concurrently, vehicle condition data, including pitch, roll, and yaw angles, are utilized to delineate the contrasts between structured and unstructured environments where engineering vehicles operate. The high scanning frequency of the LiDAR yields tens of thousands of data points per second. In an effort to optimize computational resources and enhance processing efficiency, only the point cloud data within a 30-m radius centered on the LiDAR’s location is retained, effectively excluding points outside this defined area.
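The retention step above can be expressed as a simple mask. The sketch below is illustrative rather than the authors' implementation: it assumes each frame is an (n, 3) numpy array of x, y, z coordinates in the sensor frame and measures the 30-m radius in the horizontal plane.

```python
import numpy as np

def crop_to_radius(points, radius=30.0):
    """Keep only the points within `radius` meters of the LiDAR origin.

    points: (n, 3) array of x, y, z coordinates in the sensor frame.
    The radius is measured in the horizontal (x, y) plane here; whether
    the paper uses a planar or full 3D distance is not specified.
    """
    r = np.hypot(points[:, 0], points[:, 1])  # horizontal distance to sensor
    return points[r <= radius]
```

Applying such a crop before any further processing keeps the per-frame point count bounded regardless of how far the LiDAR can actually see.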
Point cloud filtering
A construction vehicle driving on an unstructured surface generates large vibrations (the vibration level of the surface can be inferred from the RTK data), leading to a large dispersion of the collected point cloud, which the coordinate threshold method and the traditional filtering method struggle to filter reliably. Thus, the point cloud is processed using the improved statistical filtering algorithm proposed in this paper. The specific improvements are as follows.
Assume that the point cloud set is $P = \{p_1, p_2, \ldots, p_n\}$. In the statistical filtering algorithm, the mean distance from each point $p_i$ to its $k$ nearest neighbors is computed as

$$\bar{d}_i = \frac{1}{k} \sum_{j=1}^{k} \lVert p_i - p_{i_j} \rVert,$$

and the mean $\mu$ and standard deviation $\sigma$ of these distances over the whole cloud are

$$\mu = \frac{1}{n} \sum_{i=1}^{n} \bar{d}_i, \qquad \sigma = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(\bar{d}_i - \mu\right)^2}.$$

Consequently, any points in a point cloud whose distances from their neighbors do not fall within the range defined by $(\mu - \alpha\sigma,\ \mu + \alpha\sigma)$, where $\alpha$ is the standard-deviation multiplier, are regarded as outliers and removed.

In response to the distinct noise characteristics typical in unstructured environments, this study advances the conventional single-pass statistical filtering approach to a multi-iteration filtering algorithm. This enhancement allows for varying levels of noise reduction by adjusting $k$ and $\alpha$, and the iterations are terminated once the standard deviation $\sigma$ of the Euclidean neighbor distances falls below a preset threshold.

The point cloud dataset is therefore filtered progressively, pass by pass, as summarized in the following algorithm.
Improved filtering algorithm.
Contrary to the conventional statistical filtering algorithms, the enhanced algorithm proposed in this study conducts multiple rounds of statistical filtering on highly dispersive noise within 3D point clouds. It regulates the iterations of filtering through a standard deviation threshold based on the Euclidean distance. This method demonstrates significant improvements over traditional statistical filters, yielding more precise outcomes. Furthermore, it establishes optimal conditions for subsequent point cloud segmentation and feature extraction processes in unstructured environments.
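The multi-pass scheme described above can be sketched in a few lines of numpy. This is a minimal illustration under stated assumptions, not the authors' implementation: the parameter names (k, alpha, sigma_thresh, max_iter) are hypothetical, and the termination rule follows the description of iterating until the standard deviation of the Euclidean neighbor distances drops below a threshold.

```python
import numpy as np

def statistical_filter(points, k=20, alpha=1.0):
    """One pass of statistical outlier removal.

    points: (n, 3) array. Returns the inlier subset and the standard
    deviation of the mean k-nearest-neighbor distances.
    """
    # Brute-force pairwise Euclidean distances; a KD-tree would be used
    # for the tens of thousands of points in a real LiDAR frame.
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Mean distance to the k nearest neighbors (column 0 is the point itself).
    knn = np.sort(dist, axis=1)[:, 1:k + 1]
    d_mean = knn.mean(axis=1)
    mu, sigma = d_mean.mean(), d_mean.std()
    # Keep points whose mean neighbor distance lies within mu +/- alpha*sigma.
    keep = np.abs(d_mean - mu) <= alpha * sigma
    return points[keep], sigma

def iterative_filter(points, k=20, alpha=1.0, sigma_thresh=0.05, max_iter=10):
    """Repeat the statistical pass until the neighbor-distance spread
    falls below `sigma_thresh` or the cloud becomes too small."""
    for _ in range(max_iter):
        if len(points) <= k + 1:
            break
        points, sigma = statistical_filter(points, k, alpha)
        if sigma < sigma_thresh:
            break
    return points
```

Each pass tightens the cloud further, which is what lets the multi-iteration variant suppress the highly dispersive noise that a single pass leaves behind.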
Point cloud segmentation
In the context outlined, a pragmatic strategy involves approximating most physical entities as orthogonal to the terrestrial plane. Accordingly, the initial step entails segmenting the tridimensional point cloud by projecting it onto the x–z plane. This projection yields a contiguous plane of ground points, while other objects manifest as distinct components within the point cloud projection. Subsequently, employing a specified threshold, the ground points are sieved in the x–z plane, thereby filtering them effectively.
To isolate non-ground points into discrete, interconnected clusters, a methodical approach involves rasterizing the projection points into uniform square units. These units are subsequently amalgamated into autonomous objects, discerned based on the salient geometric attributes projected onto the x–z plane. Following this segmentation process, the identified clusters are then reconstituted into their native tridimensional configurations, with accompanying categorical labels.
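The rasterize, merge, and relabel steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it takes the non-ground points as an (n, 3) numpy array, treats columns 0 and 2 as the x–z projection plane, and uses 4-connected component labeling on the occupancy grid; the function names and cell size are hypothetical.

```python
import numpy as np

def label_grid_components(cells):
    """4-connected component labeling on a boolean occupancy grid (BFS)."""
    labels = np.zeros(cells.shape, dtype=int)
    current = 0
    for i in range(cells.shape[0]):
        for j in range(cells.shape[1]):
            if cells[i, j] and labels[i, j] == 0:
                current += 1
                labels[i, j] = current
                stack = [(i, j)]
                while stack:
                    r, c = stack.pop()
                    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        nr, nc = r + dr, c + dc
                        if (0 <= nr < cells.shape[0] and 0 <= nc < cells.shape[1]
                                and cells[nr, nc] and labels[nr, nc] == 0):
                            labels[nr, nc] = current
                            stack.append((nr, nc))
    return labels, current

def segment_objects(points, cell=0.25):
    """Project non-ground points onto the x-z plane, rasterize into square
    cells, merge connected cells, and map the labels back to the 3D points."""
    xz = points[:, [0, 2]]                                   # planar projection
    idx = np.floor((xz - xz.min(axis=0)) / cell).astype(int)  # cell indices
    grid = np.zeros(idx.max(axis=0) + 1, dtype=bool)
    grid[idx[:, 0], idx[:, 1]] = True                         # occupancy grid
    labels, n = label_grid_components(grid)
    return labels[idx[:, 0], idx[:, 1]], n  # per-point object labels
```

Because the connected-component pass runs on the 2D grid rather than on the raw 3D points, no repeated nearest-neighbor iterations over the cloud are needed, which is the efficiency gain the segmentation step relies on.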
Target detection

Proposed network specific structure.
In addressing deep learning tasks involving image inputs, Convolutional Neural Networks (CNNs) exhibit exceptional capabilities. However, achieving high-precision outcomes typically necessitates extensive datasets. Given the constraints posed by the limited dataset described in this study, it is impractical to train a completely new, expansive CNN from scratch. Therefore, we have designed a tailored 25-layer CNN architecture, optimally suited to our dataset’s scale. The configuration of this network is detailed in Figure 2.
The network comprises 25 layers: 5 convolutional layers, 3 pooling layers, 7 activation layers, 3 fully connected layers, 1 softmax layer, 2 normalization layers, and 2 dropout layers (the dropout probability is set to 0.5). The sizes and numbers of the convolutional kernels can be read from Figure 2. Specifically, the first convolutional layer (Conv1) uses an 11 × 11 kernel with a stride of 4 and padding of [1, 2]; Maxpool1 uses a 3 × 3 kernel with a stride of 2 and no padding. The second convolutional layer (Conv2) uses a 5 × 5 kernel with a stride of 1 and padding of [2, 2]; Maxpool2 uses a 3 × 3 kernel with a stride of 2 and no padding. The third convolutional layer (Conv3) uses a 3 × 3 kernel with a stride of 1 and padding of [1, 1], and the fourth and fifth convolutional layers (Conv4 and Conv5) use the same settings; Maxpool3 uses a 3 × 3 kernel with a stride of 2 and no padding. Finally, three fully connected layers complete the network, as shown in Table 1.
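As a rough check on this layer chain, the spatial dimensions can be traced with a few lines of code. This sketch assumes a 224 × 224 input and symmetric padding in place of the asymmetric values quoted above; both are assumptions, since the paper's input resolution is not stated here.

```python
def conv_out(size, kernel, stride, pad):
    """Spatial output size of a convolution or pooling layer (floor mode)."""
    return (size + 2 * pad - kernel) // stride + 1

# (name, kernel, stride, padding) for the convolution and pooling stages;
# symmetric padding is used here for simplicity.
layers = [
    ("conv1", 11, 4, 2), ("maxpool1", 3, 2, 0),
    ("conv2", 5, 1, 2), ("maxpool2", 3, 2, 0),
    ("conv3", 3, 1, 1), ("conv4", 3, 1, 1), ("conv5", 3, 1, 1),
    ("maxpool3", 3, 2, 0),
]

size = 224  # assumed input resolution
trace = []
for name, k, s, p in layers:
    size = conv_out(size, k, s, p)
    trace.append((name, size))
# `size` is now the side length of the final feature map fed to the
# fully connected layers.
```

Under these assumptions the chain runs 224 → 55 → 27 → 27 → 13 → 13 → 13 → 13 → 6, an AlexNet-like progression consistent with the kernel and stride settings listed above.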
Specific parameters of the proposed network design.
The design rationale is as follows: (1) the addition of the ReLU activation function

$$f(x) = \max(0, x),$$

where $x$ denotes the pre-activation output of a layer; ReLU is computationally cheap and avoids vanishing gradients for positive inputs.

When the model is being trained, multicategory cross entropy is chosen as the loss function:

$$L = -\frac{1}{N} \sum_{i=1}^{N} \sum_{c=1}^{C} y_{i,c} \log \hat{y}_{i,c},$$

where $N$ is the number of training samples, $C$ is the number of classes, $y_{i,c}$ is the one-hot ground-truth label, and $\hat{y}_{i,c}$ is the predicted probability that sample $i$ belongs to class $c$.
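For reference, the softmax and multicategory cross-entropy computation can be sketched in a few lines of numpy; this is a minimal illustration of the loss, not the training code used in the study.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(logits, labels):
    """Mean multiclass cross-entropy for integer class labels:
    L = -(1/N) * sum_i log p_i[y_i]."""
    p = softmax(logits)
    n = len(labels)
    return -np.log(p[np.arange(n), labels]).mean()
```

With a uniform prediction over the three classes used in this paper (trees, pedestrians, piles), the loss equals log 3 ≈ 1.10; confident correct predictions drive it toward zero.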
Experiment
Although the purpose of this paper is to improve the sensing capability of unmanned engineering vehicles, no fully functional unmanned engineering vehicle was available; thus, the experiments were conducted on a manned engineering vehicle. In addition, for safety reasons, the experiments were conducted on a closed construction site containing both paved and native (unpaved) surfaces, as shown in Figure 5.
Experiment setup
The experimental setup, as depicted in Figure 3, employed a ZL10 loader as the construction vehicle. The instrumentation included a LiDAR sensor for environmental detection and an RTK sensor for monitoring vehicle conditions, detailed in Tables 2 and 3, respectively. Both sensors were strategically mounted atop the loader to ensure optimal data acquisition, as illustrated in Figure 3(a). The LiDAR unit was interfaced directly with a computer, capturing high-resolution point clouds in the ROS environment. In parallel, the RTK system, connected through an acquisition card, recorded and processed vehicle dynamics data using LabVIEW. This setup facilitated the precise tracking of local coordinates and the loader’s orientation—roll, pitch, and yaw—as shown in Figure 3(b), thereby indirectly mapping the variability of the road surface conditions.

Experiment setup: (a) is the experimental vehicle, (b) is the local coordinate system and the vehicle-related state (roll, pitch, and yaw).
Parameters of LiDAR.
Parameters of RTK.
The experimental field is the typical unstructured scenario shown in Figure 4. In this study, to evaluate the effectiveness of the filtering algorithm and to collect as many samples as possible to train the neural network, the vehicle is driven on different paths at various speeds to obtain several vibration levels. The LiDAR data obtained under each operating condition are frame intercepted, filtered, point cloud segmented, and then used for neural network training.

Unstructured experimental field and its point cloud.
Considering the vehicle’s driving ability, scenario limitations and safety requirements, the selected experimental speeds range from 1.7 to 3.7 km/h for the six experimental conditions, respectively, and the driving routes are a combination of straight and curved paths. The different routes for each of the six conditions are shown in Figure 5, starting at the upper-left corner and ending at the lower-right corner of the experimental field. Although the routes of the first four conditions are similar, the stochastic character of the unstructured environment makes the paths between the conditions not identical. The six conditions have different travel routes and velocities, so the dispersion of the resulting point cloud is not the same.

Experiment data: (a) is the driving routes for the six experimental conditions, (b–d) are the variations of pitch, roll, and yaw with time respectively.
The routes of the last two conditions completely avoid the routes of the first four conditions. The data from the first four conditions are included as the training set for the neural network, and the data from the last two conditions are used as the validation set to avoid the possibility of having data in the validation set that are similar to the data in the training set. In the filtering process, data from all operating conditions are engaged in the analysis.
Results analysis
Comparison of filtering effects
The point clouds are processed by the traditional and improved statistical filtering algorithms.
The filtering results are shown in Figures 6 to 11; in each figure, (a) shows the original point clouds, which are considerably dispersed, especially in the regions framed by ellipses, rectangles, triangles, and circles, and we focus on the performance of the algorithms in these regions. (b) shows the results after processing by the traditional filtering algorithm, and (c) shows the results after processing by the algorithm in this paper. In these figures, circles mark regions where both algorithms obtain favorable results; ovals mark regions where only the improved algorithm yields satisfactory results; rectangles mark regions where neither algorithm gives promising results but the proposed algorithm outperforms the traditional one; and triangles mark regions where the proposed algorithm is worse. As can be seen in Figures 8 and 9, conditions III and IV show weak results, possibly because the high velocities of the vehicle and the large fluctuations of the road surface make the dispersion of the original point cloud too large, causing considerable difficulty in filtering.

Comparison of the filtering results under the condition I: (a) is raw data, (b) is the filter result of the original algorithm, and (c) is the filter result of the improved algorithm.

Comparison of the filtering results under the condition II: (a) is raw data, (b) is the filter result of the original algorithm, and (c) is the filter result of the improved algorithm.

Comparison of the filtering results under the condition III: (a) is raw data, (b) is the filter result of the original algorithm, and (c) is the filter result of the improved algorithm.

Comparison of the filtering results under the condition IV: (a) is raw data, (b) is the filter result of the original algorithm, and (c) is the filter result of the improved algorithm.

Comparison of the filtering results under the condition V: (a) is raw data, (b) is the filter result of the original algorithm, and (c) is the filter result of the improved algorithm.

Comparison of the filtering results under the condition VI: (a) is raw data, (b) is the filter result of the original algorithm, and (c) is the filter result of the improved algorithm.
Table 4 and Figure 12 show the number of raw point clouds and the number of point clouds obtained with traditional and improved statistical filtering for each operating condition.
Comparison of filtering results.

Comparison of the average data volume of each working condition.
The point cloud processed by the improved algorithm contains less than 85% of the data volume retained by the traditional algorithm, reaching a minimum of 68.05%, which suggests that the improved algorithm is more efficient and robust. The velocities of conditions III and IV are greater than those of the other four conditions, and the pitch, yaw, and roll values fluctuate widely; thus, the filtering is less effective under these conditions.
Segmentation
Figure 13 shows the results of 3D LiDAR point cloud segmentation in the experimental scenario with different types of non-terrestrial objects rendered in different colors. The unstructured environment in which most construction machinery operates typically includes objects such as trees, pedestrians and stockpiles; thus, the goal of this part is to identify these three types of objects. In accordance with the segmentation result, the target features are extracted from the point cloud projected onto the x-z plane by iterating over all target points, as shown in Figure 14, where (a) to (c), (d) to (f), and (g) to (i) are the examples of stockpiles, pedestrians and trees respectively.

Point cloud clustering in different scenarios.

Target point cloud segmentation.
Comparison of filtering algorithms
To further validate the superiority and adaptability of the method proposed in this paper, we compared it with the latest related methods focusing on the average error rate and execution time as shown in Figure 15. The average error rate was computed by averaging the results of 30 runs under each condition, as shown in the table. Although our model does not exhibit the best execution time—it ranks second—it meets the processing speed requirements. Importantly, our model achieves the best average error rate, demonstrating its superiority.

Dynamic error rate statistics of the model under different working conditions.
To further illustrate the error rates processed by different models, we conducted a dynamic error rate analysis under various conditions, as depicted in the figure. The results reveal that T. Yang’s model exhibited the highest dynamic error rate, exceeding 50%, followed by R. Heinzler’s model, which surpassed 45%. The model by Yan Zhi also reached 40%, whereas our model maintained a maximum error rate of only 15%, significantly lower than the other models. This further underscores the efficacy of our algorithm in handling point clouds in unstructured environments.
Model training
After the segmentation, the results are manually labeled to form a sample set. The training set is the point cloud for conditions I to IV, which contains 918 target objects consisting of 412 trees, 151 pedestrians, and 355 piles. The point cloud data from condition V and VI are used as the test set, which contains 640 objects consisting of 312 trees, 103 pedestrians, and 225 piles.
With the hyperparameters set to a learning rate of 10⁻⁴, a batch size of 40, and MaxEpochs of 20, the final prediction accuracy is 98.0% and the loss is 0.008 after 1200 iterations, as shown in Figure 16.

Training process of the proposed network.
In this study, the network demonstrated varied object recognition capabilities, achieving accuracies of 98.64%, 97.96%, and 97.47% for trees, piles, and pedestrians, respectively. The superior performance in identifying trees and piles can be attributed to their more distinct geometric features, coupled with a higher prevalence of these objects in the training dataset. Conversely, the recognition of pedestrians proved slightly less accurate, reflecting the inherent challenges posed by their variable appearances and poses. Overall, the model achieved an average recognition accuracy of 98.0% across these three categories.
To further verify the effect of the improved filtering algorithm on target point cloud detection, we trained the network on point cloud data processed by the traditional filter and compared the results with those obtained using the improved filter; the results are shown in Figure 17. Compared with the traditional filtering algorithm, the detection accuracy of the proposed algorithm for trees, material piles, and pedestrians improved by 6.33%, 8.236%, and 6.93%, respectively. A possible reason is that the features of material piles are less distinct and more easily affected by surrounding noise, which motivates the improved filtering algorithm proposed in this paper.
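For reference, a classical statistical outlier removal filter of the kind used as the "traditional" baseline can be sketched as follows (a minimal pure-Python illustration, not the paper's improved algorithm; the function name, neighbourhood size, and threshold ratio are our own choices):

```python
import math
import statistics

def statistical_outlier_removal(points, k=3, std_ratio=1.0):
    """Baseline statistical outlier removal: keep a point if its mean
    distance to its k nearest neighbours is within
    (global mean + std_ratio * global std) of those mean distances."""
    mean_knn = []
    for p in points:
        d = sorted(math.dist(p, q) for q in points if q is not p)
        mean_knn.append(sum(d[:k]) / k)
    mu = statistics.mean(mean_knn)
    sigma = statistics.pstdev(mean_knn)
    thresh = mu + std_ratio * sigma
    return [p for p, m in zip(points, mean_knn) if m <= thresh]

# A tight cluster plus one far-away noise point:
cloud = [(0, 0, 0), (0.1, 0, 0), (0, 0.1, 0), (0.1, 0.1, 0), (5, 5, 5)]
print(len(statistical_outlier_removal(cloud)))  # 4: the noise point is dropped
```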

Comparison between the original filter and the filter in this paper.
Comparison with other models
To adapt each popular network to the present classification task, it is modified minimally: only the softmax and classification layers after the last fully connected layer are removed, and a three-output fully connected layer with the same parameters as the final fully connected layer is added, while the structure, parameters, and weights of all other layers are kept unchanged. The training process is shown in Figure 18, and a comparison of the important parameters is given in Table 5. From Figure 18, all six models reach high accuracy, above 97%, but differences remain in the details. The proposed model varies above 98%, while the other five models do not reach 98%, with Inception v2 the lowest at around 97.7%; the proposed model therefore outperforms the others in terms of accuracy. Regarding the loss, all six models have rather small values: ResNet56, VGG16, and AlexNet fluctuate above 0.02; GoogLeNet and Inception v2 vary between 0.02 and 0.01; and the proposed model ranges between 0.015 and 0.005. The proposed model therefore also outperforms the others in terms of loss. As shown in Table 6, our model is preferable to the other models in terms of accuracy, loss, and time.
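The head swap described above can be sketched schematically (a toy illustration of the procedure, not the actual network code; layers are represented here as hypothetical (name, output-size) pairs):

```python
# Schematic head swap for transfer learning: remove the softmax and
# classification layers after the last fully connected layer, then append
# a new three-output fully connected layer; all other layers keep their
# structure, parameters, and weights.
def swap_head(layers, n_classes=3):
    trimmed = [l for l in layers if l[0] not in ("softmax", "classification")]
    trimmed.append(("fc_new", n_classes))  # new three-output head
    return trimmed

# Hypothetical tail of a pretrained 1000-class classifier:
tail = [("fc7", 4096), ("fc8", 1000), ("softmax", 1000), ("classification", 1000)]
print(swap_head(tail))  # [('fc7', 4096), ('fc8', 1000), ('fc_new', 3)]
```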
Comparison of the performance of different models.

Training process of different models.
Comparison of different models.
We conducted performance evaluation experiments for each network model, examining the false positive rate (FP), recall (RC), and precision (PR). The results are shown in Table 7, in which the two best performers for each of the three metrics are bolded. The proposed model performs best, with a false positive rate of 9.67%, a precision of 94.25%, and a recall of 93.62%, which again demonstrates its superiority.
Performance statistics of each model.
Comparison with existing models
Table 8 summarizes the LiDAR point cloud classification models proposed in recent papers. These experiments were conducted under different conditions, with different samples and computer configurations, and each article has a different research interest and focus; thus, a direct comparison of accuracy is not meaningful and is not the real purpose of Table 8. From this comparison, we can nevertheless conclude that the proposed model is capable of classifying targets in unstructured environments.
Summary and comparison of current obstacle detection models.
Discussion and prospects
From the comparison of the results, we can conclude that the proposed classification model for unstructured environments, based on an improved filtering algorithm and a neural network, is valid. The model consists of four parts: data acquisition, point cloud filtering, point cloud segmentation, and classification. First, LiDAR and RTK are employed to collect unstructured environmental data. Second, the point cloud data are filtered to remove irrelevant noise, improve data availability, speed up network training, and improve classification accuracy. Next, the point cloud is projected onto the x–z plane to form a number of target clusters with their 2D geometric features, and these clusters are back-projected to 3D coordinates to find the corresponding 3D point cloud for each cluster, from which the volume, density, and other features of each target are derived. Finally, the resulting training and test sets are used for transfer learning on an improved CNN for target detection. To verify the performance of this classification model, experiments were designed and a large amount of data was recorded to train and test the model. The results show that the proposed algorithm has good classification accuracy, reliability, and efficiency.
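The projection and back-projection step can be sketched as follows (a simplified single-pass grouping for illustration only; the paper's actual segmentation is more elaborate, and the distance threshold here is our own assumption):

```python
import math

def project_and_cluster(points, eps=0.5):
    """Sketch of the pipeline step: project 3-D points onto the x-z plane,
    group the 2-D projections by distance, then back-project each group to
    recover its 3-D member points (indices keep the correspondence)."""
    proj = [(p[0], p[2]) for p in points]  # drop y: x-z plane projection
    labels = [-1] * len(points)
    next_label = 0
    for i in range(len(proj)):
        if labels[i] == -1:
            labels[i] = next_label
            next_label += 1
        for j in range(i + 1, len(proj)):
            if math.dist(proj[i], proj[j]) <= eps:
                labels[j] = labels[i]  # simplified: greedy, not transitive
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(points[idx])  # back-projection
    return clusters

# Two well-separated groups in x-z, regardless of height y:
cloud = [(0, 1, 0), (0.2, 2, 0.1), (5, 1, 5), (5.1, 3, 5.2)]
print(len(project_and_cluster(cloud)))  # 2
```

Per-cluster features such as bounding-box volume and point density can then be computed from each recovered 3-D group.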
Nevertheless, the proposed model has some areas for improvement. First, its real-time performance has not been validated, because the data in this paper are processed offline. Second, although the algorithm is robust in most cases, its robustness under high-speed conditions still needs improvement, which is a task for future work.
The following is a vision for future work: (a) Development of advanced filtering algorithms: we plan to develop more efficient filtering algorithms capable of addressing the high variability and noise typical of unstructured environments, including adaptive filtering techniques that adjust dynamically to the characteristics of the data, potentially improving precision and computational efficiency. (b) Integration of machine learning and AI: our future research will delve deeper into the integration of cutting-edge machine learning and artificial intelligence strategies to enhance point cloud recognition accuracy in unstructured settings; specifically, we aim to leverage deep learning methodologies to augment feature extraction and object classification in complex scenarios. (c) Standardization of unstructured-environment data: we intend to establish comprehensive datasets specific to the unstructured environments of engineering vehicles, alongside standardized processing and evaluation protocols, facilitating consistent and comparable results across studies and applications and thereby enhancing reproducibility and collaborative research.
