Abstract
Introduction
The new manufacturing paradigm pursues the acceleration of delivery rates through the gradual implementation of automated processes. Aerial robots can contribute to this automation by providing intrinsic capabilities such as flexibility, fast response, and availability. Figure 1 shows an aircraft manufacturing plant where small aerial robots are expected to begin carrying out specific logistic operations. In order to allow the safe introduction of aerial robots collaborating with humans in manufacturing plants, robustness and reliability are needed in a series of key enabling technologies such as localization, mapping, and path planning. Among them, the localization problem is essential for building a mobile robotic system, since accurate pose estimation is required for even the most basic tasks, such as holding the position. It is commonly referred to as "the most fundamental problem to providing a mobile robot with autonomous capabilities." 1 Once the aerial robot is capable of self-localizing with enough accuracy, the rest of the technologies, such as navigation and guidance, can be implemented.

Aerial robot taking off from a designated location to perform logistic operations in an aircraft manufacturing plant.
The localization problem consists of accurately determining the current pose of a robot (position and orientation) in a specific environment. This has been an active research topic for the past decades. Ground robots moving in 2-D environments have demonstrated good localization performance, either indoors 2 or on streets and highways. 3,4 However, aerial robots operate and move in 3-D environments and are exposed to additional challenges beyond the space dimensionality, since onboard capabilities are limited by payload, computational resources, and power constraints.
Outdoor operation of aerial robots based on the global positioning system (GPS) can generally be assumed due to the existence of low-cost commercial products offering mature performance in a wide variety of applications. 5–7 Despite this, autonomous aerial vehicles are starting to play a major role in applications such as inspection, 8 search and rescue, 9 or security surveillance, 10 which usually require the aerial robots to fly in dense environments, at low altitudes, or indoors. In such cases, GPS signals are often shadowed or can be unavailable. In addition, the update frequency of GPS location queries is not high enough to achieve the aforementioned tasks in a robust and safe manner, since the aerial robots are expected to operate in small environments and move at relatively high speeds.
Related work
In order to compensate for the issues that GPS poses, other methods must be used to provide accurate and robust robot localization. Vision-based approaches for robot localization and odometry are very popular due to the affordability, availability, and low weight of cameras. However, one important drawback is the associated processing complexity. It is very common to find approaches that fuse monocular cameras with inertial sensors in order to perform visual simultaneous localization and mapping (SLAM). 11–14 However, these approaches tend to fail when the aerial vehicle exhibits high-speed motions. Other approaches make use of photogrammetry 15 but pose additional processing requirements that hinder online computation.
Another trend is to employ stereovision, 16–18 which allows the calculation of 3-D depth measurements at a higher computational cost. More recently, red-green-blue-depth (RGB-D) sensors based on structured light have become a very popular option 19,20 due to their low weight, low cost, and the amount of information provided. Besides RGB images, they directly provide depth maps of the scene in front of the sensor, saving the burden of 3-D reconstruction computation. Moreover, this sensor exhibits another important advantage with respect to classic camera-based approaches: Depth estimation does not depend on the availability of distinct visual features in the scene and is less reliant on lighting conditions while operating indoors. These factors make RGB-D sensors more suitable for reliable localization in our framework. There are several state-of-the-art algorithms available that provide localization estimations, such as RGBD-SLAM, 19 CCNY RGB-D, 21 or RTAB-Map. 22 They exhibit good accuracies under realistic conditions; however, when applied to aerial robots for online localization estimation, their performance drops due to vibrations and faster motions than those of ground robots, as shown in the section "Experimental results." They may also pose safety issues when performing online localization to close the control loop, since the map building process that they perform at the same time may lead to localization jumps when fitting the current sensor point cloud to the map.
In general, vision-based odometry and localization systems for aerial robots are not reliable enough in the long term due not only to cumulative drift but also to external factors such as poor illumination, lack of texture, occlusions, or moving objects. All of these have a significant impact on the robustness and reliability of most state-of-the-art algorithms. The aforementioned approaches demonstrate fairly good results in the short term; however, they can quickly diverge depending on the environment. Some of them perform well even in long trajectories when the same areas are revisited, which allows for loop closing in order to reduce the localization uncertainty. 23,24 However, loop closing in order to distinguish places in the environment is usually framed as a classification task rather than a robot localization task. Moreover, reliable place recognition can be challenging in large-scale structured environments that might exhibit similar scenes in different areas, such as a manufacturing plant.
Other approaches that provide long-term localization solutions based on vision rely on previously built maps of the environment. Logistic services are usually carried out in known environments, and this is also a requirement for path planning, because human workers should be able to specify goals to the aerial robots in a predefined coordinate system or map. In addition, map-based approaches to localization are usually better in terms of reliability and computational requirements. An a priori map can be computed off-line using cameras and/or range sensors, making use of existing algorithms, for example, RGBD-SLAM. 25
One of the approaches based on a predefined model of the environment is commonly known as teach-replay, 26,27 which is accomplished in two stages: First, the robot is manually piloted along the desired path in a teaching phase, and an accurate 3-D map of the environment is built, along with the robot motion, from this learning path; afterward, this map is used to locate the robot when it revisits the same path. However, this approach is somewhat simplistic and limited, since it only enables a robot to follow a predetermined trajectory.
Monte Carlo localization (MCL) is another approach that makes use of a known map of the environment and is commonly used for robot navigation in indoor environments. 28 It is a probabilistic localization algorithm that uses a particle filter to estimate the pose of the robot within the map based on sensor measurements. Important benefits include the possibility of accommodating arbitrary sensor characteristics, motion dynamics, and noise distributions. There is a variant of MCL called adaptive MCL (AMCL). The term "adaptive" comes from the fact that the number of particles is adjusted dynamically: If there is high uncertainty about the robot pose, the number of particles increases; if the pose is well known, the number decreases. An open-source version of AMCL is included in the robot operating system (ROS) navigation stack (http://wiki.ros.org/amcl). However, it is meant for wheeled robots moving in a 2-D environment, requiring a 2-D laser scanner for map building and localization. Other authors presented an extension of this approach for 6-D localization based on a 2-D laser scanner, 29 but it is meant for the 2-D motion of humanoid robots in a 3-D environment, which makes it unsuitable for aerial robots. The use of GPUs can enhance the computing capabilities of the onboard system regarding MCL-based approaches, as proposed by other authors. 30,31
Localization based on radio beacons has also been studied in depth in recent years, with specific setups deployed in manufacturing environments 32 and using aerial robots. 33,34 Radio ranging sensors provide point-to-point distance measurements and offer a low-cost solution that can be implemented in almost any scenario, with the advantage of not requiring a direct line of sight between each pair of sensors. Besides, the data association problem is trivially solved by attaching the sensor identification to the range measurement information. Ultra-wideband (UWB) is a wireless communication technology that has attracted interest from the research community as a promising solution for precise target localization and tracking. 35–37 It is particularly well suited for short-distance indoor applications, using several fixed sensors placed at known positions in the environment and a mobile sensor onboard the aerial robot. Hence, the position estimation of the onboard sensor can be obtained by triangulation with an accuracy of the order of that of the sensors. However, these sensors are poorly suited to constitute a full localization system due to the lack of bearing information, which leads to multiple location hypotheses. Combining error-bounded range sensing with reliable short-term position estimation based on visual odometry has already been studied for achieving robust long-term localization in position. 38 However, the aerial robot orientation was not robustly determined by that approach; the yaw angle was solely determined by the visual odometry algorithm, while roll and pitch angles were assumed to be observable with an inertial measurement unit (IMU).
The article is structured as follows. The section “System overview” briefly describes the overall approach. In the section “Localization approach,” the main algorithm used in this work is detailed, an enhanced MCL approach. Experiments demonstrating long-term localization are summarized in the section “Experimental results,” followed by conclusions and future work.
System overview
The aforementioned approaches (visual odometry, MCL, and radio-based localization) are promising in that they can all provide solutions to the localization problem, but each approach alone presents important drawbacks. The main contribution of this work is the combination of all three: an MCL algorithm relying on a previously built 3-D map, an RGB-D sensor for odometry and point cloud matching, and radio-based sensors installed in the environment and localized within the map. This results in a system suitable for long-term aerial robot localization, in which these technologies benefit from each other. We obtain a reliable short-term position estimation based on visual odometry, while keeping its drift bounded thanks to the map matching and the integration of range measurements. The noise and outliers present in range sensing are filtered thanks to the odometry prior. Finally, the MCL exploits the odometry to update the motion of the particles and the range measurements to maintain a low dispersion of the particle positions whenever the point cloud matching with the map is not acceptable. This solution offers reliable localization in position and yaw angle, while roll and pitch angles remain observable through an IMU. Moreover, the implemented algorithms are highly efficient, so they are suitable for real-time operation onboard the aerial robot for online localization. The use of GPUs is discarded; we propose an approach that can be implemented on standard CPUs, hence not limiting its applicability to regular platforms.
The block diagram in Figure 2 depicts an overview of our method. The aerial robot localization solution is based on the integration of visual odometry, 3-D point clouds, and distance measurements to known radio beacons into an MCL.

Schematic overview of the processing pipeline.
The odometry system combines images, point clouds, pose estimation, and key-framing in a loose-coupling filter in order to provide a reliable and accurate localization in the short term. As previously mentioned, odometry accumulates errors over time. The MCL takes care of detecting and correcting these cumulative errors by comparing the point clouds with a known 3-D map of the area and by using range sensing to fixed beacons to readjust localization errors.
This article shows how the integration of all the elements and sensing approaches was carried out in order to get a fast, reliable, and error-bounded localization.
Localization approach
Robot localization relies on a map of the environment, which is typically built using some sort of SLAM approach. This map later allows the estimation of the robot pose (position and orientation) in its coordinate frame. Most localization algorithms require accurate odometry in order to maintain a reliable robot pose estimate in the map. However, odometry is subject to drift as a result of error accumulation over time, so some other source of information is needed to reset such drift.
Our work extends the MCL proposed by Hornung et al. 29 As previously stated, this algorithm needs a map of the environment in order to estimate the robot pose within it based on the robot's motion and sensing. The starting point of the algorithm is an initial belief of the robot pose probability distribution, which determines how the particles are distributed around such pose. These particles are then propagated following the robot motion model each time its pose changes. Every time new sensor readings are received, each particle's accuracy is evaluated by checking how likely it would be to receive such sensor readings at its current pose. The next step of the algorithm is redistributing (resampling) the particles to new poses that are more likely to be accurate. This is an iterative process involving moving, sensing, and resampling, through which all the particles should converge to a single cluster near the true pose of the robot.
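As an illustration, the move-sense-resample cycle described above can be sketched as follows (a minimal sketch in Python, not the implementation used in this work; the weighting and resampling functions are placeholders for the concrete sensor and motion models):

```python
import numpy as np

def mcl_step(particles, weights, odom_delta, sensor_data,
             motion_noise, weight_fn, resample_fn):
    """One Monte Carlo localization iteration.

    particles:  (N, 4) array of [x, y, z, yaw] pose hypotheses
    weights:    (N,) normalized particle weights
    odom_delta: [dx, dy, dz, dyaw] motion since the last iteration
    weight_fn:  likelihood of sensor_data seen from a particle pose
    """
    n = len(particles)
    # 1. Prediction: propagate every particle with the odometry
    #    increment plus sampled motion noise.
    noise = np.random.normal(0.0, motion_noise, particles.shape)
    particles = particles + odom_delta + noise
    # 2. Update: re-weight each particle by how likely the current
    #    sensor readings are from that particle's pose.
    weights = np.array([weight_fn(p, sensor_data) for p in particles])
    weights /= weights.sum()
    # 3. Resampling: concentrate particles in high-likelihood areas
    #    and reset the weights.
    particles = resample_fn(particles, weights)
    return particles, np.full(n, 1.0 / n)
```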
The motion model used in this work is based on a fast and reliable visual odometry computed from RGB-D data. We have adapted a stereovision algorithm 38 (publicly available at http://wiki.ros.org/viodom) to make it work with RGB-D sensors. The odometry estimation is applied to the particles, providing an estimation of the a priori distribution of the MCL. The sensor readings are the point clouds provided by the RGB-D sensor and the distance measurements to several radio-tags installed in the environment. The weight of each particle is then calculated according to these two types of sensor readings. The point clouds are transformed to each particle pose in order to find correspondences between the cloud and what the map should look like from that particle's pose. The distance measurements to radio-tags are used to check how well each particle position matches such distances. In order to fuse both types of sensor readings, each particle holds a weight for each sensor type, and these are later fused into a single weight.
The particle filter consists of a set of N particles

S_t = {(x_t^i, w_t^i)}, i = 1, ..., N

where each particle state x_t^i = (x, y, z, ψ) represents a hypothesis of the 3-D position and yaw angle of the aerial robot at time t. Each particle has an associated weight w_t^i, normalized so that the weights of the whole set sum to one, which indicates how well that hypothesis explains the sensor readings.
Filter initialization
Particles can be initialized either manually, by setting the initial position together with a covariance matrix to distribute the particles in space, or automatically.
In the case of manual initialization, the particle cloud is sampled following a 4-D normal distribution, using the provided initial position as mean and a specific covariance matrix for each component of the state.
Automatic initialization can handle the "kidnapped robot" problem. Particles are drawn uniformly over the 4-D state (x, y, z, ψ) across the map.
Nevertheless, for achieving fully autonomous operation of an aerial robot, before takeoff we manually initialize the starting pose of the robot, providing an initial position and orientation along with a covariance matrix to distribute the particles around such initial pose. Knowing the takeoff location of an aerial robot will usually be the case when deploying an autonomous system in real scenarios, for example, in a manufacturing plant for logistic operations.
Besides, the risk of getting stuck in highly symmetric environments is smaller when combining the 3-D point clouds with radio range measurements, since particles that do not fit the distance measurements to the fixed beacons are rapidly discarded.
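The two initialization modes can be sketched as follows (illustrative only; the function names and the uniform sampling bounds are assumptions, since the actual implementation would draw from the map extents):

```python
import numpy as np

def init_manual(n, mean_pose, cov):
    """Manual initialization: sample n particles from a 4-D normal
    distribution around a known start pose.

    mean_pose: [x, y, z, yaw]; cov: 4x4 covariance matrix.
    """
    return np.random.multivariate_normal(mean_pose, cov, size=n)

def init_automatic(n, lo, hi):
    """Automatic initialization for the 'kidnapped robot' case:
    particles drawn uniformly over the 4-D state.

    lo, hi: per-component bounds [x, y, z, yaw] of the map volume.
    """
    return np.random.uniform(lo, hi, size=(n, 4))
```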
Filter prediction
The odometry system periodically provides increments (Δx, Δy, Δz, Δψ) for each of the components of the robot state vector.
This information is used to compute the a priori distribution of the particles. Thus, considering that multi-rotors are holonomic systems, the state of the particles evolves according to the following expressions

x_t^i = x_{t-1}^i + Δx cos(ψ_{t-1}^i) − Δy sin(ψ_{t-1}^i)
y_t^i = y_{t-1}^i + Δx sin(ψ_{t-1}^i) + Δy cos(ψ_{t-1}^i)
z_t^i = z_{t-1}^i + Δz
ψ_t^i = ψ_{t-1}^i + Δψ
The values of Δx, Δy, Δz, and Δψ are the odometry increments accumulated since the last filter prediction, perturbed with zero-mean Gaussian noise in order to model the odometry uncertainty.
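This prediction step can be sketched as follows (assuming body-frame odometry increments and a single scalar noise level for brevity; the actual filter may use per-component noise):

```python
import numpy as np

def predict(particles, dx, dy, dz, dyaw, sigma):
    """Propagate all particles with body-frame odometry increments.

    The (dx, dy) translation is rotated by each particle's own yaw
    before being added, and zero-mean Gaussian noise with standard
    deviation `sigma` models the odometry uncertainty.
    """
    yaw = particles[:, 3]
    out = particles.copy()
    out[:, 0] += dx * np.cos(yaw) - dy * np.sin(yaw)
    out[:, 1] += dx * np.sin(yaw) + dy * np.cos(yaw)
    out[:, 2] += dz
    out[:, 3] += dyaw
    return out + np.random.normal(0.0, sigma, out.shape)
```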
Filter update
We have defined thresholds in both position and orientation such that, if the visual odometry exceeds any of them, a filter update is performed using the last 3-D point cloud received from the RGB-D camera and all the distance measurements to the UWB beacons received since the last filter update. Then, the weight of each particle is recomputed as follows.
Given the distinct nature of the two technologies involved, we calculate separate weights for each sensing modality. A weighted average is then used to obtain the final weight of each particle

w^i = α w_map^i + (1 − α) w_range^i

where w_map^i is the weight obtained from matching the point cloud against a 3-D probability grid precomputed from the map, w_range^i is the weight obtained from the radio range measurements, and α ∈ [0, 1] balances the contribution of each sensing modality.
Such probability grid only needs to be computed once for a given environment and does not require updating; it avoids performing numerous distance computations between each cloud point, for each particle, and its closest occupied point in the map. Besides, each point cloud is first transformed according to the current roll and pitch provided by the onboard IMU. This transformation is done just once per update, reducing the computational requirements as well.
Then, for every point of the transformed cloud, we access its corresponding value in the 3-D probability grid. Such value indicates how likely that point is to be part of the map. By doing this with every point of the cloud and adding all the probability values, we obtain a measure of how well that particle fits the true location of the aerial robot according to the map.
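A sketch of this map-matching weight (assuming the probability grid is stored as a dense 3-D array anchored at the map origin; the names and grid layout are illustrative, not the actual implementation):

```python
import numpy as np

def cloud_weight(particle, cloud, grid, resolution):
    """Map-matching weight: mean map probability of a cloud placed
    at a particle pose.

    particle: [x, y, z, yaw]; cloud: (M, 3) points in the sensor
    frame (already leveled with the IMU roll/pitch); grid: 3-D array
    whose cells hold the probability of belonging to the map.
    """
    x, y, z, yaw = particle
    c, s = np.cos(yaw), np.sin(yaw)
    # Rigid transform of the cloud into the map frame.
    pts = np.empty_like(cloud)
    pts[:, 0] = c * cloud[:, 0] - s * cloud[:, 1] + x
    pts[:, 1] = s * cloud[:, 0] + c * cloud[:, 1] + y
    pts[:, 2] = cloud[:, 2] + z
    # Discretize into grid indices; ignore points outside the grid.
    idx = np.floor(pts / resolution).astype(int)
    valid = np.all((idx >= 0) & (idx < grid.shape), axis=1)
    if not valid.any():
        return 0.0
    i = idx[valid]
    return grid[i[:, 0], i[:, 1], i[:, 2]].mean()
```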
Finally, the weight w_map^i of each particle is computed by accumulating the probability values of all the cloud points, normalized by the number of points in the cloud

w_map^i = (1/M) Σ_{j=1..M} p(c_j^i)

where M is the number of points in the cloud and p(c_j^i) is the probability grid value corresponding to point c_j of the cloud transformed according to the pose of particle i.

Error histogram characterization of the range sensor for indoor environments.
The distance measurements are integrated into the particle filter as follows. Since the radio beacons do not provide bearing information, we first define new particle states without the yaw component

p_t^i = (x_t^i, y_t^i, z_t^i)

and sensor UWB beacon states using their known positions

b_k = (x_k, y_k, z_k)

Given the measured radio-based distance d_k between the onboard radio-tag and beacon k, each particle should be weighted according to how well its own distance to that beacon matches the measurement. This constraint can be easily applied to the particle weight calculation according to the computed distance between each particle and the beacon

d_k^i = ||p_t^i − b_k||

so that the range-based weight follows a Gaussian distribution on the difference between measured and computed distances

w_range^i ∝ exp(−(d_k − d_k^i)² / (2σ²))

where σ is obtained from the error characterization of the range sensor.
A great advantage of this approach is that the distance to a single sensor is enough to update the weights of the particles; the filter does not need to wait until distances to three or more sensors are obtained by the onboard radio-tag, as usually required by other state-of-the-art range-based localization systems.
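The range-based weighting can be sketched as a Gaussian likelihood on the residual between measured and predicted distances (σ would come from the range sensor error characterization; the exact functional form used by the filter may differ):

```python
import numpy as np

def range_weight(particle, beacons, measured, sigma):
    """Gaussian likelihood of UWB range readings for one particle.

    particle: [x, y, z, yaw] (yaw is unused, since ranges carry no
    bearing information); beacons: (K, 3) known beacon positions;
    measured: (K,) measured distances, NaN for beacons not heard.
    A single valid distance is enough to update the weight.
    """
    pos = np.asarray(particle[:3])
    w = 1.0
    for beacon, d in zip(beacons, measured):
        if np.isnan(d):
            continue  # this beacon was not measured this cycle
        predicted = np.linalg.norm(pos - beacon)
        w *= np.exp(-0.5 * ((d - predicted) / sigma) ** 2)
    return w
```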
The authors also evaluated the option of drawing new particles into the filter with each new range measurement, taking into account the lack of bearing information. Thus, with each range measurement, a set of particles would be drawn on a sphere surrounding the onboard sensor position in the map. Most of these particles would later be discarded in the next resampling, thanks to the weight contribution of the point cloud matching with the 3-D map. However, this approach was discarded because it requires a large number of particles for a proper representation of the range hypotheses, which hinders the computational efficiency.
Finally, the particle set is resampled to allow spreading the particles over the maximum likelihood areas. The algorithm employed for resampling is the low-variance sampler. 41
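For reference, the low-variance sampler draws all N surviving particles with a single random number and equally spaced pointers of stride 1/N, which keeps the particle set diverse while favoring high-weight hypotheses (a sketch):

```python
import numpy as np

def low_variance_resample(particles, weights):
    """Systematic (low-variance) resampling.

    A single random offset r in [0, 1/N) plus equally spaced
    pointers select which particles survive, in O(N) time.
    """
    n = len(particles)
    r = np.random.uniform(0.0, 1.0 / n)
    cumulative = np.cumsum(weights)
    pointers = r + np.arange(n) / n
    idx = np.searchsorted(cumulative, pointers)
    # Guard against floating-point round-off in the last bin.
    return particles[np.minimum(idx, n - 1)]
```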
The updated state vector for the aerial robot is then calculated as the weighted sum of all the particles. Thanks to this mixed approach, the outliers commonly present in indoor range measurements are rejected through the particle weighting: If an outlier distance measurement from a range sensor is received, following equation (14), it results in very low range-based weights for all the particles, so its influence on the final estimation is negligible after weight normalization.
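Computing this weighted sum requires some care with the yaw component, since angles near +π and −π would cancel out under a plain average; a circular mean avoids this (a sketch, under the assumption that yaw is the fourth state component):

```python
import numpy as np

def estimate_pose(particles, weights):
    """Weighted mean of the particle set.

    Position uses a plain weighted average; yaw uses the weighted
    circular mean so that the +/- pi wrap-around does not corrupt
    the estimate.
    """
    w = weights / weights.sum()
    xyz = (particles[:, :3] * w[:, None]).sum(axis=0)
    yaw = np.arctan2((w * np.sin(particles[:, 3])).sum(),
                     (w * np.cos(particles[:, 3])).sum())
    return np.array([xyz[0], xyz[1], xyz[2], yaw])
```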

Particle cloud within the 3-D map (black arrows), and associated 3-D point cloud projected from the weighted sum of all particles along with the 3-D map of the environment. Top left corner shows current RGB image.
Experimental results
An experimental setup was conceived to validate the presented approach. The field test took place at the indoor test bed of the Center for Advanced Aerospace Technologies (CATEC). This facility is used to develop and test a wide range of technologies related to autonomous systems. It offers a useful volume of 15 × 15 × 5 m³, part of which can be seen in Figure 5. The test bed houses an indoor localization system based on 20 tracking cameras, which only requires the installation of passive markers on the objects to be located and/or tracked. This system provides the position and attitude of each object in real time with millimeter precision.

CATEC indoor test bed and scenario used for the field experiments. Picture of the site of the experiments (top). 3-D map of the area with approximate UWB beacon locations (b1, b2, and b3) (bottom). CATEC: Center for Advanced Aerospace Technologies; UWB: ultra-wideband.
This section provides experimental results from the estimated localization compared with ground truth data obtained from the test bed tracking system. Prior to the discussion of the experimental results, the setup describing all the elements involved in the experiment is presented in the next subsection.
Experiment setup
The aerial robot in Figure 6 performed a flight based on VICON telemetry for localization and was navigated using joystick commands that sent nearby position and orientation waypoints in the robot coordinate frame. The results presented in this article were obtained off-line. In order to validate the long-term character of our approach, the flight took roughly 9 min, so the visual odometry could drift enough to verify that the two sensing measurements used in the particle filter help prevent localization error growth.

Aerial robot with RGB-D sensor at the front.
The complete experimental setup is composed of the following elements: an aerial robot carrying an RGB-D sensor, an IMU, and a radio range sensor (see Figure 6); a 3-D map of the working area with a resolution of 0.1 m, obtained using another RGB-D sensor (see Figure 5); and three radio range sensors installed across the indoor test bed at known positions.
The 3-D map of the area was acquired using another RGB-D sensor whose housing was augmented with several passive markers (see Figure 7) in order to obtain its pose with high accuracy from the test bed's motion capture system. This sensor was carried around the test bed while connected to a laptop, and the acquired point clouds were projected using the current sensor pose and merged into an accurate octree. The resulting map can be seen in Figure 5.

RGB-D sensor with passive markers for precise 3-D map building.
A small infrastructure of three range sensors was installed in the indoor scenario where the flight took place. The radio-based sensors were placed at three different locations homogeneously distributed within the indoor test bed. These fixed locations were acquired using the VICON tracking system and provided to our filter. An additional node is installed on the aerial robot in order to compute point-to-point distances.
The aerial robot was remotely piloted through the test bed area at different heights. The ground-truth trajectory followed by the vehicle is presented in Figure 8; it is roughly 200 m long and was flown for a whole battery duration.

Ground-truth trajectory followed by the aerial robot during the experiment.
Automatic initialization
Even though the initial pose of the aerial robot was set manually for the experiment, an example of automatic initialization is shown in Figure 9. This demonstrates the capability of our approach to handle the case in which the initial pose of the aerial robot is unknown. During takeoff, the filter starts performing prediction, update, and resampling of particles, based on the distance thresholds defined in Table 1. The result reveals that the UAV does not need to move much in order to converge to a solution in position (top row in Figure 9). This mainly occurs because the radio ranging measurements quickly single out the right location. However, some of these particles do not represent the correct yaw angle. As soon as the UAV starts moving in the horizontal plane toward some obstacles, the point cloud matching with the map refines the position and properly approximates the yaw angle (bottom row in Figure 9).

Automatic initialization of particles. From left to right and top to bottom, time evolution of particles after automatic initialization. Black arrows represent the particles.
MCL parameters.
MCL: Monte Carlo localization.
Results and discussion
The main objective of the experiment is to demonstrate the suitability of the approach in real time during regular operations. The estimated UAV position and yaw during the experiment can be seen in Figure 10, both from the visual odometry and the MCL. Roll and pitch angles are not included in the plots because, as previously stated, they are observable from the onboard IMU and are directly integrated into our localization approach. The IMU used in our approach provides accurate and filtered estimations of such angles.

Localization results. UAV ground-truth position and yaw captured by the motion capture system (red line). Estimated UAV position and orientation using the proposed odometry (blue line). Estimated UAV position and orientation using the proposed approach (green line). UAV: unmanned aerial vehicle.
These results (Figures 10 and 11) were obtained after configuring the MCL algorithm with the parameters shown in Table 1.

Position and orientation errors through time. Left plots correspond to the odometry approach, while right plots show results from the complete proposed approach including the MCL. Computed error of the estimated position and orientation with respect to the ground truth (red line). Estimated RMS error for each axis (blue line). RMS: root mean square; MCL: Monte Carlo localization.
The plot also shows the ground-truth position and orientation provided by the test bed's motion capture system. It can be seen how the visual odometry slowly accumulates errors over time, while the estimations from our complete approach closely follow the ground truth during the whole experiment. It is also worth mentioning that this estimation does not drift with time and the errors remain approximately bounded.
Figure 11 shows the error of the estimation during the trajectory execution. The error is computed as the Euclidean distance between the estimation and the ground truth at every time step. Besides, the plot depicts the computed root mean square (RMS) errors for each axis, which are also shown in Table 2.
RMS localization errors.
RMS: root mean square; MCL: Monte Carlo localization.
It can be seen how the RMS errors are approximately 0.4 m in
In order to quantitatively assess the performance of our localization approach, we have tested the data from the validation experiment against other popular approaches that make use of RGB-D data for robot localization, either visual odometry and mapping approaches such as CCNY_RGBD (http://wiki.ros.org/ccnyrgbd) or SLAM approaches such as RTAB-Map (http://wiki.ros.org/rtabmap) and RGBD-SLAM (http://wiki.ros.org/rgbdslam). The plots in Figure 12 show the performance of the same flight data processed off-line using these approaches, and Table 3 summarizes their overall RMS errors compared to those of our approach.

Estimated UAV position and orientation using other approaches based on RGB-D sensors. UAV: unmanned aerial vehicle.
RMS error comparison.
RMS: root mean square.
The RMS errors in
Table 4 includes values for the mean frequency and corresponding mean processing times at which new pose estimations are published, using an i7 Linux laptop with two cores (the UAV onboard processor is also an i7 Linux computer). Our approach, including the visual odometry, is able to deliver pose estimations approximately every 60 ms when using 500 particles, which makes it suitable for working at around half the frame rate of the RGB-D sensor. The other approaches may publish pose estimations to the ROS ecosystem at a higher rate than that reported in the table, but internally they just copy the same values to the output message until a new estimation is computed.
Pose estimation speed.
The visual odometry took on average 90% of a single thread, and the proposed MCL algorithm 55% of a single thread. This configuration leaves plenty of computational capacity for robot planning and navigation.
The main contribution of this approach is that the localization accuracy can be improved in long-term operation by using both RGB-D and radio-range sensing. Therefore, the parameter that balances the weight contributions of the two sensing modalities has been varied in order to assess its influence on the localization accuracy.

Localization errors for different values of the sensor weighting parameter.
Figure 14 presents the estimation accuracy using different numbers of particles, along with the required computation times included in Table 5, in order to demonstrate the capabilities of our approach. The rest of the parameters keep their default values, as shown in Table 1. It can be seen how the impact of the number of particles on the RMS errors is fairly minor, while computation times are drastically reduced.

Localization errors for different number of particles used in MCL. MCL: Monte Carlo localization.
Computation times for one MCL iteration when modifying the number of particles.
MCL: Monte Carlo localization.
The spatial occupancy of the working area is approximated by an octree, as previously shown in Figure 5. Another set of experiments was performed by modifying the spatial resolution of such map from the default (0.1 m) to the following two depths of the tree, which correspond to leaf sizes of 0.2 m and 0.4 m. Figure 15 shows these maps. Running the experiment with larger leaf sizes was discarded because the map would no longer be representative enough for map matching. The number of particles was 500, and the rest of the parameters kept their default values.

3-D map of the area with different resolutions: 0.2 m (left) and 0.4 m (right).

Localization errors for different octree resolutions of the 3-D map.
Computation times for one MCL iteration when modifying the 3-D map resolution.
MCL: Monte Carlo localization.
Conclusions and future work
An implementation of MCL suitable for operation in 3-D environments has been presented. Visual odometry based on an RGB-D sensor is used along with a previously known map of the environment and a set of radio-based sensors for point-to-point distance measurements.
Experimental results demonstrate a strong overall system performance as well as the feasibility of the approach, both in accuracy and computational efficiency. The proposed localization approach provides a reliable and accurate 4-D (position and yaw) estimation, while roll and pitch angles remain observable through the onboard IMU.
Future work will consider the inclusion of other sensor modalities into the localization system. Information such as visual place recognition or altimeter readings could be used to draw new particles into the filter. Additionally, the authors will evaluate how to compute accurate 3-D maps of the environment, including the range sensors, to be used as input to the proposed approach. Moreover, improvements to the conventional MCL used in this approach will be evaluated, such as modifying the resampling method while keeping in mind the computational efficiency of the whole algorithm.
Finally, the authors are currently preparing to release the source code of the algorithms and add the corresponding information on the ROS wiki to be publicly available.
