Abstract
Introduction
Anti-personnel mines are explosive artefacts originally designed to destroy military targets, yet both civilians and military personnel are affected by them daily worldwide. Since 1999, more than 90,000 people have been killed or injured by these devices, and around 10 people are harmed each day by the explosive blast or by fragments projected by the explosion. Colombia is currently one of the most mine-affected countries in the world. Since 1990, the Colombian government has registered 10,751 victims: 39% civilians and 61% military personnel. According to government statistics, there are more than 3.6 million registered internally displaced persons (IDPs) in Colombia, i.e., civilians forced to migrate from rural to urban areas. In most cases, landmine contamination prevents the return of IDPs to their land, affecting not only the well-being of these people but also the agricultural productivity of the country and increasing social unrest in the largest cities. Colombia's new victims' law aims to compensate the victims of the Colombian conflict and lays out plans to return 3,000,000
The procedure of anti-personnel mine (landmine) detection and removal is referred to as ‘mine clearance’ or ‘demining’. The basic structure of a landmine consists of a detonator, a firing mechanism, an explosive charge and a metal or plastic housing. These artefacts are mainly classified into two groups: anti-personnel (AP) and anti-tank (AT) landmines [6]. AP landmines are small and light, whereas AT landmines are larger and require about 100 kg-f of pressure to be triggered. In this article, we present the key role that robotics has been playing in the design of autonomous machines for demining AP landmines. There are three complementary approaches to landmine detection and deactivation: (i) tools directly operated by human deminers (e.g., metal detectors, sniffer animals, manual prodders and chemical sensors); (ii) vehicles that deactivate mines through controlled detonations; and (iii) unmanned vehicles able to detect and deactivate landmines. Unfortunately, in Colombia the first approach remains the method most widely used by the military. Figure 1 shows the different tools used for demining.

Heavy equipment used for landmine removal
In recent years, unmanned vehicles have been improved with a wide variety of sensing hardware: ground penetrating radars (GPRs), chemical sensors, and multi-spectral and thermal cameras, among others. In this regard, wheeled and legged vehicles have sufficient payload capacity to carry heavier and larger equipment that normally demands computationally costly resources [5], whereas aerial vehicles have emerged as a new approach for demining thanks to the miniaturization of sensing and processing electronics.
This section presents the related field-work on the use of robotics as applied for demining. Firstly, we review and highlight some of the advantages and drawbacks of the most commonly used terrestrial and aerial robot mechanisms, and secondly we present some approaches for landmine detection.
Unmanned land vehicles for landmine detection
Accurate navigation over rough terrains is perhaps the major challenge for land-based vehicles. For demining, navigation control is crucial in enabling safer terrain mapping and accurate reconnaissance. Different mechanisms, such as wheeled, legged and dragged robots have been proposed to overcome this issue [28]. Within the wheeled robot category, the Irish company Kentree manufactures a six-wheel mobile robot named HOBO [4]. The robot has an arm-manipulator capable of lifting payloads up to 75

UAV applications for landmine detection
The use of unmanned aerial vehicles (UAVs) is clearly suited to covering a minefield without the risk of triggering landmines during the mission. However, the weight and size of the sensing systems used for demining make them difficult to mount on UAVs, given their limited payload capacity. In [13], the authors propose the fabrication of a small, multi-frequency GPR on board a UAV quadrotor able to lift up to 1.1

General description of the system. The AR Drone 2.0 quadrotor is wirelessly connected with the base station. The modules of the base station are ROS-based packages coded in C++.
Thus far, biological sensors used by animals (e.g., dogs and rats) provide the highest degree of accuracy in terms of landmine detection; (
In [27], an airborne LIDAR system integrated with a laser scanner, GPS and an inertial measurement unit (IMU) is proposed. The system is able to detect TNT and DNT using sensitive biosensors based on the soil bacterium
We propose the use of an affordable aerial system with the following sensors onboard: a CMOS camera, GPS and an IMU. Our goal is to use an artificial vision-based approach for detecting partially buried landmine-like objects and creating a
Communication driver: uses an open-source ROS-based driver called Ardrone autonomy [15] to enable the wireless transmission of navigation, sensor and control data between the AR.Drone and the base station.
Navigation: handles the flight control. It provides both camera and IMU data to the visual mosaic module. It uses an open source ROS-based package called
Visual mosaic: builds a panoramic image by combining multiple photographic images captured by the robot's camera. It generates a map of the covered terrain. It uses an ROS-based package developed by the authors called
Detection algorithm: detects partially buried landmine-like objects in real-time using image recognition methods. It uses a former ROS-based package called
GPS data: enables the geo-location of the detected landmines based on GPS information processed by the
The robot operating system (ROS) is a widely used, open-source framework for robotics applications. The aforementioned packages were written in C++ using the OpenCV library and the Qt4 user interface development framework. In this article, we extend the functionality of the
The low-cost quadrotor-based UAV
The selected platform is the AR.Drone 2.0, a low-cost quadrotor equipped with a 1
Geo-location and image mosaicing methods
In order to geo-locate the identified landmine targets within an image (a geodesic position), we calculated the odometry between consecutive images by using the IMU data [20]. Once the landmine is geo-located, we create a map of the terrain by computing an image mosaic.
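As a minimal, dependency-free sketch of this geo-location step (the pinhole footprint model, the flat-Earth lat/lon conversion and all function names here are our own illustrative assumptions, not the authors' implementation):

```python
import math

def pixel_to_ground_offset(px, py, img_w, img_h, altitude_m, hfov_deg, vfov_deg):
    """Convert a target's pixel offset from the image centre into a metric
    ground offset, assuming a nadir-pointing pinhole camera."""
    # Ground footprint of the full image at the current altitude
    ground_w = 2.0 * altitude_m * math.tan(math.radians(hfov_deg) / 2.0)
    ground_h = 2.0 * altitude_m * math.tan(math.radians(vfov_deg) / 2.0)
    dx = (px - img_w / 2.0) / img_w * ground_w   # east offset (m)
    dy = (img_h / 2.0 - py) / img_h * ground_h   # north offset (m)
    return dx, dy

def offset_to_latlon(lat, lon, dx_east, dy_north):
    """Shift a geodesic position by a small metric offset
    (flat-Earth approximation, valid over a few hundred metres)."""
    dlat = dy_north / 111_320.0
    dlon = dx_east / (111_320.0 * math.cos(math.radians(lat)))
    return lat + dlat, lon + dlon
```

In practice the per-frame offsets would come from the IMU-based odometry between consecutive images rather than a single pixel measurement; the sketch only shows how a detection in image coordinates maps to a geodesic position.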
Image mosaicing is a process for building a panoramic image by combining multiple photographic images. To achieve an accurate match between frames, an alignment method must be applied. There are two families of methods for the alignment and fusion of images: (i) direct and (ii) feature-based. The direct method iteratively estimates the camera parameters by minimizing an error function that depends on the intensity difference in the overlapping region of the images [30]. In contrast, feature-based methods do not need initialization, are more robust against image movements, and can be used for the detection of objects that are widely separated. In this work, we investigated existing methods for feature matching [25, 35, 16], for removing blur caused by camera motion [32, 11], and for dealing with different image exposures [8].
As mentioned, the creation of a panorama photo is the result of combining individual images, a process also known as ‘stitching’. This approach requires nearly exact overlaps between images and similar exposures to produce a seamless result. Basically, three procedures must be applied to the images: registration, calibration and blending. To properly apply these procedures, one must first calculate the homography matrix of the image and a camera calibration matrix aimed at correcting image distortion by transforming the image into a standard coordinate system (image rectification). When the images are captured from a UAV, the inherent motion of the camera causes errors during the calculation of the transformation model. Approaches to solving this issue are based on improving the abstraction of a sparse set of features [24, 26]; e.g., Jyun et al. [18] implemented a dominant image selection method to reduce the motion parallax region, while Yahyanejad et al. [33] derived a geometric transformation using the location and orientation data of the camera and also incorporated depth map information within the image stitching algorithm to improve feature extraction, geometric distortion correction and projection selection.
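The homography at the heart of registration maps pixels of one image into the coordinate frame of another. A minimal illustration (the function name and the row-major nested-list representation are our own, not part of the authors' code):

```python
def apply_homography(H, x, y):
    """Map a pixel (x, y) through a 3x3 planar homography H
    (row-major nested lists), returning the warped coordinates.
    The division by w performs the projective normalization."""
    xw = H[0][0] * x + H[0][1] * y + H[0][2]
    yw = H[1][0] * x + H[1][1] * y + H[1][2]
    w  = H[2][0] * x + H[2][1] * y + H[2][2]
    return xw / w, yw / w
```

A pure translation is the simplest homography: `H = [[1, 0, tx], [0, 1, ty], [0, 0, 1]]` shifts every pixel by `(tx, ty)`.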
Outline
This article is organized as follows: Section 2 briefly describes the visual approach for landmine-like object detection based on prior work [19, 22]. Section 3 introduces the method for visual stitching using the low-resolution camera of the AR.Drone quadrotor. Section 4 presents simulation scenarios that were developed for the preliminary testing of the system, whereas Section 5 concludes with a field report of our system deployed for different scenarios.
Landmine detection and geo-location
Vision algorithms are used to analyse the images and to determine whether a landmine has been found. The search procedure analyses the content of the images obtained from the bottom camera of the UAV. The algorithms developed to perform the landmine detection task are included in the ROS package


A flowchart describing: image pre-processing, registration and blending
Noise filtering
The noise filtering process applies classical vision algorithms, such as morphological operations, to eliminate image noise. The first step transforms the captured image from the RGB scale to greyscale in order to perform binarization. The filtering process then removes image objects that do not match a landmine. Firstly, an erosion function deletes small noise pixels. Secondly, a specially designed function eliminates medium-sized objects, using an elimination criterion based on a pre-defined pixel-size threshold. The last step recovers the fragments of the landmines that were removed when the erosion function was applied. Figures 4a-b depict the transformation of the image into greyscale and binary space. Figure 4c details the resultant image after applying the erosion function. The output image is shown in Fig. 4d.
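The erosion and size-threshold steps can be sketched in pure Python on a binary image represented as nested lists (an illustrative version only; the actual package operates on OpenCV images, and the function names here are our own):

```python
def erode(img):
    """3x3 binary erosion: a pixel survives only if its full
    neighbourhood is foreground (borders become background)."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            if all(img[y + dy][x + dx] for dy in (-1, 0, 1) for dx in (-1, 0, 1)):
                out[y][x] = 1
    return out

def remove_small_objects(img, min_size):
    """Flood-fill each connected component and erase those whose
    pixel count falls below the pre-defined size threshold."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    out = [row[:] for row in img]
    for y in range(h):
        for x in range(w):
            if img[y][x] and not seen[y][x]:
                stack, comp = [(y, x)], []
                seen[y][x] = True
                while stack:
                    cy, cx = stack.pop()
                    comp.append((cy, cx))
                    for ny, nx in ((cy + 1, cx), (cy - 1, cx), (cy, cx + 1), (cy, cx - 1)):
                        if 0 <= ny < h and 0 <= nx < w and img[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            stack.append((ny, nx))
                if len(comp) < min_size:  # too small to be a landmine candidate
                    for cy, cx in comp:
                        out[cy][cx] = 0
    return out
```

Erosion suppresses isolated noise pixels, while the component filter implements the pixel-size threshold used to discard medium-sized non-landmine objects.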
Feature extraction and image classification
Feature extraction involves two methods for obtaining enough information from the image to determine whether the detected object is a landmine. The selected properties are pixel-based features. Firstly, the size of the detected object is calculated by counting the number of white pixels in the binarized image. Secondly, the number of pixels whose intensity falls within a defined percentage band in each layer of the RGB image is calculated. The classification decision is based on the error between the extracted features and the features of a comparison template. Figure 4e shows the final image after applying the noise filtering and image classification process.
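The two pixel-based features and the template comparison can be sketched as follows (the intensity band, the relative-error tolerance and all names are illustrative assumptions, not the paper's exact parameters):

```python
def extract_features(binary_img, rgb_img, lo=100, hi=200):
    """Feature vector: white-pixel count of the binary mask plus, per
    RGB channel, the number of pixels inside the intensity band [lo, hi]."""
    size = sum(px for row in binary_img for px in row)
    band_counts = [0, 0, 0]
    for row in rgb_img:
        for (r, g, b) in row:
            for c, v in enumerate((r, g, b)):
                if lo <= v <= hi:
                    band_counts[c] += 1
    return [size] + band_counts

def matches_template(features, template, tol=0.25):
    """Accept the detection when every feature is within a relative
    error `tol` of the corresponding template value."""
    return all(abs(f - t) <= tol * t for f, t in zip(features, template))
```

The template would be built offline from reference images of the landmine-like targets; at runtime each candidate object is accepted or rejected by the relative-error test.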
Geo-location of targets
The geodesic coordinates of the detected landmines are obtained using the flight recorder module, which comprises a GPS receiver and 4 GB of USB flash memory. The GPS has an accuracy of ±2
In previous work [22], we explained the main software components of our proposed ROS package for visual landmine detection called
Image acquisition and pre-processing
The images captured by the bottom camera of the AR.Drone have a poor resolution of 320 × 240 pixels. That limitation causes several issues regarding feature extraction, matching and image warping. To address this problem, we used the unsharp mask technique, which improves image sharpness by increasing the contrast between the original image and a high-pass filtered version of it, as described in (1),

I_s = I + λ · H,    (1)

where I is the original image, λ is a scale factor that controls the contrast level and
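A dependency-free sketch of unsharp masking on a greyscale image stored as nested lists (the 3×3 box blur standing in for the low-pass filter, and the exact sharpening form, are our assumptions, not the authors' implementation):

```python
def box_blur(img):
    """3x3 mean filter (edges handled by clamping coordinates)."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [img[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                    for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            out[y][x] = sum(vals) / 9.0
    return out

def unsharp_mask(img, lam=1.5):
    """Sharpened = I + lam * (I - blur(I)); the difference term is the
    high-pass component that boosts local contrast around edges."""
    blur = box_blur(img)
    h, w = len(img), len(img[0])
    return [[img[y][x] + lam * (img[y][x] - blur[y][x])
             for x in range(w)] for y in range(h)]
```

Flat regions are left untouched (the high-pass component is zero there), while edge pixels are pushed further from their local mean, which is what makes low-resolution features easier to match.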
Registration
Once the images are preprocessed, we apply an alignment technique based on local features aimed at extracting any relevant information from the images. To do this, we implemented an algorithm for extracting those features, matching key points to finally obtain a transformation model. To extract the relevant features from the image, we used the FAST method as a detector (viz., the features from accelerated segment test) [29] and the BRISK method as a descriptor (viz., binary robust invariant scalable key points) [23]. These methods were selected because they enable us to find a greater number of matched features and also use less memory space.

Algorithm 1 describes how we extract relevant features from each image. Firstly, a non-parametric KNN classifier, together with FLANN, a library for fast approximate nearest-neighbour search, is used to find all the matching images. Secondly, an iterative method called RANSAC is used to verify whether the matching and selection have provided an accurate set of features compatible with the transformation model.
Therefore, the homography matrix is calculated using the navigation data of the drone (altitude, camera location (
In (2) and (3),

Features Extraction process
Note that
Blending
Blending is used to concatenate or merge the images to build the panorama. To correctly merge the images, we implemented a method that corrects the perspective transformation and projects the images onto a planar surface. The homography matrix is used to update the camera parameters and the rotation matrix in order to perform the perspective correction. Hence, the image points are mapped to a planar surface to take into account the geometrical motion produced by the UAV's camera. The algorithm applies this process to each image and, at the same time, constructs a single panorama image. To reduce noise, we used a mask that distributes the information of the regions shared between the set of images. We preferred using a mask rather than a Gaussian filter to ensure that no image information would be lost during the iterative process. Algorithm 2 explains the blending step in more detail.
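The mask-based averaging of shared regions can be sketched as a running per-pixel mean (a greatly simplified, greyscale illustration; the weight grid standing in for the mask and the function name are our assumptions):

```python
def blend_into_panorama(panorama, weight, tile, x0, y0):
    """Accumulate a warped tile into the panorama at offset (x0, y0),
    averaging each pixel over every image that covers it. `weight` is the
    per-pixel count of contributing images, i.e., the blending mask."""
    for y, row in enumerate(tile):
        for x, v in enumerate(row):
            py, px = y0 + y, x0 + x
            n = weight[py][px]
            # running mean: shared regions end up as an even average,
            # so no image information is discarded
            panorama[py][px] = (panorama[py][px] * n + v) / (n + 1)
            weight[py][px] = n + 1
```

Unlike a smoothing filter, this averaging keeps every source pixel's contribution, which matches the rationale for preferring a mask over a Gaussian filter.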
In this paper, we used an open source ROS package called

Blending step
Thus far, we have created several environments to emulate some of the types of minefields that might be found in Colombia, including different vegetation densities and soil topologies. Figure 7 details some of the scenarios. The 3D models were created in the Google SketchUp software package, but some of the objects within these worlds, such as rocks and plants, were downloaded from the 3D Warehouse-SketchUp database. We included these scenarios within the

Simulation scenarios: a) short green grass, b) a deserted area, c) abundant vegetation
As mentioned at the beginning of this article, landmines in Colombia are mostly hand-crafted and partially exposed on the terrain's surface so that they can be triggered. The military reports that tuna cans are one of the materials used by rebels for housing explosives. In this section, we present our experimental results after collecting data during six months of trials. Overall, we analysed 28,029 images captured by the drone during flight, and the drone covered three types of terrain under different weather conditions (see Table 1). On average, the drone covered terrain areas ranging from 15
The types of terrain used for the experiments
Figure 8a shows the trajectory followed by the drone while covering an area of 80

a) The 3D trajectory of the quadrotor: a covered area of 80

Landmines detected during the detection and geo-location mission
(detection) Summary of the overall data obtained during six months of trials. Both the TPR metric and the FPR metric condense the experimental results of the system while featuring diverse weather conditions and different types of terrain.
The ROC curve corresponds to that shown in Fig. 10a
It summarizes six months of trials. The ROC curve corresponds to that shown in Fig. 10b
Detection data classified per terrain type. The ROC curve corresponds to that shown in Fig. 10a.
We used the ROC curve to calculate the percentage of landmine detection. The ROC curve measures the sensitivity of the detection algorithm by computing the fraction of true positives out of the overall real positives, and the fraction of false positives out of the overall real negatives.

a) TPR: the ROC curve calculated for each type of terrain described in Table 1; b) TPR: the ROC curve calculated with the overall data obtained during six months of trials

Experimental results of terrain map generation based on image stitching: a) the assembled mosaic with geo-located landmine-like objects (red circles), b) output map showing the covered terrain and the drone's trajectory. Left-to-right: (i) Terrain type 1 (flying altitude: 4.5
Equations (13) and (14) describe how to obtain both measurements: TPR = TP/(TP + FN) and FPR = FP/(FP + TN). The parameter TN represents the number of true negatives, TP the true positives, FP the false positives and FN the false negatives. Table 2 summarizes the overall performance of the system after analysing 28,029 images similar to those shown in Fig. 9. As shown in Fig. 10b, the overall TPR measurement of 0.9416 is located at the top left-hand border of the ROC space, which demonstrates the accuracy of the detection algorithm. In the ROC space, an area under the curve of 1 represents a perfect test, whereas 0.5 represents a worthless test.
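The two ROC measurements reduce to a few lines of code (the function name is ours; the formulas are the standard TPR/FPR definitions used above):

```python
def roc_point(tp, fp, tn, fn):
    """One operating point of a detector in ROC space:
    TPR = TP / (TP + FN)  -- fraction of real positives detected
    FPR = FP / (FP + TN)  -- fraction of real negatives misfired on"""
    tpr = tp / (tp + fn)
    fpr = fp / (fp + tn)
    return tpr, fpr
```

Sweeping the detector's decision threshold and plotting the resulting (FPR, TPR) points traces the full ROC curve; a point near the top-left corner, like the reported TPR of 0.9416, indicates high sensitivity at a low false-alarm rate.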
Figure 11 shows the experimental results regarding terrain map generation based on the image stitching method described in Fig. 5. Column (a) of Fig. 11 depicts the assembled maps for each terrain type and column (b) shows the trajectories followed by the drone. The flight data are listed in Table 4. Unlike the landmine-detection experiments, in which the flight altitude was set at about 1

a) Terrain type 1 at a flying altitude of 3.24
To evaluate the feature confidence of the image mosaic (terrain map), we used a ground-truth image of the terrain to be mapped. Figure 12 shows the results. In both cases, one can see that the stitching method was able to generate the corresponding map of the terrain where the features of the panoramic images corresponded to those from the ground-truth image. Despite the low resolution of the camera (320 × 240 pixels), we achieved an average feature confidence higher than 0.82 (see Table 4). Figure 13a details the terrain areas covered by the drone flying at different altitude profiles. Likewise, Fig. 13b shows how the number of matching features decreases when the altitude of the drone increases.

Covered terrain areas with different altitude profiles
The experimental results demonstrate that the proposed method for map generation and visual landmine-like object detection delivers both visually pleasing and geometrically accurate image registration. For the bulk of cases, our system was able to detect partially buried objects in different types of terrain with a detection percentage higher than 80% (cf. Table 3). Likewise, it was found that the quality of the feature-based registration for the image stitching was highly dependent on the distribution of features within the images and on the drone's altitude. Because our goal was to use a low-cost aerial vehicle such as the AR.Drone, the speed, the altitude and the area to be covered played an important role in the accurate assembly of the image mosaic. The results recommend covering small areas (50
