Abstract
Keywords
1. Introduction
There is increasing need for autonomous robots in hazardous environments, such as disaster sites and nuclear plants, as well as in inaccessible areas such as volcanoes and for space missions. Exploration is essential for robots that are required to move autonomously in novel environments. Therefore, developing efficient strategies for exploration is both interesting and important. Map information is important for path planning and task execution since the availability of a map increases the speed with which the robot can reach areas of interest in the environment. It is important to define the goal and evaluation criteria to judge the exploration performance. Intuitively, the objective of exploration is to gain the maximum amount of accurate information about the environment - represented by the explored space completeness - in the shortest time and over the minimum distance travelled.
1.1 Robot exploration algorithms and backtracking techniques
Exploration is usually made using a greedy strategy that plans one step ahead by determining the next-best locations - called ‘frontiers' - which maximize the acquired information. One of the most popular frontier-based exploration techniques was developed in [1], in which the frontiers are defined as the boundaries between free and unexplored areas. Approaching those frontiers enables the acquisition of more information about the unknown environment. Frontier-based exploration [1] is common to almost all exploration techniques and, depending upon the frontier selection mechanism, the existing techniques can be broadly classified into three categories, namely: optimal-frontier, behaviour-based and randomized motion techniques.
In optimal-frontier exploration, the next frontier is selected based upon a cost function. In [1], this function was selected to be the shortest distance required to reach each of them. The cost function may take other forms. For example, in [2], two criteria were considered in the evaluation, namely: the travelling cost required to reach a frontier and the expected information gain when performing a sensing action at that frontier. The cost function was controlled by three parameters: the distance cost, the expected utility and localizability (the latter of which is defined by the suitability of a frontier to enhance the robot localization when reaching that frontier, as described in [3]).
Optimal frontier strategies have two common problems. First, due to continuous map updates, the currently approached region might become fully explored during navigation before reaching the destination frontier. In this case, the robot will start to explore the next unknown region. This problem occurs when using sensors with a wide perceptual range. Second, the optimal frontier can lie outside the room being explored. This situation may cause the robot to explore the same room twice, with unnecessary long distances. Repetitive re-checking of the frontier during navigation and segmentation of the partially built map were suggested in [4] to solve the first and the second problem respectively. The segmentation process separates rooms from each other causing the robot to favour visiting frontiers lying inside the currently explored room. However, although such a solution works well in office-like environments, open environments cannot be reasonably segmented, which may impair performance.
The exploration process could be decomposed into smaller, simple, reactive tasks, which leads to the use of behaviour-based exploration approaches [5–11]. The exploration task is divided into simple simultaneous actions. Such actions or behaviours involve repulsive and attractive forces, such as avoiding obstacles and reaching targets, respectively. For instance, in [6] a behaviour-based approach combined three reactive behaviours for exploration, namely: “reach frontiers”, “avoid obstacles” and “avoid other robots”. Another example in which a simple wall-following behaviour as a reactive model led exploration was proposed in [7]. In [8], repulsive behaviour from previously visited areas was used for exploration. In [9, 10], a complex behaviour architecture was proposed in which a combination of several weighted behaviours were fused together for efficient exploration. These behaviours can be generally formulated as: “go to frontier”, “go to unexplored areas”, “avoid other robots” and “avoid obstacles”. However, reactive systems do not perform well in large and complex environments [11]. In such environments, the forces combining those behaviours could compensate each other in a certain region, causing local minima that trap the robot at a certain point. The local minima problem is common in behaviour-based techniques, not only for exploration but also for normal navigation. This requires such systems to have a local minima detection and recovery mechanism to avoid this problem.
In randomized motion planning [12–14], robots are directed to acquire more information through random steps. Randomized increments of a data structure called ‘sensor-based random trees' (SRTs) were generated in [13]. This tree represents the roadmap of the explored area with an associated safe region that describes obstacle-free regions around the robot depending upon its sensor aspects. The nodes of this tree are the explored locations. This basic SRT strategy was later modified to enhance the process. For instance, in frontier-based SRT (FB-SRT) [14], the random selection of target points was biased towards local frontier arcs in the current safe region. This improves efficiency in terms of shorter travelled paths and greater area coverage. Furthermore, in [15] and [16] a sensor-based exploration tree (SET) was constructed for depth-first search exploration. The target configurations were selected to maximize the estimated information utility in forward mode only by calculating the expected utility along the local frontier boundary. This helps the robot to filter out any useless actions that might be executed. In these strategies (SRT, FB-SRT and SET), if there are no more areas to explore, the robot goes back through the previous nodes to find new regions and explore again. This backtrack strategy causes long exploration distances and times, especially in places with wide open spaces.
Several approaches were proposed to solve this backtracking problem. For instance, bridges were added in [17] to the exploration tree so that the robot could plan quick paths without looping all the previous nodes. A bridge was added between any two adjacent nodes with a common safe region between them and separated by a distance greater than double the sensor range. The drawback of this improvement is the possibility that other nodes might be worth reaching despite having no shared safe region with the currently visited node. Ignoring such a node will reduce the area covered by the robot. In addition, in [18] a short path was planned between the current node without additional information and the initial node assuming that no information will be gathered from the nodes between them. This hypothesis works well in corridor environments rather than wide spaces and office-like environments.
1.2 Performance metrics for robotic exploration
Although robots are designed bearing in mind their working environment, there is no efficient way to compare robot performances in different places. One way to do this is to run a series of simulations of robots working in different structures. However, this is time-consuming and requires much effort. Several complexity metrics (CMs) had been introduced [19–22], namely: a space syntax, entropy, and obstacle distribution and compression. Those metrics are suitable for path planning rather than exploration. The space syntax method [19], which is concerned with the connectivity of environmental features rather than distances, uses a labour-intensive axial map to measure environmental complexity. Thus, it is not suitable for exploration. Entropy is regarded as a measure of uncertainty in the working environment [21, 22]. The more free spaces there are, the more decisions the robot should make to reach its target and the more time will be taken. Hence, entropy is a measure of the environmental complexity if the robot wishes to reach a particular goal. For exploration, there is no inaccurate decision, since any given decision will have benefits for exploration. Additionally, the sensor details are not considered in the entropy computations. Therefore, entropy is not an adequate measure of environmental complexity from the exploration point of view. Since the distribution of obstacles in an environment affects the navigation task's complexity, the environment is identified by a unique factor called ‘the compression factor’ [22]. This factor measures the repeatability of patterns of obstacles in a certain environment, and hence the environment's complexity. In other words, if the obstacles have repeated patterns and - in turn - can be easily described, then this environment has a low CM, and vice versa. Even though it considers the sensor properties, it is computationally expensive and not suitable for measuring the environmental complexity for the exploring robot. Therefore, there is no complexity measure in the published literature suitable from the exploration point of view, to the best of our knowledge.
Although sensor-based exploration enjoys simplicity and completeness, it can take long time and involve a greater distance, mainly due to its backtracking strategy [18]. Suppose that there are no frontier areas during exploration, then the robot must go back to previous locations until there exists a frontier area. In some situations, the robot can travel a significant distance until it reaches a position which has frontier regions. In this paper, which extends our preliminary work in [30], a novel, simple yet effective backtracking technique based on a heuristic algorithm is proposed. This new algorithm is based on the selection of the most informative node to approach directly rather than travelling across all previous nodes in order. The most informative node is determined using the ray-casting algorithm. The proposed technique can be regarded as a combination of the optimal frontier and randomized motion planning strategies. Random exploration planning is applied in forward mode, while the optimal node is approached in backward mode. The new technique is suitable for time-critical applications, such as rescue applications. Another contribution of this paper is in devising a new evaluation index
This paper is organized as follows: In Section 2, the basics of the basic SRT exploration strategy are briefly outlined. In Section 3, the new exploration technique using the proposed heuristic backtracking algorithm is described. The proposed environmental CM and
2. Sensor-based random tree exploration
The SRT exploration technique [13] is based on a random selection of robot configurations

Validation of different candidate configurations in SRT:
Algorithm 1: A pseud,-code for the basic SRT algorithm
3. A new heuristic backtracking algorithm
The following are the explicit assumptions for the new exploration approach:
Robot localization is provided by a separate module.
The exploration environment is planar, i.e.,
The exploration environment is static, i.e., it consists of unchanging surroundings in which the robot explores.
The robot is holonomic, i.e., it can turn in any direction.
In SRT, when there are no more valid configurations to reach, the robot traverses the parent node of
The basic idea for enhancing backtracking in SRT is shown schematically in Fig. 2. In the basic SRT strategy, as shown in Fig. 2(a), when there are no new areas the robot backtracks to all the previous nodes until exiting the currently explored room. While using the proposed approach, the robot searches for the most informative node, as shown in Fig. 2(b), to which the starting node is assumed as a new starting node. A shortest path - with the help of the built LSRs - is then planned to approach this node of interest, saving both distance and time.

A sketch showing a robot exploring a room using: a) the basic SRT strategy, and b) the proposed approach, where a shortest path, in green, is planned to the most informative node
3.1 Map building
A spatial representation for the unknown environment is required for two tasks: environment monitoring (such as in surveillance applications) and helping the robot to navigate the environment in backtracking mode. In this paper, the occupancy grid-based map is used for this representation. The environment is divided into small grids, each containing a value representing the probability of being occupied by obstacles. It is necessary to know for each cell whether it is unknown, free or an obstacle. Initially, the map is assumed to be unknown. Given a range scan and the robot pose, the occupancy grids within the sensor range are updated as follows. Firstly, scan readings are converted into Cartesian coordinates of the occupancy grid map, creating a polygon of points. Secondly, this polygon is identified as the LSR and filled using the flood-fill algorithm. Thirdly, obstacle cells are identified by the sensor readings that are lower than the maximum sensor range

Building an occupancy grid map using scanner range data
Algorithm 2: A pseudo-code for the proposed heuristic backtracking algorithm.
3.2 The most informative node
In the proposed approach, when there is no valid configuration to reach, the robot selects the most informative node among the previous nodes. This informative node is expected to have information gain (in terms of free cells) greater than the threshold

Estimation of information gain at different configurations by the use of a ray-casting technique
Algorithm 3: Ray-Casting algorithm to estimate the most informative node.
After identifying the most informative node, a shortest path is planned using the
4. The proposed complexity metric and evaluation index
4.1 Environmental complexity metric
Robots are tested in different environments with different degrees of complexity, which is problematic for comparing the performance of different exploration strategies. Therefore, it is useful to develop a metric that can quantify environmental complexity. This will help to reduce the effect of the environmental complexity on the performance comparison of the different exploration strategies. The available complexity measures, such as space syntax, entropy and obstacle compression, are measures related to path planning rather than robot exploration. From the path-planning perspective, free areas mean more choices for the robot to reach its target, and hence a more complex environment. On the other hand, from the exploration perspective, free areas mean more information to be acquired about the unknown environment, and hence imply a less complex environment. As such, in this paper a novel environmental CM for the exploration process is proposed, as follows.
Intuitively, obstacles in the environment could help or hinder the exploration process. The exploration process will be efficient if the sensor range spans the entire environment. The effect of the obstacle density and distribution is illustrated in Fig. 5, where two environments - namely, a mazy environment and an open space environment - have the same number of obstacle cells but a different distribution. From the exploration point of view, the mazy environment is more complex than the open space environment. The complexity measurement could be simplified by calculating the difference between the actual number of nodes required to cover a certain environment,

Effect of obstacle distribution over complexity spaces - two environments with the same number of obstacle cells with different distributions. The first mazy environment is more complex than the open space environment
where
where
On the other hand, calculating the actual number of nodes required to fully cover the structured environment while considering the obstacles' density and distribution,
Algorithm 4: Greedy Randomized Art-Gallery Algorithm.
After acquiring the 2D model of the environment, the algorithm computes the estimated number of nodes,

The CM values, obtained over a single run, for different environments with n=100 random samples, FOV=360°, a coverage threshold.
4.2 Performance evaluation index
The performance of an exploration strategy is usually measured by four metrics, namely: the distance travelled,
where
The normalized number of nodes is made here with respect to the near-optimal number of actual nodes,
The distance travelled is normalized to the total distance required for a robot to explore the entire environment, which can be estimated using the actual number of nodes calculated. The distance between two nodes in the coverage problem based on graph theory [28] is equal to double the maximum sensor range, and so the total distance,
As such, the normalized travel distance is given as:
The normalized exploration time is given as:
where
5. Simulation results
Several simulation scenarios have been implemented to validate the proposed exploration approach. Each scenario has been identified by its environmental CM and the exploration
5.1 The office-like environment
The proposed exploration approach with heuristic back-tracking and the basic SRT strategy are implemented in the office-like environment shown in Fig. 7. The environmental CM for this environment is calculated using (1) as 0.31. Detailed simulation steps at different simulation times with the associated roadmap are shown in Fig. 8. The nodes created and the paths travelled are shown in green and blue, respectively, while the shortest paths planned by the heuristic approach are shown in red. In a basic SRT strategy, red edges mean that the robot has backtracked along them to return to its home position, which does not appear in the proposed backtracking approach as the robot is not required to go back over all previous nodes (it is just required to plan a short path to the most informative nodes). A comparison between the proposed approach and the basic SRT approach is given in Table 1 in terms of the mentioned metrics at different sensor ranges, namely 1 m and 2 m. Due to the random behaviour of SRT, values are averaged over five simulation runs with different initial robot positions. The standard deviations are also shown for the four metrics.
Simulation results of the office-like environment

Office-like environment

Simulation steps at different simulation times
It can be noted from Table 1 that there is a significant decrease in the total path length and the total exploration time for the proposed approach compared to the basic SRT approach. Furthermore, it can be observed that a significant reduction in the exploration distance appears clearly in the scenario with a smaller sensor range, which can be attributed to the high number of exploration edges required to fill the entire open space. Additionally, the proposed approach provides nearly complete coverage, which is comparable to that of the original SRT approach, while exerting less exploration effort. Furthermore, the standard deviations of the four metrics are shown to be small values, which proves the high reliability of the results. The new
Here, it is imperative to discuss the advantages of the proposed backtracking approach over the approaches presented in [17] and [18]. The backtracking approach in [17] is based on constructing a bridge between two adjacent nodes with a common safe region between them. This approach is helpful in corridor-like environments, which have a low probability of having in-between nodes with unexplored regions. Moreover, in [18] the backtracking approach is based on constructing a short cut between the initial and the most recently visited nodes without taking into account the possibility of there being a valuable node between them. This demonstrates the advantage of the proposed backtracking approach over these two approaches, since the proposed heuristic backtracking is based on approaching the most informative node which has a high probability of having unexplored regions in office-like or even in open-space environments, as illustrated in the results.
5.2 The maze-like environment
Similarly, simulations of the proposed exploration approach with heuristic backtracking and the basic SRT strategy were conducted in a maze-like environment, as shown in Fig. 9. The environmental CM for this environment is calculated using (1) as 0.42. The simulation results of the basic SRT are presented in Fig. 10(a) while those of the proposed SRT are shown in Fig. 10(b), in which the red lines represent the shortest paths followed whenever there are no valuable nodes to travel to. Table 2 summarizes the results obtained with a 2 m perceptual range averaged over five simulation runs with different initial robot positions.
Simulation results of the maze-like scenario

Maze-like environment

Simulation results for the maze-like scenario: (a) the basic SRT approach, (b) the proposed approach
Similarly, the significant benefit in the total path length and the total exploration time are shown for the proposed approach compared with those of the basic SRT approach. In addition, the proposed approach showed a higher
6. Conclusions
A novel heuristic backtracking algorithm has been developed for sensor-based random tree exploration to reduce the exploration time and the distance travelled so as to cope with time-critical applications. The new approach is based on the selection of the most informative node to approach rather than backtracking across all unnecessary explored areas. The enhancement of SRT exploration using the developed backtracking algorithm has been confirmed by conducting several simulations in different exploration scenarios. A new evaluation index has been devised to encapsulate the exploration metrics, namely the exploration time, the distance travelled, the coverage, and the number of nodes in a single number, avoiding trade-off among these metrics. This index has been shown to be representative of the exploration performance to be used for exploration comparison, especially in SRT-like algorithms. A step towards finding a unique complexity metric has also been developed, representing the environment's structural complexity from the exploration point of view. This metric reflects the difference between the ideal number of visits required to fully cover the environment and the actual number of visits required. It has been shown that this complexity metric is dependent upon the disparity among obstacles in the environment. Although the estimated number of nodes ultimately depends upon the sensor properties, such as range, field of view and angular resolution, the complexity metric is nearly independent of these properties; since these properties will be reflected in both the actual and the estimated number of nodes required to cover the environment, the complexity metric will therefore not change. In the future, we will explore the potential of the evaluation index in diverse scenarios as a step towards generalizing it for the measurement of exploration performance and comparing between techniques.
