Abstract
1. Introduction
The majority of the activities that are performed in everyday living contexts are instances of pick and place tasks. These types of tasks usually require careful planning and delicate execution, as the manipulator has to be finely aligned with the object of interest (OOI) during pick up and the destination during placement. However, depending on the constraints of the tasks and the complexity of the task environments, even the state-of-the-art planners may require a significant amount of time and computational resources to generate trajectories that will yield successful executions. On the other hand, careful observation of the recurring pick and place actions that take place in everyday scenarios led to the following conclusions, which motivated the ideas behind our contribution:
Reaching towards the OOI and/or the destination is usually done as directly, roughly, and quickly as possible. The most critical parts of a manipulation task are the moves performed within close proximities of the OOI prior to pick up or the destination prior to placement, and they need to be executed delicately.
Even though there could be infinite ways of picking up or placing an object, the general tendency is to repeat a finite and discrete number of alternatives for finely approaching a particular OOI and/or its destination. For example, a bottle is grasped from the side or on its cap, a mug is grasped from the side or on its handle, or a chair is usually grabbed from its back.
Building on these observations and conclusions, we contribute an experience-based mobile manipulation planning and execution method that allows the robot to memorize and later reuse the critical reaching moves performed within close proximities of the OOIs and their destinations. We show that this approach reduces the overall computational demand for planning as the critical manipulation regions are handled by the memorized solutions. The main role of generative planning is to move the robot towards the critical manipulation region as quickly and directly as possible. This paradigm results in considerable efficiency improvements compared to planning from scratch every time the OOIs need to be manipulated. We also observe that reiterating the moves that have been executed in the past and are known to be successful improves execution reliability. We run our experiments in a simulated setup where our mobile service robot has to pick up and transport several office and hospital objects, such as chairs, overbed tables, utility carts, and stretchers to their desired destinations while avoiding collisions in cluttered environments (Figure 1).

The omni-directional mobile robot capable of manipulating several hospital and office objects (a) is transporting a utility cart from the top right of the scene to its desired pose on the middle left of the scene, indicated as the pale green ghost shape of the cart (b)
The rest of the paper is organized as follows. Section 2 provides some brief background information on the Rapidly-exploring Random Tree (RRT) variant generative motion planning algorithms. Section 3 gives an overview of the related work while Section 4 elaborates on the contributed method. Section 5 presents the results of our extensive experimental evaluation. Finally, Section 6 summarizes and concludes the paper while pointing out some potential future work.
2. Background
In this work, we utilize a set of RRT variant algorithms, namely the original RRT itself [1, 2], RRT-Connect [3], and RRT* [4], as the generative planner component of our proposed approach. In addition to their simplicity, practicality, and probabilistic completeness property, the sampling-based natures of RRT-based algorithms align very well with the way our method defines and stores the robot's manipulation experiences.
Starting from the initial configuration

Tree construction process of the RRT algorithm [3]
As the number of nodes in the tree increases, however, finding the nearest node to the randomly sampled configuration takes longer, since the entire tree needs to be traversed. There have been attempts to address this problem by using more efficient data structures for faster nearest neighbour computation as well as heuristics for informed exploration. Utilizing a kd-tree for partitioning the configuration space and searching for the nearest neighbour on the kd-tree instead of the random tree itself results in significant speed-ups [5]. Heuristics, such as shaping the probability distribution to alter the likelihood of selecting any particular node based on the node's potential exploratory or lower cost path contributions, leads to higher quality paths [6]. In addition to improving the performance of an individual planning session, reusing previously generated plans to guide current ones proves very useful, especially in highly dynamic environments, such as robot soccer, that require intensive re-planning [7]. The plan generated in the previous iteration is used to guide the progress of the plan in the current iteration by keeping a waypoint cache with the assumption that the environment would not change significantly between consecutive iterations.
The RRT-Connect algorithm [3] uses a greedy heuristic that aggressively tries to connect the tree to the samples by moving over larger distances during configuration space exploration and tree expansion. Instead of performing a single step that extends the tree some
Even though the RRT algorithm rapidly explores the configuration space and eventually reaches the goal, it does not guarantee that the solution path is an optimal one. The RRT* algorithm [4], on the other hand, focuses on the optimality of the paths generated by the RRT methodology. Each RRT tree node stores its distance from the start node in terms of path length, and instead of looking for the closest single node to the sampled configuration, RRT looks for a set of
In Section 4.2, we elaborate on how we utilize these planning algorithms in our framework while improving their performances via experience guidance and reuse.
3. Related Work
It has been shown in general problem solving domains that reusing past experience stored as derivational problem solving episodes improves planning and execution efficiency [8–10]. The reuse of previously constructed paths or motion segments has also been investigated in various forms in the motion and manipulation planning literature. In this section, we review the most recent related studies.
The adaptive motion primitives concept introduced in [11] allows the combination of predefined multi-dimensional actions with the primitives that are generated on the fly via analytical solvers during plan graph construction and search processes. Planning efficiency is improved by initially planning for only 4 out of 7 degrees of freedom (DoF) of the manipulator, and then switching to full 7 DoF planning towards the end. This hybrid and multi-resolution approach resembles the method we present in this paper, where the fine manipulation region is reached roughly and the critical motions are performed delicately within that region.
The Lightning framework [12] utilizes stored end-to-end paths in addition to running a generative planner to plan the required motion for a given problem. The processes of planning from scratch and looking for a similar stored path that can be repaired and reused for the given problem are run simultaneously and the path returned by the process that finds the solution first is used. The similarity of a stored path to the given situation is determined by comparing the start and end points, and Bi-directional RRT (BiRRT) is used to repair infeasible paths by filling in the gaps caused by the obstacles along the path. One of the uses of the generative planner component of our contributed method is to bridge the gaps along the blocked sequences in a similar manner.
The Learning from Demonstration (LfD) concept [13] is utilized in [14] to simplify programming for pick and place tasks. Using a magnetic tracker to follow the index finger of the human teacher, static trajectories are demonstrated to an industrial manipulator equipped with a vacuum gripper to pick and place objects with certain shape constraints in an obstacle-free tabletop manipulation scenario. The captured trajectories are transformed to the robot's frame of reference and segmented to extract task primitives, which are then translated into robot-specific codes to be executed in order to replicate the demonstrated trajectory. Even though it may be suitable for factory environments, this approach lacks flexibility and has limited scalability as it requires new demonstrations for each new task environment configuration that could potentially include static obstacles.
The demonstration-guided motion planning (DGMP) framework [15] combines LfD with motion planning, where the demonstrated motions are recorded relative to the robot's torso as well as the OOIs in the task environment, and dynamic time warping (DTW) [16] is used for aligning multiple demonstration sequences. Implicitly encoded constraints, such as keeping a full spoon level to the ground to avoid spilling, are automatically extracted from the task execution sequences by looking at the low variance portions of the data. Recorded and processed trajectories are reused in relatively similar scenarios to the ones used for learning, and generative planning is utilized to bridge the gaps when obstacles are present along the way. This algorithm also stores full end-to-end paths, as in [12].
In order for the full length paths to be meaningful for reuse, a large number of them covering various situations should be stored. This is one of the aspects where our proposed approach differs from the methods presented in the literature. Instead of storing a large set of complete end-to-end solutions, our algorithm stores only a few partial relative trajectories that cover the critical regions around the OOIs and the destinations so that these partial executions can be reused in any scenario independent of the actual configuration of the environment.
4. Experience-Based Mobile Manipulation
Pick and place tasks require planning and execution of delicate reaching moves. As the manipulator approaches the OOI and/or the destination, these moves become more important and critical to the success of the manipulation task. Likely due to that criticality, humans exploit a small set of those target-specific delicate reaching and manipulation moves when performing everyday pick and place tasks even though there are infinite ways of reaching for and manipulating objects [17]. Inspired by these observations, we develop the following components and bring them together harmoniously to achieve experience-based prehensile mobile manipulation:
A set of local reaching moves represented as sequences of state-action pairs that have been successfully performed in the past,
A generative planner to bridge the gaps at various stages of planning and execution,
An execution monitoring system that ensures accurate reiteration of the memorized sequences.
In the remainder of this section, we elaborate on how the target-specific local reaching moves (i.e., sequences) are defined and acquired, how the stored motion sequences are reiterated in such a way as to obtain the previously experienced results, and how the solutions provided by the generative planner are merged with the available sequences so that they complement each other effectively to yield faster and more reliable executions.
4.1 Motion Sequences for Delicate Reaching and Manipulation
Having observed the critical yet repetitive nature of the finite sets of target-specific delicate reaching and manipulation moves, we let our mobile manipulator robot simply
It is necessary to define a compact yet extensive representation for these moves in order to address several issues simultaneously. For instance, in some cases, as in our problem domain, these moves may also have additional pose and dynamics constraints; that is, a chair, a utility cart, or a stretcher can only be reached and grabbed when approached from behind within certain orientation and velocity limits, otherwise they get bumped into and pushed away without a successful grasp. Therefore, it is important for the robot to capture the velocity profile of the motion as it performs those delicate moves and memorizes them. Another important point is to make these fine trajectories independent of the target's global pose in order to maximize their utilization through potential reuse in various environmental configurations. This is achieved by defining and storing the trajectories with respect to the target's frame of reference; that is, if the robot is to pick up the OOI, then the trajectories are in the OOI's frame of reference, and if it is to place the OOI, then the trajectories are in the destination's frame of reference.
Formally, our algorithm attaches a static global frame of reference, ϕ
where

Visualization of (a) pick and (b) place sequences associated with the utility cart object and its potential destination
As the number of stored sequences and their lengths grow, processing efficiency and scalability starts to become a problem, especially if the sequences are being recorded at each step of the robot's perception cycle, which in our case corresponds to 30Hz. To address this problem, our algorithm sparsifies the sequences by granting access to them at every

Visualization of the sequences around the chair object and their
Since the sequences are defined relative to the target, some of the entry points may not be reachable due to direct obstruction or potential collisions depending on the target's global pose and the state of its immediate surrounding. One important role of the entry points is to provide a way to sparsely (and hence quickly) check for collisions along the sequences to figure out which portions of them are usable for any given environment configuration. For this purpose, each entry point features a binary flag indicating it
Once reached (Section 4.2) and merged through an entry point with index β

The robot reiterates a collision-free sequence simply by executing the motion commands associated with each frame of the sequence one after the other. Passed entry points are marked as
4.2 Generative Planner
In our contributed method, the sequences representing local fine manipulation solutions are used analogously to “cases” in a Case-Based Reasoning/Planning (CBR/P) system [8–10, 18, 19]; that is, the robot knows how to handle the rest of the problem when it recalls the case. However, since our “cases” (i.e., sequences) are already fine-tuned solutions, we do not attempt to adapt them to the current problem configuration at hand as would be done in the
Planning from scratch is only needed to navigate the robot along a collision-free path towards the sequences as the sequences take care of the delicate moves that need to be performed within the close proximity of the target to achieve the task. For that purpose, we utilize sampling-based generative planning, as the underlying representation aligns well with our definition of the robot's fine reaching and manipulation trajectories. We specifically use a set of RRT variant algorithms, namely the original RRT itself [1, 2], RRT-Connect [3], and RRT* [4], the working principles of which are provided in Section 2. As mentioned in Section 2, it is possible to bias the growth of the RRT towards the goal by sampling the goal pose with probability
We define the following strategies for utilizing the feasible entry points to guide the generative plannaing process.
When these various guidance strategies are enabled, we observe a considerable reduction in the number of generated RRT nodes and more direct paths towards the fine manipulation regions around the targets, as we show in Section 5 as part of our extensive experimental evaluation. Algorithm 2 provides the pseudocode of our experience-based sampling function that utilizes the stored sequences based on the aforementioned strategies. This function replaces the
Generative planning is not only used for navigating the robot towards the sequences that are far away, but also when the robot is reiterating a sequence and the upcoming entry point is observed to be infeasible (i.e., obstructed). In those cases, the generative planner guides the robot around the obstacle to merge into another feasible segment and proceed with the execution of the corresponding sequence, as was done in [12, 15] to bridge the gaps along the stored trajectories. Figure 6 illustrates this phenomenon. As mentioned before, and shown in the figure, infinite planning and execution loops are prevented by marking the visited entry points as infeasible (red).

Reiteration of a sequence that is blocked by an obstacle. When the robot detects the upcoming entry point along the path to be
4. 3 Execution Monitoring
Uncertainty is abundant in the world of robotics. Due to uncertainty in sensing and actuation, there may be discrepancies between the expected outcome of a particular action and the actual observed one. Therefore, actively monitoring the robot during task execution is necessary in order to detect and handle problems caused by various sources of uncertainty that may be rooted in the task environment as well as the robot itself [21].
In order to achieve the task, we need to ensure that the sequences are reiterated as originally provided, mainly because those delicate reaching and manipulation moves are acquired and stored without any generalization. For that purpose, the robot keeps track of its execution by comparing its currently observed state to the expected one as it passes through each entry point while reiterating a sequence. The sequences provide the robot with the information that the observed state should be
5. Experimental Evaluation
We conducted extensive experiments in the Webots
1
mobile robot simulation environment [22], where we modelled our omni-directional mobile manipulator robot and several passively-mobile, manipulable office and hospital objects shown in Figure 1. The robot has a footprint of roughly 0.27
For the acquisition of the fine reaching and manipulation sequences, we followed a LfD approach and joysticked the robot to demonstrate to it how to pick up and place the manipulable objects available in the simulated environment. It would also be possible for the robot to acquire this experience through self exploration. Through these demonstrations, we provided the robot with four pick and four place sequences per OOI. Based on the dimensions of the objects in our task environments and the level of motion precision expected from the robot, we empirically decided to sparsify these sequences by defining entry points at every 30
In utilizing the sequences for guiding the generative planning process, we set the bias for the entry points and the sequences to be
Figure 7 provides a visual overview of the effects of various subgoal utilization strategies explained in Section 4.2 on the generated RRT branches (orange) as well as the solution paths (blue) for navigating the utility cart object from the upper right corner of the environment to the desired destination located at the bottom left corner. In these preliminary runs, the spread of the branches, and hence the number of nodes, seems to decrease while the directness of the solution paths increases with the increased utilization of the sequences. This kind of effect is expected as the attraction of a fine manipulation region increases with the incorporation of more of the individual entry points into the planning guidance process. The numerical details of these five preliminary runs are provided in Table 1, where a correlation between the number of nodes and the planning time can be observed.
RRT planning statistics for various subgoal utilization strategies, the outcomes of which are visualized in Figure 7. 38 out of 41 subgoals were feasible in this scenario

Different ways of combining the RRT generative planner and the sequences to achieve the task; (a) using none of the keyframes as subgoals, (b) early termination by coincidental reaching, (c) sampling subgoals within goal probability, (d) sampling sequences, and (e) sampling individual subgoals. The general tendency is observed as being towards a reduced number of generated nodes, hence shorter planning times, and an increased directness of the generated paths with the increased utilization of the entry points as subgoals.
In order to thoroughly investigate how our experience-guided mobile manipulation approach performs under various conditions, we ran separate pick up and placement planning tests in five randomly configured task environments with four manipulable objects: a chair, an overbed table, a utility cart, and a stretcher (shown in Figure 1(a)). Due to the inherently random nature of the RRT-based algorithms, each of these tests were run 30 times for each combination of the three generative planners and the four subgoal utilization methods as well as the base case of planning only for reaching the actual goal (the “none” case). 2
The box plots shown in Figure 8 present the planning performances obtained with each of these combinations for picking up and placing the chair object. Similar relative performances were observed for the other manipulable objects. Even though we present the number of nodes generated in the planning process as the measured metric, it is an implicit indicator of the relative time requirements of each of those combinations as the planning time decreases with the decreasing number of generated nodes in RRT-based planners, as previously shown in Table 1.

Planning performance for picking up ((a), (c), and (e)) and placing ((b), (d), and (f)) the chair object in five different task environments. The performance measure is the number of generated nodes by the RRT ((a) and (b)), the RRT-Connect ((c) and (d)), and RRT* ((e) and (f)) generative planners. The relative time requirements of each combination are implicitly reflected.
Looking at these plots, we see that sampling individual subgoals and sampling sequences result in the best performances (i.e., least number of nodes), in that order, in all environments for all planners. On the other hand, sampling subgoals within goal probability has a varying relative performance depending on the environment and the planner. It performs the poorest in the majority of the task environments for the RRT and RRT* planners. Considering its definition, we see that this strategy actually decreases the probability of individual entry points being sampled, since it essentially involves a two-stage process where the rarely occurring decision of sampling a goal is followed by which (sub)goal to sample. Therefore, instead of consistently directing the planner towards a (sub)goal in each rare occasion of deciding to sample a goal, it distracts the focus of the generative planner, diminishing the attractive properties of the fine manipulation region. In case the RRT-Connect planner is used, however, this strategy performs the third best in the majority of the task environments, though not significantly better than the base case. This effect can be explained by the way the RRT-Connect algorithm works, which is directly reaching the sample unless an obstacle is encountered. Therefore, once a (sub)goal is sampled, the algorithm tries to reach it along a straight line without any distraction or divergence. Coincidental termination has a comparable performance to the base case due to the very few number of sparse sequences, hence only a few alternative goals to reach coincidentally. Better performances with this strategy are observed when the number of sequences and their densities are increased.
Figure 9 provides sequence utilization statistics of the tests presented in Figure 8. The bars with the same colours as the ones in Figure 8 indicate in what percentage of the several planning trials with each combination the planner eventually ended up reaching an entry point. We see that sampling individual subgoals and sampling sequences have high sequence utilization percentages, which is expected. Even the coincidental termination strategy seems to be making decent use of the sequences. Although the percentages reported for the strategy of sampling subgoals within goal probability seems to be unexpectedly high, this situation has a perfect explanation. Since this strategy combines the actual goal with the set of subgoals and picks it with a probability of

Sequence utilization while planning and executing pick ((a), (c), and (e)) and place ((b), (d), and (f)) tasks for the chair object in five different environments. Utilization percentage is measured for the RRT ((a) and (b)), the RRT-Connect ((c) and (d)), and RRT* ((e) and (f)) generative planners. (f)) generative planners.
Another experiment we conducted was a comparison of the effectiveness of utilizing every feasible segment of the sequences and hopping among them during reiteration to discarding the segments from the beginning of the sequences all the way to the last occluded part, and utilizing only the remaining segments for planning and reiteration. We ran the complete task of picking up, navigating, and placing a particular OOI 10 times with each of the three generative planners for both cases, and measured the task completion times for each run. Table 2 provides the total number of entry points on the pick and the place sequences of the particular OOI used in this experiment as well as the amount utilized in different stages of the manipulation task, from where it can be inferred that the sequences were severely obstructed. Figure 10 illustrates task completion time statistics obtained for both scenarios. Looking at the mean values, marked with stars, we see that utilizing all feasible segments of the available sequences helps the robot perform better. The reason for greater variance is that sometimes the robot merges into a segment that is obstructed along the way to the goal and it has to hop to other feasible segments to proceed, whereas sometimes it ends up on an unobstructed segment of a sequence that leads it directly to the goal. On the other hand, when the blocked segments are discarded, the robot can only end up on a segment unobstructed all the way to the goal if the planner does not take it directly to the actual goal itself. When all feasible segments are utilized, due to the greater attraction effect created by the greater quantity of the entry points, the robot plans a more direct path more quickly, and hence gets to the fine manipulation region earlier, and completes the task in less time.
Number of entry points used when partial sequence segments are utilized versus when they are discarded

Durations for the complete pick and place task when the feasible segments of the partially occluded sequences are utilized in planning and execution versus when the segments from the beginning of the sequences to the occluded part are discarded
The last experiment we conducted was to demonstrate the advantage of utilizing the attraction of a fine manipulation

Visualization of the stretcher placement scenario where the direct path to the actual goal pose is blocked
We experimented with each combination of the three generative planners and the four subgoal utilization strategies as well as the base case with an increased goal bias. We ran 30 tests for each of these combinations. The sampling probability of the single actual goal used in the base case was set to be equal to the highest sampling probability of the fine manipulation region obtained with the individual subgoal sampling strategy through adding the sampling probabilities of individual subgoals

Planning performance using various subgoal utilization strategies versus merely increasing the goal bias in the scenario where the direct path to the goal is blocked (Figure 11)
6. Conclusion and Future Work
The delicateness and precision required by the pick and place activities performed in everyday living contexts may strain even the state-of-the-art planners, demanding a significant amount of time and computational resources to generate successful solutions. However, careful examination of such tasks reveal recurring manipulation patterns, which usually occur within the close vicinity of the object and the destination. Fine manipulation within those regions is critical to the overall success of the task.
Motivated by this observation, we contribute an experience-based mobile manipulation method where the robot memorizes a small number of critical yet recurring target-specific fine reaching and manipulation moves as state-action sequences, and reuses them whenever possible to guide manipulation planning and execution. We show through extensive experimentation that this guidance helps reduce task completion times considerably when combined with a sampling-based generative planner, while increasing the chance of successful task completion by carefully reiterating previously executed and known-to-be-successful fine moves. Our approach harmoniously combines the already available partial plans and executions with those generated from scratch, yielding fast, reliable, and repeatable solutions.3
Application of the proposed approach to both prehensile and non-prehensile manipulation problems in higher dimensional setups, handling uncertainty in perception explicitly in addition to monitoring the overall execution, accumulating new experiences over time, and transferring learned delicate reaching and manipulation sequences among objects with similar properties are some of the potential research problems to tackle in the future.
