Abstract
1. Introduction
Our goal is to enable robots to plan and execute
Forceful operations, as defined by Chen et al. (2019), are the exertion of a wrench (generalized force/torque) at a point on an object. These operations are intended to be quasi-statically stable, that is, the forces are always in balance and produce relatively slow motions, and will generally require some form of fixturing to balance the applied wrenches. For example, to open a push-and-twist childproof bottle the robot must exert a downward force on the cap while applying a torque along the axis of the bottle. The robot must be in a configuration that allows it to apply enough force to accomplish the task, and it must also secure the bottle to prevent it from moving during the task.
To accomplish these complex, multi-step forceful manipulation tasks, robots need to make discrete decisions, such as, for example, whether to push on the bottle cap with the fingers, the palm or a tool, and whether to secure the bottle via frictional contact with a surface, with another gripper or with a vise. The robot must also make continuous decisions such as the choice of grasp pose, robot configurations, and robot trajectories. Critically, all these decisions interact in relatively complex ways to achieve a valid task execution.
Figure 1 illustrates that there are
Figure 1. Opening a childproof bottle involves executing a downward push and twist on the cap, while fixturing the bottle. Our system can reason over a combinatorial number of strategies to accomplish this forceful manipulation task, including push-twisting with various parts of its end effector, push-twisting with a tool (in blue), fixturing with a vise (in gray), fixturing against the table, or fixturing against a high-friction rubber mat (in red).
Choosing a strategy corresponds to making some of the aforementioned discrete decisions, for example, deciding how to fixture the bottle. We define strategies as sequences of parameterized high-level actions. Each action is implemented as a controller parameterized by a set of constrained discrete and continuous values, such as robot configurations, objects, grasp poses, and trajectories. Our goal is to find both a sequence of high-level actions (a strategy) and parameter values for those actions, all of which satisfy the various constraints on the robot’s motions.
To produce valid solutions for a wide range of object and environment configurations, the robot must be able to consider a wide range of strategies. As discussed above, small changes, such as decreasing a friction coefficient, may necessitate an entirely new strategy. In a different environment, the robot may need to first move some blocking object out of the way (Figure 5) or relocate an object in order to achieve a better grasp. Approaches that attempt to explicitly encode strategies in the form of a policy, for example, via a finite state machine or a fixed action sequence, will generally fail to capture the full range of feasible strategies (Holladay et al., 2019; Michelman and Allen, 1994). Methods that attempt to learn such a policy will need a very large number of interactions to explore this rich and highly constrained solution space.
We propose addressing forceful manipulation problems by planning over the combinatorial set of discrete/continuous choices. We use an existing
To enable this approach to forceful manipulation, we introduce force-related constraints: the requirement to fixture objects and the
Furthermore, we enable the planner to choose strategies that are
Our paper makes the following contributions:
• Characterize forceful manipulation in terms of constraints on forceful kinematic chains.
• Generate multi-step plans that obey force- and motion-related constraints using an existing TAMP framework.
• Formulate finding plans that are robust to uncertainty in the physical parameters of forceful kinematic chains as cost-sensitive planning.
• Demonstrate planning and robust planning for forceful manipulation in three domains (opening a childproof bottle, twisting a nut on a bolt, cutting a vegetable).
This paper is an extension of Holladay et al. (2021) and the additional contributions beyond that work are as follows: a new domain (cutting a vegetable), additional demonstrations relating to robust planning and more detailed descriptions of the planning framework and domain specifications.
2. Related work
This paper focuses on enabling force-based reasoning in planning multi-step manipulation tasks. In this section, we review various strategies for incorporating force-based reasoning and force-related constraints across various levels of planning: from single step actions to fixed action sequences to task and motion planning.
2.1. Force-Reasoning in single actions
Several papers have considered force requirements for generating specialized motions. Gao et al. use learning from demonstration to capture “force-relevant skills,” defined as a desired position and velocity in task space, along with an interaction wrench and task constraint (Gao et al., 2019). Berenson et al. incorporated a torque-limiting constraint into a sampling-based motion planner to enable manipulation of a heavy object (Berenson et al., 2009). These papers consider force constraints when generating individual actions, while we consider force constraints over a sequence of actions.
One important category of reasoning with respect to force actions is stabilizing, or fixturing an object. The goal of fixturing is to fully constrain an object or part, while enabling it to be accessible (Asada and By, 1985). Fixture planning often relies on a combination of geometric, force, and friction analyses (Hong Lee and Cutkosky, 1991). There are various methods of fixturing including using clamps (Mitsioni et al., 2019), custom jigs (Levi et al., 2022), or using another robot to directly grasp or grasp via tongs (Watanabe et al., 2013; Stuckler et al., 2016; Zhang et al., 2019). Additionally some strategies, such as fixtureless fixturing and shared grasping, rely on friction and environmental contacts to fixture without additional fixtures or tools (Chavan Dafle and Rodriguez, 2018; Hou et al., 2020). In this paper, we consider fixturing via grasping and environmental contacts.
2.2. Force-Reasoning in sequences of actions
Several papers have considered reasoning over forces across multi-step interactions. Chen et al. define “forceful operations” as a 6D wrench
Manschitz et al. define “sequential forceful interaction tasks” as those characterized by point-to-point motions and an interaction where the robot must actively apply a wrench (Manschitz et al., 2020). The method first learns, from demonstrations, a set of movement primitives, which are then sequenced by learning a mapping from feature vectors to activation of the primitives. The goal of their work is to enable the robot to reproduce the demonstrations, and therefore the sequencing is pre-scripted.
Michelman and Allen formalize opening a childproof bottle via a finite state machine, where the robot iteratively rotates the cap, while pressing down, and then attempts to lift it until the cap moves (Michelman and Allen, 1994). If the cap is not yet free, the robot continues to rotate the cap. In their formulation, the bottle is fixtured and some of the continuous parameters, such as the grasp, are fixed.
Holladay et al. frame tool use as a constraint satisfaction problem where the planner must choose grasps, arm paths, and tool paths subject to force and kinematic constraints such as joint torque limits, grasp stability, and environment collisions (Holladay et al., 2019). The planner outputs a fixed sequence of actions and thus the force and kinematic constraints do not impact the sequence of actions, that is, the choice of strategy. In contrast, in this paper, we are interested in searching over various possible strategies.
2.3. Multi-Step planning with constraints
Solving for a sequence of actions parameterized by constrained and continuous values lies at the heart of multi-modal motion planning (MMMP) (Hauser and Latombe, 2010; Hauser and Ng-Thow-Hing, 2011) and task and motion planning (TAMP) (Garrett et al., 2020a). MMMP plans motions that follow kinematic modes, for example, moving through free space or pushing an object, and motions that switch between discrete modes, for example, grasping or breaking contact, where each mode is a submanifold of configuration space.
TAMP extends MMMP by incorporating non-geometric state variables and a structured action representation that supports efficient search (Gravot et al., 2005; Plaku and Hager, 2010; Kaelbling and Lozano-Pérez, 2011; Srivastava et al., 2014; Toussaint, 2015; Dantam et al., 2016; Garrett et al., 2017).
Most TAMP algorithms, although not all (Toussaint et al., 2020), have focused on handling collision and kinematic constraints. This paper focuses on integrating force-based constraints with an existing TAMP framework, PDDLStream (Garrett et al., 2020b), which we discuss in Section 6.
Most similar to our work, Toussaint et al. formulate force-related constraints that integrate into a trajectory-optimization framework (LGP) for manipulation tasks (Toussaint et al., 2020). While LGP can search over strategies, in the aforementioned paper the strategy was provided and fixed. While Toussaint et al. take a more generic approach to representing interaction, which can capture dynamic manipulation, their use of a 3D point-of-attack (POA) to represent transmitted 6D wrenches falls short of representing the frictional constraints of the contact patches.
Levihn and Stilman present a planner that reasons over which combination of objects in the environment will yield the appropriate mechanical advantage for unjamming a door (Levihn and Stilman, 2014). The type of the door directly specifies which strategy to use (lever or battering ram) and the planner considers the interdependencies of force-based and geometric-based decisions for each application. The planner is specific to the domain and does not provide a general framework for planning with force constraints.
Several other systems consider forces either as a feasibility constraint or as a cost. In assessing feasibility for assembly plans, Lee and Wang (1993) account for the amount of force required to connect two pieces in an assembly. Akbari et al. (2015) focus on incorporating “physics-based reasoning” in a TAMP system that sequences push and move actions by formulating action costs with respect to power consumed and forces applied. Again, each of these planners presents a domain-specific approach to considering force.
In this work, we consider generating plans that are robust to state uncertainty, with a particular focus on physical properties of the objects. Several TAMP frameworks approach uncertainty by planning in belief space, the space of probability distributions over underlying world states (Kaelbling and Lozano-Pérez, 2013; Hadfield-Menell et al., 2015; Garrett et al., 2020c). These planners generate action sequences that can also involve information-gathering sensing actions. In contrast, our system executes open-loop plans and focuses on uncertainty that impacts the force-related constraints.
3. Problem domain
We define
Specifically, we constrain that the robot, including any grasped object, must be “strong” or “stable” enough to exert the forceful operation, that is, the robotic system must be able to exert the desired wrench of the forceful operation without experiencing excessive force errors or undesired slip. We additionally constrain that any object that the forceful operation is acting on must be secured, or fixtured, in a way that prevents its motion.
Our aim is to perform forceful manipulation tasks in a wide range of environments, where there is variation in the number, type, poses, and physical parameters, such as masses and friction coefficients, of the objects. We also characterize
To ground our work in concrete problems, we consider three example tasks within the forceful manipulation domain: (1) opening a childproof bottle, (2) twisting a nut on a bolt, and (3) cutting a vegetable. For each task, we define the forceful operation(s) that represent the task (also called the task wrench(es)), strategies for exerting those forceful operation(s), and what object must be fixtured. We then discuss various methods of fixturing objects.
3.1. Childproof bottle opening
In the first domain, the objective is to open a push-and-twist childproof bottle, as introduced in Section 1. We specify the push-twist, required before removing the cap, as the forceful operation of applying wrench (0, 0, −
Figure 2. Top Left: Opening a childproof bottle involves executing a push-twist on the cap, while fixturing the bottle. Top Right: Twisting a nut requires exerting a torque about the nut, while fixturing the bolt. Bottom: To cut, the robot first presses down vertically and then slices horizontally. The object being cut must be fixtured.
3.2. Nut-twisting
In the second domain, the robot twists a nut on a bolt by applying the wrench (0, 0, 0, 0, 0,
Figure 3. To twist a nut on the bolt, the robot can use either its fingers or a spanner (in blue). While twisting, the robot must fixture the beam that the bolt is attached to. Here we show two fixturing strategies: using another robot to grasp the beam and weighing down the beam with a large mass (in green).
We do not consider the more general task of twisting a nut
3.3. Vegetable cutting
In the third domain, the robot uses a knife to cut a vegetable (as shown in Figure 4). Cutting is a complex task that involves fracture, friction and changing contacts (Jamdagni and Jia, 2019, 2021; Mu and Jia, 2022). There have been a variety of approaches to tackling this cutting-edge topic such as learning the task-specific dynamics (Mitsioni et al., 2019; Zhang et al., 2019; Rezaei-Shoshtari et al., 2020), developing specialized simulators (Heiden et al., 2021), and proposing adaptive controllers (Zeng and Hemami, 1997a; Long et al., 2013, 2014). In this work, we adopt a simplified approach to cutting.
Figure 4. The robot uses a knife to cut an object, while fixturing the object. Here, a vise is used to fixture a cucumber (left) and a banana (right) while they are being cut with a knife (blue).
We take inspiration from Mu et al. (2019) and assume that, while being cut, the object will have negligible deformation and that dynamic effects are insignificant. Similarly to their proposed cutting process, we formulate cutting as a two-stage process where the knife first exerts downward force, followed by a translational slice. Thus, this task has two forceful operations: the downward force (0, 0, −
3.4. Fixturing
While performing any forceful operation, the robot must fixture the object it is exerting force on to prevent its motion. There are a wide variety of ways to fixture, which are not unique to any particular domain. In this paper, as shown across Figure 1, Figure 3, and Figure 4, we present several different fixturing methods, such as:
• Grasping the object with another robot
• Grasping the object in a vise
• Weighing the object down with a heavy weight
• Exerting additional downward force to secure the object with friction from a surface
For the second method, in practice we use a table-mounted robot hand as the vise. For the last method, the frictional surface can be either the table or a higher-friction rubber mat, and the robot can exert this additional downward force through various contacts: fingertips, a palm, or a grasped pusher tool.
4. Approach
Forceful manipulation tasks are characterized by constraints related to exerting force. We view a robotic system, composed of the robot joints, grasps and other possible frictional contacts, as a
When the robot is performing a forceful operation, we can capture whether the system is strong enough to exert the task wrench by assessing if the forceful kinematic chain is maintained, that is, if each joint is stable under the imparted wrench. We informally use the word
For each class of joint, we describe a mathematical model that characterizes the set of wrenches that the joint can resist; the joint is stable if the imparted wrench lies within this set. To test the constraint, the planned task wrench, combined with the wrench due to gravity, is propagated through the joints of the forceful kinematic chain and each joint is evaluated for its stability.
Given our domain description, there are two forceful kinematic chains when performing a forceful operation: the chain of the system exerting the task wrench and the chain of the system fixturing the object. Returning to the bottom rightmost example of Figure 1, the exertion chain was described above and the fixturing chain is one joint: the vise’s grasp on the bottle.
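The chain test described above can be sketched as a loop over joints. This is only an illustration under stated assumptions: the `Joint` class, the `to_joint_frame` and `can_resist` callables, and the numeric limits are hypothetical names and values, and real frame transforms are replaced by an identity map for brevity.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Wrench = Sequence[float]  # (fx, fy, fz, tx, ty, tz)


@dataclass
class Joint:
    """Hypothetical joint model: a frame change plus a stability test."""
    name: str
    to_joint_frame: Callable[[Wrench], Wrench]  # express the wrench in this joint's frame
    can_resist: Callable[[Wrench], bool]        # True if the wrench lies in the resistible set


def add_wrenches(a: Wrench, b: Wrench) -> List[float]:
    """Component-wise sum of two wrenches expressed in the same frame."""
    return [x + y for x, y in zip(a, b)]


def chain_is_stable(joints: List[Joint], task_wrench: Wrench, gravity_wrench: Wrench) -> bool:
    """The chain holds only if every joint can resist the propagated wrench."""
    total = add_wrenches(task_wrench, gravity_wrench)
    return all(j.can_resist(j.to_joint_frame(total)) for j in joints)


# Toy chain (assumed values): a grasp limited in tangential force and a
# robot joint limited in torque about z.
identity = lambda w: list(w)
grasp = Joint("grasp", identity, lambda w: (w[0] ** 2 + w[1] ** 2) ** 0.5 <= 10.0)
elbow = Joint("elbow", identity, lambda w: abs(w[5]) <= 2.0)

gravity = [0.0, 0.0, -1.0, 0.0, 0.0, 0.0]
light_task = [0.0, 0.0, -20.0, 0.0, 0.0, 1.0]
heavy_task = [0.0, 0.0, -20.0, 0.0, 0.0, 3.0]

print(chain_is_stable([grasp, elbow], light_task, gravity))
print(chain_is_stable([grasp, elbow], heavy_task, gravity))
```

A single unstable joint is enough to reject the whole chain, which is what pushes a planner toward a different grasp, tool, or fixturing strategy.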
We need a planning framework that generates multi-step manipulation plans that are flexible to a wide range of environments and that respect various force- and motion-related constraints. We opt to cast this as a
Each action is implemented by a parameterized controller and is associated with constraints relating the discrete and continuous parameter values that must be satisfied for the controller to achieve its desired effect. The discrete parameters are values such as objects, regions, robot arms, and the continuous parameters are values such as robot configurations, poses, paths, and wrenches.
Solving a TAMP problem corresponds to finding a valid sequence of actions and finding the discrete and continuous parameters of those actions that satisfy the constraints. These two problems are tightly connected, since the force- and motion-related constraints impact whether it’s possible to find a valid sequence of actions.
As an illustration of this connection, we return to the example mentioned in Section 1, where the table top does not provide enough friction to fixture the bottle. This corresponds to a forceful kinematic chain where it is impossible to find a set of parameters to make it stable. In this case, the planner searches for a new set of actions, such as picking and placing the bottle into a vise, in order to create a different, stable forceful kinematic chain.
As another example, Figure 5 shows the robot completing a sequence of actions in order to cut a vegetable while it is fixtured in a vise. Since the red block on the vise prevents the robot from directly placing the vegetable in the vise, the robot constructs a plan to move the red block out of the way before fixturing and cutting the vegetable.
Figure 5. The goal of the robot is to cut the vegetable, using the knife (in blue). The vegetable must be fixtured, which can be achieved using the vise. However, the robot cannot secure the vegetable in the vise because a red block is preventing a collision-free placement. Our system constructs a plan where the robot first picks up the red block and places it on the table, out of the way. The robot can then pick up the vegetable and fixture it in the vise. Next, the robot grasps the knife and uses it to cut the fixtured vegetable.
Both of these examples illustrate that the interleaved constraint evaluation and action search in the TAMP framework is critical to enabling the planner to solve the tasks in a variety of environments.
In order for plans to reliably succeed in a variety of environments, the robot must also be able to account for uncertainty in the world. In this work, we focus on uncertainty in the physical parameters of the stability models used to evaluate the forceful kinematic chain constraint. For example, if the stability of a grasp used during a forceful operation depends on the precise value of a friction coefficient, this choice of grasp is not very robust to uncertainty.
In order to find plans that are
In this work, we assume a quasi-static physics model and, as input, are given geometric models of the robot, the objects and the environment along with the poses of each object. Estimates of physical parameters, such as the object’s mass and center of mass, and friction coefficients, are known. In robust forceful manipulation, we relax the need for exact estimates of physical parameters, instead using ranges.
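To make the idea of replacing exact estimates with ranges concrete, the sketch below estimates how often a toy grasp stays stable when the friction coefficient is drawn from a range. Everything here is an assumption for illustration: the uniform distribution, the simplified Coulomb test, and the function names are hypothetical and are not taken from the paper's formulation.

```python
import random


def grasp_is_stable(mu: float, normal_force: float, required_friction: float) -> bool:
    """Toy Coulomb test: friction capacity mu * N must cover the required tangential force."""
    return mu * normal_force >= required_friction


def estimate_success_probability(mu_range, normal_force, required_friction,
                                 n_samples=10_000, seed=0):
    """Monte Carlo estimate of stability when mu is only known to lie in mu_range.

    Assumes mu is uniformly distributed over the range (an illustrative choice).
    """
    rng = random.Random(seed)
    lo, hi = mu_range
    stable = sum(
        grasp_is_stable(rng.uniform(lo, hi), normal_force, required_friction)
        for _ in range(n_samples)
    )
    return stable / n_samples


# With N = 20 and a required friction force of 5, the grasp needs mu >= 0.25.
robust = estimate_success_probability((0.3, 0.6), 20.0, 5.0)
fragile = estimate_success_probability((0.2, 0.3), 20.0, 5.0)
print(robust, fragile)
```

A choice whose stability holds across the whole range is preferable to one that holds only for part of it, which is the intuition behind preferring robust parameter choices.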
5. Forceful kinematic chain
We have defined a
Figure 6. Along each joint of the forceful kinematic chain, we first project the expected wrench into the subspace defined by each joint and then verify if the joint is stable under that wrench. The figure illustrates the wrench limits for each joint: for circular patch contacts, we check the friction force against an ellipsoidal limit surface model, and for each robot joint we check against the 1D torque limits.
For joint types, we consider planar frictional joints and robot manipulator joints and define the mathematical models that characterize the set of wrenches each joint type can resist. While our treatment in this paper is limited to these joint types, alternative joints or models could easily be integrated, such as non-planar frictional joints (Xu et al., 2020).
In defining planar frictional joints, we consider an example joint: the robot’s grasp on the blue pusher tool in Figure 6. In the three directions of motion outside the plane of the grasp, the motion of the tool in-hand is prevented by the geometry of the hand, that is, we assume the fingers are rigid such that the tool cannot translate or rotate by penetrating into the hand. Thus, any wrenches exerted in those directions are resisted kinematically by non-penetration reaction forces, which we assume are unlimited. In the other three directions, motion within the plane of the grasp is resisted by frictional forces. We represent the boundary of the set of possible frictional wrenches in the three-dimensional friction subspace of the plane of contact with a limit surface (Goyal et al., 1991). We utilize two ways to approximate the limit surface, depending on the characteristics of the planar joint (Section 5.1, Section 5.2).
For the robot’s joints, the set of wrenches that can be transmitted is bounded by the joint torque limits (Section 5.3).
5.1. Limit surface for small circular patch contacts
For small circular patch contacts with uniform pressure distributions, we use an
Having transformed the wrench into the contact frame, we check if this wrench lies in the ellipsoid, which would indicate a stable contact:
As an example, in Figure 7 we compare two possible grasps on a knife. For each grasp, we visualize the limit surface and an exerted wrench, transformed to the contact frame. In this example we consider the first step of vegetable cutting: exerting the downward force.
Figure 7. Here we show two possible grasps on the knife as it cuts a vegetable. For each grasp, we visualize the corresponding limit surface with the propagated task wrench. The top grasp is not stable, as the wrench lies outside the boundary of the limit surface. In contrast, the bottom grasp, which leverages kinematics to resist the large torque, is stable.
For both grasps, the friction coefficient, normal grasping force and radius of contact (
In both grasps, transforming the downward force of this cutting action into the contact frame generates a substantial amount of torque that the grasp must resist.
However, the grasp shown in Figure 7-top, which grasps the top of the handle, largely relies on frictional forces to resist this torque. We can imagine if the robot were to use this grasp, the knife could pivot in the robot’s hand as it moved down to cut the vegetable. As illustrated by the projected wrench (in red) falling outside of the ellipsoid, this grasp is unstable with respect to the forceful operation.
In contrast, in the grasp shown in Figure 7-bottom, which grasps the side of the handle, the large force and torque are largely resisted kinematically and the grasp is very stable. Again, looking at the grasp, the geometry of the fingers, rather than friction, prevents the knife from sliding or pivoting in the hand.
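The ellipsoid test for a small circular patch can be sketched as follows. This uses a commonly cited ellipsoidal approximation and is an assumption, not necessarily the exact expression used in this paper: the maximum friction force is taken as μN and the maximum torsional moment as cμNr, where c = 2/3 follows from integrating a uniform pressure over a disc. The function name and the numeric values are hypothetical.

```python
def ellipsoid_limit_surface_stable(wrench_xym, mu, normal_force, radius, c=2.0 / 3.0):
    """Check a planar frictional wrench against an ellipsoidal limit surface.

    wrench_xym = (f_x, f_y, m_z), expressed in the contact frame.
    c is an assumed pressure-distribution constant (2/3 for uniform pressure on a disc).
    """
    fx, fy, mz = wrench_xym
    f_max = mu * normal_force               # largest pure friction force
    m_max = c * radius * mu * normal_force  # largest pure torsional moment
    return (fx / f_max) ** 2 + (fy / f_max) ** 2 + (mz / m_max) ** 2 <= 1.0


# Assumed values: mu = 0.5, N = 20 N, contact radius = 1 cm.
small_torque = ellipsoid_limit_surface_stable((3.0, 0.0, 0.02), 0.5, 20.0, 0.01)
large_torque = ellipsoid_limit_surface_stable((3.0, 0.0, 0.20), 0.5, 20.0, 0.01)
print(small_torque, large_torque)
```

Because the torsional capacity scales with the small contact radius, even a modest torque in the friction subspace can push the wrench outside the ellipsoid, which mirrors the unstable top grasp in Figure 7.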
5.2. Limit surface for more general patch contacts
For contacts with more irregular shapes than a circle and with less uniform pressure distributions, we directly model the contact patch as a set of point contacts, each with its own normal force (localized pressure) and its own friction limits. Given a contact patch, we model the force it can transmit as the convex hull of generalized friction cones placed at the corners of the patch. Generalized friction cones, based on the Coulomb friction model, represent the frictional wrench that a point contact can offer (Erdmann, 1994). We represent the friction cone, FC, at each point contact with a polyhedral approximation of generators:
As an example, in the nut-twisting domain (Figure 3), we use the generalized friction cone to model the contact patch between the table and the beam holding the bolt, placing friction cones at the four corners of the beam. In evaluating the stability of fixturing the beam to the table via a heavy weight, the applied normal force, determined by the mass and location of the weight, is modeled as a simply supported 1D beam with a partially distributed uniform load.
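The construction of friction cone generators at the corners of a patch can be sketched as below. This is a minimal illustration under stated assumptions: the contact normal is taken as +z, each cone is approximated with `k` unit-normal-force generators, and the corner coordinates and friction coefficient are hypothetical. Testing whether a given wrench lies in the set spanned by these generators requires a linear program, which is not shown.

```python
import math


def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def friction_cone_generators(point, mu, k=4):
    """Wrench generators (f, p x f) for a point contact at `point` with normal +z (assumed)."""
    generators = []
    for i in range(k):
        theta = 2.0 * math.pi * i / k
        # Edge of the Coulomb cone: unit normal force plus tangential force of magnitude mu.
        force = (mu * math.cos(theta), mu * math.sin(theta), 1.0)
        torque = cross(point, force)  # moment of that force about the patch frame origin
        generators.append(force + torque)
    return generators


# Four corners of a hypothetical 20 cm x 10 cm rectangular patch.
corners = [(0.1, 0.05, 0.0), (0.1, -0.05, 0.0), (-0.1, 0.05, 0.0), (-0.1, -0.05, 0.0)]
patch_generators = [g for p in corners for g in friction_cone_generators(p, mu=0.5)]
print(len(patch_generators))
```

Increasing `k` tightens the polyhedral approximation of each cone at the cost of more generators in the subsequent membership test.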
5.3. Torque limits
The last type of joint we consider is the robot’s own joints, where the limit of each joint is expressed via its torque limits. We relate the wrenches at the end effector to robot joint torques through the manipulator Jacobian,
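The torque-limit test can be sketched using the standard static relation between an end-effector wrench and joint torques, τ = Jᵀw. The example below is an illustrative assumption rather than the paper's implementation: it uses a planar two-link arm with a 2D end-effector force, and the link lengths, configuration, and torque limits are hypothetical.

```python
import math


def planar_2link_jacobian(q1, q2, l1, l2):
    """Translational Jacobian of a planar two-link arm (rows: x, y; columns: joints)."""
    s1, c1 = math.sin(q1), math.cos(q1)
    s12, c12 = math.sin(q1 + q2), math.cos(q1 + q2)
    return [[-l1 * s1 - l2 * s12, -l2 * s12],
            [l1 * c1 + l2 * c12, l2 * c12]]


def joint_torques(jacobian, force):
    """Static joint torques tau = J^T f for an end-effector force f."""
    n_joints = len(jacobian[0])
    return [sum(jacobian[r][j] * force[r] for r in range(len(force)))
            for j in range(n_joints)]


def within_torque_limits(torques, limits):
    """Each joint is stable if its torque magnitude stays within its limit."""
    return all(abs(t) <= lim for t, lim in zip(torques, limits))


# Arm stretched along x with the elbow bent 90 degrees, pushing down with 20 N.
J = planar_2link_jacobian(0.0, math.pi / 2, 0.5, 0.5)
tau = joint_torques(J, (0.0, -20.0))
print(within_torque_limits(tau, (15.0, 5.0)))
print(within_torque_limits(tau, (8.0, 5.0)))
```

The same downward force is feasible or infeasible depending on the configuration and limits, which is why the robot configuration is one of the parameters the planner must choose.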
6. PDDLStream
Task and motion planning (TAMP) algorithms solve for a sequence of parameterized actions for the robot to take, also called the strategy or plan skeleton, and the hybrid parameters of those actions (Garrett et al., 2020a). The parameters are discrete and continuous values such as robots, robot configurations, objects, object poses, grasping poses, regions, robot paths, and wrenches. These parameters are subject to constraints, such as requiring that all paths are collision-free. The parameters of forceful manipulation tasks are also subject to the forceful kinematic chain constraint, which evaluates whether each joint in the chain is stable in the face of an exerted wrench.
In order to find sequences of parameterized actions that satisfy a wide range of constraints, we use PDDLStream, a publicly available TAMP framework (Garrett et al., 2020b). It has been demonstrated in a variety of robotics domains, including pick-and-place in observed and partially observed settings (Garrett et al., 2020c).
In this section, we begin with some introduction to PDDL and then discuss how PDDLStream extends PDDL to enable planning over discrete and continuous parameters.
6.1. PDDL background
A key challenge in solving TAMP problems is solving for the hybrid (discrete/continuous), constrained parameters. If all of the parameters were discrete, we could apply domain-independent classical planning algorithms from AI planning to search for sequences of actions. These planners use a predicate language, specifically the Planning Domain Definition Language (PDDL) (McDermott et al., 1998), to define the problem.
PDDL is inspired by STRIPS (Stanford Research Institute Problem Solver), a problem domain specification language developed for Shakey, a mobile robot that traveled between rooms and manipulated blocks via pushing (Nilsson, 1984). We next briefly describe PDDL in the context of a Shakey example. In the following subsection we will discuss how PDDL can be augmented to address problems with hybrid parameters.
In PDDL the state of the world is defined by a set of facts. A fact captures a relationship among state variables. We denote variables with italic symbols, e.g.
We denote constant variables with bold symbols, for example,
(
The action space is defined via a set of
As an example, the
An operator is
Given a set of facts that define the initial state, a set of facts defining the goal condition and a set of lifted operators, the planner searches for a sequence of ground operators to achieve the goal. Note that this involves solving for the sequence of operators
PDDL can be used to describe problems in finite domains; hence, all parameter values are
6.2. Algorithmic overview
PDDLStream extends PDDL to include continuous domains by allowing for the
For example, a configuration sampler could sample a collision-free joint configuration
To understand how sampling and search are integrated in PDDLStream, it is helpful to draw an analogy to the popular motion planning algorithm of probabilistic roadmaps (PRMs) (Kavraki et al., 1996). Probabilistic roadmaps use discrete graph search algorithms to solve a motion planning problem in the continuous space of robot configurations. To do this, PRMs first sample configurations and represent them as nodes in a graph. The edges of a PRM represent the one action the robot can make: moving from one configuration to another. Given an initial state, goal and graph, domain-independent graph search algorithms can be used to search for a path.
Likewise, PDDLStream first samples parameters by executing streams to certify facts. Since the parameters often must satisfy a complex set of constraints, the details of the sampling procedures are often more complex than in PRMs. The certified facts generated by the samplers are added to the initial state. Given an initial state, goal, and set of lifted operators, a discrete, domain-independent PDDL planner can be used to search for a plan composed of ground operator instances.
This algorithmic procedure is illustrated in Figure 8. The “Incremental” PDDLStream algorithm alternates between this process of sampling and searching until a plan is found, similarly to the way a PRM could continue to sample additional configurations and search the graph. In practice, we use the “Focused” PDDLStream algorithm, which more intelligently and efficiently samples in a lazy fashion, as detailed in Garrett et al. (2020b).
Figure 8. Algorithmic Flow of PDDLStream. The samplers are used to certify static facts. These facts, together with the domain, initial facts, and goal, serve as input to a PDDL planner, which searches for a plan. If a plan cannot be found, the algorithm generates more certified facts via the samplers.
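The alternation between sampling and searching can be sketched as a simple loop. This is a toy illustration under stated assumptions and not the PDDLStream API: `incremental_plan`, `sample_more_facts`, and `search` are hypothetical names, and the "search" is a stand-in that succeeds once a sufficiently large sampled grasp angle is available.

```python
def incremental_plan(initial_facts, sample_more_facts, search, max_rounds=10):
    """Alternate between sampling certified facts and discrete search."""
    facts = set(initial_facts)
    for round_index in range(max_rounds):
        facts |= set(sample_more_facts(round_index))  # certify new static facts
        plan = search(facts)                          # discrete search over known facts
        if plan is not None:
            return plan
    return None  # no plan found within the sampling budget


def sample_more_facts(round_index):
    """Toy sampler: each round certifies one more candidate grasp angle (degrees)."""
    return {("Grasp", round_index * 30)}


def search(facts):
    """Toy search: a plan exists only if some sampled grasp angle is at least 60."""
    angles = [fact[1] for fact in facts if fact[0] == "Grasp" and fact[1] >= 60]
    return [("pick", min(angles))] if angles else None


plan = incremental_plan({("HandEmpty",)}, sample_more_facts, search)
print(plan)
```

As with a PRM, a failed search is not a proof of infeasibility; it only means the facts sampled so far do not support a plan.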
6.3. Specifying a pick-and-place domain
In this section we use a pick-and-place domain as an example of specifying domains using PDDLStream, highlighting how discrete and continuous parameters are captured. Specifying a domain requires defining a set of fact types, lifted operators, and samplers. Both the facts and actions are specified in PDDL, while the samplers are implemented in Python.
6.3.1. Facts
Facts can capture relationships over discrete variables, like objects, and continuous variables, such as robot configurations or grasp poses. Figure 9 shows a set of facts that characterize a scene with a robot and pink block on a surface. The fact (
Figure 9. Pick-and-Place Example. A set of facts characterize the state. Here, the robot starts at some configuration
Facts can also be derived from other facts, for example, the fact that a robot is holding an object
6.3.2. Operators
Operators are parameterized by discrete and continuous values. As an example, the pick action is parameterized by an object to be grasped
6.3.3. Samplers
Samplers are conditional generators that output static facts that
Figure 10. Pick-and-Place Example. Each sampler takes as input some values such that those values satisfy some constraint. The samplers (highlighted in green) output either new parameter values that are certified to satisfy some constraint or simply a certification that the inputs satisfy a constraint. Samplers can be conditioned upon each other such that the output of one sampler is the input to another, as shown by the dotted line.
For example,
The grasp
In addition to generative samplers, there are also test samplers which evaluate to true or false based on whether the input variables satisfy some constraint. These samplers do not generate new parameters and instead only add facts about existing parameters.
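The distinction between generative and test samplers maps naturally onto Python generators and boolean functions. The sketch below is an assumption-laden illustration, not PDDLStream's interface: `sample_grasps`, `is_collision_free`, the fact name `Grasp`, and the 1D trajectory representation are all hypothetical.

```python
import itertools
import random


def sample_grasps(obj, seed=0):
    """Generative sampler: lazily yields new grasp parameters for `obj`.

    Each yielded tuple plays the role of a certified fact about a new value.
    """
    rng = random.Random(seed)
    while True:
        yaw = rng.uniform(-3.14159, 3.14159)
        yield ("Grasp", obj, round(yaw, 3))


def is_collision_free(trajectory, obstacles):
    """Test sampler: evaluates existing values and creates no new parameters.

    The trajectory is a list of 1D positions and each obstacle is a closed interval.
    """
    return all(not (lo <= x <= hi) for x in trajectory for lo, hi in obstacles)


# Draw three grasp facts on demand, then test one existing trajectory.
grasp_facts = list(itertools.islice(sample_grasps("block"), 3))
print(grasp_facts)
print(is_collision_free([0.0, 0.5, 1.0], [(2.0, 3.0)]))
```

Writing the generative sampler as a generator lets a planner request more values only when the current set is insufficient.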
As an example, a test stream is used to evaluate whether a trajectory
6.4. Search procedure
Given the domain specified in the previous subsection, we consider one step of the search, visualized in Figure 11.
Figure 11. Pick-and-Place Example. Expanded view of the PDDLStream search procedure. The state is composed of facts from the initial set of facts (in blue) and the set of certified static facts generated by the samplers (in green). An action is feasible if all of the facts of the preconditions are met. The state is then updated with the resulting effects (in brown). In this example, the first action is to move from configuration 
The state is defined by all of the facts that are true. At the beginning of the search, this is the set of initial facts (in blue) and the set of facts certified by the samplers (in green). In this example pick-and-place domain, Figure 9 gives the initial facts and Figure 10 visualizes some of the streams.
As stated above, the facts certified by the samplers are static. As an example, if a grasp sampler generates a grasp on an object, the fact capturing this grasp, (
Given the state composed of all of these facts, the planner now conducts a search for a sequence of actions that can be taken to achieve the goal. For an action to be valid, all of its preconditions must be met, that is, all of the precondition facts must be true in the state. The facts could be true either because the fact was certified by a sampler, or because it was part of the initial state or because it was added to the state as the effect of a previous action.
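The precondition check and state update described here can be sketched with sets of facts. The sketch is illustrative only: the `Action` class, the fact names (`AtConf`, `HandEmpty`, `AtPose`, `Motion`, `Kin`, `Holding`), and the parameter values are assumptions chosen for the example rather than the exact contents of the pick-and-place domain.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Action:
    name: str
    preconditions: frozenset
    add_effects: frozenset
    del_effects: frozenset = frozenset()

def applicable(state, action):
    """An action is valid only if every precondition fact is in the state."""
    return action.preconditions <= state

def apply(state, action):
    """Return the successor state: remove deleted facts, then add new ones."""
    if not applicable(state, action):
        raise ValueError(f"preconditions of {action.name} are not met")
    return (state - action.del_effects) | action.add_effects

# Initial facts plus static facts certified by samplers (all hypothetical).
initial = frozenset({("AtConf", "q0"), ("HandEmpty",), ("AtPose", "block", "p0")})
certified = frozenset({("Motion", "q0", "t0", "q1"),
                       ("Kin", "block", "p0", "g0", "q1")})
state = initial | certified

move = Action("move",
              frozenset({("AtConf", "q0"), ("Motion", "q0", "t0", "q1")}),
              frozenset({("AtConf", "q1")}),
              frozenset({("AtConf", "q0")}))
pick = Action("pick",
              frozenset({("AtConf", "q1"), ("HandEmpty",),
                         ("AtPose", "block", "p0"),
                         ("Kin", "block", "p0", "g0", "q1")}),
              frozenset({("Holding", "block", "g0")}),
              frozenset({("HandEmpty",), ("AtPose", "block", "p0")}))

pick_possible_at_start = applicable(state, pick)  # robot is not yet at q1
state = apply(state, move)
state = apply(state, pick)
```

Here the pick only becomes applicable after the move adds the `AtConf` fact it needs, while the sampler-certified facts are never deleted and so remain true throughout.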
In Figure 11, the
Following the algorithmic loop in Figure 8, if the search is unsuccessful in finding a sequence of actions to the goal, the algorithm would generate more certified facts via the samplers and search again.
7. Incorporating force into planning
PDDLStream provides a framework for solving TAMP problems. Section 6.3 shows how to specify the standard pick-and-place domain, highlighting the fact types, lifted operators, and samplers needed to capture the actions and the domain constraints, which relate to kinematic and geometric feasibility.
In order to leverage PDDLStream for forceful manipulation tasks, and thus extend PDDLStream’s range of applicability, we encode the forceful kinematic chain constraint and the fixturing requirement. We do so by adding fact types, lifted operators, and samplers. Additionally, each domain has domain-specific elements, although we find that many samplers are reused across domains. As an illustrative example for how these pieces come together, in Section 7.1, we detail the domain specification for the nut-twisting task. Through this example, we also explore what modeling effort is required to specify a domain.
In addition to finding plans, in Section 7.2, we discuss how we use cost-sensitive planning in PDDLStream to find robust plans for forceful manipulation tasks.
7.1. PDDLStream for forceful manipulation
The specification of the nut-twisting domain via the lifted operators, derived facts, and samplers. Elements colored in light gray are common across all forceful manipulation domains. Elements colored in darker gray are specific to the nut-twisting domain. Throughout the table we use the symbols:
We next include the elements (in light gray) common across all of our forceful manipulation domains, which capture the forceful kinematic chain constraint and enable fixturing. As discussed in Section 3.4, there are several different fixturing strategies so in this domain we focus on two of them (
Finally, we define two domain-specific operators,
7.1.1. Forceful kinematic chain constraint
To incorporate the forceful kinematic chain constraint into the PDDLStream framework, we assess the stability of the chain using test samplers. The facts certified by the test samplers serve as preconditions for the operators that exert forceful operations.
For example, to assess the planar frictional joints formed by the robot’s grasp, we define fact (
For an operator that applies a forceful operation using a grasped object, we use (
We also use the (
As another example, to evaluate if robot joints are stable we define the fact (
7.1.2. Fixturing
In addition to the forceful kinematic chain constraint, we require that while the robot is exerting a forceful operation on an object, the object must be fixtured. We propose implementing a variety of fixturing methods through various lifted operators and derived facts, which use forceful kinematic chain test samplers to evaluate the stability of the fixturing chain.
In the context of the nut-twisting domain, we restrict our focus to two fixturing methods: fixturing an object by holding it or by weighing it down with another heavy object. Thus, in this domain, the fact (
Neither of these two fixturing methods required adding new operators, since sequences of moves, picks, and places can be constructed to either stably grasp the fixturing object or have a robot place a weighing-down object on the fixtured object. However, other fixturing methods (detailed in Appendix B), such as operating a vise, require adding operators, as well as test samplers and facts, to the domain.
7.1.3. Domain-Specific actions
Given the forceful manipulation extensions, we now detail the domain-specific additions. In the nut-twisting domain, the robot can impart the forceful operation to twist the nut either by making contact with its fingers or through a grasped spanner. To enable this, we define two new operators in Table 1.
The controller for the
As stated in Section 7.1.1, the forceful kinematic chain constraint is implemented as preconditions for each operator. When twisting the nut with the robot hand (
We also constrain that the bolt,
7.1.4. Modeling effort
We next consider what is required in applying this framework to new settings. We consider two categories of modifications: incorporating new assessments of the forceful kinematic chain and fixturing constraints and incorporating new domain-specific elements.
Section 5 defines several mathematical models for assessing the stability of a joint. These models were incorporated as test samplers within PDDLStream. Since the framework is agnostic to the Pythonic implementation of the sampler, it is straightforward to swap out various stability models.
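One way to picture this swap is to treat the stability model as a function passed into a generic test sampler. The sketch below is an assumption-laden illustration: the wrench layout, the fact name `StableJoint`, and the two simple threshold models are placeholders and are not the models of Section 5.

```python
from typing import Callable, List, Sequence, Tuple

Wrench = Sequence[float]  # assumed layout: (fx, fy, fz, tx, ty, tz)
StabilityModel = Callable[[Wrench], bool]

def make_stability_check(fact_name: str, model: StabilityModel):
    """Wrap any stability model as a test sampler that certifies fact_name."""
    def check(joint: str, wrench: Wrench) -> List[Tuple]:
        return [(fact_name, joint, tuple(wrench))] if model(wrench) else []
    return check

def axis_torque_limit(limit: float) -> StabilityModel:
    """Placeholder model: stable if the torque about z is within a limit."""
    return lambda w: abs(w[5]) <= limit

def force_norm_limit(limit: float) -> StabilityModel:
    """Placeholder model: stable if the force magnitude is within a limit."""
    return lambda w: sum(f * f for f in w[:3]) ** 0.5 <= limit

wrench = (0.0, 0.0, -20.0, 0.0, 0.0, 1.5)
check_torque = make_stability_check("StableJoint", axis_torque_limit(2.0))
check_force = make_stability_check("StableJoint", force_norm_limit(10.0))
facts_torque = check_torque("wrist", wrench)
facts_force = check_force("wrist", wrench)
```

Because both checks certify the same fact type, the operators that list this fact as a precondition would not need to change when the underlying model does.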
Adding a new joint type, and its corresponding stability model, requires adding both the corresponding sampler and the certified fact. Adding a new fixturing method may require adding operators to utilize the fixturing method (e.g. incorporating vise fixturing required adding vise actuation actions). It also requires identifying and integrating the appropriate stability assessment.
Critically, once such additions are made, they can be reused across many domains. In our experience, this allows domains to build off of each other, decreasing the implementation effort with each new domain. For example, to model the contact patch between the beam and the table in the nut-twisting domain, we incorporated the generalized friction cone. Thus, when implementing the cutting domain, we reused that same abstraction to model the contact patch between the vegetable and the table.
Solving a new task often involves adding domain-specific operators to capture a new action space (e.g. for cutting we had to add the
Finally, we must write the sampler that defines the operator’s parameterized controller. This step involves combining the existing lower-level controllers (joint-space position controller, grasp controller, guarded move controller, Cartesian impedance controller, etc.) to create the desired behavior. We find that even with operator-specific parameterized controllers, there often is significant overlap across domains with respect to how to combine the low-level controllers.
7.2. Robust planning
Given the ability to generate plans that satisfy motion- and force-based constraints, we now aim to produce
To generate robust plans, we associate each operator with a probability of success

By sampling over the friction coefficient,
We define the cost of an operator as
The cost of a plan is then the sum over the cost of all the operators in the plan. Minimizing this cost is equivalent to maximizing the plan success likelihood.
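To make the link between summed costs and success likelihood concrete, the sketch below assumes that the operator cost is the negative log of its success probability and that operators succeed independently. Both are assumptions for illustration, as are the Gaussian perturbation of the friction coefficient and the function names.

```python
import math
import random

def success_probability(is_stable, nominal_mu, sigma, n=1000, seed=0):
    """Monte Carlo estimate (hypothetical scheme): perturb the friction
    coefficient and count how often the stability check still passes."""
    rng = random.Random(seed)
    hits = sum(bool(is_stable(max(0.0, rng.gauss(nominal_mu, sigma))))
               for _ in range(n))
    return hits / n

def operator_cost(p):
    """Assumed cost: negative log probability, infinite if success is impossible."""
    return math.inf if p == 0.0 else -math.log(p)

def plan_cost(probabilities):
    return sum(operator_cost(p) for p in probabilities)

# Summing costs corresponds to multiplying probabilities.
probs = [0.9, 0.8, 0.95]
cost = plan_cost(probs)
plan_success = math.exp(-cost)

# A check that only just passes at the nominal value is brittle.
needs_mu = lambda mu: mu >= 0.5
p_brittle = success_probability(needs_mu, nominal_mu=0.5, sigma=0.05)
p_robust = success_probability(needs_mu, nominal_mu=0.8, sigma=0.05)
```

Under these assumptions, a plan whose cost is below a threshold c has a success probability of at least exp(-c), so lowering the threshold demands more robust plans.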
To generate robust plans, we use PDDLStream’s
With this cost definition, the cost threshold corresponds to the probability of succeeding during open-loop execution given uncertainty in the physical parameters.
8. Empirical evaluation
Using the childproof bottle domain, we demonstrate how the planner finds a wide variety of solutions and how the feasibility of these solutions depends on the environment. In each of the three domains, we show how accounting for uncertainty by planning robustly leads to the robot making different choices, both with respect to the strategy and with respect to the continuous choices.
In the supplemental material, we include simulation and real robot videos showing a variety of strategies in each of the three domains.
8.1. Exploring strategies
For each setting, we provide the number of steps for each strategy and the average planning time in seconds (and standard error) over five runs. *: Utilized a higher friction coefficient
In the first setting, we search over several possible fixturing strategies, fixing the push-twisting strategy to use a grasp contact. In this setting, the bottle starts at a random location on the table. In the second setting, we search over all possible push-twisting strategies. In this setting, the bottle starts on a high-friction rubber mat in order to leverage the friction of the surface for fixturing.
Because the underlying search over strategies in PDDLStream biases towards plans with the fewest actions, we incrementally invalidate the shorter strategies once found by the planner in order to force exploration of the alternative, longer strategies. For example, in the fixturing setting, we invalidate fixturing via a robot grasp as a feasible strategy by removing the second arm from the environment.
8.1.1. Fixturing setting
We first consider searching over various fixturing strategies. Seeking the strategy with the fewest actions, the planner first tries to fixture the bottle against the table surface by applying additional downward force. However, the friction coefficient between the table and bottle is small enough that this is not a viable strategy: even when applying maximum downward force, the robot cannot fixture the bottle. Instead, the planner fixtures the bottle by grasping it with the second robot.
Removing the second arm from the environment forces the planner to discover new strategies. One strategy employed is to use a pick-and-place operation to move the bottle to a high-friction rubber mat, where it is possible to sample enough additional downward force such that the bottle is fixtured. Another strategy is to use a pick-and-place operation to move the bottle into the vise, where it can be fixtured.
8.1.2. Push-Twisting setting
We next consider searching over various push-twisting strategies. Three of the twisting strategies (contacting via a grasp, the fingertips, or the palm) are of equivalent length, and are therefore equally attempted when searching over strategies. The likelihood of finding successful parameters for these actions, and thus the probability of employing a given strategy, depends on how easy it is to sample satisfying values.
For example, twisting the cap by pushing down with the palm is only stable if, given the values of the radius of the palm and the friction coefficient, the system can exert enough additional downward force. Decreasing either the radius or the friction coefficient narrows the space of feasible downward forces. Since the fingertips have a smaller radius, as compared to the palm, it is harder to sample a set of satisfying values for the fingertips.
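The dependence on radius and friction can be illustrated with a deliberately simplified contact model. The sketch assumes a circular contact patch with uniform pressure, for which the maximum frictional torque is (2/3) μ N r; this is a stand-in for illustration and not necessarily the stability model used by the planner. All numerical values are hypothetical.

```python
def min_normal_force(torque, mu, radius):
    """Smallest normal force N such that (2/3) * mu * N * radius >= torque,
    assuming a uniformly loaded circular contact patch."""
    return 3.0 * torque / (2.0 * mu * radius)

def feasible_force_range(torque, mu, radius, max_force):
    """Interval of normal forces that transmit the torque without slipping,
    or None if even the maximum available force is insufficient."""
    lowest = min_normal_force(torque, mu, radius)
    return (lowest, max_force) if lowest <= max_force else None

# Hypothetical values: 1 N*m of twisting torque, friction coefficient 0.8,
# and at most 100 N of downward force available.
palm = feasible_force_range(torque=1.0, mu=0.8, radius=0.04, max_force=100.0)
fingertip = feasible_force_range(torque=1.0, mu=0.8, radius=0.01, max_force=100.0)
```

With these numbers the palm admits a range of feasible forces while the fingertip admits none, which is the sense in which a smaller radius makes satisfying values harder to sample.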
In each of these settings, we demonstrate that the planner finds a variety of different strategies and that the choice of strategy adapts to what is feasible in the environment. This adaptability allows the planner to generalize over a wide range of environments.
8.2. Generating robust plans
We next examine how leveraging cost-sensitive planning enables the robot to generate more robust plans.
8.2.1. Childproof bottle domain
In the childproof bottle domain, we explore how accounting for robustness impacts what surface the robot uses to fixture against. As visualized in Figure 13-left, the robot can choose between fixturing on three surfaces: the low-friction table, the medium-friction blue mat, or the high-friction red mat. For all planning instances, we start the bottle on the table and set the friction coefficient between the table and the bottle to be just high enough to make fixturing feasible. At each cost threshold, we run the planner ten times.
Figure 13. In opening the childproof bottle, the robot can fixture against the low-friction table, a medium-friction mat, or a high-friction mat. For each cost threshold, we run the planner ten times, noting which surface is used. As the cost threshold decreases, the robot is forced to more frequently use higher friction surfaces that are more robust to uncertainty.
If the planner does not account for robustness, the robot fixtures with the table every time. This is because doing so is feasible and results in the shortest plan.
When considering robustness, the planner evaluates that fixturing using the table produces a feasible but brittle plan, resulting in a high cost plan that is unlikely to succeed. To avoid this, the planner completes a pick-and-place action to relocate the bottle to one of the two mats that have a higher friction coefficient and thus offers a more robust fixturing surface. As we decrease the cost threshold, the planner is forced to use exclusively the high-friction red mat.
We evaluate how decreasing the cost threshold impacts what fixturing surface the planner uses in the childproof bottle domain (Figure 13). As the cost threshold decreases, the planning time increases.
8.2.2. Nut-twisting domain
In the nut-twisting domain, we explore robustness by considering a scenario where the robot must choose between several weights, of varying mass, to fixture the beam with.
First, for a given mass, we sample 100 placement locations along the beam holding the bolt and evaluate two robustness metrics: how robustly the weight fixtures the beam and how robustly the robot is able to grasp (and therefore move) the weight to this placement. Figure 14-left shows the trade-off: a heavier weight more easily fixtures the beam but is harder to grasp robustly. In finding a robust plan, and hence a low-cost plan, the planner is incentivized to act like Goldilocks and pick the weight that best balances this trade-off.
Figure 14. In the nut-twisting domain, we consider the trade-off between the grasp cost and the fixturing cost. On the left, at each weight value, we randomly sample, 100 times, the pose of the weight along the beam and the grasp on the weight. Since, at the extremes, some costs evaluate to infinity, we plot the median and a 95% confidence interval. We then demonstrate how the trade-off impacts the choices made by the planner by considering an environment in which there are three possible masses, as shown in the center. The robot can fixture using the 2.6 kg mass, the 3.5 kg mass, or the 4.4 kg mass. Without accounting for robustness, the robot chooses any of the masses. When planning robustly, the robot more often picks the medium weight, which balances the trade-off in costs. In both cases, we run the planner ten times, noting which weight is used.
We can see this in action when running the planner in a setting with three weights of various masses (2.6, 3.5, 4.4 kg), shown in Figure 14-center, where a darker color corresponds to a larger mass. Figure 14-right shows that when the planner does not account for robustness, the three weights are selected equally, since all can be used to produce feasible plans. However, when accounting for robustness the planner more often selects the medium weight, which balances the trade-off between the cost functions. In both instances, the planner was run ten times.
8.2.3. Vegetable cutting domain
In the vegetable cutting domain, we explore how planning robustly impacts the continuous choice of the grasp on the knife.
We return to the two possible grasps on the knife shown in Figure 7. The side grasp, shown on the bottom, relies on normal reaction forces to resist the torque experienced while exerting the downward force, the first half of the cutting action. As such, the grasp is very stable and robust, regardless of the location of the grasp along the length of the handle (the annotated
To demonstrate this, we restrict our grasp set to only those that grasp the knife from the top (i.e. like Figure 7-top). Additionally, we consider cutting a softer object, reducing the magnitude of the downward force (
Across several cost thresholds, considering only the cost with respect to the grasp stability during the downward cut, we plot the location of the grasp along the handle in Figure 15. For each cost threshold, the planner was run ten times. When the planner does not account for robustness, shown at the top of Figure 15, the robot selects any grasp along the handle. When the planner accounts for robustness, as the cost threshold decreases, it selects grasps that are closer to the blade and thus create a smaller lever arm, reducing the torque that the grasp must resist.
Figure 15. In the vegetable cutting domain, we show how robust planning leads the planner to select grasps that are closer to the blade of the knife, because doing so creates a smaller torque that the grasp needs to resist. For each cost threshold, the planner is run ten times and the grasping offset is plotted. An offset of 0 corresponds to a grasp at the butt of the knife.
We evaluate slicing success for three grasps (from top to bottom:
For each grasp and each food, we execute the same downward force-motion, without evaluating stability or robustness. For the cucumber and the banana with the peel, we repeat this fifteen times. For the banana without the peel, we repeat this ten times. Table 4 classifies the results into three categories: success (fully slicing through the object), partial success (slicing through at least half of the object), and failure (not significantly slicing the object).
While we cannot precisely know the required downward force (
Finally, we estimate that the cucumber requires the high force setting. As predicted in Figure 7, the
As a final note, one might wonder why we do not fix the planner to always use
First, we currently use a generic uninformed grasp sampler that is common across all of the handled tools (the pusher tool in the childproof bottle domain, spanner in the nut-twisting domain and knife in cutting domain). Prescribing a preferred grasp for each tool and task, via an informed grasp sampler, would both increase the modeling effort and decrease the generalization. Instead, with an uninformed sampler, we allow the planner to reason over the best grasp for the specific scenario. As shown in Table 3, this is at some computational cost.
Second, while the side grasp may seem to be the best grasp with respect to this constraint, in the context of a multi-step manipulation problem, there are many, often competing, constraints. While for simplicity we have focused on evaluating the grasp with respect to the downward force, the grasp must also be stable with respect to the translational slice. Additionally, as shown in Figures 4, 5, and 7, due to the geometry of the robot’s end effector, the side grasp is only collision-free if the object is significantly raised. For foods where such a grasp is not strictly necessary, the planner can select a more reachable grasp. This flexibility is critical to enabling the planner to adapt to a variety of different environments.
9. Discussion
This paper proposes a planning framework for solving forceful manipulation tasks. We define forceful manipulation as a class of multi-step manipulation tasks that involve reasoning over and executing forceful operations, where forceful operations are defined as the robot applying a wrench at a pose.
9.1. Summary of contributions
Solving forceful manipulation tasks requires planning over a hybrid space of discrete and continuous choices that are coupled by force and motion constraints. We frame the primary force-related constraint as the system’s ability to stably exert the desired task wrench. To capture this, we propose the forceful kinematic chain constraint which evaluates if every joint in the chain is stable under the application of the imparted wrench and gravity. For each class of joint in the chain, which may be robot joints, grasps or frictional contacts, we discuss a model for evaluating its stability.
In addition to the forceful kinematic chain constraint, the planner must reason over other force-related constraints, such as the requirement to fixture objects, and over motion constraints, such as the requirement to find collision-free paths. To plan multi-step sequences that respect these constraints, we augment an existing task and motion planning framework, PDDLStream (Garrett et al., 2020b). We illustrate our system in three example domains: opening a push-and-twist childproof bottle, twisting a nut on a bolt and cutting a vegetable.
While PDDLStream finds a satisficing plan, the plan may not be robust to uncertainty. To find robust plans, we propose using cost-sensitive planning to select actions that are robust to perturbations. We specifically focus on uncertainty in the physical parameters that determine the stability of the forceful kinematics chains. Our demonstrations show how cost-sensitive planning enables the robot to make more robust choices, both with respect to the strategy, such as what fixturing method to use, and the continuous choices, such as which grasp to pick.
9.2. Future directions
In this work we assume that the planner has access to many different parameters, such as the magnitude and form of the forceful operation, object models and object poses. We also assume the domain is given in the form of the fact types and lifted operators, with their preconditions, effects, and controllers. Techniques from machine learning, combined with this planning framework, could be used to relax these assumptions and enable wider generalizations (Konidaris et al., 2018; Wang et al., 2021; Silver et al., 2021; Liang et al., 2022).
Supplemental Material
Supplemental Material - Robust planning for multi-stage forceful manipulation
Supplemental Material for Robust planning for multi-stage forceful manipulation by Rachel Holladay, Tomás Lozano-Pérez and Alberto Rodriguez in The International Journal of Robotics Research.
