Abstract
Keywords
1. Introduction
How do biological systems, like humans and animals, execute complex movements in a versatile and creative manner?
In the past decades, researchers in neurobiology and motor control have made significant efforts to answer this research question, and their experimental findings led to the formulation of the concept of motion primitives.
Dynamic Movement Primitives (DMPs) have their roots in the motor control of biological systems and can be seen as a rigorous mathematical formulation of the motion primitives as stable nonlinear dynamical systems (Schaal, 2006a, 2006b). In this respect, DMPs represent one of the first attempts to answer the research question:
How can artificial systems, like (humanoid) robots, execute complex movements in a versatile and creative manner?
Beyond their biological motivation, DMPs have a simple and elegant formulation, guarantee convergence to a given target, are sufficiently flexible to create complex behaviors, are capable of reacting to external perturbations in real time, and can be learned from data using efficient algorithms. These properties explain the “success” of DMPs in robotic applications, where they have been established as a prominent tool for the learning and generation of motor commands. Since their formulation in the pioneering work of Ijspeert et al. (Ijspeert et al., 2002c; Schaal, 2006), DMPs have been successfully exploited in a variety of applications, becoming de facto the first approach that novices in the Imitation Learning (IL) field use on their robots.
1.1. Existing surveys and tutorials
Comparison between existing papers, reviews, and tutorials about DMPs and our tutorial survey.
Schaal et al. (2007) presented the classical DMPs as an attempt to unify nonlinear dynamical systems and optimal control theory,
However, the aforementioned papers, reviews, and tutorials primarily focus on the methods and advancements of their respective research groups and/or on a specific problem or field of application. The DMP-related literature, on the other hand, is extensive and broad, with contributions from many research groups advancing several important fields of application. Therefore, this survey and tutorial on DMPs aims to cover a wider range of work and to present unified, structured formulations for the various DMP methods and advancements to date. This should make the differences and connections between the various methods clearer to users and ease their application. In addition, we provide a comprehensive and categorized survey of all major DMP application areas in robotics, which may inspire readers to apply DMPs in various areas.
In the tutorial part, we present mathematical formulations, implementation details, and potential issues of existing DMP formulations, starting from the classical DMPs presented in (Ijspeert et al., 2002c; Schaal, 2006) up to recent extensions of DMPs to Riemannian geometry and Symmetric Positive Definite (SPD) matrices (Abu-Dakka and Kyrki, 2020). In the survey part, we meticulously review the existing literature on DMPs in a comprehensive and methodical manner, focusing on the quality and significance of the contributions without bias toward any particular research group. Details on the systematic review procedure are given as follows.
1.2. Systematic review process
We performed an automatic search for documents containing the string
We manually inspected all the papers and removed the ones that do not explicitly use DMPs or that only compare against DMPs in their literature review. The first and foremost selection criteria were the technical quality of the work and the significance of the contribution with respect to the DMP state of the art prior to the publication of the particular paper. In other words, we asked the question “did the paper make a significant step change in the field?”. Therefore, we discarded papers that presented similar (or the same) ideas multiple times or that made insignificant improvements to the state of the art. If multiple papers presented the same or a similar idea, we included the one with the highest technical quality; if the quality was similar, the next deciding factors were publication in a more prestigious journal/venue or a higher citation count. This manual selection led to 321 papers on DMPs (out of a total of 373 references) analyzed in this work.
1.3. A taxonomy of DMP-related research
The systematic review of DMP literature led to the taxonomy shown in Figure 1, which also describes the structure of this paper. DMPs are placed at the root of the tree and branch into two nodes, namely, the tutorial part and the survey part.
Structure of this tutorial survey on DMPs.
The tutorial part spans Sections 2 and 3. Section 2 embraces DMP formulations for
The survey part spans Sections 4 and 5. Section 4 presents DMP integration in larger executive frameworks for
The paper ends with a discussion (Section 6) of the presented approaches, with the aim of providing, where possible, guidelines to select the most suitable DMP approach for specific needs. We have also collected available DMP implementations (see Table 4) and contributed further open-source implementations to the community, available at https://gitlab.com/dmp-codes-collection. Section 6 concludes with a discussion of open issues and possible research directions.
1.4. Contribution overview
Our paper has several key contributions that are summarized as follows.
Concerning the tutorial part:
• We present the classical DMP formulation and existing variations of this formulation in a unified manner with rigorous mathematical terms, providing implementation details and discussing the advantages and limitations of different approaches (Section 2).
• We describe
• We release to the community several implementations of the described approaches. Detailed information on these code repositories is provided in Table 4 and Section 6. Moreover, we search for existing open-source implementations of the presented formulations and list them in our repository (Section 6.2).
Concerning the survey part:
• We perform a systematic literature search to provide a comprehensive and unbiased review of the topic (Sections 4 and 5).
• We categorize existing work on DMPs into different streams and highlight prominent approaches in each category (Figure 1 and Sections 4 and 5).
• We present guidelines to select the most suitable approach for different applications, discuss limitations inherent to the DMP formalism, and highlight open issues and possible research directions (Section 6).
2. Formulation of DMP types
Description of key notations and abbreviations. Indices, super/subscripts, constants, and variables have the same meaning over the whole text.
2.1. Discrete DMP
The discrete DMP is used to encode a point-to-point motion into a stable dynamical system. In the following subsections, we go through the formulation and main features of discrete DMPs, starting with the classical one operating in Euclidean space.
2.1.1. Classical DMP
The classical discrete DMPs, first introduced by Ijspeert et al. (2002c), encapsulate training data into linear, second-order dynamics (a mass–spring–damper system) with an additive, nonlinear forcing term learned from a single demonstration. A DMP for a single DoF trajectory
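As a concrete reference, the dynamics above can be sketched numerically. The gains below (a transformation system with αz = 25, βz = αz/4, and a canonical system with αx = 4) are common illustrative choices rather than prescribed constants. With the forcing term set to zero, the transformation system reduces to a critically damped point attractor that converges to the goal g:

```python
import numpy as np

# Minimal 1-DoF discrete DMP sketch. Gains are illustrative choices, not
# the exact constants of the papers cited in this section. With a zero
# forcing term the system is a critically damped point attractor on g.
def rollout(y0, g, forcing=lambda x: 0.0, tau=1.0, dt=0.001,
            alpha_z=25.0, alpha_x=4.0):
    beta_z = alpha_z / 4.0            # critical damping
    y, z, x = y0, 0.0, 1.0            # position, scaled velocity, phase
    ys = []
    for _ in range(int(tau / dt)):
        f = forcing(x) * (g - y0)     # forcing scaled by motion amplitude
        z += dt * (alpha_z * (beta_z * (g - y) - z) + f) / tau
        y += dt * z / tau
        x += dt * (-alpha_x * x) / tau   # canonical system
        ys.append(y)
    return np.array(ys)

traj = rollout(0.0, 1.0)              # converges to g = 1 without overshoot
```

Learning a nonzero forcing term from a demonstration is discussed next.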
2.1.1.1. Learning the forcing term
For a discrete motion, given a demonstrated trajectory
By stacking each
Locally Weighted Regression (LWR) (Atkeson et al., 1997; Schaal and Atkeson, 1998; Ude et al., 2010) is a popular approach used to update the weights
In the previous equations,
A classical DMP is used to generate a discrete motion connecting
LWR has been the standard method to learn the weights of DMPs and therefore
In general, the problem of learning and retrieving
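The LWR fit can be sketched as follows. The minimum-jerk demonstration, the basis placement in phase space, and the kernel-width heuristic are our own illustrative assumptions; the per-kernel weight update follows the usual locally weighted least-squares solution, and the learned DMP is then integrated to reproduce the demonstration:

```python
import numpy as np

# LWR fit of the forcing-term weights from a single demonstration.
# Demo signal, gains, basis placement, and width heuristic are illustrative.
alpha_z, alpha_x, tau, dt = 25.0, 4.0, 1.0, 0.002
beta_z = alpha_z / 4.0
t = np.arange(0.0, tau, dt)
s = t / tau
y_d = 10*s**3 - 15*s**4 + 6*s**5                 # minimum-jerk demo, 0 -> 1
yd_d = np.gradient(y_d, dt)
ydd_d = np.gradient(yd_d, dt)
y0, g = y_d[0], y_d[-1]

x = np.exp(-alpha_x * t / tau)                   # phase along the demo
# Target forcing term obtained by inverting the transformation system
f_target = tau**2 * ydd_d - alpha_z * (beta_z * (g - y_d) - tau * yd_d)

N = 20
c = np.exp(-alpha_x * np.linspace(0.0, tau, N) / tau)   # centers in phase
h = 1.0 / (np.diff(c, append=c[-1] / 2.0) ** 2)         # width heuristic
psi = np.exp(-h[None, :] * (x[:, None] - c[None, :])**2)
xi = x * (g - y0)                                # LWR regressor
w = np.array([(xi * psi[:, i]) @ f_target / ((xi * psi[:, i]) @ xi + 1e-12)
              for i in range(N)])

# Reproduce the demo by integrating the DMP with the learned forcing term
y, z, xv, y_run = y0, 0.0, 1.0, []
for _ in t:
    p = np.exp(-h * (xv - c)**2)
    f = (p @ w) / (p.sum() + 1e-12) * xv * (g - y0)
    z += dt * (alpha_z * (beta_z * (g - y) - z) + f) / tau
    y += dt * z / tau
    xv += dt * (-alpha_x * xv) / tau
    y_run.append(y)
y_run = np.array(y_run)
```

The reproduction stays close to the demonstration while the goal attractor guarantees convergence even if the fit were poor.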
2.1.1.2. Phase stopping and goal switching
The phase variable
DMPs also provide an elegant way to adapt the trajectory generation in real-time through goal-switching mechanisms (Ijspeert et al., 2013)
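The phase-stopping mechanism can be sketched as follows: the canonical system is slowed down by the tracking error between the commanded DMP state and the measured robot state. The specific coupling law and gain αc below are illustrative assumptions consistent with the classical scheme; here the “robot” is simply blocked for 0.4 s to emulate an external perturbation:

```python
import numpy as np

# Phase-stopping sketch: tau*dx = -alpha_x*x / (1 + alpha_c*err), where err
# is the tracking error between commanded and measured state. The coupling
# law, gain alpha_c, and the synthetic perturbation are illustrative.
alpha_x, alpha_c, tau, dt = 4.0, 100.0, 1.0, 0.002

def phase_rollout(perturbed):
    x, xs = 1.0, []
    for k in range(500):                       # 1 s of integration
        t = k * dt
        # pretend the robot is blocked between 0.2 s and 0.6 s
        err = 0.05 if (perturbed and 0.2 < t < 0.6) else 0.0
        x += dt * (-alpha_x * x / (1.0 + alpha_c * err)) / tau
        xs.append(x)
    return np.array(xs)

x_free = phase_rollout(False)   # nominal exponential decay
x_pert = phase_rollout(True)    # phase nearly stops during the perturbation
```

Because the forcing term is a function of the phase, freezing the phase freezes the commanded trajectory until the perturbation is resolved.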
DMPs in their standard formulation are not suitable for direct encoding of skills with specific geometry constraints, such as orientation profiles (represented in either unit quaternions or rotation matrices), stiffness/damping, and manipulability profiles (encapsulated in full SPD matrices). For instance, direct integration of unit quaternions does not ensure the unity of the quaternion norm. Any representation of orientation that does not contain singularities is non-minimal, which means that additional constraints need to be taken into account during integration.
2.1.1.3. Alternative phase variables
Equation (3) describes an exponentially decaying phase variable that has been widely used in the DMP literature. The main drawback of the exponentially decaying phase is that it rapidly drops to very small values toward the end of the motion. This “forces” the learning algorithm to exploit relatively high weights
Possible phase variables used in different discrete DMP formulations. All the different possibilities ensure that
To overcome this limitation, Kulvicius et al. (2011) proposed the sigmoidal decay phase
The sigmoidal decay in Figure 3 has a tail effect since it vanishes after
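The two phase profiles can be compared with a short sketch. The sigmoid steepness and its placement around the nominal end time T are illustrative assumptions; the point is that the sigmoidal phase stays close to 1 for most of the motion and drops only near the end, at the price of a nonzero tail:

```python
import numpy as np

# Exponential vs. sigmoidal phase decay. The sigmoid form and its
# steepness a_s are illustrative; what matters is the qualitative shape.
T, dt, alpha_x = 1.0, 0.001, 4.0
t = np.arange(0.0, 1.2 * T, dt)

x_exp = np.exp(-alpha_x * t / T)               # classical exponential phase
a_s = 20.0
x_sig = 1.0 / (1.0 + np.exp(a_s * (t - T)))    # sigmoidal phase, drops near T
```

At mid-motion the exponential phase has already decayed to e^(-2) ≈ 0.135, while the sigmoidal phase is still near 1; after T the sigmoid exhibits the small tail discussed above.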
2.1.2. Orientation DMP
The classical DMP formulation described in Section 2.1.1 applies to single DoF motions. Multidimensional motions are generated independently and synchronized with a common phase. In other words, equations (1) and (2) are repeated for each DoF while the phase variable in (3) is shared. This works when the evolution of different DoFs is independent, like for joint space or Cartesian position trajectories. Unlike Cartesian position, the elements of orientation representations like unit quaternion or rotation matrix are constrained. In this section, we present approaches that extend the classical DMP formulation to represent Cartesian orientations.
2.1.2.1. Quaternion DMP
Unit quaternion
An early attempt to encode unit quaternion profiles using DMP was presented by Pastor et al. (2011). Unlike Abu-Dakka et al.’s formulation, Pastor et al.’s formulation does not take into account the geometry of
Equation (17) can be integrated as follows
Both mappings become one-to-one, continuously differentiable, and inverse to each other if the input domain of the mapping Log
A unit quaternion DMP is used to generate a discrete motion connecting 
Phase-stopping (11) can be rewritten as follows
Ude et al. (2014) extended the quaternion DMP formulation by rewriting (13) to include a goal-switching mechanism
As shown by Saveriano et al. (2019) using Lyapunov arguments, both the quaternion DMP formulations in Pastor et al. (2011) and in Abu-Dakka et al. (2015a) and Ude et al. (2014) asymptotically converge to the target quaternion
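For reference, a sketch of the quaternion logarithmic and exponential maps used by these formulations. Conventions differ across papers by a factor of 2 (depending on whether the map returns the full rotation angle or half of it); the version below is one common choice, and the clipping of the scalar part is a numerical-safety detail, not part of any cited formulation:

```python
import numpy as np

# Quaternion Log/Exp maps for a unit quaternion q = (v, u), scalar part v.
# Log maps q to R^3; Exp maps back. One common convention (no factor of 2).
def quat_log(q):
    v, u = q[0], q[1:]
    nu = np.linalg.norm(u)
    if nu < 1e-12:
        return np.zeros(3)
    return np.arccos(np.clip(v, -1.0, 1.0)) * u / nu

def quat_exp(r):
    nr = np.linalg.norm(r)
    if nr < 1e-12:
        return np.array([1.0, 0.0, 0.0, 0.0])   # identity quaternion
    return np.concatenate(([np.cos(nr)], np.sin(nr) * r / nr))
```

Restricting the input of quat_log as described above makes the two maps one-to-one and mutually inverse, which is what allows the DMP to integrate orientation errors in R^3.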
2.1.2.2. Rotation matrix DMP
In their work on orientation DMPs, Ude et al. (2014) extended the DMP formulation in order to encode orientation trajectories represented in the form of rotation matrices
The function
The generated rotation matrices can be obtained by integrating (24) as follows
A rotation matrix DMP is used to generate a discrete motion connecting

2.1.3. SPD matrices
Abu-Dakka and Kyrki (2020) generalized DMP formulation in order to encode robotic manipulation data profiles encapsulated in the form of SPD matrices. The importance of SPD matrices comes from the fact that many robotics data are encapsulated in such matrices,
An SPD DMP is used to generate a discrete motion connecting

Moreover, Abu-Dakka and Kyrki (2020) rewrote (13) for smooth goal adaptation in case of sudden goal switching as follows
2.2. Periodic DMP
The periodic DMP (sometimes called rhythmic DMP) is used when the encoded motion follows a rhythmic pattern.
2.2.1. Classical DMP
The classical periodic (or rhythmic) DMPs were first introduced by Ijspeert et al. (2002b), where they redefined the second-order differential equation system described in (1) and (2) as follows
Similar to (4),
Similar to discrete DMPs, LWR (Schaal and Atkeson, 1998) can be used to update the weights to learn a desired trajectory. In a standard periodic DMP setting (Gams et al., 2009; Ijspeert et al., 2002b), the desired shape
The initial value of the parameters is
A classical DMP is used to reproduce a rhythmic motion (brown solid line in the top left panel). The desired trajectory is obtained by adding Gaussian noise to
The classical periodic DMP described by (34)–(36) does not encode the transient motion needed to start the periodic one. Transients are important in several applications, like humanoid robot walking, where the first step made from a rest position is usually a transient needed to start the periodic motion. To overcome this limitation, Nakanishi et al. (2004) presented a formulation of rhythmic DMPs that includes transients to achieve a limit cycle, with an application to biped locomotion. Ernesti et al. (2012) modified the classical formulation of periodic DMPs to explicitly consider transients as a motion trajectory that converges toward the limit cycle (i.e.
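A minimal periodic DMP sketch with von Mises basis functions. The gains, basis count and width, the oscillation anchor g, and the use of a plain least-squares fit (instead of LWR) are illustrative assumptions; the demonstration is a unit sine wave, and the rollout starts on the limit cycle:

```python
import numpy as np

# Periodic DMP sketch: tau*dz = alpha_z*(beta_z*(g-y)-z) + f(phi), with
# von Mises basis Gamma_i = exp(h*(cos(phi-c_i)-1)). All constants and the
# least-squares fit are illustrative choices.
alpha_z = 25.0; beta_z = alpha_z / 4.0
Omega = 2.0 * np.pi                    # one period per second
tau, dt = 1.0 / Omega, 0.0005
n_per = int(1.0 / dt)                  # samples per period
t = np.arange(n_per) * dt
phi_d = Omega * t                      # phase over one period
y_d = np.sin(phi_d)                    # demonstrated periodic signal
yd_d = np.gradient(y_d, dt)
ydd_d = np.gradient(yd_d, dt)
g = 0.0                                # anchor (center of oscillation)

# Invert the transformation system to get the target forcing term
f_target = tau**2 * ydd_d - alpha_z * (beta_z * (g - y_d) - tau * yd_d)

N, hb = 25, 25.0
cb = np.linspace(0.0, 2.0 * np.pi, N, endpoint=False)
Gamma = np.exp(hb * (np.cos(phi_d[:, None] - cb[None, :]) - 1.0))
A = Gamma / Gamma.sum(axis=1, keepdims=True)   # normalized basis
w, *_ = np.linalg.lstsq(A, f_target, rcond=None)

# Roll out three periods, starting on the limit cycle (z = tau * dy)
y, z, phi, y_run = y_d[0], tau * yd_d[0], 0.0, []
for _ in range(3 * n_per):
    G = np.exp(hb * (np.cos(phi - cb) - 1.0))
    f = (G @ w) / G.sum()
    z += dt * (alpha_z * (beta_z * (g - y) - z) + f) / tau
    y += dt * z / tau
    phi += dt / tau                    # phase advances at rate Omega
    y_run.append(y)
y_run = np.array(y_run)
```

After the first period the output settles onto a periodic orbit that reproduces the demonstrated sine wave.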
2.2.2. Orientation DMP
The same argument used in Section 2.1.2 is valid here too. Unlike Cartesian position, the elements of orientation representations like unit quaternion or rotation matrix are constrained. In this section, we present approaches that extend the classical periodic DMP formulation to represent periodic Cartesian orientations.
2.2.2.1. Quaternion periodic DMP
To encode unit quaternion trajectories accurately, the dynamic system in equations (34) and (35) is reformulated by Abu-Dakka et al. (2021), taking inspiration from the research on discrete quaternion DMPs (Abu-Dakka et al., 2015a; Ude et al., 2014).
The integration of (44) is done similarly as in (19), that is,
2.3. Formulation summary
Summary of DMP basic formulations.
3. DMP extensions
3.1. Generalization
A desirable property of motion primitives is the ability to generalize to unforeseen situations. In this section, we present approaches that adapt DMP motion trajectories to novel executive contexts.
3.1.1. Start, goal, and scaling
Classical DMPs are time-invariant, meaning that time scaling
In order to remedy those issues, Pastor et al. (2009) proposed to modify the transformation system as follows
The goal can also change over time and, in this case, the tracking performance of the DMP mostly depends on the gains
Dragan et al. (2015) showed that DMPs solve a trajectory optimization problem in order to minimize a particular Hilbert norm between the demonstration and the new trajectory subject to start and goal constraints. In this light, DMP adaptation capabilities to different start and goals can be improved by choosing (or learning) a proper Hilbert norm that reduces the deformation in the retrieved trajectory.
3.1.2. Via-points
A via-point can be defined as a point in the state space where the trajectory has to pass. Failing to pass a via-point may cause the robot to fail the task execution. Therefore, having a motion primitive representation with the capability of modulating the via-points is important in robotic scenarios. It is not surprising that researchers have extended the DMP formulation to consider intermediate via-points in the trajectory generation process.
Ning et al. (2011, 2012) extended the classical DMP to satisfy position and velocity constraints at the beginning and at the end of a sample trajectory. Their approach to traverse via-points consists of creating a sample trajectory by combining locally linear trajectories connecting the via-points. This sample trajectory is used to fit a DMP that is constrained to pass the via-points.
Weitschat and Aschemann (2018) considered each via-point as an intermediate goal (
The problem of generalizing to via-point close (interpolation) and far (extrapolation) from the demonstration is faced by Zhou et al. (2019). Their approach, namely,
3.1.3. Task parameters
Reaching a different goal, or passing through via-points, may not be enough to successfully execute a task in a different context. Approaches presented in this section adapt the DMP motion to new situations by adjusting the weights
Weitschat et al. (2013) considered that
The approach by Forte et al. (2011, 2012) also assumes that
The aforementioned approaches follow a two-step procedure where first the shape parameters
A Mixture of Motor Primitives (MoMP) is proposed in Mülling et al. (2010, 2013) and used to generalize table tennis skills like hitting and batting a ball. MoMP uses an augmented state that contains robot position and velocity as well as the meta-parameters of the table tennis task like the expected hitting position and velocity. The adapted motion is generated by the weighted summation of
In high DoF systems, like humanoid robots, it is non-trivial to find a relationship between the task and the DMP parameters. This is especially true when the DMPs are used to encode joint space trajectories. Bitzer and Vijayakumar (2009) showed that such a relationship is easier to find in a latent (lower-dimensional) space obtained from training data. Therefore, they used dimensionality reduction techniques to find the latent space where to fit a DMP and show that the interpolation of DMP weights in the latent space results in better generalization performance.
3.2. Joining multiple DMPs
Open-source implementations of DMP-based approaches that we have released to the community. The source code for each approach is available at https://gitlab.com/dmp-codes-collection.

Results obtained by applying the zero velocity switch approach to join two DMPs trained on synthetic data. The training trajectory for the position and the orientation are shown as black-dashed lines in (a)–(b) and (f) –(g), respectively. Results are obtained with the open-source implementation available at https://gitlab.com/dmp-codes-collection.

Constant goal, moving target, and delayed goal obtained with

Results obtained by applying the target crossing approach to join two DMPs trained on synthetic data. The training trajectory for the position and the orientation are shown as black-dashed lines in (a)–(b) and (f)–(g), respectively. Results are obtained with the open-source implementation available at https://gitlab.com/dmp-codes-collection.

Results obtained by applying the basis functions overlay approach to join two DMPs trained on synthetic data. The training trajectory for the position and the orientation are shown as black dashed lines in (a)–(b) and (f)–(g), respectively. Results are obtained with the open-source implementation available at https://gitlab.com/dmp-codes-collection.
3.2.1. Velocity threshold
A properly designed DMP reaches the desired target with zero velocity and acceleration,
The velocity threshold approach is simple and effective since it directly applies to the DMP formulations in Sections 2.1.1 and 2.1.2. For instance, Saveriano et al. (2019) showed how to join multiple quaternion DMPs 2 (see Section 2.1.2.1) with the velocity threshold approach.
Results in Figure 8 are obtained when the velocity threshold is applied to merge two DMPs separately trained to fit minimum-jerk trajectories (black-dashed lines). Figures 8(a)–(e) show the position and Figures 8(f)–(j) the orientation (unit quaternion) parts of the motion. The merged trajectory is generated by following the first DMP until the distance from the via-point is below 0.01 [m] and 0.01 [rad]. As shown in Figures 8(d) and (i), the switch occurs after about 4.7 [s]. Figures 8(e) and (j) show that the desired trajectory is accurately reproduced. More or less accurate trajectories can be obtained by tuning the distance from the via-point. However, the value of this distance also affects the time duration of the generated trajectory: a bigger (smaller) distance results in a shorter (longer) trajectory. For instance, in the considered case, the total motion ends after 9.5 [s], while the demonstration lasts for 10 [s]. Depending on the application, the time difference may cause failures; therefore, it has to be taken into account. Finally, the velocity threshold approach may generate discontinuities if the target of the current DMP is far from the demonstrated initial point of the following primitive.
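The switching rule can be sketched as follows. Mirroring the experiment above, a distance-to-goal threshold triggers the switch (a velocity threshold works analogously); the forcing terms are omitted for brevity, so each DMP acts as a pure point attractor, and the thresholds and gains are illustrative:

```python
import numpy as np

# Join two point-to-point DMPs: follow the first until the state is within
# d_switch of its goal (the via-point), then hand the state over to the
# second. Gains and thresholds are illustrative; forcing terms omitted.
alpha_z = 25.0; beta_z = alpha_z / 4.0; tau, dt = 1.0, 0.001
d_switch = 0.01

def step(y, z, g):
    z += dt * (alpha_z * (beta_z * (g - y) - z)) / tau
    y += dt * z / tau
    return y, z

g1, g2 = 1.0, 2.0                     # via-point and final goal
y, z = 0.0, 0.0
ys, switch_idx = [], None
for k in range(3000):
    g = g1 if switch_idx is None else g2
    y, z = step(y, z, g)
    if switch_idx is None and abs(g1 - y) < d_switch:
        switch_idx = k                # via-point reached: switch DMPs
    ys.append(y)
ys = np.array(ys)
```

Position is continuous across the switch by construction, but, as noted above, the velocity may jump if the second primitive was demonstrated from a different start state.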
3.2.2. Target crossing
There exist movements, like hitting or batting, that are correctly executed only if the target is reached with a non-zero velocity. To this end, Kober et al. (2010b) extended the classical DMP formulation in Section 2.1.1 to let the DMP track a target moving at a given velocity. In their approach, the DMP passes the target with a given velocity exactly after
By inspecting (55) and (56), and considering that the term −
Saveriano et al. (2019) extended this idea to quaternion DMP. The angular acceleration in (16) is modified as follows
The presented
Results in Figure 10 are obtained when the target crossing approach is applied to merge two DMPs separately trained to fit minimum-jerk trajectories (black-dashed lines). Figures 10(a)–(e) show the position and Figures 10(f)–(j) the orientation (unit quaternion) parts of the motion. The merged trajectory is generated by following the first DMP for
3.2.3. Basis functions overlay
The approach by Kulvicius et al. (2011, 2012) combines multiple DMPs into a complex one, guaranteeing a smooth transition between the primitives by ensuring that the basis functions composing
The classical acceleration dynamics in (1) is modified as follows
Similar to target crossing, Kulvicius et al. used a moving target
The nonlinear forcing term
Having presented the main differences with the canonical approach, it is possible to focus on how Kulvicius et al. (2012) solved the problem of joining
Note that, being
Assuming that
The centers and widths computed in (63) and (64), respectively, overlap at the transition points allowing for smooth transitions between consecutive DMPs. The weights of the joined DMP are obtained by stacking the
Saveriano et al. (2019) extended the basis functions overlay approach to quaternion DMPs. Assuming that a sequence of
The angular velocity in (67) is computed for each
Results in Figure 11 are obtained when the basis functions overlay approach is applied to merge two DMPs separately trained to fit minimum-jerk trajectories (black-dashed lines). Figures 11(a)–(e) show the position and Figures 11(f)–(j) the orientation (unit quaternion) parts of the motion. This approach does not require a switching rule and automatically generates a smooth trajectory, with continuous velocity as shown in Figures 11(c) and (h), that passes close to the via-point, which favors the overall reproduction accuracy (Figures 11(e) and (j)). However, the distance from the via-point depends on the weights of the joined primitives and cannot be separately decided. The trajectory generated with this approach tends to last longer than the demonstrations. This is due to the sigmoidal phase that vanishes after
3.3. Online adaptation
The standard periodic DMP learning approach approximates the shape
3.3.1. Robot obstacle avoidance and coaching
In Hoffmann et al. (2009), Park et al. (2008), and Tan et al. (2011), the detected obstacle was fitted with a potential field function that changes the shape of the DMP to avoid it. In more detail, Tan et al. (2011) used the potential field to compute a time-varying goal and modified the resulting DMP trajectory, while Hoffmann et al. (2009), Park et al. (2008), and Zhai et al. (2022) added an extra forcing term to the DMP. Similarly, in Gams et al. (2016), the human arm was fitted with a potential field function, which was used to reshape the DMP to perform coaching. The potential field was coupled to the position of the human hand to make pointing gestures and indicate the direction in which the robot arm position trajectory should change:
The added coupling term
Alternatively, the faulty (colliding) segment of the DMP trajectory can also be directly adjusted online by the human demonstrator (Karlsson et al., 2017b). On the other hand, the method in Kim et al. (2015) treats obstacle avoidance as a constraint of an optimization problem, which modifies the DMP trajectory to prevent collisions.
3.3.2. Robot adaptation based on force feedback
Similarly to obstacle avoidance, task dynamics can also be incorporated into DMPs as coupling terms. In Gams et al. (2014), task dynamics were coupled at the acceleration and velocity levels of the DMP. The presented method was utilized for interaction tasks, where the human changed the behavior of the robot through the dynamics exerted on the manipulator
In Kramberger et al. (2018), this approach was extended with a force feedback loop coupled to the velocity (2) and the goal
Here
3.3.3. Exoskeleton joint torque adaptation
In Peternel et al. (2016), human effort was used to provide information about the direction in which the assistive exoskeleton joint torque DMP should change in order to minimize it. The human was included into the robot control loop by replacing the error calculation in (41) with the human effort feedback term
The effort feedback term
3.3.4. Trajectory adaptation based on reference velocity
In many LfD scenarios, it is desirable to modify both the spatial motion and the speed of the learned motion at any stage of the execution. Speed-scaled dynamic motion primitives, first presented in Nemec et al. (2013a), are applied for the underlying task representation. The original DMP formulations from (1) and (2) were extended by adding a temporal scaling factor
From (75) and (76), it is evident that the velocity term is a function of the phase and is therefore encoded with a set of RBFs, similarly to (4). This method allows for modification of the spatial motion as well as the speed of execution at any stage of the trajectory. The authors demonstrated the proposed method in a learning scenario where, after every learning cycle (using Iterative Learning Control [ILC]), a new velocity profile was encoded based on the wrench feedback, thus converging to an optimal velocity for the specific task. Vuga et al. (2016) extended the approach by incorporating a compact representation for non-uniformly accelerated motion as well as simple modulation of the movement parameters.
Later on, in Nemec et al. (2018), the authors extended the previous approach to also incorporate velocity scaling of encoded orientation trajectories represented with unit quaternions. The outcome of the presented work is a unified approach to velocity scaling for tasks executed in Cartesian space. Furthermore, a reformulation of the velocity approach, called Arc-Length-DMPs (AL-DMPs), was presented by Gašpar et al. (2018). In this work, they presented a method where the spatial and temporal components of the motion are separated by means of the arc length of the time-parameterized trajectory. Arc length, based on the differential geometry of curves, is related to the speed of the movement, given as the time derivative of the demonstrated trajectory. The approach is well suited when multiple demonstrations are compared to extract relevant information for learning. Following the AL-DMPs idea, Simonič et al. (2021) introduced constant-speed DMPs to fully decouple the spatial and temporal parts of the task. Pahic et al. (2021) used a deep neural network to map images into spatial paths represented by AL-DMPs. Weitschat and Aschemann (2018) added an extra forcing term to keep the velocity within a certain predefined limit. The aim of this work is to guarantee safe execution of the robot task when interacting with humans, as well as to provide a framework for safe interaction in a changing environment where the robot position and velocity have to change over time. For the full formulation of the coupling term, see Weitschat and Aschemann (2018).
To maintain consistency along the trajectory while ensuring a bounded velocity for each DoF, it is essential to decrease the velocities uniformly across all dimensions. This rationale prompts the use of temporal coupling, which modifies the DMP time constant (equivalent to the duration of path traversal), rather than spatial coupling. By increasing the time constant, the temporal evolution of all DoFs is slowed down by the same factor. A temporal coupling approach based on tracking error was introduced in Ijspeert et al. (2013), and an enhanced version was proposed in Karlsson et al. (2017a). However, this method requires distorting the path in order to slow down the trajectory and therefore cannot be directly utilized to limit the velocity or acceleration. One year later, the authors provided a stability analysis of temporally coupled DMPs in Karlsson et al. (2018). Dahlin and Karayiannidis (2020) proposed a temporal coupling based on a repulsive potential, keeping the DMP velocity within predefined velocity limits while ensuring path shape invariance. Subsequently, Dahlin and Karayiannidis (2021) introduced a temporal coupling method for DMPs that enables control over the velocity and acceleration constraints of the generated trajectory. This approach incorporates a filtering mechanism that proactively decelerates the trajectory as the acceleration constraints are approached, thereby mitigating the potential risk of infeasibility.
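A minimal sketch of velocity-limiting temporal coupling. The coupling law ν = min(1, v_max/|ẏ|) below is our own illustrative choice, not the exact law of any of the cited papers; because it slows the clock of the whole system uniformly (both state derivatives are scaled by the same factor), the state-space path is preserved while the speed is capped:

```python
import numpy as np

# Temporal-coupling sketch: slow the clock (equivalently, increase tau)
# whenever the nominal velocity would exceed v_max. The law
# nu = min(1, v_max/|dy|) is illustrative; forcing term omitted.
alpha_z = 25.0; beta_z = alpha_z / 4.0; tau, dt = 1.0, 0.001
v_max, g = 1.0, 2.0
y, z = 0.0, 0.0
vels = []
for _ in range(4000):
    dz = alpha_z * (beta_z * (g - y) - z) / tau
    dy = z / tau                                  # nominal velocity
    nu = min(1.0, v_max / (abs(dy) + 1e-9))       # temporal scaling factor
    z += dt * nu * dz                             # both derivatives scaled:
    y += dt * nu * dy                             # pure time reparameterization
    vels.append(nu * dy)                          # executed velocity
vels = np.array(vels)
```

The executed velocity never exceeds v_max, and the system still converges to the goal, only later than the uncoupled DMP would.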
3.4. Robots with flexible joints
DMPs are commonly employed for generating trajectories in manipulators with noticeable elasticity. This choice is often justified by the relatively low execution speeds, resulting in lower acceleration and jerk levels. However, when rapid motions with high acceleration and jerk are required, combining DMPs for trajectory generation with inverse dynamics for feedforward control can lead to oscillations in robots with flexible modes. In industrial settings, where execution speed is typically faster than the speed during demonstration due to cycle time requirements, this becomes a significant challenge. To overcome this issue, Wahrburg et al. (2021) proposed an extended DMP framework for flexible-joint robots, named FlexDMP, by introducing a fourth-order system for generating trajectories as follows
3.5. Alternative formulations
LfD is a wide research area, and many different approaches have been developed to reproduce human demonstrations (Billard et al., 2016). As already mentioned, the aim of this tutorial survey is to provide a comprehensive overview of DMP research, and we intentionally skip the rich literature in the field of LfD. However, we found some representations that are closely related to the DMP formulation. This section briefly reviews them.
Calinon et al. (2009) computed an acceleration command for the robot in a PD-like form
Herzog et al. (2016) computed an acceleration command for the robot from the linear system
Regarding periodic motions, Ajallooeian et al. (2013) proposed a dynamical system-based framework to learn rhythmic movements with an arbitrary shape and basin of attraction. They exploit phase-based scaling functions to represent the mapping between a known, base limit cycle and a desired periodic orbit. The basic limit cycle can be, for example, the one generated by periodic DMPs, which makes the approach of Ajallooeian et al. (2013) a more general formulation of periodic primitives.
4. DMPs integration in complex frameworks
This section reviews approaches where DMPs have been integrated into larger executive frameworks. We categorize these approaches into five main research areas, namely,
4.1. Manipulation tasks
4.1.1. Grasping and tool usage
Successfully grasping an object is the first step toward robotic manipulation. Grasping necessitates the perception of the environment, particularly visual perception, to locate the object and determine suitable grasping points based on its geometric characteristics. In this context, even slight uncertainties can lead to object slippage and failed grasps. To improve the robustness of vision-driven grasping, Krömer et al. (2010a) augmented DMPs with a potential field based on visual descriptors to adapt hand and finger trajectories according to the local geometry of the object. This grasping strategy was integrated within a hierarchical control architecture where the upper level determines the object’s grasp location and the lower level locally adjusts the motion to achieve a robust grasp of the object (Krömer et al., 2010b). Stein et al. (2014) proposed a point cloud segmentation approach based on the convexity and concavity of surfaces. This approach is particularly well-suited for recognizing object handles, and DMPs are employed to execute grasping with a real robot.
The ability to grasp and use tools is also desirable to perform daily-life manipulation. In this respect, Guerin et al. (2014) proposed the so-called
4.1.2. Motion primitives sequencing
Beyond object grasping, everyday manipulation requires a precise execution of complex movements. Often such complex movements are hard to encode into a single motion primitive, but they can be conveniently split into simpler motions (e.g.
Figure 12: An example of hierarchical task decomposition and motion primitives sequencing from Agostini et al. (2020).
The possibility of exploiting DMPs as the building blocks of complex tasks was investigated in Caccavale et al. (2018, 2019) and Ramirez-Amaro et al. (2015). In these works, a human teacher demonstrated a relatively complex task consisting of several actions performed on different objects. The demonstration was then automatically segmented into
4.1.3. Data collection
Collecting demonstrations becomes an issue when kinesthetic teaching or marker-based motion trackers cannot be used. The latter require an expensive sensor infrastructure that is hard to build in real-world scenarios like factory floors, while kinesthetic teaching needs torque-controlled/collaborative robots that are still uncommon in industrial scenarios. To remedy this issue, Mao et al. (2015) exploited a low-cost RGB-D camera and tracked the human hand using the markerless approach proposed by Oikonomidis et al. (2011). Collected data were then segmented into basic motions and used to fit DMPs. Also, the approach in Yang et al. (2022) does not require tracking markers or manual annotations. Instead, the authors exploit videos of random, unpaired interactions with objects by the robot and a human for unsupervised learning of a keypoint model of visual correspondences. Bayesian optimization is then used to find the parameters of rhythmic DMPs from a single human video demonstration within a few robot trials.
The described approaches assume that human teachers always provide consistent and noiseless task demonstrations. Ghalamzan E. et al. (2015) encoded noisy demonstrations into a GMM and computed a noise-free trajectory using GMR. The noise-free trajectory was then used to fit a DMP that generalized to different start, goal, and obstacle configurations. Dong et al. (2023) proposed to fit a DMP from correct (positive) and incorrect (negative) demonstrations to increase the representation and generalization capabilities of the model. Niekum et al. (2012, 2015) designed a framework that learns from unstructured demonstrations by segmenting the task demonstrations, recognizing similar skills, and generalizing the task execution. Interestingly, a user study with 10 volunteers conducted by Gutzeit et al. (2018) showed that existing strategies for segmentation and learning are sufficiently robust to enable the automatic transfer of manipulation skills from humans to robots in a reasonable time.
4.1.4. Task learning and execution
Some works (Deniša and Ude, 2013a, 2013b, 2015; Denisa et al., 2021) exploited transition graphs and trees to embed parts of a trajectory, together with search algorithms to discover a sequence of such partial trajectories and generate motions that have not been demonstrated. Approaches that rely on a hierarchical, tree-like structure to represent the task have limited task generalization capabilities. Lee and Suh (2013) used probabilistic inference and object affordances to infer the adequate skill that can handle uncertainties in the executive context. Beetz et al. (2010) learned stereotypical task solutions from observation and used task planning and symbolic reasoning to execute novel mobile manipulation tasks. A generative learning framework was proposed by Wörgötter et al. (2015) to augment the robot’s knowledge base with missing information at different levels of the cognitive architecture, including symbolic planning as well as object and action properties. Paxton et al. (2016) used task and motion planning to generalize the execution of complex assembly tasks and proposed a learning by demonstration approach to ground symbolic actions. Agostini et al. (2020) performed task and motion planning by combining an object-centric description of geometric relations between objects in the scene, a symbol-to-motion hierarchical decomposition depending on three consecutive actions in the plan, and the LfD approach developed in Caccavale et al. (2019) (Figure 12). A manipulation task was described at three different levels by Aein et al. (2013). The top level provides symbolic descriptions of actions, objects, and their relationships. The mid-level uses a finite state machine to generate a sequence of action primitives grounded by the lower level. A common point among these approaches is that they use DMPs to execute the task on real robots.
4.2. Variable impedance learning control
Impedance control can be used to achieve compliant motions, in which the controller resembles a virtual spring–damper system between the environment and robot end-effector (Hogan, 1985). Such an approach permits smooth, safe, and energy-efficient interaction between robots and environments (possibly humans). A standard model for such interaction is defined as follows
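As a concrete illustration, the standard virtual spring–damper model M*xdd + D*xd + K*(x - x_d) = F_ext can be simulated in a few lines; this is a minimal sketch with arbitrary illustrative gains, not the controller of any cited work.

```python
def simulate_impedance(x_d, f_ext, M=1.0, D=20.0, K=100.0, dt=0.001, T=5.0):
    """Euler simulation of the impedance model M*xdd + D*xd + K*(x - x_d) = f_ext.

    x_d is the virtual set-point; f_ext is a constant external force.
    Returns the final position, which settles at x_d + f_ext / K.
    """
    x, xd = 0.0, 0.0
    for _ in range(int(T / dt)):
        xdd = (f_ext - D * xd - K * (x - x_d)) / M  # spring-damper dynamics
        xd += xdd * dt
        x += xd * dt
    return x
```

With K = 100 N/m and a constant 10 N contact force, the end-effector settles 0.1 m away from the virtual set-point; a stiffer K shrinks this deflection at the cost of harder contact. Modulating this trade-off online is precisely what variable impedance control does.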
In fact, Variable Impedance Control (VIC) plays an important role when a robot needs to interact with any environment in order to avoid high impact forces and damage to the environment or the robot (i.e.
In this review, we will mention some of the works that integrate DMPs with VIC in a VILC framework. Figure 13 shows a simple generic example where a DMP is integrated in a VIC control scheme.
Figure 13: General control scheme of VIC and DMP.
Buchli et al. (2011a) proposed one of the earliest approaches that integrates DMP with Policy Improvement with Path Integrals (PI2) algorithm (Theodorou et al., 2010) to learn movements (position and velocity presented by DMP) while optimizing impedance parameters. Later, the authors exploited a diagonal stiffness matrix and expressed the variation (time derivative) of each diagonal entry as follows
Basa and Schneider (2015) introduced an extension to DMP formulation by adding a second nonlinear function to cope with elastic robots as follows
Nemec et al. (2016) proposed a cooperative control scheme that enables a dual-arm robot to adapt its stiffness online along with the executed trajectory in order to ensure accurate trajectory evolution. Umlauft et al. (2017) used GPs along with DMPs (as proposed in Fanger et al. (2016)) to predict the trajectories. During the execution, their admittance controller adapts both stiffness and damping online. The energy-tank passivity-based control method has been integrated with DMPs to enforce passivity and thus stably adapt to contacts in unknown environments by adapting the stiffness online (Shahriari et al., 2017; Kramberger et al., 2018; Kastritsi et al., 2018).
The methods in Bian et al. (2019), Peternel et al. (2014, 2018b, 2018a), and Yang et al. (2018, 2019) designed different multi-modal interfaces to let the human explicitly teach an impedance behavior to the robot. Most of them combined EMG-based variable impedance skill transfer with DMP-based motion sequence planning, inheriting the merits of both aspects for robotic skill acquisition. Hu et al. (2018) used Covariance Matrix Adaptation-Evolution Strategies (CMA-ES) to update the parameters of DMPs and a variable impedance controller in order to reduce the impact during robot motion in noisy environments. Dometios et al. (2018) integrated Coordinate Change DMPs (CC-DMPs) with a vision-based motion planning method to adapt the reference path of a robot’s end-effector and allow the execution of washing actions.
Travers et al. (2016, 2018) proposed a shape-based compliance controller for the first time in locomotion, by implementing amplitude compliance on a snake robot moving in a complex environment with obstacles. Their approaches allow snake-like robots to blindly adapt to such complex unstructured terrains, thanks to their proprioceptive gait compliance techniques.
Recently, an adaptive admittance controller was proposed (Wang et al., 2020) that integrates GMR for the extraction of human motion characteristics, a DMP to encode a generalizable robot motion, and an RBF-NN-based controller for trajectory tracking during the reproduction phase. In their work, Spector and Zacksenhouse (2021) introduced a residual admittance policy within the framework of DMPs. This policy explicitly learned full asymmetric stiffness matrices and aimed to correct the movements generated by a baseline policy. The effectiveness of the learned policy was demonstrated through successful peg insertion tasks involving pegs of various shapes and sizes. Additionally, the policy exhibited robustness to uncertainties in hole location and peg orientation and showed good generalization to new shapes. Moreover, the learned policy demonstrated successful transferability from simulations to real-world scenarios.
Novel LfD approaches explicitly take into account that training data may lie on certain Riemannian manifolds with associated metrics. Abu-Dakka and Kyrki (2020) reformulated DMPs based on Riemannian metrics, such that the resulting formulation can operate with SPD data directly on the SPD manifold. Their formulation is capable of adapting to a new goal SPD point.
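Such manifold-aware formulations hinge on replacing Euclidean differences with Riemannian logarithmic and exponential maps. The following numpy-only sketch implements the Log/Exp maps on the SPD manifold under the affine-invariant metric, a common choice in this line of work; it is an assumption for illustration, not necessarily the exact operators used by Abu-Dakka and Kyrki (2020).

```python
import numpy as np

def _sym_apply(A, f):
    """Apply a scalar function to a symmetric matrix via its eigendecomposition."""
    w, V = np.linalg.eigh(A)
    return V @ np.diag(f(w)) @ V.T

def spd_log(P, Q):
    """Logarithmic map: tangent vector at P pointing toward Q (affine-invariant metric)."""
    Ph = _sym_apply(P, np.sqrt)                          # P^{1/2}
    Pinvh = _sym_apply(P, lambda w: 1.0 / np.sqrt(w))    # P^{-1/2}
    return Ph @ _sym_apply(Pinvh @ Q @ Pinvh, np.log) @ Ph

def spd_exp(P, V):
    """Exponential map: inverse of spd_log, maps a tangent vector back to the manifold."""
    Ph = _sym_apply(P, np.sqrt)
    Pinvh = _sym_apply(P, lambda w: 1.0 / np.sqrt(w))
    return Ph @ _sym_apply(Pinvh @ V @ Pinvh, np.exp) @ Ph

def spd_geodesic(P, Q, t):
    """Point at fraction t along the geodesic from P to Q (t = 1 reaches Q)."""
    return spd_exp(P, t * spd_log(P, Q))
```

Goal adaptation of an SPD profile (e.g. a stiffness matrix) can then be phrased as moving along such geodesics toward a new goal SPD point, which keeps every intermediate matrix symmetric positive definite.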
Recently, a biomimetic controller has been integrated with DMPs (Zeng et al., 2021) in order to learn and adapt compliance skills.
4.3. Reinforcement Learning (RL)
In RL, an agent tries to improve its behavior via trial-and-error by exploring different strategies (
4.3.1. DMPs as control policies
One possibility is to use a parameterized policy and use RL to search for an optimal, finite set of policy parameters. In this respect, DMPs have been widely used as policy parameterizations. The general idea is shown in Figure 14. In more detail, Peters and Schaal (2008a) and Schaal (2006) showed that various policy gradient and actor-critic RL approaches can be effectively applied to improve robotic skills parameterized as DMPs. Other research focused on developing policy search algorithms specifically for parameterized policies. Inspired by stochastic optimal control, Theodorou et al. (2010) proposed PI2, an application of path integral optimal control to DMPs. PI2 and DMPs have been successfully applied in several domains including VILC (Buchli et al., 2011a, 2011b), in-contact tasks (Hazara and Kyrki, 2016), grasping under state-estimation uncertainties (Stulp et al., 2011a), bi-manual manipulation (Zhao et al., 2020), nonprehensile manipulation (Sun et al., 2022), and robot-assisted endovascular intervention (Chi et al., 2018). Kober and Peters (2011) derived the so-called Policy Learning by Weighting Exploration with the Returns (PoWER) from expectation-maximization. PoWER and DMPs have been successfully applied to perform highly dynamic tasks including ball-in-a-cup (Kober and Peters, 2011) and pancake flipping (Kormushev et al., 2010).
Figure 14: General block scheme of DMP-based policy improvement.
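The general pattern, treating the DMP forcing-term weights as the policy parameters, perturbing them, rolling out, and updating from the collected rewards, can be illustrated with a toy 1-D DMP and a PoWER-style reward-weighted update. Everything below (gains, basis widths, exploration noise) is an illustrative assumption, not the published PI2 or PoWER algorithms.

```python
import numpy as np

rng = np.random.default_rng(0)

def dmp_rollout(w, y0=0.0, g=1.0, dt=0.01, T=1.0, alpha=25.0, ax=3.0):
    """Roll out a 1-D discrete DMP whose forcing term is parameterized by weights w."""
    n = len(w)
    c = np.exp(-ax * np.linspace(0.0, 1.0, n))   # basis centers in the phase variable
    h = n ** 1.5 / c                              # heuristic basis widths
    y, yd, x = y0, 0.0, 1.0
    traj = []
    for _ in range(int(T / dt)):
        psi = np.exp(-h * (x - c) ** 2)
        f = (psi @ w) * x / (psi.sum() + 1e-10)   # phase-gated forcing term
        ydd = alpha * ((alpha / 4.0) * (g - y) - yd) + f
        x += -ax * x * dt                          # canonical system
        yd += ydd * dt
        y += yd * dt
        traj.append(y)
    return np.array(traj)

def reward_weighted_update(w, reward, sigma=20.0, n_rollouts=20):
    """One PoWER-style update: explore in weight space, reweight by return."""
    eps = sigma * rng.standard_normal((n_rollouts, len(w)))
    R = np.array([reward(dmp_rollout(w + e)) for e in eps])
    R = R - R.min()                                # importance weights must be >= 0
    return w + (R @ eps) / (R.sum() + 1e-10)
```

Iterating `reward_weighted_update` with, for example, a via-point reward reshapes the trajectory while the point attractor still guarantees convergence to the goal g, which is exactly why DMPs make convenient policy parameterizations.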
4.3.2. Limit the search space
Even with parameterized policies, the number of rollouts needed to search for optimal policy parameters may become large, especially for robots with many DoFs. Dimensionality reduction techniques can be exploited to perform the policy search in a reduced space (Colomé and Torras, 2014). The effectiveness of this approach was demonstrated in the challenging task of clothes (i.e.
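A minimal sketch of this idea: run PCA on the demonstrated DMP weight vectors and perform the policy search in the resulting low-dimensional coordinates. This is illustrative of the general technique, not the specific method of Colomé and Torras (2014).

```python
import numpy as np

def fit_weight_subspace(W, k):
    """PCA over demonstrated DMP weight vectors W (n_demos x n_weights)."""
    mean = W.mean(axis=0)
    _, _, Vt = np.linalg.svd(W - mean, full_matrices=False)
    return mean, Vt[:k]                    # mean and top-k principal directions

def encode(w, mean, basis):
    """Project full DMP weights into the k-dimensional search space."""
    return (w - mean) @ basis.T

def decode(z, mean, basis):
    """Map a low-dimensional search point back to full DMP weights for rollout."""
    return mean + z @ basis
```

A policy search algorithm (e.g. PI2) then explores the latent coordinates z instead of the full weight vector w, so the number of rollouts scales with k rather than with the number of basis functions times the number of DoFs.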
4.3.3. DMP generalization and sequencing
Instead of using generalization to provide a better initial policy, some researchers exploit RL to improve and generalize the motion primitive. André et al. (2015) adapted DMP policies to walk on sloped terrains. Mülling et al. (2010) generalized to new situations using a mixture of DMPs. In their approach, RL was used to estimate the shape parameters as well as to estimate the optimal responsibility of each DMP. Mülling et al. (2013) used episodic RL to estimate meta-parameters like the temporal and spatial interception point of the ball and the racket typical of table tennis tasks. Lundell et al. (2017) used parameterized kernel weights and RL to search for optimal parameters, while Forte et al. (2015) augmented the given demonstration using RL-based state space exploration to autonomously expand the robot’s task knowledge. Metric RL was exploited by Hangl et al. (2015) to smoothly switch between learned DMP policies and execute a task in new situations.
RL can also be applied to sequence multiple motion primitives and perform more complex tasks; a successful strategy when the robot has to perform, for instance, a manipulation task (Section 4.1). To sequence multiple primitives, it is also important to learn the goal of each motion. Tamosiunaite et al. (2011) used continuous value function approximation to optimize the goal parameters of a DMP used to perform a pouring task. Kober et al. (2011, 2012) learned a meta-parameter function that maps the current state to a set of meta-parameters including the goal and duration of the movement. Instead of separating shape and goal learning into different processes, Stulp et al. (2011b, 2012) extended PI2 to simultaneously learn the shape and goal of a sequence of DMPs. Wang et al. (2022) describe both complex manipulation tasks and user execution preferences as logic and temporal constraints and use RL to find a set of DMP parameters that fulfill the constraints.
4.3.4. Skills transfer
Learned skills can potentially be transferred across different tasks to speed up the learning process and increase robot autonomy. To this end, Fabisch and Metzen (2014) considered the case where the robot can actively choose which task to learn in order to make the best learning progress. The process of actively selecting the task was cast as a non-stationary bandit problem, for which a suitable algorithmic solution exists, while intrinsic motivation heuristics were exploited to reward the agent after the selection. Cho et al. (2019) defined the complexity of a motor skill based on the temporal and spatial entropy of multiple demonstrations and used the measured complexity to generate an order for learning and transferring motor skills. Their experimental findings provided useful guidelines for skill learning and transfer. In short, when possible, humans should demonstrate the most complex task, from which the robot can then transfer the motor skills. Conversely, if such demonstrations are not given, it is more effective to start learning simple skills first and then transfer them to more complex tasks.
4.3.5. Learning hierarchical skills
RL often lacks scalability to high-dimensional continuous state and action spaces. To remedy this issue, hierarchical RL exploits a
Stulp and Schaal (2011) proposed to represent different options as DMPs to sequence. PI2 was extended to optimize shape and (sub-)goal of each DMP at different levels of temporal abstraction. In particular, the shape was adjusted based on the cost up to the next primitive in the sequence, while the sub-goal considers the cost of the entire sequence of two DMPs. Layered direct policy search in End et al. (2017) did not rely on a set of predefined sub-policies and/or sub-goals but instead used information theoretic principles to uncover a set of diverse sub-policies and sub-goals.
Reducing the number of rollouts required to discover optimal policies is also important in Hierarchical RL (HRL). As already mentioned, IL is a valuable option to find good initial policies. However, there are applications, like manipulation with multi-fingered robotic hands, for which it is hard or impossible to provide expert demonstrations. To make the policy search more efficient, Ojer De Andres et al. (2018) used HRL where the upper level considers discrete action and state spaces to search for optimal finger gaiting and synchronization among the fingers. This information was passed to the lower level, where rhythmic DMPs and PI2 generated continuous commands for the fingers. Another possibility to increase data efficiency is to use model-based approaches for RL. Colome et al. (2015) exploited a friction model to improve a DMP policy and manipulate soft tissues (a scarf). A model-based HRL approach was proposed by Kupcsik et al. (2017) for data-efficient learning of upper-level policies that generalize well across different executive contexts. Li et al. (2018) proposed a hybrid hierarchical framework where the higher level computes optimal plans in Cartesian space and converts them to desired joint targets using an efficient solver. The lower level is then responsible for learning joint space trajectories under uncertainties using RL and DMPs. Recently, Davchev et al. (2022) proposed residual LfD, a framework that combines DMPs and RL to learn a residual correction policy for assembly tasks. In the paper, they show that applying residual learning directly in task space and operating on the full pose of the robot can significantly improve the overall performance of the DMPs.
4.4. Deep learning
A popular method of machine learning is NNs. As universal function approximators, they can effectively represent nonlinear mappings. A major drawback of NNs in the past was the computational complexity of learning. In recent years, there has been a renewed interest in NNs, and new deep learning approaches have been successfully applied in machine vision and language processing (LeCun et al., 2015).
In recent years, deep learning has also been applied in robotics to learn task dynamics (Yang et al., 2016) and movement dimensionality reduction (Chen et al., 2015). The authors (Chen et al., 2015, 2016) introduced a framework called AutoEncoded DMP (AEDMP), which uses deep auto-encoders to represent movements in a latent feature space. In this space, DMPs can be optimally generalized to new tasks, and the architecture enables the DMPs to be trained as a unit. Pervez et al. (2017b) coupled vision perception data for object classification with task-specific movement definitions represented with DMPs. The data was modeled with Convolutional Neural Networks (CNNs), where the images and the associated movements were directly processed by the deep NN, thus preserving the associated DMP properties and eliminating the need for extracting the task parameters during motion reproduction. Later on, Kim et al. (2018b) combined deep RL with DMPs to learn and generalize robotic skills from demonstration. The framework builds on an RL approach to learn and optimize a new DMP skill based on a demonstration. The RL approach is backed up by a hierarchical search strategy that reduces the search space for the robot, which allows for more efficient learning of complex tasks. Furthermore, Pan and Manocha (2018) presented a deep learning approach for the motion planning of high-dimensional deformable robots in complex environments. The locomotion skills are encoded with DMPs, and an NN is trained for obstacle avoidance and navigation. The data is further optimized with deep Q-Learning, showing that the learned planner can efficiently plan and navigate tasks for high-dimensional robots in real-time.
Pahic et al. (2018) proposed a deep learning approach for perception–action coupling, demonstrating the mapping between vision-based images and the associated movement trajectories. Later, they extended the approach to incorporate CNNs and introduced a distinguishing formulation (Pahič et al., 2020) that uses a loss function measuring the physical distance between the movement trajectories, as opposed to the distance between the DMP parameters, which have no physical meaning; this leads to better performance of the algorithm. Recently, they used GPR to create the database needed to train autoencoder NNs for dimensionality reduction (Lončarević et al., 2021). Mavsar et al. (2022) presented a recurrent neural architecture capable of transforming variable-length input motion videos into a set of parameters describing a robot trajectory, which is then encoded with a DMP; predictions can be made after receiving only a few frames. In addition, a simulation environment is used to expand the training database and improve the generalization capability of the network, which is applied to handover robotic tasks. Furthermore, Jaques et al. (2021) introduced the Newtonian Variational Autoencoder (Newtonian VAE), a framework for learning latent dynamics. Drawing inspiration from Newton’s second law, they define a linear dynamic system in a hidden space. This system is based on a rigid-body model with
4.5. Lifelong/Incremental learning
Lifelong (incremental) learning is a framework which provides continuous learning of tasks arriving sequentially (Chen and Liu, 2018; Fei et al., 2016; Thrun, 1996). The essential component of this framework is a database that maintains the knowledge acquired from previously learned tasks.
Figure: General framework of the lifelong/incremental learning approach.
Churchill and Fernando (2014) proposed a cognitive architecture capable of accumulating adaptations and skills over multiple tasks in a manner which allows recombination and re-use of task-specific competences. Lemme et al. (2014) segmented demonstrations based on geometric similarities and subsequently created a motion primitives library. The library is updated by removing unused skills and including new ones. Multiple demonstrations are used by Reinhart and Steil (2014, 2015) to build a parameterized skill memory that connects a low-dimensional skill parameterization to motion primitive parameters. This low-dimensional embedding is then leveraged for efficient policy search. A piece-wise linear phase is used to improve incremental learning performance (Samant et al., 2016). Duminy et al. (2017) designed a framework for learning which data collection strategy is most efficient for acquiring motor skills to achieve multiple outcomes, and for generalizing over this experience to achieve new outcomes in cumulative learning.
A generative learning framework has been proposed to augment the robot’s knowledge base with missing information at different levels of the cognitive architecture, including symbolic planning as well as object and action properties (Wörgötter et al., 2015). The aforementioned approaches use DMPs to represent the learned skills and execute them on real robots.
Wang et al. (2016) proposed a modified formulation of DMPs, called DMP+, which is capable of efficiently modifying learned trajectories, improving the usability of existing primitives and reducing user fatigue during IL. Later, DMP+ was integrated into a dialogue system with speech and ontology to learn or re-learn a task using natural interaction modalities (Wu et al., 2018).
In the literature, it has been shown that incremental learning provides better generalization than isolated learning approaches in terms of interpolation, extrapolation, and the speed of learning (Hazara and Kyrki, 2017). Hazara and Kyrki (2018) improved their Global Parametric Dynamic Movement Primitive (GPDMP) (Lundell et al., 2017) in order to incrementally construct a database of motion primitives, which aims to improve the generalization to new tasks. Furthermore, skills have been transferred incrementally from simulation to the real world (Hazara and Kyrki, 2019). Moreover, the authors endowed incremental learning with a task manager capable of selecting a new task by maximizing future learning while considering the current task performance (Hazara et al., 2019).
5. DMPs in application scenarios
We categorize the applications into several subsections based on different topics. We first separate the use of DMPs for robot interaction with the
5.1. Robots in contact with passive environment
Most of the daily tasks that robots perform involve some kind of physical interaction with the environment that requires the control of forces or positions. Nevertheless, simultaneous control of force and position along the same axis is not possible (Stramigioli, 2001), and thus the control approaches have to make a compromise between prioritizing position control or force control (Schindlbeck and Haddadin, 2015). The key to such control is for the robot to learn appropriate force or position reference trajectories that can lead to the desired task performance in interaction with the environment. Factors such as positional information, muscle stiffness of the human arm, and contact force with the environment are crucial in comprehending and generating robot manipulation behaviors that resemble those of humans. In Wang et al. (2021), both positional and contact force profiles are represented using DMPs to facilitate the transfer of human–robot skills.
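The underlying compromise is often realized with a selection matrix that assigns each task axis either to position tracking or to force tracking. The sketch below is a minimal per-axis illustration with hypothetical gains and a velocity-resolved interface; it is the generic hybrid structure, not the controller of any cited work.

```python
import numpy as np

def hybrid_command(x_ref, x, f_ref, f_meas, S, Kp=100.0, Kf=0.1):
    """Per-axis hybrid force/position control law.

    S is a 0/1 selection vector: 1 = position-controlled axis, 0 = force-controlled.
    Returns a task-space velocity command; x_ref and f_ref could each come from a DMP.
    """
    v_pos = Kp * (x_ref - x)         # track the position reference
    v_force = Kf * (f_ref - f_meas)  # push until the measured force matches f_ref
    return S * v_pos + (1 - S) * v_force
```

With S = [1, 1, 0], for example, the x/y axes follow a positional reference while the z axis regulates contact force — the structure needed when both position and force profiles are encoded as DMPs, as in Wang et al. (2021).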
5.1.1. Demonstration of interaction tasks
A common approach to teaching robot motion trajectories is kinesthetic guidance (Figure 16).
Figure 16: Human operators teach the robot how to perform different tasks.
An alternative to learning force trajectories is to learn the impedance of the robot by learning the desired stiffness trajectories. The ability to change the impedance of the arm is crucial to simplify the physical interaction in unpredictable and unstructured environments (Burdet et al., 2001; Hogan, 1984). In Peternel et al. (2018a), teleoperation was used with a push-button interface to command the robot impedance, which was learned by DMPs that enabled the robot to perform various collaborative assembly tasks. For example, the learned position and stiffness DMPs were used to insert a peg in a groove to bind the two parts or to screw a bolt (Peternel et al., 2018a). A similar approach was used in Yang et al. (2018) to learn DMPs used for a vegetable cutting task.
While teleoperation-based methods are very effective for teaching the robot DMPs for interaction tasks, they usually involve a complex and expensive system. The method in Abu-Dakka et al. (2018) enabled the robot to learn stiffness profiles through the measurement of interaction forces with the environment to perform a valve turning task. The method in Peternel et al. (2017a) used human demonstration and EMG to learn stiffness DMPs from human muscle activity measurements in order to perform sawing and wiping (Figure 17) tasks. Using DMPs for adapting to changing surfaces (e.g.
Nevertheless, the adaptation of a single trajectory is unlikely to generate an appropriate solution for more general cases, where the task execution needs to change significantly. After learning the initial DMP motion trajectories through kinesthetic guidance, the robot can then adapt them based on the measured interaction force while performing the task. Pastor et al. (2011) introduced a method for the real-time adaptation of demonstrated DMP trajectories depending on the measured sensory data. They developed an adaptive regulator for trajectory adaptation based on estimated and actual force data. Prakash et al. (2020) extended the real-time adaptation approach by incorporating a fuzzy fractional-order sliding mode controller in order to efficiently and stably adapt the demonstrated DMP trajectory to fast movements, such as a ping pong swing. Recently, Cui et al. (2022) presented a method for coupling multiple DMPs to model robot transportation tasks involving deformable objects.
Sutanto et al. (2018) presented a data-driven framework for learning a feedback model from demonstrations. They used an RBF-NN to represent the feedback model for the movement primitive. Similarly, Gams et al. (2010) proposed a method for the adaptation of demonstrated movements depending on the desired force with which the robot should act on the environment, thus ensuring the adaptation of the learned movements to different surfaces. This approach was later expanded (Pastor et al., 2011) to provide the statistically most likely force–torque profile (Pastor et al., 2012); furthermore, force–torque data was used to train a classifier (Straizys et al., 2020) in order to modulate the demonstrated trajectory for use in delicate tasks such as tissue or fruit cutting.
Moving onward from policy learning, Do et al. (2014) presented an adaptation framework, where not only the desired adaptation force or trajectory but also the entire skill can be learned. They demonstrated the method with a wiping task under different environmental conditions.
5.1.2. Assembly tasks
Assembly presents one of the more challenging tasks to automate, where not only position trajectories but also task dynamics have to be taken into account. To deal with this challenge, various methods were proposed. Abu-Dakka et al. (2015a) proposed a method that can learn the orientation aspect of the complex physical interaction, like the peg-in-the-hole assembly tasks (Figure 18). The proposed method was integrated into an industrial assembly framework where the key challenge was to adapt to uncertainties presented by the assembly task (Abu-Dakka et al., 2014; Krüger et al., 2014).
Figure 18: An example of using DMPs in assembly tasks (e.g.
Complex assembly tasks that are subject to change cannot be demonstrated and executed on the fly; therefore, adaptation methods are required for ensuring a successful execution. Nemec et al. (2020) used exception strategies, modeled as DMPs, for dealing with complex assembly cases. Sloth et al. (2020) presented an exception strategy framework, combining discrete and periodic DMP, coupled with force control to learn an assembly task under tight tolerances. Angelov et al. (2020) incorporated several different control policies by taking into account the dynamics and sequencing of the task. The approach uses DMPs to generate free motions and convolutional neural networks for the assembly.
In some cases, active exploration and autonomous database expansion can be used for learning assembly policies automatically. In Petric et al. (2015), the proposed algorithm can build and combine CMP motion knowledge from a database in an autonomous manner.
Complementary to assembly, disassembly is also challenging when solely using demonstrated trajectories. As described in Ijspeert et al. (2013), DMPs have a unique point attractor at the specified goal parameter of the movement, which essentially rules out reversibility. Therefore, Nemec et al. (2018) proposed a framework where the disassembly challenge was tackled by learning two separate DMPs from a single demonstrated motion: one forward and one backward. Iturrate et al. (2019) took the idea further and reformulated the DMP phase system with a logistic differential equation to obtain two stable point attractors. This Reversible Dynamic Movement Primitive (RevDMP) approach provided a reversible formulation of the dynamical system, and the effectiveness of the algorithm was demonstrated on a peg-in-hole assembly task.
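The effect of replacing the usual exponentially decaying phase with a logistic one can be seen in a few lines: the same equation, integrated with opposite signs, drives the phase toward either of two stable attractors (s = 1 forward, s = 0 backward). The coefficient and step sizes below are illustrative, not those of Iturrate et al. (2019).

```python
def logistic_phase(s0, direction, a=4.0, dt=0.01, steps=800):
    """Integrate ds/dt = direction * a * s * (1 - s) with forward Euler.

    direction = +1 drives s toward the attractor at 1 (forward execution),
    direction = -1 drives s back toward the attractor at 0 (reversed execution).
    """
    s = s0
    for _ in range(steps):
        s += direction * a * s * (1.0 - s) * dt
    return s
```

One way to exploit this is to index forward and backward primitives by the shared phase, so that flipping the direction mid-motion retraces the trajectory instead of diverging from the single goal attractor.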
5.1.3. Learning methods for contact adaptation
Desired force–torque profiles can be tracked using ILC (Gams et al., 2014, 2015b). In repetitive robotic tasks, iterative learning has been gaining increased popularity (Bristow et al., 2006) due to its effectiveness and robustness. However, in order to achieve effective results, a careful tuning of the learning parameters is required. Norrlöf (1991) and Tayebi (2004) presented adaptive learning approaches for the automated tuning of learning parameters.
Another approach is to use RL to adapt DMPs. For example, in Buchli et al. (2011b, 2011a), stiffness parameters were adjusted during the task execution by RL.
Alternatives to the feedback-based adaptation of DMPs and to RL are scalability and generalization approaches. Matsubara et al. (2011) proposed an algorithm for the generation of new control policies from existing knowledge, thereby achieving an extended scalability of DMPs, while a mixture of motor primitives was used for the generation of table tennis swings (Mülling et al., 2010). On the other hand, the generalization of DMPs was combined with model predictive control by Krug and Dimitrov (2015) or applied to DMP coupling terms by Gams et al. (2015a), which were learned and later added to a demonstrated trajectory to generate new joint space trajectories.
Stulp et al. (2013) proposed to learn a function approximator with one regression in the full space of phase and task parameters, bypassing the need for two consecutive regressions. Forte et al. (2012) performed a comparison study of LWR and GPR for trajectory generalization. This work shows that higher accuracy can be achieved with LWR trajectory approximation. Koropouli et al. (2015) presented a generalization approach for force control policies. By learning both the policy and the policy difference data using LWR, they could estimate the policy at new inputs through superposition of the training data.
Deniša et al. (2016a) used GPR-based generalization over combined joint position trajectories and torque commands in the framework of CMPs. To showcase the versatility of the approach, Petric et al. (2018) applied it to robot-based assembly tasks. Finally, Kramberger et al. (2017) extended the approach to account for variations of the desired tasks.
5.2. Human–robot co-manipulation
While control of robot interaction with the passive environment can solve the majority of tasks, in some cases the robot needs to interact with an active agent (e.g., a human).
In Peternel et al. (2014), the collaborative robot was taught online through teleoperation how to perform collaborative sawing with a human co-worker. The impedance was commanded to the robot through muscle activity measured by EMG. DMPs were used to encode the coordinated phase-dependent motion and impedance as demonstrated by the human teleoperator. Teaching through teleoperation is an effective way to convey a physical interaction skill to the collaborative robot; however, the setup can be expensive and is not widely available.
An intuitive alternative to teleoperation is for the robot to learn the skill directly through physical interaction with the human partner while they are collaborating. Numerous methods have focused on learning the synchronized motion between collaborative partners (Gams et al., 2014; Kulvicius et al., 2013; Lu et al., 2022; Peternel et al., 2018b; Prada et al., 2013; Sidiropoulos et al., 2019, 2021; Ugur and Girgin, 2020; Umlauft et al., 2014; Wu et al., 2022; Zhou et al., 2016a). For example, in Kulvicius et al. (2013), the interactive movements were encoded with DMPs and adapted based on the measured force arising from the disagreements between agents during co-manipulation. Similarly, in Gams et al. (2014), the collaborative movements were encoded with DMPs and adapted using force feedback and ILC. The approach in Zhou et al. (2016a) combines two DMPs to encode the movements of each partner’s arm, which are coupled in a leader-follower manner.
Besides adapting the collaborative movements, in Peternel et al. (2018b), the robot used DMPs to also learn the impedance online directly from the co-manipulation with the human (Figure 19). The robot started with a basic skill set that enabled it to collaborate with the human in a pure follower role. Through the collaborative task execution, the robot then learned the motion and impedance trajectories online and encoded them with DMPs. When the human became fatigued, the robot used the learned advanced skill to take over the majority of the task execution.
Figure 19. An example of using DMPs for collaborative human–robot sawing (Peternel et al., 2018b).
The method in Ben Amor et al. (2014) proposed an upgraded version of standard DMPs called
There are also other types of co-manipulation scenarios, such as within-hand bi-manipulation or human–robot object handover. For example, in Amadio et al. (2022), Gao et al. (2019), and Koene et al. (2014) DMPs were used to perform bi-manipulation, while in Abdelrahman et al. (2020), Iori et al. (2023), Lafleche et al. (2019), Prada et al. (2014), and Solak and Jamone (2019) DMPs were used for human–robot object handover.
When the environment is hazardous for human workers or when there are too many robots compared to the number of human workers, the obvious solution is to make robots collaborate with each other. The method in Peternel and Ajoudani (2017) used DMPs to make novice robots learn from the expert robot through co-manipulation. Initially, the novice robot remained compliant to let the expert robot lead the task execution. In the first stage, the novice robot learned the reference motion through DMPs. In the second stage, it became stiff to perform the newly learned motion, while the expert robot initiated the stiff/compliant phases expected in the collaborative task execution. Finally, the novice robot learned in which phases of the task to increase or decrease the impedance and encoded this impedance behavior with DMPs.
5.3. Human assistance, augmentation, and rehabilitation
The most common type of co-manipulation is the classic human–robot collaboration, where a human and a robotic agent are physically performing industrial or daily tasks. Another type of co-manipulation occurs when a human is using a wearable robot such as an exoskeleton. There are different types of functions that the exoskeleton can be used for. One function is augmentation where the current human motion is amplified to augment existing (healthy) human capabilities such as in tasks involving heavy loads. When human capabilities are impaired, the exoskeleton has to act in an assistance function. If human capabilities are impaired to a larger degree, the exoskeleton can be employed in a rehabilitation function to perform physical therapy. In the augmentation function, DMPs can be employed on the robot to offload a hard and/or repetitive motion of healthy human workers, while in the assistance and rehabilitation functions, they can be used to assist impaired humans in their daily tasks or perform repetitive physical therapy on patients that would lead to recovery.
Besides the type of exoskeleton function, another important aspect is the shared load between the exoskeleton and the human during physical human–robot interaction. For example, in the case of highly impaired patients in early-stage rehabilitation, DMPs generated by the exoskeleton may take almost all the load of the movement and assume the role of the leader to perform passive physiotherapy. As the recovery progresses and more active exercise is preferred, the majority of the load can be shifted to the human, who leads the movements, while the DMP system can act in a support capacity as a follower. When full recovery is not possible, DMPs can provide assistance in daily tasks where the load can be partially shared. In the case of augmentation of existing (healthy) human capabilities, the DMP system can adapt to human movement and add extra power on top of the human effort needed to perform a specific heavy-load task. In some cases, the exoskeleton can learn DMPs from human movements to take over a repetitive action completely.
The methods in Lauretti et al. (2017, 2018) obtained DMPs offline through learning by demonstration, which were then used by an arm exoskeleton to support human movements. In Peternel et al. (2016), the control method employed DMPs to interactively adapt the joint torques required to perform the arm exoskeleton assistance and compensate all the underlying dynamics for periodic movements (Figure 21).
Gait-related assistance and rehabilitation with exoskeletons is a very common application of DMPs, and there are numerous examples (Abu-Dakka et al., 2015; Amatya et al., 2020; Escarabajal et al., 2023; Huang et al., 2016a; Hwang et al., 2019, 2021; Schaal, 2006; Yuan et al., 2020; Zou et al., 2021). In Abu-Dakka et al. (2015, 2020), a parallel robot was used for ankle rehabilitation, where the movements were generated by DMPs (Figure 20). A similar parallel robot was applied in Escarabajal et al. (2023) for knee rehabilitation, where RevDMPs were used to enable a patient to reverse the movement in order to maintain their own desired pace. In Huang et al. (2016a), DMPs were used to learn the gait motion trajectories for a lower body exoskeleton. This approach was then extended with an RL method that adapts a force coupling term (similar to earlier approaches presented in Section 3.3.2) to enable online adaptation of motion trajectories (Huang et al., 2016b). In Luo et al. (2022), DMPs were adapted to the different starting and ending locations of the foot in the swing phase of gait. The method in Xu et al. (2023) used DMPs for leg exoskeleton movements in a mirror therapy concept where the motion of the healthy limb is transferred to the impaired limb. Hong et al. (2023) used DMPs to plan obstacle-avoidance leg movements during walking with a prosthesis.
Figure 20. An example of using DMPs for teaching passive exercises for ankle rehabilitation (Abu-Dakka et al., 2015, 2020).
Besides normal gait, DMPs were also applied for stair-ascent (Xu et al., 2020) and sit-to-stand (Kamali et al., 2016) assistive movements of lower body exoskeletons. In Joshi et al. (2019), a robotic arm was used to assist humans with putting clothes on their body, where the movements were generated by DMPs. Ding et al. (2022) developed a framework for the assistance of older adults, combining DMPs and admittance control for mobility assistance and manipulation support. The framework was implemented on a mobile platform with a robotic arm utilized for LfD.
Besides assistive body movement and rehabilitation, DMPs were also applied for relaxation purposes. For example, in Li et al. (2020), a robotic arm provided massage movement through DMPs.
5.4. Teleoperation
Teleoperation is one of the major fields of robotics and enables a human to have a direct and real-time control over a (remote) robot. Typically, the control is done through interfaces that can capture the human commands to be sent to the robot and that can provide haptic feedback from the robot. While teleoperation focuses on giving the human operator a full or shared control over the robot, DMPs are used to encode autonomous robot behaviors. Therefore, here we mostly examine cases where teleoperation is used to teach the robot new autonomous behavior encoded by DMPs.
In Kormushev et al. (2011), a combination of kinesthetic teaching and teleoperation was employed to form the DMP-based robot skill for ironing. After the motion trajectories were learned through kinesthetic guidance, the corresponding forces were recorded by using a haptic device and a teleoperation system. In Peternel et al. (2014), teleoperation was used to teach the robot how to physically collaborate with another human. Since there was no haptic feedback, the teleoperation setup was unilateral, but the human was able to teach the impedance of the robot in addition to the motion. The impedance was commanded by muscle activity measured through EMG, while the motion was commanded by the movement of the human operator's arm as measured by an optical motion capture system. Similar teleoperation-based DMP learning approaches were used in Lentini et al. (2020) and Yang et al. (2018).
In Peternel et al. (2018a), the human operator taught the robot through teleoperation how to perform autonomous assembly actions (Figure 16).
A real robot is not always necessary to acquire new skills. In Beik-Mohammadi et al. (2020), the robot and the environment were simulated and the human operator used a virtual reality system. A combination of DMPs and RL was used to form an adaptive skill. The scenario proposed in Abu-Dakka et al. (2015a) was teleoperation at its basis; however, the human demonstrator did not merely pretend to be embodied in the robot: the robot task environment was cloned at the human side (Figure 16).
Multiple demonstrations through teleoperation can be inconsistent, especially if done in a multi-agent shared-control setting. The method proposed in Pervez et al. (2019) can synchronize inconsistent demonstrations through shared-control teleoperation and encode them with DMPs. Maeda (2022) investigated the possibility of using DMPs to implicitly blend human and robot policies without requiring the design of task-specific arbitration functions or the need to provide multiple (possibly inconsistent) demonstrations.
5.5. High DoF robots
DMPs provide an elegant and fast way to deal with high-dimensional systems by sharing one canonical system (3) among all DoFs and maintaining only a separate transformation system for each DoF. By high-dimensional space, we are referring to systems with 10 or more DoFs.
The left photo shows the arm exoskeleton application from Peternel et al. (2016); the right photo shows the high-DoF humanoid robot Walk-man (Tsagarakis et al., 2017) performing sawing in Peternel and Ajoudani (2017).
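To make this structure concrete, the following minimal Python sketch (our own illustration, not code from the surveyed papers) integrates one shared scalar canonical system alongside ten independent transformation systems; the gains and the zero forcing term are assumptions, and a learned forcing term would replace `f` below.

```python
import numpy as np

# Minimal sketch: a single canonical system s_dot = -alpha_s * s / tau is
# shared by all DoFs, while each DoF keeps its own transformation system.
def integrate_dmp(y0, g, tau=1.0, dt=0.001, T=1.0,
                  alpha=25.0, beta=6.25, alpha_s=4.0):
    y = y0.astype(float).copy()
    dy = np.zeros_like(y)
    s = 1.0
    for _ in range(int(T / dt)):
        f = np.zeros_like(y)                    # learned forcing term omitted
        ddy = (alpha * (beta * (g - y) - dy) + f) / tau**2
        dy = dy + ddy * dt
        y = y + dy * dt
        s = s + (-alpha_s * s / tau) * dt       # one scalar phase for all DoFs
    return y, s

# Ten DoFs driven by the same scalar phase
y_final, s_final = integrate_dmp(np.zeros(10), np.ones(10))
```

With a zero forcing term each transformation system reduces to a critically damped spring–damper, so every DoF converges to its goal while the single phase variable decays from 1 toward 0.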
Ijspeert et al. (2002b, 2002a) used DMPs in an IL framework to learn a tennis forehand, a tennis backhand, and rhythmic drumming using a 30-DoF humanoid robot. Pastor et al. (2009) used DMPs to encode a 10-DoF exoskeleton robot arm. Luo et al. (2015) integrated DMPs with stochastic policy gradient RL and GPR in order to design an online adaptive push recovery control strategy. The approach was applied to the PKU-HR5 humanoid robot with 20 DoFs. André et al. (2015, 2016) implemented a predictive model of sensor traces that enables early failure detection for humanoids, based on an associative skill memory applied to periodic movements and DMPs. They applied their algorithm to DARwIn-OP with 20 DoFs in simulation. Pfeiffer and Angulo (2015) represented gestures by applying DMPs on the REEM robotic platform with 23 DoFs. Nah et al. (2020) proposed an approach to optimize DMP parameters in order to deal with the complexity of high-DoF systems like a whip. They tested their approach in simulation for 10-, 15-, 20-, and 25-DoF systems. In order to reduce the number of required rollouts for adaptation to new task conditions, Queißer and Steil (2018) used CMA-ES to optimize DMP parameters. In addition, they introduced a hybrid optimization method that combines a fast coarse optimization on a manifold of policy parameters with a fine-grained parameter search in the unrestricted space of actions. The approach was successfully illustrated in simulation using a 10-DoF robot arm. Liu et al. (2020) proposed DMP-based trajectory generation to enable a full-body humanoid robot with 10 DoFs (for the two legs) to realize adaptive walking. Liang et al. (2021) developed an efficient approach to endow a 26-DoF service robot with the skill of performing sign language motions.
Travers et al. (2016, 2018) proposed a framework that integrates DMPs with Gaussian-shaped spatial activation windows in order to plan the motion of high-DoF robotic systems.
5.6. Motion analysis and recognition
DMPs tend to fit topologically similar trajectories with similar shape parameters.
The shape parameters
Motion recognition can also be used to determine whether the robot is correctly executing a task by comparing sensed data with a movement template. In this respect, André et al. (2016) used an associative skill memory, like the one in Pastor et al. (2011), as a predictive model of sensor traces that enables early failure detection. In this work, DMPs were used to compactly encode the associative skill memory and speed up the failure detection. The described approaches demonstrate that DMPs are a valuable option for gesture recognition, especially for systems with limited computational power. Liang et al. (2021) presented a solution to the motion retargeting problem for generating dual-arm sign language motions. Their approach involves an offline constrained optimization technique that minimizes the deviation from trajectories generated by DMPs, which encode the human demonstrations. It should be noted that their approach was applied and tested exclusively in an offline setting.
Humans tend to perform the same task in slightly different manners. Sometimes, differences in the execution style contain useful information to adapt the motion to different executive contexts. This is the case, for instance, of a reaching motion with and without an obstacle on the way. To capture the execution style, Matsubara et al. (2010) augmented the forcing term of the DMP with a style parameter learned from multiple demonstrations. At run time, different style parameters can be used to smoothly interpolate between demonstrated behaviors. Zhao et al. (2014) not only employed movements with different styles but also learned a smooth mapping between style parameters and goal to improve the generalization.
When humans provide seamless demonstrations, DMPs can be used for online segmentation and recognition. To this end, Meier et al. (2011) assumed that a library of DMPs is given and used it to recognize motion segments during a task demonstration. Instead of using exemplar templates for each class of primitives, Chang and Kulić (2013) segmented a video stream using motion to non-motion transitions, fitted DMPs on the segmented data, and performed clustering to group similar motion segments in an unsupervised fashion. Song et al. (2020) performed unsupervised trajectory segmentation using the concept of key points.
DMPs have been developed as a computational model of the neurobiological motor primitives (Schaal et al., 2007). Experimental findings from neurophysiology related to the spinal force fields in frogs have inspired the modification of DMP formulation in Hoffmann et al. (2009). As discussed in Section 3.1.1, this multidimensional representation overcomes limitations of classical DMPs like trajectory overshooting and dependence of the trajectory from the reference frame used to describe the motion. Hoffmann et al. (2009) also derived a collision avoidance strategy for DMPs, inspired by the way humans avoid collisions during arm motion. DeWolf et al. (2016) investigated the human ability to cope with changes in the arm dynamics and kinematic structure during motion control. They proposed a spiking neuron model of the motor control system that uses DMPs to implement the preparation and planning functionalities of the premotor cortex. The effects of changes in the robot’s dynamic parameters on the tracking performance of a DMP trajectory were studied in Kuppuswamy and Alessandro (2011). Their findings suggest that the change in the body parameters should be explicitly considered in the DMP learning process. Hotson et al. (2016) augmented a brain–machine interface that captures neural signals with a DMP model of the endpoint trajectories executed by a non-human primate. The system was used to decode real trajectories from a primate manipulating four different objects.
5.7. Autonomous driving and field robotics
DMPs can be utilized in various autonomous non-stationary fields of robotics. Perk and Slotine (2006) utilized DMPs for defining flight paths and obstacle avoidance for Unmanned Aerial Vehicles (UAVs), where the trajectories were generated based on the joystick movements controlling the throttle of the UAV motors. Later, Fang et al. (2014) extended the approach to encode user-demonstrated UAV data, extracting and encoding the rhythmic and linear segments of the flight trajectory, and combining them into a flight control skill. Furthermore, Tomić et al. (2014) formulated the UAV movements as an optimal control problem. The output of the optimal control solver was encoded with DMPs, enabling them to generalize and apply in-flight modifications to the UAV flight trajectories in real-time. Similarly, Lee et al. (2018) and Kim et al. (2018a) presented a framework for UAV cooperative aerial manipulation tasks, based on an adaptive controller which adapts the movement of the UAV in relation to the mass and inertial properties of the payload. In addition, DMPs were incorporated in the control scheme to modify the flight trajectories and avoid obstacles on the fly. The approach was later extended to incorporate path optimization, where DMPs play a significant role in real-time obstacle avoidance (Lee et al., 2020).
As mentioned before, DMPs represent a versatile movement representation, which can be implemented in various tasks and scenarios. One of the recent applications in this field is also Autonomous Underwater Vehicles (AUVs). Carrera et al. (2015) integrated the DMPs in a learning by demonstration scenario for an AUV. The demonstrated data consisted of the manipulator and vehicle sensory outputs, which were efficiently used to demonstrate an underwater valve-turning task.
DMPs are also represented in the autonomous driving domain. In the recent work of Wang et al. (2018, 2019), the authors propose a framework which decomposes the complex driving data into a more elementary composition of driving skills represented as motion primitives. In the proposed framework, DMPs are utilized to represent the driver’s trajectory with acceptable accuracy and can be generalized to different situations.
6. Discussion
A summary of DMP features and limitations that have been solved (✓) or partially solved (❙).
6.1. Guidelines for different applications
Previous sections presented different DMP formulations and extensions together with possible application scenarios. As usual, no single formulation serves all scopes and purposes, and the suitable approach depends on the goal to achieve and the conditions of the application. For this reason, we provide some guidelines to help the user select the formulation to use.
6.1.1. Discrete versus periodic
For a task with distinct starting and ending points, discrete DMPs are a logical option to encode the movement trajectories between them. Examples of these tasks include reaching and pick-and-place (Caccavale et al., 2019; Deniša et al., 2016a; Forte et al., 2012; Stulp et al., 2009), specific actions of assembly (Krüger et al., 2014; Abu-Dakka et al., 2014; Nemec et al., 2020; Angelov et al., 2020), and cutting (Straizys et al., 2020; Yang et al., 2018).
When the starting and ending points coincide, periodic DMPs are the logical option, since the encoded movements can be repeated over and over again. Good examples of their application are repetitive tasks such as locomotion (Nakanishi et al., 2004; Rückert and d’Avella, 2013; M. Wensing and Slotine, 2017), human body augmentation/rehabilitation (Peternel et al., 2016), wiping a surface (Gams et al., 2016; Kramberger et al., 2018; Peternel et al., 2017a), and sawing (Peternel et al., 2018b). Nevertheless, even typically non-repetitive tasks that are executed just once every now and then can still be encoded with periodic DMPs when the starting and ending points coincide (Peternel et al., 2018a).
There are cases where it is not possible to clearly distinguish if the motion is periodic or discrete. For instance, Ernesti et al. (2012) have shown that the first step in a gait of a humanoid robot is a transient toward a periodic motion. Their representation is a good candidate to encode transients converging to limit cycle trajectories. Finally, in some cases like in complex assembly, the task requires a combination of discrete and periodic DMPs (Sloth et al., 2020).
6.1.2. Space representation
The original formulations of DMPs were, and still are, successfully applied to multidimensional data, with each DoF treated independently.
In many early works, orientation trajectories were learned and adapted without considering their geometry constraints (Pastor et al., 2009), leading to improper orientation and hence requiring an additional re-normalization. In a different example, Umlauft et al. (2017) used eigendecomposition for impedance adaptation.
In order to comply with such geometry constraints, researchers provided new formulations of DMPs that ensure proper unit quaternions or rotation matrices over the course of orientation adaptation (Abu-Dakka et al., 2015a; Koutras and Doulgeri, 2020a; Saveriano et al., 2019; Ude et al., 2014) and proper SPD matrices over the course of the adaptation of SPD profiles.
6.1.3. Weights learning method
DMPs represent motion trajectories as stable dynamical systems with learnable weights that define the shape of the motion. In the LfD paradigm, DMP weights are usually learned in a supervised manner from human demonstrations. The procedure used to transform human demonstrations into training data for the DMP forcing term is highlighted in Section 2.1.1.1. The number of weights, which corresponds to the number of RBFs used to approximate the forcing term, is a hyper-parameter that is typically provided by the user. As a practical tuning guideline, the number of RBFs increases with the length of the trajectory and with its complexity, which depends on changes in concavity and on the frequency and magnitude of peaks. Given the training data and the number of RBFs, different techniques can be used to fit the weights.
LWR is widely used when the forcing term is a combination of RBFs as in (4). Alternatively, one can use an RBF-NN as in Si et al. (2021) or, if multiple demonstrations are given, exploit GMM/GMR as in Li et al. (2021b) and Pervez et al. (2017a) or GPR as in Fanger et al. (2016) to represent the forcing term and use expectation–maximization to fit the (hyper-)parameters. Deep NNs, typically trained via back-propagation, are an appealing possibility to map input images into forcing terms (Pervez et al., 2017b), mimicking the human perception–action loop. Although appealing, the possibility of exploiting deep learning techniques as motion primitives requires further investigation.
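As a concrete illustration of the LWR fit, the following sketch computes one weight per basis function with an independent weighted regression. It is our own toy example: the minimum-jerk demonstration, gains, basis placement, and width heuristic are assumptions, not taken from a specific paper.

```python
import numpy as np

# Fit DMP weights with locally weighted regression on a toy demonstration.
alpha, beta, alpha_s, tau, N = 25.0, 6.25, 4.0, 1.0, 20
t = np.linspace(0.0, tau, 1000)
y = 10 * t**3 - 15 * t**4 + 6 * t**5              # minimum-jerk demo, 0 -> 1
dy = np.gradient(y, t)
ddy = np.gradient(dy, t)
g, s = y[-1], np.exp(-alpha_s * t / tau)          # goal and phase signal

# Target forcing term recovered from the demonstration (one common convention)
f_target = tau**2 * ddy - alpha * (beta * (g - y) - tau * dy)

# RBFs equally spaced in phase; widths derived from the spacing (heuristic)
c = np.exp(-alpha_s * np.linspace(0.0, 1.0, N))
h = 1.0 / np.diff(c, append=0.5 * c[-1]) ** 2
psi = np.exp(-h * (s[:, None] - c) ** 2)          # shape (time, N)

# One independent weighted regression per basis function
xi = s                                            # phase regressor (goal scaling omitted)
w = np.array([(psi[:, i] * xi * f_target).sum() /
              ((psi[:, i] * xi**2).sum() + 1e-12) for i in range(N)])

f_fit = (psi @ w) / psi.sum(axis=1) * xi          # normalized reconstruction
rmse = np.sqrt(np.mean((f_fit - f_target) ** 2))
```

Because each weight only depends on samples near its own basis function, the fit is cheap and can even be updated incrementally, which is one reason LWR remains the default choice.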
In real applications, there can be a misplacement between the DMP trajectory and the robot motion. Typical examples include assembly or other tasks that require physical interaction with the environment (see Section 5.1). In this situation, the DMP motion can be incrementally adjusted to improve the robot’s performance. ILC arises as an interesting approach to iteratively update the DMP weights as it ensures a rapid convergence to the desired performance (Abu-Dakka et al., 2015a; Gams et al., 2014; Kramberger et al., 2018). However, ILC assumes that a target behavior to reproduce is given. When the target behavior cannot be easily specified and the robot performance is not satisfactory, RL solutions have to be adopted. As detailed in Section 4.3, DMPs are effective control policies and, combined with policy search algorithms like PI2 or PoWER, are able to solve complex and highly dynamic tasks.
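A generic ILC-style update of a sampled coupling term can be sketched as follows. This is the textbook update c_{j+1} = Q(c_j + L e_j), not the exact scheme of the cited works; the gains and the static toy plant are assumptions of the sketch.

```python
import numpy as np

# Iteratively learn a per-sample coupling term c over repeated task executions.
T, L_gain, Q = 100, 0.8, 0.99                 # samples, learning gain, forgetting factor
target = np.sin(np.linspace(0.0, np.pi, T))   # desired per-sample response
c = np.zeros(T)                               # coupling term, refined every repetition
for _ in range(50):                           # task repetitions (iterations)
    out = 0.5 * c                             # toy static plant: response to coupling term
    e = target - out                          # per-sample tracking error
    c = Q * (c + L_gain * e)                  # classic ILC update c_{j+1} = Q(c_j + L e_j)
```

After convergence the residual error is bounded by the forgetting factor Q; choosing Q < 1 trades perfect tracking for robustness against noise, which is why its tuning matters in practice.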
6.1.4. Online adaptation
Performing robotic tasks in the real world requires adaptation capabilities. When adaptation of DMPs based on some feedback is required, one of the extension methods should be applied. For example, to change an existing movement based on a detected obstacle, the methods in Gams et al. (2016), Hoffmann et al. (2009), Park et al. (2008), and Tan et al. (2011) can be used (see Section 3.3.1). If it is necessary to adaptively learn the movement dynamics based on real-time effort feedback, the method in Peternel et al. (2016) can be employed (see Section 3.3).
Furthermore, for industrial tasks such as assembly or polishing, adaptation strategies combining force control with demonstrated trajectories can be applied (Abu-Dakka et al., 2015a; Gams et al., 2010; Kramberger et al., 2016), ensuring that the system follows the predefined trajectory while adapting to environmental uncertainties. For online adaptation, DMPs can be used as a trajectory generator whose output serves as the input to a force control algorithm; alternatively, force feedback can be directly incorporated as a coupling term in the DMP formulation (see Section 3.3.2), eliminating the need for an additional force controller. A similar approach can also be utilized for velocity-based adaptation of the movements (see Section 3.3.4).
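As a minimal illustration of the coupling-term route (our own sketch: the spring environment, gains, and desired force are assumptions), the force error can be integrated into a coupling term C that pushes the DMP reference into the surface until the measured force matches the desired one:

```python
# One-DoF DMP transformation system with a force-error coupling term C.
alpha, beta, tau, dt = 25.0, 6.25, 1.0, 0.001
k_env, k_i, F_des = 1000.0, 1.0, 5.0     # surface stiffness, coupling gain, target force [N]
y, dy, C, g = 0.0, 0.0, 0.0, 0.0         # start on the surface, goal at the surface

for _ in range(5000):                     # 5 s of simulation
    F_meas = k_env * y                    # simplified spring environment (contact assumed)
    C += k_i * (F_des - F_meas) * dt      # integrate the force error into the coupling term
    ddy = (alpha * (beta * (g - y) - dy) + C) / tau**2
    dy += ddy * dt
    y += dy * dt

F_final = k_env * y                       # steady-state contact force
```

The integral action removes the steady-state force error that a purely proportional coupling would leave; in a real setup the integration gain must be kept small enough relative to the DMP dynamics to preserve stability.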
6.1.5. Impedance versus force
In physical interaction tasks, DMPs can be used to either learn force or impedance (Peternel et al., 2017a). If the task requires position control, then the impedance should be learned with DMPs in combination with the reference position. If the task requires controlling a specific force, then the force profile should be learned with DMPs instead.
Furthermore, to overcome any undesirable movements, the control policy can be augmented with a tank-based passivity approach (Shahriari et al., 2017). This approach monitors the energy flow between the modeled sub-systems.
6.2. Resources and codes
The availability of code and datasets is useful to speed up the setup of novel applications without the need to re-implement a promising approach from scratch. We have searched for available DMP implementations and found that several researchers have published their DMP code in various open-source repositories. We list the available implementations in the Git repository that accompanies this paper (https://gitlab.com/dmp-codes-collection/third-party-dmp). For each implementation, we mention the type of DMP, the author, the URL to download the code, and the programming language used. We also provide a short description of the key features.
Apart from listing existing approaches, the Git repository that accompanies this paper contains an implementation that we decided to release to the community. The list of provided implementations is given in Table 4.
6.3. Limitations and open issues
As any motion primitive representation, DMPs have strengths but also inherent limitations. The advantages of the DMPs have been widely discussed in previous sections. Here, we present the main limitations of the DMPs and discuss open issues that require further investigation. A summary of these limitations is presented in Table 5.
6.3.1. Implicit time dependency
The phase variable used to suppress the nonlinear forcing term and ensure convergence to a given goal introduces an implicit time dependency in the DMP formulation. The reason for representing the time dependency implicitly as a dynamical system is that such a phase variable can be conveniently manipulated. For example, in Section 2.1.1.2, we have seen how to manipulate the phase variable to slow down (or even stop) the execution. A drawback of the time dependency is that the shape of the DMP motion is significantly affected by the time evolution of the phase variable. If the phase vanishes too early, the last part of the trajectory is executed with linear dynamics converging to the goal. If the phase lasts too long, the trajectory may overshoot and fail to reach the goal within the desired time. In both cases, the DMP motion may significantly deviate from the demonstration. A properly designed phase-stopping mechanism can remedy the issue, but the proper mechanism to adopt depends on the specific application.
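A phase-stopping mechanism of the common form ṡ = -α_s s / (τ(1 + α_e|e|)) can be sketched as follows; the error signal and the gain α_e are assumptions of this illustration.

```python
# Slow down the canonical system when the tracking error |e| grows.
def phase_step(s, e, dt=0.001, tau=1.0, alpha_s=4.0, alpha_e=10.0):
    ds = -alpha_s * s / (tau * (1.0 + alpha_e * abs(e)))
    return s + ds * dt

s_free, s_blocked = 1.0, 1.0
for _ in range(1000):                          # 1 s of execution
    s_free = phase_step(s_free, e=0.0)         # unperturbed: phase decays normally
    s_blocked = phase_step(s_blocked, e=0.5)   # robot blocked: phase nearly stops
```

With zero error the phase decays as exp(-α_s t/τ); with |e| = 0.5 the decay is six times slower, effectively pausing the forcing term until the perturbation is resolved.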
In order to overcome this limitation, several authors focused on learning stable and time-independent (or autonomous) dynamical systems from demonstrations. A globally stable and autonomous system generates a vector field that converges to the given goal from any initial state. Without the need for a phase variable, the generated motion depends only on the current state of the system. Notable approaches to learn stable and autonomous systems exploit Lyapunov theory (Khansari-Zadeh and Billard, 2011, 2014), contraction theory (Blocher et al., 2017; Ravichandar and Dani, 2015), diffeomorphic transformations (Perrin and Schlehuber-Caissier, 2016; Neumann and Steil, 2015), and passivity considerations (Kronander and Billard, 2015). These approaches have been effectively used to learn complex movements from demonstrations.
In general, autonomous systems have the potential to represent much more complex movements than DMPs. For example, autonomous systems can encode different motions in different regions of the state-space. In this respect, DMPs can only generate a stereotypical trajectory connecting the start to the goal, regardless of where the initial state is placed in the state-space. However, the stereotypical motion generation is also an advantage of DMPs since it makes it easier to predict the generated motion in regions of the state-space poorly covered by training data. On the contrary, it is hard to predict how an autonomous system generalizes where only a few or no training data are available. DMPs are known to scale well in high-dimensional spaces since the learned forcing term always depends on a shared, scalar phase variable. Autonomous systems perform learning directly on the high-dimensional state-space, which poses numerical challenges and requires much more training data. In summary, each representation has its own advantages and disadvantages, and the choice between time-dependent and autonomous motion primitives depends on the specific application.
6.3.2. Stochastic information
Classical DMPs are deterministic and lack stochastic information about the modelled motion. Representing the demonstrated motion as a probability distribution, instead, has several advantages. For example, in a probabilistic framework, generalization to a new goal (or a via-point) is achieved by conditioning on the new goal (via-point), while the covariance computed from the probability distribution can represent couplings between different DoFs (Paraschos et al., 2013).
Ben Amor et al. (2014) proposed an approach to estimate the predictive distribution
The ProMP framework (Paraschos et al., 2013) proposed an alternative movement primitive representation that captures the variability across different demonstrations, as well as across different DoFs, in the form of a covariance matrix. This makes it possible to explicitly encode the couplings between different directions and to improve generalization by conditioning on a desired goal, via-point, or intermediate velocity. The covariance computed by ProMPs represents the variability and correlation in the demonstrations; in other representations, like GPR, the covariance instead measures the model uncertainty due to the lack of training data. An attempt to unify the ProMP and DMP formulations was made in Li et al. (2023). Kernelized Movement Primitives (KMPs) (Huang et al., 2019; Silvério et al., 2019) offer the possibility of modelling variability, correlation, and uncertainty within the same framework. However, for longer trajectories, the computational cost of KMPs can be considerably higher than that of DMPs due to the inversion of the kernel matrix.
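The conditioning mentioned above is plain Gaussian conditioning on the weight distribution of a linear basis-function model, y = psi(t)^T w. The sketch below is a minimal single-DoF illustration (not the ProMP authors' code); `sigma2` is an assumed observation noise for the via-point.

```python
import numpy as np

def condition_on_viapoint(mu_w, Sigma_w, psi, y_star, sigma2=1e-6):
    """Condition a Gaussian weight distribution w ~ N(mu_w, Sigma_w) on
    the trajectory passing through y_star at a phase where the basis
    activations are psi (so y = psi @ w): standard Gaussian conditioning."""
    Psi = psi.reshape(-1, 1)                    # (n_basis, 1)
    S = sigma2 + float(Psi.T @ Sigma_w @ Psi)   # innovation variance (scalar)
    K = (Sigma_w @ Psi) / S                     # Kalman-like gain, (n_basis, 1)
    mu_new = mu_w + (K * (y_star - float(Psi.T @ mu_w))).ravel()
    Sigma_new = Sigma_w - K @ (Psi.T @ Sigma_w)
    return mu_new, Sigma_new

# Example: force the mean trajectory through a via-point y* = 2.0.
mu_w, Sigma_w = np.zeros(5), np.eye(5)
psi = np.array([0.1, 0.5, 1.0, 0.5, 0.1])       # basis activations at t*
mu_c, Sigma_c = condition_on_viapoint(mu_w, Sigma_w, psi, 2.0)
# psi @ mu_c is now (almost) exactly 2.0, and Sigma_c has shrunk.
```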
6.3.3. Closed-loop implementation and issues
The vast majority of methods employ DMPs only as reference trajectory generators for a closed-loop controller, which then actually executes the motion. However, DMPs can also be part of the closed-loop controller itself, where sensor measurements, for example forces and torques, enter the DMP as a coupling term to change its behavior. In other words, in the open-loop case the DMP serves as a plan that does not change online during the execution (it may at most be updated iteratively after each execution), while in the closed-loop case the DMP serves as an action generator and changes online during the execution. To date, only a few methods have explored the closed-loop concept. For example, in Peternel et al. (2016), DMPs directly generate torques for exoskeleton actuators in a control loop closed by feedback from the human user's muscle activity. Nevertheless, in such scenarios, closed-loop stability and passivity become crucial considerations that have to be addressed and resolved before widespread application (Kramberger et al., 2018).
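As a rough sketch of the closed-loop idea (not the implementation of Peternel et al. (2016)), a measured interaction force can enter the DMP transformation system as a coupling term, so the generated action changes online with the sensory feedback; the gain `k_f` is an illustrative assumption.

```python
import numpy as np

def dmp_step(y, z, g, f_s, force, dt, tau=1.0,
             alpha_z=25.0, beta_z=25.0 / 4.0, k_f=1.0):
    """One Euler step of a DMP transformation system with a sensory
    coupling term: the measured force enters the acceleration-level
    equation, so the DMP acts as an online action generator rather
    than a fixed, precomputed reference trajectory."""
    coupling = k_f * force   # e.g. from a force/torque sensor or EMG
    z_dot = (alpha_z * (beta_z * (g - y) - z) + f_s + coupling) / tau
    y_dot = z / tau
    return y + y_dot * dt, z + z_dot * dt

# With zero forcing term and no sensed force, the system still
# converges to the goal g; a nonzero force deflects the motion online.
y, z = 0.0, 0.0
for _ in range(1000):
    y, z = dmp_step(y, z, g=1.0, f_s=0.0, force=0.0, dt=0.005)
```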
6.3.4. Coping with high-dimensional inputs
One of the main limitations of DMPs is that they encode human and robot trajectories explicitly as a function of time (i.e.
Since DMPs model trajectories using basis functions, they work effectively when learning time-driven trajectories (i.e.
Alternative approaches in the literature, such as GMM/GMR (Calinon, 2016), Task-Parameterized GMM (TP-GMM) (Calinon, 2016), and KMP (Huang et al., 2019, 2021), can be directly applied for learning demonstrations comprising high-dimensional inputs.
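For reference, GMR amounts to fitting a GMM over the joint (input, output) space and conditioning it on the input, which may be of arbitrary dimension. The sketch below is a minimal NumPy version of this conditioning step, an illustration rather than the cited toolboxes.

```python
import numpy as np

def gmr_predict(x, priors, means, covs, in_idx, out_idx):
    """Gaussian Mixture Regression: condition a GMM fitted over the
    joint (input, output) space on an input x. Unlike the scalar,
    phase-driven DMP forcing term, the input here can be any
    high-dimensional signal. means[k]/covs[k] are the joint mean and
    covariance of component k; priors are its mixing weights."""
    weights, cond_means = [], []
    for pi_k, mu, S in zip(priors, means, covs):
        mu_x, mu_y = mu[in_idx], mu[out_idx]
        Sxx = S[np.ix_(in_idx, in_idx)]
        Syx = S[np.ix_(out_idx, in_idx)]
        Sxx_inv = np.linalg.inv(Sxx)
        diff = x - mu_x
        # Responsibility of component k for this input (Gaussian density).
        dens = np.exp(-0.5 * diff @ Sxx_inv @ diff) / np.sqrt(
            (2 * np.pi) ** len(in_idx) * np.linalg.det(Sxx))
        weights.append(pi_k * dens)
        # Conditional mean of the output given the input, for component k.
        cond_means.append(mu_y + Syx @ Sxx_inv @ diff)
    weights = np.array(weights) / np.sum(weights)
    return sum(w * m for w, m in zip(weights, cond_means))

# Single-component example: joint covariance with cross-term 0.5 gives
# the linear regression y = 0.5 * x.
priors = [1.0]
means = [np.array([0.0, 0.0])]
covs = [np.array([[1.0, 0.5], [0.5, 1.0]])]
y_hat = gmr_predict(np.array([2.0]), priors, means, covs, in_idx=[0], out_idx=[1])
```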
6.3.5. Multi-attractor systems
The well-known second-order dynamics of the DMPs drive the system toward a single attractor (Ijspeert et al., 2002a). The properties,
On the other hand, Iturrate et al. (2019) introduced an alternative formulation with two stable attractor systems. The first attractor is defined at the starting point
7. Concluding remarks
Since their introduction in the early 2000s, DMPs have become established as one of the most popular approaches for motor command generation in robotics. Several authors have exploited and extended the classical formulation to overcome some of its limitations and fulfill different requirements. Their research has resulted in a large number of papers published over the last two decades.
One of the aims of this paper is to categorize and review the vast literature on DMPs. We took a systematic review approach and automatically searched for DMP-related papers in a popular database. A manual inspection of the resulting papers, guided by clear and unbiased criteria, led to the papers included in this tutorial survey.
Another aim of our work is to provide a tutorial on DMPs that presents the classical formulation and the key extensions in rigorous mathematical terms. We made an effort to unify the notation across different approaches in order to make them easier to understand. Moreover, we provide guidelines that help the reader select the right approach for a given application. In the same tutorial vein, we have also searched for open-source implementations of the described approaches and released several implementations of DMP-based approaches to the community.
We have discussed the advantages of DMPs as well as their limitations and open issues, and summarized them in Table 5, where we also indicate which issues have been solved and which require further investigation. As research on DMPs is still very active, this comprehensive discussion should help the readers understand what has been done in the field and where to focus their own research.
