1. Introduction
Trajectory generation is a fundamental problem in robotics: the objective is to output a smooth and safe path that a moving robot can follow through space. The generated trajectory can be described as a function of pose or velocity with respect to time. Traditional methods generally adopt a predefined function to describe the trajectory and then determine the parameters of that function according to optimization objectives and constraints [1, 2, 3]. This category of methods has been applied widely and successfully in various robotic systems. However, an artificially predefined function with a fixed form can hardly describe complex trajectories accurately. It is also difficult to modify such functions to produce similar trajectories for different dynamic situations, such as a changed destination or a moving obstacle, without replanning the whole trajectory. A more recent idea is to learn the trajectory from demonstrations, which complicates the modelling of the trajectory but reduces the cost of real-time replanning for new situations. Promising progress has been reported in the literature, such as Gaussian Mixture Models (GMMs) [4], affine functions [5] and Dynamic Movement Primitives (DMPs) [6].
The methods mentioned above can learn and generalize the motion styles of the demonstrated trajectories. Here, a motion style refers to a motion sequence that is distinct in its shape or curve tendency. However, in many situations, different styles of trajectory are adopted for different goals. An example is depicted in Figure 1, where two motion trajectories of two different styles, with the same start point but different goal points, are learned. Based on the styles of these two motions, any trajectory that starts at the same point and ends at a different goal is expected to share stylistic similarities with the learned ones, with changes that are as smooth as possible. A desired reproduced trajectory is shown in Figure 1; its style differs from that of the two learned trajectories but integrates their features. The main focus of this paper is the problem of style-adaptive motion trajectory generation for different goals, based on trajectories of various styles learned from multiple demonstrations.

A two-boundary trajectory planning example. The robot's task is to reach the goal. Trajectories of different styles could be adopted for the triangular goals. When a new circular goal is set, a trajectory of a new, different style (dashed) should be generalized to reach it.
In this paper, we propose a style-adaptive trajectory generation method based on DMPs, called ‘Style-Adaptive Dynamic Movement Primitives’ (SADMPs). In SADMPs, demonstrated trajectories are first modelled and clustered into a series of style-distinct principal trajectories using the Point Distribution Model (PDM) [8]. Next, each principal trajectory is trained using DMPs to obtain its weight parameters. Finally, an adaptive goal-to-style mechanism merges the weight parameters of the different styles into new weight parameters according to the change in goal position.
The main contributions of this paper are:
A framework for learning and generating style-adaptive trajectories is proposed, which supports the generation of new trajectories based on multiple learned motion examples and which needs fewer demonstrations.
An algorithm for fusing and adapting different motion styles is proposed, which consists of the Least Mean Squares (LMS) method and a goal-to-style mechanism. This algorithm blends different motion styles to generate a new trajectory with a smooth transition.
SADMPs has been successfully applied to our robot in the Small Size League (SSL) for an adaptive shooting task, as well as to a humanoid robot arm to generate motions for table tennis playing with different styles.
The remainder of this paper is organized as follows. Section 2 introduces the related work about learning trajectories from demonstrations. In Section 3, we describe the discrete formalism of the original DMPs. Next, we propose a framework for the SADMPs and discuss the learning procedure for the SADMPs in Section 4. Section 5 demonstrates the effectiveness of the proposed method for the SSL robot and for a striking ball task in the humanoid robot. Section 6 concludes the paper.
2. Related Work
Billard proposed modelling non-linear point-to-point robot motions as a time-independent dynamical system, using an iterative algorithm to estimate the form of the system through a mixture of Gaussian distributions [4]. Although this parametric model makes the method suitable for processing large amounts of motion data, it can generate discontinuities in the trajectories. GMMs have been used in different robotics applications, such as gesture imitation [9] and handwriting [10].
Pham et al. proposed a method using affine trajectory deformation for motion imitation [5]. This method is based on the affine invariance of human motion [11] and makes no use of an exogenous basis function. It also allows the deformed motions to preserve a motion style similar to the observed one; the style of the generated motions can only be altered by modifying the observed trajectories.
Another pioneering work, introduced by Ijspeert et al. [6], considered a robot's movement to be a linear spring system coupled with an external forcing term, which can be described by a set of differential equations. In order to obtain the weight parameters in the DMPs formulation, Ijspeert et al. applied Locally Weighted Learning (LWL). Schaal introduced an advanced algorithm named ‘Receptive Field Weighted Regression’ (RFWR) to solve the problem [12], in which the centres and widths of the kernel functions are adjusted automatically. RFWR can also select an appropriate number of kernel functions to fit the training trajectory. The effectiveness of this method has been validated in various applications, such as walking [13], flight control [14] and hitting and batting [15].
The DMPs employed in [6] used a very basic model to learn a two-boundary trajectory without considering obstacles or kinematic restrictions. Motivated by the requirement of high-order continuity, studies [15, 16] developed extensions and modifications of DMPs for planning with a non-zero terminal velocity. The DMPs formula can also be easily extended with a dynamic potential-field term to achieve obstacle avoidance [17, 18]. Furthermore, a method for comparing and clustering robotic trajectories with human motion data is proposed in [19] and applied to learning motions from a large set of demonstrations.
The approaches proposed in [4, 6] provide good results for style-fixed tasks – in other words, the demonstrated style is always suitable for the new goal. However, in real environments, the motion styles have functional roles to solve specific tasks, such as hitting balls or avoiding obstacles. Matsubara et al. proposed an approach called ‘Stylistic Dynamic Movement Primitives’ (SDMPs) [7] which controls the motion style by manipulating a style parameter added to the original DMPs. However, the mapping between the style parameter and the perceptual feedback of the motion goal is set by hand in their work, and is neither automatic nor smooth.
3. Dynamic Movement Primitives
We first briefly introduce the original DMPs, using the same description as in [18]. DMPs describe a movement with the following set of differential equations for a one degree-of-freedom (DoF) motor system:

τż = α_z(β_z(g − y) − z) + f(x),   (1)
τẏ = z,   (2)

where y and ẏ denote the position and velocity of the system, g is the goal position, τ is a temporal scaling factor, and α_z, β_z are positive gain constants (typically chosen so that the system is critically damped). Equations (1) and (2) are referred to as the transformation system. The non-linear forcing term f is defined as

f(x) = ( Σ_{i=1}^{N} ψ_i(x) w_i / Σ_{i=1}^{N} ψ_i(x) ) · x (g − y₀),   (3)

where w_i are the weight parameters to be learned, y₀ is the start position and ψ_i(x) = exp(−h_i(x − c_i)²) are Gaussian kernel functions with centres c_i and widths h_i. The phase variable x is governed by the first-order canonical system

τẋ = −α_x x,   (4)

where α_x is a positive constant and x is initialized to 1. Since x converges monotonically to zero, the forcing term vanishes over time and the transformation system is guaranteed to converge to the goal g.
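As a concrete illustration, the DMP system above can be integrated with simple Euler steps. The sketch below assumes the standard discrete formulation of Ijspeert et al. [6]; the gain values, kernel placement and step count are illustrative choices of ours, not the paper's.

```python
import numpy as np

def dmp_rollout(w, y0, g, tau=1.0, dt=0.005, steps=400,
                alpha_z=25.0, beta_z=6.25, alpha_x=3.0):
    """Euler-integrate a one-DoF discrete DMP.

    The canonical system drives the phase x from 1 towards 0; the forcing
    term f(x) perturbs a spring-damper that pulls y towards the goal g.
    Because f is weighted by x, it fades out and y converges to g.
    """
    n = len(w)
    c = np.exp(-alpha_x * np.linspace(0.0, 1.0, n))  # kernel centres in phase space
    h = n ** 1.5 / c                                 # a common width heuristic
    x, y, z = 1.0, float(y0), 0.0
    traj = [y]
    for _ in range(steps):
        psi = np.exp(-h * (x - c) ** 2)              # Gaussian kernels psi_i(x)
        f = (psi @ w) / (psi.sum() + 1e-10) * x * (g - y0)
        z += (alpha_z * (beta_z * (g - y) - z) + f) / tau * dt
        y += z / tau * dt
        x += -alpha_x * x / tau * dt
        traj.append(y)
    return np.array(traj)
```

With all weights zero the system reduces to a critically damped spring and simply converges to the goal; non-zero weights shape how it gets there.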
4. Style-adaptive Dynamic Movement Primitives
4.1. System Architecture
Figure 2 shows the architecture of the SADMPs. We first collect and analyse different motions demonstrated by a human. The captured motions are modelled and clustered using PDM [8] in order to obtain the principal trajectory of each cluster, which is the average of the trajectories in that cluster [19]. Next, we train these principal trajectories separately using the DMPs trainer to obtain their weight parameters. Finally, we use the adapter to combine the training results with the new goal to reproduce new motions. Here, a new desired goal is the input that drives the adaptive goal-to-style mechanism.

The architecture of the proposed SADMPs model. This method comprises two parts: one for learning and the other for generalization. The dark green-shaded components constitute the adaptive goal-to-style mechanism. Note that the architecture is shown for a single DoF; for a multi-DoF system, each DoF is trained independently and in parallel, with the same canonical system shared across styles.
PDM is a useful tool for analysing trajectories. To analyse trajectories with PDM, an adequate number of movements must first be collected from demonstrations. The recorded trajectories are cropped so that they start at the same point and then re-sampled using cubic spline interpolation so that they contain the same number of points. The detailed procedure of PDM is described in [8]. Figure 3 shows an example of the data processing procedure using the PDM.

The data processing using the PDM. In (a), two goals are set and 10 reaching motions are captured for each of the two goals. Next, the first two principal components are employed to represent all the trajectories, and a clustering is performed in this PCA space to obtain the principal trajectory of each cluster.
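The cropping, re-sampling and clustering pipeline described above can be sketched as follows. The full PDM procedure is given in [8]; as a simplified stand-in, this sketch re-samples each trajectory with a cubic spline, projects the flattened trajectories onto their first two principal components, and separates two styles with a tiny two-means clustering initialized from the farthest pair of points. The use of scipy for interpolation and the two-cluster assumption are ours.

```python
import numpy as np
from scipy.interpolate import CubicSpline

def resample(traj, n_points=100):
    """Re-sample a (T, d) trajectory to n_points via cubic spline
    interpolation on normalized time, so all trajectories align."""
    t = np.linspace(0.0, 1.0, len(traj))
    return CubicSpline(t, traj, axis=0)(np.linspace(0.0, 1.0, n_points))

def cluster_two_styles(trajs, iters=20):
    """Project flattened trajectories onto the first two principal
    components and split them into two clusters; the mean trajectory of
    each cluster serves as its principal trajectory."""
    X = np.stack([tr.ravel() for tr in trajs])
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    Z = Xc @ Vt[:2].T                                    # 2D PCA scores
    d2 = ((Z[:, None] - Z[None]) ** 2).sum(-1)
    i, j = np.unravel_index(np.argmax(d2), d2.shape)
    centres = Z[[i, j]]                                  # farthest-pair init
    for _ in range(iters):
        labels = np.argmin(((Z[:, None] - centres[None]) ** 2).sum(-1), axis=1)
        centres = np.stack([Z[labels == k].mean(axis=0) for k in range(2)])
    protos = [X[labels == k].mean(axis=0).reshape(trajs[0].shape) for k in range(2)]
    return protos, labels
```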
4.2. Algorithm for Learning
The learning of the SADMPs proceeds in two steps: the principal trajectories are first trained to obtain their weight parameters, and the weight parameters are then adapted to a new goal.
4.2.1. Training the Principal Trajectory by LMS
In the first step, and distinct from the training method used in [6, 12], a Least Mean Squares (LMS) method is used to train the weight parameters of each principal trajectory. The target values of the forcing term are computed from the demonstrated positions, velocities and accelerations by rearranging the transformation system, and the weights are then obtained by minimizing the mean squared error between these targets and the kernel-weighted output, which admits a closed-form least-squares solution.
Usually, a principal trajectory has more than one dimension, and the weight parameters of each dimension are learned independently and in parallel. Thus, in our training process, we follow two rules. First, different principal trajectories in the same dimension must share the same kernel functions for training, meaning that the kernel functions are identical in shape, number and distribution. Second, for one principal trajectory, different kernel functions may be chosen for its different dimensions. For example, there are two principal trajectories in Figure 4, each of which has two dimensions.

Training using LMS: the two demonstrated movements are approximated using 15 kernel functions in each dimension.
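Under the standard DMP equations, the least-squares step can be sketched as follows: the forcing-term target is computed from the demonstrated trajectory by rearranging the transformation system, and the weights solve an ordinary least-squares problem over the kernel features. This is a generic sketch of such a fit, not the paper's exact LMS routine; the gains and kernel placement are illustrative.

```python
import numpy as np

def fit_weights_lms(y, dt, g, n_kernels=15, tau=1.0,
                    alpha_z=25.0, beta_z=6.25, alpha_x=3.0):
    """Fit DMP weights to a one-dimensional demonstration y by least squares.

    The forcing-term target follows from rearranging the transformation
    system:  f_t = tau^2 * ydd - alpha_z * (beta_z * (g - y) - tau * yd).
    Each feature column is a normalized Gaussian kernel of the phase x,
    scaled by x * (g - y0) to match the definition of f.
    """
    yd = np.gradient(y, dt)
    ydd = np.gradient(yd, dt)
    x = np.exp(-alpha_x * np.arange(len(y)) * dt / tau)       # canonical phase
    f_target = tau**2 * ydd - alpha_z * (beta_z * (g - y) - tau * yd)
    c = np.exp(-alpha_x * np.linspace(0.0, 1.0, n_kernels))   # kernel centres
    h = n_kernels ** 1.5 / c                                  # kernel widths
    psi = np.exp(-h[None, :] * (x[:, None] - c[None, :]) ** 2)
    Phi = psi / psi.sum(axis=1, keepdims=True) * (x * (g - y[0]))[:, None]
    w, *_ = np.linalg.lstsq(Phi, f_target, rcond=None)
    return w
```

Because the feature matrix is fixed once the kernels are chosen, the same features can be reused for every principal trajectory in a given dimension, as required by the first training rule above.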
4.2.2. Adapting to New Goal
In the next step, an adapter implements the adaptive goal-to-style mechanism. In the original DMPs formulation, only the goal position g is modified to generalize a learned motion to a new target, while the weight parameters stay fixed; in SADMPs, the weight parameters must change as well so that the reproduced motion takes on the appropriate style.
The goal of each principal trajectory is known from training. For every learned style, the adapter computes a blending coefficient from the distance between the new goal and that style's goal, and the coefficients are normalized so that they sum to one. Since the coefficients vary continuously with the goal position, the transition between styles is smooth. Then, the new weight parameters are obtained by merging the weight parameters of the learned styles according to these coefficients. It is clear that the closer the new goal lies to the goal of a learned style, the more strongly that style dominates the reproduced motion.
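The paper's exact merging rule, formula (13), is not reproduced in this excerpt. As one plausible instantiation of the goal-to-style mechanism, the sketch below blends the per-style weight vectors with normalized inverse-distance coefficients, so that the coefficients sum to one, vary continuously with the goal, and recover a learned style exactly at its own training goal. The inverse-distance form is our assumption.

```python
import numpy as np

def blend_style_weights(new_goal, style_goals, style_weights, eps=1e-8):
    """Merge per-style DMP weight vectors according to the new goal.

    Each style receives a coefficient that grows as the new goal approaches
    that style's training goal; the coefficients are normalized to sum to 1,
    so the result is a convex combination of the learned weight vectors.
    """
    g = np.asarray(new_goal, dtype=float)
    d = np.array([np.linalg.norm(g - np.asarray(sg, dtype=float))
                  for sg in style_goals])
    lam = 1.0 / (d + eps)      # inverse-distance coefficients (assumption)
    lam /= lam.sum()
    W = np.stack([np.asarray(w, dtype=float) for w in style_weights])
    return lam @ W, lam
```

At the midpoint between two training goals the two styles contribute equally; at a training goal the blend collapses to that single style.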
When a new goal is set, the non-linear term of DMPs will drive the system away from its initial state by resetting the parameters, which is similar to the original DMPs mentioned in Section 3. The difference with SADMPs is that the weight parameters are also changed to adapt to a new goal. Figure 5 shows the adaptively learned trajectories using formula (13).

Reproduced adaptive trajectories in two dimensions – two principal trajectories are used in this example. The styles of the reproduced trajectories (green) change smoothly according to the goal.
The fundamental property of the DMP formulation is that it is spatially invariant. If we obtain the weight parameters from one demonstration, changing the goal position merely rescales the reproduced trajectory, since the forcing term is scaled by (g − y₀), so the qualitative shape of the movement is preserved.
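This spatial invariance is easy to verify numerically: since the forcing term is scaled by the goal displacement, doubling the goal doubles the reproduced trajectory point for point. A minimal self-contained check, assuming the standard formulation with illustrative constants:

```python
import numpy as np

def rollout(w, y0, g, dt=0.005, steps=400, az=25.0, bz=6.25, ax=3.0):
    """Bare-bones Euler rollout of a one-DoF DMP (tau = 1)."""
    n = len(w)
    c = np.exp(-ax * np.linspace(0.0, 1.0, n))
    h = n ** 1.5 / c
    x, y, z, ys = 1.0, float(y0), 0.0, []
    for _ in range(steps):
        psi = np.exp(-h * (x - c) ** 2)
        f = (psi @ w) / (psi.sum() + 1e-10) * x * (g - y0)  # scaled by g - y0
        z += (az * (bz * (g - y) - z) + f) * dt
        y += z * dt
        x -= ax * x * dt
        ys.append(y)
    return np.array(ys)

w = np.array([40.0, -20.0, 10.0, 5.0, -5.0])
y_near = rollout(w, 0.0, 1.0)   # goal at 1
y_far = rollout(w, 0.0, 2.0)    # goal at 2: same shape, twice the amplitude
assert np.allclose(y_far, 2.0 * y_near)
```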
5. Experiments
In this section, we test the proposed method by conducting an adaptive shooting task on an SSL platform and compare our method with the original DMPs quantitatively. Furthermore, the SADMPs are applied to a humanoid robot performing a table tennis task.
5.1. Shooting Ball Task in SSL
5.1.1. Testing Platform and Training
The Small Size League (SSL) is the fastest and most intense of RoboCup's soccer competitions. The basic rules of the SSL are based on the rules of a FIFA soccer game, but each team consists of only six robots playing on a field that is 6.05 m long by 4.05 m wide. Two cameras mounted over the field capture images at 60 Hz for further processing in a shared vision system named ‘SSL-Vision’ [20]. This system recognizes and locates the positions and orientations of the robots and the position of the ball, as shown in Figure 6(a), and then broadcasts the information to each team via a network. The team that scores the most goals wins the game.

The robot and scene for motion-capture
The robot we used is an omnidirectional vehicle with four wheels, which can shoot the ball using an electromagnet-controlled device installed at the front, as shown in Figure 6(c). The motion of the robot can be decoupled kinematically into three DoFs: translation along the x- and y-axes and rotation about the vertical axis.
In our experiment, we perform a shooting task to test the method. In this task, the robot should accomplish two basic subtasks: one is to reach the correct goal pose without touching the ball; the other is to kick the ball to the shooting target. As the ball may appear anywhere on the field, different stylistic trajectories will be adopted according to the position of the ball. As such, we expect the robot to move to the ball in a motion style suited to the ball's location.
In order to accomplish the shooting task, we set a 9 × 6 grid on a quarter of the field, such that a golf ball could be placed at any of the 53 grid points other than the start point, as shown in Figure 7(a). The initial pose of the robot was fixed at the start point.

The scene and demonstrated trajectories for the shooting ball task. (a) A quarter field in the SSL is used for this experiment, and the shooting task was demonstrated 10 times for every predefined pose. (b) The recorded data from multiple human demonstrations.
The experiment was carried out as follows: first, a human demonstrator moved the robot by hand to perform the shooting ball task at each goal pose (see Figure 6(b)), and 10 demonstrated trajectories were collected for each shooting pose (see Figure 7(b)). Next, we employed the training algorithm described in Section 4.2 to train the different stylistic trajectories. Note that the different trajectories share the same kernel functions and canonical system for each DoF. Finally, the adapter merged the learned weight parameters to generalize the motion to new ball positions.
5.1.2. Results and Comparison
We compare the performance of the proposed approach with that of the original DMPs method. Because the original DMPs do not take multiple stylistic trajectories into account, we choose the weight parameters of the training pose closest to the new goal for generalization.
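For reference, the baseline used in this comparison, plain DMPs with the nearest training style, amounts to a hard selection instead of a blend. A sketch, with the function name ours:

```python
import numpy as np

def nearest_style_weights(new_goal, style_goals, style_weights):
    """Plain-DMP baseline: reuse the weight parameters of the training pose
    whose goal is closest to the new goal (no style blending)."""
    d = np.linalg.norm(np.asarray(style_goals, dtype=float)
                       - np.asarray(new_goal, dtype=float), axis=1)
    return np.asarray(style_weights, dtype=float)[int(np.argmin(d))]
```

Unlike an adaptive blend, this selection changes discontinuously as the goal crosses the midpoint between two training poses, which is exactly the style-transition problem SADMPs addresses.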
In the shooting ball task, the robot should not only reach the goal pose without touching the ball but also kick the ball to the shooting target. We repeated the shooting task 10 times in each goal pose and counted the score, as shown in Figure 8.

Comparison of DMPs and SADMPs in trajectory generalization and scores for every shooting point. (a) is the result of the SADMPs method, while (b) is the result of the DMPs method. For the representation of the score, a dot is drawn at the end of the trajectory, whereby the colour of the dot denotes the score (0-blue, 10-red).
Obviously, both methods can reproduce motions that generalize to the new goal accurately. However, some of the trajectories reproduced by the DMPs knock the ball away before the robot reaches its goal pose, which is reflected in the lower scores at several shooting points in Figure 8(b).
Figure 9 shows the locations of the 20 demonstrated trajectories in the space formed by the first two components of the PDM, which account for 94.15% of the total variance.

Comparison of the difference between the SADMPs method and the DMPs method in the PCA space
5.2. Striking Ball Task in Table Tennis
The proposed method is also used on a humanoid robot to play table tennis. We set up a table tennis playing scenario, as shown in Figure 10(a). The humanoid robot that we designed is 165 cm tall.

Marker placement for the optical motion capture system and the humanoid robot platform. The motion of the marker in the red circle is recorded and trained in Euclidean space.
Joint parameters for seven-DoF manipulation
In order to learn from the human, we use an OptiTrack S250e motion capture system to record the human motion trajectories, which consist of the 3D positions of optical markers attached to the human actor.
In such a hitting ball task, the forehand and the backhand are considered two totally different hitting styles, as shown in Figure 11. From the end-effector's (paddle's) point of view, different sides of the paddle are used, as determined by the motion of the wrist joints. The motion style of the end-effector's trajectory in turn affects the motion planning of the other joints. In practice, the paddle style (forehand or backhand) is determined according to the ball's speed and impact point, ensuring the accuracy of striking in each round. Here, our proposed method is only used to decide the trajectory of the marker in the red circle (Figure 10(b)) in 3D Euclidean space, and the demonstrated striking positions of the forehand and backhand trajectories serve as the goals of the two learned styles.

Hitting ball task for a humanoid robot playing table tennis
In our experiment, the robot successfully hit the ball at different positions near the demonstrated positions whenever the motion was kinematically feasible. Figure 12 shows the result of reproducing a style-fused trajectory in each dimension; three new reproduced trajectories are shown in the figure, the striking positions of which lie between the two demonstrated ones.

The result of reproducing a style-fused trajectory in each dimension

Generalizing to new goals in application to table tennis
6. Conclusion
In this paper, we proposed a style-adaptive trajectory generation method based on DMPs, called SADMPs, for learning multiple trajectories from human demonstrations. This method needs fewer demonstrations and achieves smooth transitions between different motion styles. In SADMPs, we first use the Point Distribution Model to cluster the demonstrated movements in a PCA space and calculate the principal trajectory of every cluster. Next, we train the principal trajectories independently to obtain their weight parameters based on Dynamic Movement Primitives. Finally, we proposed a goal-to-style mechanism for adaptive changes between different motion styles. We evaluated this novel approach on an SSL robot and a humanoid robot; in both cases, our motor primitives generalize smooth, adaptive trajectories according to the goals.
Our future work will focus on extending the method to more complicated applications of a humanoid robot and to multi-agent motion planning in competitive, dynamic environments, such as an SSL game.
