
Abstract

This article focuses on the implementation of an approximate dynamic programming algorithm in the discrete tracking control system of the three-degrees-of-freedom Scorbot-ER 4pc robotic manipulator. The controlled system belongs to the group of articulated robots, which use rotary joints to access their work space. The main part of the control system is a dual heuristic dynamic programming algorithm that consists of two structures designed in the form of neural networks: an actor and a critic. The actor generates the suboptimal control law, while the critic approximates the derivative of the value function from Bellman's equation with respect to the state. The remaining elements of the control system are the PD controller, the supervisory term and an additional control signal. The structure of the supervisory term derives from the stability analysis performed using the Lyapunov stability theorem. The control system works online: the neural networks' weights-adaptation procedure is performed in every iteration step, and no preliminary learning of the neural networks is required. The performance of the control system was verified by a series of computer simulations and experiments performed using the Scorbot-ER 4pc robotic manipulator.

Keywords

Approximate Dynamic Programming, Dual Heuristic Dynamic Programming, Neural Network, Robotic Manipulator, Tracking Control

1. Introduction

Robotic manipulators are widely used in modern robotized manufacturing processes, mainly in welding, painting and assembly. In these tasks it is essential that the manipulator's end-effector realizes the desired trajectories with the required accuracy. The desired movement of the robotic manipulator's kinematic chain is obtained by generating the tracking control signals for the manipulator's actuators. This is a demanding task that is widely discussed in the literature, where different control strategies are presented [1, 2, 3, 4, 5]. The problems encountered in realizing the desired trajectory result from the fact that these control objects are described by nonlinear dynamic equations, in which some parameters of the model can be unknown or can change during the movement because of disturbances or the manipulation of different objects. As a result, the tracking control of the robotic manipulator requires effective computational control algorithms that ensure the required tracking quality and that either adjust their parameters during the realization of the desired trajectory or are insensitive to changes in the object's parameters. In recent decades, robust control strategies [1, 6, 7, 8] have been successfully applied to the tracking control problem for robotic manipulators, ensuring the correct realization of a movement despite parametric uncertainties and external disturbances. In these algorithms, parametric uncertainties are taken into account explicitly. It is assumed that each object parameter takes a value from a specified range, and that the control is performed correctly for parameter variations within this range. Moreover, a switching control element is inserted into the control algorithm.
The main disadvantage of this type of algorithm is the use of excessive values of the parameter limits, which results from the large uncertainty connected with the lack of full knowledge of the object's model. Algorithms that are able to adapt their parameters in the case of parametric disturbances are free of this drawback. Adaptive control algorithms [1, 6, 8] are examples of such solutions. However, their synthesis requires knowledge of the structure of the controlled object's mathematical model. These inconveniences result in a well-grounded need for the application of modern computational algorithms, such as artificial intelligence (AI) methods, in the control systems of robots. One group of these algorithms comprises neural networks (NNs) [2, 4], which are widely applied in control tasks because their weights can be adapted, because knowledge of the controlled object's mathematical model is often not necessary, and because they can compensate for the object's nonlinearities not included in the mathematical model. The rapid development of AI methods makes it possible to implement Bellman's dynamic programming (DP) concept in the form of approximate dynamic programming (ADP) algorithms [9, 10, 11, 12]. In such algorithms, NNs are used to generate the sub-optimal control law in online processes, which makes them implementable in the real-time control problems of robotic manipulators. In this article, the implementation of the ADP algorithm in the dual heuristic dynamic programming (DHP) configuration [9, 11, 12] in the tracking control problem of the robotic manipulator is presented. The presented tracking control algorithm guarantees high tracking performance in comparison with a neural tracking control system with multi-layer NNs (MNNs) [13] or a PD controller, and also guarantees the stable realization of the desired trajectory in the face of disturbances.
The DHP algorithm is composed of two structures, an actor and a critic, both realized in the form of random vector functional link (RVFL) NNs [2]. Solutions to the control problem of robotic manipulators using ADP algorithms presented in the literature are often limited to theoretical considerations [13, 14]. There are not many applications of ADP algorithms in real control problems, and those reported in the literature concern instead the control of dynamic objects such as mobile robots [15, 16, 17], turbo-generators [18] and underactuated systems like an inverted pendulum [19], as well as other applications of ADP algorithms (e.g., target recognition [20] and a static compensator connected to a power system [21]). The performance of the presented discrete tracking control system of the Scorbot-ER 4pc robotic manipulator was verified by a series of computer simulations and experiments. The results of the research presented in this article continue the authors' earlier works related to the control of nonlinear dynamic systems such as a ball and beam system [22], a wheeled mobile robot [17, 23, 24] and a robotic manipulator [13, 25, 26] using NNs and ADP algorithms. In [13], the authors discussed the application of MNNs in the tracking control problem of a robotic manipulator. The results of that work provide a basis for comparing the tracking performance of the neural control system with MNNs and the control system with the actor-critic structure. The article [25] presents a preliminary approach to the application of a DHP algorithm in a tracking control problem of a robotic manipulator, where only theoretical studies were performed. Another application of NNs in the control of a robotic manipulator is presented in [26], where the problem of hybrid position/force control is discussed. In this problem, the task of tracking control is combined with the task of exerting the required force at the point of contact between the tool and a machined object.
This problem is often met in robotized machining (e.g., drilling, grinding). This article presents a real application of the DHP algorithm to the tracking control problem of the robotic manipulator, in which the experiment was performed using the Scorbot-ER 4pc robotic manipulator; moreover, the obtained results are compared with the results of the MNNs tracking control system. The article is organized as follows: Section 1 is a short introduction to the tracking control problems of robotic manipulators. The robotic manipulator's dynamics model is given in Section 2. The ADP algorithm family and the DHP algorithm implemented in the control system are presented in Section 3. In Section 4 the proposed discrete tracking control system is presented, and the following section gives the stability analysis performed using the Lyapunov function. In Section 6, the performance of the presented tracking control system is demonstrated through a numerical test and an experiment performed using the Scorbot-ER 4pc robotic manipulator. The last section gives our conclusions.

2. Dynamics of the Robotic Manipulator

The control object is the Scorbot-ER 4pc vertical articulated manipulator, shown in Figure 1.a). Its dynamics are described by nonlinear equations. It is composed of three links coupled with rotary joints (three degrees of freedom, three-DOF), and its schematic structure is shown in Figure 1.b). The wrist of the manipulator with the gripper attached provides the remaining three-DOF, but in the presented research they are fixed.

Figure 1.

a) The Scorbot-ER 4pc robotic manipulator, b) scheme of the manipulator

The point $C$ is the end point of the manipulator's end-effector, $q_{[1]}$ , $q_{[2]}$ , $q_{[3]}$ are joint angles, $| O O^{'} | = l_{1}$ , $| O^{'} A | = l_{e}$ , $| A B | = l_{2}$ and $| B C | = l_{3}$ are the dimensions that result from the Scorbot-ER 4pc robotic manipulator's geometry, $u_{[1]}$ , $u_{[2]}$ and $u_{[3]}$ are control signals, $x$ , $y$ , $z$ is the global coordinate system, $x_{j}$ , $y_{j}$ , $z_{j}$ are the local coordinate systems connected with the links, $j = 1,2,3$ , $O$ , $A$ and $B$ are the points in the joints through which the respective pivots pass, and $S_{2}$ and $S_{3}$ are the centres of mass of links 2 and 3. The dynamical model of the three-DOF robotic manipulator was derived using the Lagrange equations of the second kind [13], and assumed in the form

$M (q) \ddot{q} + C (q, \dot{q}) \dot{q} + F (\dot{q}) + G (q) + τ_{d} (t) = u,$ (1)

where $q = q (t)$ is the generalized coordinates vector composed of the values of the joint angles, $q = {[q_{[1]}, q_{[2]}, q_{[3]}]}^{T}$ ; the vector $q$ and all its derivatives are time dependent, as are all vectors and matrices that are functions of $q$ , $\dot{q}$ and $\ddot{q}$ ; $M (q) \in ℜ^{3 x 3}$ is the inertia matrix, $C (q, \dot{q}) \dot{q} \in ℜ^{3 x 1}$ is the vector of moments due to the centrifugal and Coriolis forces, $F (\dot{q}) \in ℜ^{3 x 1}$ is the friction vector, $G (q) \in ℜ^{3 x 1}$ is the gravity vector, $τ_{d} (t) \in ℜ^{3 x 1}$ is the vector of the bounded disturbances, and $u \in ℜ^{3 x 1}$ is the control vector.

The tracking errors of the joint angles $q$ and errors of the joint angular velocities $\dot{q}$ were assumed as

$\begin{array}{l} e = q - q_{d}, \\ \dot{e} = \dot{q} - {\dot{q}}_{d}, \end{array}$ (2)

where the desired trajectory ( $q_{d}$ , ${\dot{q}}_{d}$ ) was generated earlier. Generating the trajectories of robotic manipulators is a complex task that raises many issues. One of them is the singularity avoidance problem, which is not considered in this article. The path of the manipulator's end-effector was generated offline and selected in such a way that the manipulator did not reach a singular configuration, in which the Jacobian matrix is rank deficient. The case in which this assumption fails is still an open problem in robotics, and it has been addressed in some publications [27, 28, 29].

On the basis of variable structure systems theory [1, 6, 7, 8], which is one of the known deterministic methods in the control theory of nonlinear systems, the original problem is replaced by the considerably easier problem of stabilization on a subspace of the state space. The idea of this method is to bring the state trajectory to a subspace of the state space and hold it on that subspace, where the control system remains unaffected by disturbances. The system's behaviour once on this subspace (the sliding surface) is called the sliding mode. In mechanical systems, this subspace can be decomposed into n one-dimensional subspaces, each of which is a linear combination of the elements of the state vector and, from the viewpoint of control theory, has the structure of a PD controller. This is why, on the basis of (2), the filtered tracking error $s$ was assumed in the form

$s = \dot{e} + Λ e,$ (3)

where $Λ \in ℜ^{3 x 3}$ is a positive definite, fixed diagonal matrix.

After the state trajectory of the control system has been brought to the subspace of the state space, the dynamics of the control system are described by n first-order differential equations, each with one eigenvalue. The solution of such an equation tends to zero when its eigenvalue is negative, which depends on the $Λ_{[j, j]}$ coefficient in the PD controller equation. The rate of convergence depends on the eigenvalue: the greater the absolute value of the eigenvalue (the $Λ_{[j, j]}$ gain), the faster the solution tends to zero. The system response is analyzed on the phase plane, on which the subspace of the state space is a straight line passing through the origin, whose slope depends on the eigenvalue of the equation describing the dynamics of the control system, that is, on $Λ_{[j, j]}$ . From a practical point of view, this gain cannot be arbitrarily large, because the actuators must be able to bring the state trajectory onto the subspace of the state space. Moreover, in practice it is difficult to ensure that the trajectory of the real control system of the mechanical object remains exactly on the subspace, because of disturbances. Therefore, the practical requirement is to keep the state trajectory in a neighbourhood of the subspace of the state space. It should remain inside the boundary layer, which allows for the existence of the so-called steady state error. Inside the boundary layer, the variable-structure control is approximated by continuous control. The choice of the $K_{D [j, j]}$ control gains affects the size of the steady state error: increasing the gains reduces the steady state error. However, the $K_{D [j, j]}$ control gain cannot be arbitrarily large for practical reasons, as it can excite vibrations of the mechanical system.
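The effect of the $Λ_{[j, j]}$ gain on the convergence rate described above can be illustrated with a minimal sketch. It assumes the ideal case $s = 0$ for a single joint, so that the error obeys $\dot{e} = - Λ_{[j, j]} e$ and decays exponentially; the function name and the numerical values are illustrative and are not taken from the article.

```python
import math


def error_on_sliding_surface(e0: float, lam: float, t: float) -> float:
    """Solution of de/dt = -lam * e (i.e. s = de/dt + lam * e = 0) at time t."""
    return e0 * math.exp(-lam * t)


# Illustrative values only (not taken from the article).
e0 = 0.1   # initial joint-angle error [rad]
t = 1.0    # elapsed time [s]
slow = error_on_sliding_surface(e0, lam=1.0, t=t)
fast = error_on_sliding_surface(e0, lam=5.0, t=t)
print(f"lam = 1: e(t) = {slow:.5f} rad, lam = 5: e(t) = {fast:.5f} rad")
```

The larger gain leaves a smaller error after the same time, which is the behaviour the paragraph describes; the sketch says nothing about the practical upper limits on the gain.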

The presented tracking control system of the robotic manipulator is discrete. The continuous model of the controlled object's dynamics (1) was discretized using Euler's method (the forward rectangular rule). The values of the robotic manipulator's movement parameters were computed (in the simulation) or measured (in the experiment) at discrete moments of time, $q (t_{{k}})$ , $\dot{q} (t_{{k}})$ , where $t_{{k}} = k h$ , k is an integer index, $k = 1, \dots, N$ , N is the number of samples, $h = t_{{k + 1}} - t_{{k}}$ is the time discretization parameter, and $q_{{k}}$ and $q_{{k + 1}}$ are the values of the joint angles vector at the discrete moments $t_{{k}}$ and $t_{{k + 1}}$ (for steps k and $k + 1$ ). The discrete model of the robotic manipulator's dynamics was assumed in the form

$\begin{array}{rcl} z_{1 {k + 1}} & = & z_{1 {k}} + h z_{2 {k}}, \\ z_{2 {k + 1}} & = & z_{2 {k}} - h {[M (z_{1 {k}})]}^{- 1} [C (z_{1 {k}}, z_{2 {k}}) z_{2 {k}} \\ & & + F (z_{2 {k}}) + G (z_{1 {k}}) + τ_{d {k}} - u_{{k}}], \end{array}$ (4)

where $z_{1 {k}} = {[z_{1 [1] {k}}, z_{1 [2] {k}}, z_{1 [3] {k}}]}^{T}$ is a vector that corresponds to the continuous vector $q$ composed of joint angles, and $z_{2 {k}} = {[z_{2 [1] {k}}, z_{2 [2] {k}}, z_{2 [3] {k}}]}^{T}$ is a vector that corresponds to the continuous vector $\dot{q}$ composed of joint angular velocities. The state vector was assumed in the form $z_{{k}} = {[z_{1 {k}}, z_{2 {k}}]}^{T}$ . On the basis of Equations (2) and (3), the discrete tracking errors of the joint angles $z_{1 {k}}$ and the errors of the joint angular velocities $z_{2 {k}}$ were assumed as

$\begin{array}{l} e_{1 {k}} = z_{1 {k}} - z_{d 1 {k}}, \\ e_{2 {k}} = z_{2 {k}} - z_{d 2 {k}}, \end{array}$ (5)

and the filtered tracking error $s_{{k}}$ was assumed as

$s_{{k}} = e_{2 {k}} + Λ e_{1 {k}},$ (6)

where the desired trajectory $z_{d {k}} = {[z_{d 1 {k}}, z_{d 2 {k}}]}^{T}$ was generated earlier.
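The discrete model (4) and the errors (5) and (6) can be sketched in a few lines. This is only an illustration under stated assumptions: `accel` is a hypothetical callback standing in for the acceleration term $- {[M (z_{1 {k}})]}^{- 1} [\dots]$ of Equation (4), $Λ$ is passed as its diagonal, and the numbers are made up.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def euler_step(
    z1: Sequence[float],
    z2: Sequence[float],
    accel: Callable[[Sequence[float], Sequence[float]], Sequence[float]],
    h: float,
) -> Tuple[Vector, Vector]:
    """One forward-Euler step in the spirit of Equation (4).

    accel(z1, z2) is a hypothetical callback returning the joint
    accelerations -M^{-1} [C z2 + F + G + tau_d - u] at step k.
    """
    a = accel(z1, z2)
    z1_next = [q + h * dq for q, dq in zip(z1, z2)]
    z2_next = [dq + h * ddq for dq, ddq in zip(z2, a)]
    return z1_next, z2_next


def filtered_error(
    z1: Sequence[float],
    z2: Sequence[float],
    zd1: Sequence[float],
    zd2: Sequence[float],
    lam_diag: Sequence[float],
) -> Vector:
    """Equations (5) and (6): s_k = e2_k + Lambda e1_k for a diagonal Lambda."""
    e1 = [q - qd for q, qd in zip(z1, zd1)]
    e2 = [dq - dqd for dq, dqd in zip(z2, zd2)]
    return [b + lam * a for a, b, lam in zip(e1, e2, lam_diag)]


# Illustrative numbers only: zero acceleration, so the angles drift linearly.
z1, z2 = [0.0, 0.0, 0.0], [1.0, 2.0, 3.0]
z1_next, z2_next = euler_step(z1, z2, lambda a, b: [0.0, 0.0, 0.0], h=0.01)
s = filtered_error(z1_next, z2_next, [0.0] * 3, [0.0] * 3, lam_diag=[2.0] * 3)
print(z1_next, z2_next, s)
```

Passing the dynamics as a callback keeps the sketch independent of the particular forms of $M$ , $C$ , $F$ and $G$ , which are not given in this section.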

Substituting the robotic manipulator's dynamics model (4) and the tracking errors $e_{1 {k}}$ , $e_{2 {k}}$ (5) into $s_{{k + 1}}$ calculated on the basis of (6), the filtered tracking error was defined in the form

$\begin{matrix} s_{{k + 1}} & = & z_{2 {k + 1}} - z_{d 2 {k + 1}} + Λ (z_{1 {k + 1}} - z_{d 1 {k + 1}}) = \\ & = & Y_{d} (z_{{k}}, z_{d {k}}, z_{d 3 {k}}) - Y_{f} (z_{1 {k}}, z_{2 {k}}) + \\ & & - Y_{τ} (z_{1 {k}}) + h {[M (z_{1 {k}})]}^{- 1} u_{{k}}, \end{matrix}$ (7)

where

$\begin{array}{rcl} Y_{d} (z_{{k}}, z_{d {k}}, z_{d 3 {k}}) & = & z_{2 {k}} - z_{d 2 {k + 1}} + Λ h z_{2 {k}} + Λ [z_{1 {k}} - z_{d 1 {k + 1}}] = \\ & = & z_{2 {k}} - (z_{d 2 {k}} + h z_{d 3 {k}}) + Λ h z_{2 {k}} + Λ [z_{1 {k}} - (z_{d 1 {k}} + h z_{d 2 {k}})] = \\ & = & s_{{k}} + Y_{e} (z_{2 {k}}, z_{d 2 {k}}, z_{d 3 {k}}), \\ Y_{e} (z_{2 {k}}, z_{d 2 {k}}, z_{d 3 {k}}) & = & h [Λ e_{2 {k}} - z_{d 3 {k}}], \\ Y_{f} (z_{1 {k}}, z_{2 {k}}) & = & h {[M (z_{1 {k}})]}^{- 1} F (z_{2 {k}}) + \\ & & + h {[M (z_{1 {k}})]}^{- 1} [C (z_{1 {k}}, z_{2 {k}}) z_{2 {k}} + G (z_{1 {k}})], \\ Y_{τ} (z_{1 {k}}) & = & h {[M (z_{1 {k}})]}^{- 1} τ_{d {k}}, \end{array}$ (8)

where $z_{d 1 {k + 1}}$ is the vector of the desired discrete angles in step $k + 1$ , $z_{d 2 {k + 1}}$ is the vector of the desired discrete angular velocities in step $k + 1$ , and $z_{d 3 {k}}$ is the vector of the desired angular accelerations that derives from the expansion of the vector $z_{d 2 {k + 1}}$ using the Euler's method.
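The decomposition (7) and (8) can be checked numerically. The sketch below is a scalar (one-joint) stand-in for the three-DOF model with made-up numbers, and the desired trajectory is advanced with the same Euler rule, as described above. It computes $s_{{k + 1}}$ once by stepping the model (4) and applying the definition (6), and once from the terms $Y_{d}$ , $Y_{f}$ and $Y_{τ}$ .

```python
# Scalar (one-joint) stand-in for the 3-DOF model; all numbers are illustrative.
h, lam = 0.01, 2.0
M = 1.5                         # inertia
n_k = 0.7                       # lumped C*z2 + F + G at step k
tau_d, u = 0.05, 0.3            # disturbance and control signal
z1, z2 = 0.2, -0.1              # current angle and angular velocity
zd1, zd2, zd3 = 0.25, 0.0, 0.4  # desired angle, velocity and acceleration

# Desired trajectory advanced with the same Euler rule.
zd1_next = zd1 + h * zd2
zd2_next = zd2 + h * zd3

# Route 1: step the model (4), then apply the definition (6) at k + 1.
z1_next = z1 + h * z2
z2_next = z2 - h / M * (n_k + tau_d - u)
s_next_direct = (z2_next - zd2_next) + lam * (z1_next - zd1_next)

# Route 2: the decomposition (7)-(8).
e1, e2 = z1 - zd1, z2 - zd2
s_k = e2 + lam * e1
Y_e = h * (lam * e2 - zd3)
Y_d = s_k + Y_e
Y_f = h / M * n_k
Y_tau = h / M * tau_d
s_next_decomposed = Y_d - Y_f - Y_tau + h / M * u

print(s_next_direct, s_next_decomposed)
```

Both routes agree up to rounding, which is what Equation (7) states.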

3. Approximate Dynamic Programming

ADP algorithms are also called neuro-dynamic programming (NDP) algorithms or adaptive critic designs (ACDs) [9, 10, 11, 12]. They derive from Bellman's DP, which is based on the calculation of the value function, the control law and the state of the object for every step of the process, from the last to the first. The calculation procedure is computationally expensive, especially for multi-step processes in which the object can reach many discrete states. Because the value function and the control signal have to be calculated backwards for every step, DP is not applicable to the online control of nonlinear dynamical systems. In ADP algorithms, NNs are applied to Bellman's approach to optimal control, so that the optimal control law and the value function are approximated by the actor and the critic, respectively. This makes the online control of dynamical systems feasible. The ADP algorithms family is schematically shown in Figure 2. It consists of six algorithms: three basic algorithms and three modified algorithms (action-dependent (AD) versions of the basic algorithms). The ADP algorithms differ from each other in terms of the critic's structure and the weights-adaptation rule of the actor's and the critic's NNs.

Figure 2.

The scheme of the approximate dynamic programming algorithms family

The basic structure is the heuristic dynamic programming (HDP) algorithm, in which the critic approximates the value function and the actor generates the sub-optimal control law. In the DHP algorithm, the critic approximates the derivative of the value function with respect to the state of the controlled system. The actor has the same structure as in HDP. The complexity of the critic grows proportionally to the size of the state vector, because the derivative of the value function with respect to the $n_{S}$ -dimensional state vector is approximated by $n_{S}$ critic's NNs, and the critic's weights-adaptation law is also more complex. The DHP algorithm ensures higher quality tracking control in comparison to HDP [23]. The globalized dual heuristic dynamic programming (GDHP) algorithm is built in the same way as HDP, and its characteristic feature is the critic's weights-adaptation law. It is based on the value function and its derivative with respect to the state, and can be seen as a combination of the HDP and the DHP critic's NN adaptation laws. The actor structure is the same as in HDP. The rest of the ADP algorithms are AD versions of the basic algorithms, where the control law generated by the actor's NN is also an input to the critic's NN.

In the proposed discrete control system the DHP algorithm was applied, which belongs to the group of advanced ADP algorithms. The objective of the DHP algorithm is to determine the sub-optimal control law that minimizes the value function $V (s_{{k}})$ [9, 10, 11, 12] assumed in the form of the equation

$V (s_{{k}}) = \sum_{k = 0}^{N} γ^{k} L_{C} (s_{{k}}),$ (9)

where γ is a discount factor, $0 < γ \leq 1$ , N is a number of iteration steps, and $L_{C} (s_{{k}})$ is the local cost function for the k -th step, assumed in the form

$L_{C} (s_{{k}}) = \frac{1}{2} s_{{k}}^{T} R s_{{k}},$ (10)

where $R \in ℜ^{3 x 3}$ is a positive definite, fixed diagonal matrix.
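As an illustration of Equations (9) and (10), the sketch below evaluates the local cost and the discounted value function for a short sequence of filtered tracking errors. $R$ is passed as its diagonal, and the trajectory, the weights and the discount factor are made-up values, not the article's settings.

```python
from typing import Sequence


def local_cost(s: Sequence[float], r_diag: Sequence[float]) -> float:
    """Equation (10) for a diagonal R: L_C = 0.5 * s^T R s."""
    return 0.5 * sum(r * x * x for r, x in zip(r_diag, s))


def value_function(
    s_traj: Sequence[Sequence[float]],
    r_diag: Sequence[float],
    gamma: float,
) -> float:
    """Equation (9): discounted sum of local costs along a sequence of s_k."""
    return sum(gamma ** k * local_cost(s, r_diag) for k, s in enumerate(s_traj))


# Two-step illustrative trajectory of the filtered tracking error.
s_traj = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0]]
V = value_function(s_traj, r_diag=[1.0, 1.0, 1.0], gamma=0.5)
print(V)
```

With these numbers the local costs are 0.5 and 2.0, and the second is weighted by $γ = 0.5$ .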

Figure 3.

a) Scheme of the DHP algorithm, b) scheme of the j -th actor's RVFL NN

The DHP algorithm is schematically shown in Figure 3.a). It consists of:

The predictive model of the robotic manipulator's closed-loop state $s_{{k + 1}}$ , assumed in the form

$\begin{array}{rcl} s_{{k + 1}} & = & Y_{d} (z_{{k}}, z_{d {k}}, z_{d 3 {k}}) - Y_{f} (z_{1 {k}}, z_{2 {k}}) + \\ & & + h {[M (z_{1 {k}})]}^{- 1} u_{{k}}, \end{array}$ (11)

where $u_{{k}}$ is the overall tracking control signal, the structure of which derives from the stability analysis presented in the next section. In the DHP algorithm, the controlled object's dynamical model is necessary in the synthesis of both the actor's and the critic's NN's weights-adaptation law.

The critic, realized in the form of three RVFL NNs, which approximates the derivative of the value function (9) with respect to the state of the controlled system, expressed by the formula

$λ_{{k}} = \left[ \begin{array}{c} \frac{\partial V (s_{{k}})}{\partial s_{[1] {k}}} \\ \frac{\partial V (s_{{k}})}{\partial s_{[2] {k}}} \\ \frac{\partial V (s_{{k}})}{\partial s_{[3] {k}}} \end{array} \right] .$ (12)

In the case of the three-DOF robotic manipulator control, where the state vector is $n_{S} = 3$ dimensional, the derivative of the value function with respect to the state is approximated using three NNs, assumed in the form

${\hat{λ}}_{[j] {k, l}} (x_{C j {k}}, W_{C j {k, l}}) = W_{C j {k, l}}^{T} S (D_{C}^{T} x_{C j {k}}),$ (13)

where l is the index of the internal loop iteration, $x_{C j {k}} = Θ_{C} {[1, s_{[j] {k}}]}^{T}$ is the input vector of the j -th critic's NN containing the normalized value of the filtered tracking error $s_{[j] {k}}$ , $x_{C j [i] {k}} \in [- 1, 1]$ , $j = 1, 2, 3$ , $Θ_{C} \in ℜ^{i_{C} x i_{C}}$ is the constant diagonal matrix of the scaling coefficients, $i = 1, \dots, i_{C}$ , $i_{C}$ is the number of inputs to the critic's NN, $W_{C j {k, l}} \in ℜ^{n_{C} x 1}$ is the vector of the output layer weights of the j -th critic's NN, $n_{C}$ is the number of neurons, $S (.) \in ℜ^{n_{C} x 1}$ is the vector of sigmoidal bipolar neuron activation functions, and $D_{C} \in ℜ^{i_{C} x n_{C}}$ is the matrix of the fixed input weights selected randomly in the critic's NNs' initialization process.
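The structure of Equation (13) can be sketched as a small class. Several details are assumptions made for illustration: the bipolar sigmoid is taken as $2 / (1 + e^{- x}) - 1 = \tanh (x / 2)$ with unit slope, the fixed input weights are drawn uniformly from $[- 1, 1]$ , and the output weights start at zero. The class name, the number of neurons and the scaling coefficients are hypothetical.

```python
import math
import random
from typing import List, Sequence


def bipolar_sigmoid(x: float) -> float:
    """Bipolar sigmoid 2 / (1 + exp(-x)) - 1, written as tanh(x / 2)."""
    return math.tanh(x / 2.0)


class RVFLNet:
    """Single-output RVFL network: y = W^T S(D^T x), with D fixed and W adapted."""

    def __init__(self, n_inputs: int, n_neurons: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        # D (n_inputs x n_neurons): random input weights, fixed after initialization.
        self.D: List[List[float]] = [
            [rng.uniform(-1.0, 1.0) for _ in range(n_neurons)]
            for _ in range(n_inputs)
        ]
        # W (n_neurons): output-layer weights; zero initial values are an assumption.
        self.W: List[float] = [0.0] * n_neurons

    def hidden(self, x: Sequence[float]) -> List[float]:
        """Vector S(D^T x) of neuron activations."""
        return [
            bipolar_sigmoid(sum(self.D[i][n] * x[i] for i in range(len(x))))
            for n in range(len(self.W))
        ]

    def output(self, x: Sequence[float]) -> float:
        """Network output W^T S(D^T x)."""
        return sum(w * a for w, a in zip(self.W, self.hidden(x)))


# Critic input for joint j: x = Theta_C [1, s_j]^T with illustrative scaling.
theta_c = [1.0, 0.5]
s_j = 0.6
x_c = [theta_c[0] * 1.0, theta_c[1] * s_j]
critic_j = RVFLNet(n_inputs=2, n_neurons=8)
print(critic_j.output(x_c))
```

Because only the output-layer weights are adapted, the network output is linear in the adapted parameters.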

The critic's weights-adaptation procedure in the DHP algorithm is based on the minimization of the derivative of the temporal difference error with respect to the state [9, 12], expressed by the formula

$\begin{array}{l} e_{C {k, l}} = \frac{\partial L_{C} (s_{{k}})}{\partial s_{{k}}} + {[\frac{\partial u_{{k}}}{\partial s_{{k}}}]}^{T} \frac{\partial L_{C} (s_{{k}})}{\partial u_{{k}}} + \\ + γ {[\frac{\partial s_{{k + 1}}}{\partial s_{{k}}} + {[\frac{\partial u_{{k}}}{\partial s_{{k}}}]}^{T} \frac{\partial s_{{k + 1}}}{\partial u_{{k}}}]}^{T} {\hat{λ}}_{{k + 1, l}} (x_{C {k + 1}}, W_{C {k, l}}) \\ - {\hat{λ}}_{{k, l}} (x_{C {k}}, W_{C {k, l}}), \end{array}$ (14)

using the gradient method according to the equation

$\begin{array}{rcl} W_{C j {k, l + 1}} & = & W_{C j {k, l}} - e_{C [j] {k, l}} Γ_{C j} S (D_{C}^{T} x_{C j {k}}) + \\ & & - k_{C} | e_{C [j] {k, l}} | Γ_{C j} W_{C j {k, l}}, \end{array}$ (15)

where ${\hat{λ}}_{{k + 1, l}} (x_{C {k + 1}}, W_{C {k, l}})$ is the output of the critic's NNs generated on the basis of the predicted state for the step $k + 1$ , $Γ_{C j} \in ℜ^{n_{C} x n_{C}}$ is the fixed diagonal matrix of positive learning rates of the j -th critic's NN, and k_C is a positive constant. The last term in Equation (15) is a weights regularization mechanism [13] that prevents the over-fitting of the NNs.
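The update (15) can be written as a short function. In this sketch the diagonal learning-rate matrix $Γ_{C j}$ is passed as a list of its diagonal entries, the activation vector $S (D_{C}^{T} x_{C j {k}})$ and the error $e_{C [j] {k, l}}$ are taken as given inputs, and all numbers are illustrative.

```python
from typing import List, Sequence


def update_weights(
    W: Sequence[float],
    S: Sequence[float],
    e: float,
    gamma_diag: Sequence[float],
    k_reg: float,
) -> List[float]:
    """One step of Equation (15): W <- W - e*Gamma*S - k*|e|*Gamma*W."""
    return [
        w - e * g * s - k_reg * abs(e) * g * w
        for w, s, g in zip(W, S, gamma_diag)
    ]


# Illustrative values: two neurons, equal learning rates.
W_new = update_weights(
    W=[1.0, -2.0], S=[0.5, 0.5], e=0.2, gamma_diag=[0.1, 0.1], k_reg=0.1
)
print(W_new)
```

The second term shrinks each weight towards zero in proportion to $| e_{C [j] {k, l}} |$ , so the weights stay unchanged when the error is zero. The actor's update (18) has the same form, so the same function could be reused with the actor's quantities.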

The actor is realized in the form of three RVFL NNs that generate the sub-optimal control law $u_{A {k, l}} = {[u_{A [1] {k, l}}, u_{A [2] {k, l}}, u_{A [3] {k, l}}]}^{T}$ and is expressed by the formula

$u_{A [j] {k, l}} (x_{A j {k}}, W_{A j {k, l}}) = W_{A j {k, l}}^{T} S (D_{A}^{T} x_{A j {k}}),$ (16)

where $x_{A j {k}} = Θ_{A} {[1, s_{[j] {k}}, z_{1 {k}}^{T}, z_{2 {k}}^{T}, z_{d 1 {k}}^{T}, z_{d 2 {k}}^{T}, e_{1 [j] {k}}, e_{2 [j] {k}}]}^{T}$ is the input vector of the j -th actor's NN consisting of the normalized values of the filtered tracking error $s_{{k}}$ , the errors $e_{1 {k}}$ and $e_{2 {k}}$ , the desired ( $z_{d 1 {k}}$ , $z_{d 2 {k}}$ ) and the performed ( $z_{1 {k}}$ , $z_{2 {k}}$ ) joint angles and angular velocities, $x_{A j [i] {k}} \in [- 1, 1]$ , $j = 1,2,3$ , $Θ_{A} \in ℜ^{i_{A} x i_{A}}$ is the constant diagonal matrix of the scaling coefficients, $i = 1, \dots, i_{A}$ , $i_{A}$ is the number of inputs to the actor's NN, $W_{A j {k, l}} \in ℜ^{n_{A} x 1}$ is the vector of the output layer weights of the j -th actor's NN, $n_{A}$ is the number of neurons, and $D_{A} \in ℜ^{i_{A} x n_{A}}$ is the matrix of the fixed input weights selected randomly in the NNs' initialization process. The actor's NN's weights are adapted by the minimization of the quality rating assumed in the form

$\begin{array}{rcl} e_{A {k, l}} & = & \frac{\partial L_{C} (s_{{k}})}{\partial u_{{k}}} + \\ & & + γ {[\frac{\partial s_{{k + 1}}}{\partial u_{{k}}}]}^{T} {\hat{λ}}_{{k + 1, l}} (x_{C {k + 1}}, W_{C {k, l}}), \end{array}$ (17)

using the gradient method according to the equation

$\begin{array}{rcl} W_{A j {k, l + 1}} & = & W_{A j {k, l}} - e_{A [j] {k, l}} Γ_{A j} S (D_{A}^{T} x_{A j {k}}) + \\ & & - k_{A} | e_{A [j] {k, l}} | Γ_{A j} W_{A j {k, l}}, \end{array}$ (18)

where $Γ_{A j} \in ℜ^{n_{A} x n_{A}}$ is the fixed diagonal matrix of the positive learning rates of the j -th actor's NN and k_A is a positive constant. The scheme of the j -th actor's RVFL NN is shown in Figure 3.b).
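The input vector of the j -th actor's NN defined for Equation (16) can be assembled as in the sketch below. It assumes a zero-based joint index and passes the diagonal of $Θ_{A}$ as a list; the function name and the scaling values are hypothetical. For the three-DOF case the listed signals give 16 inputs.

```python
from typing import List, Sequence


def actor_input(
    j: int,
    s: Sequence[float],
    z1: Sequence[float],
    z2: Sequence[float],
    zd1: Sequence[float],
    zd2: Sequence[float],
    e1: Sequence[float],
    e2: Sequence[float],
    theta_diag: Sequence[float],
) -> List[float]:
    """Scaled input x_Aj = Theta_A [1, s_j, z1, z2, zd1, zd2, e1_j, e2_j]^T.

    j is a zero-based joint index (an implementation choice of this sketch).
    """
    raw = [1.0, s[j], *z1, *z2, *zd1, *zd2, e1[j], e2[j]]
    if len(theta_diag) != len(raw):
        raise ValueError("theta_diag must have one scaling coefficient per input")
    return [t * v for t, v in zip(theta_diag, raw)]


# Illustrative signals for a 3-DOF arm; unit scaling is used for simplicity.
zeros = [0.0, 0.0, 0.0]
x_a = actor_input(
    j=0,
    s=[0.2, 0.0, 0.0],
    z1=[0.1, 0.2, 0.3], z2=zeros,
    zd1=[0.0, 0.2, 0.3], zd2=zeros,
    e1=[0.1, 0.0, 0.0], e2=zeros,
    theta_diag=[1.0] * 16,
)
print(len(x_a), x_a)
```

In practice the scaling coefficients would be chosen so that every input stays within $[- 1, 1]$ ; the sketch only multiplies by the given coefficients and does not enforce that range.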

The DHP algorithm, schematically shown in Figure 3.a), consists of an actor, a critic and a predictive model. The inputs to the algorithm are the filtered tracking error $s_{{k}}$ , the controlled object's state parameters (not shown in the scheme) and the remaining control signals ( $u_{P D {k}}$ , $u_{S {k}}$ and $u_{E {k}}$ ) discussed in the next section. The actor generates the control signal $u_{A {k, l}}$ in the k -th step. The overall control signal $u_{{k}}$ is formed from the actor's control signal and the remaining control signals. It is an input to the predictive model which, on the basis of the actual state of the object, generates the prediction of the filtered tracking error $s_{{k + 1}}$ . This signal is the input to the critic's NNs, which generate the prediction of the derivative of the value function with respect to the state. The signal ${\hat{λ}}_{{k + 1, l}}$ , taking into account the dynamics model of the controlled object, is used to generate the error $e_{A {k, l}}$ according to Equation (17). Next, the signal $e_{A {k, l}}$ is used to update the actor's NNs' weights according to Equation (18). The critic's NNs are also used to generate the output ${\hat{λ}}_{{k, l}}$ in the k -th step. This signal, together with others, is used to generate the error $e_{C {k, l}}$ according to Equation (14). Next, the error $e_{C {k, l}}$ is used to update the critic's NNs' weights according to Equation (15).

The adaptation process of the NNs' weights is an interesting feature of ADP algorithms. It is realized in the form of an internal loop with the iteration index l [9]. In every step k of the discrete control process, computations connected to the actor's and the critic's weights-adaptation procedure are executed according to the scheme shown in Figure 4.

Figure 4.

Schematic conception of the ADP structure adaptation process

The ADP structure adaptation process is organized in the following way. At the beginning of every k -th iteration step, $l = 0$ . The actor's NNs' weights are adapted according to the assumed adaptation law (18) by the minimization of the error (17). This part of the algorithm is called the “control law improvement routine” [9]. It yields the actor's NNs' weights $W_{A j {k, l + 1}}$ . The next step is the adaptation of the critic's NNs' weights, called the “value function determination operation”. The critic's NNs' weights are adapted according to the assumed adaptation law (15) by the minimization of the error (14). This yields the critic's NNs' weights $W_{C j {k, l + 1}}$ . Next, the internal loop iteration index l is increased and a new cycle of the ADP algorithm adaptation begins. In the presented algorithm, the internal loop terminates when the number of internal iterations $l \geq l_{m x}$ , where $l_{m x}$ is the maximal number of iteration cycles, or when the error $e_{A [j] {k, l}}$ is smaller than the assumed positive limit $E_{A [j]}$ , $e_{A [j] {k, l}} < E_{A [j]}$ , $j = 1,2,3$ . When one of these conditions is satisfied, $W_{A j {k, l + 1}}$ becomes $W_{A j {k + 1, l}}$ and $W_{C j {k, l + 1}}$ becomes $W_{C j {k + 1, l}}$ . Next, the index k is increased. The actor's NNs generate control signals and the DHP structure receives information about the new state of the controlled object. In the following sections, the index l is omitted for the sake of simplicity.
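The internal loop described above can be sketched for a single joint as follows. This is a schematic outline under stated assumptions, not the authors' implementation: the error signals and update rules are passed in as hypothetical callbacks, the stopping test uses the absolute value of the actor error, and the critic error is evaluated with the already updated actor weights. The toy callbacks at the bottom are stand-ins chosen only so that the loop can be run.

```python
from typing import Callable, List, Tuple

Weights = List[float]


def adapt_weights_in_step_k(
    W_A: Weights,
    W_C: Weights,
    actor_error: Callable[[Weights, Weights], float],
    critic_error: Callable[[Weights, Weights], float],
    update_actor: Callable[[Weights, float], Weights],
    update_critic: Callable[[Weights, float], Weights],
    l_max: int,
    E_A: float,
) -> Tuple[Weights, Weights, int]:
    """Internal loop over l for one joint within a single control step k."""
    l = 0
    while True:
        e_A = actor_error(W_A, W_C)      # role of Equation (17)
        W_A = update_actor(W_A, e_A)     # role of Equation (18)
        e_C = critic_error(W_A, W_C)     # role of Equation (14)
        W_C = update_critic(W_C, e_C)    # role of Equation (15)
        l += 1
        # Stop on the iteration cap or when the actor error is small enough.
        if l >= l_max or abs(e_A) < E_A:
            return W_A, W_C, l


# Toy stand-ins (not the article's error signals): the actor weight is
# driven towards 1.0 and the critic weight towards the actor weight.
W_A, W_C, l_used = adapt_weights_in_step_k(
    W_A=[0.0],
    W_C=[0.0],
    actor_error=lambda wa, wc: wa[0] - 1.0,
    critic_error=lambda wa, wc: wc[0] - wa[0],
    update_actor=lambda w, e: [w[0] - 0.5 * e],
    update_critic=lambda w, e: [w[0] - 0.5 * e],
    l_max=50,
    E_A=1e-3,
)
print(l_used, W_A, W_C)
```

The returned weights play the role of the initial weights for step $k + 1$ .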

4. Neural Tracking Control

The discrete neural tracking control system consists of the ADP algorithm in the DHP configuration ( $u_{A {k}}$ ), the PD controller ( $u_{P D {k}}$ ), the supervisory term ( $u_{S {k}}$ ) and the additional control signal ( $u_{E {k}}$ ). The structure of the supervisory term derives from the stability analysis performed using the Lyapunov stability theorem. The additional control signal $u_{E {k}}$ derives from the process of the robotic manipulator dynamics model discretization. The overall tracking control signal was assumed in the form

$u_{{k}} = - h^{- 1} M (z_{1 {k}}) [u_{A {k}} + u_{P D {k}} + u_{E {k}} - u_{S {k}}],$ (19)

where

$\begin{array}{rcl} u_{P D {k}} & = & K_{D} s_{{k}}, \\ u_{E {k}} & = & h [Λ e_{2 {k}} - z_{d 3 {k}}], \\ u_{S {k}} & = & I_{S} u_{S {k}}^{*}, \end{array}$ (20)

where $K_{D} \in ℜ^{3 x 3}$ is a fixed diagonal matrix of positive PD controller gains, $I_{S} \in ℜ^{3 x 3}$ is a diagonal matrix with the elements $I_{S [j, j]} = 1$ if $| s_{[j] {k}} | > ρ_{[j]}$ and $I_{S [j, j]} = 0$ otherwise, $j = 1,2,3$ , and $ρ_{[j]}$ is a positive constant.
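The control law (19) with the components (20) can be assembled as in the following sketch. The actor output $u_{A {k}}$ and the supervisory signal $u_{S {k}}^{*}$ are treated as given inputs, because their computation is described elsewhere in the article; diagonal matrices are passed as lists of their diagonal entries, and all numerical values are illustrative.

```python
from typing import List, Sequence


def control_signal(
    M: Sequence[Sequence[float]],
    u_A: Sequence[float],
    s: Sequence[float],
    e2: Sequence[float],
    zd3: Sequence[float],
    u_S_star: Sequence[float],
    K_D: Sequence[float],
    lam: Sequence[float],
    rho: Sequence[float],
    h: float,
) -> List[float]:
    """Equation (19): u_k = -h^{-1} M [u_A + u_PD + u_E - u_S]."""
    u_PD = [k * x for k, x in zip(K_D, s)]
    u_E = [h * (l_jj * e - a) for l_jj, e, a in zip(lam, e2, zd3)]
    # I_S switches the supervisory term on only outside the boundary layer.
    u_S = [us if abs(x) > r else 0.0 for us, x, r in zip(u_S_star, s, rho)]
    inner = [a + p + e - sv for a, p, e, sv in zip(u_A, u_PD, u_E, u_S)]
    n = len(inner)
    return [-(1.0 / h) * sum(M[i][j] * inner[j] for j in range(n)) for i in range(n)]


# Illustrative case: identity inertia matrix, joint 1 inside the boundary layer.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
zeros = [0.0, 0.0, 0.0]
u_inside = control_signal(
    identity, u_A=zeros, s=[0.1, 0.0, 0.0], e2=zeros, zd3=zeros,
    u_S_star=[5.0, 5.0, 5.0], K_D=[1.0] * 3, lam=[2.0] * 3, rho=[0.5] * 3, h=0.01,
)
# Same case with a larger error: the supervisory term becomes active for joint 1.
u_outside = control_signal(
    identity, u_A=zeros, s=[1.0, 0.0, 0.0], e2=zeros, zd3=zeros,
    u_S_star=[5.0, 5.0, 5.0], K_D=[1.0] * 3, lam=[2.0] * 3, rho=[0.5] * 3, h=0.01,
)
print(u_inside, u_outside)
```

The two calls differ only in the size of $s_{[1] {k}}$ relative to $ρ_{[1]}$ , which shows the switching role of $I_{S}$ .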

The scheme of the tracking control system with the actor-critic structure in the DHP configuration is shown in Figure 5.

Figure 5.

Scheme of the tracking control system with the DHP algorithm

5. Stability Analysis

The stability analysis was performed using sliding mode control theory [1, 6, 8]. An ideal sliding mode exists only when the state trajectory of the controlled system agrees with the desired trajectory in every step k. This may require the fast switching of the control signal, which is disadvantageous in real systems because it leads to oscillations within the neighbourhood of the switching surface. These oscillations are called “chattering” and are undesirable, since they involve very high control activity and may excite the high-frequency dynamics of the object. Hence, chattering must be reduced for the controller to perform correctly. This is achieved by introducing a thin boundary layer neighbouring the switching surface

$B_{{k}} = \{ z_{{k}}, | s_{[j]} (z_{{k}}) | \leq ρ_{[j]} \},$ (21)

where $ρ_{[j]}$ is the positive, constant boundary layer thickness. Outside $B_{{k}}$ , the sliding control law guarantees boundary layer attraction. In the stability analysis, it was assumed that $I_{S [j, j]} = 1$ , which means that $| s_{[j] {k}} | > ρ_{[j]}$ . If $I_{S [j, j]} = 0$ , then $| s_{[j] {k}} | \leq ρ_{[j]}$ , and the state trajectory remains inside the boundary layer so that the tracking errors are bounded. The aim of the actor-critic structure is to approximate the controlled system's nonlinearities, while the PD controller guarantees stability and generates the control signal in cases when the compensation is not perfect. The additional control signal $u_{E {k}}$ has a structure that derives from the discretization of the continuous mathematical model of the robotic manipulator. It is based on reference or measurable signals, which are easily accessible to the control system, so it may be explicitly included in the control law. The supervisory term has a structure analogous to the sliding mode controller with a boundary layer neighbouring the switching surface, but it is inactive inside the boundary layer, where the remaining control elements generate the control law. This ensures the stability of the control system and prevents the occurrence of chattering.

Theorem: If the overall control signal of the dynamical system (7) is assumed in the form of Equation (19), the positive definite Lyapunov candidate function takes the form

$L_{{k}} = \frac{1}{2} s_{{k}}^{T} s_{{k}},$ (22)

and if the supervisory term's control signal $u_{S {k}}$ is selected appropriately, then the closed-loop system is stable and the difference of the Lyapunov function (22) is negative definite.

Proof: Substituting (19) into (7), the closed-loop system equation is expressed by the formula

$\begin{array}{l} s_{{k + 1}} = Y_{d} (z_{{k}}, z_{d {k}}, z_{d 3 {k}}) - Y_{f} (z_{1 {k}}, z_{2 {k}}) + \\ - Y_{τ} (z_{1 {k}}) - [u_{A {k}} + u_{P D {k}} + u_{E {k}} - u_{S {k}}] . \end{array}$ (23)

The difference of the Lyapunov candidate function (22)

$Δ L_{{k}} = s_{{k + 1}}^{T} s_{{k + 1}} - s_{{k}}^{T} s_{{k}} < 0,$ (24)

may be written in the form

${‖ s_{{k + 1}} ‖}^{2} < {‖ s_{{k}} ‖}^{2} .$ (25)

Under assumption (25), one can write [30]

$‖ s_{{k}} ‖ ‖ s_{{k + 1}} ‖ < {‖ s_{{k}} ‖}^{2},$ (26)

from the Cauchy-Schwarz inequality it follows that

$| s_{{k}}^{T} s_{{k + 1}} | \leq ‖ s_{{k}} ‖ ‖ s_{{k + 1}} ‖,$ (27)

and so the inequality (26) is equivalent to

$| s_{{k}}^{T} s_{{k + 1}} | < {‖ s_{{k}} ‖}^{2} .$ (28)

The inequality (28) is equivalent to

$s_{{k}}^{T} [s_{{k + 1}} - s_{{k}}] < 0,$ (29)

and as a result

$Δ L_{{k}} = s_{{k}}^{T} [s_{{k + 1}} - s_{{k}}] < 0.$ (30)

Substituting (23) into (30), $Δ L_{{k}}$ takes the form

$\begin{array}{l} Δ L_{{k}} = - s_{{k}}^{T} K_{D} s_{{k}} + s_{{k}}^{T} u_{S {k}}^{*} + \\ + s_{{k}}^{T} [- Y_{f} (z_{1 {k}}, z_{2 {k}}) - u_{A {k}} - Y_{τ} (z_{1 {k}})] . \end{array}$ (31)

On the assumption that all the elements of the vector of the disturbances are bounded, $Y_{τ [j]} (z_{1 {k}}) < b_{d [j]}$ , where $b_{d [j]}$ is a positive constant boundary of disturbances, the difference of the Lyapunov candidate function takes the form

$\begin{array}{l} Δ L_{{k}} \leq - s_{{k}}^{T} K_{D} s_{{k}} + \\ + \sum_{j = 1}^{3} | s_{[j] {k}} | [| Y_{f [j]} (z_{1 {k}}, z_{2 {k}}) | + | u_{A [j] {k}} | + b_{d [j]}] + \\ + \sum_{j = 1}^{3} s_{[j] {k}} u_{S [j] {k}}^{*} . \end{array}$ (32)

The supervisory term's control signal was assumed in the form

$u_{S [j] {k}}^{*} = - sgn (s_{[j] {k}}) [F_{[j]} + | u_{A [j] {k}} | + b_{d [j]} + σ_{[j]}],$ (33)

where $| Y_{f [j]} (z_{1 {k}}, z_{2 {k}}) | \leq F_{[j]}$ , $F_{[j]}$ is a positive constant, and $σ_{[j]}$ is a small positive constant. Substituting (33) into (32) gives

$Δ L_{{k}} \leq 0.$ (34)
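The step from (32) to (34) can be spelled out as follows. This is only a sketch: it uses (32), (33) and the bound $| Y_{f [j]} (z_{1 {k}}, z_{2 {k}}) | \leq F_{[j]}$ , and it additionally assumes that $K_{D}$ is positive definite (which holds for a diagonal gain matrix with positive entries).

$\begin{array}{l} s_{[j] {k}} u_{S [j] {k}}^{*} = - | s_{[j] {k}} | [F_{[j]} + | u_{A [j] {k}} | + b_{d [j]} + σ_{[j]}], \\ Δ L_{{k}} \leq - s_{{k}}^{T} K_{D} s_{{k}} + \sum_{j = 1}^{3} | s_{[j] {k}} | [| Y_{f [j]} (z_{1 {k}}, z_{2 {k}}) | - F_{[j]} - σ_{[j]}] \leq \\ \leq - s_{{k}}^{T} K_{D} s_{{k}} - \sum_{j = 1}^{3} σ_{[j]} | s_{[j] {k}} | \leq 0 . \end{array}$

The terms $| u_{A [j] {k}} |$ and $b_{d [j]}$ cancel between the two sums, and the remaining bracket is bounded by $- σ_{[j]}$ .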

The difference of the Lyapunov function is negative definite. The designed control signal guarantees the reduction of the filtered tracking error $s_{{k}}$ when $| s_{[j] {k}} | > ρ_{[j]}$ . In the case of the initial condition $| s_{[j] {k = 0}} | \leq ρ_{[j]}$ , the filtered tracking error remains bounded by $| s_{[j] {k}} | \leq ρ_{[j]}$ , $\forall k > 0$ , $j = 1, 2, 3$ .
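As an illustration, the supervisory term with its boundary layer could be sketched as below. This is a minimal sketch and not the authors' implementation: it assumes that the applied supervisory signal is $u_{S [j]}^{*}$ from (33) when $| s_{[j]} | > ρ_{[j]}$ (that is, $I_{S [j, j]} = 1$ ) and zero otherwise. The function names and the disturbance bound `b_d` used in the usage lines are hypothetical; the text does not give numerical values for $b_{d [j]}$ .

```python
from typing import List, Sequence


def sgn(x: float) -> float:
    """Signum function: -1.0, 0.0 or 1.0."""
    if x > 0:
        return 1.0
    if x < 0:
        return -1.0
    return 0.0


def supervisory_term(s: Sequence[float], u_a: Sequence[float],
                     F: Sequence[float], b_d: Sequence[float],
                     sigma: Sequence[float], rho: Sequence[float]) -> List[float]:
    """Per-joint supervisory signal; zero inside the boundary layer (assumption)."""
    u_s = []
    for j in range(len(s)):
        # u_S* from Eq. (33)
        u_star = -sgn(s[j]) * (F[j] + abs(u_a[j]) + b_d[j] + sigma[j])
        # I_S[j,j] = 1 only outside the boundary layer |s_j| > rho_j
        outside_layer = abs(s[j]) > rho[j]
        u_s.append(u_star if outside_layer else 0.0)
    return u_s


# Usage with the supervisory parameters quoted in the text and a hypothetical b_d.
u_s = supervisory_term(s=[0.3, -0.25, 0.1], u_a=[0.1, -0.2, 0.0],
                       F=[0.25, 0.25, 0.35], b_d=[0.05, 0.05, 0.05],
                       sigma=[0.01, 0.01, 0.01], rho=[0.2, 0.2, 0.2])
```

With these inputs only joints 1 and 2 lie outside the boundary layer, so only their supervisory signals are non-zero.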

6. Research Results

The proposed discrete tracking control system of the robotic manipulator was tested in a series of numerical tests and verification experiments performed using the laboratory stand schematically shown in Figure 6.

Figure 6.

Scheme of the laboratory stand

The laboratory stand is composed of the Scorbot-ER 4pc robotic manipulator, a power supply and a PC equipped with the dSpace DS1102 digital signal processing board and software: MATLAB/Simulink and dSpace Control Desk. The Scorbot-ER 4pc is equipped with 12 [V] DC servo motors with gears and optical encoders. The basic dimensions of the Scorbot-ER 4pc robotic manipulator are: $l_{1} = 0.35$ [m], $l_{e} = 0.026$ [m], $l_{2} = 0.22$ [m], $l_{3} = 0.22$ [m]. The length of the links and the degree of rotation of the joints determine the robot's work envelope, where the maximum operating radius of the gripper is equal to $l_{C m a x} = 0.61$ [m]. The robotic manipulator weighs $m_{r} = 11.5$ [kg]. The maximum path velocity of the manipulator's end-effector is equal to $v_{C} = 0.6$ [m/s].

The matrices $M (q)$ , $C (q, \dot{q})$ , and the vectors $F (\dot{q})$ and $G (q)$ in Equation (1), take the form

$\begin{array}{l} M (q) = [\begin{matrix} M_{[1,1]} & 0 & 0 \\ 0 & p_{[6]} & M_{[2,3]} \\ 0 & M_{[3,2]} & p_{[7]} \end{matrix}], \\ C (q, \dot{q}) = [\begin{matrix} - b {\dot{q}}_{[2]} - c {\dot{q}}_{[3]} & - b {\dot{q}}_{[1]} & - c {\dot{q}}_{[1]} \\ b {\dot{q}}_{[1]} & 0 & C_{[2,3]} \\ c {\dot{q}}_{[1]} & C_{[3,2]} & 0 \end{matrix}], \\ F (\dot{q}) = [\begin{matrix} p_{[8]} {\dot{q}}_{[1]} + p_{[11]} sgn ({\dot{q}}_{[1]}) \\ p_{[9]} {\dot{q}}_{[2]} + p_{[12]} sgn ({\dot{q}}_{[2]}) \\ p_{[10]} {\dot{q}}_{[3]} + p_{[13]} sgn ({\dot{q}}_{[3]}) \end{matrix}], \\ G (q) = [\begin{matrix} 0 \\ p_{[1]} g \cos (q_{[2]}) \\ p_{[2]} g \cos (q_{[3]}) \end{matrix}], \end{array}$ (35)

where

$\begin{array}{l} M_{[1,1]} = 2 p_{[2]} (l_{e} + l_{2} \cos (q_{[2]})) \cos (q_{[3]}) + \\ + 2 p_{[1]} l_{e} \cos (q_{[2]}) + \frac{1}{2} p_{[3]} \cos (2 q_{[2]}) + \\ + \frac{1}{2} p_{[4]} \cos (2 q_{[3]}) + p_{[5]}, \\ M_{[2,3]} = M_{[3,2]} = p_{[2]} l_{2} \cos (q_{[3]} - q_{[2]}), \\ b = p_{[1]} l_{e} \sin (q_{[2]}) + p_{[2]} l_{2} \sin (q_{[2]}) \cos (q_{[3]}) \\ + \frac{1}{2} p_{[3]} \sin (2 q_{[2]}), \\ c = p_{[2]} (l_{e} + l_{2} \cos (q_{[2]})) \sin (q_{[3]}) + \frac{1}{2} p_{[4]} \sin (2 q_{[3]}), \\ C_{[2,3]} = - p_{[2]} l_{2} \sin (q_{[3]} - q_{[2]}) {\dot{q}}_{[3]}, \\ C_{[3,2]} = p_{[2]} l_{2} \sin (q_{[3]} - q_{[2]}) {\dot{q}}_{[2]}, \end{array}$ (36)

where $p = {[p_{[1]}, \dots, p_{[13]}]}^{T}$ is the vector of the robotic manipulator's parameters that results from its mass distribution, mass geometry and resistances to motion [13]. The nominal values of the parameters of the Scorbot-ER 4pc robotic manipulator were assumed as $p = {[0.0065, 0.0018, 0.0113, 0.0064, 0.0114, 0.0113, 0.0065, 0.5276, 0.5232, 0.5232, 0.0195, 0.0182, 0.0183]}^{T}$ .
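To make the model terms concrete, the following sketch evaluates $M (q)$ , $G (q)$ and $F (\dot{q})$ from Equations (35) and (36) with the nominal parameters. It is an illustrative transcription of the formulas, not the authors' code. The function names are hypothetical and the gravitational acceleration `GRAVITY = 9.81` is an assumed value that the text does not state; $C (q, \dot{q})$ follows the same pattern and is omitted for brevity.

```python
import math
from typing import List, Sequence

# Nominal parameter vector p (13 entries) and link dimensions quoted in the text.
P_NOMINAL = [0.0065, 0.0018, 0.0113, 0.0064, 0.0114, 0.0113, 0.0065,
             0.5276, 0.5232, 0.5232, 0.0195, 0.0182, 0.0183]
L_E = 0.026     # [m]
L_2 = 0.22      # [m]
GRAVITY = 9.81  # [m/s^2], assumed value (not given in the text)


def sgn(x: float) -> float:
    """Signum function: -1.0, 0.0 or 1.0."""
    return 1.0 if x > 0 else (-1.0 if x < 0 else 0.0)


def inertia_matrix(q: Sequence[float],
                   p: Sequence[float] = P_NOMINAL) -> List[List[float]]:
    """M(q) from Eqs. (35)-(36); q = [q1, q2, q3] in radians."""
    q2, q3 = q[1], q[2]
    p1, p2, p3, p4, p5, p6, p7 = p[:7]
    m11 = (2.0 * p2 * (L_E + L_2 * math.cos(q2)) * math.cos(q3)
           + 2.0 * p1 * L_E * math.cos(q2)
           + 0.5 * p3 * math.cos(2.0 * q2)
           + 0.5 * p4 * math.cos(2.0 * q3)
           + p5)
    m23 = p2 * L_2 * math.cos(q3 - q2)
    return [[m11, 0.0, 0.0],
            [0.0, p6, m23],
            [0.0, m23, p7]]


def gravity_vector(q: Sequence[float],
                   p: Sequence[float] = P_NOMINAL) -> List[float]:
    """G(q) from Eq. (35)."""
    return [0.0,
            p[0] * GRAVITY * math.cos(q[1]),
            p[1] * GRAVITY * math.cos(q[2])]


def friction_vector(q_dot: Sequence[float],
                    p: Sequence[float] = P_NOMINAL) -> List[float]:
    """F(q_dot) from Eq. (35): viscous (p8..p10) plus Coulomb (p11..p13) terms."""
    return [p[7 + j] * q_dot[j] + p[10 + j] * sgn(q_dot[j]) for j in range(3)]


M0 = inertia_matrix([0.0, 0.0, 0.0])
G0 = gravity_vector([0.0, 0.0, 0.0])
F0 = friction_vector([1.0, -1.0, 0.0])
```

The inertia matrix is symmetric by construction, since the same value is used for $M_{[2,3]}$ and $M_{[3,2]}$ .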

6.1. Simulation Results

The performance of the proposed discrete tracking control system was tested during a series of numerical simulations performed using the MATLAB/Simulink software environment. In the following part of the article, the notation for the variables is simplified by omitting the index $k$ . The same set of parameters was used during the numerical tests and the experiments. The time discretization parameter was equal to $h = 0.01$ [s]. The actor and the critic were built using NNs with eight neurons each. The initial output layer weights were set to zero, while the input layer weights were fixed and set randomly in the initialization process from the range $D_{A [i, n]} \in < - 0.2; 0.2 >$ , $i = 1, \dots, i_{A}$ , $n = 1, \dots, n_{A}$ , where $n_{A} = 8$ , and $D_{C [i, n]} \in < - 0.2; 0.2 >$ , $i = 1, \dots, i_{C}$ , $n = 1, \dots, n_{C}$ , where $n_{C} = 8$ . The parameters of the PD controller were assumed as $K_{D} = diag \{0.4, 0.7, 0.7\}$ , $Λ = diag \{1, 1, 1\}$ . In the presented article, the goal was not to demonstrate the maximal quality of the tracking control attainable by applying the highest feasible PD controller gains, but to illustrate the increase in the quality of the tracking control after adding to the control system a part that compensates for the nonlinearities of the controlled object. The results of applying MNNs in this role and the actor-critic structure in the DHP configuration are shown in the presented article. The matrix $R$ in the cost function was set to $R = diag \{1, 1, 1\}$ and the discount factor was equal to $γ = 0.5$ . The learning rates of the actor's NNs were equal to $Γ_{A 1 [i, i]} = 0.26$ , $Γ_{A 2 [i, i]} = 0.2$ , $Γ_{A 3 [i, i]} = 0.2$ and $k_{A} = 0.1$ . The learning rates of the critic's NNs were equal to $Γ_{C 1 [i, i]} = 2$ , $Γ_{C 2 [i, i]} = 1$ , $Γ_{C 3 [i, i]} = 1$ and $k_{C} = 0.1$ . The parameters of the supervisory term were set to $ρ_{[j]} = 0.2$ , $F = {[0.25,0.25,0.35]}^{T}$ and $σ_{[j]} = 0.01$ .
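The network initialization described above (eight neurons, fixed random input-layer weights from the range $< - 0.2; 0.2 >$ , zero initial output-layer weights) could be sketched as follows. This is a hedged illustration only: the class name, the number of inputs, the random seed and the `tanh` activation function are assumptions, since this section does not specify them, and the weights-adaptation law is not reproduced here.

```python
import math
import random
from typing import List, Sequence


class RVFLNetwork:
    """Single-output network with fixed random input weights D and adapted output weights W."""

    def __init__(self, n_inputs: int, n_neurons: int = 8,
                 weight_range: float = 0.2, seed: int = 0) -> None:
        rng = random.Random(seed)  # seeded only to make the sketch repeatable
        # Input-layer weights: drawn once from [-0.2, 0.2] and then kept fixed.
        self.D: List[List[float]] = [
            [rng.uniform(-weight_range, weight_range) for _ in range(n_neurons)]
            for _ in range(n_inputs)
        ]
        # Output-layer weights: zero initial values, adapted online afterwards.
        self.W: List[float] = [0.0] * n_neurons

    def hidden(self, x: Sequence[float]) -> List[float]:
        """Neuron activations S(D^T x); tanh is an assumed activation."""
        return [math.tanh(sum(self.D[i][n] * x[i] for i in range(len(x))))
                for n in range(len(self.W))]

    def output(self, x: Sequence[float]) -> float:
        """Network output W^T S(D^T x)."""
        return sum(w * h for w, h in zip(self.W, self.hidden(x)))


net = RVFLNetwork(n_inputs=4)
y0 = net.output([0.1, -0.2, 0.3, 0.0])
```

With zero output-layer weights the network output is zero for any input, so at the start of a run the control signal comes entirely from the other elements of the control law until the weights are adapted.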

The tracking control task was to perform the desired trajectory in the configuration space, the result of which would be the movement of the robotic manipulator's end-effector (point C) along the desired path in the operational space. To perform the movement, adequate control signals for the actuators are generated by the control system. The desired path of point C is shown in Figure 7.a), where the point $S (0.45,0)$ , marked by the triangle, is the starting point and simultaneously the end-point of one motion cycle, and the point $G (0.25,0)$ is the turning point. In Figure 7.a), point C is marked by “x” in its position at the beginning of trajectory realization (at point S). The trajectory of the end-effector was performed in the following way. Point C moves from point S to point G along the desired path, and then returns to the starting position; this is the first cycle of movement. In the second cycle, the end-effector is loaded with an additional mass $m_{L} = 1.5$ [kg] and moves to point G, where the additional load is removed. Next, the robotic manipulator's end-effector moves to the starting position. Reaching point S ends the second cycle of movement. The third cycle is identical to the second one: the load is moved from point S to point G and then unloaded, and the end-effector returns to the starting point. In the simulation, the parametric disturbances connected with displacing the load were reproduced, and their influence on the movement parameters, the values of the control signals and the NNs' weights is marked by rounded rectangles. During the occurrence of the parametric disturbance, the nominal set of parameters $p$ was changed to $p_{d} = {[0.01, 0.007, 0.02, 0.02, 0.02, 0.02, 0.02, 0.53, 0.53, 0.53, 0.025, 0.025, 0.025]}^{T}$ , and then restored to the set of nominal values.
The occurrence of the parametric disturbances is schematically shown in Figure 7.d): when $d_{p} = 0$ , the nominal values of the parameters $p$ are set, and when $d_{p} = 1$ ( $t \in < 21; 27 >$ [s] and $t \in < 33; 40 >$ [s]), the manipulator moves the load and, in the simulation, the $p_{d}$ set of parameters is used. The realization of the two final cycles of movement by the manipulator may be seen as performing a pick-and-place task, often met in industrial applications. The desired joint angles are shown in Figure 7.b) and the desired joint angular velocities are shown in Figure 7.c).
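The disturbance schedule can be expressed as a small switching function. The sketch below assumes that the intervals $< 21; 27 >$ and $< 33; 40 >$ are closed and that the parameter set switches instantaneously at their ends; the function names are hypothetical.

```python
from typing import List, Tuple

# Nominal and disturbed parameter sets quoted in the text.
P_NOMINAL = [0.0065, 0.0018, 0.0113, 0.0064, 0.0114, 0.0113, 0.0065,
             0.5276, 0.5232, 0.5232, 0.0195, 0.0182, 0.0183]
P_DISTURBED = [0.01, 0.007, 0.02, 0.02, 0.02, 0.02, 0.02,
               0.53, 0.53, 0.53, 0.025, 0.025, 0.025]

# Time intervals [s] in which the load is carried (d_p = 1).
LOAD_INTERVALS: List[Tuple[float, float]] = [(21.0, 27.0), (33.0, 40.0)]


def disturbance_flag(t: float) -> int:
    """Return d_p: 1 while the load is carried, 0 otherwise."""
    return 1 if any(start <= t <= end for start, end in LOAD_INTERVALS) else 0


def active_parameters(t: float) -> List[float]:
    """Return the parameter set used by the simulated manipulator at time t."""
    return P_DISTURBED if disturbance_flag(t) == 1 else P_NOMINAL
```

For example, `disturbance_flag(25.0)` evaluates to 1 and `disturbance_flag(30.0)` to 0, matching the two load-carrying phases of the second and third motion cycles.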

Figure 7.

a) The desired path of the manipulator's end-effector, b) the desired joint angles, $z_{d 1 [1]}$ , $z_{d 1 [2]}$ and $z_{d 1 [3]}$ , c) the desired joint angular velocities, $z_{d 2 [1]}$ , $z_{d 2 [2]}$ , $z_{d 2 [3]}$ , and d) the $d_{p}$ value (the occurrence of the disturbance)

The reference path of point C (blue line) and the performed path of point C (red line) in the operational space are presented in Figure 8.a). The performed trajectory is shown only for the first motion cycle in order to make the figure clearer and simplify the interpretation. The position of point C was marked for $t = 15$ [s], when the manipulator's end-effector stops at point S. The errors of the manipulator's end-effector coordinates in the operational space, $e_{x} = x_{d C} - x_{C}$ , $e_{y} = y_{d C} - y_{C}$ , $e_{z} = z_{d C} - z_{C}$ , are shown in Figure 8.b). It should be noted that the purpose of the control algorithm is not the minimization of the manipulator's end-effector coordinate errors in the operational space, $e_{x}$ , $e_{y}$ , $e_{z}$ , but rather the minimization of the tracking errors in the configuration space. In the simulation tests, the plot of the manipulator's end-effector coordinate errors in the operational space is shown in order to compare it with the results of the verification. Similar error values lead to the conclusion that the assumed mathematical model of the controlled object is correct.

Figure 8.

a) The desired and performed path of the manipulator's end-effector, point C, b) the errors of the manipulator's end-effector coordinates in the operational space, $e_{x}$ , $e_{y}$ , $e_{z}$

The desired trajectory of the manipulator's end-effector was performed by the implementation of the proposed control system, which generated the overall control signals $u$ shown in Figure 9.a). These control signals consist of the control signals generated by the actor's NN $u_{A}$ , shown in Figure 9.b), the PD control signals $u_{P D}$ (Figure 9.c), and the additional control signals $u_{E}$ (Figure 9.d). In the presented numerical test, $u_{S}$ equalled zero because the remaining elements of the control system provided the required tracking quality. At the beginning of the numerical test ( $t \in < 0; 3.5 >$ [s]), the desired position of point C is fixed, but the manipulator is loaded by gravitational forces. This results in the need to generate control signals that prevent the free-fall of the manipulator's arm. The proper values of the control signals are generated for the motor drives of links 2 and 3 mainly by the actor's NNs (the green line and the blue line in Figure 9.b). The remaining control signals are close to zero, so the control signals of the actor's NNs make up the main part of the overall control signals. The modification of the manipulator's parameter set, which simulates the disturbance caused by the transported load, results in a change in the control signals' values for links 2 and 3. The control signal $u_{[2]}$ took lower absolute values (the green line in Figure 9.a), and the control signal $u_{[3]}$ took higher values (the blue line) at the time of the occurrence of the disturbances. This was a result of the geometry of the manipulator and the realization of the desired trajectory. The change in the robotic manipulator's parameters is compensated for by a change in the actor-critic structure's control signal values $u_{A [2]}$ and $u_{A [3]}$ . The values of the remaining control signals do not change to a noticeable degree.

Figure 9.

a) The overall tracking control signals $u_{[1]}$ , $u_{[2]}$ and $u_{[3]}$ , b) the actor's NN's control signals $U_{A [1]}$ , $U_{A [2]}$ and $U_{A [3]}$ , $U_{A} = - h^{- 1} M (z_{1}) u_{A}$ , c) the PD control signals $U_{P D [1]}$ , $U_{P D [2]}$ and $U_{P D [3]}$ , $U_{P D} = - h^{- 1} M (z_{1}) u_{P D}$ , and d) the control signals $U_{E [1]}$ , $U_{E [2]}$ and $U_{E [3]}$ , $U_{E} = - h^{- 1} M (z_{1}) u_{E}$

The desired trajectory was performed with the tracking errors of the joint angles, shown in Figure 10.a), and the tracking errors of the joint angular velocities, shown in Figure 10.b). Non-zero values of the tracking errors occurred at the beginning of the experiment and were connected with the influence of gravitational forces on the robotic manipulator's links. The control signals generated by the actor-critic structure minimize the values of these errors. A noticeable increase in the tracking errors' values occurs at the time of the simulated disturbances, but it is reduced by the change in the actor NNs' control signals.

Figure 10.

a) Tracking errors of the joint angles, $e_{1 [1]}$ , $e_{1 [2]}$ and $e_{1 [3]}$ , and b) tracking errors of the joint angular velocities, $e_{2 [1]}$ , $e_{2 [2]}$ and $e_{2 [3]}$

The values of the DHP algorithm NNs' weights are shown in Figure 11 for all the actor's and critic's NNs. In the numerical test, zero initial weight-values were used. The weights-values of the NNs connected with generating the control signals for links 2 and 3 ( $W_{A 2}$ , $W_{C 2}$ , $W_{A 3}$ , $W_{C 3}$ ) are adapted from the beginning of the numerical test to generate control signals that compensate for the influence of the gravitational forces. At the time of the disturbances, the changes in the weights' values occurred as a result of the adaptation performed in order to reduce the tracking errors.

Figure 11.

a) Weights of the actor's one-RVFL NN $W_{A 1}$ , b) weights of the critic's one-RVFL NN $W_{C 1}$ , c) weights of the actor's two-RVFL NN $W_{A 2}$ , d) weights of the critic's two-RVFL NN $W_{C 2}$ , e) weights of the actor's three-RVFL NN $W_{A 3}$ , and f) weights of the critic's three-RVFL NN $W_{C 3}$

6.2. Verification Results

After a series of numerical tests, an experiment was performed using the Scorbot-ER 4pc robotic manipulator. The proposed discrete tracking control system with the DHP algorithm was compared to the neural tracking control system with MNNs, described in detail in [13], and to the PD controller. The control signals were computed in real-time in accordance with the assumed control law, using the dSpace DS1102 digital signal processing board. In the experiment, the same desired trajectory and the same parameters of the control system as in the numerical test were used. The values of the signals from the experiment were not filtered.

The reference path of point C (blue line) and the performed path of point C (red line) in the operational space are presented in Figure 12.a) for the control system with the DHP structure, and in Figure 12.c) for the neural control system with MNNs. The performed trajectory is shown only for the first motion cycle in order to make the figure clearer and to simplify the interpretation. The position of point C was marked for $t = 15$ [s], when the manipulator's end-effector stops at point S. The errors of the manipulator's end-effector coordinates in the operational space, $e_{x} = x_{d C} - x_{C}$ , $e_{y} = y_{d C} - y_{C}$ , $e_{z} = z_{d C} - z_{C}$ , are shown in Figure 12.b) and d) for the respective control systems. It should be noted that the purpose of the control algorithm is not the minimization of the manipulator's end-effector coordinate errors in the operational space, $e_{x}$ , $e_{y}$ , $e_{z}$ , but rather the minimization of the tracking errors in the configuration space. However, the comparison of the $e_{x}$ , $e_{y}$ , $e_{z}$ error values for different control algorithms provides additional information about the control quality. It is worth noting that the experiment was performed using the educational robotic manipulator Scorbot-ER 4pc, in which the links are cog-belt driven, there are clearances in the gears, and the angle measurements are performed using incremental encoders. Therefore, the path-realization accuracy expressed in metres should not be compared directly with that of industrial robotic manipulators, which have a much higher stiffness in the kinematic chain. In the case of the verification research, the presented diagrams allow for an indirect comparison of the tracking control quality for the presented control system with the ADP algorithm in the DHP configuration and the neural control system with MNNs.
In the case of the neural control system with MNNs, using the assumed initial conditions, the end-effector's path mapping quality is lower than when using the proposed control algorithm with the DHP structure, which results in higher values of the coordinate errors in the operational space. This is especially evident in the first phase of movement, when $t \in < 3; 7 >$ [s]. The occurrence of disturbances increases the values of the errors, especially $e_{y}$ . This is related to the gripper moving an additional load, which affects the inertia of the manipulator's arm. A smaller increase in the path realization errors in the operational space can be noticed in the case of the proposed control algorithm in comparison with the neural control system with MNNs.

Figure 12.

a) The desired and performed path of the manipulator's end-effector, point C, for the control system with the DHP structure, b) errors of the manipulator's end-effector coordinates in the operational space, $e_{x}$ , $e_{y}$ , $e_{z}$ , c) the desired and performed path of the manipulator's end-effector, point C, for the control system with MNNs, and d) errors of the manipulator's end-effector coordinates in the operational space, $e_{x}$ , $e_{y}$ , $e_{z}$

The control signals are shown in Figure 13. The disturbances were realized by an additional load attached to the manipulator's end-effector and moved at the same times as simulated in the numerical tests. The PD control signals, shown in Figure 13.c), were noisy because they are based on the tracking errors calculated from the performed trajectory, which is computed from the signals of the incremental encoders. This has an effect on the overall control signals shown in Figure 13.a). Compared with the PD control signals, the control signals generated by the actor's NNs, shown in Figure 13.b), and the additional control signals, shown in Figure 13.d), were smoother. At the beginning of the experiment ( $t \in < 0; 1 >$ [s]), the manipulator's end-effector was fixed to prevent its downward movement. Afterwards, it was released and the control system began to generate control signals to compensate for the influence of gravitational forces. As in the numerical test, at the time of moving the load, the values of the actor NNs' control signals changed to compensate for the change in the robotic manipulator's dynamics.

Figure 13.

a) The overall tracking control signals $u_{[1]}$ , $u_{[2]}$ and $u_{[3]}$ , b) the actor's NNs' control signals $U_{A [1]}$ , $U_{A [2]}$ and $U_{A [3]}$ , $U_{A} = - h^{- 1} M (z_{1}) u_{A}$ , c) the PD control signals $U_{P D [1]}$ , $U_{P D [2]}$ and $U_{P D [3]}$ , $U_{P D} = - h^{- 1} M (z_{1}) u_{P D}$ , and d) the control signals $U_{E [1]}$ , $U_{E [2]}$ and $U_{E [3]}$ , $U_{E} = - h^{- 1} M (z_{1}) u_{E}$

The tracking errors of the joint angles for the proposed control system with the DHP algorithm are shown in Figure 14.a), while the tracking errors of the joint angular velocities are shown in Figure 14.b). Figures 14.c) and d) show the values of the tracking errors of the joint angles and the angular velocities obtained using the control system with the MNNs, respectively, and Figures 14.e) and f) show the values of the tracking errors of the joint angles and the angular velocities for the PD controller with the parameters $K_{D} = diag \{1, 1, 1\}$ , $Λ = diag \{1, 1, 1\}$ . In all the NNs of the DHP actor-critic algorithm and in the output layer of the MNNs, zero initial weights were used. In the experiment with the DHP-based control system, the tracking errors of the joint angles stay at a comparable level throughout, and their values increase only when the disturbances occur. Small values of the tracking errors at the beginning of the NNs' weights-adaptation process, despite using zero initial weights-values, result from the implementation of the complex method of the actor-critic structure's NNs' weights-adaptation process in the internal loop. In the case of the neural tracking control system with MNNs, the values of the tracking errors of the joint angles at the beginning of the experiment are nearly three times higher than when using the DHP algorithm, after which they are reduced during the NNs' weights-adaptation process. The values of the errors for the PD controller were the highest among the tested control systems, and they increase during the occurrence of the disturbance.

Figure 14.

a) Tracking errors of the joint angles, $e_{1 [1]}$ , $e_{1 [2]}$ and $e_{1 [3]}$ for the control system with the DHP structure, b) tracking errors of the joint angular velocities, $e_{2 [1]}$ , $e_{2 [2]}$ and $e_{2 [3]}$ , c) tracking errors $e_{1 [1]}$ , $e_{1 [2]}$ and $e_{1 [3]}$ for the control system with the MNNs, d) tracking errors $e_{2 [1]}$ , $e_{2 [2]}$ and $e_{2 [3]}$ for the control system with the MNNs, e) tracking errors $e_{1 [1]}$ , $e_{1 [2]}$ and $e_{1 [3]}$ for the PD controller, and f) tracking errors $e_{2 [1]}$ , $e_{2 [2]}$ and $e_{2 [3]}$ for the PD controller

The values of the DHP structure's NNs' weights are shown in Figure 15. In the experiment, zero initial weight-values were used. The weight-values of the NNs connected with generating the control signals for links 2 and 3 ( $W_{A 2}$ , $W_{C 2}$ , $W_{A 3}$ , $W_{C 3}$ ) are adapted from the time $t_{1} = 1$ [s] to generate control signals that compensate for the influence of the gravitational forces on the manipulator. At the time of the disturbance's occurrence, changes in the weights' values occurred as a result of the adaptation performed in order to reduce the tracking errors.

The tracking quality of the proposed discrete control system with the DHP algorithm was compared with the results obtained by the neural tracking control system with MNNs and by the PD controller. Every experiment was performed under the same conditions, using the same type of disturbance realized in the form of the load moved by the manipulator's end-effector.

The tracking control quality was evaluated using the following quality ratings:

maximal values of the tracking errors of the joint angles ( $e_{1 m a x [1]}$ , $e_{1 m a x [2]}$ , $e_{1 m a x [3]}$ [rad]),

root mean square error (RMSE) of the tracking errors of the joint angles $e_{1 [1]}$ , $e_{1 [2]}$ and $e_{1 [3]}$ [rad],

$ε_{e [j]} = \sqrt{\frac{1}{N} \sum_{k = 0}^{N} e_{1 [j] {k}}^{2}},$ (37)

where $j = 1,2,3$ , $N = 4000$ ,

maximal values of the filtered tracking errors ( $s_{m a x [1]}$ , $s_{m a x [2]}$ , $s_{m a x [3]}$ ) [rad/s],

RMSE of the filtered tracking errors $s_{[1]}$ , $s_{[2]}$ and $s_{[3]}$ [rad/s],

$ε_{s [j]} = \sqrt{\frac{1}{N} \sum_{k = 0}^{N} s_{[j] {k}}^{2}} .$ (38)
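The quality ratings listed above can be computed from logged samples along the following lines. This is a minimal sketch with hypothetical function names; it uses the standard RMSE that divides by the number of samples supplied, which differs marginally from the printed formulas, where the sum runs over $k = 0, \dots, N$ and is divided by $N$ .

```python
import math
from typing import Sequence


def max_abs_error(samples: Sequence[float]) -> float:
    """Maximal absolute value of an error signal, e.g. e_1max[j] or s_max[j]."""
    return max(abs(v) for v in samples)


def rmse(samples: Sequence[float]) -> float:
    """Root mean square of an error signal, e.g. eps_e[j] or eps_s[j]."""
    return math.sqrt(sum(v * v for v in samples) / len(samples))


# Usage on a short, made-up error record for one joint (illustrative values only).
e1_joint = [0.005, -0.0116, 0.003, 0.0]
e_max = max_abs_error(e1_joint)
eps_e = rmse(e1_joint)
```

The same two functions apply unchanged to the filtered tracking errors $s_{[j]}$ , one joint at a time.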

The values of quality ratings are shown in Table 1.

Table 1.

Values of the quality ratings

Control algorithm:	PD controller	control system with MNNs	control system with the DHP algorithm
$e_{1 m a x [1]} [rad]$	0.0718	0.0303	0.0116
$e_{1 m a x [2]} [rad]$	0.183	0.0257	0.0077
$e_{1 m a x [3]} [rad]$	0.1908	0.018	0.013
$ε_{e [1]} [rad]$	0.042	0.0072	0.0043
$ε_{e [2]} [rad]$	0.0981	0.005	0.0043
$ε_{e [3]} [rad]$	0.0938	0.004	0.0054
$s_{m a x [1]} [rad / s]$	0.1475	0.0788	0.0518
$s_{m a x [2]} [rad / s]$	0.2476	0.1002	0.0585
$s_{m a x [3]} [rad / s]$	0.2549	0.0608	0.0582
$ε_{s [1]} [rad / s]$	0.0636	0.0146	0.0114
$ε_{s [2]} [rad / s]$	0.1109	0.0125	0.0116
$ε_{s [3]} [rad / s]$	0.1073	0.0113	0.0114

The maximal values of the filtered tracking errors ( $s_{m a x [1]}$ , $s_{m a x [2]}$ and $s_{m a x [3]}$ ) are shown in Figure 16.a), and the values of the RMSE of the filtered tracking errors $s_{[1]}$ , $s_{[2]}$ and $s_{[3]}$ are shown in Figure 16.b).

Figure 15.

Figure 16.

a) Maximal values of the filtered tracking errors ( $s_{m a x [1]}$ , $s_{m a x [2]}$ and $s_{m a x [3]}$ ), and b) RMSE of the filtered tracking errors $s_{[1]}$ , $s_{[2]}$ and $s_{[3]}$

On the basis of the obtained results, the higher tracking quality of the control system with the DHP algorithm and of the neural control system with MNNs in comparison with the PD controller can be noticed. The values of the quality ratings obtained in the experiment with the proposed tracking control system with the DHP algorithm are lower than those obtained with the tracking control system with MNNs, with the exception of the RMSE values for joint 3 ( $ε_{e [3]}$ and $ε_{s [3]}$ in Table 1), which are slightly higher. The difference in the values of the quality ratings is especially high in the case of the maximal values of the tracking errors and the filtered tracking errors, while the remaining quality ratings have the same order of magnitude.
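The comparison can be checked directly against Table 1 by forming per-joint ratios of the ratings. The sketch below copies the joint-angle rows of Table 1; the dictionary layout and the helper name `ratio` are illustrative choices, not part of the original work.

```python
from typing import Dict, List

# Rows of Table 1 for the joint-angle tracking errors (joints 1, 2, 3) [rad].
E1_MAX: Dict[str, List[float]] = {
    "PD": [0.0718, 0.183, 0.1908],
    "MNN": [0.0303, 0.0257, 0.018],
    "DHP": [0.0116, 0.0077, 0.013],
}
EPS_E: Dict[str, List[float]] = {
    "PD": [0.042, 0.0981, 0.0938],
    "MNN": [0.0072, 0.005, 0.004],
    "DHP": [0.0043, 0.0043, 0.0054],
}


def ratio(table: Dict[str, List[float]], baseline: str, candidate: str) -> List[float]:
    """Per-joint ratio baseline / candidate; values above 1 favour the candidate."""
    return [b / c for b, c in zip(table[baseline], table[candidate])]


max_ratio = ratio(E1_MAX, "MNN", "DHP")
rmse_ratio = ratio(EPS_E, "MNN", "DHP")
```

For the maximal joint-angle errors the ratios are above 1 for all three joints, while for the RMSE the ratio for joint 3 falls below 1, which is why the RMSE ratings are better described as being of the same order of magnitude.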

7. Conclusion

The article presents a discrete tracking control system for the Scorbot-ER 4pc robotic manipulator. The control system consists of an ADP algorithm in a DHP configuration, with additional elements: the PD controller, the supervisory term and the additional control signal deriving from the discretization of the controlled object's model. The PD controller and the supervisory term ensure the stability of the tracking control in the case of disturbances and at the beginning of the experiment, where zero or random initial NNs' output-layer weights are applied. The DHP algorithm consists of an actor-critic structure realized in the form of NNs and a predictive model of the robotic manipulator. The actor generates the sub-optimal control law, while the critic approximates the difference of the value function with respect to the state. The NNs' weights for the actor-critic structure are adapted online in the internal loop during the experiment. This method of NNs' weights-adaptation is computationally complex, but ensures the high quality of the tracking control in the case of disturbances. The research results show that the proposed tracking control system ensures higher quality tracking control in comparison with the PD controller or the neural tracking control system with complex MNNs, despite using only simple, one-layer NNs in the DHP structure. The difference is especially considerable for the maximal values of the tracking errors, which indicates that the proposed tracking control system with a DHP structure does not require a process of preliminary learning for the NNs (in contrast with the neural control system with MNNs). Even when zero initial weights of the NNs are applied, or in the case of disturbances, the proposed control system guarantees a stable tracking process. The values of the errors and the NNs' weights are bounded. The discrete tracking control system works online. 
The performance of the control system was verified by a series of numerical tests and experiments performed using the Scorbot-ER 4pc robotic manipulator. The proposed control system could be applied to the tracking control problem of robotic manipulators with more than three-DOF; however, it is connected with an increase in the dimensionality of the problem and the need to use additional NNs in the actor-critic structure, and thus with the increased computational complexity of the algorithm. The main contribution of this paper lies in the application of a discrete tracking control system of a robotic manipulator to the real-time control of a Scorbot-ER 4pc robotic manipulator, where an ADP algorithm in a DHP configuration was implemented to approximate the nonlinearities of the controlled object. Moreover, the obtained result was compared with the results of an experiment realized using the MNNs tracking control system. The influence of the NNs' weights-adaptation method of the actor and the critic in the DHP algorithm on tracking control quality (even in the case of disturbances or the disadvantageous case of zero initial weights) was discussed. It was also demonstrated in the results of the experiment to be more beneficial than the adaptation algorithm used in the neural control algorithm with MNNs, despite using single-layer NNs with a less complex structure and smaller adaptation possibilities in the DHP algorithm.

References

1. Canudas de Wit C, Bastin, Siciliano, editors. Theory of Robot Control. Springer-Verlag New York, Inc., Secaucus, NJ, USA, 1st edition, 1996. DOI: 10.1007/978-1-4471-1501-4

2. Lewis, Jagannathan, Yesildirak. Neural Network Control of Robot Manipulators and Nonlinear Systems. Taylor & Francis, Inc., Bristol, PA, USA, 1998.

3. Sciavicco, Siciliano. Modelling and Control of Robot Manipulators. Springer-Verlag, London, 2000. DOI: 10.1007/978-1-4471-0449-0

4. Zuo, Wang, Huang. Intelligent hybrid control strategy for trajectory tracking of robot manipulators. Journal of Control Science and Engineering. 2008; 2008:1–13. DOI: 10.1155/2008/520591

5. Qiao. Robust adaptive PID control of robot manipulator with bounded disturbances. Mathematical Problems in Engineering. 2013; 2013:1–13. DOI: 10.1155/2013/535437

6. Astrom, Wittenmark. Adaptive Control. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 1994.

7. Sira-Ramirez. Differential geometric methods in variable-structure control. International Journal of Control. 1988; 48:1359–1390. DOI: 10.1080/00207178808906256

8. Slotine. Applied Nonlinear Control. Prentice Hall, Inc., New Jersey, 1991.

9. Barto, Powell, Wunsch. Handbook of Learning and Approximate Dynamic Programming (IEEE Press Series on Computational Intelligence). John Wiley & Sons, Inc., Publication, IEEE Press, USA, 2004.

10. Sutton, Barto. Introduction to Reinforcement Learning. MIT Press, Cambridge, MA, USA, 1st edition, 1998.

11. Powell. Approximate Dynamic Programming: Solving the Curses of Dimensionality (Wiley Series in Probability and Statistics). John Wiley & Sons, Inc., Publication, USA, 2007. DOI: 10.1002/9780470182963

12. Prokhorov, Wunsch. Adaptive critic designs. IEEE Transactions on Neural Networks. 1997; 8:997–1007.

13. Zylski, Gierlak. Verification of multilayer neural-net controller in manipulator tracking control. Solid State Phenomena. 2010; 164:99–104. DOI: 10.4028/www.scientific.net/SSP.164.99

14. Sadati, Emamzadeh. A novel fuzzy reinforcement learning approach in two-level intelligent control of 3-DOF robot manipulators. In: Proceedings of IEEE Symposium on Approximate Dynamic Programming and Reinforcement Learning (ADPRL'07); 1–5 April 2007; Honolulu. New York: IEEE; 2007. p. 20–25

15. Millan J. del R. Reinforcement learning of goal-directed obstacle-avoiding reaction strategies in an autonomous mobile robot. Robotics and Autonomous Systems. 1995; 15:275–299. DOI: 10.1016/0921-8890(95)00021-7

16. Syam, Watanabe, Izumi. Adaptive actor-critic learning for the control of mobile robots by applying predictive models. Soft Computing. 2005; 9:835–845. DOI: 10.1007/s00500-004-0424-1

17. Szuster, Hendzel. Discrete globalised dual heuristic dynamic programming in control of the two-wheeled mobile robot. Mathematical Problems in Engineering. 2014; 2014:1–16. DOI: 10.1155/2014/628798

18. Venayagamoorthy, Wunsch, Harley. Adaptive critic based neurocontroller for turbogenerators with global dual heuristic programming. In: Proceedings of IEEE Power Engineering Society Winter Meeting; 23–27 Jan 2000; Singapore. New York: IEEE; 2000. p. 291–294

19. Barto, Sutton, Anderson. Neuronlike adaptive elements that can solve difficult learning control problems. IEEE Transactions on Systems, Man and Cybernetics. 1983; 13:834–846. DOI: 10.1109/TSMC.1983.6313077

20. Iftekharuddin. Transformation invariant on-line target recognition. IEEE Transactions on Neural Networks. 2011; 22:906–918. DOI: 10.1109/TNN.2011.2132737

21. Mohagheghi, Venayagamoorthy, Harley. Adaptive critic design based neuro-fuzzy controller for a static compensator in a multimachine power system. IEEE Transactions on Power Systems. 2006; 21:1744–1754. DOI: 10.1109/TPWRS.2006.882467

22.

Hendzel

Burghardt

Szuster

, Reinforcement learning in discrete neural control of the underactuated system. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). 2013; 7894LNAI(PART 1):64–75. DOI: 10.1007/978-3-642-38658-9_6

23.

Hendzel

Szuster

, Discrete model-based adaptive critic designs in wheeled mobile robot control. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). 2010; 6114LNAI(PART 2):264–271. DOI: 10.1007/978-3-642-13232-2_32

24.

Hendzel

Szuster

, Discrete neural dynamic programming in wheeled mobile robot control. Communications in Nonlinear Science and Numerical Simulation. 2011; 16:2355–2362. DOI:10.1016/j.cnsns.2010.04.046

25.

Gierlak

Szuster

Zylski

, Discrete dual-heuristic programming in 3DOF manipulator control. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). 2010; 6114LNAI(PART 2):256–263. DOI: 10.1007/978-3-642-13232-2_31

26.

Gierlak

, Hybrid position/force control of the Scorbot-er 4pc manipulator with neural compensation of nonlinearities. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). 2012; 7268LNAI(PART 2):433–441. DOI: 10.1007/978-3-642-29350-4_52

27.

Nakamura

Hanafusa

, Inverse kinematic solutions with singularity robustness for robot manipulator control. Journal of Dynamic Systems, Measurement and Control. 1986; 108:163–171. DOI: 10.1115/1.3143764

28.

Pechev

, Inverse kinematics without matrix inversion. In: Proceedings of IEEE International Conference on Robotics and Automation (ICRA2008); 19–23 May 2008; Pasadena:IEEE; 2008.p. 2005–2012

29.

Vargas

Leite

Costa

, Overcoming kinematic singularities with the filtered inverse approach. In: Proceedings of 19th World Congress The International Federation of Automatic Control (IFAC); 24–29 August 2014; Cape Town; 2008.p. 8496–8502.

30.

Koshkouei

Zinober

, Sliding mode control of discrete-time systems. Journal of Dynamic Systems, Measurement and Control. 2000; 122:793–802. DOI: 10.1115/1.1321266