Abstract
Keywords
1. Introduction
Tactile sensing is an essential component for enabling controlled physical interaction between a robot and its surroundings. A technique known as tactile servoing uses feedback from the sense of touch to control this physical interaction.
One approach for tactile servo control with high-resolution soft tactile sensors is to estimate the contact pose relative to an object surface using a convolutional neural network (CNN) and use this as a feedback signal: the tactile sensor can then slide over unknown curved surfaces and push unknown objects to goal locations (Lepora and Lloyd (2020, 2021); Lloyd and Lepora (2021)). However, soft tactile sensors shear after contact, which our previous work treated as an undesirable ‘nuisance variable’ that interferes with the primary goal of contact pose estimation. The new viewpoint taken in this paper is that post-contact shear is a useful attribute that can aid servoing and manipulation tasks. For example, when tracking a moving object that rotates and translates relative to a contact region, it is essential that a robot can detect and respond to both shear and normal contact interactions with the tactile sensor.
In this paper, we investigate how the surface pose estimation model can be extended to include shear effects, and we utilise these combined pose-and-shear models to develop a tactile robotic system that can be programmed for diverse non-prehensile manipulation tasks, such as object tracking, surface-following, single-arm object pushing and dual-arm (i.e. stabilised) object pushing. While we have previously demonstrated tactile surface-following and single-arm pushing, our new system is capable of performing these tasks with a continuous smooth motion instead of using discrete, position-controlled movements. Significant new capabilities such as tactile object tracking and dual-arm pushing are only possible because the tactile robotic system can respond to shearing motion.
Two technical challenges had to be overcome to realise a pose-and-shear-based tactile servoing system with a soft high-resolution tactile sensor. Firstly, slippage under shear can cause tactile aliasing, where similar tactile images correspond to a wide range of contact poses and shears, so that individual predictions can be highly uncertain. Secondly, the tactile servoing had to produce smooth, continuous motion, which led us to adopt velocity control in place of the discrete, position-controlled movements used previously.
Our main novel contributions are summarised below and pictured in Figure 1:
1. A Gaussian-density neural network model that predicts contact pose and post-contact shear with uncertainty from tactile images, and which represents the pose-and-shear as a single, unified vector that transforms under SE(3).
2. A discriminative Bayesian filter that reduces the error and uncertainty of the combined pose-and-shear predictions in SE(3).
3. Feedforward-feedback control methods using velocity control that are driven by tactile pose-and-shear estimation for tactile servo control, supplemented with controllers for goal-based tasks such as object pushing.
4. The application and assessment of these techniques for smoothly and accurately controlling single- and dual-armed tactile robotic systems for object tracking, surface-following, single-arm pushing and dual-arm pushing.
Figure 1: Pose-and-shear-based tactile servo control (left) applied to four tasks (right): (a) object tracking; (b) surface-following; (c) single-arm object pushing; (d) dual-arm object pushing. Here the servo control loop for each tactile robot has: (1) a Gaussian-density network (GDN) model for predicting the contact pose and post-contact shear with uncertainty from a tactile image; (2) an SE(3) discriminative Bayesian filter for reducing the error and uncertainty of those predictions; and (3) a feedforward-feedback controller that maps the filtered estimate to an end-effector velocity.

This paper is organised as follows. Section 2 gives an overview of tactile pose-and-shear estimation followed by a survey of tactile servoing and object pushing. In the
A video of the experiments is included as supplementary material and is publicly available on YouTube. We have released the data and code for this paper on github.com/dexterousrobot with a guide and summary on lepora.com/pose-and-shear.
2. Background and related work
2.1. Tactile pose-and-shear estimation
Contemporary methods for tactile pose estimation can be broadly categorised according to whether they estimate a
In early work on tactile pose estimation, Bicchi et al. proposed a theoretical model for estimating pose-and-shear information, and described a framework for designing tactile sensors that have this capability (Bicchi et al. (1993)). More specifically, their theoretical model addressed the problem of how to determine the location of a contact, the force at the interface and the moment about the contact normals.
In the context of high-resolution vision-based tactile sensors, Yuan et al. showed that the GelSight sensor can be used to estimate the normal contact pose between the sensor and an object surface, but was limited in its contact angle range due to its rather flat sensor geometry (Yuan et al. (2017)). Similarly, Lepora et al. showed that the TacTip soft biomimetic optical tactile sensor could be used to predict 2D contact poses (Lepora et al. (2017; 2019)) and, more recently, 3D contact poses (Lepora and Lloyd (2020; 2021)).
The estimation of post-contact shear is less well-explored than pose estimation. Yuan et al. showed how a GelSight sensor can be used to measure post-contact shear by including printed markers on the sensing surface (Yuan et al. (2015)). Cramphorn et al. described a similar approach for the TacTip sensor based on the shear of marker-tipped pins (Cramphorn et al. (2018)). More recent work has considered data-efficient methods of decoupling the confounding post-contact shear from the primary goal of contact pose estimation, either through Principal Component Analysis (Aquilina et al. (2019)) or the latent feature space of a CNN model (Gupta et al. (2022)). The present study takes a different approach in seeking a predictive model of the components of post-contact shear that can be used alongside a model of contact pose for tactile servo control.
2.2. Tactile servoing and object pushing
Methods for robotic tactile servoing can be grouped according to whether they control attributes in the
Historically, Berger and Khosla first used image-based tactile feedback on the location and orientation of edges together with a feedback controller to track straight and curved edges in 2D (Berger and Khosla (1991)). Chen et al. used a task-space tactile servoing approach, using an ‘inverse tactile model’ similar in concept to a pose-based tactile servoing model, to follow straight-line and curved edges in 2D (Chen et al. (1995)). Zhang and Chen used an image-based tactile servoing approach and introduced the concept of a ‘tactile Jacobian’ to map image feature errors to task space errors (Zhang and Chen (2000)). They used their system to track straight and curved edges in 2D and to follow cylindrical and spherical surfaces in 3D. Sikka et al. drew inspiration from image-based
Later on, Li et al. advanced the tactile servoing approach of Zhang and Chen to demonstrate a wider selection of servoing tasks including 3D object tracking and surface-following (Li et al. (2013)). Lepora et al. used a TacTip soft optical tactile sensor with a bio-inspired active touch perception method and a simple proportional controller to demonstrate contour following around several complex 2D edges and ridges (Lepora et al. (2017)), following a related contour-following method with an iCub fingertip (Martinez-Hernandez et al. (2017; 2013)). Sutanto et al. used learning-from-demonstration to build a tactile servoing dynamics model and used it to demonstrate 3D contact-point tracking (Sutanto et al. (2019)). Kappassov et al. developed a task-space tactile servoing system, similar to the earlier system developed by Li et al., and used it for 3D edge following and object co-manipulation (Kappassov et al. (2020)). More recently, Lepora and Lloyd described a pose-based tactile servoing approach that uses a deep learning model to map from the tactile image space to pose space, firstly in 2D (Lepora et al. (2019)) and then in 3D (Lepora and Lloyd (2021)). They used this approach to demonstrate robotic surface and edge following on complex 2D and 3D objects.
Most current approaches for robotic object pushing also fall into two main categories: analytical, physics-based methods and data-driven methods.
In the case of analytical, physics-based object pushing, Mason derived a simple rule known as the voting theorem for predicting the sense of rotation of an object pushed at a single point of contact (Mason (1986)).
In the case of data-driven approaches, Kopicki et al. used a modular data-driven approach for predicting the motion of pushed objects (Kopicki et al. (2011)). Bauza et al. developed models that describe how an object moves in response to being pushed in different ways and embedded these models in a model-predictive control (MPC) system (Bauza et al. (2018)). Zhou et al. developed a hybrid analytical/data-driven approach that approximated the limit surface for different objects using a parametrised model (Zhou et al. (2018)). Other researchers have used deep learning to model the forward or inverse dynamics of pushed object motion (Agrawal et al. (2016); Byravan and Fox (2017); Li et al. (2018)), or to learn end-to-end control policies for pushing (Clavera et al. (2017); Dengler et al. (2022)). In general, analytical approaches are more computationally efficient and transparent in their operation than data-driven approaches, but may not perform well if their underlying assumptions and approximations do not hold in practice (Yu et al. (2016)).
While most object pushing methods rely on computer vision systems to track the pose and other state information of the pushed object, a few (including ours) use tactile sensors to perform this function. Lynch et al. were the first to employ tactile sensing to manipulate a rectangular object and circular disk on a moving conveyor belt (Lynch et al. (1992)). Jia and Erdmann used a theoretical analysis to show that the pose and motion of a planar object with known geometry can be determined using only the tactile contact information generated during pushing (Jia and Erdmann (1999)). More recently, Meier et al. used a tactile-based method for pushing an object using frictional contact with its upper surface (Meier et al. (2016)).
From a control perspective, the most similar approaches to our method for single-arm robotic pushing are the ones described by Hermans (Hermans et al. (2013)) and Krivic (Krivic and Piater (2019)). The similarities and differences are described in more detail in Lloyd and Lepora (2021), but the main difference from our method is that they both used computer vision techniques to track the state of the pushed object, rather than tactile sensing and proprioception.
3. Computational methods
3.1. Contact pose-and-shear prediction with uncertainty
3.1.1. Surface contact pose-and-shear
In previous work on tactile pose estimation (Lepora and Lloyd (2020, 2021); Lloyd and Lepora (2021)), we assumed that a surface contact pose can be represented by a 6-component vector,
In this paper, we instead train a model to estimate all six non-zero components of a surface contact pose,
Thus, our new definition of a surface contact pose-and-shear rests on two simplifying assumptions:
1. All contacted surfaces can be locally approximated as flat.
2. All sensor-surface contacts that produce a sensor output can be approximately decomposed into an equivalent normal contact motion followed by a tangential post-contact shear motion.
In practice, we find that the tactile servo control methods apply equally well to curved objects and in situations when the sensor output depends on the time history of the combined normal and shear motion. Thus, with these two simplifying assumptions in mind, we now define the surface contact poses we use to train our pose-and-shear estimation models (Figure 2) and describe the process we use to sample and generate the data.
Figure 2: Definition and generation of surface contact poses using a two-step process of normal contact motion followed by translational and rotational shear. Step 1a: prior to normal contact motion, the sensor is rotated by Euler angles (
We start by attaching a sensor coordinate frame {
As discussed above, the surface contact motion is assumed equivalent to one that is carried out in two stages: a normal contact motion followed by a post-contact, tangential shear motion. The normal contact motion, represented by an {
3.1.2. Training data collection
We collect data for training the pose-and-shear prediction models by using a robot arm to move the sensor into different surface contact poses with a flat surface, then apply a shear motion before recording the tactile sensor image. Each data sample consists of a tactile image together with the corresponding surface pose-and-shear in extrinsic-
The tactile images associated with surface contact poses and shears ( 1. 2. 3. 4.
The sampling of
Three distinct data sets were used to develop the pose-and-shear models: a training set of 6000 samples, a validation set of 2000 samples for model selection and hyperparameter tuning, and a test set of 2000 samples for independently verifying the model performance post training. All data were collected using a 3D-printed flat surface (VeroWhite material, shown later in Figure 6(c)).
3.1.3. Pre- and post-processing
We used the following steps to collect and pre-process the tactile images of the training, validation and test sets, and to pre-process tactile images after the model is deployed:
1. Collect at 640 × 480 pixel resolution and convert to 8-bit grayscale.
2. Crop to a 430 × 430 pixel square enclosing the circular region within which the markers are located.
3. Apply a 5 × 5 median blur to remove sensor noise.
4. Apply an adaptive threshold to binarize the image.
5. Resize the images to 128 × 128 pixels.
6. Convert the 8-bit integer pixel intensities to floating point and normalise to lie in the range [0, 1].
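To make these steps concrete, the following is a minimal sketch of the pre-processing pipeline using OpenCV (the image-processing library noted in Section 4.4). The crop offsets and adaptive-threshold parameters are illustrative assumptions rather than the values in the released code.

import cv2
import numpy as np

def preprocess_tactile_image(frame_bgr, crop_origin=(105, 25), crop_size=430):
    """Convert a raw 640x480 camera frame to a normalised 128x128 model input."""
    # 1. Convert to 8-bit grayscale.
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    # 2. Crop a 430x430 square around the marker region (offsets assumed).
    x0, y0 = crop_origin
    crop = gray[y0:y0 + crop_size, x0:x0 + crop_size]
    # 3. 5x5 median blur to suppress sensor noise.
    blurred = cv2.medianBlur(crop, 5)
    # 4. Adaptive threshold to binarise (block size and offset assumed).
    binary = cv2.adaptiveThreshold(blurred, 255,
                                   cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                   cv2.THRESH_BINARY, 11, 2)
    # 5. Resize to the 128x128 network input size.
    resized = cv2.resize(binary, (128, 128), interpolation=cv2.INTER_AREA)
    # 6. Convert to float in [0, 1].
    return resized.astype(np.float32) / 255.0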
We also pre-processed the pose-and-shear labels so that the trained model predictions are in the correct format for the subsequent filtering and controller stages:
1. Convert pose labels from their Euler representations to 4 × 4 homogeneous matrices.
2. Invert the 4 × 4 matrices, so that instead of representing sensor poses in the surface feature frames {
3. Convert the inverted matrices to exponential coordinates:
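As an illustration of this label conversion, the sketch below assumes x-y-z Euler angles in degrees and millimetre translations, and uses a general matrix logarithm to obtain the exponential (twist) coordinates; the released code may use a different Euler convention and a closed-form SE(3) log map.

import numpy as np
from scipy.spatial.transform import Rotation
from scipy.linalg import logm

def pose_label_to_exp_coords(xyz_mm, euler_deg):
    # 1. Euler representation -> 4x4 homogeneous matrix.
    T = np.eye(4)
    T[:3, :3] = Rotation.from_euler("xyz", euler_deg, degrees=True).as_matrix()
    T[:3, 3] = xyz_mm
    # 2. Invert, so the matrix represents the pose of the surface feature
    #    frame in the sensor frame (rather than the other way around).
    T_inv = np.linalg.inv(T)
    # 3. Matrix log gives an se(3) element; read off the 6 twist coordinates.
    #    (logm is fine away from 180-degree rotations.)
    S = np.real(logm(T_inv))
    rho = S[:3, 3]                                   # translational part
    omega = np.array([S[2, 1], S[0, 2], S[1, 0]])    # rotational part
    return np.concatenate([rho, omega])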
3.1.4. Convolutional neural network for pose-and-shear estimation
Following previous work on tactile pose estimation (Lepora and Lloyd (2020, 2021); Lloyd and Lepora (2021)), we consider a baseline model using a CNN with a multi-output regression head. We configured this CNN architecture to be effective for predicting pose-and-shear, resulting in a sequence of convolutional layer blocks, where each block is composed of a sequence of sub-layers: a 3 × 3 2D convolution; batch normalisation (Ioffe and Szegedy (2015)); a rectified linear unit (ReLU) activation function; and 2 × 2 max-pooling. The feature map dimensions are reduced by half at each block as we move forwards through the blocks, due to the max-pooling. Hence, we balance the progressive loss of feature resolution by doubling the number of features in consecutive layer blocks.
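The following is a minimal Keras sketch of this convolutional base together with the regression head described in the next paragraph. The block count, filter numbers, dense width and dropout rate are illustrative assumptions, not the tuned hyperparameters used in the paper.

import tensorflow as tf
from tensorflow.keras import layers, models

def build_cnn_regressor(n_blocks=5, base_filters=16, n_outputs=6, dropout=0.1):
    inputs = layers.Input(shape=(128, 128, 1))
    x = inputs
    filters = base_filters
    for _ in range(n_blocks):
        # Conv block: 3x3 conv -> batch norm -> ReLU -> 2x2 max-pool.
        x = layers.Conv2D(filters, 3, padding="same")(x)
        x = layers.BatchNormalization()(x)
        x = layers.ReLU()(x)
        x = layers.MaxPooling2D(2)(x)
        filters *= 2   # double the features as the resolution halves
    # Fully-connected, multi-output regression head.
    x = layers.Flatten()(x)
    x = layers.Dropout(dropout)(x)
    x = layers.Dense(128, activation="relu")(x)
    outputs = layers.Dense(n_outputs, activation="linear")(x)
    return models.Model(inputs, outputs)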
The output of the convolutional base feeds into a fully-connected, multi-output regression head, composed of a flatten layer, dropout layer with dropout probability
We train this CNN regression model by minimising a weighted mean-squared error (MSE) loss function, defined over
We trained these CNN regression models using the Adam optimiser with a batch size of 16 and a linear rise, polynomial decay (LRPD) learning rate schedule. In our implementation of this schedule, we initialised the learning rate to 10⁻⁵ and linearly increased it to 10⁻³ over 3 epochs; we then maintained it for a further epoch before decaying it to 10⁻⁷ over
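For reference, one way to implement such a schedule is as a Keras callback, sketched below; the decay duration and polynomial power are assumptions, since the exact values are not reproduced here.

import tensorflow as tf

class LRPDSchedule(tf.keras.callbacks.Callback):
    """Linear rise, hold, then polynomial decay of the learning rate per epoch."""
    def __init__(self, lr_start=1e-5, lr_peak=1e-3, lr_end=1e-7,
                 rise_epochs=3, hold_epochs=1, decay_epochs=50, power=1.0):
        super().__init__()
        self.lr_start, self.lr_peak, self.lr_end = lr_start, lr_peak, lr_end
        self.rise, self.hold = rise_epochs, hold_epochs
        self.decay, self.power = decay_epochs, power

    def on_epoch_begin(self, epoch, logs=None):
        if epoch < self.rise:                        # linear rise
            lr = self.lr_start + (self.lr_peak - self.lr_start) * epoch / self.rise
        elif epoch < self.rise + self.hold:          # hold at the peak rate
            lr = self.lr_peak
        else:                                        # polynomial decay
            t = min((epoch - self.rise - self.hold) / self.decay, 1.0)
            lr = (self.lr_peak - self.lr_end) * (1.0 - t) ** self.power + self.lr_end
        tf.keras.backend.set_value(self.model.optimizer.lr, lr)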
3.1.5. Gaussian-density network for pose-and-shear with uncertainty
In this paper, we introduce a modification of the CNN regression head to estimate the parameters of a (Gaussian) pose-and-shear distribution, rather than produce a single-point estimate. This allows us to estimate both the surface contact pose/shear and its associated uncertainty (Figure 3). The motivation for doing this was discussed in our previous work on tactile aliasing (Lloyd et al. (2021)): if we know the uncertainty associated with a pose, this information can be used to reduce the error and uncertainty using other system components such as the Bayesian filter we describe in the next section. In our previous work, we considered a Mixture Density Network (MDN) composed of a mixture of Gaussians. In the present work, we use a single Gaussian to be consistent with assumptions for deriving the update equations for the Bayesian filter in Algorithm 2. We refer to this model as a Gaussian-Density Network (GDN) because it predicts the parameters of a multivariate Gaussian PDF that captures uncertainty in the pose-and-shear outputs.
Figure 3: CNN with regression and GDN architectures used for surface contact pose estimation. (a) Convolutional base for CNN and GDN models. (b) Convolutional block sub-layer structure. (c) CNN multi-output regression head. (d) GDN PDF estimation head.
Specifically, we use the GDN outputs
We train the GDN model by minimising a mean negative log-likelihood loss function over the label values:
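As a hedged sketch of such a loss, the function below computes the mean negative log-likelihood of a diagonal multivariate Gaussian, assuming the network outputs the 6 means followed by 6 log-variances; the parameterisation actually used in the paper differs in the details described below.

import tensorflow as tf

def gaussian_nll_loss(y_true, y_pred):
    # Split the 12 network outputs into 6 means and 6 log-variances (assumed layout).
    mu, log_var = tf.split(y_pred, num_or_size_splits=2, axis=-1)
    # Negative log-likelihood of y_true under N(mu, diag(exp(log_var))),
    # up to an additive constant that does not affect the gradients.
    nll = 0.5 * (log_var + tf.square(y_true - mu) / tf.exp(log_var))
    return tf.reduce_mean(tf.reduce_sum(nll, axis=-1))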
As mentioned above, the GDN model can be viewed as a single-component mixture density network (MDN), which performs a similar function to the GDN but uses a Gaussian mixture model to model the output distribution (Bishop (1994; 2006)). This is relevant because the difficulties encountered when training MDNs are well-documented and include problems such as training instability and mode collapse (Hjorth and Nabney (1999); Makansi et al. (2019)). To overcome these difficulties, we incorporated several novel extensions to our architecture, described below. 1. First, rather than directly estimating the component means and standard deviations of a multivariate Gaussian pose-and-shear distribution (assuming a diagonal covariance matrix), we instead estimate the means and 2. We introduce a new where 3. We introduce a novel
Our GDN architecture uses the same convolutional base as the original CNN architecture, but instead feeds its output through a modified GDN head that includes the enhancements discussed above. Inside the GDN head, the output of the convolutional base is flattened and replicated to a set of 12 dropout layers, one for each of the 12 network outputs (
We trained the GDN model the same way as the CNN model, using the Adam optimiser with a batch size of 16 and the same LRPD learning rate schedule. As before, we terminated the training process when the validation loss reached its minimum value over a ‘patience’ of 25 epochs.
3.2. Bayesian filtering of pose-and-shear
3.2.1. Discriminative Bayesian filtering
We model the sequential pose-and-shear estimation problem using a probabilistic state-space model (Figure 4) that is defined by two interrelated sequences of conditional PDFs. This type of state-space model and inference equations form the basis of many Bayesian filtering algorithms, including the Kalman filter (Kalman (1960); Kalman and Bucy (1961)), extended Kalman filter (EKF) (see Gelb (1974)), unscented Kalman filter (UKF) (Julier et al. (1995); Julier and Uhlmann (1997)) and particle filters (see Särkkä (2013)). To simplify the notation in this section, we use lower-case italic letters to represent continuous random variables, regardless of whether they are scalars, vectors or matrices.
Figure 4: Probabilistic state-space model used to describe the relationship between surface contact poses and shears (states) and tactile sensor images (observations) in our Bayesian filter.
The
The (conditional) PDF over states
The observation model can be viewed as a
Then we can reinterpret Equations (7) and (11) as a discriminative Bayesian filter that updates a filtered state PDF
For Kalman filters, it is standard to assume Gaussian PDFs. Then the filter is equivalent to updating the means and covariance matrix for
A similar approach has been used to derive some discriminative variations of the Kalman filter, referred to as the Discriminative Kalman Filter (DKF) and robust DKF (Burkhart et al. (2020)). However, in that work the authors modified the inference equations
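To make the structure of the recursion concrete, the following is a minimal sketch of one predict/update cycle for a Gaussian filter with diagonal covariances, treating the pose-and-shear as a plain 6-vector. This is a simplification for illustration only: the filter used in this paper instead performs the corresponding operations on SE(3) (Section 3.2.2), and its exact update equations are not reproduced here.

import numpy as np

def predict(mean, var, delta, process_var):
    # Propagate the previous estimate by the known sensor motion `delta`
    # and inflate the uncertainty by the state dynamics noise.
    return mean + delta, var + process_var

def update(pred_mean, pred_var, obs_mean, obs_var):
    # Precision-weighted fusion of the prediction with the GDN observation
    # (mean and variance predicted from the current tactile image).
    pred_prec, obs_prec = 1.0 / pred_var, 1.0 / obs_var
    post_var = 1.0 / (pred_prec + obs_prec)
    post_mean = post_var * (pred_prec * pred_mean + obs_prec * obs_mean)
    return post_mean, post_var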
3.2.2. Discriminative Bayesian filtering in SE(3)
We now describe how the discriminative Bayesian filtering from the previous section is implemented on a sequence of
We assume a sequence of observations of the surface pose-and-shear with uncertainty
The sequence of sensor poses
The observations and states described above are combined using an
The derivation of the prediction step of the
3.3. Feedforward-feedback control of pose-and-shear
3.3.1. Feedforward-feedback control in SE(3)
In our past work, we made extensive use of feedback control systems for pose-based tactile servo control (Lepora and Lloyd (2021); Lloyd and Lepora (2021)). For the feedback control, we defined the pose error as the
To define the control operations, we project this pose error into the exponential coordinates for the Lie algebra se(3).
This control signal
In the case of multi-input multi-output (MIMO) proportional control, we use equation (18) to map the observed pose
where
Since our system operates in discrete time, we typically use simple backward-Euler approximations for computing the integral and derivative errors. To reduce noise in the error signal before computing the derivative, we smooth the error using an exponentially-weighted moving average filter with decay coefficient 0.5. We also sometimes clip the integral error between pre-defined limits to mitigate any integral wind-up problems, and clip the output to limit the control signal range. Details of gain coefficients, error or output clipping ranges, feedback reference poses and feedforward velocities (velocity twists) are provided in Appendix D.
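A compact sketch of this discrete-time controller update is given below; the gains and clipping limits are placeholders (the values used for each task are listed in Appendix D).

import numpy as np

class PIDController:
    """MIMO PID with backward-Euler integral/derivative, EWMA error smoothing
    before differencing, and optional integral and output clipping."""
    def __init__(self, kp, ki, kd, dt, ewma_decay=0.5,
                 integral_limits=None, output_limits=None):
        self.kp, self.ki, self.kd = np.asarray(kp), np.asarray(ki), np.asarray(kd)
        self.dt = dt
        self.decay = ewma_decay
        self.integral_limits = integral_limits
        self.output_limits = output_limits
        self.integral = 0.0
        self.smoothed_error = None

    def update(self, error):
        error = np.asarray(error, dtype=float)
        # Backward-Euler integral, optionally clipped to mitigate wind-up.
        self.integral = self.integral + error * self.dt
        if self.integral_limits is not None:
            self.integral = np.clip(self.integral, *self.integral_limits)
        # EWMA-smooth the error before the backward-difference derivative.
        prev = error if self.smoothed_error is None else self.smoothed_error
        smoothed = self.decay * prev + (1.0 - self.decay) * error
        derivative = (smoothed - prev) / self.dt
        self.smoothed_error = smoothed
        output = self.kp * error + self.ki * self.integral + self.kd * derivative
        if self.output_limits is not None:
            output = np.clip(output, *self.output_limits)
        return output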
3.3.2. Tactile servoing controller
For object tracking and surface-following, we use a tactile servoing controller (Figure 5, top part only) that performs MIMO feedforward-feedback PID control as described in the previous section (see Equation (19)). The goal of this controller is to align the sensor with a reference contact pose in the surface feature frame, while at the same time moving it with a feedforward velocity (set to zero for object tracking) specified in relation to the desired pose. The reference contact pose is usually set so that the sensor is normal to the surface at a fixed contact depth. For surface-following, the feedforward velocity is usually set to be tangential to the surface. The overall effect is that the sensor moves smoothly to track or follow a surface while maintaining normal contact at a constant depth.
Figure 5: (a) Tactile servoing controller used for all tasks, which is the sole controller for the object tracking and surface-following tasks. (b) For the tactile pushing controller, the tactile servoing controller is supplemented with a target alignment controller.
In each control cycle, we start by computing the
For surface-following, we set the reference sensor frame so that its
3.3.3. Tactile pushing controller
For pushing objects across a surface towards a target, we augment the tactile servoing controller with an additional feedback control element that we refer to as the target alignment controller (Figure 5(b)).
The object pushing target is specified as a target pose
As in our previous work on pushing (Lloyd and Lepora (2021)), we zero the output of the target alignment controller when the sensor is less than a pre-defined distance
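As an illustration of this gating behaviour, the sketch below computes a planar alignment error towards the target and zeroes it inside a cut-off radius; the radius value and the precise error definition are assumptions for illustration only, not the controller used in the paper.

import numpy as np

def target_alignment_error(sensor_xy, heading_rad, target_xy, cutoff_radius=0.04):
    """Angular error between the pushing direction and the bearing to the target."""
    offset = np.asarray(target_xy, dtype=float) - np.asarray(sensor_xy, dtype=float)
    if np.linalg.norm(offset) < cutoff_radius:
        return 0.0                      # inside the cut-off: disable alignment
    bearing = np.arctan2(offset[1], offset[0])
    # Wrap the angular error to [-pi, pi).
    return (bearing - heading_rad + np.pi) % (2.0 * np.pi) - np.pi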
3.3.4. Single-arm and dual-arm control configurations
The tactile servoing and object pushing controllers described above can either be used in isolation to control a single robot arm for object tracking, surface-following or single-arm pushing tasks, or they can be used in combination to control multiple robot arms. In the dual-arm pushing task, one arm is controlled by an active/leader pushing controller, while the second arm is controlled using a passive/follower object tracking controller. This dual-arm configuration allows the active pushing arm to control the movement of the object towards the target while the second passive arm helps to stabilise the object to prevent it from toppling.
Another way of viewing the operation of these multi-arm configurations is that each robot arm is attempting to follow a control signal via the feedforward path, while simultaneously trying to satisfy the constraints imposed by the reference contact pose specified in the feedback path. In this scenario, the feedforward control signals can either be generated separately for each arm in a decentralised approach or they can be generated in a centralised, more coordinated manner. The ‘leader-follower’ configuration we use in our dual-arm pushing task is an example of the decentralised approach.
4. Tactile robot experimental platform
4.1. Dual-arm robot platform
For our experiments and demonstrations, we use a dual robot arm system with two Franka Emika Panda, 7 degree-of-freedom (DoF) robot arms. The robot arms are mounted on custom aluminium trolleys with base plates, which are bolted together so that the arms are separated by 1.0 m at their bases and can be used individually or together for collaborative tasks (Figure 1(a)-(d)). Depending on the task, the robots can either be fitted with a TacTip tactile sensor (Figure 6) or a stimulus adaptor as an end-effector (Figure 7(a)-(b)). The tactile sensor can be mounted in a standard downwards-pointing configuration or at a right angle using an adaptor mount (Figure 6(b)).
Figure 6: TacTip soft biomimetic optical tactile sensor, with (a) sensor mount for a robot arm and (b) right-angled mount. (c) Robot-arm mounted tactile sensor collecting training data.
Figure 7: End-effector mounts used for object tracking: (a) Flat mount with non-slip foam pads (used for Rubik’s cube and mustard bottle), (b) Concave curved mount (used for foam ball). Objects: (c) Rubik’s cube, (d) Mustard bottle, (e) Soft foam ball. Object (d) is from the YCB Object set (Calli et al. (2015)).

4.2. Tactile sensor
The TacTip soft biomimetic optical tactile sensor (Figure 6) has been used in a wide variety of robotic touch applications and integrated into many robot hands (for reviews, see Ward-Cherrier et al. (2018); Lepora (2021)). The 3D-printed sensor tip consists of a black, gel-filled, rubber-like skin with an internal array of pins capped with white markers, which are imaged with a standard USB camera and LED lighting contained within the sensor body. The TacTip is considered biomimetic because these pins mimic the epidermal papillae structure in human skin on the boundary of the epidermal (outer) and dermal (inner) skin layers (Chorley et al. (2009)), as verified in a comparison to real sensory neuronal data on matched stimuli (Pestell et al. (2022)). Practically, the use of marker tips on pins means the sensor is highly sensitive to both normal contact and shear, because the pins act as levers that amplify small contacts into larger patterns of shear.
The TacTip is well-suited for investigating tactile control because its 3D-printed outer surface (Agilus 30, Stratasys) is fairly robust to abrasion and tears, while also being inexpensive and easy to replace. The soft inner gel (Techsil, Shore A hardness 15) gives a conformability similar to the soft parts of the human hand, making the sensor responsive and forgiving of errors in physical contact. Many variations of the TacTip have been created, from fingertip-sized sensors for anthropomorphic robot hands (Ford et al. (2023)) to the DigiTac version of the low-cost DIGIT (Lepora et al. (2022)).
In this work, we use one or two TacTip sensors with 40 mm diameter hemispherical tips containing 331 marker-tipped pins arranged in a circular array. As in other work using the TacTip with deep learning, we use the raw sensor image with minimal pre-processing as input to a neural network model.
4.3. Test objects
Various test objects and mounts (Figures 6-10) were used in the experiments reported in the results sections of this paper. For training/validation/testing tactile data collection, the tactile sensor was mounted vertically on the end-effector of the arm and brought into contact with a flat 3D-printed surface mounted to the base plate (Figure 6(c)). For the object tracking experiments, end-effector mounts were used to attach flat or concave curved objects to the end of the leader arm (Figure 7(a)-(b)), using a similar flat surface to the one used to collect training data and a set of everyday objects (Rubik’s cube, mustard bottle and soft foam ball) held against an adaptor by the tactile sensor mounted on the second robot arm (Figure 7(c)-(e)). For the surface-following experiments, we used a 3D-printed curved ramp and hemisphere (Figure 8) attached to mounts bolted onto the base plate. For the single-arm pushing experiments, we used four distinct plastic regular geometric objects (Figure 9). For the dual-arm pushing experiments, we used double-height (stacked) versions of the four geometric objects and five everyday objects (mustard bottle, cream cleaner bottle, window cleaner spray bottle, glass bottle and large coffee tin) as tall objects that are challenging because they usually topple when pushed (Figure 10).
Figure 8: 3D-printed objects used in surface-following experiments: (a) curved ramp, (b) hemispherical dome.
Figure 9: Regular geometric objects used in single-arm pushing experiments: (a) Large blue square prism (479 g), (b) Blue circular prism (363 g), (c) Small red square prism (264 g), (d) Yellow hexagonal prism (310 g).
Figure 10: Tall (double-height) geometric objects used in dual-arm object pushing experiments: (a) Large blue square prism (566 g), (b) Blue circular prism (436 g), (c) Small red square prism (324 g), (d) Yellow hexagonal prism (373 g). Tall everyday objects used in dual-arm object pushing experiments: (e) Mustard bottle (237 g), (f) Cream cleaner bottle (161 g), (g) Windex window cleaner spray bottle (339 g), (h) Glass bottle (641 g), (i) Large coffee tin (214 g). Objects (e), (f), (g) and (i) are from the YCB Object set (Calli et al. (2015)).


4.4. Software infrastructure
We control the robot arms using a layered software API built on top of the
The OpenCV library (version 4.5.2) is used to capture and process images from the tactile sensor, and TensorFlow (version 2.4) with the included Keras API to develop our neural network models for those tactile images. We also use the
We run all of the software components in a Pyro5 distributed object environment on an Ubuntu 18.04 desktop PC. The Pyro5 environment allows us to run several communicating python processes in parallel to ensure real-time performance. Using this approach, we were able to run the low-level 1 kHz control loops, image capture, neural network inference (but not training) and high-level control loops for both robot arms and tactile sensors on a single PC.
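As an illustration of this pattern, the sketch below exposes a (hypothetical) tactile sensor service as a Pyro5 remote object so that image capture and control loops can run as separate processes; the class and method names are illustrative, not those of the released code.

import Pyro5.api

@Pyro5.api.expose
class TactileSensorServer:
    def capture_image(self):
        # In the real system this would grab and pre-process a camera frame.
        return [[0.0] * 128 for _ in range(128)]

if __name__ == "__main__":
    daemon = Pyro5.api.Daemon()                  # listens on a free local port
    uri = daemon.register(TactileSensorServer())
    print(f"Sensor service available at {uri}")
    daemon.requestLoop()                         # serve requests until shutdown

# A client process would then connect with:
#   sensor = Pyro5.api.Proxy(uri)
#   image = sensor.capture_image()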
5. Experimental results
5.1. Pose-and-shear information in tactile images
In the first experiment, we examine images from the tactile sensor used here (the TacTip) to check that the six considered components of contact pose-and-shear are represented in the tactile data. To do this, we visualise the marker densities of tactile images using a kernel density model (see Silverman (2018)), with Gaussian kernels located at marker centroids and a constant kernel width (15 pixels) equal to the mean distance between adjacent markers.
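A minimal sketch of this marker-density computation is shown below, assuming the marker centroids have already been detected in the cropped image; the normalisation details are illustrative.

import numpy as np

def marker_density_image(centroids, image_shape=(430, 430), bandwidth=15.0):
    """Sum a fixed-width Gaussian kernel at each marker centroid (x, y)."""
    h, w = image_shape
    ys, xs = np.mgrid[0:h, 0:w]
    density = np.zeros(image_shape, dtype=float)
    for cx, cy in centroids:
        density += np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * bandwidth ** 2))
    # Dividing by the kernel mass and the number of markers gives an
    # (approximate) kernel density estimate over the sensor surface;
    # subtracting the density of an undeformed reference image then shows
    # the change induced by contact, as plotted in Figure 11.
    return density / (2.0 * np.pi * bandwidth ** 2 * max(len(centroids), 1))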
From these visualisations, we were confident that the sensor images contained enough information to produce these estimates (Figure 11). The size of the low-density blue region in the centre of the image depends on the contact depth, while its location in the image depends on the sensor orientation. Changes in marker density around the periphery of the sensor depend on the post-contact translational shear, and subtler changes within the contact region depend on the post-contact rotational shear. These are a type of feature that CNNs can easily replicate if required by applying a sequence of convolution and down-sampling operations.
Figure 11: Visualisation of tactile images as corresponding changes in marker density with respect to an undeformed tactile image across the relative surface contact poses and shears annotated below the marker density images.
5.2. Neural network-based pose-and-shear estimation
Table 1: Overall MSE / mean NLL loss and pose component MAEs for 10 CNN regression and 10 GDN models (mean values ± standard deviation across 10 models). The lowest mean MAE values for each component are highlighted in bold.
The results show that the GDN model produces lower component MAEs than the CNN regression model when evaluated on the test data set. An explanation for this is that the mean NLL loss function used to train the GDN model directly has a variable, estimated uncertainty for each pose component, which in effect increases the error weighting on more confident estimates and decreases the error weighting on less confident ones (see Section 3.1.5). This contrasts with the MSE loss function used to train the CNN regression model, which implicitly assumes a constant, pre-specified uncertainty for each pose component and hence is unable to incorporate variations in the estimated uncertainty.
We visualise the distribution of test set errors by plotting the estimated pose components against the ground-truth pose components for the best-performing model of each type (Figure 12). For the GDN model, we also colour each point according to the precision (inverse variance) estimated by the model for the corresponding pose component. Points coloured in red denote high precision (low uncertainty) estimates and points coloured in blue denote low precision (high uncertainty) estimates.
Figure 12: Distribution of test errors for best-performing (lowest loss) CNN regression and GDN models. Predicted pose-and-shear values are plotted against their actual values. GDN estimates are coloured by their predicted precision (reciprocal variance).
With reference to these plots, we make the following observations. Firstly, while the errors are significant across all pose components estimated by both models, they are larger for the shear-related components than the normal contact ones. This could be due to aliasing effects, which are more prevalent during shear motion than normal contact motion; for example, at small contact depths, the tactile sensor is prone to slip under translational shear, which would lead to a similar tactile image for a range of shear values (see Lloyd et al. (2021) for an explanation of the effects of tactile aliasing). Secondly, the GDN estimates appear more accurate than the CNN regression estimates, in that the distribution of predicted values is more concentrated around the true values for the GDN model than the CNN regression model, which is consistent with the statistical results presented in Table 1. Finally, the precision (uncertainty) values estimated by the GDN model appear to correlate with the errors, in the sense that the red points tend to lie closer to the imaginary ground-truth line than the blue points. In the following section, we consider the impact that our
5.3. Error and uncertainty reduction using an SE(3) Bayesian filter
To evaluate the effect of the
Since the pose changes between consecutive sensor contacts can be computed from the test data labels, we can compute the state dynamics transformation
The discriminative Bayesian filter (Algorithm 2) was then applied to the GDN pose estimates generated in response to the sequence of test inputs, using equation (23) to compute the
As in the single-prediction results, we improved statistical robustness by applying the Bayesian filter to each of the 10 GDN models we trained from different random weight initialisations and evaluated the mean and standard deviation component MAEs for all models on the test data set. We repeated the experiment for four different noise levels
Table 2: Pose component MAEs for 10 GDN models followed by Bayesian filter with different state dynamics noise levels,
To show explicitly how the GDN model depends on the state dynamics noise, we visualise the distribution of test sequence errors by plotting the filtered pose-and-shear predictions against the actual components for the best-performing GDN model at the different noise levels (Figure 13). With reference to these plots, we make the following observations. Firstly, the accuracy of the filtered estimates increases as the state dynamics noise level
Figure 13: Distribution of test errors for best-performing (lowest loss) GDN model followed by Bayesian filter with different state dynamics noise levels
In the above analysis, the state dynamics noise level
5.4. Task 1: Object pose tracking
In this experiment, we show how our tactile robotic system can be configured to track the pose of a moving object. We demonstrate this capability using two robot arms: the first arm (the
There are two parts to this experiment. In the first part, we show that the follower arm can track changes to individual pose components of a moving object. More specifically, we track translational motion along the
For both parts of this experiment, we used the controller parameters listed in Table 6 (Appendix D) in the tactile servoing controller (Figure 5(a), top controller only). The feedback reference pose specifies that the tactile sensor should be orientated normal to the contacted surface at a contact depth of 6 mm. Since the feedforward velocity is not required for object tracking tasks, it is set to zero.
5.4.1. Tracking changes to single pose components
In the first part of the experiment, we initially positioned the follower arm tactile sensor in direct contact with the leader arm flat surface at a contact depth of approximately 6 mm and so that its central axis was normal to the flat surface. We then used the leader robot to move the flat surface through a sequence of 200 mm translations along the −
Figure 14: Using the follower arm to track changes to individual components of the leader arm pose. (a) Tracking sequence: translation along −
For plots that relate to translational pose components (Figure 14(b)), we removed variation in the rotational components from the response of the follower arm. Similarly, for plots that relate to rotational pose components (Figures 14(c)-(d)), we removed any variation in the translational components from the response of the follower arm. This allows us to focus on individual pose components when evaluating how the follower arm responds to changes in leader arm pose. Had we instead plotted the raw unaltered poses, it would be extremely difficult to compare individual pose components of the leader and follower arms at any point in time, particularly for the rotational components (
The pose-trajectory plots (Figures 14(b)-(d)) show that the tactile sensor on the follower arm tracks changes to individual pose components of the flat surface on the end-effector of the leader arm. The coordinates are for the tool centre point of each robot, which on the leader arm is in the centre of the flat surface and on the follower arm is in the centre of the sensor tip. For translations along −
5.4.2. Tracking simultaneous changes to all pose components
In the second part of the experiment, we moved the leader robot arm in a more complex velocity trajectory
In addition to tracking a flat surface attached to the leader arm, as in the first part of the experiment above, in this second part we also tracked several everyday objects (Rubik’s cube, mustard bottle and soft foam ball; Figure 7) held between the two arms as they followed the leader arm trajectory. Again, we recorded the end-effector poses and corresponding time stamps for both robot arms at the start of each control cycle to match up the corresponding poses from both robots and plot them after the experiment had finished.
The time-lapse photos and pose-trajectory plots (Figure 15) show that the follower arm tracks simultaneous changes to all components of the leader arm pose as the leader arm follows a complex periodic trajectory. Moreover, it can also hold an object against the leader arm while it is following its trajectory, thereby implementing a form of 3D object manipulation guided by the leader arm.
Figure 15: Using the follower arm to track simultaneous changes to all components of the leader arm pose. (a) Tracking sequences for direct (flat) surface contact, Rubik’s cube, mustard bottle and soft foam ball. Leader and follower robot pose-trajectory for a single period of the leader arm trajectory for: (b) direct surface contact, (c) Rubik’s cube, (d) mustard bottle, and (e) soft foam ball. In (c)-(e), the numbered points indicate corresponding poses of the two robot arms at several points in the pose-trajectory.
The leader pose-trajectory is the same for all objects and forms an approx. 150 mm diameter circle with the
5.5. Task 2: Surface-following
In this experiment, we show how our tactile robotic system can be configured for surface-following tasks for two scenarios: traversing a straight-line projection on the surface of a curved ramp, and traversing a sequence of eight straight-line projections outwards from the centre of a hemispherical dome at 45° intervals. The surfaces used in these two scenarios are shown in Figure 8.
For both parts of this experiment, we use the tactile servoing controller described in Section 3.3.2 with the controller parameters listed in Table 7 of Appendix D. The feedback reference pose specifies that the sensor should be orientated normal to the contacted surface at a contact depth of 3 mm. Since the feedforward velocity depends on the particular surface-following task being performed, it is specified in the following subsections for each task.
5.5.1. Surface-following on a curved ramp
For this first surface-following task, we initially positioned the robot arm so that the tactile sensor made contact with the highest part of the curved ramp at a contact depth of 3 mm with the
For this surface-following task, the time-lapse photos and pose-trajectory plots show that the robot arm successfully follows this type of gently curving surface while the sensor remains in contact with it and orientated normal to the surface (Figure 16). The pose-trajectory is almost straight along the
Figure 16: Tactile servoing to follow a curved ramp surface. (a) Time-lapse photos. (b)-(d) Robot arm end-effector pose-trajectory. The red/green/blue arrows in this and the other figures are plotted roughly once per second, although precise timings vary due to factors such as individual controller details and the real-time processing requirements.
5.5.2. Surface-following on a hemispherical dome
For this second surface-following task, we initially positioned the robot arm so that the tactile sensor made contact with the centre of the dome at a contact depth of 3 mm, with the sensor
When following the
For this surface-following task, the time-lapse photos and pose-trajectory plots show that the robot arm end-effector successfully follows this curved surface while the sensor remains in contact and orientated normal to the surface (Figure 17).
Figure 17: Tactile servoing to follow radial paths from the centre of the surface of a hemispherical dome. (a) Time-lapse photos. (b)-(d) Robot arm end-effector pose trajectories. In (b), the numbered points correspond to different radial paths over the surface.
5.6. Task 3: Single-arm object pushing
In our first object pushing experiment, we demonstrate how our tactile robotic system can be used for single-arm object pushing tasks, similar to those demonstrated in earlier work (Lloyd and Lepora (2021)). A major improvement on that earlier work is that the present system can push an object in a smooth continuous manner rather than the previous discrete point motion, because we now use velocity control rather than the position control used previously. We also now show that our new system can push objects over surfaces with different frictional properties, considering both medium-density fibreboard (MDF) and a soft foam surface.
For the single-arm pushing configuration, we mounted the tactile sensor as an end-effector of the robot arm using a right-angle adaptor (Figure 1(c)) so that it can be moved parallel to the surface during the pushing sequence without the arm getting caught on the surface. At the start of each trial, we positioned the tactile sensor end-effector 45 mm above the surface with central axis parallel to the
For this experiment, we pushed several regular geometric objects (Figure 9) across the MDF and foam surfaces. These objects were also used in previous work (Lloyd and Lepora (2021)), except that we omit the triangular prism because it is unsuitable for the dual-arm object pushing considered below, which requires both arms to contact a flat face of the object.
For each trial of the experiment, the robot arm was used to push the object towards the target at position (
Table 3: Single-arm pushing final target error (mean ± standard deviation perpendicular distance from target to sensor-object contact normal on completion of push sequence). All statistics are computed over 5 independent trials.
The push sequences are visualised by plotting the end-effector poses in 2D overlaid with approximate poses of the pushed objects at the start and finish points of the trajectory (Figure 18). Our tactile robotic system can push all these regular geometric objects over foam and MDF surfaces to the target, approaching within 10 mm for the blue circular prism and within 5 mm for the other objects (Table 3).
Figure 18: Using a single robot arm to push regular geometric objects across: (a) a foam surface (time-lapse photos show sequence for blue square prism), and (b) an MDF surface (time-lapse photos show sequence for blue circular prism). In the 2D pose plots, the target is identified by a small red circle and dot.
5.7. Task 4: Dual-arm object pushing
In the second pushing experiment, we use a follower robot arm to constrain and stabilise objects as they are pushed across a flat surface by the leader arm. In many ways, this configuration is similar to that used in the object tracking experiment (Section 5.4.2), where a leader robot arm moved an object in a complex trajectory while a follower arm tracked its motion and held the object against the first arm.
The experiment is split into two parts. In the first part, we use two robot arms to push the objects used in the previous single-arm experiment across foam and MDF surfaces. In the second part of the experiment, we replace the original set of geometric objects with a set of taller, double-height versions together with several taller everyday objects (e.g. bottles and containers). These taller objects cannot be pushed by a single robot arm without toppling over, so the second stabilising follower arm is essential for the task.
For this dual-arm configuration, we mounted tactile sensors on both robot arms using right-angle adaptors (see Figure 1(d)). At the start of each trial, the leader arm and object were positioned as at the start of each single-arm pushing trial. Then we positioned the follower arm so that its tactile sensor was approximately opposite the leader arm tactile sensor and normal to the opposite contacted surface. During each trial, we used the leader robot arm to push the object towards the same target as before, at position (
To control the leader robot arm, we used the same pushing controller and parameters as for the single-arm configuration. To control the stabilising follower arm, we used the tactile servoing controller described in Section 3.3.2 with the parameters listed in Table 9 of Appendix D.
During each trial for both parts of the experiment, we recorded the end-effector poses and corresponding time stamps at each control cycle to match up the corresponding poses at different trajectory points for plotting.
5.7.1. Pushing regular geometric objects
In the first part of the dual-arm experiment, we used two robot arms to push the same geometric objects we pushed in the single-arm experiment (Figure 19). We repeated the experiment five times for each object and then computed the mean ± standard deviation target error across all five trials (Table 4).
Figure 19: Using a leader and follower robot arm to push regular geometric objects across: (a) a foam surface (time-lapse photos show sequence for red square prism), and (b) an MDF surface (time-lapse photos show sequence for yellow hexagonal prism). In the 2D pose plots, the target is identified by a small red circle and dot.
Table 4: Dual-arm pushing target error for short geometric objects (mean ± standard deviation perpendicular distance from target to sensor-object contact normal on completion of push sequence). All statistics are over 5 independent trials.
We visualised examples of the push sequences by plotting the end-effector poses of both robot arms in 2D and overlaid the approximate poses of the pushed objects at the start and finish points of the trajectory (Figure 19).
The results in Table 4 and Figure 19 show that our dual-arm system can push the regular geometric objects over foam and MDF surfaces, approaching the target to within less than 5 mm for all objects. In contrast to the results for the single-arm configuration, the accuracy achieved for the blue circular prism did not appear much worse than for the other objects. In fact, for the MDF surface, the accuracy obtained for the circular prism was slightly better than the other objects.
5.7.2. Pushing tall objects that are prone to toppling
In the second part of the dual-arm experiment, we used the two robot arms to push a set of taller (double-height) geometric objects and tall everyday objects (Figure 10) across a surface.
For this part of the experiment, we found that we needed to modify the (feedback) reference contact pose used in the pushing controller of the leader robot to (0.5,0,0,0,0,0) and the reference contact pose used in the servoing controller of the follower robot to (−0.5, 0, 3, 0, 0, 0). These modified poses only differ from their defaults (Tables 8 and 9) by 0.5 mm in the first components. The effect is to apply a slight downward force on the pushed side of the object and slight upward force on the stabilised side. This helps prevent these taller objects from catching their leading edges on the surface as they are being pushed. Even so, we were not able to push these taller objects across the foam surface without their leading edges catching, and so we could only perform this part of the experiment on the harder MDF surface.
Once again, we visualised examples of the push sequences by plotting the end-effector poses of both robot arms in 2D and overlaid the approximate poses of the pushed objects at the start and finish points of the trajectory (Figure 20).
Figure 20: Using a leader and follower robot arm to push tall objects across an MDF surface: (a) tall geometric objects (time-lapse photos show sequence for tall blue square prism), and (b) tall everyday objects (time-lapse photos show sequence for mustard bottle). In the 2D pose plots, the target is identified by a small red circle and dot.
Table 5: Dual-arm pushing target error for tall objects (mean ± standard deviation perpendicular distance from target to sensor-object contact normal on completion of push sequence). All statistics are over 5 independent trials.
6. Discussion and limitations
In this paper, we proposed and evaluated a tactile robotic system that uses contact pose and post-contact shear estimation to facilitate object tracking, surface-following, and single- and dual-arm object pushing using various configurations of velocity-based control. Our tactile robotic system has two key aspects that enable its generality and ease of control: (a) it estimates both contact pose and post-contact shear, and (b) it enables smooth continuous control of the robot arm by using these estimates to control velocity directly. These aspects enable the robot arm to track objects in six degrees of freedom; this control is either the primary goal, such as in object tracking, or as a secondary constraint on a primary goal, such as when pushing an object along a trajectory while maintaining a contact pose. To achieve these goals, we employed
6.1. Contact pose and post-contact shear estimation
A key simplifying assumption was to merge the contact pose and the post-contact shear into a unified surface contact pose-and-shear vector (
This combined surface contact pose-and-shear vector can be estimated using multi-output regression CNNs directly from tactile sensor images, using methods similar to previous work on tactile pose estimation of surfaces and edges (Lepora and Lloyd (2020)). However, for the shear components, the estimates remain relatively inaccurate even after hyperparameter tuning, particularly for the rotational
Consequently, we developed a Gaussian-density network (GDN) model that combines the CNN base (feature encoding) architecture with output layers that predict both an estimate of the mean and its uncertainty for each pose-and-shear component (Section 3.1.5). These estimates are slightly more accurate with the GDN model than with the regression CNN model (Table 1), but more importantly the predicted uncertainties become lower as the predicted means fall closer to the ground truths (Figure 12). Hence, the GDN model predictions of the means and uncertainties are suitable for Bayesian filtering to reduce error and uncertainty in a sequence of pose-and-shear estimates.
Therefore, we proposed a novel
Both the GDN model and the
The most obvious limitation of our pose-and-shear estimation methods is the inaccurate CNN regression and GDN model performance on the shear components (Figure 12). We believe that much of this estimation error is due to tactile aliasing (Lloyd et al. (2021)), whereby similar tactile images in the training set become associated with very different shear labels. Specifically, when the sensor is sheared sufficiently after contacting a surface, it can slip across the surface to result in similar tactile images for a range of post-contact shear labels. If we could prevent this slippage, the complexity of the system might be reduced as the GDN and Bayesian filter could become redundant. However, to do this, the training data would need restricting to samples where slip does not occur, which may be difficult to arrange in practice and could overly restrict the model’s applicability, for example, just to large contact depths. That said, the TacTip is known to be effective at detecting slip (James and Lepora (2020); James et al. (2018)), which potentially could be used to minimise slip in the data collection or provide a label of slip occurrence. Our expectation is that the single-step errors may be reduced but Bayesian filtering will still be needed to reduce the error for accurate control.
Note that including further variation in the trajectories during training could also improve the system performance by giving better model generalisation during the task. At present, our training data collection (Algorithm 1) first moves the sensor normal to the surface to make contact then parallel with the surface to produce shear. Introducing motions that are more like those during the task could give better model predictions, such as trajectories that vary in shape and have both normal and parallel components of motion while in contact. However, the view taken in this paper is that the issue of aliasing from slippage is of greater initial concern and thus the primary issue to focus upon.
Another limitation is that in this study we concentrated on estimating contact poses and shears with flat or gently curving surfaces. Clearly, there is a much wider range of surface features that could be tracked, such as following around edges (Lepora et al. (2019); Lepora and Lloyd (2021)). In principle, one can train pose estimation models on other object features, for example, to predict a 5-component contact pose with a straight edge. However, it would then not be possible to combine a contact pose with a 3-component post-contact shear to form a single 6D pose. Even so, we could still use a GDN model to predict both the contact pose and the post-contact shear motion simultaneously. In this scenario, we would need to combine the outputs of two feedforward-feedback pose controllers (e.g. one for the contact pose and one for post-contact shear), and there would be subtleties to be addressed in how this would be best realised to transform appropriately under SE(3).
6.2. Experimental servo control task performance
In developing our new tactile robotic system, a primary objective was to achieve smooth and continuous motion of the robot arm using velocity control driven by tactile pose-and-shear estimation. We accomplished this aim by updating the velocity of the end-effector during each control cycle, instead of updating its pose based on tactile pose estimation as considered previously (Lepora and Lloyd (2021); Lloyd and Lepora (2021)). The underlying tactile servoing uses a MIMO PID feedback controller on the
The pose-and-shear-based tactile servo controller was applied successfully to four distinct tasks: (1)
Three of these four tasks were made possible by the tactile robotic system’s ability to estimate post-contact shear motion in addition to the contact pose. For object pose tracking, the shear motion is essential to track tangential and rotational motion with respect to the contacted surface while maintaining contact, as is visible in the experiments (Figures 14, 15). For single-arm object pushing, controlling shear is essential both to maintain contact with the object and to steer the object via a tangential motion (Figure 18). Likewise, for dual-arm object pushing, estimating post-contact shear is essential for the follower arm to remain in contact (Figure 19) and hold the tall objects to prevent them being toppled (Figure 20).
For the other task of surface-following, we did not need to control shear to complete the task, so the corresponding gains were set to zero; nevertheless, the pose-and-shear components are mixed in the Bayesian filter over SE(3).
One limitation of the tactile robotic system is that we could only successfully push tall objects with two arms on the smooth (MDF) surface, as they kept catching on the foam surface. This is partly due to the nature of the task, as humans can struggle with this too, before they adopt a strategy of partially lifting the object. In principle, the tactile controllers could do this too, but this would be a new task of guiding a lifted object, which is beyond the scope of this investigation.
Another limitation of the present system is the absence of a planning component, which hinders its ability to anticipate and prevent undesired situations, such as collisions between robot arms, or motion trajectories that approach or reach joint limits or singularities. Implementing this planning capability would also be beneficial in scenarios where the system is unable to achieve a global task objective by following a local control objective, such as non-holonomic object pushing and manipulation tasks where an object must be rotated to a target orientation while also being moved to a target position.
For future developments of this type of tactile robotic system, we believe there are many more manipulation tasks that can be achieved by enabling both robot arms to operate in an active configuration, where they are both functioning as leaders and followers to some extent. This would broaden their capacity to collaborate, particularly on tasks relating to more complex types of tactile-enabled object manipulation. Such tasks could span from guided manipulation of an object to a goal pose or insertion/assembly of one object into/onto another, to more general tasks involving multiple tactile sensors to enable fully-dexterous manipulation with pairs of tactile grippers or multi-fingered tactile robot hands.
Supplemental Material
Footnotes
Declaration of conflicting interests
Funding
References
