Abstract
Introduction
Hand-eye systems are widely used in robotics applications and come in two types: the eye-in-hand system (EIHS), in which cameras are installed on and move with the hands, and the eye-to-hand system (ETHS), in which cameras do not move with the hands (Flandin 2000).
EIHS is very popular in industrial robotics. As a manipulator approaches a target, the distance between the camera and the target shrinks, and the measurement error of the camera decreases. Visual control methods in an EIHS are divided into three types: image-based, position-based and hybrid (a combination of both). The image-based visual control method can effectively eliminate camera calibration error because a closed loop is established in the image space. On the other hand, the absolute measurement error in position-based visual control is dramatically reduced when the manipulator is close to the target, and the same holds for the hybrid visual control method (Hager 1996, Chaumette 2000, Corke 2000, Zhu 2000, Wells 2001). However, EIHS has a vital drawback: the object cannot be guaranteed to stay in the cameras' field of view at all times, especially during pose adjustment of the hand at long range (Hager 1996).
In contrast, ETHS can be effectively used in humanoid robots and mobile manipulators that operate in a large workspace. When the robot is far from a target, it travels toward it and stops at close range. Then, according to visual measurements, the manipulator approaches the target and manipulates it. To ensure that the end-effector reaches the target accurately, some researchers have designed special markers installed on the end-effector and the target (Han 2002, Cardenas 2003), and the approaching task is realized through closed-loop control of the end-effector. However, because the target or markers may be partially occluded during approach or manipulation, image-based or hybrid visual control methods may not be able to bring the manipulator to the target accurately.
As is well known, the position of an object in 3D space can be calculated from two image points captured by stereo cameras, by intersecting the corresponding projection view lines. The lack of constraints, errors in calibration and errors in the image coordinates of matching points result in large errors during object positioning and pose estimation. By using the shape constraints of an object together with its multiple imaging points, positioning accuracy, and especially pose estimation accuracy, can be increased, and the influence of the last factor can be partly eliminated (Bartoli 2001). By combining ETHS and EIHS, a humanoid robot can use its hands to reach and manipulate an object accurately.
In this paper, the advantages of both eye-to-hand and eye-in-hand systems are fully exploited in the development of a new positioning method. The blocking problem for the eye-to-hand system is effectively avoided since cameras on the head are active. The problem of losing targets in the field of view for an eye-in-hand system is resolved, and end-effectors only adjust their position in a small range. The rest of this paper is organized as follows. Section 2 introduces our humanoid robot and the four-stage process for finding and manipulating a valve. The camera models are described in Section 3. Section 4 proposes a new visual positioning method based on rectangle constraints, which accurately provides the position and pose of the valve. System calibration is conducted in Section 5 to verify the accuracy of the proposed positioning method. Section 6 presents the application experiment that is designed for the humanoid robot to approach and operate the valve autonomously, and results show the effectiveness of the proposed method. Finally, Section 7 provides a brief conclusion.
The robot and its control strategy
As shown in Fig. 1, our humanoid robot consists of a head, a body with two arms and a wheeled mobile base. The robot body has three degrees of freedom (DOFs), i.e. twist, pitch and yaw. The two arms/manipulators have six DOFs each and are fixed, one on each side of the body. Each has an end-effector as its hand, and its wrist is equipped with a mini camera and force sensors. Note that from now on we treat the end-effector, gripper and hand as the same in this paper without further explanation.

The humanoid robot
The robot head has two cameras as eyes and a PC104 computer to process images used to position the valve. Once the robot finds the valve, it moves towards it and operates it using its hands, as shown in Fig. 2. Operations include turning on or turning off the valve. These operations can be remotely controlled by an operator using audio commands sent via radio.

Valve with a rectangle mark
The process of finding and operating the valve consists of four stages as follows:
Stage 1 – The robot first uses its stereo vision to estimate the rough position of the valve relative to its own position in order to approach the valve. At this stage, the centre of the image area of the red colour marker is selected as the feature point, and the pose of the valve is not important. When the distance between the valve and the robot falls below two meters, the first stage ends and the second stage begins, in which a new strategy, based on the shape constraint of the marker, measures the position and pose of the valve in the robot frame.
Stage 2 – According to the position and pose of the valve in the robot frame, the robot moves to within reach of its arm. The position and pose of the valve calculated at the end of the 2nd stage are used for the movement control of the arm in the 3rd stage. The goal pose of the end-effector of the robot arm is calculated and kept for the later stages.
Stage 3 – The position that the hand should reach at this stage is calculated according to the position of the valve (by considering the positions of the mark and handles). Based on kinematics and inverse kinematics, the hand is controlled to move to the handle while the camera in the hand measures the image size of the green colour handle marker. The hand stops when the marker size is large enough or a given position is reached.
Stage 4 – An image-based visual servoing method is adopted to guide the end-effector to reach and catch the handle. Finally, hybrid force/position control is employed to rotate the valve using two hands.
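The stage-switching logic above can be sketched as a small state machine. Only the 2 m switching distance comes from the text; the enum, function and parameter names, and the marker-area threshold are illustrative assumptions.

```python
from enum import Enum, auto

class Stage(Enum):
    BASE_APPROACH = auto()   # stage 1: rough base approach using stereo vision
    BASE_POSITION = auto()   # stage 2: pose-aware base positioning
    HAND_APPROACH = auto()   # stage 3: model-based hand approach
    HAND_SERVO = auto()      # stage 4: image-based visual servoing and grasp

def next_stage(stage, dist_to_valve_m=None, arm_reachable=False,
               marker_area_px=0, marker_area_thresh_px=400):
    """Advance through the four stages using the triggers described above."""
    if stage is Stage.BASE_APPROACH and dist_to_valve_m is not None \
            and dist_to_valve_m < 2.0:          # 2 m switch distance (from text)
        return Stage.BASE_POSITION
    if stage is Stage.BASE_POSITION and arm_reachable:
        return Stage.HAND_APPROACH
    if stage is Stage.HAND_APPROACH and marker_area_px >= marker_area_thresh_px:
        return Stage.HAND_SERVO
    return stage                                # no trigger fired: stay put
```

Each stage thus has a single, observable exit condition, which keeps the base and arm controllers decoupled.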
With regard to control, several methods are employed in the process described above. The 1st and 2nd stages employ position-based visual servoing, the 3rd stage employs model-based control, and the 4th stage involves image-based visual servoing. The position-based visual servoing in the 1st and 2nd stages and the model-based control in the 3rd stage are traditional and are omitted here because of length limitations. The pose of the valve, obtained at the end of the 2nd stage, is an important parameter because it ensures that the end-effector can catch the handle with the correct orientation. The visual positioning method used in the 2nd stage is described in the next section.
To enlarge the field of view, 8 mm focal-length lenses are selected for the cameras in the robot head. However, this kind of lens suffers from distortion, which needs to be corrected. In this research, distortion correction is carried out simply by mapping the non-linear image to a linear one; in other words, the curved image of a straight line is corrected back to a straight line. To simplify the process, the non-linear model shown in (1) is used to describe the radial distortion.
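Since model (1) is not reproduced here, the sketch below assumes the common one-coefficient radial model, distorted radius r_d = r_u(1 + k1·r_u²), and inverts it by fixed-point iteration; the coefficient k1 and function name are illustrative.

```python
def undistort_point(xd, yd, k1, cx, cy, iterations=10):
    """Invert a one-parameter radial distortion model by fixed-point iteration.

    (xd, yd): distorted pixel, (cx, cy): principal point, k1: radial
    coefficient (assumed model, not the paper's exact equation (1)).
    """
    x, y = xd - cx, yd - cy           # shift to the principal point
    xu, yu = x, y                     # initial guess: no distortion
    for _ in range(iterations):       # xu <- x / (1 + k1 * r_u^2)
        r2 = xu * xu + yu * yu
        xu, yu = x / (1.0 + k1 * r2), y / (1.0 + k1 * r2)
    return xu + cx, yu + cy
```

For the mild distortion of an 8 mm lens this iteration converges in a handful of steps, after which straight scene lines image as straight lines again.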
The intrinsic and extrinsic parameter models of the cameras are shown in (2) and (3).
A red rectangular colour marker is attached to the valve, as shown in Fig. 2. Measuring the position and pose of the valve is thus equivalent to measuring those of the red marker. A target frame is established at the rectangle centre, which takes the rectangle plane as the XOY plane. The line between the two handle markers acts as the X axis, as shown in Fig. 3. The rectangle size is 2

The objective frame of a rectangle
According to the orthogonal constraints of
Let
Since
All points on the line parallel to the X axis have the same coordinate
Any two points on the same line parallel to the X axis should satisfy (8). Therefore, we can obtain two equations for one camera from the two lines parallel to the X axis, i.e. four such equations for the two cameras. If the camera's optical axis is not perpendicular to the target plane, then
If the camera's optical axis is perpendicular to the target plane, we have
According to (3) and the orthogonal restriction of the rotation matrix
In a line parallel to the Y axis,
Considering that
If (12) is divided by
As for points on the line parallel to the axis
To improve accuracy, points on two lines parallel to the Y axis are used to calculate the results of
Rough Positioning
Two points are taken from two lines, one on line
Similarly, two equations are formed from line
In the camera frame,
Fig. 4 shows the relation between the space point and its imaging point. According to the camera's pinhole model, the target point

Space position and imaging
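The pinhole relation between a space point and its imaging point can be sketched as a projection/back-projection pair; the focal lengths and principal point below are illustrative parameters, not the calibrated values of Table 1.

```python
import numpy as np

def project(P_cam, fx, fy, cx, cy):
    """Pinhole projection of a camera-frame point to pixel coordinates."""
    X, Y, Z = P_cam
    return fx * X / Z + cx, fy * Y / Z + cy

def back_project(u, v, Z, fx, fy, cx, cy):
    """Recover the camera-frame point on the view line through pixel (u, v)
    at a known depth Z; without Z the point is only determined up to scale."""
    return np.array([(u - cx) * Z / fx, (v - cy) * Z / fy, Z])
```

The missing depth Z in back-projection is exactly what the rectangle constraint (or a second camera) must supply.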
Applying (20) and (21) to (19),
Similarly, the coordinate for
In the target plane, the Y-axis offsets of both the top and bottom edges of the rectangle are integrated along the X axis, yielding the area S of the target rectangle.
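This integration can be sketched numerically with the trapezoidal rule; the sampling of the edge curves along X is an assumption of the sketch.

```python
import numpy as np

def rectangle_area(x, y_top, y_bottom):
    """Trapezoidal integration of the gap between the top and bottom edge
    curves along the X axis, giving the target-rectangle area S."""
    h = np.asarray(y_top, dtype=float) - np.asarray(y_bottom, dtype=float)
    dx = np.diff(np.asarray(x, dtype=float))
    return float(np.sum(0.5 * (h[1:] + h[:-1]) * dx))
```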
Camera Calibration
The two cameras on the robot head were well calibrated using the method described in (Zhang 2000, Heikkela 2000). Their intrinsic parameters are shown in Table 1. The extrinsic parameters of the left camera relative to the end of the industrial robot are given in (26).
Camera parameters
An experiment was designed and conducted to verify the proposed method with a rectangular colour marker attached to a panel. A red rectangle with a dimension of 98 mm × 100 mm was viewed as the valve, and two green parts were used to simulate the valve handles. The robot head was installed on the end of an industrial robot, as shown in Fig. 5(a). The target was laid on the ground under the head. Images captured by the two MINTRON 8055MK cameras in the head are shown in Fig. 5(b).

The experimental scene and target image
In the experiment, the target was fixed on the ground under the robot head. The position and pose of the robot head were changed so that the cameras could capture the fixed target. The position and pose of the target relative to the left camera at the i-th sampling is denoted as
Verification Experiment Results
To compare the proposed method with a traditional stereovision method, another experiment was conducted. The four points where the rectangle intersects the x-axis and y-axis of the object frame were selected as feature points for stereovision. Their positions in Cartesian space were computed and used to calculate the origin position and the X- and Y-axis direction vectors of the object frame. Thus the position and pose of the target relative to the left camera were obtained.
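The baseline stereovision pipeline can be sketched as depth-from-disparity triangulation of the four feature points followed by an object-frame fit; the rectified-pair assumption and the parameter values are illustrative, not the actual camera setup.

```python
import numpy as np

def triangulate_rectified(uL, vL, uR, f, b, cx, cy):
    """Triangulate a point from a rectified stereo pair:
    disparity d = uL - uR, depth Z = f*b/d (baseline b, focal length f)."""
    d = uL - uR
    Z = f * b / d
    return np.array([(uL - cx) * Z / f, (vL - cy) * Z / f, Z])

def object_frame(p_x_pos, p_x_neg, p_y_pos, p_y_neg):
    """Origin and axis directions of the object frame from the four
    rectangle-axis intersection points used in the comparison experiment."""
    origin = (p_x_pos + p_x_neg + p_y_pos + p_y_neg) / 4.0
    x_axis = p_x_pos - p_x_neg
    x_axis /= np.linalg.norm(x_axis)
    y_axis = p_y_pos - p_y_neg
    y_axis /= np.linalg.norm(y_axis)
    return origin, x_axis, y_axis
```

Because each feature point is triangulated independently, pixel noise in any one matching point perturbs the fitted axes directly, which is the instability the rectangle-constraint method avoids.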
Measurements were taken three times under the same conditions. Table 3 shows the measured position and pose of the object. The first column shows the results computed with the traditional stereovision method, while the second column shows the results of the proposed method. Position values are in mm. The stereovision results varied between the three trials, whereas the results of our method remained constant.
Measuring results for the position and pose of an object using stereovision and rectangle constraint
Table 4 shows the positioning results for four feature points P1 to P4 in terms of the stereovision method and our proposed method. The results with the proposed method are formed using the coordinates of the feature points in the object frame, and the position and pose of the object frame. It can be found that the positioning results with our method are very stable.
Positioning results for the feature points using stereovision and rectangle constraints
It should be noted that the method proposed in this paper computes the position and pose of the target from imaging points on the edge lines of the rectangle, which are detected through a Hough transformation. Even if some imaging points contain errors, the edge lines recovered by the Hough transformation remain accurate, which eliminates the influence of random errors. Furthermore, the method does not need feature point matching. Its measuring results are stable and insensitive to random noise; in other words, the proposed method is more robust to noise than the traditional stereovision method.
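The noise-averaging behaviour of the Hough transformation can be illustrated with a minimal accumulator: every edge point votes for all lines through it, and outlier points cannot outvote the many collinear points on a true edge. This toy implementation (names and bin sizes are illustrative) stands in for the actual edge-detection step.

```python
import numpy as np

def hough_dominant_line(points, rho_res=1.0, n_theta=180):
    """Minimal Hough transform over (rho, theta): each point votes for every
    line rho = x*cos(theta) + y*sin(theta) through it; the accumulator
    maximum gives the dominant line despite outlier points."""
    pts = np.asarray(points, dtype=float)
    thetas = np.linspace(0.0, np.pi, n_theta, endpoint=False)
    rhos = pts[:, 0:1] * np.cos(thetas) + pts[:, 1:2] * np.sin(thetas)
    rho_max = np.abs(rhos).max() + rho_res
    bins = np.round((rhos + rho_max) / rho_res).astype(int)
    acc = np.zeros((bins.max() + 1, n_theta), dtype=int)
    cols = np.broadcast_to(np.arange(n_theta), bins.shape)
    np.add.at(acc, (bins, cols), 1)              # accumulate votes
    r_i, t_i = np.unravel_index(acc.argmax(), acc.shape)
    return r_i * rho_res - rho_max, thetas[t_i]
```

A handful of grossly wrong edge pixels only add isolated votes elsewhere in the accumulator, so the recovered (rho, theta) of the rectangle edge is unchanged.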
Errors in the measurements taken with the proposed method mainly stem from system errors such as camera calibration errors. Errors in poses should be smaller than those in positions, i.e. the pose measurements have higher accuracy.
Based on the proposed method, experiments were designed and conducted for our humanoid robot to approach and operate a valve with a rectangular coloured marker attached to its panel, as shown in Fig. 2. The red rectangle was 100 mm in height and 100 mm in width. The pose of the valve handles was marked in green, with the direction consistent with the X axis in Fig. 3. The head with its two MINTRON 8055MK cameras is shown in Fig. 1, and two mini cameras were fixed on the wrists of the two manipulators. The cameras on the head were well calibrated, but the ones on the wrists were not.
Approaching the Valve by the mobile base
At the beginning, the robot searched for the target valve in the laboratory. When the valve was found, the 1st stage described in Section 2 started. When the valve came within two meters of the robot, the 2nd stage began and the method described above was applied. The position and pose of the mobile base were adjusted according to those of the valve until the robot was in an adequate operational area. When the robot stopped moving, the position and pose of the valve relative to the head were measured again using the proposed method, and the position and pose of the valve relative to the chest of the humanoid robot were obtained through coordinate transformation. Table 5 shows the position and pose of the target relative to the reference frame at the chest. The pose and position of the target relative to the two end-effectors could likewise be calculated through coordinate transformation.
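The coordinate transformations above amount to composing 4×4 homogeneous transforms along the kinematic chain (camera frame → chest frame → end-effector frame). The numeric poses below are purely illustrative, not the calibrated values of Table 5.

```python
import numpy as np

def se3(R, t):
    """Build a 4x4 homogeneous transform from rotation R (3x3) and translation t."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

# Illustrative poses: head camera in the chest frame, valve in the camera frame.
T_chest_head = se3(np.eye(3), [0.0, 0.0, 0.4])
T_head_valve = se3(np.eye(3), [0.1, -0.05, 1.2])

# Chaining gives the valve pose relative to the chest reference frame.
T_chest_valve = T_chest_head @ T_head_valve
```

The same composition, with the chest-to-wrist transform from the arm kinematics, yields the valve pose relative to each end-effector.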
Position and pose of the Valve
During the 1st and 2nd stages, the arms were not in operation and were kept in a static position and pose. While the humanoid robot approached the target, the two arms were positioned so that they did not block the head's view of the target valve.
Once the robot was in an adequate operational area, the hands of both arms moved, one to each handle of the valve, while the cameras on the head were inactive. The goal position and pose of the two end-effectors were determined according to the pose and position of the valve given above. The goal positions of the hands had an offset added, especially along the cameras' view direction, in order to avoid collisions between the end-effectors and the valve in the presence of errors. The moving paths were planned with position satisfied at a high priority, except along the cameras' view direction, so as to avoid collisions.
The movements were controlled using the kinematic and inverse kinematic models of the manipulators, so the end-effectors could move to the given goal quickly. At the same time, the camera at each hand measured the image area of the valve handle (the green colour marker on each side of the valve). The size of the green marker increased as the hand moved closer to the handle. When the size was large enough or the given position was reached, position adjustment ended and the process moved to the 4th stage.
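The stopping rule of this stage can be sketched as a loop with two exit conditions. The callbacks and the area threshold are illustrative stand-ins for the robot's motion and vision interfaces, which are not specified in the text.

```python
def approach_until_marker(step_fn, marker_area_fn, goal_reached_fn,
                          area_thresh_px=500, max_steps=200):
    """Drive the hand toward the handle until the green marker is large
    enough in the wrist image or the planned goal position is reached.

    step_fn:         executes one incremental Cartesian step of the hand
    marker_area_fn:  returns the current green-marker image area in pixels
    goal_reached_fn: True once the planned goal position is reached
    """
    for _ in range(max_steps):
        if marker_area_fn() >= area_thresh_px or goal_reached_fn():
            return True               # close enough: hand over to stage 4
        step_fn()
    return False                      # safety cut-off: neither condition met
```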
Fig. 6 provides a pair of images of one hand, captured at the end of the 3rd stage by the left and right cameras of the robot head (Fig. 6(a) and Fig. 6(b), respectively). It can be seen that the end-effector is near the handle with an appropriate pose, which means that the pose calculated by the proposed method has good accuracy.

Images captured by the cameras on the robot head
An image-based visual servoing method was applied in the 4th stage to guide the end-effectors to reach and catch each handle. As pointed out in (Hager 1996), image-based visual servoing methods for an eye-in-hand system have the drawback that the target object may leave the camera's field of view during pose adjustment of the end-effector, which results in servoing failure. If only positions are adjusted while the pose is held stationary, this drawback is overcome. However, to ensure that the pose remains stationary during servoing and that the end-effector can catch the handle with an appropriate pose, the pose of the end-effector must be set accurately at the beginning of visual servoing. This is why the pose of the valve needs to be measured accurately in the 3rd stage of the proposed positioning method and kept unchanged in the 4th stage.
The goal of image-based visual servoing is for the image of the green marker representing the handle to match a given reference image as closely as possible. Position adjustments of the end-effectors were given a high priority, except along the cameras' view direction, to avoid collision with the valve handles. The end-effector was open during the visual servoing process; its position was adjusted over a small range until the gripper reached the handle with an appropriate pose, guided by the camera in the hand. Finally, the gripper closed to grasp the handle. A hybrid force/position control method was then employed to rotate the valve with the robot's two hands; it is omitted here.
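A translation-only servoing step of this kind can be sketched with the classic point-feature interaction matrix, restricted to its three translational columns since the pose is held fixed. The depth Z, focal length f, and gain are assumed values; feature coordinates are taken as offsets from the principal point.

```python
import numpy as np

def ibvs_translation_step(s, s_star, Z, f, gain=0.5):
    """One image-based visual servoing step restricted to translation.

    s, s_star : current and reference image points, shape (N, 2), given as
                pixel offsets from the principal point; Z is an assumed depth.
    Returns the commanded camera translational velocity (vx, vy, vz).
    """
    n = len(s)
    L = np.zeros((2 * n, 3))                    # translational interaction matrix
    for i, (x, y) in enumerate(s):
        L[2 * i]     = [-f / Z, 0.0, x / Z]     # du/dt = L_row . v
        L[2 * i + 1] = [0.0, -f / Z, y / Z]
    e = (s - s_star).ravel()                    # image-space error
    return -gain * np.linalg.pinv(L) @ e        # exponential error decrease
```

Because rotation is excluded from the control vector, the pose set in the 3rd stage is preserved by construction, and the feature stays in view during the small positional corrections.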
In a series of experiments, the humanoid robot was able to autonomously find, reach and operate the valve successfully. These experiments show that the position and pose of the valve calculated using the proposed methods are accurate enough to guide two arms in order for them to operate the valve. The advantages of using both eye-to-hand and eye-in-hand systems are clearly demonstrated.
Conclusions
A new visual servoing strategy for a humanoid robot to approach and grasp a valve has been proposed. It consists of four stages: rough base approach, fine base approach, rough hand approach, and fine hand approach and grasping. As an important part of the autonomous valve manipulation process, a visual positioning and control method was proposed for a hand-eye system using rectangular shape constraints. It employs multiple imaging points lying on lines with known parameters in the objective frame. Positioning accuracy and robustness, especially for the pose, were increased, and the influence of position errors in the images was largely eliminated.
Based on the position and pose of the valve calculated with the proposed method, the end-effectors could smoothly reach the valve handles under the guidance of the hand-eye system. The end-effectors of our humanoid robot caught the handles successfully and rotated the valve, verifying the effectiveness of the proposed methods. The reliability and robustness of the system were significantly improved. The methods employed can be widely applied in real-world applications of humanoid robots and mobile manipulators.
