Abstract
1. Introduction
China has been the world's largest apple producer since 1993, according to China's National Engineering Research Centre for Apple. About 39.5 million tons of apples were produced in 2012, accounting for about one half of total world production. In China, fresh market apples are harvested carefully by human labour. As the harvesting season is short, the harvesting work is concentrated in a short time period and labour shortage tends to limit the planting area. In order to save on labour, the challenge of developing a cost-effective robotic system for apple picking has been taken up by researchers [1–2]. The principal problem associated with an apple-picking robot is the recognition and orientation of the apple [3]. Image segmentation is the key problem for object recognition, and camera calibration is one of the critical problems for object orientation [4]. Since the 1980s, image segmentation methods based on machine vision have been studied to enable accurate recognition of apples.
The segmentation method based on graph theory, which fits naturally with clustering theory, is an efficient global optimization algorithm whose segmentation results do not tend towards isolated points or small regions. At present, it is seldom applied to the identification of fruits because its parameters must be selected manually [5]. It also requires a large amount of computation because it segments directly in the pixel space [6]. However, an image can first be segmented into several regions by an adaptive mean-shift algorithm. Compared with the number of pixels, the number of regions is greatly decreased, which effectively reduces the computational complexity.
In this paper, an adaptive Ncut segmentation method of a colour apple image based on adaptive mean-shift is proposed. Firstly, the adaptive mean-shift algorithm is used for initial segmentation. The R-B colour feature is then extracted. Next, a parameter self-adaptive Ncut method is applied to segment the images. Finally, the effectiveness of the proposed method is verified.
2. Related Works
Since machine vision was first applied to the identification of fruit, a number of research results on image segmentation for apple recognition have been reported in the literature. Reference [7] presented an algorithm to automatically recognize apples for a machine vision system that guided a harvesting robot. A threshold that enabled recognition of fruit pixels was set as the grey level closest to the minimum probability between the peaks of the apple and background in the red colour difference histogram. In order to segment apple defects, reference [8] proposed a method to classify pixels based on a Bayesian classification process; however, it depended on training images to provide the class means and covariance matrices. In reference [9], the colour and texture features of the image were calculated respectively and an artificial neural network classifier was applied to segment the apple image. In reference [10], edge detection and a combination of colour and shape analyses were utilized to segment images of red apples. However, the apple recognition precision of the above methods needs to be improved.
In recent years, some optimized algorithms have been proposed for apple recognition. In reference [3], an automatic vision recognition system for guiding an apple-harvesting robot was presented and an image segmentation method based on region growing and colour features was investigated. For apple sorting and grading, an automatic adjustable colour segmentation method based on a linear SVM and Otsu's method was developed [11]. An SVM greyscale image was generated using a classification hyperplane in the 3D RGB space calculated with the linear SVM, and an optimal threshold was estimated by finding the minimum threshold around the fruit boundary. On the basis of a K-means colour clustering algorithm, a convex-hull-based concave point detection algorithm was presented to achieve recognition of occluded apples [12]. In reference [13], an L*a*b* colour model and fuzzy two-dimensional (2D) entropy based on a 2D histogram were applied to detect red Fuji apples in natural scenes, and a genetic algorithm (GA) was used to increase the precision of segmentation of Fuji apples under complex backgrounds with partially occluding branches and reflected light. Aiming at a variety of apple states, including non-occluded and overlapping apples as well as apples severely occluded by branches and leaves, edge detection and an improved RHT transformation method were proposed for apple recognition [14]. In reference [15], a novel approach to detecting apples in night-time images by analysing the spatial distribution of the light around highlights was developed for the case where the artificial illumination source and the camera are roughly aligned. These methods only considered the correctness of apple recognition and rarely considered the real-time requirements of robot operation. Therefore, the objective of this research is to investigate a fast segmentation method for apple recognition as related to robot picking.
3. Materials and Methods
3.1. Vision System and Image Acquisition
The robot vision system consists of a VGA colour charge-coupled device (CCD) video camera capturing 50 images per second, mounted on a robot end-effector, and an industrial computer with an Intel(R) Core(TM) 4Duo CPU ET3000 @ 2.66GHz, 4GB of RAM and a 500GB hard disk. The CCD video camera is used to automatically acquire apple images one by one at a preset sample time, using the Video for Windows capture technology [3]. The industrial computer processes these images and detects the object. The software platform used is Visual C++ 6.0. In addition, the video images captured from the CCD video camera are displayed on-line on the industrial computer.
Colour images of Fuji apples examined in the following way were acquired under all-weather natural conditions at the apple demonstration orchard of Feng County, Jiangsu Province, in October 2014. The colour signals from the camera were transferred as 24-bit RGB colour images (400 pixels×300 pixels in each colour band) and processed by the industrial computer.
3.2. Vision Recognition for Apple Image
In this paper, the whole flow for apple vision recognition is shown in Figure 1.
Step 1: Apply the adaptive mean-shift method to perform initial segmentation of the original apple image. The image is divided into many regions.
Step 2: Extract the R-B colour feature from the image obtained in Step 1. The R-B colour feature undirected graph is obtained by representing every region as a region point.
Step 3: Take this undirected graph as the input of Ncut, construct the weight matrix from the region points and segment the graph with the parameter self-adaptive Ncut method.
Step 4: Extract the apple image from the original image by means of the outline image template.

Figure 1. The flow chart for apple image vision recognition
3.2.1. Initial Segmentation for Apple Image
Mean-shift is a nonparametric iterative algorithm that ascends the density gradient [16]. It can segment the image into several regions while preserving the image's basic features, so the number of image entities to be processed is greatly decreased. The mean-shift algorithm is as follows:
Given n pixels x_i, i = 1, …, n, in the d-dimensional feature space, the kernel density estimator with kernel K(x) and bandwidth h is

f̂(x) = (1/(nh^d)) Σ_{i=1}^{n} K((x − x_i)/h),    (1)

where K(x) is a radially symmetric kernel,

K(x) = c_{k,d} k(‖x‖²),    (2)

where k(x) is the profile of the kernel and c_{k,d} is a normalization constant.

An estimator of the gradient of the density f(x) is the gradient of Eq. (1):

∇f̂(x) = (2c_{k,d}/(nh^{d+2})) Σ_{i=1}^{n} (x_i − x) g(‖(x − x_i)/h‖²)    (3)

= (2c_{k,d}/(nh^{d+2})) [Σ_{i=1}^{n} g(‖(x − x_i)/h‖²)] m_h(x),    (4)

where g(x) = −k′(x) and m_h(x) = Σ_{i=1}^{n} x_i g(‖(x − x_i)/h‖²) / Σ_{i=1}^{n} g(‖(x − x_i)/h‖²) − x.

In Eq. (4), the factor m_h(x) is the adaptive mean-shift vector, which always points towards the direction in which the density rises most quickly.

Construction of a kernel function from the profile g(x),

G(x) = c_{g,d} g(‖x‖²),    (5)

where c_{g,d} is the corresponding normalization constant, allows the mean-shift procedure to be run by moving each point along m_h(x) until it converges to a density mode.
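The iteration implied by Eq. (4), moving a point along the mean-shift vector until it reaches a density mode, can be sketched as follows; the Gaussian kernel profile used here is an illustrative assumption, since the derivation holds for any valid profile g.

```python
import numpy as np

def mean_shift_point(x, points, h, n_iter=100, tol=1e-6):
    """Climb the density gradient from x by repeated mean-shift updates.

    points : (n, d) array of data points x_i
    h      : fixed bandwidth
    Uses the Gaussian profile g(u) = exp(-u / 2) (an assumption here).
    """
    for _ in range(n_iter):
        d2 = np.sum((points - x) ** 2, axis=1) / h ** 2
        g = np.exp(-0.5 * d2)                    # g(||(x - x_i)/h||^2)
        x_new = (g[:, None] * points).sum(axis=0) / g.sum()
        if np.linalg.norm(x_new - x) < tol:      # m_h(x) ~ 0: at a mode
            return x_new
        x = x_new
    return x

# Two well-separated 1D clusters: each start point converges to its own mode.
pts = np.array([[0.0], [0.1], [-0.1], [5.0], [5.1], [4.9]])
m0 = mean_shift_point(np.array([0.2]), pts, h=0.5)
m1 = mean_shift_point(np.array([4.8]), pts, h=0.5)
```

In image segmentation, the data points would be the pixel feature vectors and all pixels converging to the same mode would form one region.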
As a picking robot must have good real-time performance, a method which automatically determines the bandwidth is used to realize the adaptive mean-shift [18–19]. The semi-parametric rule for selecting the spatial scale of each pixel assumes that, if the true density were locally normal, the most reliable scale would be the one at which the estimate is most stable; accordingly, the procedure is as follows:
Step 1: Derive the fixed bandwidth h0 and a set of analysis scales h1 < h2 < … < hm around it.
Step 2: For the spatial scale hj, run the fixed-bandwidth mean-shift procedure and record the convergence point of each pixel.
Step 3: Repeat Step 2 for j = 1, …, m.
Step 4: Select a spatial scale for each pixel, namely the scale at which its convergence point is most stable across neighbouring scales.
Step 5: Run the mean-shift procedure and identify blobs as groups of pixels having the same connected convergence points.
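A minimal sketch of one common variable-bandwidth rule, in which dense regions receive smaller bandwidths than sparse ones: the sample-point formula h_i = h0·(λ/f̂(x_i))^(1/2), with λ the geometric mean of a pilot density estimate, is an assumption used for illustration and may differ from the exact rule of [18–19].

```python
import numpy as np

def pilot_density(points, h0):
    """Fixed-bandwidth pilot density estimate at each sample point
    (Gaussian kernel, up to a constant factor)."""
    d2 = np.sum((points[:, None, :] - points[None, :, :]) ** 2, axis=2)
    return np.exp(-0.5 * d2 / h0 ** 2).mean(axis=1)

def adaptive_bandwidths(points, h0):
    """Per-point bandwidths h_i = h0 * sqrt(lambda / f_pilot(x_i)),
    with lambda set to the geometric mean of the pilot densities."""
    f = pilot_density(points, h0)
    lam = np.exp(np.log(f).mean())
    return h0 * np.sqrt(lam / f)

rng = np.random.default_rng(0)
pts = rng.normal(size=(200, 2))          # synthetic 2D feature points
h = adaptive_bandwidths(pts, h0=0.5)     # one bandwidth per point
```

Each point then runs the mean-shift iteration with its own bandwidth, so detail is preserved in dense image regions while sparse regions are smoothed more strongly.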
One of the apple images processed by the above algorithm is shown as an illustration in Figure 2. In Figure 2(b), the apple regions appear smooth and clearly highlighted after processing. In Figure 2(c), the apple's boundary outlines are almost completely delineated.

Figure 2. Adaptive mean-shift segmentation. (a) Original image. (b) Segmentation result. (c) Outline image.
3.2.2. R-B Colour Feature Extraction
However, the background produced by adaptive mean-shift segmentation is complex and contains too many small regions, which makes it difficult to separate the apples. In order to overcome the influence of uneven illumination, the R-B colour feature is extracted from Figure 2, as shown in Figure 3.
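The R-B colour feature itself is simply the difference between the red and blue channels, clipped to the valid grey-level range; red apple pixels yield large values while leaves and sky yield values near zero. A minimal sketch:

```python
import numpy as np

def r_minus_b(image_rgb):
    """R-B colour feature of a 24-bit RGB image (uint8, shape HxWx3).

    Negative differences are clipped to 0 so that red objects stand
    out against green/blue backgrounds.
    """
    r = image_rgb[..., 0].astype(np.int16)   # widen to avoid uint8 wrap-around
    b = image_rgb[..., 2].astype(np.int16)
    return np.clip(r - b, 0, 255).astype(np.uint8)

# A 1x2 toy "image": one red apple pixel and one sky-blue pixel.
img = np.array([[[220, 40, 30], [100, 150, 230]]], dtype=np.uint8)
feat = r_minus_b(img)   # apple pixel -> 190, sky pixel -> 0
```

In the proposed pipeline this feature is computed per region rather than per pixel, so each region point carries a single R-B value.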

Figure 3. R-B colour feature extracted image. (a) Result after extracting the R-B colour feature. (b) Outline image.
In Figure 3(a), there is a large contrast between the apple image and the branch, leaf and sky images. Figure 4 shows the pixel statistics of the target and background of Figure 3(a); the two curves are quite far apart. After extracting the R-B colour feature, the apple images are further highlighted. The image has already been divided into several small regions by the adaptive mean-shift algorithm. In the following, a self-adaptive Ncut algorithm is used to further segment the apples from the whole image.

Figure 4. Pixel values of the R-B colour feature
3.2.3. Adaptive Ncut Segmentation for Apple Recognition
The Ncut method can obtain the global information of an image. Each pixel or feature point is regarded as a node of a graph G = (V, E) and every two nodes are connected by an edge. The weight of an edge signifies the similarity between its two nodes and can be calculated from the colour, brightness and other information in the image. The Ncut algorithm aims to divide the node set V into two disjoint sets A and B, with A ∪ B = V and A ∩ B = ∅, such that the similarity within each set is high and the similarity between the two sets is low.

The normalized cut criterion is defined as

Ncut(A, B) = cut(A, B)/assoc(A, V) + cut(A, B)/assoc(B, V),    (6)

where cut(A, B) = Σ_{u∈A, v∈B} w(u, v) is the total weight of the edges removed by the partition and assoc(A, V) = Σ_{u∈A, t∈V} w(u, t) is the total connection from the nodes in A to all nodes in the graph. The steps of the adaptive Ncut segmentation are as follows:

Step 1: Convert the region map obtained above into an undirected graph G = (V, E), taking every region point as a node.
Step 2: Calculate the weight matrix W and the diagonal degree matrix D, where d_i = Σ_j w_ij.
Step 3: Solve the generalized eigensystem (D − W)y = λDy and take the eigenvector corresponding to the second smallest eigenvalue.
Step 4: Segment the graph by the eigenvectors, repartitioning recursively until the Ncut value or the number of regions reaches the preset minimum.

The weight matrix W is defined as

w_ij = exp(−‖F(i) − F(j)‖²/σ_I²) · exp(−‖X(i) − X(j)‖²/σ_X²) if ‖X(i) − X(j)‖ < r, and w_ij = 0 otherwise,    (7)

where F(i) is the R-B colour feature of region point i, X(i) is its spatial position, σ_I and σ_X are scale parameters of the feature and spatial terms and r is the distance threshold; in the proposed method these parameters are determined adaptively from the region points rather than selected manually.
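A two-way Ncut partition can be sketched on a small illustrative weight matrix; the standard reduction of (D − W)y = λDy to a symmetric eigenproblem via D^(−1/2) is used, and the 4-node graph below is hypothetical:

```python
import numpy as np

def ncut_bipartition(W):
    """Two-way normalized cut on a symmetric weight matrix W.

    Solves (D - W) y = lambda * D y through the equivalent symmetric
    problem D^{-1/2} (D - W) D^{-1/2} z = lambda z, then thresholds
    the second-smallest eigenvector at zero to split the nodes.
    """
    d = W.sum(axis=1)
    d_isqrt = 1.0 / np.sqrt(d)
    L = np.diag(d) - W                                # graph Laplacian
    L_sym = d_isqrt[:, None] * L * d_isqrt[None, :]   # normalized Laplacian
    vals, vecs = np.linalg.eigh(L_sym)                # ascending eigenvalues
    y = d_isqrt * vecs[:, 1]                          # generalized eigenvector
    return y >= 0                                     # boolean partition labels

# Two strongly connected pairs {0,1} and {2,3}, weakly linked to each other.
W = np.array([[0.0, 0.9, 0.1, 0.0],
              [0.9, 0.0, 0.0, 0.1],
              [0.1, 0.0, 0.0, 0.9],
              [0.0, 0.1, 0.9, 0.0]])
labels = ncut_bipartition(W)
```

Because the nodes here are region points rather than pixels, the eigenproblem stays small, which is the source of the speed-up discussed in Section 3.2.5.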
This method is applied to the apple image in Figure 3(a) and the result is shown in Figure 5. It shows that, using the outline of the apple image as the border between the object and the background, the object and background can be divided into two classes.

Figure 5. Adaptive Ncut segmented result
3.2.4. Extracted Apple Image by the Outline Image Template
Figure 6 shows the object area extracted by the template from the outline of Figure 5. The result shows that the object of the apple image can be recognized well by this method.

Figure 6. Template matching result
3.2.5. Computational Complexity Analysis
The computational complexity of the proposed method is analysed as follows. Firstly, by using adaptive mean-shift initial segmentation, the original image is turned from pixels into many small regions. As the number of image regions is far smaller than the number of image pixels, the computational complexity is reduced. For example, the original image in Figure 2(a) has 400×300 pixels, while the number of region maps in Figure 2(c) obtained by initial segmentation is 115, which is far lower than the number of image pixels. Secondly, after the R-B colour feature is extracted, the number of region maps is remarkably reduced; for example, it decreases from 115 in Figure 2(c) to 18 in Figure 3(b). Thirdly, in adaptive Ncut segmentation, the region maps are used as the inputs of the Ncut algorithm, so the lower the number of region maps, the lower the space dimension of the Ncut algorithm and the simpler its structure, and therefore the shorter its run time. From the analysis above, the computational complexity of the proposed method is lower than that of direct Ncut segmentation, in spite of the additional steps of initial segmentation and feature extraction.
4. Experimental Results and Discussion
4.1. Experimental Results
In order to evaluate the segmentation validity, the segmentation error is defined as

ε = (N_e/N_r) × 100%,    (8)

where N_e is the number of wrongly segmented pixels, i.e., the pixels on which the segmented result and the manually segmented reference image disagree, and N_r is the number of object pixels in the manually segmented reference image.
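A sketch of one common pixel-disagreement error measure, assumed here to match the definition used in the paper:

```python
import numpy as np

def segmentation_error(segmented, reference):
    """Percentage of pixels on which a boolean segmentation mask
    disagrees with the manually segmented reference mask, relative
    to the number of object pixels in the reference."""
    wrong = np.logical_xor(segmented, reference).sum()
    return 100.0 * wrong / reference.sum()

# 4x4 toy masks: the result adds one false-positive pixel at (0, 0).
ref = np.zeros((4, 4), dtype=bool)
ref[1:3, 1:3] = True                 # 4 object pixels
seg = ref.copy()
seg[0, 0] = True                     # 1 wrongly segmented pixel
err = segmentation_error(seg, ref)   # 100 * 1 / 4 = 25.0
```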
The experimental results of three groups selected randomly under different lighting conditions, including sunny days, cloudy days and night, are shown in Figure 7 to Figure 9. In order to illustrate the validity of the proposed method, the experimental results attained by the method in [23], which also combined mean-shift with Ncut to segment images, are shown in Figure 10 to Figure 12. In contrast with the proposed method, the parameters of the method in [23] are selected manually. Table 1 presents the segmentation error and recognition time for these images by the proposed method and the method in [23].
Table 1. Segmentation error and time

Figure 7. Segmentation results of the proposed method for images on sunny days. (a) Original image. (b) Outline image after initial segmentation. (c) Outline image after the R-B colour feature is extracted. (d) Ncut segmentation. (e) Final result after template matching.

Figure 8. Segmentation results of the proposed method for images on cloudy days. (a) (b) (c) (d) and (e) are the same as in Figure 7.

Figure 9. Segmentation results of the proposed method for images at night. (a) (b) (c) (d) and (e) are the same as in Figure 7.

Figure 10. Segmentation results of the method presented in [23] for images on sunny days. (a) Original image. (b) Outline image after mean-shift segmentation. (c) Ncut segmentation. (d) Final result after template matching.
Figure 11. Segmentation results of the method presented in [23] for images on cloudy days. (a) (b) (c) and (d) are the same as in Figure 10.

Figure 12. Segmentation results of the method presented in [23] for images at night. (a) (b) (c) and (d) are the same as in Figure 10.
On the basis of the above three groups of experiments, 150 apple images of 400 pixels×300 pixels under all-weather natural conditions, including sunny days, cloudy days and night-time, are used for testing. The segmentation error, recognition rate and recognition time of the 150 apple images are recorded. Statistical results of segmentation and recognition for each type of natural condition, including sunny, cloudy and night-time, are given in Table 2. The statistics show that the segmentation error of 121 of the 150 apple images is less than 1%. The maximum segmentation error is 3.001%, which occurs in a night-time image. The numbers of fruits that failed to be recognized on sunny days, on cloudy days and at night are six, eight and seven respectively. The average recognition times are 0.6043s, 0.6947s and 0.1748s respectively.
Table 2. Segmentation error, recognition rate and time statistics for images in each type of natural condition
4.2. Discussion of Experimental Results
The experimental results are analysed and discussed as follows:
From Figure 7 to Figure 9, we can see that with the proposed method the segmented edges of the target coincide with the original edges, and the segmentation errors are all lower than 1%. However, the results of the method presented in [23] are sensitive to illumination change: when the original image suffers from uneven illumination or shadow, segmentation errors arise. Hence, its segmentation errors are larger than those of the proposed method.
In the proposed method, the original image is divided into many small regions by the adaptive mean-shift method, and the numbers of regions in the images of Figure 7(b), Figure 8(b) and Figure 9(b) are 181, 191, 171, 116, 43 and 41 respectively. After the R-B colour feature is extracted, the numbers of regions in Figure 7(c), Figure 8(c) and Figure 9(c), which are used as the inputs of Ncut, are reduced to 53, 17, 77, 51, 10 and 11 respectively. Due to the decrease in the number of inputs for Ncut, the run time of the Ncut algorithm is drastically reduced. Hence, the recognition time of the proposed method is far less than that of the method in [23].
From Table 1 and Table 2, the maximum segmentation error of the proposed method is 3.001%, which occurs in a night-time image. This is due to the lack of illumination, which also leads to a lower recognition rate for apples at night-time compared with the other two types of natural condition. The reason why the night-time recognition time is shorter than in the other cases is as follows: compared with the daytime images, such as Figure 7(a) and Figure 8(a), the background colour of the night-time image in Figure 9(a) is uniform. Therefore, the number of regions at night-time after the R-B colour feature is extracted, shown in Figure 9(c), is far less than that of Figure 7(b) on sunny days and Figure 8(b) on cloudy days. As the regions are the inputs of the Ncut method, the lower the number of regions in the night-time image, the shorter the run time of the Ncut algorithm. More than 90% of the apples are successfully detected. The failures are mainly caused by overlapping apples and fruit occluded by branches and leaves. The average recognition time is less than 0.7s, which meets the real-time requirement of a picking robot.
5. Conclusions
An Ncut method for colour apple images based on adaptive mean-shift has been presented. Using adaptive mean-shift initial segmentation, the image is divided into region maps. Next, by extracting the R-B colour feature, the number of points and edges of the region maps is dramatically reduced and the computation speed is improved. Then, taking every region map as a region point, the weight matrix constituted by the region points is used by Ncut to realize adaptive segmentation. The experimental results indicate that the proposed method has better real-time performance and higher precision under all-weather natural conditions. It can extract the apple images clearly and lays a good foundation for apple three-dimensional orientation. Analysis of the recognition errors shows that overlapping apples and apples occluded by branches and leaves are the main causes of error, so future work will aim to reduce these recognition failures. In addition, due to the limitations of the experimental trials, further research should be conducted on a harvesting robot that captures images during picking.
