Abstract
1. Introduction
Over the past few decades, the detection of moving objects has been investigated extensively because it has applications in a variety of disciplines. Vision-based traffic control systems [1, 2], video surveillance [3, 4], video segmentation [5, 6, 7], video coding [8], video indexing [9] and human behaviour analysis [9, 10] are a few examples. The simplest, and most popular, method for moving object detection is background subtraction, in which moving objects are obtained from the difference image between the current frame and the background model. Existing approaches to background model initialization assume that a sequence free of motion is available prior to building the background model [3, 11]. In a surveillance area such as a busy street or a public place, however, it is very difficult to collect training background frames without any moving objects. A good background model should also absorb all the changes in the background scene over time. Several feature-based approaches have been proposed to model the dynamic change in the background [12]–[15]. We classify the background modelling methods, according to the type of feature being used, into two main groups: pixel-based methods and edge-based methods.
Existing pixel-based background modelling algorithms model every background pixel individually. Several techniques [16]–[21] exist to create a model of each pixel. In these methods, once the colour likelihood of the input frame is computed, the pixels that deviate from the background model are labelled as foreground pixels. Modelling every background pixel is difficult since the intensity feature is prone to illumination change. Moreover, multiple colours can be observed at certain locations due to repetitive object motion, shadows, noise or reflectance from other objects [15]. This effect becomes worse in outdoor environments due to weather conditions, reflectance, motion in the background (e.g., waving tree branches) and unintentional camera motion. To adapt to this changing environment, the background model needs to be updated in every frame with an adaptation rate. However, these methods cannot take the object's motion into account when selecting the optimal update rate; instead, they set a common rate for updating every background pixel. Thus, pixel intensity-based moving object detection methods leave ghosts behind the objects (especially when a sudden change occurs for slowly moving objects) [22]. Although some statistical techniques have been used to overcome the ghost effect, these techniques are susceptible to sudden illumination variations. Additionally, to segment the moving area, these methods need to set a threshold over the difference image. The optimum threshold value is application dependent and very difficult to choose. Hence, pixel-based methods [13]–[15] suffer from the multi-modal distribution in dynamic environments and from sensitivity to illumination changes and noise.
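The pixel-based scheme described above can be illustrated with a minimal sketch. It assumes a running-average background, a single global adaptation rate `alpha` and a single global `threshold`; these names and values are illustrative only and are not taken from any of the cited methods. Plain Python lists stand in for grey-level images to keep the example dependency-free.

```python
def foreground_mask(background, frame, threshold):
    """Label a pixel as foreground when it deviates from the background model."""
    return [[abs(f - b) > threshold for b, f in zip(b_row, f_row)]
            for b_row, f_row in zip(background, frame)]


def update_background(background, frame, alpha):
    """Blend the frame into the model with one global adaptation rate `alpha`."""
    return [[(1.0 - alpha) * b + alpha * f for b, f in zip(b_row, f_row)]
            for b_row, f_row in zip(background, frame)]


# A 1x3 grey-level image: the middle pixel is covered by a bright object.
background = [[10.0, 10.0, 10.0]]
frame = [[10.0, 200.0, 12.0]]

mask = foreground_mask(background, frame, threshold=30.0)
# Every pixel, including the one under the object, is updated at the same rate.
background = update_background(background, frame, alpha=0.05)
```

Because the update rate is shared by all pixels, the object's intensity leaks into the model at the middle pixel. If the object stays there for many frames, the model drifts towards it, and the location is misdetected once the object leaves, which is the ghost effect mentioned above.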
On the other hand, edge-based methods rely on edges, a feature that is less sensitive to intensity changes and noise. This feature helps to overcome the limitations of the pixel intensity-based methods. Moreover, edge-based methods do not leave ghosts [1, 3, 5]. Edges, however, have position and shape variations, as illustrated in Figure 1. Nevertheless, because edges involve fewer pixels, they allow us to design more robust algorithms, even ones that are more expensive per pixel, within a similar computational time.

Edges change in shape and position. (a) A sample scene with waving trees, and (b) its extracted edges. (c) The accumulation of 100 edge frames shows the variation of the edges' movement. In (c), the building's edges are thin, which indicates small movement, while the trees' thick edges indicate high movement variation.
Hence, edge features are useful tools for modelling the environment, provided their limitations are handled. Research has been carried out to detect moving objects using edge features. However, existing edge-based methods use edge differencing and, by treating every edge pixel individually, they suffer from random noise. Pixel-by-pixel matching of edge points is also unsuitable because of its high computational cost. Additionally, the edges extracted from each frame are not always consistent across frames. Kim and Hwang detected moving objects in image sequences by using an edge differencing method [5]. However, their method does not update the background, which results in more false alarms. Jain et al. [19] proposed a method that models the background based on a sub-pixel edge map that represents the edge position and orientation using a mixture of Gaussians. Their method has a high computational cost because it uses a large number of Gaussians that must be updated at every frame. Dailey et al. [15] compute the moving objects from three consecutive frames without using any background model. However, their method matches exact edge pixels and thus cannot detect moving edge pixels in the presence of random noise or camera movement. Furthermore, the edge-segment-based approach introduced by Hossain et al. [3] uses initial motion-free training frames to generate the background model. Moreover, their method uses a common global threshold for matching every background segment. Background edges show shape and size variation across frames, and the variations differ from edge to edge. Unless this variation in the environment is considered, the detector's output cannot be reliable.
We present a novel edge-segment-based statistical background model that does not need motion-free training frames. Background edge behaviour is encoded by statistical distributions built from the image sequence. Moreover, edge-specific automatic thresholds are generated for each distribution to separate the true background edges. Additionally, we use a mixture of Gaussians of colour and gradient magnitude to model every pixel within each background distribution region. The colour and gradient magnitude models help to isolate moving edges that fall over the background distributions. The distributions provide edge-specific flexibility that tolerates illumination, shape and position variations across frames. Our proposed background model can also tolerate camera jitter to a certain extent. It takes less time to process since, unlike traditional methods, we do not need to search every edge pixel individually. Thus, our method exploits the robustness of the edge-segment structure and uses a statistical background model to facilitate fast and flexible background edge-segment matching for the detection of true moving edges.
2. Proposed Method
Our statistical model attempts to predict the edge's behaviour, i.e., its shape and position changes, and encodes it into a statistical map. Therefore, when a new edge enters the scene, we test it against the previously observed edge behaviour and determine whether it fits the previous edges or is a new one. Moreover, we use an adaptive comparison framework for the edges (i.e., the threshold for the matching score, the search window and a voting scheme to distinguish between moving edges and background edges that share the same region) that increases the accuracy of the detection. Additionally, the statistical model allows us to suppress the contribution of the moving objects to the background model, leaving only the background edges' contribution in the model. In summary, the background modelling method is divided into five parts (as shown in Figure 2): (1) First, we create the frame statistical model, which is a kernel-density distribution built from the edge maps. (2) Then, the frame statistical distributions are accumulated using temporal information. (3) Next, the accumulation is adaptively thresholded, allowing us to use non-ideal frames to learn the background. (4) Additionally, in each distribution region we create a set of Gaussians to model the colour and gradient magnitude information for the region. (5) Finally, we extract the moving edges as outliers from the statistical model. Furthermore, we present an abstraction of the method in Figure 3.

A flow diagram of the proposed method.

An abstraction of the proposed method.
2.1 Statistical Modelling
To estimate the edges' behaviour, first, we extract each image's edges using a Canny edge detector [23], and represent the extracted edges set from frame
where the frames range from the initial frame
where ρ is the pixel's position that belongs to the neighbourhood
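As a rough illustration of the two modelling steps (a kernel-density distribution per frame, then temporal accumulation), the following sketch spreads every edge pixel over its neighbourhood with a Gaussian kernel and sums the results over the frames. The kernel shape, `radius`, `sigma` and all function names are assumptions made for this example; they are not the paper's exact formulation. Binary edge maps are given as nested lists, where a Canny detector would normally supply them.

```python
import math


def gaussian_kernel(radius, sigma):
    """Return a (2r+1)x(2r+1) Gaussian kernel normalised to sum to 1."""
    k = [[math.exp(-(dx * dx + dy * dy) / (2.0 * sigma * sigma))
          for dx in range(-radius, radius + 1)]
         for dy in range(-radius, radius + 1)]
    total = sum(map(sum, k))
    return [[v / total for v in row] for row in k]


def frame_distribution(edge_map, kernel):
    """Spread each edge pixel of one frame over its neighbourhood."""
    h, w = len(edge_map), len(edge_map[0])
    r = len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if not edge_map[y][x]:
                continue
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        out[yy][xx] += kernel[dy + r][dx + r]
    return out


def accumulate(edge_maps, kernel):
    """Sum the frame distributions over all frames of the sequence."""
    h, w = len(edge_maps[0]), len(edge_maps[0][0])
    acc = [[0.0] * w for _ in range(h)]
    for edge_map in edge_maps:
        fd = frame_distribution(edge_map, kernel)
        for y in range(h):
            for x in range(w):
                acc[y][x] += fd[y][x]
    return acc


# A vertical edge seen at column 2 in two frames and at column 3 in one frame.
kernel = gaussian_kernel(radius=1, sigma=1.0)
col2 = [[1 if x == 2 else 0 for x in range(5)] for _ in range(5)]
col3 = [[1 if x == 3 else 0 for x in range(5)] for _ in range(5)]
acc = accumulate([col2, col2, col3], kernel)
```

In this toy sequence the accumulated map peaks at column 2, stays positive at column 3 and is zero far from the edge, so both the most frequent position and the spread of the edge are recorded.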
2.2 Adaptive Threshold
The distributions differ from two points of view: accumulation and motion (as shown in Figure 3). The accumulation of edges, among frames, reveals their variation and frequency (i.e., rate of the edge's occurrence in consecutive frames). Moreover, the frequency indicates which distributions represent background and which represent foreground. Thereby, the background and the foreground have a distinctive frequency. For instance, the moving objects, that appear and disappear from the scene, create small peaks in the distribution; while the background edges have a high distribution. Consequently, we can remove the spurious distributions based on the edge's frequency. On the other hand, the different motion in the edges creates wider or narrower distributions, e.g., edges with a lot of movement create spread distributions, while edges with little movement create sharp distributions. The creation of ad hoc distributions for each edge allows us to define accurate search regions for the edge matching process, and adaptive thresholds for each edge according to its characteristics. Moreover, the accurate search regions improve the information modelled by using the Gaussians of colour and gradient magnitude, which enrich the background model further—see Section 2.3. We threshold the distributions from these two points of view: by using the accumulation to remove foreground, and by using the motion (through the standard deviation of each distribution) to improve the accuracy of the detection.
2.2.1 Accumulation threshold
To remove the distributions created by the moving objects, we assume that the moving objects will have an average speed
where
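The idea behind the accumulation threshold can be sketched as follows: positions that were supported by edges in only a small fraction of the frames are treated as traces of moving objects and removed. The expression that derives the threshold from the average object speed is not reproduced here; `min_frequency` is a hypothetical stand-in for it, and the values are illustrative.

```python
def accumulation_threshold(acc, num_frames, min_frequency):
    """Zero out positions whose accumulated support is below the cutoff.

    `min_frequency` is an assumed fraction of frames; it replaces the
    speed-based threshold of the text for illustration only.
    """
    cutoff = min_frequency * num_frames
    return [[v if v >= cutoff else 0.0 for v in row] for row in acc]


# Accumulated edge support over 100 frames for four positions.
acc = [[92.0, 88.0, 6.0, 0.0]]
kept = accumulation_threshold(acc, num_frames=100, min_frequency=0.3)
```

Here the two positions with stable support survive, while the position visited briefly by a passing object is discarded.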
2.2.2 Motion threshold
To threshold the distribution according to their motion we need to compute the cutting point that represents a certain percentage of the distribution (given by
where
Thus, we can define the quantization step
Moreover, we define the cutting point for a
Thereby, this point is the pixel position from the mean of the distribution that defines where to prune the distribution. Furthermore, we use several points from each distribution (as samples from the orthogonal slice of each mean point) to refine the approximation of this cutting point (by averaging the resultant cutting points). Consequently, we create a map with regions that represents the background. Thus, to apply this threshold, we check the distance from the centre pixels in each distribution. If the distance of a pixel (from the pixels of the mean of the distribution) is larger than the cutting point, we remove that pixel from the distribution.
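One simple way to approximate such a cutting point is sketched below. It works empirically on slices taken orthogonally to the edge and centred on the mean, growing a symmetric window until it covers a given fraction of the slice's mass, and then averages the result over several slices. This is an assumption-laden stand-in for the derivation above: `coverage`, the function names and the sample profiles are all hypothetical.

```python
def cutting_point(profile, coverage):
    """Smallest distance from the centre that covers `coverage` of the mass.

    `profile` is an odd-length slice of accumulated values orthogonal to the
    edge, with the mean of the distribution at the centre element.
    """
    centre = len(profile) // 2
    total = sum(profile)
    if total <= 0:
        return 0
    mass = profile[centre]
    distance = 0
    while mass < coverage * total and distance < centre:
        distance += 1
        mass += profile[centre - distance] + profile[centre + distance]
    return distance


def mean_cutting_point(profiles, coverage):
    """Average the cutting points of several slices of one distribution."""
    points = [cutting_point(p, coverage) for p in profiles]
    return sum(points) / len(points)


sharp = [0, 0, 1, 8, 1, 0, 0]   # little movement, e.g. a building edge
spread = [1, 2, 3, 4, 3, 2, 1]  # a lot of movement, e.g. a tree branch

sharp_cut = cutting_point(sharp, coverage=0.9)
spread_cut = cutting_point(spread, coverage=0.9)
average_cut = mean_cutting_point([sharp, spread], coverage=0.9)
```

The sharp distribution is pruned one pixel from its mean and the spread one three pixels away, so each edge receives a search region that matches its own movement.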
2.3 Gaussians of Colour and Gradient Magnitude
Furthermore, we add colour and gradient magnitude information to the regions that represent the background. For each pixel in the distribution, we create a set of Gaussians that model the colour and gradient magnitude information in that area (as shown in Figure 3). This will increase the detection rate, as it avoids the over-elimination of moving edges (as shown in Figure 4).

Current edge-based models have a problem: they over-eliminate moving edges.
Given that our main goal is to create an intensity-robust method, we cannot directly use the RGB colour space to extract the colour information. Instead, we use the HSV colour space, with the hue (H) and saturation (S) components modelling the colour. Hence, we build two Gaussians, namely GH and GS, to model the colour of each pixel. Additionally, we model the gradient magnitude (GM) with another Gaussian, GGM. These Gaussians are defined by
where μ
where
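A per-pixel model with three Gaussians (hue, saturation and gradient magnitude) might be sketched as below. The exponential update rule, the learning `rate`, the initial variance, the variance floor `MIN_VAR` and the match width `k` are assumptions chosen for this example, not the paper's definitions. Hue is treated as a linear value, which ignores its circular nature, and the gradient magnitude is assumed to be precomputed and scaled to [0, 1].

```python
import colorsys
from dataclasses import dataclass

MIN_VAR = 1e-4  # assumed floor so a very stable pixel keeps some tolerance


@dataclass
class Gaussian:
    mean: float
    var: float = 0.01  # assumed initial variance

    def update(self, x, rate):
        """Move the mean and variance towards the new observation."""
        diff = x - self.mean
        self.mean += rate * diff
        self.var = max(MIN_VAR, (1.0 - rate) * self.var + rate * diff * diff)

    def matches(self, x, k=2.5):
        """True when x lies within k standard deviations of the mean."""
        return (x - self.mean) ** 2 <= k * k * self.var


class PixelModel:
    """Three Gaussians (hue, saturation, gradient magnitude) for one pixel."""

    def __init__(self, rgb, grad_mag):
        h, s, _ = colorsys.rgb_to_hsv(*rgb)  # the V component is discarded
        self.g_h = Gaussian(h)
        self.g_s = Gaussian(s)
        self.g_gm = Gaussian(grad_mag)

    def update(self, rgb, grad_mag, rate=0.1):
        h, s, _ = colorsys.rgb_to_hsv(*rgb)
        self.g_h.update(h, rate)
        self.g_s.update(s, rate)
        self.g_gm.update(grad_mag, rate)

    def votes(self, rgb, grad_mag):
        """One boolean per Gaussian: does the observation fit the model?"""
        h, s, _ = colorsys.rgb_to_hsv(*rgb)
        return [self.g_h.matches(h), self.g_s.matches(s),
                self.g_gm.matches(grad_mag)]


# Learn a bluish background pixel, then test the same and a reddish colour.
model = PixelModel(rgb=(0.2, 0.4, 0.6), grad_mag=0.5)
for _ in range(20):
    model.update((0.2, 0.4, 0.6), 0.5)

same = model.votes((0.2, 0.4, 0.6), 0.5)
other = model.votes((0.9, 0.1, 0.1), 0.5)
```

Dropping the V component keeps the colour test independent of brightness, which is the reason given above for preferring HSV over RGB.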
2.4 Foreground Detection
We use the resultant distributions after the adaptive threshold operation and the set of Gaussians (of colour and gradient magnitude) as a background model to detect the moving objects in the scene. In order to detect the moving objects, first we obtain the edges in the incoming frames by using a Canny edge detector [23]. Then, we compare the extracted edges with the background model. Consequently, those edges that do not lie within a background distribution are considered to belong to moving objects. Moreover, we check the edges that lie within a background distribution by testing their colour and gradient magnitude information. We take a vote for each pixel on the edge-segment (within the background region) to consider the pixel as background or foreground by
where
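The detection step can be illustrated with a simple voting sketch. A pixel votes for foreground when it lies outside every background distribution, or when it lies inside one but fails the colour and gradient magnitude test. The segment is labelled as moving when enough pixels vote that way. The majority rule, the `min_fg_ratio` parameter and the `matches_background` callback are hypothetical and only stand in for the voting rule of the text.

```python
def classify_segment(segment, background_region, matches_background,
                     min_fg_ratio=0.5):
    """Label an edge segment as 'moving' or 'background' by pixel voting.

    segment            -- list of (row, col) pixels of one edge segment
    background_region  -- set of (row, col) pixels kept after thresholding
    matches_background -- callable(pixel) -> bool, the colour/gradient test
    min_fg_ratio       -- assumed fraction of foreground votes required
    """
    fg_votes = 0
    for pixel in segment:
        if pixel not in background_region or not matches_background(pixel):
            fg_votes += 1
    return "moving" if fg_votes / len(segment) >= min_fg_ratio else "background"


background_region = {(0, 0), (0, 1), (0, 2), (0, 3)}
looks_like_background = {(0, 0), (0, 1), (0, 2)}  # pixels passing the test


def matches(pixel):
    return pixel in looks_like_background


# A background edge with one flickering pixel.
label_bg = classify_segment([(0, 0), (0, 1), (0, 2), (0, 3)],
                            background_region, matches)
# An edge completely outside the background distributions.
label_out = classify_segment([(5, 0), (5, 1)], background_region, matches)
# An edge that partly overlaps the background region but differs in colour.
label_over = classify_segment([(0, 2), (0, 3), (1, 3)],
                              background_region, matches)
```

Voting over the whole segment, rather than deciding pixel by pixel, lets a few noisy pixels be absorbed without changing the label of the segment.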
3. Experiments and Results
We test our method on different sequences including PETS 2001 [25] and I2R database [26]. These sequences have a dynamic background that represents the environment for video surveillance applications. In these databases, the images have background motion, illumination change and noise. Our proposed method was able to detect almost all of the moving objects in the sequences.
3.1 Results
We evaluated the detection capabilities of our method against four other methods: Dailey et al. [15], Dewan et al. [27], Hossain et al. [3] and Kim and Hwang [5]. We trained and tested the algorithms on the PETS 2001 data sets. Specifically, we used the sequences in Data Set 3 (Testing Camera 1) and Data Set 4 (Testing Camera 1), which we chose because of their challenging environments. Figures 5(a) and (b) show Data Set 3, which has illumination variation due to the movement of the clouds in the sunny environment. Figures 5(c) and (d) show over-exposed image sequences with illumination variation and reflection from the windows, cars and other moving objects. The ground truth for the selected frames is shown in the second row of Figure 5. In both sequences people are walking around in the scene. Since managing the background model is challenging, both Dailey et al. [15] and Dewan et al. [27], as shown in the third and fourth rows of Figure 5, detect moving objects without using any background. The former method uses the edge pixel-based approach while the latter uses the edge-segment-based approach. In both approaches, two edge maps are extracted from the edge difference images of three consecutive frames. Finally, the moving edges are extracted by applying a logical AND operation between them. However, these methods fail to detect slowly moving objects and thus are not suitable for real-time applications. The slowly moving clouds in Data Set 3 (DS3) and the moving objects (people and cars) in Data Set 4 (DS4) create illumination variation. In Figures 5(a) and (b), the clouds present a challenge as they change their shape slowly. Although Hossain et al.'s [3] method uses an edge-segment structure (see the fifth row of Figure 5), the method relies on initial motion-free training frames for generating the background. Moreover, for background segment matching, they use a fixed threshold for all the background segments.
Selecting a lower threshold matches only edges with small movement variation. On the other hand, a higher threshold increases false background segment matching. Thus, the method gives false alarms. Kim and Hwang [5], in turn, detect moving objects in image sequences by using an edge differencing method. They compute current moving edges and temporary moving edges. Finally, moving edges are determined by applying a logical OR operation between them. Their method does not update the background model. Thus, the method cannot handle a dynamic background and produces more false alarms. Their moving edge detection results are shown in the sixth row of Figure 5. The proposed method overcomes these problems thanks to its statistical background model, in which the movement statistics of every background edge-segment are used effectively. Moreover, the use of Gaussian colour and gradient magnitude distributions in the proposed method helps to recover moving edges that fall over the background distributions. Additionally, these distributions allow us to absorb flickering noisy edges as background edges. This adaptive behaviour towards flickering edges is due to the adaptive thresholds and the statistical distributions that model the possible edge positions. Furthermore, the proposed method recovers a detailed shape and a clear boundary of the foreground, which increases its detection capabilities. We show the moving object detection of our proposed method in the last row of Figure 5.
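The three-frame differencing idea discussed for the background-free methods can be illustrated with a simplified sketch. It takes two difference maps from three consecutive binary edge maps and combines them with a logical AND. It uses an exclusive OR on binary maps instead of the exact difference-image pipeline of [15] or [27], so it is an approximation for illustration only, with hypothetical function names.

```python
def xor_maps(a, b):
    """Edge difference: positions where exactly one of the two maps has an edge."""
    return [[pa != pb for pa, pb in zip(ra, rb)] for ra, rb in zip(a, b)]


def and_maps(a, b):
    """Logical AND of two binary maps."""
    return [[pa and pb for pa, pb in zip(ra, rb)] for ra, rb in zip(a, b)]


def three_frame_moving_edges(prev_edges, cur_edges, next_edges):
    """Moving edges of the current frame from three consecutive edge maps."""
    return and_maps(xor_maps(prev_edges, cur_edges),
                    xor_maps(cur_edges, next_edges))


# One image row. A static background edge sits at x=5 in every frame.
# Fast object: its edge moves from x=1 to x=2 to x=3.
fast = three_frame_moving_edges([[0, 1, 0, 0, 0, 1]],
                                [[0, 0, 1, 0, 0, 1]],
                                [[0, 0, 0, 1, 0, 1]])
# Slow object: its edge stays at x=2 during the three frames.
slow = three_frame_moving_edges([[0, 0, 1, 0, 0, 1]],
                                [[0, 0, 1, 0, 0, 1]],
                                [[0, 0, 1, 0, 0, 1]])
```

The fast edge is detected at its current position and the static edge is suppressed, but the slow edge produces no response at all, which matches the limitation noted above for slowly moving objects.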

The result of PETS 2001 DS3 and DS4. Each row represents a method and each column represents the frames from sequence. (a) DS3: frame 1386 and (b) DS3: frame 1446, (c) DS4: frame 1476 and (d) DS4: frame 1536.
3.2 Quantitative Evaluation
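To evaluate the performance of the proposed system quantitatively, we compared the detected moving edge-segments with hand-segmented ground truth. Three metrics are used for the performance evaluation, namely Recall, Precision and Similarity, which are defined by

$$\text{Recall} = \frac{TP}{TP + FN}, \qquad \text{Precision} = \frac{TP}{TP + FP}, \qquad \text{Similarity} = \frac{TP}{TP + FP + FN},$$

where TP is the number of true positives (moving edge pixels that are correctly detected), FP is the number of false positives (background pixels that are detected as moving) and FN is the number of false negatives (moving edge pixels that are missed).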
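For reference, the three metrics can be computed from pixel sets as in the sketch below. It assumes the usual pixel-wise definitions, Recall = TP/(TP+FN), Precision = TP/(TP+FP) and Similarity = TP/(TP+FP+FN), and represents the detection and the ground truth as sets of (row, col) coordinates. The function name and the sample data are illustrative and are not taken from the experiments.

```python
def evaluate(detected, ground_truth):
    """Return (recall, precision, similarity) for two sets of pixels."""
    tp = len(detected & ground_truth)   # moving pixels correctly detected
    fp = len(detected - ground_truth)   # background pixels detected as moving
    fn = len(ground_truth - detected)   # moving pixels that were missed
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    similarity = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return recall, precision, similarity


detected = {(0, 0), (0, 1), (0, 2), (3, 3)}
ground_truth = {(0, 0), (0, 1), (0, 2), (0, 3), (0, 4)}
recall, precision, similarity = evaluate(detected, ground_truth)
```

Similarity penalises both missed and spurious pixels at once, so it is never larger than either Recall or Precision.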
Table 1 shows the Precision and Recall of the proposed method on the five datasets tested. Here, the outdoor sequences DS3 and DS4 have illumination variation due to the movement of the clouds and of the moving objects in the scene. The Bootstrap (BS), Airport (AP) and Shopping Mall (SM) sequences are indoor environments that have object reflection and background noise. These three indoor sequences are taken from the I2R dataset. A sample snapshot of each of the three indoor datasets, together with its ground truth and the detection result of the proposed method, is given in Figure 6. Figure 7 shows the Precision and Recall of the detected moving edges for the DS3 and DS4 datasets. In Figure 7, Precision, which measures the accuracy of detecting moving edges, is higher for the proposed method. This is due to the statistical background model, which gives flexibility in matching to those segments that have high movement information. Moreover, the existing pixel-based method suffers from scattered moving edge pixels that are often mismatched, which eventually leads to a lower Precision value. Figure 8 illustrates the Similarity performance of the proposed method on the same datasets in comparison with the four other methods. Similarity describes the overall edge detection accuracy as well as its effectiveness. Owing to its segment-based statistical nature, the proposed method overcomes these difficulties and gives superior performance in the outdoor scenes. Thereby, the proposed method proves robust against dynamic illumination changes in the environment.
Precision and Recall of the proposed method

The result of I2R data set [26] sequences. Rows are different sequences. The left column is a sample frame of each sequence. The middle column is the ground truth of each sequence and the right column is the result in the proposed method.

Precision and Recall measure on PETS 2001 database

Similarity measure on PETS 2001 database.
We also observed similar performance for the indoor I2R dataset. Figure 9 illustrates the Similarity performance for the three indoor sequences. We compared the proposed method with five other methods on the I2R dataset [26]. Li et al. [28] use a Bayesian framework for background subtraction (BBS). The mixture of Gaussians (MoG) [29] uses multiple weighted Gaussian distributions as a background model. The background neural network (BNN) [30] is a mixture of a probabilistic neural network and a winner-takes-all neural network. SOBS [31] is a self-organizing approach based on neural networks. Kim and Hwang [5] present a fast object segmentation algorithm that uses edges to extract the moving objects in a video sequence. The proposed method performs, on average, 11% better than the other methods. These sequences are particularly difficult due to the reflections and shadows cast on the indoor surfaces. Moreover, Figure 9 shows the superiority of our method in comparison to pixel-based methods. In general, as shown in Figures 8 and 9, the proposed method outperforms the compared methods, since it shows high values for both Precision and Recall. Thus, the proposed method is stable and more reliable than the other methods discussed in this paper.

Similarity measure on I2R database.
4. Conclusion
We presented a statistical edge-segment-based method to model background and detect moving objects in dynamic environments. The proposed method builds statistical distributions for each edge-segment by using the unique information of each edge-segment to compare other edges—resulting in a robust adaptive verification process. Moreover, thanks to these features, we overcome the most common edge problems, such as shape and position changes. Furthermore, these mechanisms can be incorporated in other edge-based methods to extend their functionality and make them robust in dynamic environments. The proposed statistical map can be used to split foreground edges that merge with the background, increasing the detection accuracy. Additionally, the proposed method explores the edge domain, which has not been researched as much as the pixel domain, for object detection. We found promising results that can be used in several applications, including surveillance in dynamic backgrounds and content-based video encoding.
