
Abstract

Target tracking has become an elementary task in wireless visual sensor networks (WVSNs). In this paper, we propose a camera sensor activation scheme for target tracking in WVSN. Our objective is to balance the tradeoff between the accuracy of target tracking and the resources of the network. By studying the sensing model and deployments of cameras, we derive an observation correlation coefficient that describes the correlation characteristics of visual information observed by cameras with overlapping fields of view. Based on the observation correlation coefficient, a correlation-based camera activation scheme is designed. Experimental results show that the proposed observation correlation coefficient can model the correlation characteristics of visual information in WVSN. Further simulations show that the correlation-based camera activation scheme achieves satisfactory target tracking performance compared with other sensor selection methods.

1. Introduction

Wireless visual sensor networks (WVSNs) include cameras to capture visual data from the environment and processing components to process the data locally in a desired way [1]. Compared with wireless scalar sensor networks, WVSN can provide information-rich descriptions of captured events, and it is therefore adopted in various security and surveillance applications, such as remote and distributed video-based surveillance, environmental monitoring, and ambient-assisted living and personal care [2]. In these surveillance applications, target tracking is one of the major issues: the location of a possibly moving target must be determined accurately and in the least amount of time. For example, an accurate and timely determination of vehicle location is required for battlefield situational awareness [3].

Target tracking in WVSN involves a number of significant challenges. Firstly, WVSN is a resource-limited network. Its processing and transmission capabilities are limited. Bandwidth constraints may prevent all the raw data from being transmitted from the camera sensors to the central node, and analyzing a huge amount of visual data is difficult. Existing tracking algorithms cannot cope with these constraints in WVSN. Secondly, WVSN is an energy-limited network. Sensor nodes are typically battery powered, and it may not be feasible to replace or recharge the batteries of sensors in many remote sensing applications. As a result, limited energy is an important characteristic of sensor networks. In particular, compared with scalar sensing, image processing and transmission consume considerable energy, even though the content of interest in each frame captured by a visual node is very high. Therefore, energy-efficient operation is crucial in WVSN. Thirdly, WVSN is a low-cost network. Visual nodes are usually equipped with low-resolution cameras due to the cost limitation [4]. The target occupies fewer pixels in WVSN than in other camera networks, which increases the difficulty of target extraction, localization, and tracking.

To address the above challenges, the key is to balance the tradeoff between the value of information in the measurements and the resources of WVSN [5]. In this paper, we follow the information-driven sensor querying framework in which camera sensors are selectively activated based on their correlations. Because of constraints on resources, energy consumption, and the low resolution of cameras, we consider a group-based target tracking scheme, in which a small group of sensors is active while the rest of the network is idle. A camera that has detected the target wakes up the cameras most correlated with it to form a group of active cameras for locating and tracking the target. If any active camera loses the view of the target, it goes back to sleep, and a new camera is activated to renew the active group based on the correlation computation. The crux of the group-based target tracking scheme is the sensor activation approach, for which accuracy and real-time operation are the key considerations. The camera activation scheme should provide the most accurate location information of the target while ensuring that the WVSN can operate in real time.

In this paper, we address the problem of camera sensor activation for tracking a target in WVSN. Motivated by the fact that the image observed by a camera is directly related to its field of view, we design an observation correlation coefficient to evaluate the correlation between two cameras based on the sensing model rather than on an analysis of the specific images of the cameras. Using the observation correlation coefficient, the most correlated cameras are selected and activated for locating the target in the world frame. Our main contributions include the following:

(1) we design the observation correlation coefficient to describe the correlation between images observed by cameras with overlapping fields of view. With this method, the large amount of image processing otherwise needed to find the correlated cameras is avoided;

(2) only a small group of cameras is involved in target tracking, rather than all the cameras viewing the same target, which saves cost, computation, and communication.

The remainder of this paper is organized as follows: in Section 2, we briefly highlight the related works. Section 3 presents assumptions and preliminaries. We also present the sensing model of the cameras and the model of target localization by two cameras. Section 4 introduces the observation correlation coefficient. The proposed camera sensor activation scheme is introduced in Section 5. Section 6 conducts experiments to validate and evaluate the proposed scheme, and conclusions are given in Section 7.

2. Related Works

Sensor scheduling strategies for optimizing network lifetime in wireless sensor networks have been considered previously in the literature. Yu et al. develop a camera scheduling strategy to maximize the lifetime of visual sensor networks [6]. Soro et al. provide a heuristic approach for camera scheduling by proposing a cost function associated with each camera that depends on the remaining energy of the camera and the coverage geometry [7]. Both of these works rely on ceiling cameras, which are impractical to deploy. Cai et al. organize the directions of sensors into a group of nondisjoint cover sets, where one cover set whose directions cover all the targets is activated at a time to extend the network lifetime [8]. Alaei et al. provide a priority-based sensor scheduling strategy that coordinates clustered sensors so that the minimum number of sensors is awakened to monitor the area of interest [9]. Given the noisy measurements and the object prior distribution, Ercan et al. use the minimum mean square error (MSE) of the best linear estimate of the object location in 2D as a metric for sensor selection [10]. Besides, an entropy-based analytical framework has been developed to measure the amount of visual information provided by multiple cameras in WVSN [11]. This method relies heavily on a joint probability distribution of the two image sources [12], which is difficult to estimate accurately due to the complexity of image contents and the difficulty of image modeling [11]. The information-driven sensor scheduling technique is introduced in [13, 14]; it activates the next sensor by maximizing information utility and minimizing resource cost.

In the context of sensor scheduling for target tracking, some approaches have been developed for optimizing tracking performance subject to constraints on sensor usage. Ying et al. develop a Monte Carlo solution method to address the problem of sensor scheduling for target tracking, which is formulated as a partially observable Markov decision process [15]. Kreucher et al. adopt an active sensing approach to scheduling sensors for multiple target tracking applications that combines particle filtering, predictive density estimation, and relative entropy maximization [16]. Toh et al. develop a distributed target tracking system implementing a novel competition-based distributed sensor scheduling scheme, in which the candidate sensor node with the highest predicted tracking accuracy is elected as the new tasking sensor with the highest probability [17]. These studies mainly concern conventional wireless sensor networks. However, the existing sensor scheduling algorithms cannot solve the problem in visual sensor networks, because the way visual sensors acquire and process information differs significantly from that of conventional sensor networks.

A distributed target tracking approach using a cluster-based Kalman filter was proposed in [18], where a camera is selected as a cluster head which aggregates all the measurements in the communication range to estimate its position using a Kalman filter and sends the estimation to the central unit for tracking the target. Song et al. [19] also proposed a consensus method to track the target in a camera network. In their work, if a camera detects a target, the information from all the other cameras viewing the same target is used for fusing. From the aforementioned related works, we can observe that the accuracy of target tracking is limited by the resource of sensor networks in existing sensor scheduling methods. Therefore, in this paper, we propose the sensor activation scheme to balance the tradeoff between the tracking accuracy and resources of WVSN.

3. Preliminaries

3.1. Assumptions

In this paper, all subsequent discussions are based on the following assumptions:

(1) we consider a homogeneous WVSN, which contains only camera sensors, in a two-dimensional field. For convenience, we assume that the camera sensors are placed in a square region. The shape of the region of interest does not actually matter; it can be circular or any polygon;

(2) we assume that all the cameras are placed horizontally around a space. The positions and bearings of the cameras are already known in the world frame;

(3) the WVSN is deployed with sufficient nodes such that the network is connected. Some of them are in the active mode during the period when no target is detected. The other cameras are in the sleep mode and can be switched to the active mode immediately when they receive an awakening signal;

(4) the sleeping sensors can be woken up by external means. The time delay from the sleep mode to the active mode is neglected in this paper.

3.2. Sensing Model

Different from scalar data sensors, cameras project a target from a 3D world to a 2D plane via a perspective point, which can only acquire the bearings of the target in the visual image. Specifically, as shown in Figure 1, a camera has a field of view (FOV) that represents the area on the $x_{w} o y_{w}$ plane where a target captured by the camera is located. The FOV is represented as an isosceles triangle where both the equal sides join at a point representing the camera location. The angle between these equal sides is known as the FOV angle and is a factory specification defined for every camera.

Figure 1

Field of view.

Cameras may seem to have an unlimited sensing range; however, the target, which is described by pixels in the image, cannot be extracted from the background if it is located too far away from the camera, especially when the resolution of the camera is low. Thus, to reduce complexity, the sensing model of the camera in this paper is a triangle determined by four parameters (p, r, θ, α) [20], as shown in Figure 2, where p is the position of the camera, r is the sensing radius, α is the heading direction of the camera's projection (the angle between the center line of the camera's FOV and the positive semiaxis of $x_{w}$, with counterclockwise rotation taken as positive), and θ is the FOV angle.

Figure 2

Simplified sensing model.
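As an illustration only, the simplified sensing model can be turned into explicit triangle vertices. The sketch below assumes that r is the length of the two equal sides of the triangle and that angles are given in radians; the function names `sensing_triangle` and `contains` are hypothetical and do not appear in the original work.

```python
import math

def sensing_triangle(p, r, theta, alpha):
    """Vertices of the triangular sensing region, in counterclockwise order.

    p: camera position (x, y) in the world frame.
    r: length of the two equal sides (assumed meaning of the sensing radius).
    theta: FOV angle in radians (assumed < pi).
    alpha: heading in radians, counterclockwise from the positive x_w axis.
    """
    x, y = p
    right = (x + r * math.cos(alpha - theta / 2), y + r * math.sin(alpha - theta / 2))
    left = (x + r * math.cos(alpha + theta / 2), y + r * math.sin(alpha + theta / 2))
    return [(x, y), right, left]

def contains(triangle, q, eps=1e-12):
    """True if point q lies inside or on the boundary of a CCW triangle."""
    for i in range(3):
        ax, ay = triangle[i]
        bx, by = triangle[(i + 1) % 3]
        # Cross product is negative when q is to the right of edge a -> b.
        cross = (bx - ax) * (q[1] - ay) - (by - ay) * (q[0] - ax)
        if cross < -eps:
            return False
    return True

# Camera at the origin looking along +x with a 60 degree FOV.
tri = sensing_triangle((0.0, 0.0), 10.0, math.radians(60), 0.0)
print(contains(tri, (5.0, 0.0)))   # True: on the center line
print(contains(tri, (-1.0, 0.0)))  # False: behind the camera
```

A containment test of this kind is one simple way to decide whether a target position falls inside a camera's modeled sensing region.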

3.3. The Model of Target Localization

The purpose of target tracking is to generate the trajectory of a target over time by locating its position in every frame of the video [21]. The principle of target localization is to estimate the coordinates of the point at which the bearings of the target in different images intersect. We set up a world coordinate system $o x_{w} y_{w} z_{w}$ for the area of interest. The $x_{w} o y_{w}$ plane is the ground plane. The $X o_{c} Y$ plane is set as the image plane of a camera, where $o_{c}$ denotes the optical center of the camera sensor.

The coordinates of the intersection point of the target bearings in the world coordinate plane can be calculated as follows. We assume that simple background subtraction is performed locally at each camera node, so that the hull of the target can be extracted. A feature point is selected to represent the target. As shown in Figure 2, $p x$ and $p y$ are its pixel coordinates on the image plane. Based on the pixel information, we can compute the bearing of the target in the world coordinate frame according to the following formula: $k = \tan \left( α - \arctan \left( \frac{2 p x}{p_{cons}} \cdot \tan \frac{θ}{2} \right) \right),$ (1) where k is the bearing of the target (the slope of its line of sight), α is the rotation angle of the camera around the $z_{w}$ axis ($α = 0$ when the camera points along $x_{w}$, and counterclockwise rotation is positive), θ is the horizontal field of view, $p_{cons}$ is the number of pixels in the horizontal direction, and $p x$ is the horizontal pixel coordinate in the image. In our localization scheme, only $p x$ is communicated to the base station.
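Formula (1) can be evaluated directly, as in the following minimal sketch. It assumes that the horizontal pixel coordinate is measured from the image center, so that it ranges over roughly ±p_cons/2 (at the image edge the angular offset then equals θ/2), and that angles are in radians. The function name `bearing_slope` is hypothetical.

```python
import math

def bearing_slope(px, p_cons, theta, alpha):
    """Slope k of the target's line of sight in the world frame, per (1).

    px: horizontal pixel coordinate, assumed measured from the image center.
    p_cons: number of pixels in the horizontal direction.
    theta: horizontal field of view in radians.
    alpha: camera heading in radians, counterclockwise from the x_w axis.
    """
    offset = math.atan((2.0 * px / p_cons) * math.tan(theta / 2.0))
    return math.tan(alpha - offset)

# A target at the image center lies on the camera's center line,
# so for a heading of 45 degrees the slope is tan(45 deg), i.e. about 1.
k = bearing_slope(0, 640, math.radians(60), math.radians(45))
print(round(k, 6))
```

Because k is a slope, a line of sight parallel to the $y_{w}$ axis has no finite value of k; a practical implementation would need to handle that case separately.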

If two cameras capture the same target at the same time, the target bearings generated by the two cameras intersect. We can infer the coordinates of the target from the known positions of the two cameras and the intersection point of the bearings. The computation is described by the following equation: $k_{i} = \frac{y - y_{i}^{c}}{x - x_{i}^{c}},$ (2) where $k_{i}$ is the bearing of the target in the ith camera's image, $x_{i}^{c}$ and $y_{i}^{c}$ are the coordinates of the ith camera in the world coordinate frame, and x and y are the unknown coordinates of the target in the world coordinate frame. If only one camera detects the target, x and y cannot be uniquely determined because a single equation contains two unknowns. Thus, at least two cameras that detect the target are needed to determine its location. Writing (2) for two cameras gives the linear system $\begin{bmatrix} - k_{1} & 1 \\ - k_{2} & 1 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} y_{1}^{c} - k_{1} x_{1}^{c} \\ y_{2}^{c} - k_{2} x_{2}^{c} \end{bmatrix} .$ (3) Then, $\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} - k_{1} & 1 \\ - k_{2} & 1 \end{bmatrix}^{- 1} \begin{bmatrix} y_{1}^{c} - k_{1} x_{1}^{c} \\ y_{2}^{c} - k_{2} x_{2}^{c} \end{bmatrix} .$ (4) The position $(x, y)$ of the target T in the world coordinate frame is thus obtained. When more than two cameras capture the target T simultaneously, a statistical method can be used to refine the estimate and improve the accuracy [22].
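The two-camera solution in (3) and (4) amounts to solving a 2×2 linear system. The sketch below does this with Cramer's rule rather than an explicit matrix inverse; the function name `locate` and the tolerance `eps` are illustrative assumptions.

```python
def locate(cam1, k1, cam2, k2, eps=1e-12):
    """Solve system (3) for the target position (x, y).

    cam1, cam2: camera positions (x_c, y_c) in the world frame.
    k1, k2: bearing slopes of the target seen from each camera.
    Returns None when the two bearings are (nearly) parallel.
    """
    (x1, y1), (x2, y2) = cam1, cam2
    b1 = y1 - k1 * x1              # right-hand side, first row of (3)
    b2 = y2 - k2 * x2              # right-hand side, second row of (3)
    det = k2 - k1                  # determinant of [[-k1, 1], [-k2, 1]]
    if abs(det) < eps:
        return None
    x = (b1 - b2) / det
    y = (k2 * b1 - k1 * b2) / det
    return x, y

# Cameras at (0, 0) and (4, 0) both observing a target at (2, 2):
# the slopes are (2-0)/(2-0) = 1 and (2-0)/(2-4) = -1.
print(locate((0.0, 0.0), 1.0, (4.0, 0.0), -1.0))  # (2.0, 2.0)
```

When the two bearings are nearly parallel the determinant is close to zero and the estimate becomes very sensitive to pixel noise, which is one reason additional cameras can improve accuracy.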

3.4. Multiple Targets Localization

There may be multiple targets in the field of interest. The biggest difference between single target tracking and multiple target tracking lies in matching corresponding targets across cameras, which can be addressed by approaches that use features of the targets, such as color, motion, contour, and boundary. From a statistical perspective, the positions of the same target estimated by a set of cameras tend to converge in the world coordinate frame. Based on this idea, we apply a statistical method to match corresponding targets when locating them in the world coordinate frame.

Step 1.

The visual hulls of the targets are extracted, and the polygon centers of the target hulls are estimated. The polygon center of a visual hull can be formulated as follows: $\begin{matrix} p x = \sum_{i = 1}^{Q} \frac{p x_{i}}{Q}, \\ p y = \sum_{i = 1}^{Q} \frac{p y_{i}}{Q}, \end{matrix}$ (5) where $p x$ and $p y$ are the horizontal and vertical coordinates of the center, respectively, $p x_{i}$ and $p y_{i}$ are the horizontal and vertical pixel positions of the ith pixel of the visual hull of the target, and Q is the number of pixels in the hull.

Step 2.

One target center is selected from each camera that captures targets in order to form a correlated target pair. Then, we use the target localization algorithm to generate possible coordinates of the target. If n cameras are deployed in the surveillance area, we can get $C_{n}^{2} C_{a}^{1} C_{b}^{1}$ possible target positions, where a and b are the numbers of visual hulls of targets captured by the two cameras, respectively.

Step 3.

If we assume that one camera can capture at most m targets, we get at most $m^{n}$ possible position pairs of targets. We compute the means of the target positions and the mean square errors (MSEs) between the estimates and the mean. The mean of the candidate coordinates of a target is formulated as follows: $\begin{matrix} X = \frac{x_{1} + x_{2} + \dots + x_{n}}{n}, \\ Y = \frac{y_{1} + y_{2} + \dots + y_{n}}{n}, \end{matrix}$ (6) where X and Y are the mean of the candidate positions of the target, and $x_{i}$ and $y_{i}$ are the horizontal and vertical coordinates of the ith candidate position of the target, respectively. The MSE of a set of candidate target positions can be expressed as follows: $MSE = \frac{\sqrt{\left( \sum_{i = 1}^{n} (x_{i} - X)^{2} + \sum_{i = 1}^{n} (y_{i} - Y)^{2} \right) / 2}}{n} .$ (7)

Step 4.

We compare the MSEs of the possible coordinates and find the minimum one. The mean with the smallest MSE is taken as a possible target position. The centers of the visual hulls in the corresponding cameras' images are removed before the next iteration, which searches for the next correlated target image pair.

Step 5.

The feasible set of centers of visual hulls of targets is updated, and Steps 2 to 4 are repeated until all the centers are used. In this way, corresponding targets can be matched across the camera set.
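The core of Steps 3 and 4 can be sketched as follows. This is only an illustration of (6) and (7) and of the minimum-MSE selection; how the candidate sets are enumerated from the hull centers is not shown, and the names `mean_and_mse` and `best_candidate` are hypothetical.

```python
import math

def mean_and_mse(points):
    """Mean position (6) and spread measure (7) of candidate positions."""
    n = len(points)
    X = sum(x for x, _ in points) / n
    Y = sum(y for _, y in points) / n
    sq = sum((x - X) ** 2 for x, _ in points) + sum((y - Y) ** 2 for _, y in points)
    return (X, Y), math.sqrt(sq / 2.0) / n

def best_candidate(candidate_sets):
    """Return (mean, mse) of the candidate set with the smallest MSE (Step 4)."""
    return min((mean_and_mse(s) for s in candidate_sets), key=lambda t: t[1])

# Two hypothetical candidate sets: the first is tightly clustered, as expected
# when all bearings belong to the same physical target; the second is scattered.
tight = [(1.0, 1.0), (1.1, 0.9), (0.9, 1.1)]
loose = [(1.0, 1.0), (5.0, 5.0), (9.0, 0.0)]
mean, mse = best_candidate([loose, tight])
print(mean, mse)
```

In a full implementation, the hull centers that produced the winning set would then be removed and the search repeated, as described in Steps 4 and 5.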

4. Correlations between Images

In this section, we investigate how to measure the correlation between images from different cameras. Without loss of generality, we are given a set $C = \{c_{1}, c_{2}, \dots, c_{N}\}$ of camera sensors with fixed lenses in a 2D area of interest. For camera $c_{i}$ and camera $c_{j}$ in the set C, we derive a correlation coefficient $λ_{i j}$ to describe the degree of correlation between the image of $c_{i}$ and the image of $c_{j}$.

Intuitively, the visual information provided by multiple cameras is more correlated if the sensing regions observed by the cameras are closer to each other in the world frame. Conversely, if the images observed by the cameras are less correlated, the sensing regions covered by the cameras are more distinct from each other. From this perspective, we can map the problem of estimating the correlation of images onto the problem of computing the correlation of the regions observed by the cameras. Since the sensing area of the camera is modeled as a bounded triangular region in this paper, the correlation of observed regions is reflected by the overlap of the fields of view.

We can express the extent of overlap of the cameras' FOVs as the area of the intersection of the cameras' sensing models. This intersection is an intersection of triangles; therefore, it is always a convex polygon [23]. Utilizing the method of computing the union of two convex polygons, which is described in [24], we can identify the intersection region and order its vertices. Thus, according to the properties of the polygon, if the vertices are ordered counterclockwise, the area of the convex polygon can be computed by the following expression: $S = \frac{1}{2} \left( \begin{vmatrix} x_{1} & y_{1} \\ x_{2} & y_{2} \end{vmatrix} + \begin{vmatrix} x_{2} & y_{2} \\ x_{3} & y_{3} \end{vmatrix} + \dots + \begin{vmatrix} x_{n} & y_{n} \\ x_{1} & y_{1} \end{vmatrix} \right)$ (8) or $S = \frac{1}{2} \sum_{i = 1}^{n} (x_{i} y_{i + 1} - x_{i + 1} y_{i}),$ (9) where S is the overlap area, n is the number of vertices of the polygon, and $(x_{i}, y_{i})$ are the coordinates of the ith vertex of the overlap of the cameras' sensing regions. Note that, to close the polygon, the vertex following the last one is the first vertex; that is, $(x_{n + 1}, y_{n + 1}) = (x_{1}, y_{1})$.
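Expression (9) is the well-known shoelace formula and translates almost directly into code. In the minimal sketch below, the function name `polygon_area` is an illustrative choice; the modulo index closes the polygon by returning to the first vertex after the last one.

```python
def polygon_area(vertices):
    """Signed area of a simple polygon via (9).

    The result is positive when the vertices are ordered counterclockwise
    and negative when they are ordered clockwise.
    """
    n = len(vertices)
    total = 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]  # wraps around to close the polygon
        total += x1 * y2 - x2 * y1
    return total / 2.0

print(polygon_area([(0, 0), (1, 0), (1, 1), (0, 1)]))  # 1.0 (counterclockwise unit square)
print(polygon_area([(0, 0), (4, 0), (0, 3)]))          # 6.0 (right triangle)
```

Taking the absolute value of the result makes the computation independent of the vertex orientation.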

We design an observation correlation coefficient (OCC) λ to capture the correlation between the images observed by cameras. Suppose that camera $c_{i}$ and camera $c_{j}$ are two arbitrary cameras on the ground plane that can observe the area of interest. The observation correlation coefficient is defined as the ratio of the area of the intersection of the sensing regions of camera $c_{i}$ and camera $c_{j}$ to the entire area of the sensing model of a single camera: $λ_{i j} = \frac{| s_{i} \cap s_{j} |}{| s_{i} |} = \frac{s_{i j}}{s},$ (10) where $λ_{i j}$ is the observation correlation coefficient between camera $c_{i}$ and camera $c_{j}$, $s_{i}$ and $s_{j}$ are the sensing regions of camera $c_{i}$ and camera $c_{j}$, respectively, $s_{i j}$ is the area of the intersection of $s_{i}$ and $s_{j}$, s is the sensing area, that is, the area sensed by a single camera, and $| \cdot |$ denotes the area operator. Using the aforementioned algorithm, we can determine the intersection of camera $c_{i}$ and camera $c_{j}$ and its area. The sensing area s is obtained from the parameters of the sensing model: $s = \frac{1}{2} r^{2} \sin θ .$ (11)

According to the definition of λ, the maximum value is obtained when the sensing regions of camera $c_{i}$ and camera $c_{j}$ coincide exactly, in which case λ equals 1. If there is no intersection between the sensing regions of the two cameras, λ is zero, which is the minimum value. Thus, the value of λ ranges from 0 to 1.

The proposed observation correlation coefficient model is derived from the sensing model and the deployment information of the cameras. To calculate the correlation between camera $c_{i}$ and camera $c_{j}$, camera $c_{i}$ only needs to transmit its four parameters p, r, θ, and α to camera $c_{j}$. If all camera sensors follow the same sensing model, only p and α need to be transmitted. Once camera $c_{j}$ receives the parameters, it can calculate the observation correlation coefficient based on (10). The computation of the observation correlation coefficient is thus independent of image processing.
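To make the computation concrete, the sketch below evaluates (10) from the four parameters of each camera alone. It is not the procedure of [24]: for brevity it substitutes Sutherland-Hodgman clipping, a standard way to intersect a polygon with a convex polygon. It also assumes that r is the length of the equal sides of the sensing triangle and that angles are in radians; all function names are hypothetical.

```python
import math

def sensing_triangle(p, r, theta, alpha):
    """CCW vertices of the sensing triangle for parameters (p, r, theta, alpha)."""
    x, y = p
    return [(x, y),
            (x + r * math.cos(alpha - theta / 2), y + r * math.sin(alpha - theta / 2)),
            (x + r * math.cos(alpha + theta / 2), y + r * math.sin(alpha + theta / 2))]

def polygon_area(poly):
    """Signed shoelace area, as in (9)."""
    n = len(poly)
    return 0.5 * sum(poly[i][0] * poly[(i + 1) % n][1]
                     - poly[(i + 1) % n][0] * poly[i][1] for i in range(n))

def clip(subject, clipper):
    """Sutherland-Hodgman: intersect `subject` with a convex CCW `clipper`."""
    output = list(subject)
    m = len(clipper)
    for i in range(m):
        a, b = clipper[i], clipper[(i + 1) % m]

        def side(q):
            # Positive when q is to the left of the directed edge a -> b.
            return (b[0] - a[0]) * (q[1] - a[1]) - (b[1] - a[1]) * (q[0] - a[0])

        inp, output = output, []
        for j in range(len(inp)):
            cur, nxt = inp[j], inp[(j + 1) % len(inp)]
            s_cur, s_nxt = side(cur), side(nxt)
            if s_cur >= 0:
                output.append(cur)
            if (s_cur > 0 and s_nxt < 0) or (s_cur < 0 and s_nxt > 0):
                t = s_cur / (s_cur - s_nxt)  # crossing point on cur -> nxt
                output.append((cur[0] + t * (nxt[0] - cur[0]),
                               cur[1] + t * (nxt[1] - cur[1])))
        if not output:
            break
    return output

def occ(cam_i, cam_j):
    """lambda_ij per (10); each camera is a tuple (p, r, theta, alpha)."""
    tri_i = sensing_triangle(*cam_i)
    overlap = clip(tri_i, sensing_triangle(*cam_j))
    if len(overlap) < 3:
        return 0.0
    return abs(polygon_area(overlap)) / abs(polygon_area(tri_i))

cam = ((0.0, 0.0), 10.0, math.radians(60), 0.0)
turned = ((0.0, 0.0), 10.0, math.radians(60), math.radians(30))
print(occ(cam, cam), occ(cam, turned))
```

Under the stated assumption on r, the area of `tri_i` agrees with (11), so dividing by it matches the normalization by s in (10).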

We present a simulation to show how the observation correlation coefficient varies when the deployments of cameras change.

4.1. OCC Influenced by Orientations of Cameras

As shown in Figure 3(a), the camera with the pink projection region is $c_{1}$, and the camera with the yellow projection region is $c_{2}$. We assume that camera $c_{1}$ is fixed at (−L, 0), and its projection bearing is 0°. We let the location of camera $c_{2}$ change, while the distance between $c_{2}$ and the origin (0, 0) remains L. The projection bearing α of camera $c_{2}$ changes from −180° to 180°. The difference of projection bearings between $c_{1}$ and $c_{2}$ is therefore also α. The observation correlation coefficient between $c_{1}$ and $c_{2}$ is illustrated in Figure 3(b).

Figure 3

Illustration of observation correlation coefficient.

From the results in Figure 3(b), we can see that with the increase of difference of projection bearings between the two cameras, the observation correlation coefficient decreases. When the two cameras entirely overlap with each other, the observation correlation coefficient equals 1. It means that the two cameras have the same view towards the environment. When the projection bearings of the two cameras are perpendicular, the OCC decreases to 0.32, which means that the correlations between the images of the two cameras become weak. If the projection bearings of the two cameras are opposite to each other, the OCC becomes 0, from which we can see that the images of the two cameras are totally independent of each other.

4.2. OCC Influenced by the Positions of Cameras

We assume that camera $c_{1}$ is fixed at (5, 5) in the world frame. The orientation of $c_{1}$ is 60°. Then, we fix the orientation of camera $c_{2}$ and vary its position randomly within a 10 × 10 region. Based on the definition of OCC, we compute the value of OCC between $c_{1}$ and $c_{2}$. It is not necessary to exhaustively evaluate every camera in the set, since the OCC between two cameras whose distance from each other is greater than $2 r$ is necessarily zero. We therefore define neighboring sensors as sensors whose distance from each other is less than $2 r$: $c_{j} \text{ is } \begin{cases} \text{a neighboring sensor of } c_{i}, & d_{i j} < 2 r \\ \text{not a neighboring sensor of } c_{i}, & \text{otherwise}, \end{cases}$ (12) where $d_{i j} = ∥ p_{i} - p_{j} ∥ = \sqrt{(x_{i}^{c} - x_{j}^{c})^{2} + (y_{i}^{c} - y_{j}^{c})^{2}}$, $p_{i}$ and $p_{j}$ are the positions of camera $c_{i}$ and camera $c_{j}$, respectively, and $(x_{i}^{c}, y_{i}^{c})$, $(x_{j}^{c}, y_{j}^{c})$ are their corresponding coordinates in the world frame. Therefore, the OCC only needs to be computed when $c_{2}$ is a neighboring sensor of $c_{1}$; otherwise, the OCC is zero.
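The neighbor test in (12) is a plain distance check and can serve as a cheap filter before any polygon intersection is computed. In the sketch below, the function names and the dictionary of camera positions are illustrative assumptions.

```python
import math

def is_neighbor(p_i, p_j, r):
    """True if two cameras are neighboring sensors in the sense of (12)."""
    return math.dist(p_i, p_j) < 2 * r

def neighbors(i, positions, r):
    """Ids of all cameras that are neighboring sensors of camera i."""
    return [j for j in positions if j != i and is_neighbor(positions[i], positions[j], r)]

# Hypothetical deployment with sensing radius r = 3.
positions = {"c1": (5.0, 5.0), "c2": (8.0, 9.0), "c3": (20.0, 20.0)}
print(neighbors("c1", positions, 3.0))  # ['c2']: distance 5 < 6, while c3 is too far
```

Only the cameras returned by such a filter need their OCC with $c_{i}$ evaluated; for all others the OCC is zero by construction.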

Figure 4 illustrates the estimated results of observation correlation coefficient between the two cameras when the orientations of $c_{2}$ are 75°, 135°, and 225°, respectively. We find that (1) different locations of cameras will generate different OCCs, in spite of the same orientations of cameras; (2) when the positions of cameras are identical, the OCC could be different due to the difference of orientation of cameras. Therefore, positions and orientations of cameras are the two determining factors for the value of OCC.

Figure 4

OCC value in different deployments of cameras: (a) orientation of $c_{2}$ is 75°; (b) orientation of $c_{2}$ is 135°; (c) orientation of $c_{2}$ is 225°.

In WVSN, as long as the region of interest is specified and the locations and sensing directions of the cameras are estimated, the observation correlation coefficient between cameras with overlapping fields of view can be obtained as introduced above. The proposed observation correlation coefficient is much easier to obtain than the entropy correlation coefficient [11]. The more correlated cameras $c_{1}$ and $c_{2}$ are, the larger the observation correlation coefficient between them. Therefore, the observation correlation coefficient can help find the most correlated cameras.

5. Camera Activation Scheme for Target Tracking

In this section, we extend our study to design a correlation-based scheme that activates cameras for tracking a target. In general, sensors in sensor networks are deployed in large numbers with redundancy, and WVSN is no exception. Therefore, not every camera node in the WVSN needs to be active for sensing and communication all the time. The principle of the scheme is that the camera nodes whose observed images are more correlated with each other should be activated to take part in target tracking. The degree of correlation between the images is given by the observation correlation coefficient discussed in Section 4. Once the OCC is given, the cameras that need to be activated can be determined. The cruxes of the camera activation scheme lie in the following: (1) if one camera detects a target, how should additional cameras be activated to cooperatively locate it in the world frame? (2) If the target moves out of the fields of view of the activated cameras, how should the group of activated cameras be renewed to keep the target in view of the wireless visual sensor network?

5.1. Buildup of the Group of Active Cameras

If a camera detects a target in its image, it has to activate other cameras to locate the target. In theory, two cameras that capture the target simultaneously suffice to obtain the coordinates of the target in the world frame; however, the accuracy of target localization and tracking can be gradually improved by involving more cameras. On the other hand, gathering measurements from multiple camera nodes and transmitting these measurements consumes more energy and occupies more bandwidth of the wireless visual sensor network. Therefore, in order to balance the tradeoff between the tracking quality and the amount of measurements, we select three camera sensors to be activated, because three is the minimum number of cameras for which the statistical method discussed in Section 3.4 can be used. We then also use the statistical method described in [22] to fuse the measurements and obtain the location of the target in the world frame.

Once the target has been detected by camera $c_{i}$, the next task is to activate more cameras to track the target. We select the two additional cameras that are most correlated with the camera that has detected the target; that is, we activate the two cameras that have the maximum observation correlation coefficients with camera $c_{i}$: $(c_{m}, c_{n}) = \arg \max_{c_{m}, c_{n} \in C} \{λ_{n i}, λ_{m i}\} .$ (13)

We cannot guarantee that a camera with maximum OCC actually has the target in its field of view. Therefore, a check has to be made: can the two cameras with maximum OCC successfully extract the target from their images? If so, the target is located in the overlapping fields of view of the selected cameras, and the target location in the world frame can be estimated using the epipolar geometry discussed in Section 3. If not, we have to activate other cameras until all three cameras can extract the target from their images. This activation process also follows the principle of waking the sensors with the maximum observation correlation coefficient. It is illustrated by the following example.

In Figure 5, $c_{1}$ , $c_{2}$ , $c_{3}$ , $c_{4}$ , $c_{5}$ represent the five cameras, respectively. The triangle shadows with different color mean the sensing regions of the cameras. The colored polygons show the overlap area between the corresponding cameras' sensing region: the blue polygon is generated by $c_{1}$ and $c_{2}$ , the green one is by $c_{1}$ and $c_{3}$ , the red one is by $c_{1}$ and $c_{4}$ , and the black one is by $c_{1}$ and $c_{5}$ .

Figure 5

Overlaps between cameras.

If camera $c_{1}$ detects the target T, we compute the OCC between $c_{1}$ and its neighboring sensors $\{c_{2}, c_{3}, c_{4}, c_{5}\}$. According to expression (10), $λ_{12} = 0.83$, $λ_{13} = 0.15$, $λ_{14} = 0.45$, and $λ_{15} = 0.39$. Thus, $c_{2}$ and $c_{4}$, which have the maximum OCC with $c_{1}$, are selected to be activated first. However, the target T is located outside the field of view of camera $c_{4}$, so we have to activate another camera to take part in the target localization. Among the remaining cameras other than $c_{2}$ and $c_{4}$, camera $c_{5}$ has the maximum OCC with camera $c_{1}$ and can also project the target in its image. Therefore, if $c_{1}$ has detected the target, cameras $c_{2}$ and $c_{5}$ are activated to cooperate with $c_{1}$ in locating the target in the world frame.
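The selection logic of this example can be sketched as a greedy pass over the neighbors in decreasing order of OCC. In the scheme described above, a camera is woken and then checked for whether it can extract the target; the sketch collapses that check into a lookup table `sees_target`, which is an assumption made purely for illustration, as are the function name `select_group` and the visibility value chosen for $c_{3}$.

```python
def select_group(reference, occ, sees_target, group_size=3):
    """Pick up to group_size - 1 helper cameras for the reference camera.

    occ: dict mapping camera id -> OCC with the reference camera.
    sees_target: dict mapping camera id -> True if that camera can extract
                 the target from its image (stand-in for the wake-and-check step).
    """
    helpers = []
    for cam in sorted(occ, key=occ.get, reverse=True):
        if len(helpers) == group_size - 1:
            break
        if occ[cam] > 0 and sees_target[cam]:
            helpers.append(cam)
    return [reference] + helpers

# OCC values from the example; c4 cannot see the target, c5 can.
occ = {"c2": 0.83, "c3": 0.15, "c4": 0.45, "c5": 0.39}
sees = {"c2": True, "c3": False, "c4": False, "c5": True}  # value for c3 is assumed
print(select_group("c1", occ, sees))  # ['c1', 'c2', 'c5']
```

If fewer than two neighbors can see the target, the function simply returns a smaller group rather than forcing the predefined size.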

Note that when the camera nodes are sparsely deployed in the area of interest, fewer than three cameras may be able to capture the target simultaneously at some times; that is, the target is not always under surveillance by three cameras. In such cases, we activate the available cameras that can capture the target rather than always activating the predefined number of cameras.

5.2. Renew the Group of Active Cameras

Once the target moves out of the fields of view of the activated cameras, we have to activate other cameras to keep the target under continuous surveillance. In this section, we address the problem of renewing the group of active cameras for target tracking. In the activation scheme proposed above, three cameras should be active. The camera that initially detects the target is called the reference sensor of the active camera group; in Figure 5, this is camera $c_{1}$. Among the cameras that can view the target, the one that has the largest OCC with the reference sensor is called the primary sensor; in Figure 5, this is camera $c_{2}$. The one that has the second largest OCC with the reference sensor is called the secondary sensor; in Figure 5, this is camera $c_{5}$.

If the target moves out of the field of view of the reference sensor, the sensor that has the largest OCC with the primary sensor is activated to join the active camera group for target localization and tracking. If the primary sensor loses the view of the target, we activate the sensor that has the largest OCC with the secondary sensor. If the secondary sensor cannot extract the target from its image, we activate the sensor that has the second largest OCC with the primary sensor. When two cameras of the active camera group lose the view of the target simultaneously, the cameras that have the maximum OCC with the remaining active camera are activated until three cameras are in the active mode.

The details of the activation scheme are presented in the form of Pseudocode 1.

Pseudocode 1

Begin: $t = 0$; all the cameras are in the group $C$

The camera sensor that has detected the target is set as the reference node $c_{r}$

Find the two cameras $c_{m}, c_{n} \in C - \{c_{r}\}$ with the largest OCC values $λ_{mr}$ and $λ_{nr}$, and activate them

If $λ_{mr} \geq λ_{nr}$

$c_{m}$ is the primary sensor, and $c_{n}$ is the secondary sensor

Else

$c_{n}$ is the primary sensor, and $c_{m}$ is the secondary sensor

End if

Set the updated camera group $C^{'} = C - \{c_{r}, c_{m}, c_{n}\}$

While ($c_{m}$ or $c_{n}$ cannot view the target, and $C^{'}$ is not empty)

{

Find $c^{'} = \arg \max_{c^{'} \in C^{'}} λ(c_{r}, c^{'})$, and activate it

Set $C^{'} = C^{'} - \{c^{'}\}$

Set $c_{m} = c^{'}$ or $c_{n} = c^{'}$, replacing the camera that cannot view the target

}

Locate the target in the world frame $(x_{t}, y_{t})$ by the active camera group $\{c_{r}, c_{m}, c_{n}\}$

For $t = t + \text{time step}$ to $t_{\max}$

While (any camera in the active group loses the view of the target)

{

Find $c^{'} = \arg \max_{c^{'} \in C^{'}} λ(c_{*}, c^{'})$, and activate it, where $c_{*} = c_{n}$ if the camera that loses the target is $c_{m}$ and $c_{*} = c_{m}$ otherwise

Set $C^{'} = C^{'} - \{c^{'}\}$

If the camera that loses the view of the target is $c_{r}$

$c_{r} = c_{m}, c_{m} = c^{'}, c_{n} = c_{n}$

Else if the camera that loses the target is $c_{m}$

$c_{r} = c_{r}, c_{m} = c_{n}, c_{n} = c^{'}$

Else (the camera that loses the target is $c_{n}$)

$c_{r} = c_{r}, c_{m} = c_{m}, c_{n} = c^{'}$

End if

}

Update the position of the target $(x_{t}, y_{t})$ by the active camera group $\{c_{r}, c_{m}, c_{n}\}$

End for
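The renewal step of Pseudocode 1 can be illustrated with a short Python sketch. It is written under several assumptions that are not part of the paper: the names `pick_replacement` and `reassign_roles` are hypothetical, pairwise OCC values are stored in a dictionary keyed by unordered camera pairs, visibility of the target is given as an input set, and the "second maximum OCC with the primary sensor" is interpreted as the largest OCC among cameras that are not yet active. All numeric values below are made up for illustration.

```python
from typing import Dict, FrozenSet, Optional, Set, Tuple

Group = Tuple[str, str, str]  # (reference, primary, secondary)


def pick_replacement(group: Group, lost: str,
                     occ: Dict[FrozenSet[str], float],
                     standby: Set[str],
                     can_see_target: Set[str]) -> Optional[str]:
    """Choose the standby camera to wake up when `lost` loses the target."""
    _, primary, secondary = group
    # Primary lost: rank by OCC with the secondary sensor.
    # Reference or secondary lost: rank by OCC with the primary sensor.
    anchor = secondary if lost == primary else primary
    candidates = sorted(c for c in standby if c in can_see_target)
    if not candidates:
        return None  # sparse deployment: nothing left to activate
    return max(candidates, key=lambda c: occ.get(frozenset((anchor, c)), 0.0))


def reassign_roles(group: Group, lost: str, new: str) -> Group:
    """Apply the role updates listed in Pseudocode 1."""
    ref, primary, secondary = group
    if lost == ref:
        return (primary, new, secondary)
    if lost == primary:
        return (ref, secondary, new)
    if lost == secondary:
        return (ref, primary, new)
    raise ValueError(f"{lost} is not in the active group")


# Hypothetical OCC values between active and standby cameras.
occ = {
    frozenset(("c2", "c3")): 0.20, frozenset(("c2", "c4")): 0.60,
    frozenset(("c5", "c3")): 0.70, frozenset(("c5", "c4")): 0.10,
}
group = ("c1", "c2", "c5")
new_cam = pick_replacement(group, "c1", occ, {"c3", "c4"}, {"c3", "c4"})
print(reassign_roles(group, "c1", new_cam))
```

In this made-up configuration, losing the reference sensor $c_{1}$ promotes $c_{2}$ to reference and brings in $c_{4}$, the standby camera with the largest OCC with the old primary sensor. Separating the choice of the replacement from the role update keeps each rule of Section 5.2 visible on its own.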

5.3. Camera Activation Scheme for Multiple Target Tracking

In practical applications, more than one target is likely to be present in the region of interest. The camera activation scheme has to be adjusted to meet the requirements of multiple-target tracking. In the case of multiple targets, we activate cameras target by target.

If n targets are detected by n different cameras, every reference node has only one target in its FOV, and the buildup of the active camera group is almost the same as that for single-target tracking. The cameras that have the maximum observation correlation coefficients with the camera that captures the target are selected for activation. Since some of these cameras may have multiple targets in view, we use the statistical method discussed in Section 3.4 to match the corresponding targets for localization. In the phase of renewing the active cameras, if a camera in the active group that has only one target in its view loses the target, the scheme for activating additional cameras is no different from that discussed in Section 5.2. When the target loss occurs in a camera that has multiple targets in its view, we must determine which target has moved out of the FOV. As discussed in Section 3.4, the statistical method lets us distinguish the corresponding active group for every target. Thus, if the camera that loses the target is the primary sensor, we activate the camera having the largest OCC with the secondary sensor for the corresponding target. If the camera that loses the target is the secondary sensor, the camera that has the second largest OCC with the primary sensor is activated for the corresponding target.

It is also possible that some cameras detect more than one target at the beginning, which means that some reference nodes have more than one target in their views. In this case, the buildup of the active camera group differs from the scheme for single-target tracking. The selection of the primary sensor and the secondary sensor is divided into two steps: (1) among the neighboring sensors, find all the sensors that capture the same number of targets as the reference node; (2) among these, select the two cameras that have the maximum OCC with the reference node for activation. Of the two, the camera that has the larger OCC with the reference node is the primary sensor, and the other is the secondary sensor. Thus, the active camera group for a camera that has more than one target in its view is established. Furthermore, the targets can be matched by using the statistical method described in Section 3.4. In the phase of renewing the active camera group, since the targets can be distinguished from each other, the active camera group for each target can be renewed by the method discussed in Section 5.2.
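The two-step selection for a reference node with several targets in view can be sketched as follows. This is a minimal illustration, assuming that each neighbor reports how many targets it currently captures; the function name `select_partners` and all numeric values are hypothetical.

```python
from typing import Dict, Optional, Tuple


def select_partners(ref_target_count: int,
                    neighbours: Dict[str, Tuple[float, int]]
                    ) -> Tuple[Optional[str], Optional[str]]:
    """Return (primary, secondary) for a multi-target reference node.

    neighbours maps a camera name to (OCC with the reference node,
    number of targets that camera captures).
    """
    # Step 1: keep neighbours that capture as many targets as the reference.
    matching = {name: occ for name, (occ, count) in neighbours.items()
                if count == ref_target_count}
    # Step 2: among them, take the two with the largest OCC.
    ranked = sorted(matching, key=matching.get, reverse=True)
    primary = ranked[0] if len(ranked) >= 1 else None
    secondary = ranked[1] if len(ranked) >= 2 else None
    return primary, secondary


# Hypothetical neighbourhood of a reference node that sees two targets.
neighbours = {"c2": (0.80, 2), "c3": (0.90, 1), "c4": (0.35, 2), "c5": (0.60, 2)}
print(select_partners(2, neighbours))
```

Here $c_{3}$ has the largest OCC but captures only one target, so it is filtered out in step (1), and $c_{2}$ and $c_{5}$ become the primary and secondary sensors.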

6. Performance Evaluations

In this section, we present the results of simulations performed to examine the performance of the proposed camera activation scheme. To reduce the simulation complexity, we deploy ten camera sensors in a 10 m × 10 m 2D field. The positions of the cameras are shown as the red points in Figure 6, and the blue lines show the bearings of the cameras. All the cameras have an FOV of 57.4° and a sensing radius of 10 m. One target moves in the field along the red line in Figure 6. All ten cameras continuously record images of the field. Figure 7 shows the images of the ten cameras at 4 s. Because the process is simulated on a computer, we fix the time step at 1 second. Note that in a real deployment the processing period is determined by the resources of the camera node.
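To make the simulation setup concrete, the following Python sketch tests whether a target lies inside a camera's sensing region. It assumes that the region is a circular sector centered on the camera's bearing, with the 57.4° FOV and 10 m radius given above; the function name `in_field_of_view` and the coordinates are hypothetical, and the exact sensing model is the one defined in Section 3.2.

```python
import math
from typing import Tuple


def in_field_of_view(cam_xy: Tuple[float, float], bearing_deg: float,
                     target_xy: Tuple[float, float],
                     fov_deg: float = 57.4, radius_m: float = 10.0) -> bool:
    """Sector test: within radius_m and within fov_deg / 2 of the bearing."""
    dx = target_xy[0] - cam_xy[0]
    dy = target_xy[1] - cam_xy[1]
    dist = math.hypot(dx, dy)
    if dist > radius_m:
        return False
    if dist == 0.0:
        return True  # target at the camera position (degenerate case)
    angle = math.degrees(math.atan2(dy, dx))
    # Signed angular difference wrapped into [-180, 180).
    diff = (angle - bearing_deg + 180.0) % 360.0 - 180.0
    return abs(diff) <= fov_deg / 2.0


# Hypothetical camera at the origin looking along the field diagonal.
print(in_field_of_view((0.0, 0.0), 45.0, (5.0, 5.0)))
print(in_field_of_view((0.0, 0.0), 45.0, (5.0, 0.0)))
```

A check of this kind is what decides, at each time step, which cameras are candidates for activation.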

Figure 6

The deployment of cameras and the trajectory of the target.

Figure 7

Images taken by the ten cameras at 4 s.

We take the utility-based sensor selection method [10] as a reference. In this method, the minimum MSE of the best linear estimate of the target location, which is a function of the camera orientations, is used as a measure of localization error. The optimal cameras are then activated by finding the camera orientations that minimize this metric.

Figure 8 shows the error performance of the reference method and our proposed activation scheme. The red line and the blue line show the target tracking errors of the proposed activation scheme and the reference method, respectively. At each time step, the proposed scheme tracks the target more accurately than the reference method. For example, at 4 s, the error of the proposed scheme is 0.26 m, whereas the error of the reference method is 0.48 m. The reason is that when $c_{1}$ has captured the target, our scheme activates $c_{6}$ and $c_{9}$, which have the largest OCC with $c_{1}$, to track the target together with $c_{1}$, whereas the reference method selects $c_{1}$, $c_{2}$, and $c_{4}$. Since $c_{2}$ and $c_{4}$ are not very far from the object, the estimate of the target center is sensitive to noise in the two cameras' images, and even a small change in the lighting conditions might influence the final results. The estimate of the target center therefore contains more error, and the tracking accuracy decreases accordingly. In other words, the distance between the target and the cameras alone is not a sufficient criterion for deciding which cameras to activate for target tracking.

Figure 8

Tracking performances of the proposed activation scheme and the reference method.

Comparing the results of the proposed scheme with those of the reference method, we find that in both cases the tracking errors vary irregularly. This is consistent with the fact that images reflect the state of the environment at the moment they are taken, so the measurement errors cannot be predicted or predefined as the environment changes.

We also compare our proposed activation scheme with the cluster-based scheme [19]. In the cluster-based scheme, if a camera detects a target, the measurements from all the other cameras that have the target in their fields of view are used to estimate the position of the target. Figure 9 shows that the error performance of our proposed activation scheme is close to that of the cluster-based scheme.

Figure 9

Tracking performances of the proposed activation scheme and the cluster-based method.

Figure 10 shows the number of cameras that take part in target tracking at every time step under the two methods. The cluster-based method clearly involves more cameras in locating the target than our proposed activation method. At some time steps, such as 6 s, 7 s, and 10 s, the two methods activate the same cameras, because at those time steps three or fewer cameras can capture the target simultaneously, and we activate all the available cameras that view the target. From Figures 9 and 10, we can see that the proposed camera activation scheme achieves a better tradeoff between camera usage and the quality of target tracking.
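The relation between the two camera counts can be written down directly: the cluster-based method uses every camera that sees the target, while the proposed scheme uses at most three of them. The sketch below illustrates only this counting rule; the function name `cameras_used` and the per-step visibility counts are made up and are not the data behind Figure 10.

```python
from typing import List, Tuple


def cameras_used(visible_per_step: List[int],
                 group_size: int = 3) -> Tuple[List[int], List[int]]:
    """Return (proposed, cluster_based) camera counts per time step."""
    # Proposed scheme: at most group_size cameras, fewer if fewer see the target.
    proposed = [min(v, group_size) for v in visible_per_step]
    # Cluster-based scheme: every camera that sees the target takes part.
    cluster_based = list(visible_per_step)
    return proposed, cluster_based


# Hypothetical number of cameras that see the target at four time steps.
proposed, cluster_based = cameras_used([5, 4, 3, 2])
print(proposed, cluster_based)
```

The two counts coincide exactly at the steps where three or fewer cameras see the target, which is the behavior described for 6 s, 7 s, and 10 s.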

Figure 10

Number of activated cameras at every time step.

7. Conclusions

With the goal of accurately tracking a target, we have presented a camera sensor activation scheme for wireless visual sensor networks. By studying the sensing model and the deployment of cameras in the network, we proposed an observation correlation coefficient to describe the correlation between images observed by different cameras. The observation correlation coefficient is used to activate the most correlated cameras. To handle the case in which more than one target exists in the region of interest, we provided a statistical method for matching the targets during localization and tracking, and we presented the corresponding sensor activation scheme for multiple-target tracking. Simulations show that the proposed activation scheme produces more accurate target tracking estimates than the reference method. The proposed scheme therefore serves as a useful tool for activating cameras to track a target.

Acknowledgments

This work was supported by “the Fundamental Research Funds for the Central Universities” and “Doctoral Fund of Ministry of Education of China.”

References

1. Zhang. Sensor selection for improving accuracy of target localization in wireless visual sensor networks. IET Wireless Sensor Systems, vol. 2, no. 4, pp. 293–301, 2012.

2. Charfi, Wakamiya, Murata. Challenging issues in visual sensor networks. IEEE Wireless Communications, vol. 16, no. 2, pp. 44–49, 2009. doi:10.1109/MWC.2009.4907559

3. Liu, Reich, Zhao. Collaborative in-network processing for target tracking. Eurasip Journal on Applied Signal Processing, vol. 2003, no. 4, pp. 378–391, 2003. doi:10.1155/S111086570321204X

4. Akyildiz I. F., Melodia, Chowdury K. R. Wireless multimedia sensor networks: a survey. IEEE Wireless Communications, vol. 14, no. 6, pp. 32–39, 2007. doi:10.1109/MWC.2007.4407225

5. Atia G. K., Veeravalli V. V., Fuemmeler J. A. Sensor scheduling for energy-efficient target tracking in sensor networks. IEEE Transactions on Signal Processing, vol. 59, no. 10, pp. 4923–4937, 2011.

6. Sharma. Camera scheduling and energy allocation for lifetime maximization in user-centric visual sensor networks. IEEE Transactions on Image Processing, vol. 19, no. 8, pp. 2042–2055, 2010. doi:10.1109/TIP.2010.2046794

7. Soro, Heinzelman W. B. On the coverage problem in video-based wireless sensor networks. Proceedings of the 2nd International Conference on Broadband Networks (BROADNETS '05), October 2005, pp. 932–939. doi:10.1109/ICBN.2005.1589704

8. Cai, Lou X. Y. Energy efficient target-oriented scheduling in directional sensor networks. IEEE Transactions on Computers, vol. 58, no. 9, pp. 1259–1274, 2009. doi:10.1109/TC.2009.40

9. Alaei, Barcelo-Ordinas J. M. Priority-based node selection and scheduling for wireless multimedia sensor networks. Proceedings of the 6th Annual IEEE International Conference on Wireless and Mobile Computing, Networking and Communications (WiMob '2010), October 2010, pp. 151–158. doi:10.1109/WIMOB.2010.5644981

10. Ercan, Yang, Gamal A. E., Guibas. Optimal placement and selection of camera network nodes for target localization. Proceedings of the Conference on Distributed Computing in Sensor Systems, 2006, pp. 389–404.

11. Pluim J. P. W., Maintz J. B. A., Viergever M. A. Mutual-information-based registration of medical images: a survey. IEEE Transactions on Medical Imaging, vol. 22, no. 8, pp. 986–1004, 2003. doi:10.1109/TMI.2003.815867

12. Vuran M. C., Akan Ö. B., Akyildiz I. F. Spatio-temporal correlation: theory and applications for wireless sensor networks. Computer Networks, vol. 45, no. 3, pp. 245–259, 2004. doi:10.1016/j.comnet.2004.03.007

13. Zhao, Liu, Guibas, Reich. Collaborative signal and information processing: an information-directed approach. Proceedings of the IEEE, vol. 91, no. 8, pp. 1199–1209, 2003. doi:10.1109/JPROC.2003.814921

14. Chu, Haussecker, Zhao. Scalable information-driven sensor querying and routing for ad hoc heterogeneous sensor networks. International Journal of High Performance Computing Applications, vol. 16, no. 3, pp. 293–313, 2002.

15. Chong E. K. P. Sensor scheduling for target tracking: a Monte Carlo sampling approach. Digital Signal Processing, vol. 16, no. 5, pp. 533–545, 2006. doi:10.1016/j.dsp.2005.02.005

16. Kreucher, Kastella, Hero. Sensor management using an active sensing approach. IEEE Transactions on Signal Processing, vol. 85, no. 3, pp. 607–624, 2005.

17. Toh Y. K., Xiao, Xie. A wireless sensor network target tracking system with distributed competition based sensor scheduling. Proceedings of the International Conference on Intelligent Sensors, Sensor Networks and Information Processing (ISSNIP '07), December 2007, pp. 257–262. doi:10.1109/ISSNIP.2007.4496853

18. Medeiros, Park, Kak A. C. Distributed object tracking using a cluster-based Kalman filter in wireless camera networks. IEEE Journal on Selected Topics in Signal Processing, vol. 2, no. 4, pp. 448–463, 2008. doi:10.1109/JSTSP.2008.2001310

19. Song, Kamal A. T., Soto. Tracking and activity recognition through consensus in distributed camera networks. IEEE Transactions on Image Processing, vol. 19, no. 10, pp. 2564–2578, 2010.

20. Dai, Akyildiz I. F. A spatial correlation model for visual information in wireless multimedia sensor networks. IEEE Transactions on Multimedia, vol. 11, no. 6, pp. 1148–1159, 2009. doi:10.1109/TMM.2009.2026100

21. Yilmaz, Javed, Shah. Object tracking: a survey. ACM Computing Surveys, vol. 38, no. 4, article 13, 2006. doi:10.1145/1177352.1177355

22. Portilla, Moreno, Liang, Riesgo. Improving target localization accuracy of wireless visual sensor networks. Proceedings of the 37th IEEE Industrial Electronics Conference, Melbourne, Australia, November 2011.

23. Massey, Kapur, Dabiri L. N., Sarrafzadeh. Localization using low-resolution optical sensors. Proceedings of the IEEE International Conference on Mobile Adhoc and Sensor Systems (MASS '07), Pisa, Italy, October 2007. doi:10.1109/MOBHOC.2007.4428621

24. Zhou. Computational Geometry Algorithm Design and Analysis. Tsinghua University Press, 2008.

Camera Sensor Activation Scheme for Target Tracking in Wireless Visual Sensor Networks
