Abstract
1. Introduction
Wireless visual sensor networks (WVSNs) include cameras to capture visual data from the environment and processing components to process the data locally in a desired way [1]. Compared with wireless scalar sensor networks, a WVSN can provide information-rich descriptions of captured events, and it has therefore been adopted in various security and surveillance applications, such as remote and distributed video-based surveillance, environmental monitoring, and ambient-assisted living and personal care [2]. In these surveillance applications, target tracking is one of the major issues: the goal is to determine the location of a possibly moving target accurately and in the least amount of time. For example, an accurate and timely determination of vehicle location is required for battlefield situational awareness [3].
Target tracking in WVSN involves a number of significant challenges. Firstly, WVSN is a resource-limited network. Its processing and transmission capabilities are modest: bandwidth may be insufficient for transmitting all the raw data from camera sensors to the central node, and analyzing a huge amount of visual data is difficult. Consequently, existing tracking algorithms cannot be applied directly to target tracking in WVSN. Secondly, WVSN is an energy-limited network. Sensor nodes are typically battery powered, and it may not be feasible to replace or recharge the batteries of sensors in many remote sensing applications. As a result, limited energy is an important characteristic of sensor networks. In particular, compared with scalar sensing, image processing and transmission consume considerable energy, in spite of the fact that the content of interest in each frame captured by the visual node is very high. Therefore, energy-efficient operation is crucial in WVSN. Thirdly, WVSN is a low-cost network. Visual nodes are usually equipped with low-resolution cameras due to cost limitations [4]. A target therefore occupies fewer pixels in WVSN than in other camera networks, which increases the difficulty of target extraction, localization, and tracking.
To address the above challenges, the key is to balance the tradeoff between the value of information in the measurements and the resources of WVSN [5]. In this paper, we follow the information-driven sensor querying framework in which camera sensors are selectively activated based on their correlations. Because of constraints on resources, energy consumption, and low resolutions of cameras, we consider a group-based target tracking scheme, in which a small group of sensors is active while the rest of the network is idle. A camera which has detected the target wakes up the other cameras most correlated with it to compose a group of active cameras for locating and tracking the target. If any active camera loses the view of the target, it goes back to sleep, and a new camera is activated to renew the active group based on the correlation computation. The crux of the group-based target tracking scheme is the sensor activation approach, which must be both accurate and fast: the activated cameras should provide the most accurate location information of the target, and the activation should be cheap enough for the WVSN to operate in real time.
In this paper, we address the problem of camera sensor activation for tracking a target in WVSN. Motivated by the fact that the image observed by a camera is directly related to its field of view, we design an observation correlation coefficient to evaluate the correlation between two cameras based on the sensing model rather than by analyzing the specific images of the cameras. Using the observation correlation coefficient, the most correlated cameras are selected to be activated for locating the target in the world frame. Our main contributions include the following:
(1) we design the observation correlation coefficient to describe the correlation between images observed by cameras with overlapping fields of view. By this method, the large amount of image-processing computation otherwise needed to find correlated cameras is avoided;
(2) a small group of cameras is involved in the target tracking rather than all the cameras viewing the same target, which is beneficial for saving cost, computation, and communication.
The remainder of this paper is organized as follows: in Section 2, we briefly highlight the related works. Section 3 presents assumptions and preliminaries. We also present the sensing model of the cameras and the model of target localization by two cameras. Section 4 introduces the observation correlation coefficient. The proposed camera sensor activation scheme is introduced in Section 5. Section 6 conducts experiments to validate and evaluate the proposed scheme, and conclusions are given in Section 7.
2. Related Works
Strategies of sensor scheduling for optimizing network lifetime in wireless sensor networks have been previously considered in the literature. Yu et al. develop a camera scheduling strategy to maximize the lifetime of visual sensor networks [6]. Soro et al. provide a heuristic approach for camera scheduling by proposing a cost function associated with each camera that depends on the remaining energy of the camera and the coverage geometry [7]. Both works rely on ceiling-mounted cameras, which are impractical to deploy. Cai et al. organize the directions of sensors into a group of nondisjoint cover sets, where one cover set whose directions cover all the targets is activated at a time to extend the network lifetime [8]. Alaei et al. provide a priority-based sensor scheduling strategy that coordinates clustered sensors so as to awaken the minimum number of sensors needed to monitor the area of interest [9]. Given the noisy measurements and the object prior distribution, Ercan et al. use the minimum mean square error (MSE) of the best linear estimate of the object location in 2D as a metric for sensor selection [10]. Besides, an entropy-based analytical framework has been developed to measure the amount of visual information provided by multiple cameras in WVSN [11]. This method relies heavily on a joint probability distribution of the two image sources [12], which is difficult to estimate accurately due to the complexity of image contents and the difficulty of image modeling [11]. The information-driven sensor scheduling technique is introduced in [13, 14]; it activates the next sensor by maximizing information utility and minimizing resource cost.
In the context of sensor scheduling for target tracking, some approaches have been developed for optimizing tracking performance subject to constraints on sensor usage. Ying et al. develop a Monte Carlo solution method to address the problem of sensor scheduling for target tracking, which is formulated as a partially observable Markov decision process [15]. Kreucher et al. adopt an active sensing approach to scheduling sensors for multiple target tracking applications that combines particle filtering, predictive density estimation, and relative entropy maximization [16]. Toh et al. develop a distributed target tracking system implementing a novel competition-based distributed sensor scheduling scheme, in which the candidate sensor node with the highest predicted tracking accuracy is elected as the new tasking sensor with the highest probability [17]. This prior research is mainly in the area of wireless sensor networks. However, these existing sensor scheduling algorithms cannot solve the problem in visual sensor networks, because visual sensors acquire and process information very differently from conventional sensors.
A distributed target tracking approach using a cluster-based Kalman filter was proposed in [18], where a camera is selected as a cluster head which aggregates all the measurements in the communication range to estimate its position using a Kalman filter and sends the estimation to the central unit for tracking the target. Song et al. [19] also proposed a consensus method to track the target in a camera network. In their work, if a camera detects a target, the information from all the other cameras viewing the same target is used for fusing. From the aforementioned related works, we can observe that the accuracy of target tracking is limited by the resource of sensor networks in existing sensor scheduling methods. Therefore, in this paper, we propose the sensor activation scheme to balance the tradeoff between the tracking accuracy and resources of WVSN.
3. Preliminaries
3.1. Assumptions
In this paper, all subsequent discussions are based on the following assumptions:
(1) we consider a homogeneous WVSN, which contains only camera sensors, in a two-dimensional field. For convenience, we assume that the camera sensors are placed in a square region; the shape of the region of interest is not essential, however, and it can be circular or any polygon;
(2) we assume that all the cameras are placed horizontally around a space. The positions and bearings of the cameras are already known in the world frame;
(3) the WVSN is deployed with sufficient nodes such that the network is connected. Some of the cameras are in the active mode during the period when no target is detected. The other cameras are in the sleep mode and can be switched to the active mode immediately once they receive an awakening signal;
(4) the sleeping sensors can be woken up by external means. The time delay from the sleep mode to the active mode is neglected in this paper.
3.2. Sensing Model
Different from scalar data sensors, cameras project a target from a 3D world to a 2D plane via a perspective point, which can only acquire the bearings of the target in the visual image. Specifically, as shown in Figure 1, a camera has a field of view (FOV) that represents the area on the

Figure 1: Field of view.
It seems that cameras have unlimited sensing range; however, the target, which is described as pixels in the image, cannot be extracted from the background, if the target is located too far away from the camera, especially, when the resolution of camera is not good enough. Thus, for the reason of reducing the complexity, the sensing model of the camera in this paper is expressed by a triangle which is determined by four parameters (

Figure 2: Simplified sensing model.
3.3. The Model of Target Localization
The purpose of target tracking is to generate the trajectory of a target over time by locating its position in every frame of the video [21]. The principle of target localization is to estimate the coordinates of intersection point when the bearings of target in different images are intersected. We set up a world coordinate system
The coordinates of the intersection point of the target bearings in the world coordinate plane can be calculated as follows. We assume that simple background subtraction is performed locally at each camera node. The hull of the target can be extracted. A feature point is selected to represent the target. As shown in Figure 2,
If two cameras capture the same target at the same time, the target bearings generated from two cameras would be intersected. We can infer the coordinate of the target from the known positions of two cameras and the intersected point of bearings of the target. The computation process is described by the following equation:
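As a minimal sketch of this computation, assume that each camera's world-frame position and the target's bearing angle (in radians, measured from the world x-axis) are already known. The function name `intersect_bearings` and its parameters are illustrative and are not the paper's notation. Under those assumptions, the target position is the crossing point of the two bearing rays, which follows from a 2x2 linear system:

```python
import math


def intersect_bearings(cam_a, theta_a, cam_b, theta_b, eps=1e-9):
    """Intersect two bearing rays in the world frame.

    cam_a, cam_b: (x, y) camera positions (assumed known).
    theta_a, theta_b: target bearings in radians from the world x-axis.
    Returns (x, y), or None if the rays are parallel or cross behind a camera.
    """
    dax, day = math.cos(theta_a), math.sin(theta_a)
    dbx, dby = math.cos(theta_b), math.sin(theta_b)
    rx, ry = cam_b[0] - cam_a[0], cam_b[1] - cam_a[1]

    # Solve cam_a + s * d_a = cam_b + t * d_b for s and t (Cramer's rule).
    det = dbx * day - dax * dby
    if abs(det) < eps:
        return None  # parallel bearings: no unique intersection
    s = (dbx * ry - dby * rx) / det
    t = (dax * ry - day * rx) / det
    if s <= 0 or t <= 0:
        return None  # the crossing lies behind at least one camera
    return (cam_a[0] + s * dax, cam_a[1] + s * day)
```

For example, cameras at (0, 0) and (10, 0) with bearings of 45 and 135 degrees give an estimate of approximately (5, 5). Nearly parallel bearings make the determinant small and the estimate sensitive to noise, which is one reason a third camera helps.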
3.4. Multiple Targets Localization
There may be multiple targets in the field of interest. The biggest difference between single-target tracking and multiple-target tracking lies in matching corresponding targets across cameras, which can be addressed by approaches that incorporate features of the targets, such as color, motion, contour, and boundary. From a statistical perspective, the positions of the same target estimated by a set of cameras should converge more closely in the world coordinate system. Following this idea, we apply a statistical method to match corresponding targets in order to locate them in the world coordinate system.
Step 1.
The visual hulls of targets are extracted, and the polygon centers of the target hulls are estimated. The polygon center of the visual hull can be formulated as follows:
Step 2.
One target centre is selected from each camera that captures targets in order to make the correlated target pair. Then, we use the target localization algorithm to generate possible coordinates of the target. If
Step 3.
If we assume that one camera could capture
Step 4.
We compare the MSEs of the possible coordinates and find the minimum one. The mean with the smallest MSE is set as a possible target's position. The centers of visual hulls in the corresponding cameras' images are then removed, so that the next iteration can find the next correlated targets' image pair.
Step 5.
The feasible set of centers of visual hulls of targets is updated, and Steps 2 to 4 are repeated until all the centers are used. By this method, the corresponding target can be matched among the camera set.
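The matching steps above can be sketched as follows. This is an illustration under stated assumptions, not the paper's exact procedure: each camera's hull centers are assumed to be already converted to world-frame bearing angles, candidate positions are taken as the pairwise bearing intersections, and the MSE is taken as the mean squared distance of those candidates from their mean. The names `intersect` and `match_targets` are hypothetical.

```python
import itertools
import math


def intersect(cam_a, th_a, cam_b, th_b):
    """Intersection of two bearing lines, or None if they are parallel."""
    rx, ry = cam_b[0] - cam_a[0], cam_b[1] - cam_a[1]
    det = math.sin(th_a - th_b)
    if abs(det) < 1e-9:
        return None
    s = (math.cos(th_b) * ry - math.sin(th_b) * rx) / det
    return (cam_a[0] + s * math.cos(th_a), cam_a[1] + s * math.sin(th_a))


def match_targets(cameras, bearings):
    """cameras: list of (x, y); bearings: one list of angles per camera.

    Returns a list of (estimated position, MSE) tuples, one per matched target.
    """
    remaining = [list(b) for b in bearings]
    targets = []
    while all(remaining):  # stop once any camera has no unused center
        best = None
        # Step 2: pick one center per camera and triangulate every camera pair.
        for combo in itertools.product(*[range(len(r)) for r in remaining]):
            points = []
            for i, j in itertools.combinations(range(len(cameras)), 2):
                p = intersect(cameras[i], remaining[i][combo[i]],
                              cameras[j], remaining[j][combo[j]])
                if p is not None:
                    points.append(p)
            if not points:
                continue
            # Step 3 (assumed form): mean and MSE of the candidate positions.
            mx = sum(p[0] for p in points) / len(points)
            my = sum(p[1] for p in points) / len(points)
            mse = sum((p[0] - mx) ** 2 + (p[1] - my) ** 2
                      for p in points) / len(points)
            if best is None or mse < best[0]:
                best = (mse, (mx, my), combo)
        if best is None:
            break
        # Step 4: keep the tightest cluster and remove its centers.
        mse, mean, combo = best
        targets.append((mean, mse))
        for cam_idx, obs_idx in enumerate(combo):
            del remaining[cam_idx][obs_idx]
    return targets  # Step 5 is the enclosing while loop
```

The exhaustive enumeration grows exponentially with the number of cameras, so a sketch like this is only practical for the small active groups considered here.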
4. Correlations between Images
In this section, we investigate how to measure the correlations of images between different cameras. Without loss of generality, we are given a set
Intuitively, the visual information provided by multiple cameras should be more correlated if the sensing regions observed by the cameras are closer to each other in the world frame. Conversely, if the images observed by cameras are less correlated, the sensing regions covered by the cameras are more distinct from each other. From this perspective, we can map the problem of estimating the correlation of images onto the problem of computing the correlation of the regions observed by cameras. Since the sensing area of the camera is modeled as a bounded triangular region in this paper, the correlation of observed regions can be reflected by the overlapping fields of view.
We can project the overlap extent of cameras' FOVs to the overlap area of intersections of the sensing model of the cameras. This intersection is a series of triangle intersections; therefore, it will always be a convex polygon [23]. Utilizing the method of computing the union of two convex polygons, which is described in [24], we can distinguish the intersection region and order its vertices. Thus, according to the properties of the polygon, if the vertices are ordered counterclockwise, the area of the convex polygon can be computed by the following expression:
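For a polygon whose vertices (x_i, y_i) are listed counterclockwise, the standard shoelace formula gives the area as half the sum of x_i * y_(i+1) - x_(i+1) * y_i over all edges. The sketch below illustrates this, together with one common way of obtaining the overlap polygon of two triangular sensing regions. Sutherland-Hodgman clipping is used here as a stand-in and is not necessarily the method of [24]; the function names are illustrative.

```python
def polygon_area(vertices):
    """Shoelace formula; counterclockwise vertices give a positive area."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        total += x0 * y1 - x1 * y0
    return 0.5 * total


def _side(a, b, p):
    """Positive when p lies to the left of the directed edge a -> b."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def clip_convex(subject, clip):
    """Intersect polygon `subject` with the convex, counterclockwise polygon `clip`."""
    output = list(subject)
    for i in range(len(clip)):
        if not output:
            break  # nothing left: the polygons do not overlap
        a, b = clip[i], clip[(i + 1) % len(clip)]
        current, output = output, []
        for j in range(len(current)):
            p, q = current[j - 1], current[j]
            sp, sq = _side(a, b, p), _side(a, b, q)
            if (sp < 0) != (sq < 0):
                # Edge p -> q crosses the clip line; add the crossing point.
                t = sp / (sp - sq)
                output.append((p[0] + t * (q[0] - p[0]),
                               p[1] + t * (q[1] - p[1])))
            if sq >= 0:
                output.append(q)  # q is inside (or on) the clip edge
    return output


# Two triangular sensing regions that share the triangle (0,0), (4,0), (2,2).
fov_a = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]
fov_b = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0)]
overlap_area = polygon_area(clip_convex(fov_a, fov_b))
```

Because both sensing regions are convex, the clipped polygon is convex as well and its vertices come out in counterclockwise order, so the area formula can be applied to it directly.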
We design an observation correlation coefficient (OCC)
According to the definition of
The proposed observation correlation coefficient model is derived based on the sensing model and deployment information of cameras. To calculate the correlation between the camera
We present a simulation to show how the observation correlation coefficient varies when the deployments of cameras change.
4.1. OCC Influenced by Orientations of Cameras
As shown in Figure 3(a), the camera with pink projection region is

Figure 3: Illustration of observation correlation coefficient.
From the results in Figure 3(b), we can see that with the increase of difference of projection bearings between the two cameras, the observation correlation coefficient decreases. When the two cameras entirely overlap with each other, the observation correlation coefficient equals 1. It means that the two cameras have the same view towards the environment. When the projection bearings of the two cameras are perpendicular, the OCC decreases to 0.32, which means that the correlations between the images of the two cameras become weak. If the projection bearings of the two cameras are opposite to each other, the OCC becomes 0, from which we can see that the images of the two cameras are totally independent of each other.
4.2. OCC Influenced by the Positions of Cameras
We assume that the camera
Figure 4 illustrates the estimated results of observation correlation coefficient between the two cameras when the orientations of

Figure 4: OCC value in different deployments of cameras: (a) orientation of
In WVSN, as long as the interested region is specified, and the locations and sensing directions of cameras are estimated, the observation correlation coefficient between cameras' overlapped field of views can be obtained as introduced above. It is much easier to obtain the proposed observation correlation coefficient than to get the entropy correlation coefficient [11]. The more cameras
5. Camera Activation Scheme for Target Tracking
In this section, we extend our study to design a correlation-based scheme to activate cameras for tracking a target. In general, sensors are deployed in large numbers with redundancy in sensor networks, and WVSN is no exception. Therefore, not every camera node in the WVSN needs to be active for sensing and communication all the time. The principle of the scheme is that the camera nodes whose observed images are more correlated with each other should be activated to take part in target tracking. The degree of correlation between the images is given by the observation correlation coefficient discussed in Section 4. Once the OCC is given, the cameras that need to be activated can be determined. The camera activation scheme must answer two key questions: (1) if one camera detects a target, how should additional cameras be activated to cooperatively locate it in the world frame? (2) If the target moves out of the fields of view of the activated cameras, how should the group of activated cameras be renewed to keep the target in the view of the wireless visual sensor network?
5.1. Buildup of the Group of Active Cameras
If a camera detects a target in its image, it has to activate other cameras to locate the target. In theory, two cameras that capture the target simultaneously are sufficient to obtain the coordinates of the target in the world frame; however, the accuracy of target localization and tracking can be gradually improved by involving more cameras. On the other hand, gathering measurements from multiple camera nodes and transmitting these measurements consumes more energy and occupies more bandwidth of the wireless visual sensor network. Therefore, in order to balance the tradeoff between the tracking quality and the number of measurements, we select three camera sensors to be activated, because three is the minimum number of cameras for which the statistical method discussed in Section 3.4 can be used. We then use the statistical method described in [22] to fuse the measurements and obtain the location of the target in the world frame.
Once the target has been detected by camera
We cannot guarantee that the camera with the maximum OCC actually has the target in its field of view. Therefore, a check has to be made: can the two cameras with the maximum OCC successfully extract the target from their images? If so, the target is located in the overlapping fields of view of the selected cameras, and its location in the world frame can be estimated by the use of epipolar geometry, discussed in Section 3. If not, we have to activate other cameras until all three cameras can extract the target from their images. This activation process also follows the principle of waking the sensors with the maximum observation correlation coefficient. It will be illustrated by an example.
In Figure 5,

Figure 5: Overlaps between cameras.
If camera
Note that when the camera nodes are sparsely deployed in the area of interest, fewer than three cameras may be able to capture the target simultaneously at some times; that is, the target is not always under surveillance by three cameras. For camera activation, therefore, we activate the available cameras that can capture the target rather than always activating the predefined number of cameras.
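The buildup of the active group can be sketched as a greedy selection. The sketch assumes that the OCC values between the reference camera and its neighbors are precomputed, and that a callable reports whether a woken camera can extract the target from its image. The names `build_active_group`, `occ`, and `sees_target` are hypothetical, and skipping cameras whose OCC is zero is an added assumption.

```python
def build_active_group(reference, occ, sees_target, group_size=3):
    """Greedily build the active camera group around the detecting camera.

    reference:   id of the camera that first detected the target.
    occ:         dict mapping (reference_id, camera_id) -> OCC value.
    sees_target: callable(camera_id) -> bool; stands in for waking the
                 camera and checking whether it extracts the target.
    """
    # Rank the neighbors of the reference camera by decreasing OCC.
    candidates = sorted(
        (cam for (ref, cam) in occ if ref == reference),
        key=lambda cam: occ[(reference, cam)],
        reverse=True,
    )
    group = [reference]
    for cam in candidates:
        if len(group) == group_size:
            break
        # Assumption: a camera with zero OCC shares no sensing area with
        # the reference camera, so it is not worth waking.
        if occ[(reference, cam)] > 0 and sees_target(cam):
            group.append(cam)
    return group  # may hold fewer than group_size cameras when sparse
```

With OCC values of 0.8, 0.6, and 0.3 for cameras `c3`, `c2`, and `c4`, and only `c2` and `c4` able to see the target, the group becomes `["c1", "c2", "c4"]`: `c3` is tried first but rejected. When too few cameras see the target, the function simply returns the smaller group, matching the sparse-deployment case described above.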
5.2. Renew the Group of Active Cameras
Once the target moves out of the field of view of the activated cameras, we have to activate other cameras to keep the target under the incessant surveillance. In this section, we address the problem of renewing the group of active cameras for target tracking. In our proposed activation scheme mentioned above, three cameras should be activated. The camera sensor which detects the target initially is called the reference sensor in the active camera group. For example, in Figure 5, camera
If the target moves out of the field of view of the reference sensor, the sensor which has the biggest OCC with the primary sensor should be activated to join the active camera group for executing the target localization and tracking. If the primary sensor loses the view of the target, we should activate the sensor having the biggest OCC with the secondary sensor. If the secondary sensor cannot extract the target from its image, we have to activate the sensor that has the second maximum OCC with the primary sensor. When two cameras of the active camera group lose the view of the target simultaneously, the camera sensors that have the maximum OCC with the remaining active camera will be activated till there are three cameras in the active mode.
The details of the activation scheme are presented in the form of Pseudocode 1.
Begin
  The camera sensor that has detected the target is set as the reference node
  Find
  If
  Else
  End if
  Set updated camera group
  While (
  {
    Find
    Set
    Set
  }
  Locate the target in the world frame
  For
    While (any camera in the active group loses the view of the target)
    {
      Find
      Set
      If the camera that loses the view of the target is
      Else if the camera that loses the target is
      Else the camera that loses the target is
      End if
    }
    Update the position of the target
  End for
Pseudocode 1
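The renewal rules of Section 5.2 can be sketched in runnable form. This is an interpretation, not a transcription of Pseudocode 1. The group is assumed to be ordered as reference, primary, secondary, and `occ` is assumed to be a nested dictionary of precomputed coefficients. The rule for a lost secondary sensor ("second maximum OCC with the primary sensor") is approximated by taking the highest-ranked camera that is not already in the group. All names are hypothetical.

```python
def renew_active_group(group, lost, occ, sees_target, group_size=3):
    """Replace cameras that lost the target.

    group:       [reference, primary, secondary] (may be shorter when sparse).
    lost:        set of camera ids in `group` that no longer see the target.
    occ:         dict of dicts, occ[a][b] = OCC between cameras a and b.
    sees_target: callable(camera_id) -> bool.
    """
    survivors = [cam for cam in group if cam not in lost]
    if len(survivors) == len(group):
        return survivors  # nobody lost the target
    if not survivors:
        return []  # target lost by every active camera

    # Choose the camera whose OCC ranking drives the replacement.
    if len(survivors) == 1:
        anchor = survivors[0]  # two cameras lost: use the remaining one
    elif group[0] in lost:
        anchor = group[1]      # reference lost: rank by OCC with primary
    elif group[1] in lost:
        anchor = group[2]      # primary lost: rank by OCC with secondary
    else:
        anchor = group[1]      # secondary lost: rank by OCC with primary

    ranked = sorted(occ.get(anchor, {}).items(),
                    key=lambda item: item[1], reverse=True)
    for cam, value in ranked:
        if len(survivors) >= group_size:
            break
        # Skip cameras already in the old group (including the lost ones).
        if cam not in group and value > 0 and sees_target(cam):
            survivors.append(cam)
    return survivors
```

The function is meant to be called at every time step in which some active camera loses the target, after which the target is located again from the renewed group.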
5.3. Camera Activation Scheme for Multiple Target Tracking
In practical applications, it is quite possible that more than one target exists in the region of interest. The camera activation scheme has to be adjusted to satisfy the requirements of multiple-target tracking. In the case of multiple targets, we activate cameras one target at a time.
If
It is also possible that some cameras detect more than one target at the beginning, which means that some reference nodes have more than one target in their views. In this case, the buildup of the active camera group differs from the scheme for single-target tracking. The selection of the primary sensor and the secondary sensor is divided into two steps: (1) find, among the neighboring sensors, all the sensors that capture the same number of targets as the reference node; (2) select for activation the two cameras that have the maximum OCC with the reference node. Of these two sensors, the camera that has the larger OCC with the reference node is the primary sensor, and the other is the secondary sensor. Thus, the active camera group for a camera which has more than one target in its view is established. Furthermore, the target matching can also be obtained by using the statistical method shown in Section 3.4. In the phase of renewing the active camera group, since targets can be distinguished from each other, the active camera group for each target can be renewed by the method discussed in Section 5.2.
6. Performance Evaluations
In this section, we present the results of some simulations which were performed to examine the performance of the proposed camera activation scheme. For the reason of reducing the simulation complexity, we deploy ten camera sensors in a 10 m

Figure 6: The deployment of cameras and the trajectory of the target.

Figure 7: Images taken by the ten cameras at 4 s
We take the utility-based sensor selection method [10] as a reference. In this method, the minimum MSE of the best linear estimate of the target location, which is a function of the camera orientations, is used as a measure of localization error. The optimal cameras are then activated by finding the camera orientations that minimize this metric.
Figure 8 shows the error performance of the reference method and our proposed activation scheme. The red line and the blue line show the errors of target tracking by the proposed activation scheme and the reference method, respectively. In each time step, the accuracy of target tracking in the proposed scheme is better than that in the reference method. For example, at 4 s, our proposed scheme has an error of 0.26 m, whereas the error of target tracking in the reference method is 0.48 m. The reason behind this is that when

Figure 8: Tracking performance of the proposed activation scheme and the reference method.
Comparing the results of the proposed scheme and the results of the reference method, we find that in both cases, the errors of target tracking are irregular. This is in accordance with the fact that the images exhibit the state of the environment at the time of taking images, and the measurement errors cannot be predicted or predefined due to the changes of the environment.
We also compare our proposed activation scheme with the cluster-based scheme [19]. In the cluster-based scheme, if a camera detects a target, the measurements from all the other cameras that have the target in their fields of view are used to estimate the position of the target. Figure 9 shows that the error performance of our proposed activation scheme is close to that of the cluster-based scheme.

Figure 9: Tracking performance of the proposed activation scheme and the cluster-based method.
Figure 10 shows the number of cameras that take part in target tracking at every time step for the two methods. It is apparent that the cluster-based method involves more cameras to locate the target than our proposed activation method. At some time steps, such as 6 s, 7 s, and 10 s, the two methods activate the same cameras. The reason is that at such time steps, three or fewer cameras can capture the target simultaneously, and we activate the available cameras that view the target. From Figures 9 and 10, we can see that the proposed camera activation scheme achieves a better tradeoff between camera usage and the quality of target tracking.

Figure 10: Number of activated cameras at every time step.
7. Conclusions
With the goal of accurately tracking a target, we have presented a camera sensor activation scheme in wireless visual sensor networks. By studying the sensing model and deployment of cameras in the network, we propose an observation correlation coefficient to describe the correlations between images observed by cameras. The observation correlation coefficient is used for activating the most correlated cameras. In order to address the problem that more than one target exists in the interested area, we have provided a statistical method to match the targets for locating and tracking targets. Correspondingly, we also present the sensor activation scheme for multiple target tracking. With the help of simulations, we show that the proposed activation scheme could generate more accurate estimations of target tracking than the reference method. Therefore, the proposed scheme serves as a useful tool for activating cameras to track the target.
