Abstract
Introduction
Image mosaicing techniques have attracted growing attention in many application areas, such as video stabilization and compression, background generation, virtual environments, and panoramic photography [1]. Recently, their application has been extended to video tracking in wide spaces such as airports and buildings. The problem of tracking and recognizing non-rigid objects in video sequences has become crucial in many video surveillance applications. Examples include motion detectors in video recording systems, action analysis for animation, medical imaging, and human-computer interaction (HCI).
Image mosaics can be constructed by aligning and blending partially overlapping images. Various methods have been proposed to register the overlapping areas of two adjacent images, using an 8-parameter perspective transformation [2], a polynomial transformation with higher degrees of freedom [3], or other geometric corrections [4]. However, registration is usually performed over the entire overlapping area, which is often too large for a single global transformation to give an acceptable result. Local corrections therefore have to be introduced to deal with both discontinuities and distortions. This problem of overlap-based registration persists in most commercial software for generating panoramic views. Existing research performs the projective transformation from only four features in two overlapping images [5].
The proposed panoramic video creation module consists of two steps: (i) selection of four quasi-feature points in two adjacent frames acquired by the corresponding cameras of the 4-camera setup and (ii) mosaicing of the two images. The proposed quasi-feature extraction algorithm selects four non-collinear points in the reference image. In creating a panoramic image, the overlapping ratio of the two images is between 30% and 90%. The reference frame is first divided into three areas, and block-based registration is performed in the third area using mean absolute difference (MAD) values. The block size can be selected experimentally; we used 20×10 blocks in our experiments. We additionally search for a quasi-feature point starting from the centroid of the selected block. In this process, an internal block consists of a flat region and a textured region. The four extracted feature points are used to estimate the parameters of the projective transform with the direct linear transformation (DLT) algorithm.
Tracking a deformable object in consecutive frames is a fundamental problem in video surveillance systems. There has been considerable research on video-based object extraction and tracking. One of the simplest methods is to track regions of difference between a pair of consecutive frames [6], and its performance can be improved by using adaptive background generation and subtraction. Although a simple difference-based tracking method is efficient for tracking an object under noise-free conditions, it often fails against a noisy, complicated background. Tracking performance degrades further if the camera moves, either intentionally or unintentionally. For more robust analysis of an object, shape-based object tracking algorithms have been developed; they use a priori shape information about the object of interest and project a trained shape onto the closest shape in a given frame. Methods of this type include the active contour model (ACM), the condensation algorithm, and the active shape model (ASM).
This paper presents an ASM-based, real-time tracking algorithm for locating a deformable object in panoramic images. The proposed tracking algorithm can be regarded as a physical model that allows the system to accurately predict potential changes in the object's shape over time. The detection procedure extracts moving objects by motion segmentation between frames, and the tracking procedure detects moving regions using, for example, optical flow-based data association. The detected region is further refined using ellipsoid localization before the training sets are updated with the smart snake algorithm (SSA). Local structure modeling is carried out using directional regularization operators. The directional regularization reflects the orientation of the edges and finds the dominant edge direction.
The paper is organized as follows. In section 2 we introduce a method to extract quasi-features and to build a panoramic image. The ASM is presented in section 3. Section 4 summarizes experimental results, and section 5 concludes the paper.
Panoramic Image Creation Using Quasi-Features
There are limits to tracking objects in a single image. Therefore, we propose a method that can track objects over a wide background area. The proposed method extracts quasi-feature points and applies the DLT algorithm to build a panoramic image. The overall panoramic video based multiview object tracking procedure is shown in Fig. 1.

Block diagram of the proposed ASM-based tracking algorithm in panoramic video
The quasi-feature is not a geometrical feature of the image but a feature based on the image intensities [7]. The proposed method extracts four feature points in two overlapping images to compute the projective transformation. The feature extraction and block searching processes are illustrated in Fig. 2.

Proposed block search and quasi-feature extraction: (a) reference frame, (b) target frame, (c) reference quasi-feature point, and (d) target quasi-feature point
The quasi-feature extraction algorithm is summarized as follows.
Select a block. Extract strong edges using the Canny operator. This process searches many edge blocks, each classified as a flat or a textured region. As a result, the optimal quasi-features are extracted in the overlapping region.
Evaluate where Select Select the centroid of the optimum quasi-feature Compute quasi-feature error of 8-neighborhood pixels between where
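The MAD-based block search used in this step can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `mad` and `best_match` are hypothetical, the search is exhaustive over the whole target frame rather than restricted to the overlapping area, and the 20×10 block is assumed to mean 20 pixels wide by 10 pixels high.

```python
import numpy as np

def mad(block_a, block_b):
    """Mean absolute difference between two equally sized blocks."""
    # Cast to float first so that uint8 subtraction cannot wrap around.
    a = np.asarray(block_a, dtype=np.float64)
    b = np.asarray(block_b, dtype=np.float64)
    return float(np.mean(np.abs(a - b)))

def best_match(reference, target, top, left, height=10, width=20):
    """Find the block in `target` with the lowest MAD against the
    reference block whose top-left corner is (top, left).

    Assumption: width=20 and height=10 is one reading of the 20x10 block.
    Returns ((row, col), score) for the best-matching top-left corner.
    """
    ref_block = reference[top:top + height, left:left + width]
    best_score, best_pos = np.inf, None
    for r in range(target.shape[0] - height + 1):
        for c in range(target.shape[1] - width + 1):
            score = mad(ref_block, target[r:r + height, c:c + width])
            if score < best_score:
                best_score, best_pos = score, (r, c)
    return best_pos, best_score
```

In practice the two loops would be limited to the expected overlapping region of the target frame, which both reduces cost and avoids spurious matches elsewhere in the image.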
The proposed panoramic image creation performs the projective transform using the DLT algorithm with the four extracted quasi-features. The homography H relates matching points in the two images as
[5]
where
Although there are three equations in Eq. (4), only two of them are linearly independent. Thus each point correspondence gives two equations in the entries of
This can be rewritten as
where
The unit singular vector corresponding to the smallest singular value is the solution h. If
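The DLT step described above, in which each point correspondence contributes two linear equations and the solution h is the unit singular vector of the smallest singular value, can be sketched as follows. This is the textbook formulation under stated assumptions, not the authors' code: the names `dlt_homography` and `apply_homography` are hypothetical, coordinates are not pre-normalised, and the final division assumes the bottom-right entry of H is non-zero.

```python
import numpy as np

def dlt_homography(src, dst):
    """Estimate a 3x3 homography H mapping src points to dst points.

    src, dst: sequences of at least four (x, y) pairs in general position.
    """
    src = np.asarray(src, dtype=np.float64)
    dst = np.asarray(dst, dtype=np.float64)
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        # Two linearly independent equations per correspondence.
        rows.append([-x, -y, -1.0, 0.0, 0.0, 0.0, u * x, u * y, u])
        rows.append([0.0, 0.0, 0.0, -x, -y, -1.0, v * x, v * y, v])
    A = np.array(rows)
    # The right singular vector for the smallest singular value solves A h = 0.
    _, _, vt = np.linalg.svd(A)
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]  # assumes H[2, 2] != 0

def apply_homography(H, pts):
    """Map (x, y) points through H using homogeneous coordinates."""
    pts = np.asarray(pts, dtype=np.float64)
    homog = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = homog @ H.T
    return mapped[:, :2] / mapped[:, 2:3]
```

With exactly four correspondences the system has eight equations in nine unknowns, so the solution is exact up to scale. With pixel coordinates of a few hundred, normalising the points before building the matrix is commonly recommended for numerical stability.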
Localization and shape of an object are fundamental factors in video tracking. Within the class of deformable models, the ASM obtains the boundary shape from prior information, which makes it one of the best-suited approaches in terms of both accuracy and efficiency. The basic concept of the ASM is to model the contour of the object's silhouette in the image by parameters, so that the changing contours in successive image frames can be aligned to each other. The proposed tracking module consists of four steps: (i) landmark point assignment, (ii) principal component analysis (PCA), (iii) modeling of local structure, and (iv) model fitting. In this section we present the fundamentals of the ASM.
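As an illustration of the PCA step (ii), the sketch below builds a linear shape model from aligned training shapes in the standard ASM manner: mean shape plus the leading eigenvectors of the covariance matrix. It is an assumption-laden sketch rather than the authors' implementation: the function names, the 98% variance threshold, and the limit of three standard deviations on each shape parameter are conventional choices that are not stated in the text.

```python
import numpy as np

def build_shape_model(shapes, variance_kept=0.98):
    """Build a PCA shape model.

    shapes: (m, 2n) array, one aligned shape (x1, y1, ..., xn, yn) per row.
    Returns the mean shape, the matrix P of the t leading eigenvectors
    (as columns), and the corresponding eigenvalues.
    """
    X = np.asarray(shapes, dtype=np.float64)
    mean = X.mean(axis=0)
    cov = np.cov(X - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)      # ascending order
    order = np.argsort(eigvals)[::-1]           # re-sort to descending
    eigvals = np.clip(eigvals[order], 0.0, None)
    eigvecs = eigvecs[:, order]
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    # Smallest t whose modes explain at least `variance_kept` of the variance.
    t = min(int(np.searchsorted(cumulative, variance_kept)) + 1, len(eigvals))
    return mean, eigvecs[:, :t], eigvals[:t]

def fit_parameters(shape, mean, P, eigvals, limit=3.0):
    """Project a shape onto the model and clamp each parameter to
    +/- limit standard deviations so that the result stays plausible."""
    b = P.T @ (np.asarray(shape, dtype=np.float64) - mean)
    bound = limit * np.sqrt(eigvals)
    return np.clip(b, -bound, bound)

def reconstruct(mean, P, b):
    """Generate a shape vector from the model parameters b."""
    return mean + P @ b
```

Clamping the parameters is what lets the model reject implausible contours: a candidate shape produced by the local search is replaced by the nearest shape the training set can explain.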
Landmark Point Assignment
Given a frame of input video, suitable landmark points should be assigned on the contour of the object. Feature points on the object boundary are called landmark points; their role is to control the shape of the model contours. Good landmark points should be consistently located from one image to another. In a two-dimensional image, we represent n landmark points by a 2
where
A typical setup in our system consists of 42 manually assigned landmark points of 42 (
A set of n landmark points represents the shape of the object. Fig. 3 shows a set of 56 different shapes, called a training set. Each object shape is formed by its landmark points. The shapes must be aligned into a common coordinate frame.
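One common way to bring shapes into a common coordinate frame is Procrustes alignment, which removes translation, scale, and rotation. The text does not say which alignment procedure was used, so the sketch below is an assumption; the function names `normalise` and `align_to` are hypothetical.

```python
import numpy as np

def normalise(shape):
    """Centre an (n, 2) shape on its centroid and scale it to unit norm."""
    pts = np.asarray(shape, dtype=np.float64)
    centred = pts - pts.mean(axis=0)
    return centred / np.linalg.norm(centred)

def align_to(shape, reference):
    """Align `shape` to `reference` by translation, scale, and rotation.

    Both inputs are (n, 2) arrays with corresponding landmark order.
    Returns the aligned shape in the normalised frame of the reference.
    """
    a = normalise(shape)
    b = normalise(reference)
    # Orthogonal Procrustes: the rotation R minimising ||a @ R - b||.
    u, _, vt = np.linalg.svd(a.T @ b)
    r = u @ vt
    if np.linalg.det(r) < 0:
        # Flip the weakest axis so that R is a rotation, not a reflection.
        u[:, -1] *= -1
        r = u @ vt
    return a @ r
```

For a whole training set, a typical approach is to align every shape to one chosen reference, recompute the mean shape, and repeat until the mean stops changing.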

Training set of 56 shapes (
Although each shape in the training set is in the 2
Model Fitting
We can find the best pose and shape parameters to match a shape in the model coordinate frame,
where
Given a single point, denoted by [
After the set of pose parameters, {θ,
Finally, the model parameters are updated as
As the result of the searching procedure along profiles, the optimal displacement of a landmark point is obtained. The combination of optimally updated landmark points generates a new shape in the image coordinate frame,
A statistical, deformable shape model can be built through the landmark point assignment, PCA, and model fitting steps. In order to interpret a given shape in the input image based on the shape model, we must find the set of parameters that best matches the model to the image. As shown in Fig. 4, if we assume that the shape model represents boundaries and strong edges of the object, a profile across each landmark point has an edge-like local structure. Let
where

Local structure: (a) search profile (b) initial shape (c) after local structure
We used indoor video sequences captured by 4 CCD cameras at a resolution of 320×240 to test tracking performance for a deformable object. Illumination changes in outdoor video sequences are not considered. Fig. 5(a) shows a panoramic image created by the proposed method. Fig. 5(b) shows the result of blank removal. Fig. 5(c) shows the final result of brightness compensation. Fig. 6 shows aligned centroids of the corresponding shapes in 40 frames.

A sequence of panoramic image creation: (a) the initially created panoramic image, (b) the result of blank removal, and (c) the result of brightness compensation

Result alignment for training data
Fig. 7 shows the tracking result in the created panoramic video. In Fig. 7, model fitting with directional regularization before and after occlusion provides an acceptable degree of occlusion handling. In the presence of occlusion, the ASM-based tracker uses the feature vectors of the region tracker. The region prediction and compensation algorithm updates the motion blob and region sizes using information from the previous frames. As a result, the proposed ASM-based tracker can deal effectively with the occlusion problem. Table 1 shows that a tracking speed of 20 f/s is obtained using the ASM, which can be considered real-time. With down-sampled images, a tracking speed of 24 f/s is obtained. The wavelet-based method incurs considerable additional complexity from computing the wavelet transform.

Tracking results in a panoramic video with occlusion handling: (a) the 15th frame, (b) the 35th frame, (c) the 52nd frame, (d) the 77th frame, (e) the 94th frame, and (f) the 115th frame.
Performance of each method
This paper presents an efficient approach to building a panoramic video from 4 cameras and tracking objects in it. The proposed panoramic video creation module consists of two steps: (i) selecting four quasi-feature points in two adjacent frames acquired by the corresponding cameras and (ii) mosaicing the two images. The four pairs of selected quasi-feature points serve as a similarity reference for registering the two adjacent frames. The mosaicing step uses the DLT algorithm. The centroids of the four pairs of selected quasi-feature points define the proper landmarks. We also presented a method that classifies flat and textured regions to extract the optimum quasi-feature point. In the experiments, the proposed quasi-feature point extraction provided improved results. The tracking speed was 24 f/s, which can be considered real-time. Finally, the proposed ASM-based tracker can effectively handle the occlusion problem. While most conventional panoramic image creation methods are pixel-based, the proposed feature-based method provides more accurate tracking results. Based on the experimental results, the created panoramic images exhibit high quality, which enables robust, real-time tracking.
