Abstract
Keywords
1. Introduction
Over the last decade, face recognition has become one of the most active research areas in multimedia information processing, driven by rapidly increasing requirements in many practical applications such as identity authentication, information security and human-computer interaction/communication. A major challenge of face recognition, however, is that the captured face image data often lie in a high-dimensional feature space. For example, a face image with the resolution of
The most popular conventional algorithms for dimensionality reduction are principal component analysis (PCA) and linear discriminant analysis (LDA) [1]. PCA maintains the global Euclidean structure of the data in the original high-dimensional space and projects the data points into a lower-dimensional subspace in which the sample variance is maximized. In contrast to the unsupervised PCA, LDA is a supervised learning approach. LDA seeks the projection axes on which the data points of different classes are far from each other while data points of the same class remain close to each other. The optimal projection of LDA is computed by simultaneously minimizing the within-class scatter and maximizing the between-class scatter. It is generally believed that LDA-based algorithms outperform PCA-based ones, since the former optimize the low-dimensional representation of the objects with the most discriminant information, while the latter simply achieve object reconstruction. In addition, non-negative matrix factorization (NMF) [2] has been proposed for face recognition. NMF learns the parts of objects by imposing non-negativity constraints, which leads to a parts-based representation of objects. However, the iterative update method for solving the NMF problem is computationally expensive. Although the above-mentioned algorithms have been widely applied to face recognition, they are designed to discover only the global Euclidean structure, while the local manifold structure is ignored.
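To make the two criteria above concrete, the following NumPy sketch contrasts PCA with two-class Fisher LDA on toy data; the data, dimensions and class means are invented for illustration and are not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
X0 = rng.normal(0.0, 1.0, size=(50, 10))   # class 0
X1 = rng.normal(1.5, 1.0, size=(50, 10))   # class 1, shifted mean
X = np.vstack([X0, X1])

# PCA: top eigenvectors of the total covariance (maximal sample variance)
Xc = X - X.mean(axis=0)
_, pca_vecs = np.linalg.eigh(Xc.T @ Xc / (len(X) - 1))
W_pca = pca_vecs[:, ::-1][:, :2]           # two principal axes
Z_pca = X @ W_pca                          # (100, 2) embedding

# Two-class Fisher LDA: w proportional to Sw^{-1} (mu1 - mu0), i.e. the
# direction maximizing between-class over within-class scatter
m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
Sw = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)
w_lda = np.linalg.solve(Sw, m1 - m0)       # single discriminant direction
z_lda = X @ w_lda                          # (100,) projections

# The discriminant direction separates the two class means:
print(z_lda[:50].mean() < z_lda[50:].mean())  # True
```

Note that PCA returns up to 10 variance-ordered axes here, whereas two-class LDA yields only a single discriminant direction, which is why LDA-based methods are preferred when discrimination rather than reconstruction is the goal.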
Recently, a number of manifold learning algorithms have been proposed to discover the geometric properties of high-dimensional data spaces, and they have been successfully applied to face recognition. Manifold learning aims at discovering the geometric properties of the data space, such as its Euclidean embedding, intrinsic dimensionality, connected components, homology, etc. The desired manifold is an intrinsically lower-dimensional space hidden in the input high-dimensional space. The most well-known manifold learning algorithms include isometric feature mapping (ISOMAP) [3], locally linear embedding (LLE) [4] and Laplacian eigenmap (LE) [5]. ISOMAP, a variant of multidimensional scaling (MDS), aims to preserve the global geodesic distances between any pair of data points. LLE aims to embed data points in a low-dimensional space by finding the optimal linear reconstruction in a small neighbourhood. LE aims to preserve proximity relationships by manipulations on an undirected weighted graph, which encodes the neighbour relations of pairwise data points. These nonlinear methods have achieved impressive results on some pattern classification tasks; however, the mappings they derive are defined only on the training data points, so how to evaluate the maps on novel test data points remains unclear [6]. One common way to cope with this problem is to apply a linearization procedure to construct explicit maps over new samples. A representative example of this approach is locality preserving projection (LPP) [7], a linearization of LE. LPP is obtained by finding the optimal linear approximations to the eigenfunctions of the Laplace-Beltrami operator on the manifold. Although these dimensionality reduction methods are based on different embedding criteria, Yan et al. demonstrated that most of them can be mathematically unified within a general framework called graph embedding [8].
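As an illustration of the graph-based embedding idea behind LE (and, after linearization, LPP), the following NumPy sketch computes a Laplacian eigenmap on toy data; the k-nearest-neighbour rule, heat-kernel weights and data are illustrative choices, not the exact constructions used in [5] or [7].

```python
import numpy as np

def laplacian_eigenmap(X, k=5, t=1.0, d=2):
    n = X.shape[0]
    # Pairwise squared Euclidean distances
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    # Heat-kernel weights on a symmetrized k-nearest-neighbour graph
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(sq[i])[1:k + 1]   # skip the point itself
        W[i, nbrs] = np.exp(-sq[i, nbrs] / t)
    W = np.maximum(W, W.T)
    D = np.diag(W.sum(1))
    L = D - W                               # unnormalized graph Laplacian
    # Generalized eigenproblem L f = lambda D f; the smallest eigenvalue
    # gives the trivial constant eigenvector, which we skip
    vals, vecs = np.linalg.eig(np.linalg.solve(D, L))
    order = np.argsort(vals.real)
    return vecs.real[:, order[1:d + 1]]

Y = laplacian_eigenmap(np.random.default_rng(0).normal(size=(60, 5)))
print(Y.shape)  # (60, 2)
```

The out-of-sample limitation mentioned above is visible here: the embedding `Y` exists only for the 60 training points, and a new point has no image under the map without re-solving the eigenproblem, which is exactly what LPP's linearization avoids.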
Based on the graph embedding framework, marginal Fisher analysis (MFA) was developed to jointly consider the local manifold structure and the class label information for dimensionality reduction. However, MFA extracts discriminative information only from marginal samples, although non-marginal samples also contain discriminative information. Recently, Song et al. have proposed discriminative geometry preserving projections (DGPP) [9], which has yielded impressive results on scene classification. DGPP is fundamentally based on manifold learning, but it simultaneously considers both intraclass geometry and interclass discrimination for dimensionality reduction.
However, most previous dimensionality reduction algorithms operate on input face image data only after they have been transformed into 1-D vectors. In fact, face image data are intrinsically in the form of second- or higher-order tensors [10,11]. For example, grey-level face images are second-order tensor data (matrices) and can be expanded to third-order tensors by representing sets of images after Gabor filtering. Such vectorization therefore ignores the underlying data structure and often leads to the curse of dimensionality and the small sample size problem. To address these problems, many tensor-based dimensionality reduction algorithms have been proposed. Representative algorithms include tensor principal component analysis (TPCA) [12], tensor linear discriminant analysis (TLDA) [10], tensor locality preserving projection (TLPP) [13] and tensor marginal Fisher analysis (TMFA) [8]. However, most of the existing tensor-based algorithms simply use the traditional Euclidean distance to measure relationships among different data points. Despite its prevalent usage, the Euclidean distance ignores the relationships among different coordinates of high-order data, such as the spatial relations of pixels in images, which have been shown in many previous studies to be very useful for improving learning performance [14]. Therefore, both the embedding strategy and its related distance metric should be considered in designing tensor-based dimensionality reduction algorithms. However, most of the existing tensor learning algorithms fail to take into account the correlation among different coordinates of data with an arbitrary number of orders.
In this paper, we propose a novel distance adaptive tensor manifold learning algorithm for face recognition. By using the data-adaptive tensor distance metric proposed by Liu et al. [14], we can effectively exploit the spatial correlations of face image data to enhance learning performance. We discuss how to tensorize the discriminative geometry preserving projection, which gives rise to a distance adaptive tensor dimensionality reduction algorithm for face recognition.
The rest of the paper is organized as follows: in Section 2, we provide a brief review of the original vector-based discriminative geometry preserving projection (DGPP) algorithm. Our distance adaptive tensor manifold learning algorithm for face recognition is introduced in Section 3. The experimental results on face recognition are presented in Section 4. Finally, we provide the concluding remarks and suggestions for future work in Section 5.
2. Brief review of DGPP
The original vector-based discriminative geometry preserving projection (DGPP) [9] is a recently proposed manifold learning algorithm for dimensionality reduction; it can precisely model both the intraclass geometry and interclass discrimination by using an average weighted adjacency graph and the local linear reconstruction error. In addition, the original vector-based DGPP avoids the out-of-sample problem of traditional manifold learning algorithms by applying a linearization procedure to construct explicit maps over new samples.
Given a set of face images
with the constraint
where
where
Meanwhile,
Thus, we can easily obtain
As can be seen from the above statement, DGPP aims to look for a linear transformation matrix
Thus, the matrix
Although the original vector-based DGPP has shown promising results on scene classification, it unfolds each image into a single column vector before dimensionality reduction. In fact, image objects are intrinsically in the form of second- or higher-order tensors. Thus, the vectorization operation of DGPP largely increases the computational cost of image data analysis and destroys the intrinsic structural relationships among different coordinates of high-order image data, which have been shown in many previous studies to be very useful for improving learning performance [14,15].
3. Distance adaptive tensor discriminative geometry preserving projection for face recognition
In order to preserve the intrinsic relationships among different coordinates for high-order face image data, a natural way is to perform dimensionality reduction in their original high-order tensor space. In this section, we will describe our distance adaptive tensor discriminative geometry preserving projection approach which is fundamentally based on discriminative geometry preserving projection, as well as tensor structure representation and its closely related adaptive tensor distance metric. We begin with a review of a few tensor operations [10,16].
3.1. Review of tensor operations
Assume that a data sample is represented as an
The
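For readers unfamiliar with the notation, the following NumPy sketch illustrates two standard operations used throughout tensor-based methods, mode-n unfolding and the mode-n product with a matrix; the tensor sizes here are arbitrary examples.

```python
import numpy as np

def unfold(T, mode):
    """Mode-n unfolding: bring axis `mode` to the front, flatten the rest."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def mode_n_product(T, U, mode):
    """Mode-n product T x_n U: multiply every mode-n fibre of T by U."""
    return np.moveaxis(np.tensordot(U, T, axes=(1, mode)), 0, mode)

T = np.arange(24, dtype=float).reshape(2, 3, 4)  # a third-order tensor
U = np.ones((5, 3))                              # maps mode-1 size 3 -> 5

print(unfold(T, 1).shape)             # (3, 8)
print(mode_n_product(T, U, 1).shape)  # (2, 5, 4)
```

Multilinear dimensionality reduction methods apply one such mode-n product per mode, each with its own (usually much smaller) transformation matrix, instead of one large matrix applied to a vectorized image.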
3.2. Data-adaptive tensor distance
An effective distance function plays a key role in tensor-based dimensionality reduction techniques, and a number of research efforts have shown that the performance of tensor-based dimensionality reduction algorithms not only depends on the embedding strategy but is also closely related to the distance metric. The most commonly used distance metric in tensor-based techniques is the Euclidean distance. However, the orthogonality assumption of the Euclidean distance ignores the relationships among different coordinates of high-order tensor data, such as the spatial relationships of pixels in images. To alleviate the orthogonality assumption and enhance the performance of tensor-based dimensionality reduction algorithms, we adopt the data-adaptive tensor distance metric proposed by Liu et al. [14].
Given a high-order tensor data
where
In order to model the intrinsic relationships between different coordinates, Wang et al. [15] have proved the following conclusion: for image data, if the metric coefficients depend properly on the distances between pixel locations, then the obtained distance metric can effectively reflect the spatial relationships between pixels. Following this idea, Liu et al. [14] designed the following metric matrix
where
Once obtaining the metric matrix
It is important to note that the above data-adaptive tensor distance metric can be reduced to the traditional Euclidean distance when
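The following NumPy sketch illustrates the idea behind such a data-adaptive tensor distance for a small grey-level image: the metric coefficients decay with the spatial distance between pixel locations, so two images whose active pixels are spatially close are judged closer than the plain Euclidean distance would suggest. The Gaussian form and the sigma value are illustrative assumptions in the spirit of [14,15], not the paper's exact coefficients.

```python
import numpy as np

def metric_matrix(h, w, sigma=1.0):
    """G[l, m] decays with the spatial distance between pixels l and m."""
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    P = np.stack([ys.ravel(), xs.ravel()], axis=1).astype(float)
    sq = ((P[:, None, :] - P[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2.0 * sigma ** 2))

def adaptive_distance(X, Y, G):
    """d(X, Y) = sqrt((x - y)^T G (x - y)) on the vectorized tensors."""
    d = (X - Y).ravel()
    return float(np.sqrt(d @ G @ d))

G = metric_matrix(4, 4)
A = np.zeros((4, 4)); A[1, 1] = 1.0
B = np.zeros((4, 4)); B[1, 2] = 1.0   # A shifted by one pixel
C = np.zeros((4, 4)); C[3, 3] = 1.0   # A moved far away

# Euclidean distance cannot tell the two shifts apart; the adaptive one can:
print(np.linalg.norm(A - B) == np.linalg.norm(A - C))           # True
print(adaptive_distance(A, B, G) < adaptive_distance(A, C, G))  # True
```

When the off-diagonal coefficients of G vanish (G becomes the identity), `adaptive_distance` coincides with the Euclidean distance, which matches the reduction property noted above.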
3.3. Distance adaptive tensor discriminative geometry preserving projection
In order to preserve the intrinsic tensor structure of high-order face image data, a natural way is to perform dimensionality reduction in their original high-order tensor space. In the following, we discuss how to tensorize the discriminative geometry preserving projection which gives rise to a distance adaptive tensor discriminative geometry preserving projection (DATDGPP) algorithm for face recognition.
Given
Similar to the original vector-based DGPP method, to preserve both the local geometry and the discriminative information in the low-dimensional feature subspace, the optimal objective function of DATDGPP is defined as follows:
with the constraint
where the weighting factor
where
Similar to the original DGPP method, by imposing
Because of the difficulty in computing the optimal
with the constraint
where
The optimization problem of (15) subject to (16) can be approximately solved by the following standard eigenvalue decomposition problem:
Then the optimal transformation matrix
The solution procedure of the above iterative algorithm can be described as follows: first, we fix
Then we fix
Input: face image sample set
Output: transformation matrices
1: Construct the adaptive tensor distance metric in terms of (10);
2: Construct the weighting factor matrix
3: Obtain the reconstruction error
4: Construct the optimal objective function of DATDGPP in terms of (11) and (12);
5: Initialize iteration number
6: For
7: For
8: Compute
9: Compute the
10: Compute
11: Solve the eigenproblem:
12: If
13: Output the final transformation matrices
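The alternating structure of Algorithm 1 can be sketched as follows. To keep the sketch self-contained, the per-mode eigenproblem below maximizes a simple projected-variance (multilinear PCA-style) criterion as a stand-in for the DATDGPP objective of (11); the outer/inner loops, per-mode eigendecomposition and convergence test mirror steps 6-12.

```python
import numpy as np

def mode_n_product(T, U, mode):
    """Mode-n product: multiply every mode-`mode` fibre of T by U."""
    return np.moveaxis(np.tensordot(U, T, axes=(1, mode)), 0, mode)

def alternating_projections(samples, dims, n_iter=10, tol=1e-6):
    order = samples[0].ndim
    # Step 5: initialize the mode transformation matrices
    U = [np.eye(samples[0].shape[k])[:, :dims[k]] for k in range(order)]
    prev = None
    for _ in range(n_iter):                       # step 6: outer iterations
        for k in range(order):                    # step 7: loop over modes
            n_k = samples[0].shape[k]
            S = np.zeros((n_k, n_k))
            for X in samples:
                Z = X
                for j in range(order):            # project all modes but k
                    if j != k:
                        Z = mode_n_product(Z, U[j].T, j)
                Zk = np.moveaxis(Z, k, 0).reshape(n_k, -1)
                S += Zk @ Zk.T                    # mode-k scatter (stand-in)
            vals, vecs = np.linalg.eigh(S)        # step 11: eigenproblem
            U[k] = vecs[:, ::-1][:, :dims[k]]     # keep the top eigenvectors
        obj = float(vals[-dims[-1]:].sum())       # surrogate objective value
        if prev is not None and abs(obj - prev) < tol:  # step 12: converged?
            break
        prev = obj
    return U                                      # step 13: output matrices

rng = np.random.default_rng(0)
data = [rng.normal(size=(8, 6)) for _ in range(20)]
U1, U2 = alternating_projections(data, dims=(3, 2))
print(U1.shape, U2.shape)  # (8, 3) (6, 2)
```

Because the transformation matrices are coupled, no closed-form joint solution exists; fixing all but one matrix turns each subproblem into a standard eigendecomposition, which is exactly the structure Algorithm 1 exploits.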
In Algorithm 1, we described in detail the procedure for learning the transformation matrices in an iterative manner. In the following, we analyse the computational complexity of DATDGPP and prove the convergence of the proposed iterative algorithm.
To easily analyse the computational complexity, we assume that the high-order tensor data have the same size in each dimension, i.e.,
In fact, in each iteration of Algorithm 1, since each update of transformation matrix
In addition, an upper bound exists for the objective function in (11), i.e.,
Therefore, the objective function
It is worth noting that our proposed DATDGPP algorithm reduces to the tensor DGPP (TDGPP) algorithm, which adopts the traditional Euclidean distance metric, when the tensor distance
3.4. Face recognition using DATDGPP
Once the transformation matrix of DATDGPP is obtained, we can apply it to project the face images into a low-dimensional subspace. Then the face recognition problem becomes a pattern classification task and the traditional classification algorithms can be applied in the low-dimensional subspace. In this paper, we apply the nearest-neighbour classifier because of its simplicity.
With the learned transformation matrices
When a new testing face image data
Then the class label of
and the face image
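The recognition step can be sketched as follows: every gallery image and the probe are projected with the learned mode transformation matrices, and the probe takes the label of its nearest gallery neighbour. Here `U1` and `U2` are random placeholders for the learned matrices, the images are random toy matrices, and the subspace distance is plain Euclidean, all purely for illustration.

```python
import numpy as np

def project(X, U1, U2):
    """Project a second-order tensor (matrix) image into the subspace."""
    return U1.T @ X @ U2

rng = np.random.default_rng(0)
U1 = rng.normal(size=(16, 4))   # placeholder for a learned mode-1 matrix
U2 = rng.normal(size=(16, 4))   # placeholder for a learned mode-2 matrix
gallery = [rng.normal(size=(16, 16)) for _ in range(6)]
labels = [0, 0, 1, 1, 2, 2]
probe = gallery[3] + 0.01 * rng.normal(size=(16, 16))  # near gallery[3]

feats = [project(X, U1, U2) for X in gallery]
q = project(probe, U1, U2)
# Nearest-neighbour rule in the low-dimensional subspace
pred = labels[int(np.argmin([np.linalg.norm(q - f) for f in feats]))]
print(pred)  # label of gallery[3], i.e. 1
```

Each 16x16 image is compressed to a 4x4 feature matrix before matching, so the nearest-neighbour search runs in the reduced subspace rather than on raw pixels.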
4. Experimental results
In this section, we first compare our proposed DATDGPP algorithm with the original vector-based DGPP algorithm and the tensor DGPP (TDGPP) algorithm, which adopts the traditional Euclidean distance metric. Then, we compare the proposed DATDGPP algorithm with tensor principal component analysis (TPCA) [12], tensor linear discriminant analysis (TLDA) [10], tensor locality preserving projection (TLPP) [13] and tensor marginal Fisher analysis (TMFA) [8], four of the most popular tensor-based dimensionality reduction algorithms in face recognition.
4.1. Face databases
Our empirical study on face recognition was conducted on three real-world face databases: the Yale database, the Olivetti Research Laboratory (ORL) database and the PIE (pose, illumination and expression) database from CMU. In all the experiments, preprocessing to locate the faces was applied. Original images were manually aligned according to the eye position, cropped and normalized to the resolution of
In all the experiments the recognition process has three steps. First, we calculate the face subspace from the training samples; then the new face image to be identified is projected into
The Yale database
(http://cvc.yale.edu/projects/yalefaces/yalefaces.html) contains 165 front view face images of 15 individuals. Eleven images were collected from each individual with varying facial expressions and configurations. Eleven sample images of one person from the Yale database are shown in Figure 1.

Face image examples from the Yale database
The ORL face database
(http://www.uk.research.att.com/facedatabase.html) contains 400 images of 40 individuals. Some images were captured at different times and have different variations including expression, lighting and facial details. The images were taken with a tolerance for some tilting and rotation of the face up to 20 degrees. Ten sample images of one person from the ORL database are shown in Figure 2.

Face image examples from the ORL database
The CMU PIE face database [17] contains 41,368 face images of 68 subjects in total. The face images were captured by 13 synchronized cameras and 21 flashes, under varying pose, illumination and expression. In this work, we use a subset with five near-frontal poses (C05, C07, C09, C27 and C29) and two illuminations (indexed as 08 and 11). Therefore, each person has ten images. Figure 3 shows ten example images of one person from the PIE database.

Face image examples from the PIE database
4.2. Results
We conducted two experiments on each database. In each experiment, the face image set was randomly partitioned into training and testing sets of different sizes. For ease of representation, the experiments are named as G
In the first experiment, we compare the recognition accuracy and running time of the DGPP, TDGPP and DATDGPP algorithms under different training and testing partitions. In general, the performance of all these algorithms varies with the number of dimensions. We report the maximal average recognition accuracy, the corresponding optimal reduced dimension and the running time obtained by the DGPP, TDGPP and DATDGPP algorithms on the three databases in Tables 1-6. In the second experiment, we compare our proposed DATDGPP algorithm with four other representative tensor-based dimensionality reduction algorithms: TPCA, TLDA, TLPP and TMFA. Tables 1, 3 and 5 report the maximal average recognition accuracies and the corresponding optimal reduced dimensions of the TPCA, TLDA, TLPP, TMFA and DATDGPP algorithms on the three databases. Due to space limitations, we omit the plots of recognition accuracy versus reduced dimension on the three databases.
Comparisons of maximal average recognition accuracy (in percent) as well as the optimal reduced dimension on the Yale database
Comparisons of running time (second) on the Yale database
Comparisons of maximal average recognition accuracy (in percent) as well as the optimal reduced dimension on the ORL database
Comparisons of running time (second) on the ORL database
Comparisons of maximal average recognition accuracy (in percent) as well as the optimal reduced dimension on the CMU PIE database
Comparisons of running time (second) on the CMU PIE database
The main observations from the above performance comparisons include:
1) The tensor-based DGPP algorithms (i.e., TDGPP and DATDGPP) perform much better than the original vector-based DGPP algorithm, which demonstrates that the tensor structure representation can effectively use the correlation among different coordinates of face image data to enhance the recognition performance.
2) Our proposed DATDGPP algorithm consistently outperforms the traditional Euclidean distance-based TDGPP algorithm. This is because the orthogonality assumption of the traditional Euclidean distance may not reflect the real distance between two tensor representation-based face images, while our proposed data-adaptive tensor distance can effectively reflect the spatial relationships between pixels.
3) The running times of the tensor-based DGPP algorithms (i.e., TDGPP and DATDGPP) are much shorter than that of the original vector-based DGPP algorithm. This result is consistent with the above analysis of computational complexity, i.e., the computational complexities of the tensor-based DGPP algorithms are much lower than that of the original vector-based DGPP algorithm.
4) The TPCA algorithm performs the worst among the compared algorithms. A possible explanation is as follows: like the traditional PCA, TPCA is unsupervised; it simply achieves object reconstruction, which is not necessarily useful for discrimination and classification tasks.
5) The average performances of TLDA and TLPP are similar. On some databases TLDA outperforms TLPP, while on others TLPP is better than TLDA. This demonstrates that it is hard to say whether local manifold structure or class label information is more important.
6) The TMFA algorithm performs better than the TLDA and TLPP algorithms in all experiments. This observation indicates the importance of utilizing both class label information and local manifold structure, as well as describing the separability of different classes with a margin criterion. However, the performance of TMFA is still inferior to that of DATDGPP, because TMFA extracts discriminative information only from marginal points, although non-marginal points also contain discriminative information.
7) Our proposed DATDGPP algorithm consistently outperforms the TPCA, TLDA, TLPP and TMFA algorithms. This can be attributed to two facts: first, DATDGPP precisely models both the intraclass geometry and interclass discrimination; second, the data-adaptive tensor distance effectively reflects the spatial relationships between pixels. Therefore, our proposed DATDGPP algorithm achieves the best performance among the compared algorithms by combining the adaptive tensor distance with the discriminative geometry preserving projection strategy.
5. Conclusion and future work
In this paper, we proposed a novel distance adaptive tensor discriminative geometry preserving projection (DATDGPP) algorithm for face recognition. The advantages of DATDGPP are as follows: it preserves the natural tensor structure of face images; it adopts a data-adaptive tensor distance to model the correlation among different coordinates of tensor data; and its transformation matrices preserve both discriminative information and local geometry. Experiments have demonstrated the effectiveness of the proposed algorithm compared with the original DGPP algorithm and other representative tensor-based dimensionality reduction algorithms. Since the proposed DATDGPP algorithm is a general tensor dimensionality reduction algorithm for high-order tensor data, we plan to apply it to the classification of other high-order tensor data (such as video data) in the future.
